Querying Nested JSON Data Using N1QL and Couchbase
TriNUG Data SIG
6/6/2018
Who is this guy?
• Brant Burnett - @btburnett3
• Systems Architect at CenterEdge Software
• .NET since 1.0, SQL Server since 7.0
• MCSD, MCDBA
• Experience from desktop apps to large
scale cloud services
NoSQL Credentials
• Couchbase user since 2012 (v1.8)
• Couchbase Community Expert
• Open source contributions:
• Couchbase .NET SDK
• Couchbase.Extensions for .NET Core
• Couchbase LINQ provider (Linq2Couchbase)
• CouchbaseFakeIt
• couchbase-index-manager
Content Attributions
• Matthew Groves, Couchbase Developer Advocate
• @mgroves
• crosscuttingconcerns.com
What is Couchbase
• NoSQL document database
• Get and set documents by key
• Imagine a giant folder full of JSON files
• If you know the filename, you can get or
update the content
• Additional features:
• Query using N1QL (SQL-based)
• Map-Reduce Views
• Full Text Search
• Analytics (Preview in 5.5)
• Eventing (5.5)
• Couchbase is not CouchDB
Why Couchbase
• Scalability
• Availability
• Performance
• Agility
Agenda
Introduction to N1QL
Working with JSON Types
Joins In N1QL
Indexing in Couchbase
Query Optimization
Introduction to N1QL
Pronounced “nickel”
What’s a Bucket?
• Large collection of JSON
documents
• Every document may have a
different schema
• Documents are accessed by a
string called the key
Table: Customer
CustomerID | Name       | DOB
CBL2015    | Jane Smith | 1990-01-30
Document Key: customer-CBL2015
{
  "Name": "Jane Smith",
  "DOB": "1990-01-30",
  "type": "customer"
}
So how do I query data from a bucket?
Document Key: customer-CBL2015
{
  "Name": "Jane Smith",
  "DOB": "1990-01-30",
  "type": "customer"
}
Document Key: customer-CBL2016
{
  "Name": "John Smith",
  "DOB": "1990-06-29",
  "type": "customer"
}
SELECT Name, DOB FROM Bucket
WHERE type = 'customer' AND Name LIKE '%Smith'
ORDER BY Name
Result:
[{
  "Name": "Jane Smith",
  "DOB": "1990-01-30"
},
{
  "Name": "John Smith",
  "DOB": "1990-06-29"
}]
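As a rough mental model, the SELECT on this slide behaves like a filter, a projection, and a sort applied to every document in the bucket. The Python sketch below mimics that with a plain dict standing in for the bucket. It only illustrates the semantics and says nothing about how the query engine actually executes the statement; `run_query` is a hypothetical helper name.

```python
# A dict stands in for the bucket: document key -> JSON document.
bucket = {
    "customer-CBL2015": {"Name": "Jane Smith", "DOB": "1990-01-30", "type": "customer"},
    "customer-CBL2016": {"Name": "John Smith", "DOB": "1990-06-29", "type": "customer"},
}


def run_query(docs):
    """Mimic: SELECT Name, DOB FROM Bucket
    WHERE type = 'customer' AND Name LIKE '%Smith' ORDER BY Name"""
    matches = [
        doc for doc in docs.values()
        # LIKE '%Smith' is a case-sensitive suffix match.
        if doc.get("type") == "customer" and doc.get("Name", "").endswith("Smith")
    ]
    # Projection: keep only the two selected attributes.
    projected = [{"Name": d["Name"], "DOB": d["DOB"]} for d in matches]
    return sorted(projected, key=lambda row: row["Name"])


result = run_query(bucket)
print(result)
```

Note that the `type` attribute is ordinary document content; filtering on it is a modeling convention, not something the bucket enforces.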
What other SQL features are
supported?
• Aggregation (MIN, MAX, SUM, AVG, COUNT, etc.)
• GROUP BY/HAVING
• OFFSET/LIMIT
• Subqueries
• UNION/INTERSECT/EXCEPT
• Joins (more details to come…)
• UPDATE/INSERT/DELETE/UPSERT
Accessing Nested Objects
Key: airport_3484
{
"airportname": "Los Angeles Intl",
"city": "Los Angeles",
"country": "United States",
"faa": "LAX",
"geo": {
"alt": 126,
"lat": 33.942536,
"lon": -118.408075
},
"icao": "KLAX",
"id": 3484,
"type": "airport",
"tz": "America/Los_Angeles"
}
SELECT *
FROM `travel-sample`
WHERE type = 'airport' AND geo.alt < 1000
SELECT *
FROM `travel-sample`
WHERE type = 'route'
AND schedule[0].day = 1
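Dotted paths and array subscripts walk into the document one step at a time. The Python sketch below is one way to picture that, assuming that a step which cannot be followed yields a MISSING value rather than an error. `get_path` and the `MISSING` sentinel are hypothetical names used only for illustration.

```python
MISSING = object()  # sentinel for "attribute not present", distinct from None (JSON null)


def get_path(doc, *steps):
    """Follow object keys (str) and array positions (int); return MISSING if a step fails."""
    current = doc
    for step in steps:
        if isinstance(step, str) and isinstance(current, dict) and step in current:
            current = current[step]
        elif isinstance(step, int) and isinstance(current, list) and 0 <= step < len(current):
            current = current[step]
        else:
            return MISSING
    return current


airport = {"type": "airport", "faa": "LAX",
           "geo": {"alt": 126, "lat": 33.942536, "lon": -118.408075}}
route = {"type": "route",
         "schedule": [{"day": 0, "flight": "AF198", "utc": "10:13:00"}]}

alt = get_path(airport, "geo", "alt")              # geo.alt
day = get_path(route, "schedule", 0, "day")        # schedule[0].day
absent = get_path(airport, "schedule", 0, "day")   # airports have no schedule array

# A comparison against MISSING never matches, so such a document is filtered out.
low_airport = alt is not MISSING and alt < 1000
```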
O backtick, backtick!
Wherefore art thou a
backtick?
• ANSI SQL delimits identifiers with double quotes
• SELECT * FROM "table-name"
• T-SQL also delimits identifiers with square
brackets
• SELECT * FROM [table-name]
• Both of these are used in JSON!
• {"array": ["string1", "string2"]}
• So, N1QL uses the backtick instead
• SELECT * FROM `bucket-name`
Working with JSON Types
Strings
• Supported by JSON
• Collation is always case sensitive
• Literals are delimited with either double or single quotes
x = 'my string here'
x = "my string here"
• Various supporting functions:
• String concatenation (||)
• LENGTH
• LOWER
• CONTAINS
• TRIM, etc…
Numbers
• Supported by JSON
• Literals are included without delimiters
x = 123456.05124
• Various supporting functions:
• Arithmetic operators
• ABS
• CEIL
• SQRT
• TRUNC, etc…
Booleans
• Supported by JSON
• Literals are true and false
x = true
• Various supporting operators:
• NOT
• AND
• OR
Arrays
• Supported by JSON
• Literals are comma delimited and surrounded by square brackets
[1, 2, 3, "a"]
• Various supporting functions:
• Subqueries
• ARRAY_CONTAINS
• ARRAY_AVG
• ARRAY_INSERT
• ARRAY_LENGTH, etc…
Objects
• Supported by JSON
• Literals are comma delimited key/value pairs surrounded by curly braces
{"key": "value"}
• Various supporting functions:
• OBJECT_NAMES
• OBJECT_PAIRS
• OBJECT_VALUES, etc…
Nulls
• Supported by JSON
• Literal is the word null, no delimiters
{"key": null}
• Various supporting operators and functions:
• IS NULL
• IS NOT NULL
• IFNULL, etc…
Missing attributes
• Supported by JSON
• Can’t be explicitly declared
• Similar to undefined in JavaScript
• No literal, simply don’t include an attribute in an object
{}
• Various supporting operators and functions:
• IS MISSING
• IS NOT MISSING
• IFMISSING
• IFMISSINGORNULL, etc…
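The difference between a null attribute and a missing one is easy to blur, so here is a small Python sketch of it. JSON null maps to Python `None`, while a hypothetical `MISSING` sentinel marks an attribute that is simply absent. The helper names are invented for this sketch, and the value returned by `if_missing_or_null` when nothing qualifies is an assumption, not documented behavior.

```python
MISSING = object()  # hypothetical sentinel: the attribute is absent from the object


def attr(doc, name):
    """Return the attribute value, or MISSING when the key is not in the object."""
    return doc.get(name, MISSING)


def is_missing(value):
    return value is MISSING


def is_null(value):
    # JSON null maps to Python None; an absent attribute is not null.
    return value is None


def if_missing_or_null(*values):
    """First value that is neither MISSING nor null, in the spirit of IFMISSINGORNULL.
    Returning None when nothing qualifies is an assumption of this sketch."""
    for value in values:
        if value is not MISSING and value is not None:
            return value
    return None


with_null = {"key": None}   # {"key": null}
empty = {}                  # {}

fallback = if_missing_or_null(attr(empty, "key"), attr(with_null, "key"), "fallback")
```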
Date/times
• Not officially supported by JSON
• Can be stored using other data types
• Usually either an ISO 8601 string or the number of milliseconds since the Unix epoch
• Literal depends on the data type
"2018-06-04T19:26:29.000Z"
1528140389000
• Various supporting functions:
• STR_TO_MILLIS
• CLOCK_MILLIS, CLOCK_STR
• DATE_PART_STR, DATE_PART_MILLIS
• DATE_DIFF_STR, DATE_DIFF_MILLIS
• etc…
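The two storage formats carry the same information: the millisecond value 1528140389000 shown on this slide is 2018-06-04T19:26:29 UTC. The Python sketch below converts in both directions using only the standard library. `str_to_millis` and `millis_to_str` are local helper names that echo the purpose of the N1QL functions; they are not those functions.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def str_to_millis(text):
    """Parse an ISO 8601 UTC string such as 2018-06-04T19:26:29.000Z into epoch milliseconds."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def millis_to_str(millis):
    """Format epoch milliseconds as an ISO 8601 UTC string with millisecond precision."""
    moment = EPOCH + timedelta(milliseconds=millis)
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + f"{moment.microsecond // 1000:03d}Z"


as_text = millis_to_str(1528140389000)
round_trip = str_to_millis(as_text)
```

One practical difference: ISO 8601 strings in a single time zone sort correctly as plain strings, while millisecond numbers make arithmetic simpler.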
Joins In N1QL
Key: route_10000
{
"airline": "AF",
"airlineid": "airline_137",
"destinationairport": "MRS",
"distance": 2881.617376098415,
"equipment": "320",
"id": 10000,
"schedule": [
{
"day": 0,
"flight": "AF198",
"utc": "10:13:00"
},
{
"day": 0,
"flight": "AF547",
"utc": "19:14:00"
}
],
"sourceairport": "TLV",
"stops": 0,
"type": "route"
}
Referenced 1:N Relationship
Key: airline_137
{
"callsign": "AIRFRANS",
"country": "France",
"iata": "AF",
"icao": "AFR",
"id": 137,
"name": "Air France",
"type": "airline"
}
Joining by Primary Key
SELECT route.sourceairport, route.destinationairport, airline.name
FROM `travel-sample` AS route
INNER JOIN `travel-sample` AS airline
ON route.airlineid = META(airline).id
WHERE route.type = 'route'
ORDER BY route.sourceairport, route.destinationairport, airline.name
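When the ON clause compares a field to META(airline).id, the join amounts to fetching the referenced document by its key. The Python sketch below pictures that with a dict lookup. The documents are trimmed copies of the ones on the earlier slide, and `join_by_key` is a hypothetical helper.

```python
bucket = {
    "route_10000": {"type": "route", "airlineid": "airline_137",
                    "sourceairport": "TLV", "destinationairport": "MRS"},
    "airline_137": {"type": "airline", "name": "Air France", "iata": "AF"},
}


def join_by_key(docs):
    """Mimic an INNER JOIN whose ON clause compares a field to META(airline).id."""
    rows = []
    for route in docs.values():
        if route.get("type") != "route":
            continue
        # The referenced document is found with a direct key lookup.
        airline = docs.get(route.get("airlineid"))
        if airline is None:
            continue  # INNER JOIN drops routes whose airline document is absent
        rows.append({
            "sourceairport": route["sourceairport"],
            "destinationairport": route["destinationairport"],
            "name": airline["name"],
        })
    return sorted(rows, key=lambda r: (r["sourceairport"], r["destinationairport"], r["name"]))


rows = join_by_key(bucket)
```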
Joining by Attributes
SELECT route.sourceairport, route.destinationairport, airline.name
FROM `travel-sample` AS route
INNER JOIN `travel-sample` AS airline
ON route.airline = airline.iata AND airline.type = 'airline'
WHERE route.type = 'route'
ORDER BY route.sourceairport, route.destinationairport, airline.name
Key: route_10000
{
"airline": "AF",
"airlineid": "airline_137",
"destinationairport": "MRS",
"distance": 2881.617376098415,
"equipment": "320",
"id": 10000,
"schedule": [
{
"day": 0,
"flight": "AF198",
"utc": "10:13:00"
},
{
"day": 0,
"flight": "AF547",
"utc": "19:14:00"
}
],
"sourceairport": "TLV",
"stops": 0,
"type": "route"
}
Embedded 1:N Relationship
Flattening Embedded Lists
SELECT route.sourceairport, route.destinationairport, schedule.utc
FROM `travel-sample` AS route
UNNEST route.schedule AS schedule
WHERE route.type = 'route' AND schedule.day = 0
ORDER BY route.sourceairport, route.destinationairport, schedule.utc
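UNNEST produces one output row per element of the embedded array, each paired with its parent document. A Python sketch of the same flattening, using a trimmed copy of route_10000, might look like this; `unnest_schedule` is a hypothetical helper.

```python
route_10000 = {
    "type": "route", "sourceairport": "TLV", "destinationairport": "MRS",
    "schedule": [
        {"day": 0, "flight": "AF198", "utc": "10:13:00"},
        {"day": 0, "flight": "AF547", "utc": "19:14:00"},
    ],
}


def unnest_schedule(routes, day):
    """Mimic UNNEST route.schedule AS schedule with a filter on schedule.day."""
    rows = []
    for route in routes:
        if route.get("type") != "route":
            continue
        # One row per array element; a route without a schedule yields no rows.
        for schedule in route.get("schedule", []):
            if schedule.get("day") == day:
                rows.append({
                    "sourceairport": route["sourceairport"],
                    "destinationairport": route["destinationairport"],
                    "utc": schedule["utc"],
                })
    return sorted(rows, key=lambda r: (r["sourceairport"], r["destinationairport"], r["utc"]))


rows = unnest_schedule([route_10000], day=0)
```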
My data’s not flat, why are my queries?
Key: route_50490
{
"airline": "SQ",
"airlineid": "airline_4435",
"destinationairport": "ORD",
"distance": 2802.1171926467396,
"equipment": "320",
"id": 50490,
"schedule": [{
"day": 0,
"flight": "SQ279",
"utc": "15:13:00"
}, {
"day": 0,
"flight": "SQ835",
"utc": "21:10:00"
}],
"sourceairport": "LAX",
"stops": 0,
"type": "route"
}
Referenced 1:N Relationship
Key: airport_3484
{
"airportname": "Los Angeles Intl",
"city": "Los Angeles",
"country": "United States",
"faa": "LAX",
"geo": {
"alt": 126,
"lat": 33.942536,
"lon": -118.408075
},
"icao": "KLAX",
"id": 3484,
"type": "airport",
"tz": "America/Los_Angeles"
}
Nesting (a.k.a. LINQ GroupJoin)
SELECT
airport.*,
(SELECT RAW r2.destinationairport FROM routes AS r2) AS destinations
FROM `travel-sample` AS airport
INNER NEST `travel-sample` AS routes
ON airport.faa = routes.sourceairport AND routes.type = 'route'
WHERE airport.type = 'airport'
AND airport.airportname LIKE 'Los Angeles%'
Nesting (a.k.a. LINQ GroupJoin)
{
"airportname": "Los Angeles Intl",
"city": "Los Angeles",
"country": "United States",
"destinations": ["PHX", "SEA", "MCO", "ATL", "SYD", "YYZ", "LIM", "LHR", "IND", "CLE", "..."],
"faa": "LAX",
"geo": {
"alt": 126,
"lat": 33.942536,
"lon": -118.408075
},
"icao": "KLAX",
"id": 3484,
"type": "airport",
"tz": "America/Los_Angeles"
}
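NEST is the mirror image of UNNEST: instead of one row per match, every matching document is collected into an array on the left-hand row. The Python sketch below mimics the query with trimmed documents. `route_example` is a made-up document added so the array has two entries, and `nest_routes` is a hypothetical helper.

```python
bucket = {
    "airport_3484": {"type": "airport", "airportname": "Los Angeles Intl", "faa": "LAX"},
    "route_50490": {"type": "route", "sourceairport": "LAX", "destinationairport": "ORD"},
    # Hypothetical second route, invented so the nested array has two entries.
    "route_example": {"type": "route", "sourceairport": "LAX", "destinationairport": "SEA"},
    "route_10000": {"type": "route", "sourceairport": "TLV", "destinationairport": "MRS"},
}


def nest_routes(docs):
    """Mimic INNER NEST of routes onto airports, then project destination codes."""
    rows = []
    for airport in docs.values():
        if airport.get("type") != "airport":
            continue
        if not airport.get("airportname", "").startswith("Los Angeles"):
            continue
        routes = [
            d for d in docs.values()
            if d.get("type") == "route" and d.get("sourceairport") == airport["faa"]
        ]
        if not routes:
            continue  # INNER NEST drops airports with no matching routes
        row = dict(airport)  # airport.*
        row["destinations"] = [r["destinationairport"] for r in routes]
        rows.append(row)
    return rows


rows = nest_routes(bucket)
```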
Indexing in Couchbase
Global Secondary Indexes
a.k.a. GSI
The Primary Index
CREATE PRIMARY INDEX ON bucket
SELECT * FROM bucket
Single Attribute Index
CREATE INDEX docsByName
ON bucket (name)
SELECT * FROM bucket
WHERE name LIKE 'A%'
SELECT * FROM bucket
WHERE name >= 'A' AND name < 'N'
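Both queries on this slide can be answered with a range scan because an index keeps its entries sorted: a prefix match such as LIKE 'A%' is equivalent to the range from 'A' up to, but not including, 'B'. The Python sketch below shows the idea with a sorted list and `bisect`. The names and document keys are invented sample data, and the list is only a stand-in for a real index structure.

```python
import bisect

# (name, document key) entries kept in sorted order, as an index on (name) would.
index = sorted([
    ("Alice", "doc-1"), ("Bob", "doc-2"), ("Aaron", "doc-3"),
    ("Mallory", "doc-4"), ("Nina", "doc-5"), ("alice", "doc-6"),
])


def range_scan(entries, low, high):
    """Return document keys whose indexed value v satisfies low <= v < high."""
    start = bisect.bisect_left(entries, (low,))
    end = bisect.bisect_left(entries, (high,))
    return [key for _, key in entries[start:end]]


like_a = range_scan(index, "A", "B")   # name LIKE 'A%'
a_to_m = range_scan(index, "A", "N")   # name >= 'A' AND name < 'N'
```

The lowercase "alice" entry falls outside both ranges, which matches the case-sensitive collation described earlier.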
Multiple Attribute Index
CREATE INDEX docsByNames ON bucket
(lastName, firstName)
SELECT * FROM bucket
WHERE lastName LIKE 'A%'
SELECT * FROM bucket
WHERE lastName = 'Burnett'
AND firstName LIKE 'B%'
Expression Index
CREATE INDEX docsByName ON bucket
(LOWER(lastName), LOWER(firstName))
SELECT * FROM bucket
WHERE LOWER(lastName) LIKE 'a%'
SELECT * FROM bucket
WHERE LOWER(lastName) = 'burnett'
AND LOWER(firstName) LIKE 'b%'
Filtered Index
CREATE INDEX custsByName ON bucket
(LOWER(lastName), LOWER(firstName))
WHERE type = 'customer'
SELECT * FROM bucket
WHERE LOWER(lastName) LIKE 'a%'
AND type = 'customer'
SELECT * FROM bucket
WHERE LOWER(lastName) = 'burnett'
AND LOWER(firstName) LIKE 'b%'
AND type = 'customer'
Array Index
CREATE INDEX custsByNickName ON bucket
(DISTINCT ARRAY p FOR p IN nickNames END)
WHERE type = 'customer'
SELECT * FROM bucket
WHERE ANY p IN nickNames SATISFIES p = 'Buzz' END
AND type = 'customer'
CREATE INDEX custsByNickName ON bucket
(DISTINCT ARRAY LOWER(p) FOR p IN nickNames END)
WHERE type = 'customer'
SELECT * FROM bucket
WHERE ANY p IN nickNames SATISFIES LOWER(p) = 'buzz' END
AND type = 'customer'
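An array index can be pictured as an inverted map with one entry per distinct array element, each pointing back to the documents that contain it. The Python sketch below builds such a map for the LOWER(p) variant. The customer documents are invented sample data, and the dict of sets is only a stand-in for the real index structure.

```python
from collections import defaultdict

customers = {
    "customer-1": {"type": "customer", "nickNames": ["Buzz", "BB"]},
    "customer-2": {"type": "customer", "nickNames": ["buzz", "Buzz"]},
    "vendor-1": {"type": "vendor", "nickNames": ["Buzz"]},
    "customer-3": {"type": "customer"},  # no nickNames attribute
}


def build_array_index(docs):
    """Mimic DISTINCT ARRAY LOWER(p) FOR p IN nickNames END WHERE type = 'customer'."""
    index = defaultdict(set)
    for key, doc in docs.items():
        if doc.get("type") != "customer":
            continue  # filtered index: only customers are indexed
        # DISTINCT: each lowercased element is indexed once per document.
        for element in {name.lower() for name in doc.get("nickNames", [])}:
            index[element].add(key)
    return index


index = build_array_index(customers)
matches = sorted(index["buzz"])   # ANY p IN nickNames SATISFIES LOWER(p) = 'buzz' END
```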
Index Architecture
[Diagram: Data Nodes feed Index Nodes over DCP. Labels: Node A, Node B, Index 1, Index 2, Index 3, Index 4, plus replicas of Index 1 and Index 2.]
Deferring Index Build
CREATE INDEX docsByName
ON bucket (name)
WITH {"defer_build": true}
CREATE INDEX docsByNames
ON bucket (lastName, firstName)
WITH {"defer_build": true}
BUILD INDEX ON bucket
(docsByName, docsByNames)
Replicated Index
CREATE INDEX custsByName ON bucket
(LOWER(lastName), LOWER(firstName))
WHERE type = 'customer'
WITH {"num_replica": 1}
SELECT * FROM bucket
WHERE LOWER(lastName) LIKE 'a%'
AND type = 'customer'
SELECT * FROM bucket
WHERE LOWER(lastName) = 'burnett'
AND LOWER(firstName) LIKE 'b%'
AND type = 'customer'
Partitioned Index
CREATE INDEX custsByName ON bucket
(LOWER(lastName), LOWER(firstName))
PARTITION BY HASH(tenantId)
WHERE type = 'customer'
WITH {"num_replica": 1}
SELECT * FROM bucket
WHERE LOWER(lastName) LIKE 'a%'
AND type = 'customer'
SELECT * FROM bucket
WHERE LOWER(lastName) = 'burnett'
AND tenantId = 123456
AND type = 'customer'
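The second query on this slide includes an equality predicate on the partition key, and the first does not. The Python sketch below illustrates why that can matter: with the key known, only one partition needs scanning. Everything here is an assumption for illustration: the partition count, the use of `zlib.crc32` as the hash (it is not Couchbase's hash function), and the helper names.

```python
import zlib

NUM_PARTITIONS = 8  # assumed partition count for this sketch


def partition_for(tenant_id):
    """Map a partition key to a partition. crc32 is a stand-in, not Couchbase's hash."""
    return zlib.crc32(str(tenant_id).encode("utf-8")) % NUM_PARTITIONS


def partitions_to_scan(equality_predicates):
    """Equality on the partition key narrows the scan to one partition; otherwise scan all."""
    if "tenantId" in equality_predicates:
        return [partition_for(equality_predicates["tenantId"])]
    return list(range(NUM_PARTITIONS))


# First query: no tenantId predicate, so every partition is a candidate.
all_parts = partitions_to_scan({})
# Second query: tenantId = 123456 pins the scan to a single partition.
one_part = partitions_to_scan({"LOWER(lastName)": "burnett", "tenantId": 123456})
```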
Query Optimization
Index Selection Criteria
• All predicates on the index must be included in
the query
• The first index expression must be in the
predicate
• Chooses the index with the most matching
expressions
• If more than one option, chooses one at random
for load balancing
• Does not use statistics for optimization (yet…)
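The criteria above can be read as a small rule-based procedure. The Python sketch below encodes the first three bullets under simplifying assumptions: expressions and predicates are compared as plain strings, and tied candidates are all returned instead of being picked at random. `choose_index` and the dict layout are hypothetical, not the planner's real data structures.

```python
def choose_index(indexes, filtered_expressions, query_predicates):
    """Return the names of the best-matching indexes (ties are all returned).

    indexes: list of dicts with 'name', 'keys' (ordered expressions) and
             'where' (set of predicates of a filtered index).
    filtered_expressions: expressions the query's WHERE clause filters on.
    query_predicates: literal predicates present in the query.
    """
    best_score, best_names = 0, []
    for index in indexes:
        if not index["where"] <= query_predicates:
            continue  # every predicate on the index must be included in the query
        if index["keys"][0] not in filtered_expressions:
            continue  # the first index expression must be in the predicate
        score = sum(1 for key in index["keys"] if key in filtered_expressions)
        if score > best_score:
            best_score, best_names = score, [index["name"]]
        elif score == best_score:
            best_names.append(index["name"])
    return best_names


indexes = [
    {"name": "docsByNames", "keys": ["lastName", "firstName"], "where": set()},
    {"name": "custsByName", "keys": ["LOWER(lastName)", "LOWER(firstName)"],
     "where": {"type = 'customer'"}},
]

picked = choose_index(indexes, {"LOWER(lastName)", "LOWER(firstName)"}, {"type = 'customer'"})
no_leading_key = choose_index(indexes, {"firstName"}, set())
no_type_filter = choose_index(indexes, {"LOWER(lastName)"}, set())
```

The last two calls return an empty list: one query skips the leading key, and the other omits the `type = 'customer'` predicate that the filtered index requires.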
Query Process (a simplified subset)
[Diagram: a Query Node, an Index Node, and Data Nodes. Surviving step labels: 1. Incoming Query; 2. Query Plan; 7. Filter, Sort, Agg, etc.; 7. Query Result]
Live Demo!
This should be interesting…
Nested Loop vs Hash Join in C#
Nested Loop Join
IEnumerable<RouteAirlines> Join(
    IList<Route> routes, IList<Airline> airlines)
{
    foreach (var route in routes)
    {
        var routeAirlines = new RouteAirlines
        {
            Route = route,
            Airlines = new List<Airline>()
        };
        // Compare every route against every airline.
        foreach (var airline in airlines)
        {
            if (airline.Iata == route.Airline)
            {
                routeAirlines.Airlines.Add(airline);
            }
        }
        yield return routeAirlines;
    }
}
Hash Join
IEnumerable<RouteAirlines> Join(
    IList<Route> routes, IList<Airline> airlines)
{
    // Build phase: index the airlines by IATA code once.
    var hashTable = airlines.ToLookup(p => p.Iata);
    foreach (var route in routes)
    {
        // Probe phase: a lookup returns an empty sequence for an unknown key.
        var routeAirlines = new RouteAirlines
        {
            Route = route,
            Airlines = hashTable[route.Airline].ToList()
        };
        yield return routeAirlines;
    }
}
N1QL Hash Join
SELECT route.sourceairport, route.destinationairport, airline.name
FROM `travel-sample` AS route
INNER JOIN `travel-sample` AS airline USE HASH(build)
ON route.airline = airline.iata AND airline.type = 'airline'
WHERE route.type = 'route'
ORDER BY route.sourceairport, route.destinationairport, airline.name
Key Optimization Takeaways
1. Make sure fetch is no larger than necessary
2. Design covering indexes where possible
3. Watch out for pagination
4. Consider USE HASH where applicable
5. Keep joins to a minimum
Thanks for Coming!
Questions?
  • 31. Flattening Embedded Lists SELECT route.sourceairport, route.destinationairport, schedule.utc FROM `travel-sample` AS route UNNEST route.schedule AS schedule WHERE route.type = 'route' AND schedule.day = 0 ORDER BY route.sourceairport, route.destinationairport, schedule.utc
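What UNNEST does to the embedded array can be pictured as a nested loop over each route and its schedule entries. The Python below is a rough model of the query's semantics, not how the query engine is implemented; the `unnest_schedule` function and the third schedule entry are hypothetical, added only for illustration.

```python
def unnest_schedule(docs, day):
    """Model of: FROM docs AS route UNNEST route.schedule AS schedule."""
    rows = []
    for route in docs:
        if route.get("type") != "route":
            continue
        # One output row per (route, schedule entry) pair; a route with an
        # empty or absent schedule array produces no rows at all.
        for schedule in route.get("schedule", []):
            if schedule["day"] == day:
                rows.append({
                    "sourceairport": route["sourceairport"],
                    "destinationairport": route["destinationairport"],
                    "utc": schedule["utc"],
                })
    return sorted(
        rows,
        key=lambda r: (r["sourceairport"], r["destinationairport"], r["utc"]),
    )

route_10000 = {
    "type": "route",
    "sourceairport": "TLV",
    "destinationairport": "MRS",
    "schedule": [
        {"day": 0, "flight": "AF198", "utc": "10:13:00"},
        {"day": 0, "flight": "AF547", "utc": "19:14:00"},
        {"day": 1, "flight": "AF000", "utc": "08:00:00"},  # hypothetical entry
    ],
}

for row in unnest_schedule([route_10000], day=0):
    print(row)
```

Each output row repeats the route's attributes next to one schedule entry, which is the flattening the slide title refers to.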
  • 32. My data’s not flat, why are my queries? This Photo by Unknown Author is licensed under CC BY-SA
  • 33. Key: route_50490 { "airline": "SQ", "airlineid": "airline_4435", "destinationairport": "ORD", "distance": 2802.1171926467396, "equipment": "320", "id": 50490, "schedule": [{ "day": 0, "flight": "SQ279", "utc": "15:13:00" }, { "day": 0, "flight": "SQ835", "utc": "21:10:00" }], "sourceairport": "LAX", "stops": 0, "type": "route" } Referenced 1:N Relationship Key: airport_3484 { "airportname": "Los Angeles Intl", "city": "Los Angeles", "country": "United States", "faa": "LAX", "geo": { "alt": 126, "lat": 33.942536, "lon": -118.408075 }, "icao": "KLAX", "id": 3484, "type": "airport", "tz": "America/Los_Angeles" }
  • 34. Nesting (a.k.a. LINQ GroupJoin) SELECT airport.*, (SELECT RAW r2.destinationairport FROM routes AS r2) AS destinations FROM `travel-sample` AS airport INNER NEST `travel-sample` AS routes ON airport.faa = routes.sourceairport AND routes.type = 'route' WHERE airport.type = 'airport' AND airport.airportname LIKE 'Los Angeles%'
  • 35. Nesting (a.k.a. LINQ GroupJoin) { "airportname": "Los Angeles Intl", "city": "Los Angeles", "country": "United States", "destinations": ["PHX", "SEA", "MCO", "ATL", "SYD", "YYZ", "LIM", "LHR", "IND", "CLE", "..."], "faa": "LAX", "geo": { "alt": 126, "lat": 33.942536, "lon": -118.408075 }, "icao": "KLAX", "id": 3484, "type": "airport", "tz": "America/Los_Angeles" }
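The shape of the NEST result can be modelled as a group join: each airport keeps its own attributes and gains an array of matching routes. The Python below is a sketch of those semantics under simplifying assumptions, not Couchbase's implementation. The function name, the trimmed-down documents, and the "ZZZ" airport are hypothetical.

```python
from collections import defaultdict

def nest_destinations(docs, name_prefix):
    """Model of INNER NEST plus the SELECT RAW subquery from the slide."""
    # Group destination codes by the route's source airport.
    by_source = defaultdict(list)
    for doc in docs:
        if doc.get("type") == "route":
            by_source[doc["sourceairport"]].append(doc["destinationairport"])

    results = []
    for doc in docs:
        if doc.get("type") != "airport":
            continue
        if not doc["airportname"].startswith(name_prefix):
            continue
        destinations = by_source.get(doc["faa"])
        if destinations:  # INNER NEST drops airports with no matching routes
            results.append({**doc, "destinations": destinations})
    return results

docs = [
    {"type": "airport", "airportname": "Los Angeles Intl", "faa": "LAX"},
    {"type": "airport", "airportname": "Los Angeles Test", "faa": "ZZZ"},  # hypothetical
    {"type": "route", "sourceairport": "LAX", "destinationairport": "ORD"},
    {"type": "route", "sourceairport": "LAX", "destinationairport": "PHX"},
    {"type": "route", "sourceairport": "LAX", "destinationairport": "SEA"},
]

for airport in nest_destinations(docs, "Los Angeles"):
    print(airport["faa"], airport["destinations"])
```

Unlike a flat join, the airport appears once no matter how many routes match. A LEFT OUTER NEST would also keep the airport that has no routes.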
  • 37. The Primary Index CREATE PRIMARY INDEX ON bucket SELECT * FROM bucket
  • 38. Single Attribute Index CREATE INDEX docsByName ON bucket (name) SELECT * FROM bucket WHERE name LIKE 'A%' SELECT * FROM bucket WHERE name >= 'A' AND name < 'N'
  • 39. Multiple Attribute Index CREATE INDEX docsByNames ON bucket (lastName, firstName) SELECT * FROM bucket WHERE lastName LIKE 'A%' SELECT * FROM bucket WHERE lastName = 'Burnett' AND firstName LIKE 'B%'
  • 40. Expression Index CREATE INDEX docsByName ON bucket (LOWER(lastName), LOWER(firstName)) SELECT * FROM bucket WHERE LOWER(lastName) LIKE 'a%' SELECT * FROM bucket WHERE LOWER(lastName) = 'burnett' AND LOWER(firstName) LIKE 'b%'
  • 41. Filtered Index CREATE INDEX custsByName ON bucket (LOWER(lastName), LOWER(firstName)) WHERE type = 'customer' SELECT * FROM bucket WHERE LOWER(lastName) LIKE 'a%' AND type = 'customer' SELECT * FROM bucket WHERE LOWER(lastName) = 'burnett' AND LOWER(firstName) LIKE 'b%' AND type = 'customer'
  • 42. Array Index CREATE INDEX custsByNickName ON bucket (DISTINCT ARRAY p FOR p IN nickNames END) WHERE type = 'customer' SELECT * FROM bucket WHERE ANY p IN nickNames SATISFIES p = 'Buzz' END AND type = 'customer' CREATE INDEX custsByNickName ON bucket (DISTINCT ARRAY LOWER(p) FOR p IN nickNames END) WHERE type = 'customer' SELECT * FROM bucket WHERE ANY p IN nickNames SATISFIES LOWER(p) = 'buzz' END AND type = 'customer'
  • 43. Index Nodes Node B Index Architecture Data Nodes Node A DCP DCP Index 1 Index 2 Replica Index 3 Index 1 Replica Index 2 Index 4
  • 44. Deferring Index Build CREATE INDEX docsByName ON bucket (name) WITH {"defer_build": true} CREATE INDEX docsByNames ON bucket (lastName, firstName) WITH {"defer_build": true} BUILD INDEX ON bucket (docsByName, docsByNames)
  • 45. Replicated Index CREATE INDEX custsByName ON bucket (LOWER(lastName), LOWER(firstName)) WHERE type = 'customer' WITH {"num_replica": 1} SELECT * FROM bucket WHERE LOWER(lastName) LIKE 'a%' AND type = 'customer' SELECT * FROM bucket WHERE LOWER(lastName) = 'burnett' AND LOWER(firstName) LIKE 'b%' AND type = 'customer'
  • 46. Partitioned Index CREATE INDEX custsByName ON bucket (LOWER(lastName), LOWER(firstName)) WHERE type = 'customer' PARTITION BY hash(tenantId) WITH {"num_replica": 1} SELECT * FROM bucket WHERE LOWER(lastName) LIKE 'a%' AND type = 'customer' SELECT * FROM bucket WHERE LOWER(lastName) = 'burnett' AND tenantId = 123456 AND type = 'customer'
  • 48. Index Selection Criteria • All predicates on the index must be included in the query • The first index expression must be in the predicate • Chooses the index with the most matching expressions • If more than one option, chooses one at random for load balancing • Does not use statistics for optimization (yet…)
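The selection rules above can be restated as a small toy model. This Python sketch only encodes the bullets on the slide and is not the actual Couchbase planner. The dictionary layout, the function names, and the `docsByLastName` index are assumptions made for illustration.

```python
import random

def qualifies(index, query):
    """Apply the first two rules from the slide to one index."""
    # Rule 1: every predicate in the index's WHERE clause must be in the query.
    if not index["where"] <= query["filters"]:
        return False
    # Rule 2: the first (leading) index expression must be used by the query.
    return index["keys"][0] in query["exprs"]

def choose_index(indexes, query, rng=random):
    """Pick the qualifying index with the most matching expressions."""
    candidates = [ix for ix in indexes if qualifies(ix, query)]
    if not candidates:
        return None  # no secondary index fits; only a primary index could serve

    def score(ix):
        return sum(1 for key in ix["keys"] if key in query["exprs"])

    best = max(score(ix) for ix in candidates)
    tied = [ix for ix in candidates if score(ix) == best]
    return rng.choice(tied)["name"]  # ties are broken at random

INDEXES = [
    {"name": "docsByLastName", "keys": ["lastName"], "where": set()},  # hypothetical
    {"name": "docsByNames", "keys": ["lastName", "firstName"], "where": set()},
    {"name": "custsByName",
     "keys": ["LOWER(lastName)", "LOWER(firstName)"],
     "where": {"type = 'customer'"}},
]

query = {"exprs": {"lastName", "firstName"}, "filters": set()}
print(choose_index(INDEXES, query))
```

A query on `firstName` alone matches no index in this model because the leading key is absent, and a query on `LOWER(lastName)` without the `type = 'customer'` filter cannot use `custsByName`.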
  • 49. Query Node Query Process (a simplified subset) Data Nodes Index Node 1. Incoming Query 7. Query Result 2. Query Plan 7. Filter, Sort, Agg, etc
  • 50. Live Demo! This should be interesting… This Photo by Unknown Author is licensed under CC BY-NC-SA
  • 51. Nested Loop vs Hash Join in C# Nested Loop Join IEnumerable<RouteAirlines> Join( IList<Route> routes, IList<Airline> airlines) { foreach (var route in routes) { var routeAirlines = new RouteAirlines { Route = route, Airlines = new List<Airline>() }; foreach (var airline in airlines) { if (airline.Iata == route.Airline) { routeAirlines.Airlines.Add(airline); } } yield return routeAirlines; } } Hash Join IEnumerable<RouteAirlines> Join( IList<Route> routes, IList<Airline> airlines) { var hashTable = airlines.ToLookup(p => p.Iata); foreach (var route in routes) { var routeAirlines = new RouteAirlines { Route = route, Airlines = hashTable[route.Airline].ToList() }; yield return routeAirlines; } }
  • 52. N1QL Hash Join SELECT route.sourceairport, route.destinationairport, airline.name FROM `travel-sample` AS route INNER JOIN `travel-sample` AS airline USE HASH(build) ON route.airline = airline.iata AND airline.type = 'airline' WHERE route.type = 'route' ORDER BY route.sourceairport, route.destinationairport, airline.name
  • 53. Key Optimization Takeaways Make sure fetch is no larger than necessary 1 Design covering indexes where possible 2 Watch out for pagination 3 Consider USE HASH where applicable 4 Keep joins to a minimum 5
  • 55. Questions? This Photo by Unknown Author is licensed under CC BY-NC-ND

Editor's notes

  1. Scalability – Multi-node, auto-sharded architecture makes it easy to scale out horizontally. Availability – Multi-node architecture makes high availability easy. COUCH = Cluster of Unreliable Commodity Hardware. Agility – JSON documents without schema enforcement make it easy for teams to iterate quickly.
  2. Non-first normal form query language
  3. Think millions of documents. Schema is not enforced by the DB.
  4. Let’s see how to represent customer data in JSON. The primary key (CustomerID) becomes the document key. Each column name/column value becomes a key/value pair.
  5. What if I wanted to filter to airports with an altitude less than 1000? Just use JavaScript dot notation to access attributes at any depth. You may also use JavaScript square bracket array notation to access items in arrays by index.
  6. Non-first normal form query language
  7. You can use the LOWER function to work around case-sensitive collation. String concatenation is one difference from SQL: it uses double vertical bars. Since we can’t know the type in advance, we need a concat operator that is separate from the addition operator.
  8. Note that array elements don’t necessarily have to be of the same type, though they usually are
  9. Non-first normal form query language
  10. First animation: note that we use an alias on the bucket name. This prevents confusion when we’re getting multiple document types from the same bucket. Second animation: note that we’re using META().id to get the primary key of the document to join. Also, note that this syntax is only available in Couchbase Server 5.5.
  11. But what if I want to join based on an attribute instead of the primary key?
  12. The type filter on the second extent should be part of the ON clause, not the WHERE clause. There must be an index to support looking up the second extent based on these clauses. Not as performant as a join based on primary key, which doesn’t need an index at all.
  13. Embedding an array inside a document creates an implicit 1:N relationship between the root document and the items in the array. But how do I join across this relationship?
  14. The type filter on the second extent should be part of the ON clause, not the WHERE clause. There must be an index to support looking up the second extent based on these clauses. Not as performant as a join based on primary key, which doesn’t need an index at all.
  15. Note that we want to know all the routes for a set of airports. In traditional SQL, we’d have to flatten the output, repeating the airport data for every matching route.
  16. Nesting is analogous to GroupJoin in LINQ, where all matching documents are returned in an array. We’re using an additional subquery on the array in the select projection to reduce the data we’re returning. There is also LEFT OUTER NEST.
  17. Nesting is analogous to GroupJoin in LINQ, where all matching documents are returned in an array. We’re using an additional subquery on the array in the select projection to reduce the data we’re returning. There is also LEFT OUTER NEST.
  18. Indexes every document in the bucket by the primary key. Supports any query, but with poor performance. Kind of like a table scan in SQL, except it scans every document in the entire bucket. Not recommended for production, except in some very specific use cases.
  19. Automatically excludes any documents where “name” is MISSING. The attribute must be included in the predicate for the index to be used, just like a SQL index.
  20. Automatically excludes any documents where “lastName” or “firstName” is MISSING. At least the first attribute must be included in the predicate for the index to be used, just like a SQL index. The second attribute will be used if possible, and so on for multiple attributes.
  21. Can use any deterministic function to adjust attributes before they are indexed. Predicates must use the same function to match the index. Still excludes MISSING lastName and firstName, since LOWER(MISSING) = MISSING.
  22. Can include any quantity of deterministic predicates. Requires that queries include the same predicates (all of them!) in order to match the index. Because query planning occurs before parameter substitution, the type = 'customer' clause cannot be parameterized.
  23. Any expression in the index definition can be an ARRAY clause, though only one array is allowed. Includes all values in the array, so long arrays can significantly increase index size. Animation: You can also use functions as part of the array clause.
  24. DCP streams mutations (inserts, updates, and deletes) to the index nodes. Streaming is async, thus indexes are eventually consistent. High availability and load balancing are provided by having replicas on more than one node; each replica is a full copy of the index. Any given query only accesses one copy of the index on one node, avoiding scatter/gather for low latency.
  25. When building an index, streams the entire bucket from the data nodes to the index node. Only one index build can be running at a time. By building more than one index with a single BUILD command, we can share the stream.
  26. Creates two complete copies of the index, on two different index nodes. Provides HA and load balancing.
  27. New in 5.5, only available in Enterprise Edition. Spreads the index across all nodes in the cluster (optionally a subset of nodes), deciding which node receives which part of the index based on a deterministic hash of the referenced attribute. Good for particularly large indexes, as the index can now scale horizontally. Creates a scatter/gather situation which can increase latency, but that is eliminated if you include an equality predicate for the hashed attribute so the query can go to just one node.
  28. Note that if the index contains all data needed by the query, it will “cover” the query, meaning steps 4 and 5 are skipped. The key to optimizing this process is to reduce waste in step 4: avoid having documents returned from the index that are then thrown out by step 7.
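The idea of a covering index comes down to a subset check: if every field the query reads is already stored in the index, no documents have to be fetched. The Python below is a deliberately simplified sketch of that check, not the planner's real logic; the `covers` function and the `email` field are hypothetical.

```python
def covers(index_keys, select_fields, where_fields):
    """True when every field the query touches is stored in the index."""
    needed = set(select_fields) | set(where_fields)
    return needed <= set(index_keys)

# Keys of the docsByNames index from the "Multiple Attribute Index" slide.
index_keys = ["lastName", "firstName"]

# SELECT firstName ... WHERE lastName = ...  -> answered from the index alone
print(covers(index_keys, ["firstName"], ["lastName"]))

# SELECT email ... WHERE lastName = ...      -> "email" forces a document fetch
print(covers(index_keys, ["email"], ["lastName"]))
```

By the same reasoning, `SELECT *` is never covered by an attribute index like this one, because it needs the whole document.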
  29. Oversimplification, but delivers the concept. Which one of these do you think is most efficient? It depends on the relative sizes of the two lists. With a short first list, it isn’t worth the time to build the hash table.
  30. A normal attribute join uses a nested loop, which is inefficient if the left-hand extent has lots of data and the right-hand side is a small, repeating set. Hash join is an optimization automatically selected by RDBMS implementations, but must be manually chosen in N1QL. Builds a hash table of all possible matches on the right-hand extent, and uses the hash table when processing the left-hand extent. Use “probe” instead of “build” to build the hash table on the left side instead of the right (should be the smaller set). Only available on 5.5 Enterprise Edition (free for dev, but costs for production, includes support).