Querying Nested JSON Data Using N1QL and Couchbase

TriNUG Data SIG
6/6/2018

Who is this guy?
• Brant Burnett - @btburnett3
• Systems Architect at CenterEdge Software
• .NET since 1.0, SQL Server since 7.0
• MCSD, MCDBA
• Experience from desktop apps to large-scale cloud services

NoSQL Credentials
• Couchbase user since 2012 (v1.8)
• Couchbase Community Expert
• Open source contributions:
• Couchbase .NET SDK
• Couchbase.Extensions for .NET Core
• Couchbase LINQ provider (Linq2Couchbase)
• CouchbaseFakeIt
• couchbase-index-manager

Content Attributions
• Matthew Groves
Couchbase Developer Advocate
@mgroves
crosscuttingconcerns.com

What is Couchbase
• NoSQL document database
• Get and set documents by key
• Imagine a giant folder full of JSON files
• If you know the filename, you can get or
update the content
• Additional features:
• Query using N1QL (SQL-based)
• Map-Reduce Views
• Full Text Search
• Analytics (Preview in 5.5)
• Eventing (5.5)
• Couchbase is not CouchDB

Why Couchbase
• Scalability
• Availability
• Performance
• Agility

Agenda
Introduction to N1QL
Working with JSON Types
Joins In N1QL
Indexing in Couchbase
Query Optimization

Introduction to N1QL
Pronounced “nickel”

What’s a Bucket?
• Large collection of JSON
documents
• Every document may have a
different schema
• Documents are accessed by a
string called the key

Table: Customer
CustomerID | Name       | DOB
CBL2015    | Jane Smith | 1990-01-30
Document Key: customer-CBL2015
{
"Name": "Jane Smith",
"DOB": "1990-01-30",
"type": "customer"
}

So how do I query data from a bucket?

Documents:
{
"Name": "Jane Smith",
"DOB": "1990-01-30",
"type": "customer"
}
{
"Name": "John Smith",
"DOB": "1990-06-29",
"type": "customer"
}
Query:
SELECT Name, DOB FROM Bucket
WHERE type = 'customer' AND Name LIKE '%Smith'
ORDER BY Name
Result:
[{
"Name": "Jane Smith",
"DOB": "1990-01-30"
},
{
"Name": "John Smith",
"DOB": "1990-06-29"
}]

What other SQL features are supported?
• Aggregation (MIN, MAX, SUM, AVG, COUNT, etc)
• GROUP BY/HAVING
• OFFSET/LIMIT
• Subqueries
• UNION/INTERSECT/EXCEPT
• Joins (more details to come…)
• UPDATE/INSERT/DELETE/UPSERT

Accessing Nested Objects
Key: airport_3484
{
"airportname": "Los Angeles Intl",
"city": "Los Angeles",
"country": "United States",
"faa": "LAX",
"geo": {
"alt": 126,
"lat": 33.942536,
"lon": -118.408075
},
"icao": "KLAX",
"id": 3484,
"type": "airport",
"tz": "America/Los_Angeles"
}
SELECT *
FROM `travel-sample`
WHERE type = 'airport' AND geo.alt < 1000
SELECT *
FROM `travel-sample`
WHERE type = 'route'
AND schedule[0].day = 1

O backtick, backtick! Wherefore art thou a backtick?
• ANSI SQL delimits identifiers with double quotes
• SELECT * FROM "table-name"
• T-SQL also delimits identifiers with square
brackets
• SELECT * FROM [table-name]
• Both of these are used in JSON!
• {"array": ["string1", "string2"]}
• So, N1QL uses the backtick instead
• SELECT * FROM `bucket-name`

Strings
• Supported by JSON
• Collation is always case sensitive
• Literals are delimited with either double or single quotes
x = 'my string here'
x = "my string here"
• Various supporting functions: string concatenation (||), LENGTH, LOWER, CONTAINS, TRIM, etc.

Numbers
• Supported by JSON
• Literals are included without delimiters
x = 123456.05124
• Various supporting functions: arithmetic operators, ABS, CEIL, SQRT, TRUNC, etc.

Booleans
• Supported by JSON
• Literals are true and false
x = true
• Various supporting operators: NOT, AND, OR

Arrays
• Supported by JSON
• Literals are comma delimited and surrounded by square brackets
[1, 2, 3, "a"]
• Various supporting functions: subqueries, ARRAY_CONTAINS, ARRAY_AVG, ARRAY_INSERT, ARRAY_LENGTH, etc.

Objects
• Supported by JSON
• Literals are comma delimited key/value pairs surrounded by curly braces
{"key": "value"}
• Various supporting functions: OBJECT_NAMES, OBJECT_PAIRS, OBJECT_VALUES, etc.

Nulls
• Supported by JSON
• Literal is the word null, no delimiters
{"key": null}
• Various supporting operators and functions: IS NULL, IS NOT NULL, IFNULL, etc.

Missing attributes
• Supported by JSON
• Can’t be explicitly declared
• Similar to undefined in JavaScript
• No literal, simply don’t include an attribute in an object
{}
• Various supporting operators and functions: IS MISSING, IS NOT MISSING, IFMISSING, IFMISSINGORNULL, etc.
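To make the NULL versus MISSING distinction concrete, here is a minimal Python sketch (not Couchbase code). It treats a JSON null as None and an absent attribute as a separate sentinel. The names MISSING, field, is_null, is_missing and if_missing_or_null are hypothetical helpers made up for this illustration, and returning None when every argument is NULL or MISSING is an assumption of the sketch.

```python
MISSING = object()  # hypothetical sentinel standing in for N1QL's MISSING


def field(doc, name):
    """Read an attribute, returning MISSING when the document does not have it."""
    return doc.get(name, MISSING)


def is_null(value):
    """Rough analogue of IS NULL: true only for an explicit JSON null."""
    return value is None


def is_missing(value):
    """Rough analogue of IS MISSING: true only when the attribute is absent."""
    return value is MISSING


def if_missing_or_null(*values):
    """Return the first value that is neither MISSING nor null.

    Assumption: falls back to None when no such value exists.
    """
    for value in values:
        if value is not MISSING and value is not None:
            return value
    return None


with_null = {"key": None}   # the {"key": null} document from the Nulls slide
without_key = {}            # the {} document from the Missing attributes slide
```

With these two documents, is_null is true only for with_null and is_missing is true only for without_key, which is the behavior the two slides describe.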

Date/times
• Not officially supported by JSON
• Can be stored using other data types
• Usually either ISO8601 string or number of milliseconds since the Unix epoch
• Literal depends on the data type
"2018-04-06T19:26:29.000Z"
1528140389000
• Various supporting functions: STR_TO_MILLIS, CLOCK_MILLIS, CLOCK_STR, DATE_PART_STR, DATE_PART_MILLIS, DATE_DIFF_STR, DATE_DIFF_MILLIS, etc.
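The two storage formats are interchangeable, which is roughly what functions such as STR_TO_MILLIS rely on. The Python sketch below converts between an ISO 8601 UTC string and epoch milliseconds. The function names iso_to_millis and millis_to_iso are hypothetical, and the sketch assumes UTC strings with a fractional-seconds part and a trailing Z, like the literal on the slide.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_to_millis(text):
    """Convert a string like '2018-04-06T19:26:29.000Z' to epoch milliseconds."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")
    parsed = parsed.replace(tzinfo=timezone.utc)
    # Integer division by a timedelta avoids floating point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def millis_to_iso(millis):
    """Convert epoch milliseconds back to an ISO 8601 UTC string."""
    moment = EPOCH + timedelta(milliseconds=millis)
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + f"{moment.microsecond // 1000:03d}Z"


string_literal_as_millis = iso_to_millis("2018-04-06T19:26:29.000Z")
number_literal_as_string = millis_to_iso(1528140389000)
```

Strings in this format sort chronologically as text, while the numeric form makes arithmetic easier. Which one to store is a modeling choice.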

Referenced 1:N Relationship
Key: route_10000
{
"airline": "AF",
"airlineid": "airline_137",
"destinationairport": "MRS",
"distance": 2881.617376098415,
"equipment": "320",
"id": 10000,
"schedule": [
{
"day": 0,
"flight": "AF198",
"utc": "10:13:00"
},
{
"day": 0,
"flight": "AF547",
"utc": "19:14:00"
}
],
"sourceairport": "TLV",
"stops": 0,
"type": "route"
}
Key: airline_137
{
"callsign": "AIRFRANS",
"country": "France",
"iata": "AF",
"icao": "AFR",
"id": 137,
"name": "Air France",
"type": "airline"
}

Joining by Primary Key
SELECT route.sourceairport, route.destinationairport, airline.name
FROM `travel-sample` AS route
INNER JOIN `travel-sample` AS airline
ON route.airlineid = META(airline).id
WHERE route.type = 'route'
ORDER BY route.sourceairport, route.destinationairport, airline.name

Joining by Attributes
Same query as above, with the join condition matching on attributes instead of the document key:
INNER JOIN `travel-sample` AS airline
ON route.airline = airline.iata AND airline.type = 'airline'

Embedded 1:N Relationship
Key: route_10000
{
"airline": "AF",
"destinationairport": "MRS",
"distance": 2881.617376098415,
"equipment": "320",
"id": 10000,
"schedule": [
{
"day": 0,
"flight": "AF198",
"utc": "10:13:00"
},
{
"day": 0,
"flight": "AF547",
"utc": "19:14:00"
}
],
"sourceairport": "TLV",
"stops": 0,
"type": "route"
}

Flattening Embedded Lists
SELECT route.sourceairport, route.destinationairport, schedule.utc
FROM `travel-sample` AS route
UNNEST route.schedule AS schedule
WHERE route.type = 'route' AND schedule.day = 0
ORDER BY route.sourceairport, route.destinationairport, schedule.utc
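As a mental model, UNNEST pairs each parent document with each element of its embedded array, producing one flat row per element. The Python sketch below (not Couchbase code) mimics that on in-memory dictionaries. The helper name unnest_schedule is hypothetical, the first route is taken from the slide, and the other two are made-up documents added to show filtering and a document without a schedule.

```python
routes = [
    # Route from the slide (trimmed to the attributes used here).
    {"type": "route", "sourceairport": "TLV", "destinationairport": "MRS",
     "schedule": [{"day": 0, "flight": "AF198", "utc": "10:13:00"},
                  {"day": 0, "flight": "AF547", "utc": "19:14:00"}]},
    # Hypothetical route whose only flight is on day 1.
    {"type": "route", "sourceairport": "AAA", "destinationairport": "BBB",
     "schedule": [{"day": 1, "flight": "XX001", "utc": "08:00:00"}]},
    # Hypothetical route with no schedule array at all.
    {"type": "route", "sourceairport": "CCC", "destinationairport": "DDD"},
]


def unnest_schedule(docs):
    """Yield one (route, schedule) pair per embedded schedule entry.

    Documents without a schedule produce no pairs, like an inner UNNEST.
    """
    for route in docs:
        for schedule in route.get("schedule", []):
            yield route, schedule


# Equivalent of the WHERE and ORDER BY clauses in the query above.
rows = sorted(
    (route["sourceairport"], route["destinationairport"], schedule["utc"])
    for route, schedule in unnest_schedule(routes)
    if route["type"] == "route" and schedule["day"] == 0
)
```

Only the two day-0 flights of the TLV to MRS route survive the filter, one row per flight.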

My data’s not flat, why are my queries?

Referenced 1:N Relationship
Key: route_50490
{
"airline": "SQ",
"destinationairport": "ORD",
"distance": 2802.1171926467396,
"equipment": "320",
"id": 50490,
"schedule": [{
"day": 0,
"flight": "SQ279",
"utc": "15:13:00"
}, {
"day": 0,
"flight": "SQ835",
"utc": "21:10:00"
}],
"sourceairport": "LAX",
"stops": 0,
"type": "route"
}
Key: airport_3484
{
"faa": "LAX",
"geo": {
"alt": 126,
"lat": 33.942536,
"lon": -118.408075
},
"icao": "KLAX",
"id": 3484,
"type": "airport"
}

Nesting (a.k.a. LINQ GroupJoin)
SELECT
airport.*,
(SELECT RAW r2.destinationairport FROM routes AS r2) AS destinations
FROM `travel-sample` AS airport
INNER NEST `travel-sample` AS routes
ON airport.faa = routes.sourceairport AND routes.type = 'route'
WHERE airport.type = 'airport'
AND airport.airportname LIKE 'Los Angeles%'

Nesting (a.k.a. LINQ GroupJoin)
{
"destinations": ["PHX", "SEA", "MCO", "ATL", "SYD", "YYZ", "LIM", "LHR", "IND", "CLE", "..."],
"faa": "LAX",
"geo": {
"alt": 126,
"lat": 33.942536,
"lon": -118.408075
},
"icao": "KLAX",
"id": 3484,
"type": "airport"
}
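NEST is the reverse of flattening: instead of one row per match, each left-hand document gets an array of all its matches. The Python sketch below (not Couchbase code) groups routes under their source airport, similar in spirit to the query above. The helper inner_nest and all sample documents except the LAX code are hypothetical.

```python
from collections import defaultdict

airports = [
    {"type": "airport", "faa": "LAX", "airportname": "Los Angeles Intl"},
    # Hypothetical airport with no routes; an inner nest drops it.
    {"type": "airport", "faa": "ZZZ", "airportname": "Los Angeles Example Field"},
]
routes = [
    {"type": "route", "sourceairport": "LAX", "destinationairport": "ORD"},
    {"type": "route", "sourceairport": "LAX", "destinationairport": "SEA"},
    {"type": "route", "sourceairport": "TLV", "destinationairport": "MRS"},
]


def inner_nest(left_docs, right_docs, left_key, right_key):
    """Yield (left, [matching right docs]) for left docs with at least one match."""
    groups = defaultdict(list)
    for right in right_docs:
        groups[right[right_key]].append(right)
    for left in left_docs:
        matches = groups.get(left[left_key])
        if matches:
            yield left, matches


result = [
    # Like SELECT airport.*, (SELECT RAW ...) AS destinations
    {**airport, "destinations": [r["destinationairport"] for r in nested]}
    for airport, nested in inner_nest(airports, routes, "faa", "sourceairport")
    if airport["airportname"].startswith("Los Angeles")
]
```

The output keeps one object per airport, with the destinations collected into an array rather than repeated across rows.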

Indexing in Couchbase
Global Secondary Indexes, a.k.a. GSI

The Primary Index
CREATE PRIMARY INDEX ON bucket
SELECT * FROM bucket

Single Attribute Index
CREATE INDEX docsByName
ON bucket (name)
SELECT * FROM bucket
WHERE name LIKE 'A%'
WHERE name >= 'A' AND name < 'N'
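Both predicates on this slide can be answered by scanning a contiguous range of a sorted index, since a prefix match like 'A%' is equivalent to a range from 'A' up to (but not including) 'B'. The Python sketch below illustrates that idea with a sorted list and binary search. It is a simplified model, not how the index service is implemented, and the index entries, document keys and helper names are hypothetical.

```python
import bisect

# Hypothetical index entries: (indexed name, document key), kept in sorted order.
index = sorted([
    ("Alice", "cust-1"),
    ("Aaron", "cust-2"),
    ("Brant", "cust-3"),
    ("Matthew", "cust-4"),
    ("Nora", "cust-5"),
])


def range_scan(entries, low, high):
    """Return document keys whose indexed value v satisfies low <= v < high."""
    start = bisect.bisect_left(entries, (low,))
    end = bisect.bisect_left(entries, (high,))
    return [doc_key for _, doc_key in entries[start:end]]


def prefix_upper_bound(prefix):
    """Smallest string that sorts after every string starting with prefix."""
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


# WHERE name LIKE 'A%'
like_a = range_scan(index, "A", prefix_upper_bound("A"))

# WHERE name >= 'A' AND name < 'N'
a_to_m = range_scan(index, "A", "N")
```

A predicate with a leading wildcard, such as LIKE '%Smith', has no such bounded range, so a sorted index on the attribute cannot narrow it in the same way.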

Multiple Attribute Index
CREATE INDEX docsByNames ON bucket
(lastName, firstName)
WHERE lastName LIKE 'A%'
WHERE lastName = 'Burnett'
AND firstName LIKE 'B%'

Expression Index
CREATE INDEX docsByName ON bucket
(LOWER(lastName), LOWER(firstName))
WHERE LOWER(lastName) LIKE 'a%'
WHERE LOWER(lastName) = 'burnett'
AND LOWER(firstName) LIKE 'b%'

Filtered Index
CREATE INDEX custsByName ON bucket
(LOWER(lastName), LOWER(firstName))
WHERE type = 'customer'
WHERE LOWER(lastName) LIKE 'a%'
AND type = 'customer'
AND LOWER(firstName) LIKE 'b%'

Array Index
CREATE INDEX custsByNickName ON bucket
(DISTINCT ARRAY p FOR p IN nickNames END)
WHERE type = 'customer'
WHERE ANY p IN nickNames SATISFIES p = 'Buzz' END
CREATE INDEX custsByNickName ON bucket
(DISTINCT ARRAY LOWER(p) FOR p IN nickNames END)
WHERE type = 'customer'
WHERE ANY p IN nickNames SATISFIES LOWER(p) = 'buzz' END
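Conceptually, an array index stores one entry per distinct array element, each pointing back to the owning document, so an ANY ... SATISFIES predicate becomes a lookup on a single element value. The Python sketch below models both index definitions above with a dictionary of sets. It is an illustration only; the documents, keys and the build_array_index helper are hypothetical.

```python
from collections import defaultdict

documents = {
    "cust-1": {"type": "customer", "nickNames": ["Buzz", "Ace"]},
    "cust-2": {"type": "customer", "nickNames": ["buzz", "BUZZ"]},
    "cust-3": {"type": "customer"},                       # no nickNames array
    "order-1": {"type": "order", "nickNames": ["Buzz"]},  # excluded by the index filter
}


def build_array_index(docs, transform=lambda p: p):
    """Map each distinct (transformed) nickname to the set of document keys."""
    index = defaultdict(set)
    for doc_key, doc in docs.items():
        if doc.get("type") != "customer":
            continue  # mirrors WHERE type = 'customer' on the index
        # The set() mirrors DISTINCT: one entry per distinct element per document.
        for element in set(map(transform, doc.get("nickNames", []))):
            index[element].add(doc_key)
    return index


by_nickname = build_array_index(documents)             # ARRAY p FOR p IN nickNames
by_lower_nickname = build_array_index(documents, str.lower)  # ARRAY LOWER(p) FOR p IN nickNames

exact_matches = sorted(by_nickname.get("Buzz", set()))
case_insensitive_matches = sorted(by_lower_nickname.get("buzz", set()))
```

The second index only helps when the query applies the same LOWER(p) expression, which is why the predicate on the slide changes along with the index definition.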

Index Architecture
[Diagram: Data Nodes stream changes over DCP to the Index Nodes (Node A and Node B), which host Index 1 through Index 4 along with index replicas.]

Deferring Index Build
CREATE INDEX docsByName
ON bucket (name)
WITH {"defer_build": true}
CREATE INDEX docsByNames
ON bucket (lastName, firstName)
WITH {"defer_build": true}
BUILD INDEX ON bucket
(docsByName, docsByNames)

Replicated Index
WITH {"num_replica": 1}
WHERE LOWER(lastName) LIKE 'a%'
AND LOWER(firstName) LIKE 'b%'

Partitioned Index
PARTITION BY hash(tenantId)
WITH {"num_replica": 1}
WHERE LOWER(lastName) LIKE 'a%'
AND tenantId = 123456

Index Selection Criteria
• All predicates on the index must be included in
the query
• The first index expression must be in the
predicate
• Chooses the index with the most matching
expressions
• If more than one option, chooses one at random
for load balancing
• Does not use statistics for optimization (yet…)
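The rules above can be read as a small filtering and ranking procedure. The Python sketch below is a toy model of those bullets only, not the actual query planner. The choose_index function, the dictionary layout of an index and the sample indexes are all hypothetical.

```python
import random


def choose_index(indexes, predicate_expressions, query_filters):
    """Pick an index name following the selection rules listed above.

    indexes: list of {"name", "keys" (ordered expressions), "filter" (set of predicates)}
    predicate_expressions: expressions referenced in the query's WHERE clause
    query_filters: predicates of the query that could match an index's WHERE clause
    """
    candidates = []
    for index in indexes:
        # Rule 1: all predicates on the index must be included in the query.
        if not index["filter"] <= query_filters:
            continue
        # Rule 2: the first index expression must be in the predicate.
        if index["keys"][0] not in predicate_expressions:
            continue
        matched = sum(1 for key in index["keys"] if key in predicate_expressions)
        candidates.append((matched, index["name"]))
    if not candidates:
        return None  # in this sketch, the caller would fall back to the primary index
    # Rule 3: prefer the most matching expressions; Rule 4: break ties at random.
    best = max(matched for matched, _ in candidates)
    return random.choice([name for matched, name in candidates if matched == best])


indexes = [
    {"name": "docsByName", "keys": ["name"], "filter": set()},
    {"name": "docsByLastName", "keys": ["lastName"], "filter": set()},
    {"name": "docsByNames", "keys": ["lastName", "firstName"], "filter": set()},
    {"name": "custsByName", "keys": ["lastName", "firstName"],
     "filter": {"type = 'customer'"}},
]

chosen = choose_index(indexes, {"lastName", "firstName"}, set())
```

Here custsByName is skipped because the query does not repeat its type = 'customer' predicate, and docsByNames wins over docsByLastName because it matches two expressions rather than one.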

Query Process (a simplified subset)
[Diagram: a Query Node working with the Index Node and the Data Nodes. Step labels shown: 1. Incoming Query, 2. Query Plan, 7. Filter, Sort, Agg, etc., and the final Query Result.]

Live Demo!
This should be interesting…

Nested Loop vs Hash Join in C#
Nested Loop Join
// Requires: using System.Collections.Generic;
IEnumerable<RouteAirlines> NestedLoopJoin(
    IList<Route> routes, IList<Airline> airlines)
{
    foreach (var route in routes)
    {
        var routeAirlines = new RouteAirlines
        {
            Route = route,
            Airlines = new List<Airline>()
        };
        // Scan every airline for every route: routes × airlines comparisons.
        foreach (var airline in airlines)
        {
            if (airline.Iata == route.Airline)
            {
                routeAirlines.Airlines.Add(airline);
            }
        }
        yield return routeAirlines;
    }
}
Hash Join
// Requires: using System.Collections.Generic; using System.Linq;
IEnumerable<RouteAirlines> HashJoin(
    IList<Route> routes, IList<Airline> airlines)
{
    // Build the lookup once, then probe it once per route.
    var hashTable = airlines.ToLookup(p => p.Iata);
    foreach (var route in routes)
    {
        var routeAirlines = new RouteAirlines
        {
            Route = route,
            Airlines = hashTable[route.Airline].ToList()
        };
        yield return routeAirlines;
    }
}

N1QL Hash Join
INNER JOIN `travel-sample` AS airline USE HASH(build)
ON route.airline = airline.iata AND airline.type = 'airline'

Key Optimization Takeaways
1. Make sure fetch is no larger than necessary
2. Design covering indexes where possible
3. Watch out for pagination
4. Consider USE HASH where applicable
5. Keep joins to a minimum

Questions?
