SlideShare una empresa de Scribd logo
1 de 28
Descargar para leer sin conexión
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Martijn van Groningen
mvg@apache.org
@mvgroningen
Document relations
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Topics
• Background
• Parent / child support
• Nested support
• Future developments
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Background
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Background
C
Query
Local join
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Background
• We need more capacity.
• But how to divide the relational data?
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Background
C
Q
uery
sub-queries
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Background
C
Query
sub-query
De-normalized document
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Background
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Background
Query
sub-query
C
local joinlocal join
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Background
• Dealing with relations either pay the price on
write time or read time.
• Alternatively documents relations can balance
the costs between read and write time.
For example: one join to reduce duplicated data.
• Supporting “many-to-many” joins in a
distributed system is difficult.
Either unbalanced partitions or very expensive join.
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
The query time join
Parent child
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Parent child
• Parent / child is a query time join between
different document types in the same index.
• Parent and children documents are stored as
separate documents in the same index.
• Child documents can point to only one parent.
• Parent documents can be referred by multiple child documents.
• Also a parent document can be a child
document of a different parent.
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Parent child
• A parent document and its children
documents are routed into the same shard.
• Parent id is used as routing value.
• In combination with a parent ids in memory
data structure the parent-child join is fast.
• Use warmer api to preload it!
• Parent ids data structure size has significantly been reduced in
version 0.90.1
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Parent child - Indexing
• The parent document doesn’t need to exist at
time of indexing.
curl -XPUT 'localhost:9200/products' -d '{
  "mappings" : {
     "offer" : {
        "_parent" : { "type" : "product" }
     }
  }
}'
A offer document
is a parent of a
product document
curl -XPUT 'localhost:9200/products/offer/12?parent=p2345' -d '{
"valid_from" : "2013-05-01",
"valid_to" : "2013-10-01",
"price" : 26.87,
}'
Then when
indexing mention
to what product a
offer points to.
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Parent child - Querying
• The has_child query returns parent
documents based on matches in its child
documents.
• The optional “score_mode” defines how child
hits are mapped to its parent document.
curl -XGET 'localhost:9200/products/_search' -d '{
"query" : {
      "has_child" : {
         "type" : "offer",
" "query" : {
            "range" : {
               "price" : {
"lte" : 50
               }
            }
       }
    }
  }
}'
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
The index time join
Nested objects
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Nested objects
• In many cases domain models have the same
write / update live-cycle.
• Books & Chapters.
• Movies & Actors.
• De-normalizing results in the fastest queries.
• Compared to using parent/child queries.
• Nested objects allow smart de-normalization.
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Nested objects
{
"title" : "Elasticsearch",
"authors" : "Clinton Gormley",
"categories" : ["programming", "information retrieval"],
"published_year" : 2013,
"summary" : "The definitive guide for Elasticsearch ...",
"chapter_1_title" : "Introduction",
"chapter_1_summary" : "Short introduction about Elasticsearch’s features ...",
"chapter_1_number_of_pages" : 12,
"chapter_2_title" : "Data in, Data out",
"chapter_2_summary" : "How to manage your data with Elasticsearch ...",
"chapter_2_number_of_pages" : 39,
...
}
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Nested objects
{
"title" : "Elasticsearch",
"authors" : "Clinton Gormley",
"categories" : ["programming", "information retrieval"],
"published_year" : 2013,
"summary" : "The definitive guide for Elasticsearch ...",
"chapter_1_title" : "Introduction",
"chapter_1_summary" : "Short introduction about Elasticsearch’s features ...",
"chapter_1_number_of_pages" : 12,
"chapter_2_title" : "Data in, Data out",
"chapter_2_summary" : "How to manage your data with Elasticsearch ...",
"chapter_2_number_of_pages" : 39,
...
}
Too verbose!
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Nested objects
{
"title" : "Elasticsearch",
"author" : "Clinton Gormley",
"categories" : ["programming", "information retrieval"],
"published_year" : 2013,
"summary" : "The definitive guide for Elasticsearch ...",
"chapters" : [
{
"title" : "Introduction",
"summary" : "Short introduction about Elasticsearch’s features ...",
"number_of_pages" : 12
},
{
"title" : "Data in, Data out",
"summary" : "How to manage your data with Elasticsearch ...",
"number_of_pages" : 39
},
...
]
}
• JSON allows complex nesting of objects.
• But how does this get indexed?
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Nested objects
{
"title" : "Elasticsearch",
...
"chapters" : [
{"title" : "Introduction", "summary" : "Short ...", "number_of_pages" : 12},
{"title" : "Data in, ...", "summary" : "How to ...", "number_of_pages" : 39},
...
]
}
{
"title" : "Elasticsearch",
...
"chapters.title" : ["Data in, Data out", "Introduction"],
"chapters.summary" : ["How to ...", "Short ..."],
"chapters.number_of_pages" : [12, 39]
}
Original json document:
Lucene Document Structure:
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Nested objects - Mapping
• The nested type triggers Lucene’s block
indexing.
• Multiple levels of inner objects is possible.
curl -XPUT 'localhost:9200/books' -d '{
"mappings" : {
"book" : {
"properties" : {
"chapters" : {
"type" : "nested"
}
}
}
}
}'
Document type
Field type: ‘nested’
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Nested objects - Block indexing
{"chapters.title" : "Into...", "chapters.summary" : "...", "chapters.number_of_pages" : 12},
{"chapters.title" : "Data...", "chapters.summary" : "...", "chapters.number_of_pages" : 39},
...
{
"title" : "Elasticsearch",
...
}
Lucene Documents Structure:
• Inlining the inner objects as separate Lucene
documents right before the root document.
• The root document and its nested documents
always remain in the same block.
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Nested objects - Nested query
• Nested query returns the complete “book” as
hit. (root document)
curl -XGET 'localhost:9200/books/book/_search' -d '{
  "query" : {
     "nested" : {
         "path" : "chapters",
         "score_mode" : "avg",
" "query" : {
            "match" : {
               "chapters.summary" : {
                  "query" : "indexing data"
               }
            }
         }" "
     }
  }
}'
Specify the
nested level.
Chapter level
query
score mode
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Nested objects
X X X X X
root documents bitset:
Nested Lucene document, that match with the inner query.
Aggregate nested scores and push to root document.
X Set bit, that represents a root document.
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
But first questions!
Extra slides
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Nested objects - Nested sorting
curl -XGET 'localhost:9200/books/book/_search' -d '{
 "query" : {
  "match" : {
"summary" : {
"query" : "guide"
}
}       
},
"sort" : [
{
"chapters.number_of_pages" : {
"sort_mode" : "avg",
"nested_filter" : {
"range" : {
"chapters.number_of_pages" : {"lte" : 15}
}
}
}
}
]
}'
Sort mode
Wednesday, June 5, 13
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Parent child - sorting
• Parent/child sorting isn’t possible at the
moment.
• But there is a “custom_score” query work around.
• Downsides:
• Forces to execute a script for each matching document.
• The child sort value is converted into a float value.
"has_child" : {
"type" : "offer",
"score_mode" : "avg",
"query" : {
"custom_score" : {
"query" : { ... },
"script" : "doc["price"].value"
}
}
}
Wednesday, June 5, 13

Más contenido relacionado

La actualidad más candente

JSON-LD: JSON for the Social Web
JSON-LD: JSON for the Social WebJSON-LD: JSON for the Social Web
JSON-LD: JSON for the Social WebGregg Kellogg
 
Apache Arrow: Leveling Up the Data Science Stack
Apache Arrow: Leveling Up the Data Science StackApache Arrow: Leveling Up the Data Science Stack
Apache Arrow: Leveling Up the Data Science StackWes McKinney
 
Building Next-Generation Web APIs with JSON-LD and Hydra
Building Next-Generation Web APIs with JSON-LD and HydraBuilding Next-Generation Web APIs with JSON-LD and Hydra
Building Next-Generation Web APIs with JSON-LD and HydraMarkus Lanthaler
 
DynamoDB를 게임에서 사용하기 – 김성수, 박경표, AWS솔루션즈 아키텍트:: AWS Summit Online Korea 2020
DynamoDB를 게임에서 사용하기 – 김성수, 박경표, AWS솔루션즈 아키텍트::  AWS Summit Online Korea 2020DynamoDB를 게임에서 사용하기 – 김성수, 박경표, AWS솔루션즈 아키텍트::  AWS Summit Online Korea 2020
DynamoDB를 게임에서 사용하기 – 김성수, 박경표, AWS솔루션즈 아키텍트:: AWS Summit Online Korea 2020Amazon Web Services Korea
 
Backup and archiving in the aws cloud
Backup and archiving in the aws cloudBackup and archiving in the aws cloud
Backup and archiving in the aws cloudAmazon Web Services
 
Presentation implementing oracle asm successfully
Presentation    implementing oracle asm successfullyPresentation    implementing oracle asm successfully
Presentation implementing oracle asm successfullyxKinAnx
 
ElasticSearch in action
ElasticSearch in actionElasticSearch in action
ElasticSearch in actionCodemotion
 
Oracle Cloud is Best for Oracle Database - High Availability
Oracle Cloud is Best for Oracle Database - High AvailabilityOracle Cloud is Best for Oracle Database - High Availability
Oracle Cloud is Best for Oracle Database - High AvailabilityMarkus Michalewicz
 
An Introduction to Map/Reduce with MongoDB
An Introduction to Map/Reduce with MongoDBAn Introduction to Map/Reduce with MongoDB
An Introduction to Map/Reduce with MongoDBRainforest QA
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.Jurriaan Persyn
 
Amazon Aurora Storage Demystified: How It All Works (DAT363) - AWS re:Invent ...
Amazon Aurora Storage Demystified: How It All Works (DAT363) - AWS re:Invent ...Amazon Aurora Storage Demystified: How It All Works (DAT363) - AWS re:Invent ...
Amazon Aurora Storage Demystified: How It All Works (DAT363) - AWS re:Invent ...Amazon Web Services
 
Query store查詢調校新利器
Query store查詢調校新利器Query store查詢調校新利器
Query store查詢調校新利器Rico Chen
 
Cagando Datos con APEX_DATA_PARSER
Cagando Datos con APEX_DATA_PARSERCagando Datos con APEX_DATA_PARSER
Cagando Datos con APEX_DATA_PARSERRodolfoRodriguez161
 
4차 산업혁명 시대의 제조업 혁신을 위한 Data Lake 고객 사례::구태훈, 최삼락, Tony Spagnuolo::AWS Summit ...
4차 산업혁명 시대의 제조업 혁신을 위한 Data Lake 고객 사례::구태훈, 최삼락, Tony Spagnuolo::AWS Summit ...4차 산업혁명 시대의 제조업 혁신을 위한 Data Lake 고객 사례::구태훈, 최삼락, Tony Spagnuolo::AWS Summit ...
4차 산업혁명 시대의 제조업 혁신을 위한 Data Lake 고객 사례::구태훈, 최삼락, Tony Spagnuolo::AWS Summit ...Amazon Web Services Korea
 
JSON-LD for RESTful services
JSON-LD for RESTful servicesJSON-LD for RESTful services
JSON-LD for RESTful servicesMarkus Lanthaler
 
What’s New in Oracle Database 19c - Part 1
What’s New in Oracle Database 19c - Part 1What’s New in Oracle Database 19c - Part 1
What’s New in Oracle Database 19c - Part 1Satishbabu Gunukula
 
Performance Tuning Using oratop
Performance Tuning Using oratop Performance Tuning Using oratop
Performance Tuning Using oratop Sandesh Rao
 
Managed disk-Azure Storage Evolution
Managed disk-Azure Storage EvolutionManaged disk-Azure Storage Evolution
Managed disk-Azure Storage EvolutionSiraj Mohammad
 

La actualidad más candente (20)

JSON-LD: JSON for the Social Web
JSON-LD: JSON for the Social WebJSON-LD: JSON for the Social Web
JSON-LD: JSON for the Social Web
 
Apache Solr
Apache SolrApache Solr
Apache Solr
 
Apache Arrow: Leveling Up the Data Science Stack
Apache Arrow: Leveling Up the Data Science StackApache Arrow: Leveling Up the Data Science Stack
Apache Arrow: Leveling Up the Data Science Stack
 
Building Next-Generation Web APIs with JSON-LD and Hydra
Building Next-Generation Web APIs with JSON-LD and HydraBuilding Next-Generation Web APIs with JSON-LD and Hydra
Building Next-Generation Web APIs with JSON-LD and Hydra
 
DynamoDB를 게임에서 사용하기 – 김성수, 박경표, AWS솔루션즈 아키텍트:: AWS Summit Online Korea 2020
DynamoDB를 게임에서 사용하기 – 김성수, 박경표, AWS솔루션즈 아키텍트::  AWS Summit Online Korea 2020DynamoDB를 게임에서 사용하기 – 김성수, 박경표, AWS솔루션즈 아키텍트::  AWS Summit Online Korea 2020
DynamoDB를 게임에서 사용하기 – 김성수, 박경표, AWS솔루션즈 아키텍트:: AWS Summit Online Korea 2020
 
Backup and archiving in the aws cloud
Backup and archiving in the aws cloudBackup and archiving in the aws cloud
Backup and archiving in the aws cloud
 
Presentation implementing oracle asm successfully
Presentation    implementing oracle asm successfullyPresentation    implementing oracle asm successfully
Presentation implementing oracle asm successfully
 
ElasticSearch in action
ElasticSearch in actionElasticSearch in action
ElasticSearch in action
 
Oracle Cloud is Best for Oracle Database - High Availability
Oracle Cloud is Best for Oracle Database - High AvailabilityOracle Cloud is Best for Oracle Database - High Availability
Oracle Cloud is Best for Oracle Database - High Availability
 
An Introduction to Map/Reduce with MongoDB
An Introduction to Map/Reduce with MongoDBAn Introduction to Map/Reduce with MongoDB
An Introduction to Map/Reduce with MongoDB
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.
 
Amazon Aurora Storage Demystified: How It All Works (DAT363) - AWS re:Invent ...
Amazon Aurora Storage Demystified: How It All Works (DAT363) - AWS re:Invent ...Amazon Aurora Storage Demystified: How It All Works (DAT363) - AWS re:Invent ...
Amazon Aurora Storage Demystified: How It All Works (DAT363) - AWS re:Invent ...
 
Query store查詢調校新利器
Query store查詢調校新利器Query store查詢調校新利器
Query store查詢調校新利器
 
Cagando Datos con APEX_DATA_PARSER
Cagando Datos con APEX_DATA_PARSERCagando Datos con APEX_DATA_PARSER
Cagando Datos con APEX_DATA_PARSER
 
4차 산업혁명 시대의 제조업 혁신을 위한 Data Lake 고객 사례::구태훈, 최삼락, Tony Spagnuolo::AWS Summit ...
4차 산업혁명 시대의 제조업 혁신을 위한 Data Lake 고객 사례::구태훈, 최삼락, Tony Spagnuolo::AWS Summit ...4차 산업혁명 시대의 제조업 혁신을 위한 Data Lake 고객 사례::구태훈, 최삼락, Tony Spagnuolo::AWS Summit ...
4차 산업혁명 시대의 제조업 혁신을 위한 Data Lake 고객 사례::구태훈, 최삼락, Tony Spagnuolo::AWS Summit ...
 
JSON-LD for RESTful services
JSON-LD for RESTful servicesJSON-LD for RESTful services
JSON-LD for RESTful services
 
Elasticsearch Introduction
Elasticsearch IntroductionElasticsearch Introduction
Elasticsearch Introduction
 
What’s New in Oracle Database 19c - Part 1
What’s New in Oracle Database 19c - Part 1What’s New in Oracle Database 19c - Part 1
What’s New in Oracle Database 19c - Part 1
 
Performance Tuning Using oratop
Performance Tuning Using oratop Performance Tuning Using oratop
Performance Tuning Using oratop
 
Managed disk-Azure Storage Evolution
Managed disk-Azure Storage EvolutionManaged disk-Azure Storage Evolution
Managed disk-Azure Storage Evolution
 

Destacado

Keywords, Keywords, Everywhere! Discovery, Analysis, and Strategies in Keywor...
Keywords, Keywords, Everywhere! Discovery, Analysis, and Strategies in Keywor...Keywords, Keywords, Everywhere! Discovery, Analysis, and Strategies in Keywor...
Keywords, Keywords, Everywhere! Discovery, Analysis, and Strategies in Keywor...Dana Todd
 
Discovery hub : an exploratory search engine on the top of DBpedia
Discovery hub : an exploratory search engine on the top of DBpediaDiscovery hub : an exploratory search engine on the top of DBpedia
Discovery hub : an exploratory search engine on the top of DBpediaNicolas MARIE
 
Web search-metrics-tutorial-www2010-section-5of7-discovery
Web search-metrics-tutorial-www2010-section-5of7-discoveryWeb search-metrics-tutorial-www2010-section-5of7-discovery
Web search-metrics-tutorial-www2010-section-5of7-discoveryAli Dasdan
 
OseeGenius - Semantic search engine and discovery platform
OseeGenius - Semantic search engine and discovery platformOseeGenius - Semantic search engine and discovery platform
OseeGenius - Semantic search engine and discovery platform@CULT Srl
 
Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"George Stathis
 
Elasticsearch first-steps
Elasticsearch first-stepsElasticsearch first-steps
Elasticsearch first-stepsMatteo Moci
 

Destacado (6)

Keywords, Keywords, Everywhere! Discovery, Analysis, and Strategies in Keywor...
Keywords, Keywords, Everywhere! Discovery, Analysis, and Strategies in Keywor...Keywords, Keywords, Everywhere! Discovery, Analysis, and Strategies in Keywor...
Keywords, Keywords, Everywhere! Discovery, Analysis, and Strategies in Keywor...
 
Discovery hub : an exploratory search engine on the top of DBpedia
Discovery hub : an exploratory search engine on the top of DBpediaDiscovery hub : an exploratory search engine on the top of DBpedia
Discovery hub : an exploratory search engine on the top of DBpedia
 
Web search-metrics-tutorial-www2010-section-5of7-discovery
Web search-metrics-tutorial-www2010-section-5of7-discoveryWeb search-metrics-tutorial-www2010-section-5of7-discovery
Web search-metrics-tutorial-www2010-section-5of7-discovery
 
OseeGenius - Semantic search engine and discovery platform
OseeGenius - Semantic search engine and discovery platformOseeGenius - Semantic search engine and discovery platform
OseeGenius - Semantic search engine and discovery platform
 
Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"
 
Elasticsearch first-steps
Elasticsearch first-stepsElasticsearch first-steps
Elasticsearch first-steps
 

Similar a Document relations - Berlin Buzzwords 2013

Elasticsearch in 15 minutes
Elasticsearch in 15 minutesElasticsearch in 15 minutes
Elasticsearch in 15 minutesDavid Pilato
 
elasticsearch basics workshop
elasticsearch basics workshopelasticsearch basics workshop
elasticsearch basics workshopMathieu Elie
 
Searching Relational Data with Elasticsearch
Searching Relational Data with ElasticsearchSearching Relational Data with Elasticsearch
Searching Relational Data with Elasticsearchsirensolutions
 
Distributed percolator in elasticsearch
Distributed percolator in elasticsearchDistributed percolator in elasticsearch
Distributed percolator in elasticsearchmartijnvg
 
Elastic search and Symfony3 - A practical approach
Elastic search and Symfony3 - A practical approachElastic search and Symfony3 - A practical approach
Elastic search and Symfony3 - A practical approachSymfonyMu
 
Linked Data Presentation at TDWI Mpls
Linked Data Presentation at TDWI MplsLinked Data Presentation at TDWI Mpls
Linked Data Presentation at TDWI MplsJay Myers
 
Introduction to NoSQL with MongoDB
Introduction to NoSQL with MongoDBIntroduction to NoSQL with MongoDB
Introduction to NoSQL with MongoDBHector Correa
 
The googlization of search 2014
The googlization of search 2014The googlization of search 2014
The googlization of search 2014nabot
 
Elasticsearch - basics and beyond
Elasticsearch - basics and beyondElasticsearch - basics and beyond
Elasticsearch - basics and beyondErnesto Reig
 
Intro to Angular.JS Directives
Intro to Angular.JS DirectivesIntro to Angular.JS Directives
Intro to Angular.JS DirectivesChristian Lilley
 
20th Feb 2020 json-ld-rdf-im-proposal.pdf
20th Feb 2020 json-ld-rdf-im-proposal.pdf20th Feb 2020 json-ld-rdf-im-proposal.pdf
20th Feb 2020 json-ld-rdf-im-proposal.pdfMichal Miklas
 
This Ain't Your Parents' Search Engine
This Ain't Your Parents' Search EngineThis Ain't Your Parents' Search Engine
This Ain't Your Parents' Search EngineLucidworks
 
Battle of the Giants round 2
Battle of the Giants round 2Battle of the Giants round 2
Battle of the Giants round 2Rafał Kuć
 
Battle of the Giants Round 2 - Apache Solr vs. Elasticsearch
Battle of the Giants Round 2 - Apache Solr vs. ElasticsearchBattle of the Giants Round 2 - Apache Solr vs. Elasticsearch
Battle of the Giants Round 2 - Apache Solr vs. ElasticsearchSematext Group, Inc.
 
NoSQL Now 2013 Presentation
NoSQL Now 2013 PresentationNoSQL Now 2013 Presentation
NoSQL Now 2013 PresentationArjen Schoneveld
 
Mongo db php_shaken_not_stirred_joomlafrappe
Mongo db php_shaken_not_stirred_joomlafrappeMongo db php_shaken_not_stirred_joomlafrappe
Mongo db php_shaken_not_stirred_joomlafrappeSpyros Passas
 

Similar a Document relations - Berlin Buzzwords 2013 (20)

Elasticsearch in 15 minutes
Elasticsearch in 15 minutesElasticsearch in 15 minutes
Elasticsearch in 15 minutes
 
elasticsearch basics workshop
elasticsearch basics workshopelasticsearch basics workshop
elasticsearch basics workshop
 
Searching Relational Data with Elasticsearch
Searching Relational Data with ElasticsearchSearching Relational Data with Elasticsearch
Searching Relational Data with Elasticsearch
 
Elasticsearch speed is key
Elasticsearch speed is keyElasticsearch speed is key
Elasticsearch speed is key
 
Distributed percolator in elasticsearch
Distributed percolator in elasticsearchDistributed percolator in elasticsearch
Distributed percolator in elasticsearch
 
Elastic search and Symfony3 - A practical approach
Elastic search and Symfony3 - A practical approachElastic search and Symfony3 - A practical approach
Elastic search and Symfony3 - A practical approach
 
elasticsearch
elasticsearchelasticsearch
elasticsearch
 
Linked Data Presentation at TDWI Mpls
Linked Data Presentation at TDWI MplsLinked Data Presentation at TDWI Mpls
Linked Data Presentation at TDWI Mpls
 
Introduction to NoSQL with MongoDB
Introduction to NoSQL with MongoDBIntroduction to NoSQL with MongoDB
Introduction to NoSQL with MongoDB
 
The googlization of search 2014
The googlization of search 2014The googlization of search 2014
The googlization of search 2014
 
Elasticsearch - basics and beyond
Elasticsearch - basics and beyondElasticsearch - basics and beyond
Elasticsearch - basics and beyond
 
Intro to Angular.JS Directives
Intro to Angular.JS DirectivesIntro to Angular.JS Directives
Intro to Angular.JS Directives
 
20th Feb 2020 json-ld-rdf-im-proposal.pdf
20th Feb 2020 json-ld-rdf-im-proposal.pdf20th Feb 2020 json-ld-rdf-im-proposal.pdf
20th Feb 2020 json-ld-rdf-im-proposal.pdf
 
This Ain't Your Parents' Search Engine
This Ain't Your Parents' Search EngineThis Ain't Your Parents' Search Engine
This Ain't Your Parents' Search Engine
 
Battle of the Giants round 2
Battle of the Giants round 2Battle of the Giants round 2
Battle of the Giants round 2
 
Battle of the Giants Round 2 - Apache Solr vs. Elasticsearch
Battle of the Giants Round 2 - Apache Solr vs. ElasticsearchBattle of the Giants Round 2 - Apache Solr vs. Elasticsearch
Battle of the Giants Round 2 - Apache Solr vs. Elasticsearch
 
NoSQL Now 2013 Presentation
NoSQL Now 2013 PresentationNoSQL Now 2013 Presentation
NoSQL Now 2013 Presentation
 
Mongo db php_shaken_not_stirred_joomlafrappe
Mongo db php_shaken_not_stirred_joomlafrappeMongo db php_shaken_not_stirred_joomlafrappe
Mongo db php_shaken_not_stirred_joomlafrappe
 
Ias2010
Ias2010Ias2010
Ias2010
 
CD-LOR SRU Tool
CD-LOR SRU ToolCD-LOR SRU Tool
CD-LOR SRU Tool
 

Último

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 

Último (20)

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

Document relations - Berlin Buzzwords 2013

  • 1. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Martijn van Groningen mvg@apache.org @mvgroningen Document relations Wednesday, June 5, 13
  • 2. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Topics • Background • Parent / child support • Nested support • Future developments Wednesday, June 5, 13
  • 3. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Background Wednesday, June 5, 13
  • 4. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Background C Query Local join Wednesday, June 5, 13
  • 5. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Background • We need more capacity. • But how to divide the relational data? Wednesday, June 5, 13
  • 6. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Background C Q uery sub-queries Wednesday, June 5, 13
  • 7. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Background C Query sub-query De-normalized document Wednesday, June 5, 13
  • 8. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Background Wednesday, June 5, 13
  • 9. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Background Query sub-query C local joinlocal join Wednesday, June 5, 13
  • 10. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Background • Dealing with relations either pay the price on write time or read time. • Alternatively documents relations can balance the costs between read and write time. For example: one join to reduce duplicated data. • Supporting “many-to-many” joins in a distributed system is difficult. Either unbalanced partitions or very expensive join. Wednesday, June 5, 13
  • 11. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited The query time join Parent child Wednesday, June 5, 13
  • 12. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Parent child • Parent / child is a query time join between different document types in the same index. • Parent and children documents are stored as separate documents in the same index. • Child documents can point to only one parent. • Parent documents can be referred by multiple child documents. • Also a parent document can be a child document of a different parent. Wednesday, June 5, 13
  • 13. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Parent child • A parent document and its children documents are routed into the same shard. • Parent id is used as routing value. • In combination with a parent ids in memory data structure the parent-child join is fast. • Use warmer api to preload it! • Parent ids data structure size has significantly been reduced in version 0.90.1 Wednesday, June 5, 13
  • 14. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Parent child - Indexing • The parent document doesn’t need to exist at time of indexing. curl -XPUT 'localhost:9200/products' -d '{   "mappings" : {      "offer" : {         "_parent" : { "type" : "product" }      }   } }' A offer document is a parent of a product document curl -XPUT 'localhost:9200/products/offer/12?parent=p2345' -d '{ "valid_from" : "2013-05-01", "valid_to" : "2013-10-01", "price" : 26.87, }' Then when indexing mention to what product a offer points to. Wednesday, June 5, 13
  • 15. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Parent child - Querying • The has_child query returns parent documents based on matches in its child documents. • The optional “score_mode” defines how child hits are mapped to its parent document. curl -XGET 'localhost:9200/products/_search' -d '{ "query" : {       "has_child" : {          "type" : "offer", " "query" : {             "range" : {                "price" : { "lte" : 50                }             }        }     }   } }' Wednesday, June 5, 13
  • 16. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited The index time join Nested objects Wednesday, June 5, 13
  • 17. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Nested objects • In many cases domain models have the same write / update live-cycle. • Books & Chapters. • Movies & Actors. • De-normalizing results in the fastest queries. • Compared to using parent/child queries. • Nested objects allow smart de-normalization. Wednesday, June 5, 13
  • 18. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Nested objects { "title" : "Elasticsearch", "authors" : "Clinton Gormley", "categories" : ["programming", "information retrieval"], "published_year" : 2013, "summary" : "The definitive guide for Elasticsearch ...", "chapter_1_title" : "Introduction", "chapter_1_summary" : "Short introduction about Elasticsearch’s features ...", "chapter_1_number_of_pages" : 12, "chapter_2_title" : "Data in, Data out", "chapter_2_summary" : "How to manage your data with Elasticsearch ...", "chapter_2_number_of_pages" : 39, ... } Wednesday, June 5, 13
  • 19. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Nested objects { "title" : "Elasticsearch", "authors" : "Clinton Gormley", "categories" : ["programming", "information retrieval"], "published_year" : 2013, "summary" : "The definitive guide for Elasticsearch ...", "chapter_1_title" : "Introduction", "chapter_1_summary" : "Short introduction about Elasticsearch’s features ...", "chapter_1_number_of_pages" : 12, "chapter_2_title" : "Data in, Data out", "chapter_2_summary" : "How to manage your data with Elasticsearch ...", "chapter_2_number_of_pages" : 39, ... } Too verbose! Wednesday, June 5, 13
  • 20. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Nested objects { "title" : "Elasticsearch", "author" : "Clinton Gormley", "categories" : ["programming", "information retrieval"], "published_year" : 2013, "summary" : "The definitive guide for Elasticsearch ...", "chapters" : [ { "title" : "Introduction", "summary" : "Short introduction about Elasticsearch’s features ...", "number_of_pages" : 12 }, { "title" : "Data in, Data out", "summary" : "How to manage your data with Elasticsearch ...", "number_of_pages" : 39 }, ... ] } • JSON allows complex nesting of objects. • But how does this get indexed? Wednesday, June 5, 13
  • 21. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Nested objects { "title" : "Elasticsearch", ... "chapters" : [ {"title" : "Introduction", "summary" : "Short ...", "number_of_pages" : 12}, {"title" : "Data in, ...", "summary" : "How to ...", "number_of_pages" : 39}, ... ] } { "title" : "Elasticsearch", ... "chapters.title" : ["Data in, Data out", "Introduction"], "chapters.summary" : ["How to ...", "Short ..."], "chapters.number_of_pages" : [12, 39] } Original json document: Lucene Document Structure: Wednesday, June 5, 13
  • 22. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Nested objects - Mapping • The nested type triggers Lucene’s block indexing. • Multiple levels of inner objects is possible. curl -XPUT 'localhost:9200/books' -d '{ "mappings" : { "book" : { "properties" : { "chapters" : { "type" : "nested" } } } } }' Document type Field type: ‘nested’ Wednesday, June 5, 13
  • 23. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Nested objects - Block indexing {"chapters.title" : "Into...", "chapters.summary" : "...", "chapters.number_of_pages" : 12}, {"chapters.title" : "Data...", "chapters.summary" : "...", "chapters.number_of_pages" : 39}, ... { "title" : "Elasticsearch", ... } Lucene Documents Structure: • Inlining the inner objects as separate Lucene documents right before the root document. • The root document and its nested documents always remain in the same block. Wednesday, June 5, 13
  • 24. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Nested objects - Nested query • Nested query returns the complete “book” as hit. (root document) curl -XGET 'localhost:9200/books/book/_search' -d '{   "query" : {      "nested" : {          "path" : "chapters",          "score_mode" : "avg", " "query" : {             "match" : {                "chapters.summary" : {                   "query" : "indexing data"                }             }          }" "      }   } }' Specify the nested level. Chapter level query score mode Wednesday, June 5, 13
  • 25. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Nested objects X X X X X root documents bitset: Nested Lucene document, that match with the inner query. Aggregate nested scores and push to root document. X Set bit, that represents a root document. Wednesday, June 5, 13
  • 26. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited But first questions! Extra slides Wednesday, June 5, 13
  • 27. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Nested objects - Nested sorting curl -XGET 'localhost:9200/books/book/_search' -d '{  "query" : {   "match" : { "summary" : { "query" : "guide" } }        }, "sort" : [ { "chapters.number_of_pages" : { "sort_mode" : "avg", "nested_filter" : { "range" : { "chapters.number_of_pages" : {"lte" : 15} } } } } ] }' Sort mode Wednesday, June 5, 13
  • 28. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited Parent child - sorting • Parent/child sorting isn’t possible at the moment. • But there is a “custom_score” query work around. • Downsides: • Forces to execute a script for each matching document. • The child sort value is converted into a float value. "has_child" : { "type" : "offer", "score_mode" : "avg", "query" : { "custom_score" : { "query" : { ... }, "script" : "doc["price"].value" } } } Wednesday, June 5, 13