SlideShare una empresa de Scribd logo
1 de 17
Descargar para leer sin conexión
Flexible search in Apache 
Jackrabbit Oak 
Tommaso Teofili
Apache Jackrabbit Oak 
• Scalable content repository 
• JCR 2.0 
• Designed for concurrent access (MVCC) 
• Pluggable components (storage, indexes) 
• Powering AEM 6.0 
18/11/14 
2
Oak Architecture 
• Oak-JCR 
• Oak-Core 
– MVCC (node states and immutable trees) 
– Core components (Security, Query engine, …) 
– Plugins 
• Oak-MK 
– Pluggable storage 
18/11/14 
3
Oak – the Query Engine 
• Query languages 
– XPATH 
– SQL-2 
• Selects the index(es) supposed to perform 
better 
– Search is demanded to the underlying indexes 
– No index? The repository is traversed 
• ACLs applied afterwards 
18/11/14 
4
Indexing – the IndexEditor API 
• NodeState before = builder.getNodeState(); 
• builder.child(”a").setProperty(”foo", ”bar"); 
• NodeState after = builder.getNodeState(); 
• NodeState indexed = 
editorHook.processCommit(before, after, 
…); // who said MVCC? 
18/11/14 
5
Searching – the QueryIndex API 
• Filter filter = … ; // "select * from [nt:folder]" 
• filter.restrictPath("/somenode", 
Filter.PathRestriction.DIRECT_CHILDREN); 
• Cursor cursor = queryIndex.query(filter, 
nodeState); // search against a state 
• IndexRow row = cursor.next(); // results 
18/11/14 
6
Searching – Filters 
• Full text expressions 
• Property restrictions 
• Path restrictions 
– Exact 
– Parent 
– Child 
– Descendant 
• Node type restrictions 
18/11/14 
7
Configuring indexes 
• Indexes are declared by adding “query 
index configuration” nodes in the repository 
– Type 
– Asynchronous 
– Reindex 
– Index specific properties 
18/11/14 
8
In repository indexes 
• Data structures designed as content 
– Property index 
– Ordered property index 
– Node type index 
– Reference index 
18/11/14 
9
Lucene index 
• Full text and (sorted) property restrictions 
• Stored in repository 
• Tika for indexing binaries 
• Configurable indexing rules (boost), codec, 
analyzers 
19/11/14 
10
Lucene index 
• Interesting facts 
– DocValues for sorted property restrictions 
– Uncompressed stored fields 
– Property exists queries 
• TermRange vs Wildcard vs Term vs MatchAll 
+FieldExistsFilter 
19/11/14 
11
Solr index 
• Full text, property, path restrictions 
• Embedded or remote Solr(Cloud) 
• Configurable 
– Mapping restriction / fields 
– Page size 
– Commit policy 
• Most is configured on the Solr side 
18/11/14 
12
Problems 
• Hard to express complex queries 
• Cannot leverage underlying indexes 
advanced capabilities 
18/11/14 
13
Native language support 
• Leverage underlying index capabilities 
– Multiple query languages/parsers 
• More accurate full text queries (and results) 
– … where native(’lucene', 'name:(hello world) 
“hello world”^3') 
• Advanced index capabilities (e.g. MLT) 
– … where native('solr', 'mlt?q=path:/content/ 
sample1&mlt.fl=jcr:title') 
19/11/14 
14
Adding more indexes 
• Create an IndexEditor 
– Turn diff into an “indexable” 
• Create a QueryIndex 
– Turn a Filter into an index-specific query 
• “Declare” the index 
18/11/14 
15
Looking forward 
• Results aggregation features (e.g. facets) 
• More configuration options (Lucene, Solr) 
• Smarter index selection 
• Cover indexes 
18/11/14 
16
Thanks

Más contenido relacionado

La actualidad más candente

Introduction to Apache solr
Introduction to Apache solrIntroduction to Apache solr
Introduction to Apache solrKnoldus Inc.
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered LuceneErik Hatcher
 
Enterprise Search Using Apache Solr
Enterprise Search Using Apache SolrEnterprise Search Using Apache Solr
Enterprise Search Using Apache Solrsagar chaturvedi
 
Apache Solr - Enterprise search platform
Apache Solr - Enterprise search platformApache Solr - Enterprise search platform
Apache Solr - Enterprise search platformTommaso Teofili
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes WorkshopErik Hatcher
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrRahul Jain
 
Taking eZ Find beyond full-text search
Taking eZ Find beyond  full-text searchTaking eZ Find beyond  full-text search
Taking eZ Find beyond full-text searchPaul Borgermans
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr WorkshopJSGB
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Solr Black Belt Pre-conference
Solr Black Belt Pre-conferenceSolr Black Belt Pre-conference
Solr Black Belt Pre-conferenceErik Hatcher
 
Find it, possibly also near you!
Find it, possibly also near you!Find it, possibly also near you!
Find it, possibly also near you!Paul Borgermans
 
Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5israelekpo
 
20130310 solr tuorial
20130310 solr tuorial20130310 solr tuorial
20130310 solr tuorialChris Huang
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Erik Hatcher
 

La actualidad más candente (20)

Solr Recipes
Solr RecipesSolr Recipes
Solr Recipes
 
Introduction to Apache solr
Introduction to Apache solrIntroduction to Apache solr
Introduction to Apache solr
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered Lucene
 
Enterprise Search Using Apache Solr
Enterprise Search Using Apache SolrEnterprise Search Using Apache Solr
Enterprise Search Using Apache Solr
 
Apache Solr - Enterprise search platform
Apache Solr - Enterprise search platformApache Solr - Enterprise search platform
Apache Solr - Enterprise search platform
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Intro to Apache Solr
Intro to Apache SolrIntro to Apache Solr
Intro to Apache Solr
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes Workshop
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/Solr
 
Taking eZ Find beyond full-text search
Taking eZ Find beyond  full-text searchTaking eZ Find beyond  full-text search
Taking eZ Find beyond full-text search
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
High Performance Solr
High Performance SolrHigh Performance Solr
High Performance Solr
 
Solr Black Belt Pre-conference
Solr Black Belt Pre-conferenceSolr Black Belt Pre-conference
Solr Black Belt Pre-conference
 
Find it, possibly also near you!
Find it, possibly also near you!Find it, possibly also near you!
Find it, possibly also near you!
 
Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5
 
Solr Flair
Solr FlairSolr Flair
Solr Flair
 
20130310 solr tuorial
20130310 solr tuorial20130310 solr tuorial
20130310 solr tuorial
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)
 

Destacado

Magnolia CMS 5.0 - Architecture
Magnolia CMS 5.0 - ArchitectureMagnolia CMS 5.0 - Architecture
Magnolia CMS 5.0 - ArchitecturePhilipp Bärfuss
 
Data replication in Sling
Data replication in SlingData replication in Sling
Data replication in SlingTommaso Teofili
 
Spring and Web Content Management
Spring and Web Content ManagementSpring and Web Content Management
Spring and Web Content ManagementZak Greant
 
Demystifying Oak Search
Demystifying Oak SearchDemystifying Oak Search
Demystifying Oak SearchJustin Edelson
 
RESTful Web Applications with Apache Sling
RESTful Web Applications with Apache SlingRESTful Web Applications with Apache Sling
RESTful Web Applications with Apache SlingBertrand Delacretaz
 
Apache Jackrabbit @ Swiss Open Source Awards 2011
Apache Jackrabbit @ Swiss Open Source Awards 2011Apache Jackrabbit @ Swiss Open Source Awards 2011
Apache Jackrabbit @ Swiss Open Source Awards 2011Jukka Zitting
 
Content Management With Apache Jackrabbit
Content Management With Apache JackrabbitContent Management With Apache Jackrabbit
Content Management With Apache JackrabbitJukka Zitting
 
Oak, the architecture of Apache Jackrabbit 3
Oak, the architecture of Apache Jackrabbit 3Oak, the architecture of Apache Jackrabbit 3
Oak, the architecture of Apache Jackrabbit 3Jukka Zitting
 

Destacado (10)

Oak Lucene Indexes
Oak Lucene IndexesOak Lucene Indexes
Oak Lucene Indexes
 
Magnolia CMS 5.0 - Architecture
Magnolia CMS 5.0 - ArchitectureMagnolia CMS 5.0 - Architecture
Magnolia CMS 5.0 - Architecture
 
Data replication in Sling
Data replication in SlingData replication in Sling
Data replication in Sling
 
Apache Jackrabbit
Apache JackrabbitApache Jackrabbit
Apache Jackrabbit
 
Spring and Web Content Management
Spring and Web Content ManagementSpring and Web Content Management
Spring and Web Content Management
 
Demystifying Oak Search
Demystifying Oak SearchDemystifying Oak Search
Demystifying Oak Search
 
RESTful Web Applications with Apache Sling
RESTful Web Applications with Apache SlingRESTful Web Applications with Apache Sling
RESTful Web Applications with Apache Sling
 
Apache Jackrabbit @ Swiss Open Source Awards 2011
Apache Jackrabbit @ Swiss Open Source Awards 2011Apache Jackrabbit @ Swiss Open Source Awards 2011
Apache Jackrabbit @ Swiss Open Source Awards 2011
 
Content Management With Apache Jackrabbit
Content Management With Apache JackrabbitContent Management With Apache Jackrabbit
Content Management With Apache Jackrabbit
 
Oak, the architecture of Apache Jackrabbit 3
Oak, the architecture of Apache Jackrabbit 3Oak, the architecture of Apache Jackrabbit 3
Oak, the architecture of Apache Jackrabbit 3
 

Similar a Flexible search in Apache Jackrabbit Oak

BigData Faceted Search Comparison between Apache Solr vs. ElasticSearch
BigData Faceted Search Comparison between Apache Solr vs. ElasticSearchBigData Faceted Search Comparison between Apache Solr vs. ElasticSearch
BigData Faceted Search Comparison between Apache Solr vs. ElasticSearchNetConstructor, Inc.
 
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)Sematext Group, Inc.
 
An Introduction To Oracle Database
An Introduction To Oracle DatabaseAn Introduction To Oracle Database
An Introduction To Oracle DatabaseMeysam Javadi
 
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve contentOpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve contentAlkacon Software GmbH & Co. KG
 
Juggling with Bits and Bytes - How Apache Flink operates on binary data
Juggling with Bits and Bytes - How Apache Flink operates on binary dataJuggling with Bits and Bytes - How Apache Flink operates on binary data
Juggling with Bits and Bytes - How Apache Flink operates on binary dataFabian Hueske
 
ModeShape 3 overview
ModeShape 3 overviewModeShape 3 overview
ModeShape 3 overviewRandall Hauch
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesRahul Singh
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesAnant Corporation
 
hibernateormfeatures-140223193044-phpapp02.pdf
hibernateormfeatures-140223193044-phpapp02.pdfhibernateormfeatures-140223193044-phpapp02.pdf
hibernateormfeatures-140223193044-phpapp02.pdfPatiento Del Mar
 
SQL Queries on Smalltalk Objects
SQL Queries on Smalltalk ObjectsSQL Queries on Smalltalk Objects
SQL Queries on Smalltalk ObjectsESUG
 
XML Amsterdam - Creating structure in unstructured data
XML Amsterdam - Creating structure in unstructured dataXML Amsterdam - Creating structure in unstructured data
XML Amsterdam - Creating structure in unstructured dataMarco Gralike
 
Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Vinay Kumar
 
PLAT-4 Understanding the SOLR Integration
PLAT-4 Understanding the SOLR IntegrationPLAT-4 Understanding the SOLR Integration
PLAT-4 Understanding the SOLR IntegrationAlfresco Software
 
From Lucene to Solr 4 Trunk
From Lucene to Solr 4 TrunkFrom Lucene to Solr 4 Trunk
From Lucene to Solr 4 Trunktdthomassld
 
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...Spark Summit
 
Intro to GemStone/S
Intro to GemStone/SIntro to GemStone/S
Intro to GemStone/SESUG
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesRahul Jain
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakSpark Summit
 

Similar a Flexible search in Apache Jackrabbit Oak (20)

Solr
SolrSolr
Solr
 
BigData Faceted Search Comparison between Apache Solr vs. ElasticSearch
BigData Faceted Search Comparison between Apache Solr vs. ElasticSearchBigData Faceted Search Comparison between Apache Solr vs. ElasticSearch
BigData Faceted Search Comparison between Apache Solr vs. ElasticSearch
 
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
 
An Introduction To Oracle Database
An Introduction To Oracle DatabaseAn Introduction To Oracle Database
An Introduction To Oracle Database
 
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve contentOpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
 
Juggling with Bits and Bytes - How Apache Flink operates on binary data
Juggling with Bits and Bytes - How Apache Flink operates on binary dataJuggling with Bits and Bytes - How Apache Flink operates on binary data
Juggling with Bits and Bytes - How Apache Flink operates on binary data
 
ModeShape 3 overview
ModeShape 3 overviewModeShape 3 overview
ModeShape 3 overview
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source Technologies
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source Technologies
 
hibernateormfeatures-140223193044-phpapp02.pdf
hibernateormfeatures-140223193044-phpapp02.pdfhibernateormfeatures-140223193044-phpapp02.pdf
hibernateormfeatures-140223193044-phpapp02.pdf
 
SQL Queries on Smalltalk Objects
SQL Queries on Smalltalk ObjectsSQL Queries on Smalltalk Objects
SQL Queries on Smalltalk Objects
 
XML Amsterdam - Creating structure in unstructured data
XML Amsterdam - Creating structure in unstructured dataXML Amsterdam - Creating structure in unstructured data
XML Amsterdam - Creating structure in unstructured data
 
Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Roaring with elastic search sangam2018
Roaring with elastic search sangam2018
 
Oracle by Muhammad Iqbal
Oracle by Muhammad IqbalOracle by Muhammad Iqbal
Oracle by Muhammad Iqbal
 
PLAT-4 Understanding the SOLR Integration
PLAT-4 Understanding the SOLR IntegrationPLAT-4 Understanding the SOLR Integration
PLAT-4 Understanding the SOLR Integration
 
From Lucene to Solr 4 Trunk
From Lucene to Solr 4 TrunkFrom Lucene to Solr 4 Trunk
From Lucene to Solr 4 Trunk
 
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
 
Intro to GemStone/S
Intro to GemStone/SIntro to GemStone/S
Intro to GemStone/S
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 

Más de Tommaso Teofili

Affect Enriched Word Embeddings for News IR
Affect Enriched Word Embeddings for News IRAffect Enriched Word Embeddings for News IR
Affect Enriched Word Embeddings for News IRTommaso Teofili
 
Search engines in the industry
Search engines in the industrySearch engines in the industry
Search engines in the industryTommaso Teofili
 
Text categorization with Lucene and Solr
Text categorization with Lucene and SolrText categorization with Lucene and Solr
Text categorization with Lucene and SolrTommaso Teofili
 
Machine learning with Apache Hama
Machine learning with Apache HamaMachine learning with Apache Hama
Machine learning with Apache HamaTommaso Teofili
 
Adapting Apache UIMA to OSGi
Adapting Apache UIMA to OSGiAdapting Apache UIMA to OSGi
Adapting Apache UIMA to OSGiTommaso Teofili
 
Domeo, Text Mining, UIMA and Clerezza
Domeo, Text Mining, UIMA and ClerezzaDomeo, Text Mining, UIMA and Clerezza
Domeo, Text Mining, UIMA and ClerezzaTommaso Teofili
 
Natural Language Search in Solr
Natural Language Search in SolrNatural Language Search in Solr
Natural Language Search in SolrTommaso Teofili
 
Apache UIMA - Hands on code
Apache UIMA - Hands on codeApache UIMA - Hands on code
Apache UIMA - Hands on codeTommaso Teofili
 
Apache UIMA Introduction
Apache UIMA IntroductionApache UIMA Introduction
Apache UIMA IntroductionTommaso Teofili
 
OSS Enterprise Search EU Tour
OSS Enterprise Search EU TourOSS Enterprise Search EU Tour
OSS Enterprise Search EU TourTommaso Teofili
 
Information Extraction with UIMA - Usecases
Information Extraction with UIMA - UsecasesInformation Extraction with UIMA - Usecases
Information Extraction with UIMA - UsecasesTommaso Teofili
 
Apache UIMA and Metadata Generation
Apache UIMA and Metadata GenerationApache UIMA and Metadata Generation
Apache UIMA and Metadata GenerationTommaso Teofili
 
Data and Information Extraction on the Web
Data and Information Extraction on the WebData and Information Extraction on the Web
Data and Information Extraction on the WebTommaso Teofili
 
Apache UIMA and Semantic Search
Apache UIMA and Semantic SearchApache UIMA and Semantic Search
Apache UIMA and Semantic SearchTommaso Teofili
 

Más de Tommaso Teofili (14)

Affect Enriched Word Embeddings for News IR
Affect Enriched Word Embeddings for News IRAffect Enriched Word Embeddings for News IR
Affect Enriched Word Embeddings for News IR
 
Search engines in the industry
Search engines in the industrySearch engines in the industry
Search engines in the industry
 
Text categorization with Lucene and Solr
Text categorization with Lucene and SolrText categorization with Lucene and Solr
Text categorization with Lucene and Solr
 
Machine learning with Apache Hama
Machine learning with Apache HamaMachine learning with Apache Hama
Machine learning with Apache Hama
 
Adapting Apache UIMA to OSGi
Adapting Apache UIMA to OSGiAdapting Apache UIMA to OSGi
Adapting Apache UIMA to OSGi
 
Domeo, Text Mining, UIMA and Clerezza
Domeo, Text Mining, UIMA and ClerezzaDomeo, Text Mining, UIMA and Clerezza
Domeo, Text Mining, UIMA and Clerezza
 
Natural Language Search in Solr
Natural Language Search in SolrNatural Language Search in Solr
Natural Language Search in Solr
 
Apache UIMA - Hands on code
Apache UIMA - Hands on codeApache UIMA - Hands on code
Apache UIMA - Hands on code
 
Apache UIMA Introduction
Apache UIMA IntroductionApache UIMA Introduction
Apache UIMA Introduction
 
OSS Enterprise Search EU Tour
OSS Enterprise Search EU TourOSS Enterprise Search EU Tour
OSS Enterprise Search EU Tour
 
Information Extraction with UIMA - Usecases
Information Extraction with UIMA - UsecasesInformation Extraction with UIMA - Usecases
Information Extraction with UIMA - Usecases
 
Apache UIMA and Metadata Generation
Apache UIMA and Metadata GenerationApache UIMA and Metadata Generation
Apache UIMA and Metadata Generation
 
Data and Information Extraction on the Web
Data and Information Extraction on the WebData and Information Extraction on the Web
Data and Information Extraction on the Web
 
Apache UIMA and Semantic Search
Apache UIMA and Semantic SearchApache UIMA and Semantic Search
Apache UIMA and Semantic Search
 

Último

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dashnarutouzumaki53779
 
Fact vs. Fiction: Autodetecting Hallucinations in LLMs
Fact vs. Fiction: Autodetecting Hallucinations in LLMsFact vs. Fiction: Autodetecting Hallucinations in LLMs
Fact vs. Fiction: Autodetecting Hallucinations in LLMsZilliz
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Último (20)

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dash
 
Fact vs. Fiction: Autodetecting Hallucinations in LLMs
Fact vs. Fiction: Autodetecting Hallucinations in LLMsFact vs. Fiction: Autodetecting Hallucinations in LLMs
Fact vs. Fiction: Autodetecting Hallucinations in LLMs
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

Flexible search in Apache Jackrabbit Oak

  • 1. Flexible search in Apache Jackrabbit Oak Tommaso Teofili
  • 2. Apache Jackrabbit Oak • Scalable content repository • JCR 2.0 • Designed for concurrent access (MVCC) • Pluggable components (storage, indexes) • Powering AEM 6.0 18/11/14 2
  • 3. Oak Architecture • Oak-JCR • Oak-Core – MVCC (node states and immutable trees) – Core components (Security, Query engine, …) – Plugins • Oak-MK – Pluggable storage 18/11/14 3
  • 4. Oak – the Query Engine • Query languages – XPATH – SQL-2 • Selects the index(es) supposed to perform better – Search is demanded to the underlying indexes – No index? The repository is traversed • ACLs applied afterwards 18/11/14 4
  • 5. Indexing – the IndexEditor API • NodeState before = builder.getNodeState(); • builder.child(”a").setProperty(”foo", ”bar"); • NodeState after = builder.getNodeState(); • NodeState indexed = editorHook.processCommit(before, after, …); // who said MVCC? 18/11/14 5
  • 6. Searching – the QueryIndex API • Filter filter = … ; // "select * from [nt:folder]" • filter.restrictPath("/somenode", Filter.PathRestriction.DIRECT_CHILDREN); • Cursor cursor = queryIndex.query(filter, nodeState); // search against a state • IndexRow row = cursor.next(); // results 18/11/14 6
  • 7. Searching – Filters • Full text expressions • Property restrictions • Path restrictions – Exact – Parent – Child – Descendant • Node type restrictions 18/11/14 7
  • 8. Configuring indexes • Indexes are declared by adding “query index configuration” nodes in the repository – Type – Asynchronous – Reindex – Index specific properties 18/11/14 8
  • 9. In repository indexes • Data structures designed as content – Property index – Ordered property index – Node type index – Reference index 18/11/14 9
  • 10. Lucene index • Full text and (sorted) property restrictions • Stored in repository • Tika for indexing binaries • Configurable indexing rules (boost), codec, analyzers 19/11/14 10
  • 11. Lucene index • Interesting facts – DocValues for sorted property restrictions – Uncompressed stored fields – Property exists queries • TermRange vs Wildcard vs Term vs MatchAll +FieldExistsFilter 19/11/14 11
  • 12. Solr index • Full text, property, path restrictions • Embedded or remote Solr(Cloud) • Configurable – Mapping restriction / fields – Page size – Commit policy • Most is configured on the Solr side 18/11/14 12
  • 13. Problems • Hard to express complex queries • Cannot leverage underlying indexes advanced capabilities 18/11/14 13
  • 14. Native language support • Leverage underlying index capabilities – Multiple query languages/parsers • More accurate full text queries (and results) – … where native(’lucene', 'name:(hello world) “hello world”^3') • Advanced index capabilities (e.g. MLT) – … where native('solr', 'mlt?q=path:/content/ sample1&mlt.fl=jcr:title') 19/11/14 14
  • 15. Adding more indexes • Create an IndexEditor – Turn diff into an “indexable” • Create a QueryIndex – Turn a Filter into an index-specific query • “Declare” the index 18/11/14 15
  • 16. Looking forward • Results aggregation features (e.g. facets) • More configuration options (Lucene, Solr) • Smarter index selection • Cover indexes 18/11/14 16