SlideShare una empresa de Scribd logo
1 de 22
Descargar para leer sin conexión
www.elastic.co
Apache Lucene
Adrien Grand
1
www.elastic.co
What is Lucene?
• An information retrieval library
• Can be used to build search apps
• Not a runtine, use Solr or Elasticsearch
• Written in Java
• Developed at the Apache Software Foundation
• Contributors include IBM, Twitter, Elastic,
Lucidworks, …
2
www.elastic.co
History
3
1999: creation on
Sourceforge
2001: moved to
the ASF
May 2006
2.0 release
November 2009
3.0 release
October 2012
4.0 release
February 2015
5.0 release
www.elastic.co
Activity
4
source
https://www.openhub.net/p/lucene
www.elastic.co
Features
• Full-text search
• Structured search
• Highlighting
• Faceting
• Suggestions
5
www.elastic.co
Design
• Embeds
• an inverted index, for efficient query execution
• a document store, to get original data back
• a column store, for sorting and analytics
6
www.elastic.co
More history
• Lucene 3.4 Added a faceting module
• Lucene 4.0: Added a column store to the index
• Lucene 4.1: More efficient structured search
• Lucene 4.1: More efficient PK lookups
• Lucene 4.1: Built-in compression of the doc store
• Lucene 4.5: Column store moved from memory to disk
• Lucene 4.8: Checksums on all index files
• Lucene 5.1: Better query execution plans with 2-phases iterators
7
www.elastic.co
Design
8
Segment core
0
name: Breizh camp
location: Rennes, France
1
name: Devoxx
location: Antwerp, Belgium
Document store
doc id stored fields
breizh
camp
conference
devoxx
1
1
2
1
0
0
0,1
1
Inverted index
terms dict doc
freq
postings
Column store
0
1
42
1242
0
1
1000
10
Price Popularity
Live docs
0
1
true
true
www.elastic.co
Design
• Index divided into immutable segments
• To add more documents, add more segments
• In-place updates are not supported
• To update documents, delete then add
9
www.elastic.co
Merging
• Background merges
• Keep the number of segments low for fast search
• Reclaim space from deleted documents
10
www.elastic.co
Merging
• Writing/Merging segments is expensive
• IndexWriter buffers pending docs in memory
• Refresh/Reopen:
• Flush in-memory buffer into a segment
• Make segment searchable
• Commit
• Flush in-memory buffer to a segment
• “fsync” data to disk
11
www.elastic.co
Index safety
12
Only data which has been committed is safe.
If you need better safety, write the data
somewhere else too: other database,
transaction log, …
www.elastic.co
Advices
• Don’t give all machine memory to Java
• Performance factor #1 is the filesystem cache
• Reopen asynchronously, typically every X seconds
• Batch writes before committing
13
www.elastic.co
Pros/cons
• Fast search
• Cross-field index intersections
• On the contrary to many
databases!
• Powerful combinations of
features
• Run facets on docs that
match a particular query
14
• Not realtime
• Yet “near” realtime
• No fine-grained updates
• Ingestion speed
• Yet fast enough for most
use-cases
• Disk usage: data is duplicated
for each access pattern
www.elastic.co
Backward compatibility
• Version N can read indices of version N-1
• Public API: minor versions are backward compatible
• IndexWriter, IndexSearcher, Query, Document, …
• Unless we discover API is trappy
• Internal/Experimental APIs will break
• Collector, Scorer, Comparator, …
15
www.elastic.co
SimpleText
16
IndexWriterConfig iwConfig = new IndexWriterConfig(new WhitespaceAnalyzer());
iwConfig.setCodec(new SimpleTextCodec());
try (Directory dir = FSDirectory.open(new File("/tmp/my_index").toPath());
IndexWriter writer = new IndexWriter(dir, iwConfig)) {
Document document = new Document();
document.add(new TextField("name", "Breizh C@mp", Store.YES));
document.add(new TextField("desc", "la conférence des développeurs du Grand Ouest", Store.NO));
document.add(new StoredField("location", "Rennes, France"));
document.add(new NumericDocValuesField("founded_year", 2011));
writer.addDocument(document);
document = new Document();
document.add(new TextField("name", "Devoxx France", Store.YES));
document.add(new TextField("desc", "la conférence des développeurs passionnés", Store.NO));
document.add(new StoredField("location", "Paris, France"));
document.add(new NumericDocValuesField("founded_year", 2012));
writer.addDocument(document);
writer.commit();
document = new Document();
document.add(new TextField("name", "Riviera DEV", Store.YES));
document.add(new TextField("desc", "la conférence des développeurs du Sud Est", Store.NO));
document.add(new StoredField("location", "Sophia-Antipolis, France"));
document.add(new NumericDocValuesField("founded_year", 2009));
writer.addDocument(document);
writer.commit();
}
www.elastic.co
17
% ls /tmp/my_index
_0.scf
_0.si
_1.scf
_1.si
segments_2
www.elastic.co
18
% cat _0.si
version 6.0.0
number of documents 2
uses compound file true
diagnostics 8
key os
value Linux
key java.vendor
value Oracle Corporation
key java.version
value 1.8.0_25
key lucene.version
value 6.0.0
key os.arch
value amd64
key source
value flush
key os.version
value 3.13.0-53-generic
key timestamp
value 1434102490791
attributes 0
files 2
file _0.si
file _0.scf
id ??hFq? E?q??h??
checksum 00000000001526513595
www.elastic.co
19
% cat _0.scf
cfs entry for: _0.dat
field founded_year
type NUMERIC
minvalue 2011
pattern 0
0
T
1
T
END
checksum 00000000003242224815
[…]
www.elastic.co
20
cfs entry for: _0.fld
doc 0
field 0
name name
type string
value Breizh C@mp
field 2
name location
type string
value Rennes, France
doc 1
field 0
name name
type string
value Devoxx France
field 2
name location
type string
value Paris, France
END
checksum 00000000002801255432
www.elastic.co
21
cfs entry for: _0.pst
field desc
term Grand
doc 0
freq 1
pos 5
term Ouest
doc 0
freq 1
pos 6
term conférence
doc 0
freq 1
pos 1
doc 1
freq 1
pos 1
[…]
END
checksum 00000000002149012390
www.elastic.co
22
Thank you!
@jpountz

Más contenido relacionado

La actualidad más candente

Intelligent crawling and indexing using lucene
Intelligent crawling and indexing using luceneIntelligent crawling and indexing using lucene
Intelligent crawling and indexing using luceneSwapnil & Patil
 
Apache Lucene: Searching the Web and Everything Else (Jazoon07)
Apache Lucene: Searching the Web and Everything Else (Jazoon07)Apache Lucene: Searching the Web and Everything Else (Jazoon07)
Apache Lucene: Searching the Web and Everything Else (Jazoon07)dnaber
 
Introduction to apache lucene
Introduction to apache luceneIntroduction to apache lucene
Introduction to apache luceneShrikrishna Parab
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Lucene Introduction
Lucene IntroductionLucene Introduction
Lucene Introductionotisg
 
Full Text Search with Lucene
Full Text Search with LuceneFull Text Search with Lucene
Full Text Search with LuceneWO Community
 
Tutorial 5 (lucene)
Tutorial 5 (lucene)Tutorial 5 (lucene)
Tutorial 5 (lucene)Kira
 
Hacking Lucene for Custom Search Results
Hacking Lucene for Custom Search ResultsHacking Lucene for Custom Search Results
Hacking Lucene for Custom Search ResultsOpenSource Connections
 
Improved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
Improved Search With Lucene 4.0 - NOVA Lucene/Solr MeetupImproved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
Improved Search With Lucene 4.0 - NOVA Lucene/Solr Meetuprcmuir
 
Apache Solr/Lucene Internals by Anatoliy Sokolenko
Apache Solr/Lucene Internals  by Anatoliy SokolenkoApache Solr/Lucene Internals  by Anatoliy Sokolenko
Apache Solr/Lucene Internals by Anatoliy SokolenkoProvectus
 
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...Lucidworks
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache SolrAndy Jackson
 

La actualidad más candente (20)

Lucene
LuceneLucene
Lucene
 
Intelligent crawling and indexing using lucene
Intelligent crawling and indexing using luceneIntelligent crawling and indexing using lucene
Intelligent crawling and indexing using lucene
 
Apache Lucene: Searching the Web and Everything Else (Jazoon07)
Apache Lucene: Searching the Web and Everything Else (Jazoon07)Apache Lucene: Searching the Web and Everything Else (Jazoon07)
Apache Lucene: Searching the Web and Everything Else (Jazoon07)
 
Introduction to apache lucene
Introduction to apache luceneIntroduction to apache lucene
Introduction to apache lucene
 
Lucene and MySQL
Lucene and MySQLLucene and MySQL
Lucene and MySQL
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Lucene Introduction
Lucene IntroductionLucene Introduction
Lucene Introduction
 
Introduction To Apache Lucene
Introduction To Apache LuceneIntroduction To Apache Lucene
Introduction To Apache Lucene
 
Full Text Search with Lucene
Full Text Search with LuceneFull Text Search with Lucene
Full Text Search with Lucene
 
Tutorial 5 (lucene)
Tutorial 5 (lucene)Tutorial 5 (lucene)
Tutorial 5 (lucene)
 
Hacking Lucene for Custom Search Results
Hacking Lucene for Custom Search ResultsHacking Lucene for Custom Search Results
Hacking Lucene for Custom Search Results
 
Improved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
Improved Search With Lucene 4.0 - NOVA Lucene/Solr MeetupImproved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
Improved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
 
Search Lucene
Search LuceneSearch Lucene
Search Lucene
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Apache Solr/Lucene Internals by Anatoliy Sokolenko
Apache Solr/Lucene Internals  by Anatoliy SokolenkoApache Solr/Lucene Internals  by Anatoliy Sokolenko
Apache Solr/Lucene Internals by Anatoliy Sokolenko
 
Elasticsearch speed is key
Elasticsearch speed is keyElasticsearch speed is key
Elasticsearch speed is key
 
Flexible Indexing in Lucene 4.0
Flexible Indexing in Lucene 4.0Flexible Indexing in Lucene 4.0
Flexible Indexing in Lucene 4.0
 
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...
 
Azure search
Azure searchAzure search
Azure search
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 

Similar a Apache Lucene intro - Breizhcamp 2015

06 integrate elasticsearch
06 integrate elasticsearch06 integrate elasticsearch
06 integrate elasticsearchErhwen Kuo
 
Examiness hints and tips from the trenches
Examiness hints and tips from the trenchesExaminess hints and tips from the trenches
Examiness hints and tips from the trenchesIsmail Mayat
 
Utilizing the OpenNTF Domino API
Utilizing the OpenNTF Domino APIUtilizing the OpenNTF Domino API
Utilizing the OpenNTF Domino APIOliver Busse
 
Utilizing the open ntf domino api
Utilizing the open ntf domino apiUtilizing the open ntf domino api
Utilizing the open ntf domino apiOliver Busse
 
Utilizing the OpenNTF Domino API
Utilizing the OpenNTF Domino APIUtilizing the OpenNTF Domino API
Utilizing the OpenNTF Domino APIOliver Busse
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes WorkshopErik Hatcher
 
Academy PRO: HTML5 Data storage
Academy PRO: HTML5 Data storageAcademy PRO: HTML5 Data storage
Academy PRO: HTML5 Data storageBinary Studio
 
Elasticsearch JVM-MX Meetup April 2016
Elasticsearch JVM-MX Meetup April 2016Elasticsearch JVM-MX Meetup April 2016
Elasticsearch JVM-MX Meetup April 2016Domingo Suarez Torres
 
[Nuxeo World 2013] OPENING KEYNOTE - ERIC BARROCA, NUXEO CEO
[Nuxeo World 2013] OPENING KEYNOTE - ERIC BARROCA, NUXEO CEO[Nuxeo World 2013] OPENING KEYNOTE - ERIC BARROCA, NUXEO CEO
[Nuxeo World 2013] OPENING KEYNOTE - ERIC BARROCA, NUXEO CEONuxeo
 
Open Writing! Collaborative Authoring for CloudStack Documentation by Jessica...
Open Writing! Collaborative Authoring for CloudStack Documentation by Jessica...Open Writing! Collaborative Authoring for CloudStack Documentation by Jessica...
Open Writing! Collaborative Authoring for CloudStack Documentation by Jessica...buildacloud
 
Open writing-cloud-collab
Open writing-cloud-collabOpen writing-cloud-collab
Open writing-cloud-collabKaren Vuong
 
UKLUG 2012 - XPages, Beyond the basics
UKLUG 2012 - XPages, Beyond the basicsUKLUG 2012 - XPages, Beyond the basics
UKLUG 2012 - XPages, Beyond the basicsUlrich Krause
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Oleksiy Panchenko
 
[DanNotes] XPages - Beyound the Basics
[DanNotes] XPages - Beyound the Basics[DanNotes] XPages - Beyound the Basics
[DanNotes] XPages - Beyound the BasicsUlrich Krause
 
XPages -Beyond the Basics
XPages -Beyond the BasicsXPages -Beyond the Basics
XPages -Beyond the BasicsUlrich Krause
 
Building APIs in an easy way using API Platform
Building APIs in an easy way using API PlatformBuilding APIs in an easy way using API Platform
Building APIs in an easy way using API PlatformAntonio Peric-Mazar
 
SWIB14 Weaving repository contents into the Semantic Web
SWIB14 Weaving repository contents into the Semantic WebSWIB14 Weaving repository contents into the Semantic Web
SWIB14 Weaving repository contents into the Semantic WebPascal-Nicolas Becker
 

Similar a Apache Lucene intro - Breizhcamp 2015 (20)

06 integrate elasticsearch
06 integrate elasticsearch06 integrate elasticsearch
06 integrate elasticsearch
 
Examiness hints and tips from the trenches
Examiness hints and tips from the trenchesExaminess hints and tips from the trenches
Examiness hints and tips from the trenches
 
Utilizing the OpenNTF Domino API
Utilizing the OpenNTF Domino APIUtilizing the OpenNTF Domino API
Utilizing the OpenNTF Domino API
 
Utilizing the open ntf domino api
Utilizing the open ntf domino apiUtilizing the open ntf domino api
Utilizing the open ntf domino api
 
Utilizing the OpenNTF Domino API
Utilizing the OpenNTF Domino APIUtilizing the OpenNTF Domino API
Utilizing the OpenNTF Domino API
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes Workshop
 
Academy PRO: HTML5 Data storage
Academy PRO: HTML5 Data storageAcademy PRO: HTML5 Data storage
Academy PRO: HTML5 Data storage
 
Elasticsearch JVM-MX Meetup April 2016
Elasticsearch JVM-MX Meetup April 2016Elasticsearch JVM-MX Meetup April 2016
Elasticsearch JVM-MX Meetup April 2016
 
No sq lv1_0
No sq lv1_0No sq lv1_0
No sq lv1_0
 
Wikipedia Cloud Search Webinar
Wikipedia Cloud Search WebinarWikipedia Cloud Search Webinar
Wikipedia Cloud Search Webinar
 
[Nuxeo World 2013] OPENING KEYNOTE - ERIC BARROCA, NUXEO CEO
[Nuxeo World 2013] OPENING KEYNOTE - ERIC BARROCA, NUXEO CEO[Nuxeo World 2013] OPENING KEYNOTE - ERIC BARROCA, NUXEO CEO
[Nuxeo World 2013] OPENING KEYNOTE - ERIC BARROCA, NUXEO CEO
 
Open Writing! Collaborative Authoring for CloudStack Documentation by Jessica...
Open Writing! Collaborative Authoring for CloudStack Documentation by Jessica...Open Writing! Collaborative Authoring for CloudStack Documentation by Jessica...
Open Writing! Collaborative Authoring for CloudStack Documentation by Jessica...
 
Open writing-cloud-collab
Open writing-cloud-collabOpen writing-cloud-collab
Open writing-cloud-collab
 
UKLUG 2012 - XPages, Beyond the basics
UKLUG 2012 - XPages, Beyond the basicsUKLUG 2012 - XPages, Beyond the basics
UKLUG 2012 - XPages, Beyond the basics
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
 
[DanNotes] XPages - Beyound the Basics
[DanNotes] XPages - Beyound the Basics[DanNotes] XPages - Beyound the Basics
[DanNotes] XPages - Beyound the Basics
 
XPages -Beyond the Basics
XPages -Beyond the BasicsXPages -Beyond the Basics
XPages -Beyond the Basics
 
HDFCloud Workshop: HDF5 in the Cloud
HDFCloud Workshop: HDF5 in the CloudHDFCloud Workshop: HDF5 in the Cloud
HDFCloud Workshop: HDF5 in the Cloud
 
Building APIs in an easy way using API Platform
Building APIs in an easy way using API PlatformBuilding APIs in an easy way using API Platform
Building APIs in an easy way using API Platform
 
SWIB14 Weaving repository contents into the Semantic Web
SWIB14 Weaving repository contents into the Semantic WebSWIB14 Weaving repository contents into the Semantic Web
SWIB14 Weaving repository contents into the Semantic Web
 

Último

Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 
LEVEL 5 - SESSION 1 2023 (1).pptx - PDF 123456
LEVEL 5   - SESSION 1 2023 (1).pptx - PDF 123456LEVEL 5   - SESSION 1 2023 (1).pptx - PDF 123456
LEVEL 5 - SESSION 1 2023 (1).pptx - PDF 123456KiaraTiradoMicha
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfryanfarris8
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...Nitya salvi
 
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxBUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxalwaysnagaraju26
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 

Último (20)

Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
LEVEL 5 - SESSION 1 2023 (1).pptx - PDF 123456
LEVEL 5   - SESSION 1 2023 (1).pptx - PDF 123456LEVEL 5   - SESSION 1 2023 (1).pptx - PDF 123456
LEVEL 5 - SESSION 1 2023 (1).pptx - PDF 123456
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxBUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 

Apache Lucene intro - Breizhcamp 2015

  • 2. www.elastic.co What is Lucene? • An information retrieval library • Can be used to build search apps • Not a runtine, use Solr or Elasticsearch • Written in Java • Developed at the Apache Software Foundation • Contributors include IBM, Twitter, Elastic, Lucidworks, … 2
  • 3. www.elastic.co History 3 1999: creation on Sourceforge 2001: moved to the ASF May 2006 2.0 release November 2009 3.0 release October 2012 4.0 release February 2015 5.0 release
  • 5. www.elastic.co Features • Full-text search • Structured search • Highlighting • Faceting • Suggestions 5
  • 6. www.elastic.co Design • Embeds • an inverted index, for efficient query execution • a document store, to get original data back • a column store, for sorting and analytics 6
  • 7. www.elastic.co More history • Lucene 3.4 Added a faceting module • Lucene 4.0: Added a column store to the index • Lucene 4.1: More efficient structured search • Lucene 4.1: More efficient PK lookups • Lucene 4.1: Built-in compression of the doc store • Lucene 4.5: Column store moved from memory to disk • Lucene 4.8: Checksums on all index files • Lucene 5.1: Better query execution plans with 2-phases iterators 7
  • 8. www.elastic.co Design 8 Segment core 0 name: Breizh camp location: Rennes, France 1 name: Devoxx location: Antwerp, Belgium Document store doc id stored fields breizh camp conference devoxx 1 1 2 1 0 0 0,1 1 Inverted index terms dict doc freq postings Column store 0 1 42 1242 0 1 1000 10 Price Popularity Live docs 0 1 true true
  • 9. www.elastic.co Design • Index divided into immutable segments • To add more documents, add more segments • In-place updates are not supported • To update documents, delete then add 9
  • 10. www.elastic.co Merging • Background merges • Keep the number of segments low for fast search • Reclaim space from deleted documents 10
  • 11. www.elastic.co Merging • Writing/Merging segments is expensive • IndexWriter buffers pending docs in memory • Refresh/Reopen: • Flush in-memory buffer into a segment • Make segment searchable • Commit • Flush in-memory buffer to a segment • “fsync” data to disk 11
  • 12. www.elastic.co Index safety 12 Only data which has been committed is safe. If you need better safety, write the data somewhere else too: other database, transaction log, …
  • 13. www.elastic.co Advices • Don’t give all machine memory to Java • Performance factor #1 is the filesystem cache • Reopen asynchronously, typically every X seconds • Batch writes before committing 13
  • 14. www.elastic.co Pros/cons • Fast search • Cross-field index intersections • On the contrary to many databases! • Powerful combinations of features • Run facets on docs that match a particular query 14 • Not realtime • Yet “near” realtime • No fine-grained updates • Ingestion speed • Yet fast enough for most use-cases • Disk usage: data is duplicated for each access pattern
  • 15. www.elastic.co Backward compatibility • Version N can read indices of version N-1 • Public API: minor versions are backward compatible • IndexWriter, IndexSearcher, Query, Document, … • Unless we discover API is trappy • Internal/Experimental APIs will break • Collector, Scorer, Comparator, … 15
  • 16. www.elastic.co SimpleText 16 IndexWriterConfig iwConfig = new IndexWriterConfig(new WhitespaceAnalyzer()); iwConfig.setCodec(new SimpleTextCodec()); try (Directory dir = FSDirectory.open(new File("/tmp/my_index").toPath()); IndexWriter writer = new IndexWriter(dir, iwConfig)) { Document document = new Document(); document.add(new TextField("name", "Breizh C@mp", Store.YES)); document.add(new TextField("desc", "la conférence des développeurs du Grand Ouest", Store.NO)); document.add(new StoredField("location", "Rennes, France")); document.add(new NumericDocValuesField("founded_year", 2011)); writer.addDocument(document); document = new Document(); document.add(new TextField("name", "Devoxx France", Store.YES)); document.add(new TextField("desc", "la conférence des développeurs passionnés", Store.NO)); document.add(new StoredField("location", "Paris, France")); document.add(new NumericDocValuesField("founded_year", 2012)); writer.addDocument(document); writer.commit(); document = new Document(); document.add(new TextField("name", "Riviera DEV", Store.YES)); document.add(new TextField("desc", "la conférence des développeurs du Sud Est", Store.NO)); document.add(new StoredField("location", "Sophia-Antipolis, France")); document.add(new NumericDocValuesField("founded_year", 2009)); writer.addDocument(document); writer.commit(); }
  • 18. www.elastic.co 18 % cat _0.si version 6.0.0 number of documents 2 uses compound file true diagnostics 8 key os value Linux key java.vendor value Oracle Corporation key java.version value 1.8.0_25 key lucene.version value 6.0.0 key os.arch value amd64 key source value flush key os.version value 3.13.0-53-generic key timestamp value 1434102490791 attributes 0 files 2 file _0.si file _0.scf id ??hFq? E?q??h?? checksum 00000000001526513595
  • 19. www.elastic.co 19 % cat _0.scf cfs entry for: _0.dat field founded_year type NUMERIC minvalue 2011 pattern 0 0 T 1 T END checksum 00000000003242224815 […]
  • 20. www.elastic.co 20 cfs entry for: _0.fld doc 0 field 0 name name type string value Breizh C@mp field 2 name location type string value Rennes, France doc 1 field 0 name name type string value Devoxx France field 2 name location type string value Paris, France END checksum 00000000002801255432
  • 21. www.elastic.co 21 cfs entry for: _0.pst field desc term Grand doc 0 freq 1 pos 5 term Ouest doc 0 freq 1 pos 6 term conférence doc 0 freq 1 pos 1 doc 1 freq 1 pos 1 […] END checksum 00000000002149012390