Faceted Search – the 120 Million Documents Story
Who am I?
Who are Sourcesense?
Committers and Contributors
Who is the customer?
Their story?
The Solution: Apache Solr
How Solr Works
How Solr Works (diagram, built up over seven slides): searches are served by the active index reader, which reads an index snapshot. New content goes to the active index writer. On commit, a second index snapshot and index reader appear alongside the active ones; the final slide shows a single snapshot and reader serving searches again.
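The build-up in this diagram can be mimicked with a toy model. The sketch below is not Solr or Lucene code: `ToyIndex` and its `add`, `commit` and `search` methods are hypothetical names chosen for illustration. It shows only the visibility rule the diagram implies: searches read an immutable snapshot, and new content becomes searchable once a commit publishes a fresh snapshot.

```python
class ToyIndex:
    """Toy model of the writer / snapshot / reader cycle (not Solr code)."""

    def __init__(self):
        self._pending = []   # documents held by the active writer
        self._snapshot = ()  # immutable snapshot read by the active reader

    def add(self, doc):
        # New content goes to the writer and is not yet searchable.
        self._pending.append(doc)

    def commit(self):
        # Publish a new snapshot; the reader over it becomes the active one.
        self._snapshot = self._snapshot + tuple(self._pending)
        self._pending = []

    def search(self, term):
        # Searches only ever see the active snapshot.
        return [doc for doc in self._snapshot if term in doc]


index = ToyIndex()
index.add("faceted search with solr")
before_commit = index.search("solr")  # writer content is still invisible
index.commit()
after_commit = index.search("solr")   # visible once the new snapshot is active
```

In this model `before_commit` is empty and `after_commit` contains the document, which is the behaviour the commit slide illustrates.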
How Solr Distributes
Solr Host Configuration (diagram): searches arrive at a load balancer, which passes them to a co-ordinator; each co-ordinator sits in front of a row of three shards (shard 1, shard 2, shard 3). The final slides show two such rows.
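One way a co-ordinator can fan a search out over its row is Solr's distributed-search `shards` request parameter, a comma-separated list of shard addresses. The slides do not say how the customer's co-ordinators were wired, so the following is only a sketch under assumptions: the host names are made up, and the classic `/solr/select` handler path is assumed.

```python
from urllib.parse import urlencode

# Hypothetical host names; the slides do not give the real ones.
SHARDS = [
    "shard1.example.com:8983/solr",
    "shard2.example.com:8983/solr",
    "shard3.example.com:8983/solr",
]


def distributed_query_url(coordinator, query, shards=SHARDS):
    """Build a select URL asking the co-ordinator to query every shard."""
    params = {
        "q": query,
        "shards": ",".join(shards),  # Solr's distributed-search parameter
    }
    return f"http://{coordinator}/solr/select?{urlencode(params)}"


url = distributed_query_url("coordinator.example.com:8983", "solr")
```

A load balancer in front of several co-ordinators would then only need to pick one co-ordinator per request; the shard list stays a per-row detail.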
Solr at The Customer
Oops. OutOfMemoryError
Solr: a Java web application
How Solr Works (diagram, revisited): the active index reader now has a cache. On commit, a new index snapshot and index reader appear with a cache of their own; the final slide shows a single snapshot and reader, with its cache, serving searches.
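The consequence of tying the cache to the reader can be shown with a toy model. This is a sketch, not Solr code: `ToySearcher` and `open_warmed_searcher` are hypothetical names, and the "cache" is a plain dictionary. It assumes, as the diagram suggests, that a reader opened after a commit starts with an empty cache unless queries are replayed against it first.

```python
class ToySearcher:
    """Toy stand-in for an index reader with its own result cache."""

    def __init__(self, snapshot):
        self.snapshot = snapshot
        self.cache = {}
        self.computed = 0  # number of queries that missed the cache

    def search(self, term):
        if term not in self.cache:
            self.computed += 1
            self.cache[term] = [d for d in self.snapshot if term in d]
        return self.cache[term]


def open_warmed_searcher(snapshot, warm_queries):
    """Create the searcher for a new snapshot and warm it before use."""
    searcher = ToySearcher(snapshot)
    for term in warm_queries:
        searcher.search(term)
    return searcher


docs = ["solr facets", "lucene index"]

cold = ToySearcher(docs)
cold.search("solr")                # the first live query pays the full cost
cold_misses = cold.computed

warm = open_warmed_searcher(docs, ["solr"])
misses_before_live_traffic = warm.computed
warm.search("solr")                # served from the warmed cache
warm_live_misses = warm.computed - misses_before_live_traffic
```

Here the cold searcher records one miss on its first live query, while the warmed searcher records none.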
Optimisation #1: autowarm

    <!-- Replay this query against every new searcher so its caches are warm before it serves searches -->
    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">solr</str>
          <str name="relf">4</str>
          <str name="facet.field">sourceCountryCS</str>
          <str name="facet.field">entityCSPerson</str>
          <str name="facet.field">entityCSCompany</str>
          <str name="facet.field">entityCSProduct</str>
          <str name="facet.field">sourceCS</str>
          <str name="facet.field">authorCS</str>
          <str name="facet.field">stockTickerCS</str>
          <str name="facet.field">feedClassCS</str>
          <str name="facet.field">entityCSOrganization</str>
          <str name="facet.field">platformCS</str>
          <str name="facet.field">eventOrFactCS</str>
          <str name="facet.field">sourceRank</str>
          <str name="facet">true</str>
          <str name="facet.date">harvestDate</str>
          <str name="facet.date.start">NOW-1MONTH</str>
          <str name="facet.date.end">NOW</str>
          <str name="facet.date.gap">+24HOURS</str>
          <str name="qt">/duplicate</str>
          <str name="duplicateOrder">latest</str>
          <str name="collapseFields">duplicateGroup titleForDuplicates</str>
        </lst>
      </arr>
    </listener>
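To try a warming query by hand before putting it in the listener, the same parameters can be sent as an ordinary HTTP query. The sketch below builds such a URL from a subset of the parameters on this slide; the base URL is an assumption (a local Solr with the classic `/solr/select` path), and the custom `qt`, `duplicateOrder` and `collapseFields` parameters are left out because they depend on the customer's own handler.

```python
from urllib.parse import urlencode

# A subset of the listener's parameters. A list of pairs is used because
# facet.field is repeated once per facet.
WARM_PARAMS = [
    ("q", "solr"),
    ("facet", "true"),
    ("facet.field", "sourceCountryCS"),
    ("facet.field", "entityCSPerson"),
    ("facet.date", "harvestDate"),
    ("facet.date.start", "NOW-1MONTH"),
    ("facet.date.end", "NOW"),
    ("facet.date.gap", "+24HOURS"),
]


def warm_query_url(base_url, params=WARM_PARAMS):
    """Return a URL that replays the warming query against base_url."""
    return f"{base_url}?{urlencode(params)}"


# Hypothetical local instance; adjust host and path to the real deployment.
url = warm_query_url("http://localhost:8983/solr/select")
```

`urlencode` percent-encodes the leading `+` of the gap value as `%2B`. Pasting `+24HOURS` into a URL unencoded would usually be read back as a space, which is an easy mistake when replaying the query by hand.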
#2: Garbage collection
#3: Profiling
Managing So Many Hosts
Solr Host Configuration (diagram): a load balancer in front of two rows, each with a co-ordinator and three shards. Labels added during the build-up: 35 GB against each of the three shards, and "Entire row: 40 minutes".
Content Archiving
Being Dynamic
Solr Host Configuration (diagram): a third row (co-ordinator plus three shards) is added to the two rows behind the load balancer and labelled "archive ingestion"; the final slide is back to two rows.
Conclusion
Thank you
