Journey from FAST to Solr

Presented by: David Hamson, Mou Nandi
Goal of the Session
•  NetDocuments
•  Why move to Solr from FAST
•  Architecting Solr to work as a core module for a cloud document management product: user interface building and document discovery
•  Testing and benchmarking Solr to scale and perform for billions of documents with 200 QPS and 200 DPS
•  Lessons learned and shortcuts found migrating from FAST to Solr

Who We Are

A leading cloud content management and collaboration service for small to medium businesses (SMB) and professional services firms.

Who We Serve
We serve over 1,000 customers across 128 countries worldwide and host over 250 million documents.

Why Migrate to Solr

•  Product roadmap does not fit with company roadmap
•  Large hardware footprint, expensive to scale
•  High indexing latency
•  Unpredictable and untraceable document loss
•  A black box search engine, dependency on Microsoft FAST support team
•  No control over new features
•  Expensive license

•  Solr supports massive indexes
•  Active, hardworking development community
•  Access to what's happening under the hood
•  Improved hardware footprint
•  Reduced licensing cost

Migration to Solr




[Diagram: ND Document feeds FAST Instance 1, FAST Instance 2, and more FAST instances. In each instance: Fast Doc Processors → FIXML → Fast Indexer → MDI + FTI.]

•  95% of searches are metadata searches; the metadata index does not need rich text processing
•  Flexibility to implement different architectures for MDI and FTI
•  Even the highest level of logging cannot trace the document loss during heavy feeding traffic

Migration to Solr – Solr Indexing
[Diagram: ND Document → ND Pipeline, which emits MD as Solr MD XML to the Solr MD instances (Solr MDI → MDI) and FT as Solr FT XML to the Solr FT instances (Solr FTI → FTI). Aspire is labeled on the FT path.]

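The "Solr MD XML" step can be pictured as turning one metadata record into Solr's XML update message. The following is a minimal sketch, not the NetDocuments pipeline: the function name `to_solr_add_xml`, the dict-shaped record, and the field names are assumptions made for illustration; only the `<add><doc><field name="...">` message shape is standard Solr.

```python
import xml.etree.ElementTree as ET


def to_solr_add_xml(record):
    """Serialize one metadata record as a Solr XML <add> update message.

    `record` is a hypothetical dict of field name -> value (or a list of
    values for multi-valued fields); it is not the actual ND/FIXML format.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in record.items():
        # Multi-valued fields become repeated <field> elements.
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# Illustrative record; the field names are assumptions.
example = {"id": "doc-1", "ndcabinets_smulti_idx": ["cab1", "cab2"]}
print(to_solr_add_xml(example))
```

Keeping the metadata message separate from the full-text message is what lets the two indexes be fed, sized, and tuned independently.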
The Migration Project




Phase 1 - MDI
•  Only create MDI
•  Use FAST data to prototype Solr
•  Use the FIXMLs to build the Solr index
•  Use 100% filter queries

Phase 2 - FTI
•  Build a robust feeding pipeline to handle both MD and FT
•  Build a text processing pipeline

Phase 3
•  Implement new Solr features

Some ft. view of NetDocuments Search Architecture


[Diagram: Web App, Web Queue, and File System feed NDPipeline, which has an administration layer (monitoring, debugging, stats) and three worker pools: MD Handler Pool (MDH1 to MDH5), FT Processor pool (FTP1 to FTP5, with an FT Queue), and Dispatcher pool (D1 to D5, with a Dispatcher queue). The pipeline writes to Solr MDI and Solr FTI; a Query Distributor sits between the Web App and the Solr indexes.]

Benchmarking Solr Config Parameter for indexing
•  Created Solr indexes from FIXMLs with different RAM buffer, merge factor, and auto commit configurations
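A sweep like this can be organized as a grid of setting combinations, each used to build and time one index. The sketch below is only an assumption about how such a grid might be enumerated: the candidate values are hypothetical (the slides do not list the settings tried), and the dict keys simply echo the `ramBufferSizeMB`, `mergeFactor`, and auto commit `maxTime` settings of `solrconfig.xml`.

```python
from itertools import product

# Hypothetical candidate values; the slides do not list the settings tried.
RAM_BUFFER_MB = [32, 128, 512]
MERGE_FACTOR = [10, 25]
AUTO_COMMIT_MAX_TIME_MS = [15000, 60000]


def benchmark_configs():
    """Yield one dict per combination of the three indexing settings."""
    for ram, merge, commit in product(
        RAM_BUFFER_MB, MERGE_FACTOR, AUTO_COMMIT_MAX_TIME_MS
    ):
        yield {
            "ramBufferSizeMB": ram,
            "mergeFactor": merge,
            "autoCommitMaxTimeMs": commit,
        }


configs = list(benchmark_configs())
print(len(configs), "configurations to index and time")
```

Each configuration would then be applied to a fresh index and fed the same document set so that the timings stay comparable.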
  




  Testing with HDD and SSD

•  We did not see any performance difference between HDD (15K rpm) and the ioDrive2 with ND documents
•  15 threads ran at a time from the client feeder application

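A feeder that keeps 15 requests in flight can be sketched with a fixed-size thread pool. This is an illustration under assumptions, not the actual client feeder: `feed_document` is a hypothetical placeholder that would post one document to Solr in a real run.

```python
from concurrent.futures import ThreadPoolExecutor

FEEDER_THREADS = 15  # matches the thread count quoted on the slide


def feed_document(doc_id):
    """Placeholder for posting one document to Solr (hypothetical)."""
    # A real feeder would send an HTTP update request here.
    return doc_id


def run_feeder(doc_ids):
    """Feed documents with at most FEEDER_THREADS requests in flight."""
    with ThreadPoolExecutor(max_workers=FEEDER_THREADS) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(feed_document, doc_ids))


fed = run_feeder(range(100))
print(len(fed), "documents fed")
```

Holding the thread count constant across the HDD and SSD runs keeps the feeding load identical, so any difference in indexing time can be attributed to storage.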
Testing using different file systems

•  We did not see a large performance difference between ext3 and XFS on HDD or SSD with ND documents
•  We chose ext3 for the FTI, on 15K rpm HDDs in RAID 10
•  We use XFS on the ioDrive for the MDI, as suggested by Fusion-io

Benchmarking Solr Indexing and Query Process

Implemented and compared multi-core index processing and query performance against a single-core index.

                              Run 1                 Run 2
Searches going to             5 shards              10 shards
SolrMeter instances           5                     10
Queries per shard             3,000/min             1,500/min
Total queries                 15,000/min            15,000/min
Avg response time             8 ms                  12 ms
CPU                           20%                   32%
RAM                           52 G                  53 G
Cache warmup time             2.5 s                 2.7 s
Cache hit ratio               0.98                  0.98
Cache size                    2276                  2276
Evictions                     none                  none
Index updated                 every 7 sec           every 7 sec
Test duration                 5 min                 8 min

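The load figures on this slide can be cross-checked with a little arithmetic: both runs apply the same total load, and that load converts to 250 queries per second, above the 200 QPS target named in the session goals. The helper name below is an assumption for illustration.

```python
def total_queries_per_min(shards, per_shard_qpm):
    """Total query load when every shard serves the same rate."""
    return shards * per_shard_qpm


# (shard count, queries per minute per shard) as quoted on the slide
runs = {"5 shards": (5, 3000), "10 shards": (10, 1500)}

for label, (shards, qpm) in runs.items():
    total = total_queries_per_min(shards, qpm)
    print(label, total, "queries/min =", total / 60, "queries/sec")
```

Because the total load is held constant, the rise in average response time and CPU use in the second run reflects the cost of fanning each search out to more shards, not a heavier query stream.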
Benchmark qtime increase as Solr scales and start row increases




qTime does not vary much as the start row increases.

Tuning System queries for Solr
•  System searches are metadata searches
•  Thousands of real-life queries were extracted from the FAST query log
•  Extensive use of filter queries and the filter cache gives excellent response times for complex queries

•  Example queries:

FAST Query:
ANDNOT(ANDNOT(ANDNOT(AND(AND(ndcabinets:string("cab1", mode="and"),ndcredate:range(2011-09-26T00:00:00,2012-04-13T23:59:59)),FILTER(ndacl:string("acl1 acl2 acl3",mode="OR"))),nddeletedcabs:string("cab1", mode="and")),ndexten:string("ndws", mode="and")),ndexten:string("ndflt", mode="and"))

Solr Query:
http://solrserver:port/solrSearch/core0/select?shards=solrserver:port/solrSearch/core0,1solrserver:port/solrSearch/core1&start=0&rows=500&fl=ndenvurl,nddocmodnum_s_std,ndtitle_t_idx_std&sort=ndlastmoddate_tdt_idx+desc&q=ndenvurl:*&fq=ndcabinets_smulti_idx:cab1&fq=ndcredate_tdt_idx:[2011-09-26T00:00:00Z TO 2012-04-13T23:59:59Z]&fq={!cache=false cost=100}(ndacl_smulti_idx:acl1 OR ndacl_smulti_idx:acl2 OR ndacl_smulti_idx:acl3)&fq=-nddeletedcabs_smulti_idx:cab1&fq=-ndexten_s_idx:ndws&fq=-ndexten_s_idx:ndflt

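The pattern in the Solr query above is that every AND term of the FAST query becomes its own `fq` parameter and every ANDNOT term becomes a negated `fq`. The sketch below shows one way a client might assemble such a URL. The function name `build_select_url`, the host, and the core path are assumptions; the field names follow the example, and this is not NetDocuments' actual query builder.

```python
from urllib.parse import urlencode


def build_select_url(base_url, q, filters, excluded, **extra):
    """Build a Solr /select URL with one fq parameter per filter.

    `filters` are positive clauses; `excluded` clauses are negated with
    a leading '-', mirroring the ANDNOT terms of the FAST query.
    """
    params = [("q", q)]
    params += [("fq", clause) for clause in filters]
    params += [("fq", "-" + clause) for clause in excluded]
    params += sorted(extra.items())
    return base_url + "?" + urlencode(params)


url = build_select_url(
    "http://localhost:8983/solr/core0/select",  # hypothetical host and core
    q="ndenvurl:*",
    filters=[
        "ndcabinets_smulti_idx:cab1",
        "ndcredate_tdt_idx:[2011-09-26T00:00:00Z TO 2012-04-13T23:59:59Z]",
        # cache=false keeps this per-user ACL clause out of the filter cache
        "{!cache=false cost=100}(ndacl_smulti_idx:acl1 OR ndacl_smulti_idx:acl2)",
    ],
    excluded=["nddeletedcabs_smulti_idx:cab1", "ndexten_s_idx:ndws"],
    start=0,
    rows=500,
)
print(url)
```

Splitting the clauses this way lets Solr cache each reusable filter (cabinet, date range, exclusions) on its own, while the ACL clause, which varies by user, is marked as not cached.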
Thank You

NetDocuments- Journey from FAST to Solr

  • 1. Journey from FAST to Solr Presented By : David Hamson , Mou Nandi
  • 2. Goal of the Session •  NetDocuments   •  Why  move  to  Solr  from  FAST   •  Architec8ng  Solr  to  work  as  a  core  module  for  a  Cloud  Document   Management  product  user  interface  building  and  document   discovery   •  Tes8ng  and  benchmarking  Solr  to  scale  and  perform  for  billions  of   documents  with  200  QPS  and  200  DPS   •  Lessons  learned/  shortcuts  found  migra8ng  from  FAST  to  Solr   2/14
  • 3. Who We Are A  Leading  cloud  content  management  and  collabora8on  service  for  small  to  medium  businesses  (SMB)   and  professional  services  firms   2/14
  • 4. Who We Serve We  service  over  1,000  customers  across  128  countries  worldwide  and  host  over  250+million   documents.     2/14
  • 5. Why Migrate to Solr
      • Product roadmap does not fit with company roadmap
      • Large hardware footprint, expensive to scale
      • High indexing latency
      • Unpredictable and untraceable document loss
      • A black-box search engine; dependency on the Microsoft FAST support team
      • No control over new features
      • Expensive license
      • Solr supports massive indexes
      • Active, hardworking development community
      • Access to what's happening under the hood
      • Improved hardware footprint
      • Reduced licensing cost
  • 6. Migration to Solr
      • 95% of searches are metadata searches; the metadata index does not need rich text processing
      • Flexibility to implement different architectures for MDI and FTI
      • Even the highest level of logging cannot trace the document loss during heavy feeding traffic
      • [Diagram: ND Document feeding FAST Instance 1, FAST Instance 2 and more FAST instances; each instance is labeled Fast Doc Processors, FIXML, Fast Indexer and MDI + FTI]
  • 7. Migration to Solr – Solr Indexing
      • [Diagram: ND Document and ND Pipeline; Solr MD XML feeding the Solr MD instances (Solr MDI, MDI); Aspire and Solr FT XML feeding the Solr FT instances (Solr FTI, FTI)]
  • 8. The Migration Project
      • Phase 1 - MDI
          • Only create MDI
          • Use FAST data to prototype Solr
          • Use the fixmls to build the Solr index
          • Use 100% filter queries
      • Phase 2 - FTI
          • Build a robust feeding pipeline to handle both MD and FT
          • Build a text processing pipeline
      • Phase 3
          • Implement new Solr features
  • 9. Some ft. view of NetDocuments Search Architecture
      • [Diagram: Web App, Web Queue, Query Distributor, NDPipeline, Administration (monitoring, debugging, stats), MD Handler Pool (MDH1 to MDH5), FT Queue, FT Processor pool (FTP1 to FTP5), Dispatcher queue, Dispatcher pool (D1 to D5), File System, Solr MDI, Solr FTI]
  • 10. Benchmarking Solr
      • Config parameters for indexing
          • Created Solr indexes from fixmls with different RAM buffer, merge factor and auto commit configurations
      • Testing with HDD and SSD
          • We did not see any performance difference between HDD (15k rpm) and the ioDrive2 with ND documents
          • 15 threads running at a time from the client feeder application
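One way to organize the indexing runs described on slide 10 is as a small test matrix over the three settings. The sketch below is only an illustration: the candidate values are hypothetical (the slides do not list the values tried), and the dictionary keys simply mirror the usual solrconfig.xml setting names.

```python
from itertools import product

# Hypothetical candidate values; the deck does not say which ones were tested.
RAM_BUFFER_MB = [32, 128, 512]
MERGE_FACTOR = [10, 25]
AUTO_COMMIT_MAX_TIME_MS = [15000, 60000]


def benchmark_matrix():
    """Return one dict per indexing run, covering every combination of settings."""
    return [
        {
            "ramBufferSizeMB": ram,
            "mergeFactor": merge,
            "autoCommitMaxTimeMs": commit,
        }
        for ram, merge, commit in product(
            RAM_BUFFER_MB, MERGE_FACTOR, AUTO_COMMIT_MAX_TIME_MS
        )
    ]


if __name__ == "__main__":
    for run in benchmark_matrix():
        print(run)
```

Each dictionary would correspond to one rebuild of the index from the same fixml set, so that the runs differ only in configuration.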
  • 11. Testing using different file systems
      • We did not see a huge performance difference between ext3 and xfs on HDD or SSD with ND documents
      • We chose ext3 for FTI, with 15K HDD on RAID 10
      • We are using xfs on the ioDrive for MDI, as suggested by Fusion-io
  • 12. Benchmarking Solr Indexing and Query Process
      • Implemented multi-core index processing and compared its indexing and query performance with a single-core index
      • Searches going to 5 shards (5 SolrMeter instances)
          • Each shard serving 3,000 queries/min; total 15,000 queries/min
          • Avg response time 8 ms; CPU 20%; RAM 52 G
          • Cache warm-up time 2.5 s; cache hit ratio 0.98; cache size 2276; no eviction
          • Index updated every 7 sec; test ran 5 min
      • Searches going to 10 shards (10 SolrMeter instances)
          • Each shard serving 1,500 queries/min; total 15,000 queries/min
          • Avg response time 12 ms; CPU 32%; RAM 53 G
          • Cache warm-up time 2.7 s; cache hit ratio 0.98; cache size 2276; no eviction
          • Index updated every 7 sec; test ran 8 min
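The per-minute figures on slide 12 can be converted to queries per second to compare them with the 200 QPS goal stated on slide 2. This is a minimal arithmetic sketch; the function name is made up for illustration.

```python
def total_qps(shards: int, queries_per_min_per_shard: int) -> float:
    """Aggregate query rate in queries per second across all shards."""
    return shards * queries_per_min_per_shard / 60


TARGET_QPS = 200  # goal stated on slide 2

five_shard_qps = total_qps(5, 3000)   # 15,000 queries/min in total
ten_shard_qps = total_qps(10, 1500)   # 15,000 queries/min in total

if __name__ == "__main__":
    for label, qps in (("5 shards", five_shard_qps), ("10 shards", ten_shard_qps)):
        print(f"{label}: {qps:.0f} QPS, meets target: {qps >= TARGET_QPS}")
```

Both layouts carry the same aggregate load of 15,000 queries per minute, so the difference in average response time (8 ms versus 12 ms) reflects the shard count rather than the query volume.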
  • 13. Benchmark: qTime increase as Solr scales and start row increases
      • qTime does not vary much as the start row increases.
    6/14
  • 14. Tuning System queries for Solr
      • System searches are metadata searches
      • Thousands of real-life queries were extracted from the FAST query log
      • Extensive use of filter queries and the filter cache gives excellent response times for complex queries
      • Example queries:
          FAST query:
          ANDNOT(ANDNOT(ANDNOT(AND(AND(ndcabinets:string("cab1", mode="and"),ndcredate:range(2011-09-26T00:00:00,2012-04-13T23:59:59)),FILTER(ndacl:string("acl1 acl2 acl3",mode="OR"))),nddeletedcabs:string("cab1", mode="and")),ndexten:string("ndws", mode="and")),ndexten:string("ndflt", mode="and"))
          Solr query:
          http://solrserver:port/solrSearch/core0/select?shards=solrserver:port/solrSearch/core0,1solrserver:port/solrSearch/core1&start=0&rows=500&fl=ndenvurl,nddocmodnum_s_std,ndtitle_t_idx_std&sort=ndlastmoddate_tdt_idx+desc&q=ndenvurl:*&fq=ndcabinets_smulti_idx:cab1&fq=ndcredate_tdt_idx:[2011-09-26T00:00:00Z TO 2012-04-13T23:59:59Z]&fq={!cache=false cost=100}(ndacl_smulti_idx:acl1 OR ndacl_smulti_idx:acl2 OR ndacl_smulti_idx:acl3)&fq=-nddeletedcabs_smulti_idx:cab1&fq=-ndexten_s_idx:ndws&fq=-ndexten_s_idx:ndflt
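The Solr query on slide 14 puts every restriction into its own fq parameter, with the ACL filter marked as uncached and given a higher cost. The sketch below shows one way such a URL could be assembled with Python's standard library. It is an illustration, not the presenters' code: the function name, the port 8983 and the argument names are assumptions, and the field names are copied from the slide.

```python
from urllib.parse import parse_qsl, urlencode


def build_system_query(base_url, cabinet, created_from, created_to,
                       acls, deleted_cabinets, excluded_extensions, rows=500):
    """Build a metadata search URL that uses one fq parameter per restriction.

    Values are assumed to need no Solr query-syntax escaping.
    """
    params = [
        ("start", "0"),
        ("rows", str(rows)),
        ("fl", "ndenvurl,nddocmodnum_s_std,ndtitle_t_idx_std"),
        ("sort", "ndlastmoddate_tdt_idx desc"),
        ("q", "ndenvurl:*"),
        # Each of these filters can be cached independently in the filter cache.
        ("fq", f"ndcabinets_smulti_idx:{cabinet}"),
        ("fq", f"ndcredate_tdt_idx:[{created_from} TO {created_to}]"),
    ]
    # The ACL filter is not cached and is given a higher cost, as on the slide.
    acl_clause = " OR ".join(f"ndacl_smulti_idx:{acl}" for acl in acls)
    params.append(("fq", "{!cache=false cost=100}(" + acl_clause + ")"))
    # Negative filters replace the nested ANDNOT operators of the FAST query.
    params += [("fq", f"-nddeletedcabs_smulti_idx:{cab}") for cab in deleted_cabinets]
    params += [("fq", f"-ndexten_s_idx:{ext}") for ext in excluded_extensions]
    return base_url + "?" + urlencode(params)


# Hypothetical host and port; the slide only shows "solrserver:port".
example_url = build_system_query(
    "http://solrserver:8983/solrSearch/core0/select",
    cabinet="cab1",
    created_from="2011-09-26T00:00:00Z",
    created_to="2012-04-13T23:59:59Z",
    acls=["acl1", "acl2", "acl3"],
    deleted_cabinets=["cab1"],
    excluded_extensions=["ndws", "ndflt"],
)

if __name__ == "__main__":
    # Print the decoded filter queries to check the translation.
    for key, value in parse_qsl(example_url.split("?", 1)[1]):
        if key == "fq":
            print(value)
```

Passing the parameters as a list of pairs rather than a dictionary lets the fq key repeat. The shards parameter from the slide is left out here; it could be added as one more pair for a distributed search.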
  • 15.