Solr 4
                   Presented by Erik Hatcher




© Copyright 2012
About: Erik Hatcher

    • “Lucene in Action”, co-author
       -  Also co-author of “Java Development with Ant” / “Ant in Action”
    • Open Source
       -  Apache Software Foundation: member, Lucene/Solr committer
          and PMC member
       -  Originator of “Blacklight”, a Solr-powered discovery interface
    • LucidWorks
       -  Co-founder
       -  Recently renamed from Lucid Imagination
       -  Customer Support




    © 2012 LucidWorks
Abstract

    Solr 4.0 dramatically improves scalability, performance,
    and flexibility. An overhauled Lucene underneath sports near
    real-time (NRT) capabilities allowing indexed documents to
    be rapidly visible and searchable. Lucene’s improvements
    also include pluggable scoring, much faster fuzzy and
    wildcard querying, and vastly improved memory usage.
    These Lucene improvements automatically make Solr much
    better, and Solr magnifies these advances with “SolrCloud.”
    SolrCloud enables highly available and fault-tolerant clusters
    for large-scale distributed indexing and searching. Many
    other changes will be surveyed as well. This talk
    will cover these improvements in detail, comparing and
    contrasting them with previous versions of Solr.

Lucene 4 Improvements

    • Flexible index formats
    • Pluggable scoring
    • String -> BytesRef
    • DWPT (Document Writer Per Thread)
       -  faster, more consistent indexing speed
    • NRT (Near Real-Time)
    • Spatial overhaul
    • FST/FSA
       -  FuzzyQuery over 100x faster
       -  also reduces memory footprint for Terms index
    • DocValues: aka column-stride fields

Flexible index formats

    • For terms, postings lists, stored fields, term vectors, etc.
    • Several new posting list codecs
       -  Pulsing (inlines low doc freq)
       -  Block (packed int blocks)
       -  SimpleText (debugging, transparency)
       -  Bloom (experimental, also inlines low doc freq)
       -  Appending (for append-only filesystems such as HDFS)
       -  Memory (terms as FST)




Pluggable scoring

    • Decoupled from traditional vector space (TF/IDF)
    • Additional index statistics
       -  number of tokens for a term or field
       -  number of postings for a field
       -  number of documents with a posting for a field
    • Several built-in alternatives:
       -  BM25
       -  DFR – divergence from randomness
       -  Information-based models
    • “norms” are no longer limited to a single byte
       -  Similarity implementations can use any DocValues type to store
          norms
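As one illustration of a scoring model that differs from classic TF/IDF, here is a minimal sketch of the textbook BM25 formula for a single term. It is not Lucene's implementation; the function names and the default values of k1 and b are assumptions chosen for the example. The inputs correspond to the kind of index statistics listed above (document counts and field lengths).

```python
import math


def bm25_idf(num_docs: int, doc_freq: int) -> float:
    """Inverse document frequency: rarer terms receive a larger weight."""
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(tf, doc_len, avg_doc_len, num_docs, doc_freq, k1=1.2, b=0.75):
    """Score one query term against one document (textbook BM25)."""
    # Longer-than-average documents are penalised; b controls how strongly.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # Term frequency saturates: each extra occurrence adds less than the last.
    saturation = (tf * (k1 + 1.0)) / (tf + k1 * length_norm)
    return bm25_idf(num_docs, doc_freq) * saturation
```

With everything else equal, a higher term frequency or a shorter document yields a higher score, and a term that does not occur (tf=0) contributes nothing.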


String -> BytesRef

    • How many bytes does a Java String require?
       -  BytesRef is now used to avoid this overhead
       -  Think of the internal structure as a big buffer with pointers
    • Garbage collection much more efficient
       -  big blocks rather than zillions of small ones
    • How much reduction? 10%? 20%?
       -  No. Way more than that
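The "big buffer with pointers" idea can be sketched roughly as follows. This is an illustration only, not Lucene's BytesRef class: TermBuffer and its methods are hypothetical names, and Python is used purely to show the layout of one shared byte block plus offsets.

```python
class TermBuffer:
    """Stores many terms in one contiguous byte buffer (hypothetical class)."""

    def __init__(self):
        self._bytes = bytearray()  # one big block instead of one object per term
        self._offsets = [0]        # offsets[i] .. offsets[i + 1] delimits term i

    def add(self, term: str) -> int:
        """Append a term as UTF-8 bytes and return its index."""
        self._bytes.extend(term.encode("utf-8"))
        self._offsets.append(len(self._bytes))
        return len(self._offsets) - 2

    def get(self, index: int) -> str:
        """Decode the slice of the shared buffer that holds term `index`."""
        start, end = self._offsets[index], self._offsets[index + 1]
        return self._bytes[start:end].decode("utf-8")

    def __len__(self) -> int:
        return len(self._offsets) - 1
```

A reference to a term is then just an offset and a length into the shared block, so the garbage collector sees a few large arrays rather than a separate object per term.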




NRT: Near Real-Time

    • Per-segment
       -  FieldCache only needs to load data for new segments
    • Soft commit
       -  Faster: does not fsync
       -  Can soft commit very rapidly, as low as every second
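A client can request a soft commit through the softCommit=true parameter of Solr's update handler. The helper below only builds the URL and sends nothing; the function name and the host and core in the usage note are assumptions made for this sketch.

```python
from urllib.parse import urlencode


def update_url(core_url: str, soft: bool = True) -> str:
    """Build an /update URL that asks for a soft or a hard commit."""
    # softCommit=true makes changes searchable without the cost of an fsync;
    # commit=true requests a regular (hard) commit instead.
    params = {"softCommit": "true"} if soft else {"commit": "true"}
    return core_url.rstrip("/") + "/update?" + urlencode(params)
```

For example, update_url("http://localhost:8983/solr/collection1") produces a URL ending in /update?softCommit=true.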




Lucene 4: there’s more

    • AutomatonQuery
       -  term matching a provided finite-state automaton
    • Term offsets
       -  optionally encoded into the postings lists and can be retrieved
          per-position
    • DirectSpellChecker
       -  finds possible corrections directly against the main search index
          without requiring a separate index
    • DWPT
       -  Flushing a new segment now runs concurrently with indexing
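To make "term matching a provided finite-state automaton" concrete, here is a small sketch that runs a hand-built deterministic automaton over a list of terms. It does not use Lucene's automaton classes; the transition table, state numbers, and function names are all assumptions for the example. The automaton accepts exactly "solr" and "solar".

```python
# Hypothetical DFA: (state, character) -> next state.
TRANSITIONS = {
    (0, "s"): 1,
    (1, "o"): 2,
    (2, "l"): 3,
    (3, "r"): 4,  # "solr"
    (3, "a"): 5,
    (5, "r"): 4,  # "solar"
}
START_STATE = 0
ACCEPT_STATES = {4}


def accepts(term: str) -> bool:
    """Walk the automaton one character at a time."""
    state = START_STATE
    for ch in term:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False  # no transition: the term is rejected
    return state in ACCEPT_STATES


def matching_terms(terms):
    """Keep only the terms the automaton accepts, preserving order."""
    return [t for t in terms if accepts(t)]
```

A real implementation would walk the automaton and the term dictionary together instead of testing every term, which is where most of the speedup would come from.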




Indexing performance (Wikipedia 4KB docs)

     • http://people.apache.org/~mikemccand/lucenebench/indexing.html




QPS (primary key lookup)

     • http://people.apache.org/~mikemccand/lucenebench/PKLookup.html




FuzzyQuery

     • http://people.apache.org/~mikemccand/lucenebench/Fuzzy2.html




Solr 4 Highlights

     •  SolrJ streaming response
     •  Pivot facets
     •  New relevancy function queries
        -  termfreq, tf, docfreq, idf, norm, maxdoc, numdocs, exists, if, and, or,
           xor, not, def, and true and false constants
     •  DirectSpellChecker support
     •  Improved document response: DocTransformer, function
        calculations
     •  Pseudo-join
     •  New admin UI: Including SolrCloud cluster visualizations
     •  Transaction log
     •  Several new update processors, including a “script” one
     •  Spatial overhaul
     •  Content-type savvy /update handler
     •  SolrCloud
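Several of these features are driven by request parameters. The sketch below builds a query string that combines a pseudo-join, a function value in the field list, and a pivot facet. It only encodes parameters and contacts no server; the field names (manu_id_s, id, name, text, cat, inStock) are assumptions borrowed from a typical example schema, not from these slides.

```python
from urllib.parse import urlencode, parse_qs


def select_query_string() -> str:
    """Encode parameters for a hypothetical /select request."""
    params = [
        # Pseudo-join: match documents whose id equals the manu_id_s of
        # documents matching name:solr.
        ("q", "{!join from=manu_id_s to=id}name:solr"),
        # Function calculation returned alongside stored fields.
        ("fl", "id,name,termfreq(text,'solr')"),
        # Pivot facet: counts of inStock values nested under each cat value.
        ("facet", "true"),
        ("facet.pivot", "cat,inStock"),
    ]
    return urlencode(params)


# Round-trip the string to see the decoded parameter values.
decoded = parse_qs(select_query_string())
```

The resulting string would be appended to a core's /select URL.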

Per-segment faceting improvement

     • Field-cache, per segment
        -  Test index: 10M documents, 18 segments, single valued field
     • facet.method=fcs
     • Result set=100 docs, 100,000 unique terms
        -  static index: fc=3 ms, fcs=244 ms
        -  quickly changing index: fc=1388 ms, fcs=267 ms
     • Result set=1,000,000 docs, 100 unique terms
        -  static index: fc=26 ms, fcs=34 ms
        -  quickly changing index: fc=741 ms, fcs=94 ms
     • Data from Yonik’s Lucene Revolution 2011 faceting talk
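One way to see why the per-segment method holds up on a quickly changing index is that cached field values for untouched segments stay valid, so only new segments need loading. The sketch below is a simplified model, not Solr's code: the class name, the loader callback, and the segment names are all hypothetical.

```python
from collections import Counter


class PerSegmentFacetCounter:
    """Caches field values per segment and merges counts across segments."""

    def __init__(self, load_segment):
        self._load_segment = load_segment  # callable: segment name -> list of values
        self._cache = {}
        self.loads = 0  # how many segments had to be loaded so far

    def _values(self, segment):
        if segment not in self._cache:
            # Only segments not seen before pay the loading cost.
            self._cache[segment] = self._load_segment(segment)
            self.loads += 1
        return self._cache[segment]

    def count(self, hits_by_segment):
        """hits_by_segment: {segment name: iterable of matching local doc ids}."""
        total = Counter()
        for segment, doc_ids in hits_by_segment.items():
            values = self._values(segment)
            total.update(values[doc_id] for doc_id in doc_ids)
        return total
```

The merge step across segments is the extra work that makes fcs slower than fc on a static index with many unique terms, while the per-segment cache is what keeps it fast when segments change.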



Solr 3.x scalability

     • Capabilities:
        -  Replication
        -  Distributed search
     • Limitations:
        -  Documents only available after (expensive) “hard” commit,
           replication, and warming delays
        -  Configuration labor intensive, manually maintained and
           coordinated
        -  Manual sharding: no automatic distributed indexing
        -  Failure recovery difficult if master goes down




SolrCloud: Solr 4’s scalability

     • Sharded leaders and replicas
     • ZooKeeper used for cluster management
     • Distributed indexing
        -  Automatically distributes updates to appropriate shard
        -  Facilitates Near Real-Time (NRT) searching
     • Distributed search
        -  Automatically distributes to nodes of each shard
     • Robust, automatic update recovery
     • Real-time /get
        -  Leverages transaction log
     • No single point of failure
     • Large scale NRT using soft commits
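Routing an update to the "appropriate shard" can be pictured as hashing the document id into a range owned by one shard. The sketch below is a simplified model under stated assumptions: zlib.crc32 stands in for whatever hash function Solr actually uses, the ranges are split evenly, and the function names are hypothetical.

```python
import zlib

HASH_SPACE = 2 ** 32  # crc32 produces an unsigned 32-bit value


def shard_ranges(num_shards: int):
    """Split the hash space into num_shards contiguous (start, end) ranges."""
    size = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == num_shards - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for(doc_id: str, num_shards: int) -> int:
    """Return the index of the shard whose range contains the id's hash."""
    h = zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF
    for index, (start, end) in enumerate(shard_ranges(num_shards)):
        if start <= h <= end:
            return index
    raise AssertionError("hash ranges must cover the full 32-bit space")
```

Because the mapping depends only on the id, any node that receives an update can work out which shard owns it and forward the document there.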

SolrCloud details

     • “Leaders” and “replicas”
        -  Leaders are automatically elected
     • A leader is just a replica with some coordination
       responsibilities for the associated replicas
     • If a leader goes down, one of the associated replicas is
       elected as the new leader
     • New nodes are automatically assigned a shard and
       role, and replicate/recover as needed
     • CloudSolrServer
     • Replication in Solr 4
        -  Used for new and recovering replicas
        -  Or for traditional master/slave configuration
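The election behaviour described above can be modelled with a queue ordered by join sequence, a pattern commonly used with ZooKeeper. This in-memory sketch is an assumption about the general approach, not SolrCloud's actual code; ShardElection and its methods are hypothetical names.

```python
class ShardElection:
    """In-memory stand-in for a sequence-ordered election queue (hypothetical)."""

    def __init__(self):
        self._next_seq = 0
        self._live = {}  # replica name -> sequence number assigned on join

    def join(self, replica: str) -> None:
        """A replica registers itself and receives the next sequence number."""
        self._live[replica] = self._next_seq
        self._next_seq += 1

    def leave(self, replica: str) -> None:
        """A replica goes down (its registration disappears)."""
        self._live.pop(replica, None)

    def leader(self):
        """The live replica with the lowest sequence number leads the shard."""
        if not self._live:
            return None
        return min(self._live, key=self._live.get)
```

When the current leader leaves, the next replica in line takes over without any manual step; a replica that rejoins goes to the back of the queue and does not displace the new leader.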

NoSQL

     • Update durability
        -  A transaction log ensures that even uncommitted documents are
           never lost.
     • Real-time Get
        -  The ability to quickly retrieve the latest version of a document,
           without the need to commit or open a new searcher
     • Versioning and Optimistic Locking
        -  combined with real-time get, this allows read-update-write
           functionality that ensures no conflicting changes were made
           concurrently by other clients.
     • Atomic updates
        -  the ability to add, remove, change, and increment fields of an
           existing document without having to send in the complete
           document again.
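Atomic updates and optimistic locking can be combined in one JSON update message: field modifiers such as set, add, and inc describe the change, and a _version_ value states which version the client expects. The helper below only builds the payload; the function name, field names, and version number are assumptions for the example.

```python
import json


def atomic_update(doc_id, expected_version=None, set_fields=None,
                  add_fields=None, inc_fields=None) -> str:
    """Build a JSON body for a partial update of one document."""
    doc = {"id": doc_id}
    modifiers = (("set", set_fields), ("add", add_fields), ("inc", inc_fields))
    for modifier, fields in modifiers:
        for name, value in (fields or {}).items():
            doc[name] = {modifier: value}  # e.g. {"copies": {"inc": 1}}
    if expected_version is not None:
        # The update should be rejected if the stored version differs.
        doc["_version_"] = expected_version
    return json.dumps([doc])


payload = atomic_update("book1", expected_version=42,
                        set_fields={"title": "Solr 4"},
                        inc_fields={"copies": 1})
```

A typical read-update-write cycle would fetch the document with real-time get, note its _version_, and send a payload like this one; if another client changed the document in between, the update fails and the cycle is retried.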


Some numbers

     • On a Wikipedia index (11M documents)
        -  Time to perform the first query with sorting (no warmup queries)
           Solr 3x: 13 seconds, Solr 4: 6 seconds.
        -  Memory consumption Solr 3x: 1,040M, Solr 4: 366M. Yes,
           almost a 2/3 reduction in memory use. And that’s the entire
           program size, not counting memory used to just start Solr and
           Jetty running.
        -  Number of objects on the heap. Solr 3x: 19.4M, Solr 4: 80K. No,
           that’s not a typo. There are over two orders of magnitude fewer
           objects on the heap in trunk!
     • From an Erick Erickson blog entry (see Links slide)
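The ratios behind these claims can be checked with a few lines of arithmetic, using only the figures quoted above. The variable and function names are chosen for this sketch.

```python
import math


def reduction(before: float, after: float) -> float:
    """Fraction by which a quantity shrank, e.g. 0.25 means 25% smaller."""
    return 1.0 - after / before


# Memory: 1,040M down to 366M is roughly a 65% reduction ("almost 2/3").
memory_reduction = reduction(1040, 366)

# Heap objects: 19.4M down to 80K is a factor of 242.5,
# i.e. a little over two orders of magnitude.
object_ratio = 19_400_000 / 80_000
orders_of_magnitude = math.log10(object_ratio)

# First sorted query: 13 seconds down to 6 seconds is just over 2x faster.
first_query_speedup = 13 / 6
```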




Links

     • Lucene/Solr: lucene.apache.org
     • “Lucene in Action”: www.manning.com/lucene
     • Blacklight
        -  projectblacklight.org
        -  Examples: search.lib.virginia.edu and searchworks.stanford.edu
     • SearchHub.org
        -  Community/public content
        -  http://searchhub.org/dev/2012/04/06/memory-comparisons-between-solr-3x-and-trunk/




About LucidWorks

     • LucidWorks Search
        -  Lucene/Solr 4 powered
        -  On-premise or hosted (Amazon EC2 and Azure)
        -  Rich connector framework for SharePoint, web crawling, etc.
        -  Built-in security support
     • LucidWorks Big Data
        -  Scalable classification, machine learning, analytics
     • Lucene/Solr commercial support
     • Consulting
     • Training
     • http://www.lucidworks.com


     © 2012 LucidWorks
21

Más contenido relacionado

La actualidad más candente

Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Integrating the Solr search engine
Integrating the Solr search engineIntegrating the Solr search engine
Integrating the Solr search engineth0masr
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash courseTommaso Teofili
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development TutorialErik Hatcher
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache SolrEdureka!
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query ParsingErik Hatcher
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered LuceneErik Hatcher
 
Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!Murshed Ahmmad Khan
 
Building your own search engine with Apache Solr
Building your own search engine with Apache SolrBuilding your own search engine with Apache Solr
Building your own search engine with Apache SolrBiogeeks
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesRahul Jain
 
Battle of the giants: Apache Solr vs ElasticSearch
Battle of the giants: Apache Solr vs ElasticSearchBattle of the giants: Apache Solr vs ElasticSearch
Battle of the giants: Apache Solr vs ElasticSearchRafał Kuć
 
Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5israelekpo
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache SolrChristos Manios
 
What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0Erik Hatcher
 
Hacking Lucene for Custom Search Results
Hacking Lucene for Custom Search ResultsHacking Lucene for Custom Search Results
Hacking Lucene for Custom Search ResultsOpenSource Connections
 

La actualidad más candente (20)

Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Integrating the Solr search engine
Integrating the Solr search engineIntegrating the Solr search engine
Integrating the Solr search engine
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development Tutorial
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache Solr
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query Parsing
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered Lucene
 
Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!
 
Building your own search engine with Apache Solr
Building your own search engine with Apache SolrBuilding your own search engine with Apache Solr
Building your own search engine with Apache Solr
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
 
Battle of the giants: Apache Solr vs ElasticSearch
Battle of the giants: Apache Solr vs ElasticSearchBattle of the giants: Apache Solr vs ElasticSearch
Battle of the giants: Apache Solr vs ElasticSearch
 
Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0
 
Hacking Lucene for Custom Search Results
Hacking Lucene for Custom Search ResultsHacking Lucene for Custom Search Results
Hacking Lucene for Custom Search Results
 

Destacado

Call me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksCall me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksShalin Shekhar Mangar
 
Сергей Моренец: "Gradle. Write once, build everywhere"
Сергей Моренец: "Gradle. Write once, build everywhere"Сергей Моренец: "Gradle. Write once, build everywhere"
Сергей Моренец: "Gradle. Write once, build everywhere"Provectus
 
Gimme shelter: Tips on protecting proprietary and open source code
Gimme shelter: Tips on protecting proprietary and open source codeGimme shelter: Tips on protecting proprietary and open source code
Gimme shelter: Tips on protecting proprietary and open source codeRogue Wave Software
 
Solr Powered Libraries
Solr Powered LibrariesSolr Powered Libraries
Solr Powered LibrariesErik Hatcher
 
Meet Solr For The Tirst Again
Meet Solr For The Tirst AgainMeet Solr For The Tirst Again
Meet Solr For The Tirst AgainVarun Thacker
 
Solr 6 Feature Preview
Solr 6 Feature PreviewSolr 6 Feature Preview
Solr 6 Feature PreviewYonik Seeley
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksErik Hatcher
 
Faceted Search – the 120 Million Documents Story
Faceted Search – the 120 Million Documents StoryFaceted Search – the 120 Million Documents Story
Faceted Search – the 120 Million Documents StorySourcesense
 
"Solr Update" at code4lib '13 - Chicago
"Solr Update" at code4lib '13 - Chicago"Solr Update" at code4lib '13 - Chicago
"Solr Update" at code4lib '13 - ChicagoErik Hatcher
 
Why I want to Kazan
Why I want to KazanWhy I want to Kazan
Why I want to KazanProvectus
 
Multi faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & loggingMulti faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & logginglucenerevolution
 
Apache Solr Changes the Way You Build Sites
Apache Solr Changes the Way You Build SitesApache Solr Changes the Way You Build Sites
Apache Solr Changes the Way You Build SitesPeter
 
Open source applied: Real-world uses
Open source applied: Real-world usesOpen source applied: Real-world uses
Open source applied: Real-world usesRogue Wave Software
 
Solr introduction
Solr introductionSolr introduction
Solr introductionLap Tran
 
How to achieve security, reliability, and productivity in less time
How to achieve security, reliability, and productivity in less timeHow to achieve security, reliability, and productivity in less time
How to achieve security, reliability, and productivity in less timeRogue Wave Software
 
Faceted Search And Result Reordering
Faceted Search And Result ReorderingFaceted Search And Result Reordering
Faceted Search And Result ReorderingVarun Thacker
 

Destacado (20)

Call me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksCall me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networks
 
Сергей Моренец: "Gradle. Write once, build everywhere"
Сергей Моренец: "Gradle. Write once, build everywhere"Сергей Моренец: "Gradle. Write once, build everywhere"
Сергей Моренец: "Gradle. Write once, build everywhere"
 
Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014
 
Gimme shelter: Tips on protecting proprietary and open source code
Gimme shelter: Tips on protecting proprietary and open source codeGimme shelter: Tips on protecting proprietary and open source code
Gimme shelter: Tips on protecting proprietary and open source code
 
Solr Powered Libraries
Solr Powered LibrariesSolr Powered Libraries
Solr Powered Libraries
 
Meet Solr For The Tirst Again
Meet Solr For The Tirst AgainMeet Solr For The Tirst Again
Meet Solr For The Tirst Again
 
Solr 6 Feature Preview
Solr 6 Feature PreviewSolr 6 Feature Preview
Solr 6 Feature Preview
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis Tricks
 
Faceted Search – the 120 Million Documents Story
Faceted Search – the 120 Million Documents StoryFaceted Search – the 120 Million Documents Story
Faceted Search – the 120 Million Documents Story
 
"Solr Update" at code4lib '13 - Chicago
"Solr Update" at code4lib '13 - Chicago"Solr Update" at code4lib '13 - Chicago
"Solr Update" at code4lib '13 - Chicago
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Why I want to Kazan
Why I want to KazanWhy I want to Kazan
Why I want to Kazan
 
Hackathon
HackathonHackathon
Hackathon
 
Multi faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & loggingMulti faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & logging
 
Apache Solr Changes the Way You Build Sites
Apache Solr Changes the Way You Build SitesApache Solr Changes the Way You Build Sites
Apache Solr Changes the Way You Build Sites
 
Open source applied: Real-world uses
Open source applied: Real-world usesOpen source applied: Real-world uses
Open source applied: Real-world uses
 
Solr introduction
Solr introductionSolr introduction
Solr introduction
 
How to achieve security, reliability, and productivity in less time
How to achieve security, reliability, and productivity in less timeHow to achieve security, reliability, and productivity in less time
How to achieve security, reliability, and productivity in less time
 
Top Node.js Metrics to Watch
Top Node.js Metrics to WatchTop Node.js Metrics to Watch
Top Node.js Metrics to Watch
 
Faceted Search And Result Reordering
Faceted Search And Result ReorderingFaceted Search And Result Reordering
Faceted Search And Result Reordering
 

Similar a Solr 4

What's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.xWhat's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.xGrant Ingersoll
 
Data IO: Next Generation Search with Lucene and Solr 4
Data IO: Next Generation Search with Lucene and Solr 4Data IO: Next Generation Search with Lucene and Solr 4
Data IO: Next Generation Search with Lucene and Solr 4Grant Ingersoll
 
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Shalin Shekhar Mangar
 
Keynote Yonik Seeley & Steve Rowe lucene solr roadmap
Keynote   Yonik Seeley & Steve Rowe lucene solr roadmapKeynote   Yonik Seeley & Steve Rowe lucene solr roadmap
Keynote Yonik Seeley & Steve Rowe lucene solr roadmaplucenerevolution
 
KEYNOTE: Lucene / Solr road map
KEYNOTE: Lucene / Solr road mapKEYNOTE: Lucene / Solr road map
KEYNOTE: Lucene / Solr road maplucenerevolution
 
Oslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alphaOslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alphaCominvent AS
 
Webinar: Faster Log Indexing with Fusion
Webinar: Faster Log Indexing with FusionWebinar: Faster Log Indexing with Fusion
Webinar: Faster Log Indexing with FusionLucidworks
 
From Lucene to Solr 4 Trunk
From Lucene to Solr 4 TrunkFrom Lucene to Solr 4 Trunk
From Lucene to Solr 4 Trunktdthomassld
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and FutureDataWorks Summit
 
Building a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solrBuilding a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solrlucenerevolution
 
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Lucidworks
 
(Re)Indexing Large Repositories in Alfresco
(Re)Indexing Large Repositories in Alfresco(Re)Indexing Large Repositories in Alfresco
(Re)Indexing Large Repositories in AlfrescoAngel Borroy López
 
Vijfhart thema-avond-oracle-12c-new-features
Vijfhart thema-avond-oracle-12c-new-featuresVijfhart thema-avond-oracle-12c-new-features
Vijfhart thema-avond-oracle-12c-new-featuresmkorremans
 
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabsSolr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabsLucidworks
 
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data  relational storage (Strata NYC 2017)A brave new world in mutable big data  relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)Todd Lipcon
 
Cloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inCloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inRahulBhole12
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scalethelabdude
 
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, LucidworksYour Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, LucidworksLucidworks
 

Similar a Solr 4 (20)

What's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.xWhat's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.x
 
Data IO: Next Generation Search with Lucene and Solr 4
Data IO: Next Generation Search with Lucene and Solr 4Data IO: Next Generation Search with Lucene and Solr 4
Data IO: Next Generation Search with Lucene and Solr 4
 
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
 
Keynote Yonik Seeley & Steve Rowe lucene solr roadmap
Keynote   Yonik Seeley & Steve Rowe lucene solr roadmapKeynote   Yonik Seeley & Steve Rowe lucene solr roadmap
Keynote Yonik Seeley & Steve Rowe lucene solr roadmap
 
KEYNOTE: Lucene / Solr road map
KEYNOTE: Lucene / Solr road mapKEYNOTE: Lucene / Solr road map
KEYNOTE: Lucene / Solr road map
 
Oslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alphaOslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alpha
 
Webinar: Faster Log Indexing with Fusion
Webinar: Faster Log Indexing with FusionWebinar: Faster Log Indexing with Fusion
Webinar: Faster Log Indexing with Fusion
 
From Lucene to Solr 4 Trunk
From Lucene to Solr 4 TrunkFrom Lucene to Solr 4 Trunk
From Lucene to Solr 4 Trunk
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and Future
 
Building a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solrBuilding a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solr
 
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
 
(Re)Indexing Large Repositories in Alfresco
(Re)Indexing Large Repositories in Alfresco(Re)Indexing Large Repositories in Alfresco
(Re)Indexing Large Repositories in Alfresco
 
Vijfhart thema-avond-oracle-12c-new-features
Vijfhart thema-avond-oracle-12c-new-featuresVijfhart thema-avond-oracle-12c-new-features
Vijfhart thema-avond-oracle-12c-new-features
 
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabsSolr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
 
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data  relational storage (Strata NYC 2017)A brave new world in mutable big data  relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)
 
Oracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_databaseOracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_database
 
Cloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inCloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation in
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Cloudera search
Cloudera searchCloudera search
Cloudera search
 
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, LucidworksYour Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
 

Más de Erik Hatcher

Query Parsing - Tips and Tricks
Query Parsing - Tips and TricksQuery Parsing - Tips and Tricks
Query Parsing - Tips and TricksErik Hatcher
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Solr Flair: Search User Interfaces Powered by Apache Solr (ApacheCon US 2009,...
Solr Flair: Search User Interfaces Powered by Apache Solr (ApacheCon US 2009,...Solr Flair: Search User Interfaces Powered by Apache Solr (ApacheCon US 2009,...
Solr Flair: Search User Interfaces Powered by Apache Solr (ApacheCon US 2009,...Erik Hatcher
 
Solr Flair: Search User Interfaces Powered by Apache Solr
Solr Flair: Search User Interfaces Powered by Apache SolrSolr Flair: Search User Interfaces Powered by Apache Solr
Solr Flair: Search User Interfaces Powered by Apache SolrErik Hatcher
 

Más de Erik Hatcher (10)

Ted Talk
Ted TalkTed Talk
Ted Talk
 
Solr Payloads
Solr PayloadsSolr Payloads
Solr Payloads
 
it's just search
it's just searchit's just search
it's just search
 
Query Parsing - Tips and Tricks
Query Parsing - Tips and TricksQuery Parsing - Tips and Tricks
Query Parsing - Tips and Tricks
 
Solr Flair
Solr FlairSolr Flair
Solr Flair
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Solr Flair: Search User Interfaces Powered by Apache Solr (ApacheCon US 2009,...
Solr Flair: Search User Interfaces Powered by Apache Solr (ApacheCon US 2009,...Solr Flair: Search User Interfaces Powered by Apache Solr (ApacheCon US 2009,...
Solr Flair: Search User Interfaces Powered by Apache Solr (ApacheCon US 2009,...
 
Solr Flair: Search User Interfaces Powered by Apache Solr
Solr Flair: Search User Interfaces Powered by Apache SolrSolr Flair: Search User Interfaces Powered by Apache Solr
Solr Flair: Search User Interfaces Powered by Apache Solr
 

Solr 4

  • 1. Solr 4 Presented by Erik Hatcher © Copyright 2012
  • 2. About: Erik Hatcher
      • “Lucene in Action”, co-author
         -  Also co-author of “Java Development with Ant”/“Ant in Action”
      • Open Source
         -  Apache Software Foundation: member, Lucene/Solr committer and PMC member
         -  Originator of “Blacklight”, a Solr-powered discovery interface
      • LucidWorks
         -  Co-founder
         -  Recently renamed from Lucid Imagination
         -  Customer Support
      © 2012 LucidWorks
  • 3. Abstract
      Solr 4.0 dramatically improves scalability, performance, and flexibility. An overhauled Lucene underneath sports near real-time (NRT) capabilities, allowing indexed documents to become visible and searchable rapidly. Lucene’s improvements also include pluggable scoring, much faster fuzzy and wildcard querying, and vastly improved memory usage. These Lucene improvements automatically make Solr much better, and Solr magnifies these advances with “SolrCloud.” SolrCloud enables highly available, fault-tolerant clusters for large-scale distributed indexing and searching. Many other changes are surveyed as well. This talk covers these improvements in detail, comparing and contrasting them with previous versions of Solr.
  • 4. Lucene 4 Improvements
      • Flexible index formats
      • Pluggable scoring
      • String -> BytesRef
      • DWPT (DocumentsWriterPerThread)
         -  faster, more consistent indexing speed
      • NRT (Near Real-Time)
      • Spatial overhaul
      • FST/FSA
         -  FuzzyQuery over 100x faster
         -  also reduces memory footprint for Terms index
      • DocValues: aka column-stride fields
  • 5. Flexible index formats
      • For terms, postings lists, stored fields, term vectors, etc.
      • Several new posting list codecs
         -  Pulsing (inlines low doc freq)
         -  Block (packed int blocks)
         -  SimpleText (debugging, transparency)
         -  Bloom (experimental, also inlines low doc freq)
         -  Appending (for append-only filesystems such as HDFS)
         -  Memory (terms as FST)
  • 6. Pluggable scoring
      • Decoupled from traditional vector space (TF/IDF)
      • Additional index statistics
         -  number of tokens for a term or field
         -  number of postings for a field
         -  number of documents with a posting for a field
      • Several built-in alternatives:
         -  BM25
         -  DFR – divergence from randomness
         -  Information-based models
      • “norms” are no longer limited to a single byte
         -  Similarity implementations can use any DocValues type to store norms
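To make the BM25 alternative named on this slide concrete, here is a minimal sketch of the textbook BM25 score for a single term. The function names and the defaults k1=1.2 and b=0.75 are assumptions chosen for illustration; this is not Lucene's implementation, which the slide does not describe.

```python
import math


def bm25_idf(num_docs: int, doc_freq: int) -> float:
    """Inverse document frequency: rarer terms get a larger weight."""
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(tf: float, doc_len: float, avg_doc_len: float,
                    num_docs: int, doc_freq: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """Textbook BM25 contribution of one term to one document's score.

    k1 controls term-frequency saturation; b controls how strongly the
    score is normalized by document length. Both defaults are assumptions.
    """
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return bm25_idf(num_docs, doc_freq) * tf * (k1 + 1.0) / (tf + length_norm)
```

Unlike classic TF/IDF, the term-frequency component saturates: repeating a term many times never pushes the score past idf × (k1 + 1).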
  • 7. String -> BytesRef
      • How many bytes does a Java String require?
         -  BytesRef is now used to avoid this overhead
         -  Think of the internal structure as a big buffer with pointers
      • Garbage collection much more efficient
         -  big blocks rather than zillions of small ones
      • How much reduction? 10%? 20%?
         -  No. Way more than that
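The "big buffer with pointers" idea can be sketched as follows. This is only an illustration of the layout, assuming a hypothetical BytePool class; Lucene's BytesRef is a Java class holding a byte array, an offset, and a length, and its surrounding pooling code is not shown in the slides.

```python
class BytePool:
    """Hypothetical pool: many terms share one buffer instead of one object each."""

    def __init__(self) -> None:
        self.buffer = bytearray()   # one large block holding all term bytes
        self.refs = []              # (offset, length) pairs pointing into the buffer

    def add(self, term: str) -> int:
        """Append the term's UTF-8 bytes and return its integer id."""
        data = term.encode("utf-8")
        self.refs.append((len(self.buffer), len(data)))
        self.buffer.extend(data)
        return len(self.refs) - 1

    def get(self, term_id: int) -> bytes:
        """Slice the term's bytes back out of the shared buffer."""
        offset, length = self.refs[term_id]
        return bytes(self.buffer[offset:offset + length])
```

The garbage collector then tracks one buffer rather than one small object per term, which is the effect the slide attributes to the change.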
  • 8. NRT: Near Real-Time
      • Per-segment
         -  FieldCache needs to only load from new segments
      • Soft commit
         -  Faster: does not fsync
         -  Can soft commit very rapidly, as low as every second
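As a small illustration of the soft commit mentioned above, the sketch below builds the URL for an explicit commit request against the update handler. The core URL is a hypothetical local address and the helper name is made up; only the softCommit and commit request parameters come from Solr itself.

```python
from urllib.parse import urlencode


def commit_url(core_url: str, soft: bool = True) -> str:
    """Build an /update URL that requests a soft or a hard commit."""
    params = {"softCommit": "true"} if soft else {"commit": "true"}
    return f"{core_url}/update?{urlencode(params)}"


# Hypothetical core location, used only for illustration.
print(commit_url("http://localhost:8983/solr/collection1"))
```

Nothing is sent over the network here; the point is only that a soft commit is a separate, cheaper request than a hard commit.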
  • 9. Lucene 4: there’s more
      • AutomatonQuery
         -  term matching a provided finite-state automaton
      • Term offsets
         -  optionally encoded into the postings lists and can be retrieved per-position
      • DirectSpellChecker
         -  finds possible corrections directly against the main search index without requiring a separate index
      • DWPT
         -  Flushing a new segment is now concurrent with indexing
  • 10. Indexing performance (Wikipedia 4KB docs)
      • http://people.apache.org/~mikemccand/lucenebench/indexing.html
  • 11. QPS (primary key lookup)
      • http://people.apache.org/~mikemccand/lucenebench/PKLookup.html
  • 12. FuzzyQuery
      • http://people.apache.org/~mikemccand/lucenebench/Fuzzy2.html
  • 13. Solr 4 Highlights
      • SolrJ streaming response
      • Pivot facets
      • New relevancy function queries
         -  termfreq, tf, docfreq, idf, norm, maxdoc, numdocs, exists, if, and, or, xor, not, def, and true and false constants
      • DirectSpellChecker support
      • Improved document response: DocTransformer, function calculations
      • Pseudo-join
      • New admin UI, including SolrCloud cluster visualizations
      • Transaction log
      • Several new update processors, including a “script” one
      • Spatial overhaul
      • Content-type savvy /update handler
      • SolrCloud
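Three of the highlights above (pivot facets, function calculations in the document response, and pseudo-join) are driven by request parameters, so a short sketch of what such requests might look like follows. The field names (text, cat, inStock, manu_id_s, name) and the base URL are hypothetical; the parameter names facet.pivot and fl and the {!join} query parser syntax are Solr's.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def select_url(base_url: str, params: list) -> str:
    """Build a /select URL with properly escaped query parameters."""
    return f"{base_url}/select?{urlencode(params)}"


# Pivot facet plus relevancy functions returned as pseudo-fields.
PIVOT_AND_FUNCTIONS = [
    ("q", "*:*"),
    ("fl", "id,termfreq(text,'solr'),docfreq(text,'solr')"),
    ("facet", "true"),
    ("facet.pivot", "cat,inStock"),
]

# Pseudo-join: match documents whose id is referenced by matching documents.
JOIN = [("q", "{!join from=manu_id_s to=id}name:ipod")]

for params in (PIVOT_AND_FUNCTIONS, JOIN):
    print(select_url("http://localhost:8983/solr/collection1", params))
```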
  • 14. Per-segment faceting improvement
      • Field-cache, per segment
         -  Test index: 10M documents, 18 segments, single-valued field
      • facet.method=fcs
      • Result set=100 docs, 100,000 unique terms
         -  static index: fc=3 ms, fcs=244 ms
         -  quickly changing index: fc=1388 ms, fcs=267 ms
      • Result set=1,000,000 docs, 100 unique terms
         -  static index: fc=26 ms, fcs=34 ms
         -  quickly changing index: fc=741 ms, fcs=94 ms
      • Data from Yonik’s Lucene Revolution 2011 faceting talk
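The trade-off in these timings is easier to read as ratios. The sketch below only divides the numbers quoted on the slide; the dictionary layout and function name are illustrative choices, not part of the original talk.

```python
# (fc_ms, fcs_ms) per scenario, copied from the slide above.
TIMINGS_MS = {
    ("100 docs / 100,000 terms", "static"): (3, 244),
    ("100 docs / 100,000 terms", "changing"): (1388, 267),
    ("1,000,000 docs / 100 terms", "static"): (26, 34),
    ("1,000,000 docs / 100 terms", "changing"): (741, 94),
}


def fc_over_fcs(fc_ms: float, fcs_ms: float) -> float:
    """Ratio above 1 means the per-segment method (fcs) was faster."""
    return fc_ms / fcs_ms


for (result_set, index_state), (fc_ms, fcs_ms) in TIMINGS_MS.items():
    ratio = fc_over_fcs(fc_ms, fcs_ms)
    print(f"{result_set:28s} {index_state:9s} fc/fcs = {ratio:.2f}")
```

On a static index fc wins in both cases, dramatically so when there are many unique terms; on a quickly changing index fcs is several times faster, since the per-segment field cache only loads from new segments.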
  • 15. Solr 3.x scalability
      • Capabilities:
         -  Replication
         -  Distributed search
      • Limitations:
         -  Documents only available after (expensive) “hard” commit, replication, and warming delays
         -  Configuration is labor intensive, manually maintained and coordinated
         -  Manual sharding: no automatic distributed indexing
         -  Failure recovery difficult if master goes down
  • 16. SolrCloud: Solr 4’s scalability
      • Sharded leaders and replicas
      • ZooKeeper used for cluster management
      • Distributed indexing
         -  Automatically distributes updates to appropriate shard
         -  Facilitates Near Real-Time (NRT) searching
      • Distributed search
         -  Automatically distributes to nodes of each shard
      • Robust, automatic update recovery
      • Real-time /get
         -  Leverages transaction log
      • No single point of failure
      • Large scale NRT using soft commits
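One common way to "automatically distribute updates to the appropriate shard" is to hash the document id and give each shard a contiguous slice of the hash space. The sketch below illustrates that idea only: CRC32 is a stand-in hash and the even range split is an assumption, since the slides do not describe Solr's actual hash function or range layout.

```python
import zlib

HASH_SPACE = 2 ** 32


def shard_ranges(num_shards: int) -> list:
    """Split the 32-bit hash space into contiguous, non-overlapping ranges."""
    step = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == num_shards - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def route(doc_id: str, ranges: list) -> int:
    """Return the index of the shard whose range contains the id's hash."""
    doc_hash = zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF  # stand-in hash
    for index, (start, end) in enumerate(ranges):
        if start <= doc_hash <= end:
            return index
    raise ValueError("hash ranges do not cover the full hash space")
```

Because the routing depends only on the id, any node can compute the target shard without consulting the others, and the same document always lands on the same shard.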
  • 17. SolrCloud details
      • “Leaders” and “replicas”
         -  Leaders are automatically elected
      • A leader is just a replica with some coordination responsibilities for the associated replicas
      • If a leader goes down, one of the associated replicas is elected as the new leader
      • New nodes are automatically assigned a shard and role, and replicate/recover as needed
      • CloudSolrServer
      • Replication in Solr 4
         -  Used for new and recovering replicas
         -  Or for traditional master/slave configuration
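The failover behaviour described above can be illustrated with a toy election in the style of the usual ZooKeeper recipe, where each live replica holds a sequence number and the lowest one leads. This is an assumption made for illustration; the slides say only that leaders are elected automatically, and the replica names and sequence numbers below are invented.

```python
def elect_leader(live_replicas: dict):
    """Pick the live replica with the lowest sequence number, or None if empty.

    live_replicas maps a replica name to the sequence number it received
    when it joined the election (hypothetical data model).
    """
    if not live_replicas:
        return None
    return min(live_replicas, key=live_replicas.get)


shard1 = {"replica_a": 3, "replica_b": 1, "replica_c": 2}
print(elect_leader(shard1))   # current leader
del shard1["replica_b"]       # the leader goes down
print(elect_leader(shard1))   # a remaining replica takes over
```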
  • 18. NoSQL
      • Update durability
         -  A transaction log ensures that even uncommitted documents are never lost.
      • Real-time Get
         -  The ability to quickly retrieve the latest version of a document, without the need to commit or open a new searcher
      • Versioning and Optimistic Locking
         -  Combined with real-time get, this allows read-update-write functionality that ensures no conflicting changes were made concurrently by other clients.
      • Atomic updates
         -  The ability to add, remove, change, and increment fields of an existing document without having to send in the complete document again.
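A read-update-write cycle with optimistic locking might be expressed as the JSON body below. The helper function, document id, and field names are hypothetical; the "set", "add", and "inc" modifiers and the _version_ field are the ones Solr 4 uses for atomic updates and versioning. The version value would normally come from a prior real-time get of the document.

```python
import json


def atomic_update(doc_id, version=None, set_fields=None,
                  add_fields=None, inc_fields=None) -> str:
    """Build a JSON update body that changes only the named fields."""
    doc = {"id": doc_id}
    for field, value in (set_fields or {}).items():
        doc[field] = {"set": value}      # replace; a None value removes the field
    for field, value in (add_fields or {}).items():
        doc[field] = {"add": value}      # append to a multi-valued field
    for field, amount in (inc_fields or {}).items():
        doc[field] = {"inc": amount}     # increment a numeric field
    if version is not None:
        doc["_version_"] = version       # update fails if the stored version differs
    return json.dumps([doc])


body = atomic_update("book1", version=12345,
                     set_fields={"price": 9.99}, inc_fields={"copies": 1})
print(body)
```

If another client modified the document after the version was read, Solr rejects the update with a conflict and the client can re-read and retry.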
  • 19. Some numbers
      • On a Wikipedia index (11M documents)
         -  Time to perform the first query with sorting (no warmup queries): Solr 3.x: 13 seconds, Solr 4: 6 seconds.
         -  Memory consumption: Solr 3.x: 1,040M, Solr 4: 366M. Yes, almost a 2/3 reduction in memory use. And that’s the entire program size, not counting memory used to just start Solr and Jetty running.
         -  Number of objects on the heap: Solr 3.x: 19.4M, Solr 4: 80K. No, that’s not a typo. There are over two orders of magnitude fewer objects on the heap in trunk!
      • From an Erick Erickson blog entry (see Links slide)
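The "almost 2/3" and "two orders of magnitude" claims can be checked directly from the quoted figures. The sketch below uses only the numbers on the slide; the function name is an illustrative choice.

```python
import math


def reduction(before: float, after: float) -> float:
    """Fraction by which a quantity shrank, e.g. 0.5 means it halved."""
    return (before - after) / before


memory_reduction = reduction(1040, 366)       # megabytes, Solr 3.x vs Solr 4
heap_object_ratio = 19.4e6 / 80e3             # objects on the heap
first_query_speedup = 13 / 6                  # seconds for the first sorted query

print(f"memory reduction: {memory_reduction:.1%}")
print(f"heap objects: {heap_object_ratio:.1f}x fewer "
      f"({math.log10(heap_object_ratio):.2f} orders of magnitude)")
print(f"first sorted query: {first_query_speedup:.2f}x faster")
```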
  • 20. Links
      • Lucene/Solr: lucene.apache.org
      • “Lucene in Action”: www.manning.com/lucene
      • Blacklight
         -  projectblacklight.org
         -  Examples: search.lib.virginia.edu and searchworks.stanford.edu
      • SearchHub.org
         -  Community/public content
         -  http://searchhub.org/dev/2012/04/06/memory-comparisons-between-solr-3x-and-trunk/
  • 21. About LucidWorks
      • LucidWorks Search
         -  Lucene/Solr 4 powered
         -  On-premise or hosted (Amazon EC2 and Azure)
         -  Rich connector framework for SharePoint, web crawling, etc.
         -  Built-in security support
      • LucidWorks Big Data
         -  Scalable classification, machine learning, analytics
      • Lucene/Solr commercial support
      • Consulting
      • Training
      • http://www.lucidworks.com