SlideShare una empresa de Scribd logo
1 de 53
Descargar para leer sin conexión
Riak Search
    A Full-Text Search
   and Indexing Engine
         based on Riak


      Berlin Buzzwords· June 2010

               Basho Technologies
          Rusty Klophaus - @rklophaus
Why did we build it?
What are the major goals?
  How does it work?




                            2
Part One
Why did we build
 Riak Search?




                   3
Riak is
a scalable, highly-available, networked,
    open-source key/value store.




                                           4
Writing to a Key/Value Store




                 Key/Value




       CLIENT                  RIAK
                                      5
Writing to a Key/Value Store




                     Object




       CLIENT                  RIAK
                                      6
Querying a Key/Value Store




                       Key


                     Object

       CLIENT                 RIAK
                                     7
Querying Riak via LinkWalking

               Key + Instructions
                                     Walk to
                                     Related
                                      Keys


                  Object(s)

       CLIENT                       RIAK
                                               8
Querying Riak via Map/Reduce

            Key(s) + JS Functions
                                      Map

                                      Map

                                     Reduce
             Computed Value(s)

       CLIENT                       RIAK
                                              9
Key/Value Stores
       like
Key-Based Queries




                    10
Query by Secondary Index


            where Category == "Shoes"




                           WTF!? I'm a
                           KV store!



      CLIENT                            RIAK
                                               11
Full-Text Query


                  "Converse AND Shoes"




                                This is
                              getting old.



       CLIENT                             RIAK
                                                 12
These kinds of queries
   need an Index.

*Market Opportunity!*



                         13
Part Two
What are the major
goals of Riak Search?




                        14
An application built on Riak.




              Your
                                Riak
          Application




                                       15
Hrm... I need an index.




              Your
                             Riak
          Application     Index
                          Object




                                    16
Hrm... I need an index with more features.




       Your
                         ???           Riak
  Application




                                              17
Lucene should do the trick...




       Your
                      Lucene    Riak
   Application




                                       18
...shard to add more storage capacity...




      Your
   Application
                 Lucene   Lucene   Lucene   Riak




                                                   19
...replicate to add more throughput.




                 Lucene   Lucene   Lucene
      Your
   Application
                 Lucene   Lucene   Lucene   Riak
                 Lucene   Lucene   Lucene




                                                   20
...replicate to add more throughput.




                 Lucene   Lucene   Lucene
      Your
   Application
                 Lucene   Lucene   Lucene   Riak
                 Lucene   Lucene   Lucene




                  Operations nightmare!

                                                   21
What do we really want?




       Your       Riak-ified
                               Riak
   Application      Lucene




                                      22
What do we really want?




       Your          Riak
                            Riak
   Application     Search




                                   23
Functionality? Be like Lucene (and more).

• Lucene Syntax
• Leverages Java Lucene Analyzers
• Solr Endpoints
• Integration via Riak Post-Commit Hook (Index)
• Integration via Riak Map/Reduce (Query)
• Near-Realtime
• Schema-less


                                                  24
Operations? Be like Riak.

• No special nodes
• Add nodes, get more compute and storage
• Automatically load balance
• Replicas for durability and performance
• Index and query in parallel
• Swappable storage backends




                                            25
Part Three
How do we do it?




                   26
A Gentle Introduction to
  Document Indexing




                           27
The Inverted Index

    Document           Inverted Index



                           day, 1
                           dog, 1
  #1
       Every dog has
       his day.
                           every, 1
                           has, 1
                           his, 1




                                        28
The Inverted Index

   Documents            Combined Inverted Index
                                and, 4
 #1   Every dog has
      his day.
                                bag, 3
                                bark, 2
                                bite, 2
      The dog's bark
 #2
                                cat, 3
      is worse than
      his bite.                 cat, 4
                                day, 1
                                dog, 1
 #3
      Let the cat out
      of the bag.               dog, 2
                                dog, 4
                                every, 1
#4    It's raining
      cats and dogs.            has, 1
                                ...


                                                  29
At Query Time...

                   "dog AND cat"




                       AND



           dog                     cat
                                         30
At Query Time...

                   AND


           dog            cat


         dog, 1
                         cat, 3
         dog, 2
                         cat, 4
         dog, 4


                                  31
At Query Time...
                      Result: 4




                        AND
                 (Merge Intersection)




             1
                                        3
             2
                                        4
             4


                                            32
At Query Time...
                   Result: 1, 2, 3, 4




                           OR
                      (Merge Union)




             1
                                        3
             2
                                        4
             4


                                            33
Complex Behavior from Simple Structures




                                          34
Storage Approaches...




                        35
Riak Search uses
Consistent Hashing
 to store data on
     Partitions



                     36
Introduction to Consistent Hashing and Partitions

                Partitions = 10
                Number of Nodes = 5
                Partitions per Node = 2
                Replicas (NVal) = 2




                                                    37
Introduction to Consistent Hashing and Partitions


             Object




                                                    38
Document Partitioning
        vs.
  Term Partitioning




                        39
...and the
Resulting Tradeoffs




                      40
Document Partitioning @ Index Time


     #1   Every dog has
          his day.




                                     41
Document Partitioning @ Query Time

                          "dog OR cat"




                                         42
Term Partitioning @ Index Time
                                 day, 1
                                 dog, 1
     #1   Every dog has
          his day.
                                 every, 1
                                 has, 1
                                 his, 1




                                            43
Term Partitioning @ Index Time



                       dog, 1
     day, 1                      has, 1




     his, 1                      every, 1



                                            44
Term Partitioning @ Query Time

                           "dog OR cat"




                                          45
Tradeoffs...


    Document Partitioning       Term Partitioning

   + Lower Latency Queries   - Higher Latency Queries
   - Lower Throughput        + Higher Throughput
   - Lots of Disk Seeks      - Hotspots in Ring
                               (the "Obama" problem)




                                                        46
Riak Search: Term Partitioning

Term-partitioning is the most viable approach for our beta
clients’ needs: high throughput on Really Big Datasets.
Optimizations:
     • Term splitting to reduce hot spots
     • Bloom filters & caching to save query-time bandwidth
     • Batching to save query-time & index-time bandwidth
Support for either approach eventually.




                                                             47
Part Four
 Review




            48
Riak Search turns this...


                  "Converse AND Shoes"




                              WTF!? I'm a
                               KV store!



        CLIENT                             RIAK
                                                  49
...into this...


                  "Converse AND Shoes"




                                Gladly!



           CLIENT                         RIAK
                                                 50
...into this...


                  "Converse AND Shoes"




                    Keys or Objects




           CLIENT                        RIAK
                                                51
...while keeping operations easy.




        Your            Riak
                                    Riak
    Application       Search




                                           52
Thanks! Questions?

                        Search Team:

                        John Muellerleile - @jrecursive

                        Rusty Klophaus - @rklophaus

                        Kevin Smith - @kevsmith



Currently working with a small set of Beta users.
Open-source release planned for Q3.
www.basho.com

Más contenido relacionado

Destacado

Destacado (6)

Riak Search - The Next Generation
Riak Search - The Next GenerationRiak Search - The Next Generation
Riak Search - The Next Generation
 
Riak Search 2: Yokozuna
Riak Search 2: YokozunaRiak Search 2: Yokozuna
Riak Search 2: Yokozuna
 
Building Distributed Systems With Riak and Riak Core
Building Distributed Systems With Riak and Riak CoreBuilding Distributed Systems With Riak and Riak Core
Building Distributed Systems With Riak and Riak Core
 
Riak Core: Building Distributed Applications Without Shared State
Riak Core: Building Distributed Applications Without Shared StateRiak Core: Building Distributed Applications Without Shared State
Riak Core: Building Distributed Applications Without Shared State
 
Masterless Distributed Computing with Riak Core - EUC 2010
Masterless Distributed Computing with Riak Core - EUC 2010Masterless Distributed Computing with Riak Core - EUC 2010
Masterless Distributed Computing with Riak Core - EUC 2010
 
Convergent Replicated Data Types in Riak 2.0
Convergent Replicated Data Types in Riak 2.0Convergent Replicated Data Types in Riak 2.0
Convergent Replicated Data Types in Riak 2.0
 

Similar a Riak Search - Berlin Buzzwords 2010

Essential action script 3.0
Essential action script 3.0Essential action script 3.0
Essential action script 3.0
UltimateCodeX
 
Introducing Riak
Introducing RiakIntroducing Riak
Introducing Riak
Kevin Smith
 
Introducing Riak
Introducing RiakIntroducing Riak
Introducing Riak
Kevin Smith
 
Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...
Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...
Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...
Lucidworks
 
MonoRails - GoGaRuCo 2012
MonoRails - GoGaRuCo 2012MonoRails - GoGaRuCo 2012
MonoRails - GoGaRuCo 2012
jackdanger
 

Similar a Riak Search - Berlin Buzzwords 2010 (14)

Riak Search - Erlang Factory London 2010
Riak Search - Erlang Factory London 2010Riak Search - Erlang Factory London 2010
Riak Search - Erlang Factory London 2010
 
Riak - From Small to Large - StrangeLoop
Riak - From Small to Large - StrangeLoopRiak - From Small to Large - StrangeLoop
Riak - From Small to Large - StrangeLoop
 
Avatara: OLAP for Web-scale Analytics Products
Avatara: OLAP for Web-scale Analytics Products Avatara: OLAP for Web-scale Analytics Products
Avatara: OLAP for Web-scale Analytics Products
 
Essential action script 3.0
Essential action script 3.0Essential action script 3.0
Essential action script 3.0
 
Eclipse IDE for Scala (2.9 story)
Eclipse IDE for Scala (2.9 story)Eclipse IDE for Scala (2.9 story)
Eclipse IDE for Scala (2.9 story)
 
Personalized Search on the Largest Flash Sale Site in America
Personalized Search on the Largest Flash Sale Site in AmericaPersonalized Search on the Largest Flash Sale Site in America
Personalized Search on the Largest Flash Sale Site in America
 
Introducing Riak
Introducing RiakIntroducing Riak
Introducing Riak
 
Introducing Riak
Introducing RiakIntroducing Riak
Introducing Riak
 
Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...
Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...
Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...
 
MonoRails - GoGaRuCo 2012
MonoRails - GoGaRuCo 2012MonoRails - GoGaRuCo 2012
MonoRails - GoGaRuCo 2012
 
When OLAP Meets Real-Time, What Happens in eBay?
When OLAP Meets Real-Time, What Happens in eBay?When OLAP Meets Real-Time, What Happens in eBay?
When OLAP Meets Real-Time, What Happens in eBay?
 
Jazoon 2011 - Smart EAI with Apache Camel
Jazoon 2011 - Smart EAI with Apache CamelJazoon 2011 - Smart EAI with Apache Camel
Jazoon 2011 - Smart EAI with Apache Camel
 
Riak CS in Cloudstack
Riak CS in CloudstackRiak CS in Cloudstack
Riak CS in Cloudstack
 
TorqueBox for Rubyists
TorqueBox for RubyistsTorqueBox for Rubyists
TorqueBox for Rubyists
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

Riak Search - Berlin Buzzwords 2010