Lucene KV-Store
A high-performance key-value store

Mark Harwood
Benefits

High-speed reads and writes of key/value pairs, sustained over growing volumes of data

Read costs are always 0 or 1 disk seek

Efficient use of memory

Simple file structures with strong durability guarantees
Why “Lucene” KV store?

Uses Lucene’s “Directory” APIs for low-level file access

Based on Lucene’s concepts of segment files, soft deletes, background merges, commit points etc. BUT a fundamentally different form of index

I’d like to offer it to the Lucene community as a “contrib” module because they have a track record in optimizing these same concepts (and could potentially make use of it in Lucene?)
Example benchmark results

[benchmark chart omitted]

Note: regular Lucene search indexes follow the same trajectory as the “Common KV Store” when it comes to lookups on a store with millions of keys
KV-Store High-level Design

Map held in RAM:

  Key hash (int)   Disk pointer (int)
  23434            0
  6545463          10
  874382           22

Disk record layout:

  Num keys with   Key 1 size   Key 1      Value 1 size   Value 1    Key/values 2,3,4…
  hash (VInt)     (VInt)       (byte[])   (VInt)         (byte[])
  1               3            Foo        3              Bar
  2               5            Hello      5              World      7,Bonjour,8,Le Mon..

Most hashes have only one associated key and value. Some hashes will have key collisions, requiring the use of the extra columns here.
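The record layout above can be sketched in a few lines of Java. This is an illustrative stand-in, not the actual KV-Store code: the class and method names are invented, but the VInt helper mirrors Lucene's 7-bits-per-byte encoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of one disk record ("bucket"): [num keys with hash]
// [key size][key bytes][value size][value bytes]... with VInt-encoded sizes.
public class EntryLayout {

    // VInt encoding as in Lucene: 7 data bits per byte, high bit = "more follows".
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    // Serialise one hash bucket: a pair count followed by each key/value pair.
    static byte[] encodeBucket(String[][] pairs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, pairs.length); // num keys sharing this hash
        for (String[] kv : pairs) {
            byte[] k = kv[0].getBytes(StandardCharsets.UTF_8);
            byte[] v = kv[1].getBytes(StandardCharsets.UTF_8);
            writeVInt(out, k.length);
            out.write(k, 0, k.length);
            writeVInt(out, v.length);
            out.write(v, 0, v.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // The "Foo" -> "Bar" row from the table: one key with this hash.
        byte[] bucket = encodeBucket(new String[][] {{"Foo", "Bar"}});
        // 1 (count) + 1 (key size) + 3 (key) + 1 (value size) + 3 (value) = 9 bytes
        System.out.println(bucket.length);
    }
}
```

For small keys and values the VInt size prefixes cost a single byte each, which is why the format stays compact.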
Read logic (pseudo code)

int keyHash=hash(searchKey);
int filePointer=ramMap.get(keyHash);
if filePointer is null
      return null for value;
file.seek(filePointer);
int numKeysWithHash=file.readInt();
for numKeysWithHash
{
      storedKey=file.readKeyData();
      if(storedKey==searchKey)
            return file.readValueData();
      file.readValueData(); //skip the non-matching key's value
}

There is a guaranteed maximum of one random disk seek for any lookup. With a good hashing function, most lookups will only need to go once around this loop.
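The read path above can be made concrete with a runnable sketch. Assumptions to keep it short: a ByteBuffer stands in for the disk file, plain 4-byte ints replace VInts, and the demo writer stores one key per hash; the names (ReadSketch, putForDemo, etc.) are invented for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class ReadSketch {
    static Map<Integer, Integer> ramMap = new HashMap<>();
    static ByteBuffer file; // stands in for the on-disk key/value store

    // The read path: at most one "seek" (file.position), then a short scan.
    static String get(String searchKey) {
        int keyHash = searchKey.hashCode();
        Integer filePointer = ramMap.get(keyHash);
        if (filePointer == null) return null;  // unknown hash: zero disk seeks
        file.position(filePointer);            // the single random disk seek
        int numKeysWithHash = file.getInt();
        for (int i = 0; i < numKeysWithHash; i++) {
            String storedKey = readString();   // key data
            String value = readString();       // value data (discarded on no match)
            if (storedKey.equals(searchKey)) return value;
        }
        return null;
    }

    static String readString() {
        byte[] b = new byte[file.getInt()];
        file.get(b);
        return new String(b, StandardCharsets.UTF_8);
    }

    // Demo-only writer: records the record's start offset in the RAM map.
    static void putForDemo(ByteBuffer buf, String key, String value) {
        ramMap.put(key.hashCode(), buf.position());
        buf.putInt(1); // num keys with this hash (no collisions in the demo)
        writeString(buf, key);
        writeString(buf, value);
    }

    static void writeString(ByteBuffer buf, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        buf.putInt(b.length);
        buf.put(b);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        putForDemo(buf, "Foo", "Bar");
        putForDemo(buf, "Hello", "World");
        file = buf;
        System.out.println(get("Foo"));     // Bar
        System.out.println(get("missing")); // null
    }
}
```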
Write logic (pseudo code)

int keyHash=hash(newKey);
int oldFilePointer=ramMap.get(keyHash);
ramMap.put(keyHash,file.length()); //new record starts at current end of file
if oldFilePointer is null
{
      file.append(1); //only 1 key with hash
      file.append(newKey);
      file.append(newValue);
}else
{
      file.seek(oldFilePointer);
      int numOldKeys=file.readInt();
      Map tmpMap=file.readNextNKeysAndValues(numOldKeys);
      tmpMap.put(newKey,newValue);
      file.append(tmpMap.size());
      file.appendKeysAndValues(tmpMap);
}

Updates always append to the end of the file, leaving older values unreferenced. In case of any key collisions, previously stored values are copied to the new position at the end of the file along with the new content.
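A runnable sketch of this append-only write path follows. As before, the names are invented and plain ints replace VInts; a ByteArrayOutputStream plays the role of the append-only file, and readBucket re-reads an old record when a hash already exists.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class WriteSketch {
    static Map<Integer, Integer> ramMap = new HashMap<>();
    static ByteArrayOutputStream file = new ByteArrayOutputStream();

    static void put(String newKey, String newValue) {
        int keyHash = newKey.hashCode();
        Integer oldFilePointer = ramMap.get(keyHash);
        ramMap.put(keyHash, file.size()); // point at the record about to be appended
        if (oldFilePointer == null) {
            Map<String, String> single = new LinkedHashMap<>();
            single.put(newKey, newValue);
            appendBucket(single);
        } else {
            // Collision or update: copy the old bucket forward with the new entry.
            Map<String, String> tmpMap = readBucket(oldFilePointer);
            tmpMap.put(newKey, newValue);
            appendBucket(tmpMap);
        }
    }

    static void appendBucket(Map<String, String> entries) {
        writeInt(entries.size()); // num keys with this hash
        for (Map.Entry<String, String> e : entries.entrySet()) {
            writeBytes(e.getKey());
            writeBytes(e.getValue());
        }
    }

    static Map<String, String> readBucket(int pointer) {
        ByteBuffer buf = ByteBuffer.wrap(file.toByteArray());
        buf.position(pointer);
        Map<String, String> entries = new LinkedHashMap<>();
        int n = buf.getInt();
        for (int i = 0; i < n; i++) entries.put(readString(buf), readString(buf));
        return entries;
    }

    static void writeInt(int i) {
        file.write(i >>> 24); file.write(i >>> 16); file.write(i >>> 8); file.write(i);
    }

    static void writeBytes(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        writeInt(b.length);
        file.write(b, 0, b.length);
    }

    static String readString(ByteBuffer buf) {
        byte[] b = new byte[buf.getInt()];
        buf.get(b);
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        put("Foo", "Bar");
        put("Foo", "Baz"); // update: appends a new record, never overwrites
        System.out.println(readBucket(ramMap.get("Foo".hashCode())).get("Foo")); // Baz
        // The superseded "Bar" record is still on disk, unreferenced,
        // waiting to be reclaimed by a background merge.
    }
}
```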
Segment generations: writes

Maps held in RAM (hash → pointer), one per key/value disk store:

  Segment 0 (old)    Segment 1          Segment 2          Segment 3 (new)
  23434     0        203765    0        23434     0        152433    0
  65463     10       37594     10       65463     10       742297    10
  …         …        …         …        …         …        …         …

Writes append to the end of the latest-generation segment until it reaches a set size; then it is made read-only and a new segment is created.
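The rollover rule can be sketched as a few lines of bookkeeping. The class, field names, and the tiny size limit are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of segment rollover: writes land in the newest segment until it
// reaches a set size, then it is sealed read-only and a new one is started.
public class RolloverSketch {
    static final int MAX_SEGMENT_BYTES = 64; // tiny, for demonstration only

    static class Segment {
        final int id;
        int bytesUsed = 0;
        boolean readOnly = false;
        Segment(int id) { this.id = id; }
    }

    static List<Segment> segments = new ArrayList<>(List.of(new Segment(0)));

    static void append(int recordBytes) {
        Segment latest = segments.get(segments.size() - 1);
        if (latest.bytesUsed + recordBytes > MAX_SEGMENT_BYTES) {
            latest.readOnly = true;              // seal the full segment
            latest = new Segment(latest.id + 1); // and start a new generation
            segments.add(latest);
        }
        latest.bytesUsed += recordBytes;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) append(20); // ten 20-byte records
        System.out.println(segments.size());     // 4: three full segments + one active
    }
}
```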
Segment generations: reads

Maps held in RAM (hash → pointer), one per key/value disk store:

  Segment 0 (old)    Segment 1          Segment 2          Segment 3 (new)
  23434     0        203765    0        23434     0        152433    0
  65463     10       37594     10       65463     10       742297    10
  …         …        …         …        …         …        …         …

Read operations search the in-memory maps in reverse order. The first map found with a hash is expected to have a pointer into its associated file for all the latest keys/values with this hash.
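The newest-first lookup can be sketched as follows. To keep it self-contained the maps hold values directly rather than disk pointers, and all names are illustrative.

```java
import java.util.List;
import java.util.Map;

// Sketch of the segmented read path: per-segment RAM maps are searched
// newest-first; the first map containing the hash wins, because the newest
// segment holding a hash carries the latest values for every key with it.
public class SegmentedReadSketch {
    // Stand-in: hash -> value (the real design maps hash -> disk pointer).
    static List<Map<Integer, String>> segmentMaps = List.of(
        Map.of(23434, "old-A", 65463, "old-B"), // segment 0 (oldest)
        Map.of(203765, "C"),                    // segment 1
        Map.of(23434, "new-A", 65463, "new-B")  // segment 2 (newest)
    );

    static String lookup(int keyHash) {
        for (int i = segmentMaps.size() - 1; i >= 0; i--) { // newest first
            String hit = segmentMaps.get(i).get(keyHash);
            if (hit != null) return hit; // first (newest) map with this hash wins
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(lookup(23434));  // new-A: segment 2 shadows segment 0
        System.out.println(lookup(203765)); // C: only present in segment 1
    }
}
```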
Segment generations: merges

Maps held in RAM (hash → pointer), one per key/value disk store:

  Segment 0 (old)    Segment 1          Segment 2          Segment 3 (new)
  23434     0        203765    0        23434     0        152433    0
  65463     10       37594     10       65463     10       742297    10
  …         …        …         …        …         …        …         …

A background thread merges read-only segments with many outdated entries into new, more compact versions (a new segment 4 in the original diagram).
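The compaction step can be sketched with maps standing in for segment files; all names here are assumptions, not the actual merge code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a background merge: two read-only segments are rewritten as one
// compact segment keeping only the newest value per hash, so space held by
// outdated, unreferenced entries is reclaimed.
public class MergeSketch {
    // Merge older entries first so newer entries overwrite outdated ones.
    static Map<Integer, String> merge(Map<Integer, String> older,
                                      Map<Integer, String> newer) {
        Map<Integer, String> compact = new HashMap<>(older);
        compact.putAll(newer); // the newer segment wins for shared hashes
        return compact;
    }

    public static void main(String[] args) {
        Map<Integer, String> seg0 = Map.of(23434, "stale", 65463, "B");
        Map<Integer, String> seg2 = Map.of(23434, "fresh");
        Map<Integer, String> seg4 = merge(seg0, seg2); // the new compact segment
        System.out.println(seg4.size());     // 2: the stale entry was reclaimed
        System.out.println(seg4.get(23434)); // fresh
    }
}
```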
Segment generations: durability

Maps held in RAM (hash → pointer), one per key/value disk store:

  Segment 0          Segment 4          Segment 3 (active)
  23434     0        203765    0        152433    0
  65463     10       37594     10       742297    10
  …         …        …         …        …         …

“Segments” file contents:

  Completed Segment IDs              0,4
  Active Segment ID                  3
  Active segment committed length    423423

Like Lucene, commit operations create a new generation of a “segments” file, the contents of which reflect the committed (i.e. fsync’ed) state of the store.
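A commit might look like the following sketch. The file layout (a plain key=value text file) and all names are invented for illustration; only the idea — write a new generation of a small “segments” file and fsync it — comes from the slide.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class CommitSketch {
    // Write a new generation of the "segments" file and force it to disk,
    // so recovery ignores anything appended after this commit point.
    static void commit(Path dir, long generation, List<Integer> completedIds,
                       int activeId, long activeCommittedLength) throws IOException {
        Path segmentsFile = dir.resolve("segments_" + generation);
        String content = "completedSegmentIds=" + completedIds + "\n"
                       + "activeSegmentId=" + activeId + "\n"
                       + "activeSegmentCommittedLength=" + activeCommittedLength + "\n";
        try (FileChannel ch = FileChannel.open(segmentsFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap(content.getBytes(StandardCharsets.UTF_8)));
            ch.force(true); // fsync: this commit point now survives a crash
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("kvstore");
        // The example state from the slide: segments 0 and 4 complete,
        // segment 3 active and durable up to byte 423423.
        commit(dir, 1, List.of(0, 4), 3, 423423L);
        System.out.println(Files.readString(dir.resolve("segments_1")));
    }
}
```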
Implementation details

JVM needs sufficient RAM for 2 ints for every active key (note: using “modulo N” on the hash can cap RAM at N x 2 ints, at the cost of more key collisions and therefore more disk IO)

Uses the Lucene Directory for:
   Abstraction from the choice of file system
   Buffered reads/writes
   Support for VInt encoding of numbers
   Rate-limited merge operations

Borrows successful Lucene concepts:
   Multiple segments, flushed then made read-only
   “Segments” file used to list committed content (could potentially support multiple commit points)
   Background merges

Uses the LGPL “Trove” library for maps of primitives
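The “modulo N” trade-off can be demonstrated in a few lines; N and the class name are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the modulo-N trade-off: bucketing hashes caps the RAM map at N
// entries (N x 2 ints), but distinct keys then collide more often, so more
// lookups must scan multiple key/value pairs on disk before matching.
public class ModuloSketch {
    static final int N = 16; // made-up cap for demonstration

    static int bucketOf(String key) {
        return Math.floorMod(key.hashCode(), N); // modulo-N on the full hash
    }

    public static void main(String[] args) {
        Map<Integer, Integer> keysPerBucket = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            keysPerBucket.merge(bucketOf("key" + i), 1, Integer::sum);
        }
        // The RAM map can never exceed N entries...
        System.out.println(keysPerBucket.size() <= N); // true
        // ...but roughly 1000/16 keys now share each bucket, lengthening
        // the on-disk scan inside each record.
    }
}
```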
