Lucene KV-Store
A high-performance key-value store




Mark Harwood
Benefits
High-speed reads and writes of key/value pairs, sustained over growing volumes of data

Read costs are always 0 or 1 disk seeks

Efficient use of memory

Simple file structures with strong durability guarantees
Why “Lucene” KV store?
Uses Lucene’s “Directory” APIs for low-level file access

Based on Lucene’s concepts of segment files, soft deletes, background merges, commit points etc., BUT a fundamentally different form of index

I’d like to offer it to the Lucene community as a “contrib” module because they have a track record in optimizing these same concepts (and could potentially make use of it in Lucene?)
Example benchmark results

[Benchmark chart not captured in the transcript]

Note: regular Lucene search indexes follow the same trajectory as the “Common KV Store” when it comes to lookups on a store with millions of keys.
KV-Store High-level Design

Map held in RAM:

   Key hash (int)   Disk pointer (int)
   23434            0
   6545463          10
   874382           22

Disk record layout:

   Num keys with hash (VInt) | Key 1 size (VInt) | Key 1 (byte[]) | Value 1 size (VInt) | Value 1 (byte[]) | Key/values 2,3,4…
   1                         | 3                 | Foo            | 3                   | Bar              |
   2                         | 5                 | Hello          | 5                   | World            | 7,Bonjour,8,Le Mon..

Most hashes have only one associated key and value. Some hashes will have key collisions, requiring the use of the extra columns here.
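As a sketch, the record layout above can be encoded and decoded like this (Python used for brevity; `write_vint`/`read_vint` mirror Lucene's variable-length int encoding, and the function names are illustrative, not part of the actual implementation):

```python
def write_vint(buf: bytearray, value: int) -> None:
    # Lucene-style VInt: 7 bits per byte, high bit set on all but the last byte
    while value >= 0x80:
        buf.append((value & 0x7F) | 0x80)
        value >>= 7
    buf.append(value)

def read_vint(buf: bytes, pos: int) -> tuple:
    # Returns (decoded value, new position)
    shift = result = 0
    while True:
        b = buf[pos]; pos += 1
        result |= (b & 0x7F) << shift
        if b < 0x80:
            return result, pos
        shift += 7

def encode_bucket(pairs: list) -> bytes:
    # pairs: all (key bytes, value bytes) sharing one hash
    buf = bytearray()
    write_vint(buf, len(pairs))                 # num keys with hash
    for key, value in pairs:
        write_vint(buf, len(key)); buf += key       # key size + key bytes
        write_vint(buf, len(value)); buf += value   # value size + value bytes
    return bytes(buf)
```

Encoding the first example row, `encode_bucket([(b"Foo", b"Bar")])`, yields the byte sequence `[1][3]Foo[3]Bar`, matching the table above.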
Read logic (pseudo code)

int keyHash=hash(searchKey);
int filePointer=ramMap.get(keyHash);
if filePointer is null
      return null for value;
file.seek(filePointer);
int numKeysWithHash=file.readInt();
for numKeysWithHash
{
      storedKey=file.readKeyData();
      if(storedKey==searchKey)
            return file.readValueData();
      file.readValueData(); //skip over the value of a non-matching key
}

There is a guaranteed maximum of one random disk seek for any lookup. With a good hashing function, most lookups will only need to go once around this loop.
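The pseudocode above can be sketched as runnable Python, assuming the record layout from the design slide (the `ram_map` name and the toy hash function are illustrative):

```python
import io

def read_vint(f) -> int:
    # Lucene-style VInt: 7 bits per byte, high bit = continuation
    shift = result = 0
    while True:
        b = f.read(1)[0]
        result |= (b & 0x7F) << shift
        if b < 0x80:
            return result
        shift += 7

def lookup(ram_map: dict, f, search_key: bytes, hash_fn):
    file_pointer = ram_map.get(hash_fn(search_key))
    if file_pointer is None:
        return None                      # hash absent: zero disk seeks
    f.seek(file_pointer)                 # the single guaranteed random seek
    for _ in range(read_vint(f)):        # number of keys sharing this hash
        stored_key = f.read(read_vint(f))
        value = f.read(read_vint(f))     # read (or skip) the value either way
        if stored_key == search_key:
            return value
    return None

# Usage against an in-memory "file" holding one record: [1][3]Foo[3]Bar
store = io.BytesIO(b"\x01\x03Foo\x03Bar")
toy_hash = lambda k: sum(k)              # illustrative only, not a good hash
ram_map = {toy_hash(b"Foo"): 0}
```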
Write logic (pseudo code)

int keyHash=hash(newKey);
int oldFilePointer=ramMap.get(keyHash);
ramMap.put(keyHash,file.length());
if oldFilePointer is null
{
       file.append(1);//only 1 key with hash
       file.append(newKey);
       file.append(newValue);
}else
{
       file.seek(oldFilePointer);
       int numOldKeys=file.readInt();
       Map tmpMap=file.readNextNKeysAndValues(numOldKeys);
       tmpMap.put(newKey,newValue);
       file.append(tmpMap.size());
       file.appendKeysAndValues(tmpMap);
}

Updates always append to the end of the file, leaving older values unreferenced. In the case of any key collisions, previously stored values are copied to the new position at the end of the file along with the new content.
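The append-only write path can be sketched as follows (again an illustrative Python rendering, not the actual implementation; an in-memory `io.BytesIO` stands in for the disk file):

```python
import io

def write_vint(f, value):
    # Lucene-style VInt encoding, written to a file-like object
    while value >= 0x80:
        f.write(bytes([(value & 0x7F) | 0x80]))
        value >>= 7
    f.write(bytes([value]))

def read_vint(f):
    shift = result = 0
    while True:
        b = f.read(1)[0]
        result |= (b & 0x7F) << shift
        if b < 0x80:
            return result
        shift += 7

def put(ram_map: dict, f, key: bytes, value: bytes, hash_fn):
    key_hash = hash_fn(key)
    old_pointer = ram_map.get(key_hash)
    pairs = {}
    if old_pointer is not None:
        # Collision or update: copy the old bucket forward to the new position
        f.seek(old_pointer)
        for _ in range(read_vint(f)):
            k = f.read(read_vint(f))
            pairs[k] = f.read(read_vint(f))
    pairs[key] = value                    # add, or overwrite an updated key
    f.seek(0, io.SEEK_END)                # updates always append
    ram_map[key_hash] = f.tell()          # map now points at the new bucket
    write_vint(f, len(pairs))             # num keys with this hash
    for k, v in pairs.items():
        write_vint(f, len(k)); f.write(k)
        write_vint(f, len(v)); f.write(v)
```

Note the old bucket bytes are left in place, unreferenced; reclaiming that space is what the background merges below are for.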
Segment generations: writes

[Diagram: a row of RAM hash→pointer maps, each paired with a key/value disk store, numbered 0 (old) through 3 (new)]

Writes append to the end of the latest generation segment until it reaches a set size; then it is made read-only and a new segment is created.
Segment generations: reads

[Diagram: the same RAM maps and disk stores, ordered old to new]

Read operations search the in-memory maps in reverse order. The first map found with a hash is expected to have a pointer into its associated file for all the latest keys/values with this hash.
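The reverse-order search can be sketched as (the `(ram_map, store)` pairing is an illustrative simplification of a segment):

```python
def find_latest(segments: list, key_hash: int):
    """segments: list of (ram_map, store) pairs, oldest first.
    Returns (store, pointer) from the newest segment that knows this hash,
    or None if no segment has it."""
    for ram_map, store in reversed(segments):   # newest segment first
        pointer = ram_map.get(key_hash)
        if pointer is not None:
            return store, pointer               # first hit holds the latest values
    return None
```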
Segment generations: merges

[Diagram: read-only segments with many outdated entries being merged into a new segment, 4]

A background thread merges read-only segments with many outdated entries into new, more compact versions.
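One way to sketch the compaction step, under two simplifying assumptions not stated in the slides (segment IDs increase with recency, and each RAM map here holds its bucket bytes directly rather than a file pointer):

```python
def merge_segments(segments: dict, merge_ids: list) -> dict:
    """segments: {seg_id: ram_map}, where ram_map maps hash -> bucket bytes.
    Returns the compact map for the segment replacing the merged ones."""
    # Any hash also present in a segment newer than the merge set is outdated
    superseded = set()
    newest_merged = max(merge_ids)
    for seg_id, ram_map in segments.items():
        if seg_id > newest_merged:
            superseded |= ram_map.keys()
    merged = {}
    for seg_id in sorted(merge_ids):       # oldest first, so newer entries win
        for h, bucket in segments[seg_id].items():
            if h not in superseded:        # drop outdated entries
                merged[h] = bucket
    return merged
```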
Segment generations: durability

[Diagram: RAM maps and disk stores 0 and 4, plus active segment 3, alongside a “segments” file containing:]

   Completed Segment IDs             0,4
   Active Segment ID                 3
   Active segment committed length   423423

Like Lucene, commit operations create a new generation of a “segments” file, the contents of which reflect the committed (i.e. fsync’ed) state of the store.
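A sketch of such a commit step follows. The slides specify only the fields in the “segments” file, so the JSON encoding, file naming, and function names here are illustrative assumptions:

```python
import json, os, tempfile

def commit(dir_path, completed_ids, active_id, active_committed_length, generation):
    # Metadata fields taken from the durability slide; encoding is illustrative
    meta = {
        "completed_segment_ids": completed_ids,
        "active_segment_id": active_id,
        "active_segment_committed_length": active_committed_length,
    }
    path = os.path.join(dir_path, f"segments_{generation}")
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(meta, f)
        f.flush()
        os.fsync(f.fileno())     # durable before it becomes visible
    os.replace(tmp_path, path)   # atomic rename publishes the new commit point
    return path
```

Writing to a temporary file and renaming means a crash mid-commit leaves the previous “segments” generation intact, which is the property a commit point needs.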
Implementation details

JVM needs sufficient RAM for 2 ints for every active key (note: using “modulo N” on the hash can cap the map at N×2 ints, at the cost of more key collisions = more disk IO)

Uses Lucene Directory for:
   Abstraction from choice of file system
   Buffered reads/writes
   Support for VInt encoding of numbers
   Rate-limited merge operations

Borrows successful Lucene concepts:
   Multiple segments flushed then made read-only
   “Segments” file used to list committed content (could potentially support multiple commit points)
   Background merges

Uses LGPL “Trove” for maps of primitives
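The RAM-sizing note above can be made concrete with a back-of-envelope helper (assuming 4-byte ints, so 8 bytes per map entry; the function name is illustrative):

```python
def map_ram_bytes(num_keys: int, modulo_n: int = None) -> int:
    # 2 ints per entry: a 4-byte key hash plus a 4-byte file pointer
    entries = num_keys if modulo_n is None else min(num_keys, modulo_n)
    return entries * 8

# 100M keys needs ~800 MB unbounded; "modulo 10M" caps it at ~80 MB,
# at the cost of ~10 keys per bucket and hence more disk IO per lookup
```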
