Some key-value stores using log-structure
Zhichao Liang
frankey0207@gmail.com
Outline
•   Why log structure?
•   Riak: log-structure hash table
•   RethinkDB: log-structure b-tree
•   LevelDB: log-structure merge tree
•   Conclusion
Log Structure
• A log-structured file system is a file system design first
  proposed in 1988 by John K. Ousterhout and Fred Douglis.
• Designed for high write throughput: all updates to data and
  metadata are written sequentially to a continuous stream,
  called a log.
• Conventional file systems tend
  to lay out files with great care for
  spatial locality and make in-place
  changes to their data structures.
Log Structure for flash memory
• Random writes degrade system performance and shorten
  the lifetime of flash memory.
• Log structure is natively flash-friendly!

[Figure: in-place update on Magnetic Disk vs. Flash Memory. Labels: data 1 to data 4, new data 1, new data 3, free, erased, block, RAM.]
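A toy model can make the erase cost concrete. This is only a sketch under stated assumptions, not a model of any real device: it assumes a flash page cannot be rewritten until its whole block is erased, that an in-place update therefore costs one block erase, and that a log-structured writer fills pre-erased pages and erases a block only once all its pages are used. `PAGES_PER_BLOCK`, `erases_in_place`, and `erases_log_structured` are names invented here, and the cost of copying live data during garbage collection is ignored, so the log-structured figure is a best case.

```python
import math

PAGES_PER_BLOCK = 64  # assumed block geometry, for illustration only


def erases_in_place(num_updates: int) -> int:
    """Each in-place page update forces an erase of the enclosing block:
    copy the block to RAM, erase it, then write it back with the new page."""
    return num_updates


def erases_log_structured(num_updates: int,
                          pages_per_block: int = PAGES_PER_BLOCK) -> int:
    """Updates are appended to pre-erased pages; a block is erased only after
    all of its pages have been consumed and it is reclaimed (best case)."""
    return math.ceil(num_updates / pages_per_block)


if __name__ == "__main__":
    n = 1000
    print("in-place erases:      ", erases_in_place(n))
    print("log-structured erases:", erases_log_structured(n))
```

Under these assumptions the gap grows with the number of pages per block, which is the intuition behind calling log structure flash-friendly.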
Riak ?
• Riak is an open source, highly scalable, fault-tolerant
  distributed database.
• Supported core features:
  - operate in highly distributed environments
  - no single point of failure
  - highly fault-tolerant
  - scales simply and intelligently
  - high data availability
  - low cost of operations
Bitcask
• A Bitcask instance is a directory, and only one
  operating system process will open that Bitcask for
  writing at a given time.
• The active file is only written by appending, which
  means that sequential writes do not require disk
  seeking.
Hash Index: keydir
• A keydir is simply a hash table that maps every key in
  a Bitcask to a fixed-size structure giving the file, offset
  and size of the most recently written entry for that
  key.
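The append-only active file and the keydir can be sketched together in a few lines. This is a minimal illustration, not Bitcask's actual on-disk format: the record layout (two 4-byte lengths followed by key and value), the class name `TinyCask`, and the file name used below are all assumptions made for the example, and there is no checksum, timestamp, or file rotation.

```python
import os
import struct
import tempfile

HEADER = struct.Struct(">II")  # key length, value length (assumed layout)


class TinyCask:
    """Single active file plus an in-memory keydir (hypothetical sketch)."""

    def __init__(self, path: str):
        self.path = path
        self.keydir = {}                 # key -> (file, value offset, value size)
        self.active = open(path, "ab+")  # append mode: writes always go to the end

    def put(self, key: bytes, value: bytes) -> None:
        self.active.seek(0, os.SEEK_END)
        offset = self.active.tell()
        self.active.write(HEADER.pack(len(key), len(value)) + key + value)
        self.active.flush()
        # Point the keydir at the newest entry; older entries stay in the file.
        self.keydir[key] = (self.path, offset + HEADER.size + len(key), len(value))

    def get(self, key: bytes):
        entry = self.keydir.get(key)
        if entry is None:
            return None
        _file, value_offset, size = entry
        self.active.seek(value_offset)   # one seek, one read
        return self.active.read(size)

    def close(self) -> None:
        self.active.close()


if __name__ == "__main__":
    db = TinyCask(os.path.join(tempfile.mkdtemp(), "active.data"))
    db.put(b"k", b"v1")
    db.put(b"k", b"value2")              # the old entry becomes garbage
    print(db.get(b"k"))
    db.close()
```

Note that overwriting a key never touches the old bytes; the file only grows, which is why a merge step is needed.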
Merge
• The merge process iterates over all non-active files
  and produces as output a set of data files containing
  only the “live” or latest versions of each present key.
• During the merge process, for each merged data file,
  a byproduct called a hint file is generated, which can
  be used to make startup and crash recovery easy.
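The merge step can be sketched on in-memory data. This is an illustration under assumptions, not Bitcask's implementation: each non-active file is modeled as a list of `(key, value)` records in write order, a value of `None` stands for a deletion marker, and the hint is modeled as a map from key to the position and size of its value in the merged output. The function name `merge` and these shapes are invented for the example.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[bytes, Optional[bytes]]  # (key, value); None = deleted (assumption)


def merge(immutable_files: List[List[Record]]):
    """Merge non-active files, oldest first, keeping only the newest live
    record per key. Returns (merged_records, hint), where hint maps each key
    to (position in the merged file, value size)."""
    latest: Dict[bytes, Optional[bytes]] = {}
    for records in immutable_files:      # oldest file first
        for key, value in records:       # later records override earlier ones
            latest[key] = value
    # Drop keys whose newest record is a deletion marker.
    merged = [(k, v) for k, v in latest.items() if v is not None]
    # The hint carries locations and sizes only, not the values themselves.
    hint = {k: (pos, len(v)) for pos, (k, v) in enumerate(merged)}
    return merged, hint


if __name__ == "__main__":
    old = [(b"a", b"1"), (b"b", b"2")]
    new = [(b"a", b"11"), (b"b", None), (b"c", b"3")]
    print(merge([old, new]))
```

Because the hint holds one small entry per live key, a keydir could be rebuilt from it at startup without scanning every value in the data files.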
RethinkDB ?
• RethinkDB is a persistent, industrial-strength key-value store
  with full support for the Memcached protocol.
• Powerful technology:
  - Linear scaling across cores
  - Fine-grained durability control
  - Instantaneous recovery on power failure
• Supported core features:
  - Atomic increment/decrement
  - Values up to 10MB in size
  - Multi-GET support
  - Up to one million transactions per second on commodity hardware
Installation & usage
• RethinkDB works on modern 64-bit distributions of
  Linux.
  - Ubuntu 10.04.1 x86_64
  - Ubuntu 10.10 x86_64
  - Red Hat Enterprise Linux 5 x86_64
  - CentOS 5 x86_64
  - SUSE Linux 10

• Running the rethinkdb server:
  - Default installation path: /usr/bin/rethinkdb-1.0
  - ./rethinkdb-1.0 -f /u01/rethinkdb_data
  - ./rethinkdb-1.0 -f /u01/rethinkdb_data -c 4 -p 11500
  - ./rethinkdb-1.0 -f /u01/rethinkdb_data \
                    -f /u03/rethinkdb_data -c 4 -p 11500
The methodology
• First, the lack of mechanical parts makes random reads
  on SSDs highly efficient!
• Second, random writes trigger more erases, making
  these operations expensive and decreasing the drive
  lifetime!
• RethinkDB takes an append-only approach to storing
  data, pioneered by log-structured file systems!
• What are the consequences of append-only?
Append-only consequences
• Benefits:
  - Data Consistency
  - Hot Backups
  - Instantaneous Recovery
  - Easy Replication
  - Lock-Free Concurrency
  - Live Schema Changes
  - Database Snapshots
• Drawbacks:
  1) eliminating data locality requires a larger number of
     disk accesses
  2) a large amount of data quickly becomes obsolete in
     an environment with a heavy insert or update workload
Append-only B-tree
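[Figure: append-only B-tree. Page 1 (key 15) is the root, with children Page 2 (keys 5, 9) and Page 3 (keys 15, 19). In the data file the pages are laid out as Page 1, Page 2, Page 3; after an update, new copies of Page 3 and Page 1 are appended at the end of the file.]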
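The figure's idea, that an update appends new copies of the changed leaf and of every page on the path up to the root, can be sketched as follows. This is a simplified illustration and not RethinkDB's code: the data file is modeled as a Python list of pages, a page offset is a list index, pages never split, and the names `append`, `update`, and `get` are invented for the example. Unlike the figure, children are written before their parent here so that the parent can record their offsets.

```python
import bisect
from typing import List, Tuple

Page = Tuple  # ("leaf", keys, values) or ("node", separator_keys, child_offsets)


def append(data_file: List[Page], page: Page) -> int:
    """Append a page to the end of the file and return its offset (index)."""
    data_file.append(page)
    return len(data_file) - 1


def update(data_file: List[Page], offset: int, key: int, value: str) -> int:
    """Copy-on-write update: returns the offset of the new copy of this page."""
    kind, keys, payload = data_file[offset]
    if kind == "leaf":
        keys, values = list(keys), list(payload)
        i = bisect.bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            values[i] = value                # change the copy, not the old page
        else:
            keys.insert(i, key)
            values.insert(i, value)
        return append(data_file, ("leaf", keys, values))
    children = list(payload)
    i = bisect.bisect_right(keys, key)       # child whose key range covers `key`
    children[i] = update(data_file, children[i], key, value)
    return append(data_file, ("node", list(keys), children))


def get(data_file: List[Page], offset: int, key: int):
    """Look up `key` starting from the page at `offset`."""
    kind, keys, payload = data_file[offset]
    if kind == "leaf":
        i = bisect.bisect_left(keys, key)
        return payload[i] if i < len(keys) and keys[i] == key else None
    return get(data_file, payload[bisect.bisect_right(keys, key)], key)


if __name__ == "__main__":
    f: List[Page] = []
    p2 = append(f, ("leaf", [5, 9], ["a", "b"]))
    p3 = append(f, ("leaf", [15, 19], ["c", "d"]))
    p1 = append(f, ("node", [15], [p2, p3]))
    new_root = update(f, p1, 19, "D")        # appends a new Page 3, then a new Page 1
    print(get(f, new_root, 19), get(f, p1, 19))
```

The old root still describes a complete, consistent tree, which is one way to see why snapshots and recovery come cheaply with this layout, and also why obsolete pages pile up.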
LevelDB ?
• LevelDB is a fast key-value storage library written at
  Google that provides an ordered mapping from string
  keys to string values.
• Supported core features:
  - Data is stored sorted by key
  - Multiple changes can be made in one atomic batch
  - Users can create a transient snapshot to get a consistent
  view of data
  - Data is automatically compressed using the Snappy
  compression library
Installation & usage
• LevelDB works with snappy, which is a compression/
  decompression library.
  - download snappy from http://code.google.com/p/snappy/
  - cd snappy-1.0.4
  - ./configure && make && make install

• It is a library, not a database server!
  - svn checkout http://leveldb.googlecode.com/svn/trunk/ leveldb-read-only
  - cd leveldb-read-only
  - make && cp libleveldb.a /usr/local/lib
  - cp -r include/leveldb /usr/local/include
Log-structure merge tree
• Log file: a log file (*.log) stores a sequence of recent
  updates, and each update is appended to the current
  log file.
• Memtable: an in-memory structure that keeps a copy of
  the current log file.
• Sorted tables: a sorted table (*.sst) stores a sequence
  of entries sorted by key, and each entry is either a
  value for the key, or a deletion marker for the key.

[Figure: reads and writes go through the Memtable in memory; on disk are the log file and the SSTables, organized into Level-0, Level-1, ...]
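The three pieces above (log, memtable, sorted tables) fit together roughly as in the sketch below. It is a toy model under assumptions, not LevelDB's implementation: the log and the tables are plain Python lists rather than files, `None` stands for the deletion marker, the memtable is flushed after a fixed number of keys, and there are no levels or compaction. The class name `TinyLSM` and the parameter `memtable_limit` are invented for the example.

```python
from typing import Dict, List, Optional, Tuple

TOMBSTONE = None  # deletion marker (assumed representation)
Entry = Tuple[str, Optional[str]]


class TinyLSM:
    def __init__(self, memtable_limit: int = 2):
        self.log: List[Entry] = []              # stands in for the *.log file
        self.memtable: Dict[str, Optional[str]] = {}
        self.sstables: List[List[Entry]] = []   # stands in for *.sst files, newest last
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: Optional[str]) -> None:
        self.log.append((key, value))           # 1. append the update to the log
        self.memtable[key] = value              # 2. apply it to the memtable
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def delete(self, key: str) -> None:
        self.put(key, TOMBSTONE)                # a delete is just another write

    def _flush(self) -> None:
        # Write the memtable out as a table sorted by key, then start fresh.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.log = []                           # flushed updates no longer need the log

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):   # newest table wins
            for k, v in table:                  # a real table would be binary-searched
                if k == key:
                    return v                    # may be TOMBSTONE, meaning "deleted"
        return None


if __name__ == "__main__":
    db = TinyLSM()
    db.put("b", "1")
    db.put("a", "2")                            # reaches the limit and flushes
    db.put("a", "3")
    print(db.get("a"), db.get("b"))
```

A read may have to consult several tables before it finds a key, which is the read-side price of keeping every write sequential.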
Conclusion
• Log structure enjoys high write throughput and
  makes data consistency, hot backups, recovery, and
  snapshots easy.
• Log structure eliminates data locality; consequently,
  queries require a larger number of random disk
  accesses.
• A good garbage collection method is very important
  to a log-structured storage system.

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Some key value stores using log-structure

  • 1. LevelDB Riak Some key-value stores using log-structure Zhichao Liang frankey0207@gmail.com
  • 2. Outline • Why log structure? • Riak: log-structure hash table • Rethinkdb: log-structure b-tree • Leveldb: log-structure merge tree • Conclusion
  • 3. Outline • Why log structure? • Riak: log-structure hash table • Rethinkdb: log-structure b-tree • Leveldb: log-structure merge tree • Conclusion
  • 4. Log Structure • A log-structured file system is a file system design first proposed in 1988 by John K. Ousterhout and Fred Douglis. • It is designed for high write throughput: all updates to data and metadata are written sequentially to a continuous stream, called a log. • Conventional file systems, by contrast, tend to lay out files with great care for spatial locality and make in-place changes to their data structures.
  • 5. Log Structure for flash memory • Random writes degrade system performance and shorten the lifetime of flash memory. • Log structure is natively flash-friendly! [Figure: Magnetic Disk vs. Flash Memory. Blocks holding data 1 to data 4 are updated with new data 1 and new data 3; the flash memory side shows erased pages and a copy of the block's data held in RAM.]
  • 6. Outline • Why log structure? • Riak: log-structure hash table • Rethinkdb: log-structure b-tree • Leveldb: log-structure merge tree • Conclusion
  • 7. Riak ? • Riak is an open source, highly scalable, fault-tolerant distributed database. • Supported core features: - operates in highly distributed environments - no single point of failure - highly fault-tolerant - scales simply and intelligently - high data availability - low cost of operations
  • 8. Bitcask • A Bitcask instance is a directory, and only one operating system process will open that Bitcask for writing at a given time. • The active file is only written by appending, which means that sequential writes do not require disk seeking.
  • 9. Hash Index: keydir • A keydir is simply a hash table that maps every key in a Bitcask to a fixed-size structure giving the file, offset and size of the most recently written entry for that key.
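
The append-only active file and the keydir described on slides 8 and 9 can be sketched roughly as follows. This is a toy model, not Bitcask's actual code or file format: the class name `TinyCask`, the record layout in `HEADER`, and the single-file setup are assumptions made for illustration.

```python
import os
import struct
import tempfile

# Hypothetical record layout (not Bitcask's real format):
# 4-byte key length, 4-byte value length, key bytes, value bytes.
HEADER = struct.Struct(">II")


class TinyCask:
    """Toy append-only store with an in-memory hash index (keydir)."""

    def __init__(self, path):
        self.path = path
        self.keydir = {}  # key -> (file, value offset, value size)
        self.active = open(path, "ab+")  # append mode: writes always go to the end

    def put(self, key, value):
        self.active.seek(0, os.SEEK_END)
        start = self.active.tell()
        self.active.write(HEADER.pack(len(key), len(value)) + key + value)
        self.active.flush()
        # Point the keydir at the newest entry; the old bytes stay in the file.
        self.keydir[key] = (self.path, start + HEADER.size + len(key), len(value))

    def get(self, key):
        entry = self.keydir.get(key)
        if entry is None:
            return None
        _file, offset, size = entry
        self.active.seek(offset)  # one seek, then one read of the value
        return self.active.read(size)

    def close(self):
        self.active.close()


def demo():
    with tempfile.TemporaryDirectory() as directory:
        db = TinyCask(os.path.join(directory, "active.data"))
        db.put(b"color", b"red")
        db.put(b"color", b"blue")  # appended; the "red" record is now dead data
        found = db.get(b"color")
        missing = db.get(b"shape")
        file_size = os.path.getsize(db.path)
        db.close()
        return found, missing, file_size
```

Calling `demo()` returns the latest value for `color`, `None` for the unknown key, and a file size that still includes the overwritten record, which is the dead data a later merge would reclaim.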
  • 10. Merge • The merge process iterates over all non-active files and produces as output a set of data files containing only the “live” or latest versions of each present key. • During the merge process, for each merged data file, a byproduct called a hint file is generated, which can be used to make startup and crash recovery easy.
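
A minimal sketch of the merge idea, under simplifying assumptions: each non-active file is modeled as an in-memory list of `(key, value)` records in write order, a value of `None` stands for a deleted key, and the "hint" is a dictionary from key to position and value size. The function name `merge` and these data shapes are hypothetical, chosen only to show how keeping the latest version per key shrinks the data.

```python
def merge(immutable_files):
    """Keep only the latest live value of each key.

    immutable_files: list of files, oldest first. Each file is a list of
    (key, value) records in write order; value None marks a deleted key
    (an assumption of this sketch).
    """
    latest = {}
    for records in immutable_files:
        for key, value in records:
            latest[key] = value  # a later record overrides an earlier one

    merged = [(key, value) for key, value in latest.items() if value is not None]

    # Hint data: enough to rebuild the in-memory index at startup
    # without reading the values themselves.
    hints = {key: (position, len(value))
             for position, (key, value) in enumerate(merged)}
    return merged, hints


old_files = [
    [(b"a", b"1"), (b"b", b"2")],
    [(b"a", b"3"), (b"b", None), (b"c", b"4")],
]
merged_file, hint_data = merge(old_files)
```

Here five records collapse to two live ones (`a` and `c`), and `hint_data` maps each surviving key to where its value sits in the merged output.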
  • 11. Outline • Why log structure? • Riak: log-structure hash table • Rethinkdb: log-structure b-tree • Leveldb: log-structure merge tree • Conclusion
  • 12. RethinkDB ? • RethinkDB is a persistent, industrial-strength key-value store with full support for the Memcached protocol. • Powerful technology: - Linear scaling across cores - Fine-grained durability control - Instantaneous recovery on power failure • Supported core features: - Atomic increment/decrement - Values up to 10MB in size - Multi-GET support - Up to one million transactions per second on commodity hardware
  • 13. Installation & usage • RethinkDB works on modern 64-bit distributions of Linux: Ubuntu 10.04.1 x86_64; Ubuntu 10.10 x86_64; Red Hat Enterprise Linux 5 x86_64; CentOS 5 x86_64; SUSE Linux 10. • Running the rethinkdb server (default installation path: /usr/bin/rethinkdb-1.0): ./rethinkdb-1.0 -f /u01/rethinkdb_data ; ./rethinkdb-1.0 -f /u01/rethinkdb_data -c 4 -p 11500 ; ./rethinkdb-1.0 -f /u01/rethinkdb_data -f /u03/rethinkdb_data -c 4 -p 11500
  • 14. The methodology • First, the lack of mechanical parts makes random reads on SSDs very efficient! • Second, random writes trigger more erases, making these operations expensive and decreasing the drive lifetime! • RethinkDB takes an append-only approach to storing data, pioneered by the log-structured file system! What are the consequences of append-only?
  • 15. Append-only consequences • Benefits: Data Consistency; Hot Backups; Instantaneous Recovery; Easy Replication; Lock-Free Concurrency; Live Schema Changes; Database Snapshots. • Costs: 1) eliminating data locality requires a larger number of disk accesses; 2) a large amount of data quickly becomes obsolete in an environment with a heavy insert or update workload.
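  • 16. Append-only B-tree [Figure: a B-tree with root Page 1 (key 15) and leaf pages Page 2 (keys 5, 9) and Page 3 (keys 15, 19). The data file is shown as a sequence of appended pages: Page 1, Page 2, Page 3, then new copies of Page 3 and Page 1 written at the end of the file.]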
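
The page sequence in the slide (a changed leaf is written again at the end of the file, followed by a new copy of its parent) can be illustrated with path copying. The sketch below swaps the B-tree for a plain binary search tree to stay short, so it is an analogy rather than RethinkDB's actual structure; the class `AppendOnlyTree` and its node layout are assumptions.

```python
class AppendOnlyTree:
    """Toy append-only search tree: nodes are never modified in place."""

    def __init__(self):
        self.log = []     # stands in for the data file; entries are only appended
        self.root = None  # index of the most recent root node in the log

    def _append(self, node):
        self.log.append(node)
        return len(self.log) - 1

    def _insert(self, index, key, value):
        if index is None:
            return self._append((key, value, None, None))
        node_key, node_value, left, right = self.log[index]
        # Every node on the path from the root to the change is copied and
        # appended; untouched subtrees are shared by index.
        if key < node_key:
            return self._append((node_key, node_value, self._insert(left, key, value), right))
        if key > node_key:
            return self._append((node_key, node_value, left, self._insert(right, key, value)))
        return self._append((key, value, left, right))

    def put(self, key, value):
        self.root = self._insert(self.root, key, value)  # new root is written last

    def get(self, key, root=None):
        index = self.root if root is None else root
        while index is not None:
            node_key, node_value, left, right = self.log[index]
            if key == node_key:
                return node_value
            index = left if key < node_key else right
        return None


tree = AppendOnlyTree()
tree.put(15, "a")
tree.put(5, "b")
tree.put(19, "c")
snapshot_root = tree.root  # an old root still describes a consistent tree
tree.put(19, "c2")         # appends a new node for 19 and a new root
```

After the last `put`, `tree.get(19)` sees the new value while `tree.get(19, root=snapshot_root)` still sees the old one, which hints at why snapshots and recovery come cheaply with this layout. The superseded nodes remain in the log as obsolete data until something reclaims them.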
  • 17. Outline • Why log structure? • Riak: log-structure hash table • Rethinkdb: log-structure b-tree • Leveldb: log-structure merge tree • Conclusion
  • 18. LevelDB ? • LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values. • Supported core features: - Data is stored sorted by key - Multiple changes can be made in one atomic batch - Users can create a transient snapshot to get a consistent view of data - Data is automatically compressed using the Snappy compression library
  • 19. Installation & usage • LevelDB works with snappy, which is a compression/decompression library: download snappy from http://code.google.com/p/snappy/ ; cd snappy-1.0.4 ; ./configure && make && make install • It is a library, not a database server! svn checkout http://leveldb.googlecode.com/svn/trunk/ leveldb-read-only ; cd leveldb-read-only ; make && cp libleveldb.a /usr/local/lib && cp -r include/leveldb /usr/local/include
  • 20. Log-structure merge tree • Log file: a log file (*.log) stores a sequence of recent updates and each update is appended to the current log file. • Memtable: an in-memory structure that keeps a copy of the current log file. • Sorted tables: a sorted table (*.sst) stores a sequence of entries sorted by key and each entry is either a value for the key, or a deletion marker for the key. [Figure: reads and writes go through the Memtable in memory; on disk are the log file and the SSTables, arranged in Level-0, Level-1, …]
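
How the three pieces on this slide fit together can be sketched as below. This is an in-memory toy, not LevelDB's implementation: the class `TinyLSM`, the `memtable_limit` parameter, and the use of Python lists in place of *.log and *.sst files are all assumptions, and compaction between levels is left out.

```python
import bisect

TOMBSTONE = object()  # deletion marker stored in place of a value


class TinyLSM:
    """Toy log + memtable + sorted tables, all held in memory."""

    def __init__(self, memtable_limit=2):
        self.log = []        # stands in for the current *.log file
        self.memtable = {}   # in-memory copy of what the log holds
        self.sstables = []   # newest first; each is (sorted keys, values)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.log.append((key, value))  # every update is appended to the log first
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def delete(self, key):
        self.put(key, TOMBSTONE)

    def _flush(self):
        # Write the memtable out as an immutable table sorted by key,
        # then start a fresh memtable and log.
        items = sorted(self.memtable.items(), key=lambda item: item[0])
        keys = [key for key, _ in items]
        values = [value for _, value in items]
        self.sstables.insert(0, (keys, values))
        self.memtable = {}
        self.log = []

    def get(self, key):
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        for keys, values in self.sstables:  # newest table wins
            position = bisect.bisect_left(keys, key)
            if position < len(keys) and keys[position] == key:
                value = values[position]
                return None if value is TOMBSTONE else value
        return None


db = TinyLSM(memtable_limit=2)
db.put("b", "1")
db.put("a", "2")   # memtable is full: flushed as the first sorted table
db.put("b", "3")
db.delete("a")     # flushed as a second, newer sorted table
```

Reads check the memtable and then the sorted tables from newest to oldest, so `db.get("b")` returns the latest value and `db.get("a")` returns `None` because the newer table holds a deletion marker. The older table still contains the stale entries, which is the data a merge step would eventually discard.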
  • 21. Outline • Why log structure? • Riak: log-structure hash table • Rethinkdb: log-structure b-tree • Leveldb: log-structure merge tree • Conclusion
  • 22. Conclusion • Log structure enjoys high write throughput and makes data consistency, hot backups, recovery and snapshots easy. • Log structure eliminates data locality, so queries require a larger number of random disk accesses. • A good garbage collection method is very important to a log-structured storage system.