SlideShare una empresa de Scribd logo
1 de 27
FlashCache



Mohan Srinivasan
April 2011
FlashCache at Facebook
▪   What
        ▪   We want to use some Flash storage on existing servers
        ▪   We want something that is simple to deploy and use
        ▪   Our IO access patterns benefit from a cache

▪   Who
    ▪   Mohan Srinivasan – design and implementation
    ▪   Paul Saab – platform and MySQL integration
    ▪   Michael Jiang – testing, performance and capacity planning
    ▪   Mark Callaghan - benchmarketing
Introduction
▪   Block cache for Linux - write back and write through modes
▪   Layered below the filesystem at the top of the storage stack
▪   Cache Disk Blocks on fast persistent storage (Flash, SSD)
▪   Loadable Linux Kernel module, built using the Device Mapper (DM)
▪   Primary use case InnoDB, but general purpose
▪   Based on dm-cache by Prof. Ming
Caching Modes
Write Back                              Write Through, Write Around
 ▪   Lazy writing to disk                ▪   Non-persistent
 ▪   Persistent across reboot            ▪   Are you a pessimist?
 ▪   Persistent across device removal
Cache Structure
▪   Set associative hash
▪   Hash with fixed sized buckets (sets) with linear probing within a set
▪   512-way set associative by default
▪   dbn: Disk Block Number, address of block on disk
▪   Set = (dbn / block size / set size) mod (number of sets)
▪   Sequential range of dbns map onto a single sets
Cache Structure
                         .                .
      Set 0              .                .
                         .                .
                         .                .
                         .                .
                         .                .

                                       Block 0
                 Cache set Block 0
                                          .
                                          .
                                          .

                                      Block 511

                                          .
      SET i                               .
                                          .       N set’s worth
                                          .
                                          .       Of blocks
                                          .

                                       Block 0

                                          .
                Cache set Block 511       .
                                          .
                         .
                         .
                                      Block 511
                         .
                         .                .
                         .                .
      Set N-1
Replacement and Memory Footprint
▪   Replacement policy is FIFO (default) or LRU within a set
▪   Switch on the fly between FIFO/LRU (sysctl)
▪   Metadata per cache block: 16 bytes in memory, 16 bytes on ssd
▪   On ssd metadata per-slot
     ▪   <dbn, block state>

▪   In memory metadata per-slot:
     ▪   <dbn, block state, LRU chain pointers, misc>
Reads
▪   Compute cache set for dbn
▪   Cache Hit
     ▪   Verify checksums if configured
     ▪   Serve read out of cache

▪   Cache Miss
     ▪   Find free block or reclaim block based on replacement policy
     ▪   Read block from disk and populate cache
     ▪   Update block checksum if configured
     ▪   Return data to user
Write Through - writes
▪   Compute cache set for dbn
▪   Cache hit
      ▪   Get cached block

▪   Cache miss
      ▪   Find free block or reclaim block

▪   Write data block to disk
▪   Write data block to cache
▪   Update block checksum
Write Back - writes
▪   Compute cache set for dbn
▪   Cache Hit
     ▪   Write data block into cache
     ▪   If data block not DIRTY, synchronously update on-ssd cache metadata to
         mark block DIRTY

▪   Cache miss
     ▪   Find free block or reclaim block based on replacement policy
     ▪   Write data block to cache
     ▪   Synchronously update on-ssd cache metadata to mark block DIRTY
Small or uncacheable requests
▪   First invalidate blocks that overlap the requests
      ▪   There are at most 2 such blocks
      ▪   For Write Back, if the overlapping blocks are DIRTY they are cleaned
          first then invalidated

▪   Uncacheable full block reads are served from cache in case of a cache
    hit.
▪   Perform disk IO
▪   Repeat invalidation to close races which might have caused the block
    to be cached while the disk IO was in progress
Write Back policy
▪   Dirty blocks not recently accessed
      ▪   A clock-like algorithm picks off Dirty blocks not accessed in the last
          15 minutes (configurable) for cleaning
▪   When dirty blocks in a set exceeds configurable threshold, clean some
    blocks
      ▪   Blocks selected for writeback based on replacement policy
      ▪   Default dirty threshold 20%. Set higher for write heavy workloads

▪   Sort selected blocks and pickup any other blocks in set that can be
    contiguously merged with these
▪   Writes merged by the IO scheduler
Write Back – overheads
▪   In-Memory cache metadata memory footprint
     ▪   300GB/4KB cache -> ~1.2GB
     ▪   160GB/4KB cache -> ~640MB

▪   Cache metadata writes/file system write
     ▪   Worst case is 2 cache metadata updates per write
           ▪   (VALID->DIRTY, DIRTY->VALID)
     ▪   Average case is much lower because of cache write hits and batching of
         cache metadata updates
Write Through/Around – cache overheads
▪   In-Memory Cache metadata footprint
     ▪   300GB/4KB cache -> ~1.2GB
     ▪   160GB/4KB cache -> ~640MB

▪   Cache metadata writes per file system write
     ▪   1 cache data write per file system write (Write Through)
     ▪   No overhead (for Write Around)
Write Back – metadata updates
▪   Cache (on-ssd) metadata only updated on writes and block cleanings
    (VALID->DIRTY or DIRTY->VALID)
▪   Cache (on-ssd) metadata not updated on cache population for reads
▪   Reload after an unclean shutdown only loads DIRTY blocks
▪   Fast and Slow cache shutdowns
     ▪   Only metadata is written on fast shutdown. Reload loads both dirty and
         clean blocks
     ▪   Slow shutdown writes all dirty blocks to disk first, then writes out
         metadata to the ssd. Reload only loads clean blocks.

▪   Metadata updates to multiple blocks in same sector are batched
Torn Page Problem
▪   Handle partial block write caused by power failure or other causes
▪   Problem exists for Flashcache in Write Back mode
▪   Detected via block checksums
     ▪   Checksums are disabled by default
     ▪   Pages with bad checksums are not used

▪   Checksums increase cache metadata writes and memory footprint
     ▪   Update cache metadata checksums on DIRTY->DIRTY block transitions
         for Write Back
     ▪   Each per-cache slot grows by 8 bytes to hold the checksum (a 50%
         increase from 16 bytes to 24 bytes for the Write Back case).
Cache controls for Write Back
▪   Work best with O_DIRECT file access
▪   Global modes – Cache All or Cache Nothing
     ▪   Cache All has a blacklist of pids and tgids
     ▪   Cache Nothing has a whitelist of pids and tgids

▪   tgids can be used to tag all pthreads in the group as cacheable
▪   Exceptions for threads within a group are supported
▪   List changes done via FlashCache ioctls
▪   Cache can be read but is not written for non-cacheable tgids and pids
▪   We modified MySQL and scp to use this support
Cache Nothing policy
▪   If the thread id is whitelisted, cache all IOs for this thread
▪   If the tgid is whitelisted, cache all IOs for this thread
▪   If the thread id is blacklisted do not cache IOs
Cache control example
▪   We use Cache Nothing mode for MySQL servers
▪   The mysqld tgid is added to the whitelist
     ▪   All IO done by it is cacheable
     ▪   Writes done by other processes do not update the cache

▪   Full table scans done by mysqldump use a hint that directs mysqld to
    add the query’s thread id to the blacklist to avoid wiping FlashCache
     ▪   select /* SQL_NO_FCACHE */ pk, col1, col2 from foobar
Utilities
▪   flashcache_create
     ▪   flashcache_create -b 4k -s 10g mysql /dev/flash /dev/disk

▪   flashcache_destroy
     ▪   flashcache_destory /dev/flash

▪   flashcache_load
sysctl –a | grep flash
dev.flashcache.cache_all = 0          dev.flashcache.do_pid_expiry = 0

dev.flashcache.fast_remove = 0        dev.flashcache.max_clean_ios_set = 2

dev.flashcache.zero_stats = 1         dev.flashcache.max_clean_ios_total = 4

dev.flashcache.write_merge = 1        dev.flashcache.debug = 0

dev.flashcache.reclaim_policy = 0     dev.flashcache.dirty_thresh_pct = 20

dev.flashcache.pid_expiry_secs = 60   dev.flashcache.stop_sync = 0

dev.flashcache.max_pids = 100         dev.flashcache.do_sync = 0
Removing FlashCache
▪   umount /data
▪   dmesetup remove mysql
▪   flashcache_destroy /dev/flash
cat /proc/flashcache_stats
reads=4 writes=0 read_hits=0 read_hit_percent=0 write_hits=0
 write_hit_percent=0 dirty_write_hits=0 dirty_write_hit_percent=0
 replacement=0 write_replacement=0 write_invalidates=0
 read_invalidates=0 pending_enqueues=0 pending_inval=0
 metadata_dirties=0 metadata_cleans=0 cleanings=0 no_room=0
 front_merge=0 back_merge=0 nc_pid_adds=0 nc_pid_dels=0
 nc_pid_drops=0 nc_expiry=0 disk_reads=0 disk_writes=0
 ssd_reads=0 ssd_writes=0 uncached_reads=169
 uncached_writes=128
Future Work
▪   Cache mirroring
        ▪   SW RAID 0 block device as a cache

▪   Online cache resize
        ▪   No shutdown and recreate

▪   Support for ATA trim
    ▪   Discard blocks no longer in use
▪   Fix the torn page problem
    ▪   Use shadow pages
Resources
▪   GitHub : facebook/flashcache
▪   Mailing list : flashcache-dev@googlegroups.com
▪   http://facebook.com/MySQLatFacebook
▪   Email :
        ▪   mohan@fb.com (Mohan Srinivasan)
        ▪   ps@fb.com (Paul Saab)
        ▪   michael@fb.com (Michael Jiang)
        ▪   mcallaghan@fb.com (Mark Callaghan)
(c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc.. All rights reserved. 1.0

Más contenido relacionado

La actualidad más candente

PostgreSQL Streaming Replication Cheatsheet
PostgreSQL Streaming Replication CheatsheetPostgreSQL Streaming Replication Cheatsheet
PostgreSQL Streaming Replication CheatsheetAlexey Lesovsky
 
12cR2 Single-Tenant: Multitenant Features for All Editions
12cR2 Single-Tenant: Multitenant Features for All Editions12cR2 Single-Tenant: Multitenant Features for All Editions
12cR2 Single-Tenant: Multitenant Features for All EditionsFranck Pachot
 
MYSQLDUMP & ZRM COMMUNITY (EN)
MYSQLDUMP & ZRM COMMUNITY (EN)MYSQLDUMP & ZRM COMMUNITY (EN)
MYSQLDUMP & ZRM COMMUNITY (EN)Cédric P
 
GOTO 2013: Why Zalando trusts in PostgreSQL
GOTO 2013: Why Zalando trusts in PostgreSQLGOTO 2013: Why Zalando trusts in PostgreSQL
GOTO 2013: Why Zalando trusts in PostgreSQLHenning Jacobs
 
What's new in Jewel and Beyond
What's new in Jewel and BeyondWhat's new in Jewel and Beyond
What's new in Jewel and BeyondSage Weil
 
20171101 taco scargo luminous is out, what's in it for you
20171101 taco scargo   luminous is out, what's in it for you20171101 taco scargo   luminous is out, what's in it for you
20171101 taco scargo luminous is out, what's in it for youTaco Scargo
 
Being closer to Cassandra by Oleg Anastasyev. Talk at Cassandra Summit EU 2013
Being closer to Cassandra by Oleg Anastasyev. Talk at Cassandra Summit EU 2013Being closer to Cassandra by Oleg Anastasyev. Talk at Cassandra Summit EU 2013
Being closer to Cassandra by Oleg Anastasyev. Talk at Cassandra Summit EU 2013odnoklassniki.ru
 
Varnish in action phpday2011
Varnish in action phpday2011Varnish in action phpday2011
Varnish in action phpday2011Combell NV
 
Caching and tuning fun for high scalability @ phpBenelux 2011
Caching and tuning fun for high scalability @ phpBenelux 2011Caching and tuning fun for high scalability @ phpBenelux 2011
Caching and tuning fun for high scalability @ phpBenelux 2011Wim Godden
 
Varnish in action phpuk11
Varnish in action phpuk11Varnish in action phpuk11
Varnish in action phpuk11Combell NV
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing GuideJose De La Rosa
 
Control your service resources with systemd
 Control your service resources with systemd  Control your service resources with systemd
Control your service resources with systemd Marian Marinov
 
Performance Whack-a-Mole Tutorial (pgCon 2009)
Performance Whack-a-Mole Tutorial (pgCon 2009) Performance Whack-a-Mole Tutorial (pgCon 2009)
Performance Whack-a-Mole Tutorial (pgCon 2009) PostgreSQL Experts, Inc.
 
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016Tomas Vondra
 
Cephfs jewel mds performance benchmark
Cephfs jewel mds performance benchmarkCephfs jewel mds performance benchmark
Cephfs jewel mds performance benchmarkXiaoxi Chen
 
Improve your storage with bcachefs
Improve your storage with bcachefsImprove your storage with bcachefs
Improve your storage with bcachefsMarian Marinov
 
Varnish in action confoo11
Varnish in action confoo11Varnish in action confoo11
Varnish in action confoo11Combell NV
 
Things I wish I knew about GemStone
Things I wish I knew about GemStoneThings I wish I knew about GemStone
Things I wish I knew about GemStoneESUG
 

La actualidad más candente (20)

PostgreSQL Streaming Replication Cheatsheet
PostgreSQL Streaming Replication CheatsheetPostgreSQL Streaming Replication Cheatsheet
PostgreSQL Streaming Replication Cheatsheet
 
12cR2 Single-Tenant: Multitenant Features for All Editions
12cR2 Single-Tenant: Multitenant Features for All Editions12cR2 Single-Tenant: Multitenant Features for All Editions
12cR2 Single-Tenant: Multitenant Features for All Editions
 
MYSQLDUMP & ZRM COMMUNITY (EN)
MYSQLDUMP & ZRM COMMUNITY (EN)MYSQLDUMP & ZRM COMMUNITY (EN)
MYSQLDUMP & ZRM COMMUNITY (EN)
 
GOTO 2013: Why Zalando trusts in PostgreSQL
GOTO 2013: Why Zalando trusts in PostgreSQLGOTO 2013: Why Zalando trusts in PostgreSQL
GOTO 2013: Why Zalando trusts in PostgreSQL
 
What's new in Jewel and Beyond
What's new in Jewel and BeyondWhat's new in Jewel and Beyond
What's new in Jewel and Beyond
 
20171101 taco scargo luminous is out, what's in it for you
20171101 taco scargo   luminous is out, what's in it for you20171101 taco scargo   luminous is out, what's in it for you
20171101 taco scargo luminous is out, what's in it for you
 
Being closer to Cassandra by Oleg Anastasyev. Talk at Cassandra Summit EU 2013
Being closer to Cassandra by Oleg Anastasyev. Talk at Cassandra Summit EU 2013Being closer to Cassandra by Oleg Anastasyev. Talk at Cassandra Summit EU 2013
Being closer to Cassandra by Oleg Anastasyev. Talk at Cassandra Summit EU 2013
 
Dbdeployer
DbdeployerDbdeployer
Dbdeployer
 
Varnish in action phpday2011
Varnish in action phpday2011Varnish in action phpday2011
Varnish in action phpday2011
 
Caching and tuning fun for high scalability @ phpBenelux 2011
Caching and tuning fun for high scalability @ phpBenelux 2011Caching and tuning fun for high scalability @ phpBenelux 2011
Caching and tuning fun for high scalability @ phpBenelux 2011
 
Varnish in action phpuk11
Varnish in action phpuk11Varnish in action phpuk11
Varnish in action phpuk11
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing Guide
 
Control your service resources with systemd
 Control your service resources with systemd  Control your service resources with systemd
Control your service resources with systemd
 
Performance Whack-a-Mole Tutorial (pgCon 2009)
Performance Whack-a-Mole Tutorial (pgCon 2009) Performance Whack-a-Mole Tutorial (pgCon 2009)
Performance Whack-a-Mole Tutorial (pgCon 2009)
 
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
 
Cephfs jewel mds performance benchmark
Cephfs jewel mds performance benchmarkCephfs jewel mds performance benchmark
Cephfs jewel mds performance benchmark
 
Improve your storage with bcachefs
Improve your storage with bcachefsImprove your storage with bcachefs
Improve your storage with bcachefs
 
Varnish in action confoo11
Varnish in action confoo11Varnish in action confoo11
Varnish in action confoo11
 
Things I wish I knew about GemStone
Things I wish I knew about GemStoneThings I wish I knew about GemStone
Things I wish I knew about GemStone
 
The Accidental DBA
The Accidental DBAThe Accidental DBA
The Accidental DBA
 

Destacado

How Big is Facebook Really?
How Big is Facebook Really?How Big is Facebook Really?
How Big is Facebook Really?Wishpond
 
A Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at FacebookA Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at FacebookBigDataCloud
 
Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messagesyarapavan
 
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecks
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecksKernel Recipes 2015: Solving the Linux storage scalability bottlenecks
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecksAnne Nicolas
 
Facebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platformsFacebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platformsNitish Upreti
 
Big Data with Not Only SQL
Big Data with Not Only SQLBig Data with Not Only SQL
Big Data with Not Only SQLPhilippe Julio
 
Facebook Technology Stack
Facebook Technology StackFacebook Technology Stack
Facebook Technology StackHusain Ali
 
Hpca2012 facebook keynote
Hpca2012 facebook keynoteHpca2012 facebook keynote
Hpca2012 facebook keynoteparallellabs
 
Big Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must KnowBig Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must KnowBernard Marr
 
Facebook Architecture - Breaking it Open
Facebook Architecture - Breaking it OpenFacebook Architecture - Breaking it Open
Facebook Architecture - Breaking it OpenHARMAN Services
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadooproyans
 
Facebook Analysis and Study
Facebook Analysis and StudyFacebook Analysis and Study
Facebook Analysis and StudyOuriel Ohayon
 
Big Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBig Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBernard Marr
 
A Brief History of Big Data
A Brief History of Big DataA Brief History of Big Data
A Brief History of Big DataBernard Marr
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
20 Facebook, Twitter, Linkedin & Pinterest Features You Didn't Know Existed (...
20 Facebook, Twitter, Linkedin & Pinterest Features You Didn't Know Existed (...20 Facebook, Twitter, Linkedin & Pinterest Features You Didn't Know Existed (...
20 Facebook, Twitter, Linkedin & Pinterest Features You Didn't Know Existed (...HubSpot
 

Destacado (20)

How Big is Facebook Really?
How Big is Facebook Really?How Big is Facebook Really?
How Big is Facebook Really?
 
A Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at FacebookA Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
 
Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messages
 
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecks
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecksKernel Recipes 2015: Solving the Linux storage scalability bottlenecks
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecks
 
Facebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platformsFacebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platforms
 
Big Data with Not Only SQL
Big Data with Not Only SQLBig Data with Not Only SQL
Big Data with Not Only SQL
 
Facebook Technology Stack
Facebook Technology StackFacebook Technology Stack
Facebook Technology Stack
 
Hpca2012 facebook keynote
Hpca2012 facebook keynoteHpca2012 facebook keynote
Hpca2012 facebook keynote
 
Big Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must KnowBig Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must Know
 
Facebook Architecture - Breaking it Open
Facebook Architecture - Breaking it OpenFacebook Architecture - Breaking it Open
Facebook Architecture - Breaking it Open
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
 
Big Data Trends
Big Data TrendsBig Data Trends
Big Data Trends
 
Facebook Analysis and Study
Facebook Analysis and StudyFacebook Analysis and Study
Facebook Analysis and Study
 
Big Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBig Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must Know
 
A Brief History of Big Data
A Brief History of Big DataA Brief History of Big Data
A Brief History of Big Data
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Big Data and Advanced Analytics
Big Data and Advanced AnalyticsBig Data and Advanced Analytics
Big Data and Advanced Analytics
 
20 Facebook, Twitter, Linkedin & Pinterest Features You Didn't Know Existed (...
20 Facebook, Twitter, Linkedin & Pinterest Features You Didn't Know Existed (...20 Facebook, Twitter, Linkedin & Pinterest Features You Didn't Know Existed (...
20 Facebook, Twitter, Linkedin & Pinterest Features You Didn't Know Existed (...
 
Customer Journey Analytics and Big Data
Customer Journey Analytics and Big DataCustomer Journey Analytics and Big Data
Customer Journey Analytics and Big Data
 

Similar a FlashCache

coa-Unit5-ppt1 (1).pptx
coa-Unit5-ppt1 (1).pptxcoa-Unit5-ppt1 (1).pptx
coa-Unit5-ppt1 (1).pptxRuhul Amin
 
SSD Caching: Device-Mapper- and Hardware-based solutions compared
SSD Caching: Device-Mapper- and Hardware-based solutions compared SSD Caching: Device-Mapper- and Hardware-based solutions compared
SSD Caching: Device-Mapper- and Hardware-based solutions compared Werner Fischer
 
ZFS Workshop
ZFS WorkshopZFS Workshop
ZFS WorkshopAPNIC
 
Fine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsFine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsDatabricks
 
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt AhrensOpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt AhrensMatthew Ahrens
 
Database performance tuning for SSD based storage
Database  performance tuning for SSD based storageDatabase  performance tuning for SSD based storage
Database performance tuning for SSD based storageAngelo Rajadurai
 
Computer organization memory hierarchy
Computer organization memory hierarchyComputer organization memory hierarchy
Computer organization memory hierarchyAJAL A J
 
SSD based storage tuning for databases
SSD based storage tuning for databasesSSD based storage tuning for databases
SSD based storage tuning for databasesAngelo Rajadurai
 
11g r2 flashcache_Tips
11g r2 flashcache_Tips11g r2 flashcache_Tips
11g r2 flashcache_TipsLouis liu
 
My sql innovation work -innosql
My sql innovation work -innosqlMy sql innovation work -innosql
My sql innovation work -innosqlthinkinlamp
 
rac_for_beginners_ppt.pdf
rac_for_beginners_ppt.pdfrac_for_beginners_ppt.pdf
rac_for_beginners_ppt.pdfHODCA1
 
HBase: Extreme Makeover
HBase: Extreme MakeoverHBase: Extreme Makeover
HBase: Extreme MakeoverHBaseCon
 
Raidz on-disk format vs. small blocks
Raidz on-disk format vs. small blocksRaidz on-disk format vs. small blocks
Raidz on-disk format vs. small blocksJoyent
 
Ceph Performance: Projects Leading up to Jewel
Ceph Performance: Projects Leading up to JewelCeph Performance: Projects Leading up to Jewel
Ceph Performance: Projects Leading up to JewelColleen Corrice
 
Ceph Performance: Projects Leading Up to Jewel
Ceph Performance: Projects Leading Up to JewelCeph Performance: Projects Leading Up to Jewel
Ceph Performance: Projects Leading Up to JewelRed_Hat_Storage
 
My sql with enterprise storage
My sql with enterprise storageMy sql with enterprise storage
My sql with enterprise storageCaroline_Rose
 
When is MyRocks good?
When is MyRocks good? When is MyRocks good?
When is MyRocks good? Alkin Tezuysal
 

Similar a FlashCache (20)

coa-Unit5-ppt1 (1).pptx
coa-Unit5-ppt1 (1).pptxcoa-Unit5-ppt1 (1).pptx
coa-Unit5-ppt1 (1).pptx
 
Cachememory
CachememoryCachememory
Cachememory
 
SSD Caching: Device-Mapper- and Hardware-based solutions compared
SSD Caching: Device-Mapper- and Hardware-based solutions compared SSD Caching: Device-Mapper- and Hardware-based solutions compared
SSD Caching: Device-Mapper- and Hardware-based solutions compared
 
ZFS Workshop
ZFS WorkshopZFS Workshop
ZFS Workshop
 
Racing with Droids
Racing with DroidsRacing with Droids
Racing with Droids
 
Fine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsFine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark Jobs
 
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt AhrensOpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
 
Database performance tuning for SSD based storage
Database  performance tuning for SSD based storageDatabase  performance tuning for SSD based storage
Database performance tuning for SSD based storage
 
Computer organization memory hierarchy
Computer organization memory hierarchyComputer organization memory hierarchy
Computer organization memory hierarchy
 
SSD based storage tuning for databases
SSD based storage tuning for databasesSSD based storage tuning for databases
SSD based storage tuning for databases
 
11g r2 flashcache_Tips
11g r2 flashcache_Tips11g r2 flashcache_Tips
11g r2 flashcache_Tips
 
My sql innovation work -innosql
My sql innovation work -innosqlMy sql innovation work -innosql
My sql innovation work -innosql
 
rac_for_beginners_ppt.pdf
rac_for_beginners_ppt.pdfrac_for_beginners_ppt.pdf
rac_for_beginners_ppt.pdf
 
HBase: Extreme Makeover
HBase: Extreme MakeoverHBase: Extreme Makeover
HBase: Extreme Makeover
 
Raidz on-disk format vs. small blocks
Raidz on-disk format vs. small blocksRaidz on-disk format vs. small blocks
Raidz on-disk format vs. small blocks
 
RAIDZ on-disk format vs. small blocks
RAIDZ on-disk format vs. small blocksRAIDZ on-disk format vs. small blocks
RAIDZ on-disk format vs. small blocks
 
Ceph Performance: Projects Leading up to Jewel
Ceph Performance: Projects Leading up to JewelCeph Performance: Projects Leading up to Jewel
Ceph Performance: Projects Leading up to Jewel
 
Ceph Performance: Projects Leading Up to Jewel
Ceph Performance: Projects Leading Up to JewelCeph Performance: Projects Leading Up to Jewel
Ceph Performance: Projects Leading Up to Jewel
 
My sql with enterprise storage
My sql with enterprise storageMy sql with enterprise storage
My sql with enterprise storage
 
When is MyRocks good?
When is MyRocks good? When is MyRocks good?
When is MyRocks good?
 

Más de Chris Westin

Data torrent meetup-productioneng
Data torrent meetup-productionengData torrent meetup-productioneng
Data torrent meetup-productionengChris Westin
 
Ambari hadoop-ops-meetup-2013-09-19.final
Ambari hadoop-ops-meetup-2013-09-19.finalAmbari hadoop-ops-meetup-2013-09-19.final
Ambari hadoop-ops-meetup-2013-09-19.finalChris Westin
 
Cluster management and automation with cloudera manager
Cluster management and automation with cloudera managerCluster management and automation with cloudera manager
Cluster management and automation with cloudera managerChris Westin
 
Building low latency java applications with ehcache
Building low latency java applications with ehcacheBuilding low latency java applications with ehcache
Building low latency java applications with ehcacheChris Westin
 
SDN/OpenFlow #lspe
SDN/OpenFlow #lspeSDN/OpenFlow #lspe
SDN/OpenFlow #lspeChris Westin
 
cfengine3 at #lspe
cfengine3 at #lspecfengine3 at #lspe
cfengine3 at #lspeChris Westin
 
mongodb-aggregation-may-2012
mongodb-aggregation-may-2012mongodb-aggregation-may-2012
mongodb-aggregation-may-2012Chris Westin
 
Nimbula lspe-2012-04-19
Nimbula lspe-2012-04-19Nimbula lspe-2012-04-19
Nimbula lspe-2012-04-19Chris Westin
 
mongodb-brief-intro-february-2012
mongodb-brief-intro-february-2012mongodb-brief-intro-february-2012
mongodb-brief-intro-february-2012Chris Westin
 
Stingray - Riverbed Technology
Stingray - Riverbed TechnologyStingray - Riverbed Technology
Stingray - Riverbed TechnologyChris Westin
 
MongoDB's New Aggregation framework
MongoDB's New Aggregation frameworkMongoDB's New Aggregation framework
MongoDB's New Aggregation frameworkChris Westin
 
Replication and replica sets
Replication and replica setsReplication and replica sets
Replication and replica setsChris Westin
 
Architecting a Scale Out Cloud Storage Solution
Architecting a Scale Out Cloud Storage SolutionArchitecting a Scale Out Cloud Storage Solution
Architecting a Scale Out Cloud Storage SolutionChris Westin
 
MongoDB: An Introduction - July 2011
MongoDB:  An Introduction - July 2011MongoDB:  An Introduction - July 2011
MongoDB: An Introduction - July 2011Chris Westin
 
Practical Replication June-2011
Practical Replication June-2011Practical Replication June-2011
Practical Replication June-2011Chris Westin
 
MongoDB: An Introduction - june-2011
MongoDB:  An Introduction - june-2011MongoDB:  An Introduction - june-2011
MongoDB: An Introduction - june-2011Chris Westin
 
Ganglia Overview-v2
Ganglia Overview-v2Ganglia Overview-v2
Ganglia Overview-v2Chris Westin
 
MongoDB Aggregation MongoSF May 2011
MongoDB Aggregation MongoSF May 2011MongoDB Aggregation MongoSF May 2011
MongoDB Aggregation MongoSF May 2011Chris Westin
 

Más de Chris Westin (20)

Data torrent meetup-productioneng
Data torrent meetup-productionengData torrent meetup-productioneng
Data torrent meetup-productioneng
 
Gripshort
GripshortGripshort
Gripshort
 
Ambari hadoop-ops-meetup-2013-09-19.final
Ambari hadoop-ops-meetup-2013-09-19.finalAmbari hadoop-ops-meetup-2013-09-19.final
Ambari hadoop-ops-meetup-2013-09-19.final
 
Cluster management and automation with cloudera manager
Cluster management and automation with cloudera managerCluster management and automation with cloudera manager
Cluster management and automation with cloudera manager
 
Building low latency java applications with ehcache
Building low latency java applications with ehcacheBuilding low latency java applications with ehcache
Building low latency java applications with ehcache
 
SDN/OpenFlow #lspe
SDN/OpenFlow #lspeSDN/OpenFlow #lspe
SDN/OpenFlow #lspe
 
cfengine3 at #lspe
cfengine3 at #lspecfengine3 at #lspe
cfengine3 at #lspe
 
mongodb-aggregation-may-2012
mongodb-aggregation-may-2012mongodb-aggregation-may-2012
mongodb-aggregation-may-2012
 
Nimbula lspe-2012-04-19
Nimbula lspe-2012-04-19Nimbula lspe-2012-04-19
Nimbula lspe-2012-04-19
 
mongodb-brief-intro-february-2012
mongodb-brief-intro-february-2012mongodb-brief-intro-february-2012
mongodb-brief-intro-february-2012
 
Stingray - Riverbed Technology
Stingray - Riverbed TechnologyStingray - Riverbed Technology
Stingray - Riverbed Technology
 
MongoDB's New Aggregation framework
MongoDB's New Aggregation frameworkMongoDB's New Aggregation framework
MongoDB's New Aggregation framework
 
Replication and replica sets
Replication and replica setsReplication and replica sets
Replication and replica sets
 
Architecting a Scale Out Cloud Storage Solution
Architecting a Scale Out Cloud Storage SolutionArchitecting a Scale Out Cloud Storage Solution
Architecting a Scale Out Cloud Storage Solution
 
Large Scale Cacti
Large Scale CactiLarge Scale Cacti
Large Scale Cacti
 
MongoDB: An Introduction - July 2011
MongoDB:  An Introduction - July 2011MongoDB:  An Introduction - July 2011
MongoDB: An Introduction - July 2011
 
Practical Replication June-2011
Practical Replication June-2011Practical Replication June-2011
Practical Replication June-2011
 
MongoDB: An Introduction - june-2011
MongoDB:  An Introduction - june-2011MongoDB:  An Introduction - june-2011
MongoDB: An Introduction - june-2011
 
Ganglia Overview-v2
Ganglia Overview-v2Ganglia Overview-v2
Ganglia Overview-v2
 
MongoDB Aggregation MongoSF May 2011
MongoDB Aggregation MongoSF May 2011MongoDB Aggregation MongoSF May 2011
MongoDB Aggregation MongoSF May 2011
 

Último

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 

Último (20)

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 

FlashCache

  • 1.
  • 3. FlashCache at Facebook ▪ What ▪ We want to use some Flash storage on existing servers ▪ We want something that is simple to deploy and use ▪ Our IO access patterns benefit from a cache ▪ Who ▪ Mohan Srinivasan – design and implementation ▪ Paul Saab – platform and MySQL integration ▪ Michael Jiang – testing, performance and capacity planning ▪ Mark Callaghan - benchmarketing
  • 4. Introduction ▪ Block cache for Linux - write back and write through modes ▪ Layered below the filesystem at the top of the storage stack ▪ Cache Disk Blocks on fast persistent storage (Flash, SSD) ▪ Loadable Linux Kernel module, built using the Device Mapper (DM) ▪ Primary use case InnoDB, but general purpose ▪ Based on dm-cache by Prof. Ming
  • 5. Caching Modes Write Back Write Through, Write Around ▪ Lazy writing to disk ▪ Non-persistent ▪ Persistent across reboot ▪ Are you a pessimist? ▪ Persistent across device removal
  • 6. Cache Structure ▪ Set associative hash ▪ Hash with fixed sized buckets (sets) with linear probing within a set ▪ 512-way set associative by default ▪ dbn: Disk Block Number, address of block on disk ▪ Set = (dbn / block size / set size) mod (number of sets) ▪ Sequential range of dbns map onto a single sets
  • 7. Cache Structure . . Set 0 . . . . . . . . . . Block 0 Cache set Block 0 . . . Block 511 . SET i . . N set’s worth . . Of blocks . Block 0 . Cache set Block 511 . . . . Block 511 . . . . . Set N-1
  • 8. Replacement and Memory Footprint ▪ Replacement policy is FIFO (default) or LRU within a set ▪ Switch on the fly between FIFO/LRU (sysctl) ▪ Metadata per cache block: 16 bytes in memory, 16 bytes on ssd ▪ On ssd metadata per-slot ▪ <dbn, block state> ▪ In memory metadata per-slot: ▪ <dbn, block state, LRU chain pointers, misc>
  • 9. Reads ▪ Compute cache set for dbn ▪ Cache Hit ▪ Verify checksums if configured ▪ Serve read out of cache ▪ Cache Miss ▪ Find free block or reclaim block based on replacement policy ▪ Read block from disk and populate cache ▪ Update block checksum if configured ▪ Return data to user
  • 10. Write Through - writes ▪ Compute cache set for dbn ▪ Cache hit ▪ Get cached block ▪ Cache miss ▪ Find free block or reclaim block ▪ Write data block to disk ▪ Write data block to cache ▪ Update block checksum
  • 11. Write Back - writes ▪ Compute cache set for dbn ▪ Cache Hit ▪ Write data block into cache ▪ If data block not DIRTY, synchronously update on-ssd cache metadata to mark block DIRTY ▪ Cache miss ▪ Find free block or reclaim block based on replacement policy ▪ Write data block to cache ▪ Synchronously update on-ssd cache metadata to mark block DIRTY
  • 12. Small or uncacheable requests ▪ First invalidate blocks that overlap the requests ▪ There are at most 2 such blocks ▪ For Write Back, if the overlapping blocks are DIRTY they are cleaned first then invalidated ▪ Uncacheable full block reads are served from cache in case of a cache hit. ▪ Perform disk IO ▪ Repeat invalidation to close races which might have caused the block to be cached while the disk IO was in progress
  • 13. Write Back policy ▪ Dirty blocks not recently accessed ▪ A clock-like algorithm picks off Dirty blocks not accessed in the last 15 minutes (configurable) for cleaning ▪ When dirty blocks in a set exceeds configurable threshold, clean some blocks ▪ Blocks selected for writeback based on replacement policy ▪ Default dirty threshold 20%. Set higher for write heavy workloads ▪ Sort selected blocks and pickup any other blocks in set that can be contiguously merged with these ▪ Writes merged by the IO scheduler
  • 14. Write Back – overheads ▪ In-Memory cache metadata memory footprint ▪ 300GB/4KB cache -> ~1.2GB ▪ 160GB/4KB cache -> ~640MB ▪ Cache metadata writes/file system write ▪ Worst case is 2 cache metadata updates per write ▪ (VALID->DIRTY, DIRTY->VALID) ▪ Average case is much lower because of cache write hits and batching of cache metadata updates
  • 15. Write Through/Around – cache overheads ▪ In-Memory Cache metadata footprint ▪ 300GB/4KB cache -> ~1.2GB ▪ 160GB/4KB cache -> ~640MB ▪ Cache metadata writes per file system write ▪ 1 cache data write per file system write (Write Through) ▪ No overhead (for Write Around)
  • 16. Write Back – metadata updates ▪ Cache (on-ssd) metadata only updated on writes and block cleanings (VALID->DIRTY or DIRTY->VALID) ▪ Cache (on-ssd) metadata not updated on cache population for reads ▪ Reload after an unclean shutdown only loads DIRTY blocks ▪ Fast and Slow cache shutdowns ▪ Only metadata is written on fast shutdown. Reload loads both dirty and clean blocks ▪ Slow shutdown writes all dirty blocks to disk first, then writes out metadata to the ssd. Reload only loads clean blocks. ▪ Metadata updates to multiple blocks in same sector are batched
  • 17. Torn Page Problem ▪ Handle partial block write caused by power failure or other causes ▪ Problem exists for Flashcache in Write Back mode ▪ Detected via block checksums ▪ Checksums are disabled by default ▪ Pages with bad checksums are not used ▪ Checksums increase cache metadata writes and memory footprint ▪ Update cache metadata checksums on DIRTY->DIRTY block transitions for Write Back ▪ Each per-cache slot grows by 8 bytes to hold the checksum (a 50% increase from 16 bytes to 24 bytes for the Write Back case).
  • 18. Cache controls for Write Back ▪ Work best with O_DIRECT file access ▪ Global modes – Cache All or Cache Nothing ▪ Cache All has a blacklist of pids and tgids ▪ Cache Nothing has a whitelist of pids and tgids ▪ tgids can be used to tag all pthreads in the group as cacheable ▪ Exceptions for threads within a group are supported ▪ List changes done via FlashCache ioctls ▪ Cache can be read but is not written for non-cacheable tgids and pids ▪ We modified MySQL and scp to use this support
  • 19. Cache Nothing policy ▪ If the thread id is whitelisted, cache all IOs for this thread ▪ If the tgid is whitelisted, cache all IOs for this thread ▪ If the thread id is blacklisted do not cache IOs
  • 20. Cache control example ▪ We use Cache Nothing mode for MySQL servers ▪ The mysqld tgid is added to the whitelist ▪ All IO done by it is cacheable ▪ Writes done by other processes do not update the cache ▪ Full table scans done by mysqldump use a hint that directs mysqld to add the query’s thread id to the blacklist to avoid wiping FlashCache ▪ select /* SQL_NO_FCACHE */ pk, col1, col2 from foobar
  • 21. Utilities ▪ flashcache_create ▪ flashcache_create -b 4k -s 10g mysql /dev/flash /dev/disk ▪ flashcache_destroy ▪ flashcache_destory /dev/flash ▪ flashcache_load
  • 22. sysctl –a | grep flash dev.flashcache.cache_all = 0 dev.flashcache.do_pid_expiry = 0 dev.flashcache.fast_remove = 0 dev.flashcache.max_clean_ios_set = 2 dev.flashcache.zero_stats = 1 dev.flashcache.max_clean_ios_total = 4 dev.flashcache.write_merge = 1 dev.flashcache.debug = 0 dev.flashcache.reclaim_policy = 0 dev.flashcache.dirty_thresh_pct = 20 dev.flashcache.pid_expiry_secs = 60 dev.flashcache.stop_sync = 0 dev.flashcache.max_pids = 100 dev.flashcache.do_sync = 0
  • 23. Removing FlashCache ▪ umount /data ▪ dmesetup remove mysql ▪ flashcache_destroy /dev/flash
  • 24. cat /proc/flashcache_stats reads=4 writes=0 read_hits=0 read_hit_percent=0 write_hits=0 write_hit_percent=0 dirty_write_hits=0 dirty_write_hit_percent=0 replacement=0 write_replacement=0 write_invalidates=0 read_invalidates=0 pending_enqueues=0 pending_inval=0 metadata_dirties=0 metadata_cleans=0 cleanings=0 no_room=0 front_merge=0 back_merge=0 nc_pid_adds=0 nc_pid_dels=0 nc_pid_drops=0 nc_expiry=0 disk_reads=0 disk_writes=0 ssd_reads=0 ssd_writes=0 uncached_reads=169 uncached_writes=128
  • 25. Future Work ▪ Cache mirroring ▪ SW RAID 0 block device as a cache ▪ Online cache resize ▪ No shutdown and recreate ▪ Support for ATA trim ▪ Discard blocks no longer in use ▪ Fix the torn page problem ▪ Use shadow pages
  • 26. Resources ▪ GitHub : facebook/flashcache ▪ Mailing list : flashcache-dev@googlegroups.com ▪ http://facebook.com/MySQLatFacebook ▪ Email : ▪ mohan@fb.com (Mohan Srinivasan) ▪ ps@fb.com (Paul Saab) ▪ michael@fb.com (Michael Jiang) ▪ mcallaghan@fb.com (Mark Callaghan)
  • 27. (c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc.. All rights reserved. 1.0