SlideShare a Scribd company logo
1 of 15
Download to read offline
Towards Application
Driven Storage
Javier González, Matias Bjørling, Philippe Bonnet
Lund Linux Con 2015
Solid State Drives
‣ Thousands of IOPS and low latency (<1ms)
‣ Hardware continues to improve
- Parallel architecture
- Larger flash chips
Overview
‣ Abstractions have allowed to
rapidly replace Hard-Drives
‣ Complexity is handled in
embedded software (FTL)
Embedded FTLs
No Future
‣ Dealing with flash chip constraints is a necessity
- No way around the Flash Translation Layer (FTL)
‣ FTLs have enabled wide SSD adoption, but they introduce
critical limitations that are visible today:
- Hardwired data placement policies, over-provisioning,
scheduling, garbage collection, wear-leveling.
- Based on more or less explicit assumptions about the application
workload.
- Redundancies, missed optimizations, and resource
underutilization
Market Specific FTLs
Workarounds Today
‣ SSDs on the market with embedded FTLs targeted at
specific workloads (e.g., 90% reads) or applications (e.g.,
SQL Server, KV store)
‣ The FTL is optimized for a given application and workload
‣ But…
- Who can afford these hardwired optimization?
- Who can afford these hardwired optimization?
- What about new workloads? And new applications?
What can we wish for?
RocksDB
How to engage SSD
vendors (flash, chips)?
Which role does the OS
play in this
architecture?
How to engage
application developers
to switch interfaces?
- Avoid redundancies
- Leverage optimization
opportunities
- Minimize overhead when
manipulating persistent data
- Make better decisions
regarding latency, resource
utilization, and data
movement
Application Driven
Storage
1. Open-Channel SSDs: The kernel
exposes a single address space through a
host-based FTL
2. Applications can implement their own
FTL and bypass the kernel for the IO flow
Open-Channel SSDs
Overview
Physical flash exposed to the
host (Read, Write, Erase)
‣ Host is the king:
- Data placement
- IO scheduling
- Over-provisioning
- Garbage collection
- Weal-leveling
- …
‣ The host knows:
- SSD geometry
• NAND idiosyncrasies
• Die geometry (blocks & pages)
• Channels, timings, etc.
• Bad blocks
• Error-correction codes
- Features and responsibilities
Kernel Integration
‣ LightNVM: Linux kernel support for Open-channel SSDs
- Development: https://github.com/OpenChannelSSD/linux
‣ Generic core features for host-based SSD management:
- List of free, in-use, and bad blocks
- Handling of flash characteristics (features and responsibilities)
‣ Targets that expose a logical address space
- Can be tailored (e.g., key value stores, file systems)
‣ Round-robin with cost-based GC FTL (rrpc)
‣ Status:
- Second patch revision posted to LKML
- HW support: Memblaze eBlaze, a number of stealth hardware
startups, IIT Mandras FPGA-based NVMe implementation.
Open-Channel SSDs
LightNVM
Architecture
File-systems
Null device
Key-Value Target
VFS
Open-Channel SSD Integration
Open-Channel SSDs
NVMe PCIe-based SATA/SAS
Kernel
User-space
Page Target
Block Layer
Programmable SSDs
Design Space
‣ Requirements:
- Programming SSDs should not be reduced to adding functions
on top of a generic FTL
- Programmable SSDs should contribute to streamlining the data
path
‣ Design Space:
- Programmability / Complexity
- Host-based / Embedded
Programmable SSDs
AppNVM Vision
‣ Applications describe declaratively the characteristics of
their IO flows
‣ AppNVM makes applications closer to storage
- Applications can describe their workload and communicate it to
the SSD by means of rules
- The kernel acts as a control plane and transforms this rules into
SSD actions to guarantee that the expected characteristics are
met with optimal resource utilization.
‣ Storage Quality of Service
- Admission control combined with placement/scheduling policies
managed by centralized controller
AppNVM
ArchitectureDataPlane
ControlPlane
AppNVM
Control Plane
‣ A controller sets up SSD actions for a given IO flow, based
on application request.
‣ How does the address space look like?
- Defines how SSD state is defined and exposed to host
- Quantizes flows of IOs
‣ Which actions?
- Managing flash constraints: data placement, over-provisioning,
scheduling
- Functions to minimize data & metadata movement
AppNVM
Data Plane
‣ The data plane directly connects applications to storage
via SSD actions
‣ Relies on mechanisms that bypass the OS for flows of IO:
NVMe, RDMA, RapidIO
- Millions of IOPS (throughput)
- Predictable latency
Future Work
‣ LightNVM:
- Get LightNVM upstream (close, but still some work to do)
- Implement LightNVM support in more platforms. Today:
• QEMU (Keith Busch's qemu-nvme branch)
• A number of hardware prototypes
- Gain traction via LightNVM and Open-Channel SSDs
• NAND vendors, chip vendors, SSD vendors
‣ AppNVM (initial prototype)
- Controller + rule engine embedded on SSD + set of rules
- Application re-design based on IO flow abstraction:
• Distributed file systems
• DBMS storage engines (key-value and SQL)

More Related Content

What's hot

What's hot (18)

Some key value stores using log-structure
Some key value stores using log-structureSome key value stores using log-structure
Some key value stores using log-structure
 
Red Hat Storage Server Administration Deep Dive
Red Hat Storage Server Administration Deep DiveRed Hat Storage Server Administration Deep Dive
Red Hat Storage Server Administration Deep Dive
 
Bluestore
BluestoreBluestore
Bluestore
 
The Hive Think Tank: Rocking the Database World with RocksDB
The Hive Think Tank: Rocking the Database World with RocksDBThe Hive Think Tank: Rocking the Database World with RocksDB
The Hive Think Tank: Rocking the Database World with RocksDB
 
RocksDB storage engine for MySQL and MongoDB
RocksDB storage engine for MySQL and MongoDBRocksDB storage engine for MySQL and MongoDB
RocksDB storage engine for MySQL and MongoDB
 
Pros and Cons of Erasure Coding & Replication vs. RAID in Next-Gen Storage
Pros and Cons of Erasure Coding & Replication vs. RAID in Next-Gen StoragePros and Cons of Erasure Coding & Replication vs. RAID in Next-Gen Storage
Pros and Cons of Erasure Coding & Replication vs. RAID in Next-Gen Storage
 
Dedupe nmamit
Dedupe nmamitDedupe nmamit
Dedupe nmamit
 
Cephfs jewel mds performance benchmark
Cephfs jewel mds performance benchmarkCephfs jewel mds performance benchmark
Cephfs jewel mds performance benchmark
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices: A Deep DiveCeph Block Devices: A Deep Dive
Ceph Block Devices: A Deep Dive
 
GlusterFS And Big Data
GlusterFS And Big DataGlusterFS And Big Data
GlusterFS And Big Data
 
BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for Ceph
 
Overview of some popular distributed databases
Overview of some popular distributed databasesOverview of some popular distributed databases
Overview of some popular distributed databases
 
Red Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed Hat Gluster Storage Performance
Red Hat Gluster Storage Performance
 
Gluster for Geeks: Performance Tuning Tips & Tricks
Gluster for Geeks: Performance Tuning Tips & TricksGluster for Geeks: Performance Tuning Tips & Tricks
Gluster for Geeks: Performance Tuning Tips & Tricks
 
Red Hat Gluster Storage, Container Storage and CephFS Plans
Red Hat Gluster Storage, Container Storage and CephFS PlansRed Hat Gluster Storage, Container Storage and CephFS Plans
Red Hat Gluster Storage, Container Storage and CephFS Plans
 
XSKY - ceph luminous update
XSKY - ceph luminous updateXSKY - ceph luminous update
XSKY - ceph luminous update
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDB
 
Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack CloudJourney to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud
 

Similar to Towards Application Driven Storage

Varrow madness 2013 virtualizing sql presentation
Varrow madness 2013 virtualizing sql presentationVarrow madness 2013 virtualizing sql presentation
Varrow madness 2013 virtualizing sql presentation
pittmantony
 
IMCSummit 2015 - Day 2 General Session - Flash-Extending In-Memory Computing
IMCSummit 2015 - Day 2 General Session - Flash-Extending In-Memory ComputingIMCSummit 2015 - Day 2 General Session - Flash-Extending In-Memory Computing
IMCSummit 2015 - Day 2 General Session - Flash-Extending In-Memory Computing
In-Memory Computing Summit
 
Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]
shuwutong
 
Big Data LDN 2016: Kick Start your Big Data project with Hyperconverged Infra...
Big Data LDN 2016: Kick Start your Big Data project with Hyperconverged Infra...Big Data LDN 2016: Kick Start your Big Data project with Hyperconverged Infra...
Big Data LDN 2016: Kick Start your Big Data project with Hyperconverged Infra...
Matt Stubbs
 

Similar to Towards Application Driven Storage (20)

IMCSummit 2015 - Day 2 IT Business Track - Drive IMC Efficiency with Flash E...
IMCSummit 2015 - Day 2  IT Business Track - Drive IMC Efficiency with Flash E...IMCSummit 2015 - Day 2  IT Business Track - Drive IMC Efficiency with Flash E...
IMCSummit 2015 - Day 2 IT Business Track - Drive IMC Efficiency with Flash E...
 
SQLintersection keynote a tale of two teams
SQLintersection keynote a tale of two teamsSQLintersection keynote a tale of two teams
SQLintersection keynote a tale of two teams
 
OpenStack Cinder, Implementation Today and New Trends for Tomorrow
OpenStack Cinder, Implementation Today and New Trends for TomorrowOpenStack Cinder, Implementation Today and New Trends for Tomorrow
OpenStack Cinder, Implementation Today and New Trends for Tomorrow
 
Varrow madness 2013 virtualizing sql presentation
Varrow madness 2013 virtualizing sql presentationVarrow madness 2013 virtualizing sql presentation
Varrow madness 2013 virtualizing sql presentation
 
IMCSummit 2015 - Day 2 General Session - Flash-Extending In-Memory Computing
IMCSummit 2015 - Day 2 General Session - Flash-Extending In-Memory ComputingIMCSummit 2015 - Day 2 General Session - Flash-Extending In-Memory Computing
IMCSummit 2015 - Day 2 General Session - Flash-Extending In-Memory Computing
 
Choose the Right Container Storage for Kubernetes
Choose the Right Container Storage for KubernetesChoose the Right Container Storage for Kubernetes
Choose the Right Container Storage for Kubernetes
 
TechDay - Toronto 2016 - Hyperconvergence and OpenNebula
TechDay - Toronto 2016 - Hyperconvergence and OpenNebulaTechDay - Toronto 2016 - Hyperconvergence and OpenNebula
TechDay - Toronto 2016 - Hyperconvergence and OpenNebula
 
Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]
 
Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...
Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...
Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...
 
Severalnines Training: MySQL® Cluster - Part IX
Severalnines Training: MySQL® Cluster - Part IXSeveralnines Training: MySQL® Cluster - Part IX
Severalnines Training: MySQL® Cluster - Part IX
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
Next Generation Software-Defined Storage
Next Generation Software-Defined StorageNext Generation Software-Defined Storage
Next Generation Software-Defined Storage
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
 
Troubleshooting Storage Devices Using vRealize Operations (formerly vC Ops)
Troubleshooting Storage Devices Using vRealize Operations (formerly vC Ops)Troubleshooting Storage Devices Using vRealize Operations (formerly vC Ops)
Troubleshooting Storage Devices Using vRealize Operations (formerly vC Ops)
 
Towards User-Defined SLA in Cloud Flash Storage.pptx
Towards User-Defined SLA in Cloud Flash Storage.pptxTowards User-Defined SLA in Cloud Flash Storage.pptx
Towards User-Defined SLA in Cloud Flash Storage.pptx
 
Extreme Availability using Oracle 12c Features: Your very last system shutdown?
Extreme Availability using Oracle 12c Features: Your very last system shutdown?Extreme Availability using Oracle 12c Features: Your very last system shutdown?
Extreme Availability using Oracle 12c Features: Your very last system shutdown?
 
Big Data LDN 2016: Kick Start your Big Data project with Hyperconverged Infra...
Big Data LDN 2016: Kick Start your Big Data project with Hyperconverged Infra...Big Data LDN 2016: Kick Start your Big Data project with Hyperconverged Infra...
Big Data LDN 2016: Kick Start your Big Data project with Hyperconverged Infra...
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right
 
Oracle real application_cluster
Oracle real application_clusterOracle real application_cluster
Oracle real application_cluster
 

Recently uploaded

Recently uploaded (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Towards Application Driven Storage

  • 1. Towards Application Driven Storage Javier González, Matias Bjørling, Philippe Bonnet Lund Linux Con 2015
  • 2. Solid State Drives ‣ Thousands of IOPS and low latency (<1ms) ‣ Hardware continues to improve - Parallel architecture - Larger flash chips Overview ‣ Abstractions have allowed to rapidly replace Hard-Drives ‣ Complexity is handled in embedded software (FTL)
  • 3. Embedded FTLs No Future ‣ Dealing with flash chip constraints is a necessity - No way around the Flash Translation Layer (FTL) ‣ FTLs have enabled wide SSD adoption, but they introduce critical limitations that are visible today: - Hardwired data placement policies, over-provisioning, scheduling, garbage collection, wear-leveling. - Based on more or less explicit assumptions about the application workload. - Redundancies, missed optimizations, and resource underutilization
  • 4. Market Specific FTLs Workarounds Today ‣ SSDs on the market with embedded FTLs targeted at specific workloads (e.g., 90% reads) or applications (e.g., SQL Server, KV store) ‣ The FTL is optimized for a given application and workload ‣ But… - Who can afford these hardwired optimization? - Who can afford these hardwired optimization? - What about new workloads? And new applications?
  • 5. What can we wish for? RocksDB How to engage SSD vendors (flash, chips)? Which role does the OS play in this architecture? How to engage application developers to switch interfaces? - Avoid redundancies - Leverage optimization opportunities - Minimize overhead when manipulating persistent data - Make better decisions regarding latency, resource utilization, and data movement Application Driven Storage
  • 6. 1. Open-Channel SSDs: The kernel exposes a single address space through a host-based FTL 2. Applications can implement their own FTL and bypass the kernel for the IO flow
  • 7. Open-Channel SSDs Overview Physical flash exposed to the host (Read, Write, Erase) ‣ Host is the king: - Data placement - IO scheduling - Over-provisioning - Garbage collection - Weal-leveling - … ‣ The host knows: - SSD geometry • NAND idiosyncrasies • Die geometry (blocks & pages) • Channels, timings, etc. • Bad blocks • Error-correction codes - Features and responsibilities
  • 8. Kernel Integration ‣ LightNVM: Linux kernel support for Open-channel SSDs - Development: https://github.com/OpenChannelSSD/linux ‣ Generic core features for host-based SSD management: - List of free, in-use, and bad blocks - Handling of flash characteristics (features and responsibilities) ‣ Targets that expose a logical address space - Can be tailored (e.g., key value stores, file systems) ‣ Round-robin with cost-based GC FTL (rrpc) ‣ Status: - Second patch revision posted to LKML - HW support: Memblaze eBlaze, a number of stealth hardware startups, IIT Mandras FPGA-based NVMe implementation. Open-Channel SSDs
  • 9. LightNVM Architecture File-systems Null device Key-Value Target VFS Open-Channel SSD Integration Open-Channel SSDs NVMe PCIe-based SATA/SAS Kernel User-space Page Target Block Layer
  • 10. Programmable SSDs Design Space ‣ Requirements: - Programming SSDs should not be reduced to adding functions on top of a generic FTL - Programmable SSDs should contribute to streamlining the data path ‣ Design Space: - Programmability / Complexity - Host-based / Embedded
  • 11. Programmable SSDs AppNVM Vision ‣ Applications describe declaratively the characteristics of their IO flows ‣ AppNVM makes applications closer to storage - Applications can describe their workload and communicate it to the SSD by means of rules - The kernel acts as a control plane and transforms this rules into SSD actions to guarantee that the expected characteristics are met with optimal resource utilization. ‣ Storage Quality of Service - Admission control combined with placement/scheduling policies managed by centralized controller
  • 13. AppNVM Control Plane ‣ A controller sets up SSD actions for a given IO flow, based on application request. ‣ How does the address space look like? - Defines how SSD state is defined and exposed to host - Quantizes flows of IOs ‣ Which actions? - Managing flash constraints: data placement, over-provisioning, scheduling - Functions to minimize data & metadata movement
  • 14. AppNVM Data Plane ‣ The data plane directly connects applications to storage via SSD actions ‣ Relies on mechanisms that bypass the OS for flows of IO: NVMe, RDMA, RapidIO - Millions of IOPS (throughput) - Predictable latency
  • 15. Future Work ‣ LightNVM: - Get LightNVM upstream (close, but still some work to do) - Implement LightNVM support in more platforms. Today: • QEMU (Keith Busch's qemu-nvme branch) • A number of hardware prototypes - Gain traction via LightNVM and Open-Channel SSDs • NAND vendors, chip vendors, SSD vendors ‣ AppNVM (initial prototype) - Controller + rule engine embedded on SSD + set of rules - Application re-design based on IO flow abstraction: • Distributed file systems • DBMS storage engines (key-value and SQL)