Performance evaluation of Linux Discard Support
(Overview, benchmark results, current status)

Red Hat
Lukáš Czerner
May 27, 2011
Copyright © 2011 Lukáš Czerner, Red Hat.
Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation
License, Version 1.3 or any later version published by the Free
Software Foundation; with no Invariant Sections, no Front-Cover
Texts, and no Back-Cover Texts. A copy of the license is included
in the COPYING file.
Part I
Discard Background
Agenda
1 SSD Description
2 Thinly Provisioned Storage
3 Introducing Linux Discard Support
SSD Description
Solid-State Drive
   Flash memory block device
   Wear-leveling needed
   Firmware = black box
SSD Description
ATA TRIM Command
  Helps handle garbage collection overhead
  Subsequent READs of TRIM’ed blocks:
    1   Read data should NOT change between READs
    2   Read data should NOT be retrieved from data previously
        written to any other LBA.
  As long as the device has enough free pages to write to, we do
  not necessarily need it.
  In a nutshell: the TRIM command tells the device which LBAs are
  no longer used by the OS.
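To make "tells the device which LBAs are no longer used" concrete, here is a minimal C sketch of packing a sector range into TRIM payload entries. It assumes the ATA DATA SET MANAGEMENT range-entry layout: each 8-byte entry holds a 48-bit starting LBA and a 16-bit sector count, and a 512-byte payload block holds 64 such entries. The function name trim_encode is hypothetical; in Linux this packing is done inside the libata driver, not by file systems or user space.

```c
#include <stdint.h>

/* ATA DATA SET MANAGEMENT (TRIM) LBA range entry, 64 bits:
 * bits 47:0 = starting LBA, bits 63:48 = range length in sectors. */
#define TRIM_MAX_SECTORS 0xFFFFull
#define TRIM_LBA_MASK    0xFFFFFFFFFFFFull

/* Encode the sector range [lba, lba + nsect) as TRIM range entries,
 * splitting it into pieces of at most 65535 sectors.
 * Returns the number of entries written to out, or -1 if out has fewer
 * than the needed slots or the range does not fit in 48-bit LBAs.
 * Entries are kept in host byte order; the device expects little-endian,
 * so a big-endian host would have to byte-swap them before sending. */
int trim_encode(uint64_t lba, uint64_t nsect, uint64_t *out, int max)
{
    int n = 0;

    if (lba > TRIM_LBA_MASK || nsect > TRIM_LBA_MASK + 1 - lba)
        return -1;

    while (nsect > 0) {
        uint64_t chunk = nsect > TRIM_MAX_SECTORS ? TRIM_MAX_SECTORS : nsect;
        if (n == max)
            return -1;              /* caller's buffer is too small */
        out[n++] = (chunk << 48) | lba;
        lba += chunk;
        nsect -= chunk;
    }
    return n;
}
```

A range longer than 65535 sectors simply produces several consecutive entries, so a large discard costs more payload but still a single command.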
Thinly Provisioned Storage
Thin Provisioning
   Unlike traditional storage, there is no fixed one-to-one
   mapping of logical blocks to physical storage
   More efficient use of storage space
   Block reclamation interface needed
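A toy C model of thin provisioning, assuming a pool that advertises 16 logical blocks but is backed by only 4 physical blocks (both numbers are arbitrary). Physical blocks are allocated on first write and return to the pool only when the host unmaps them, which is why a reclamation interface is needed. All names here are hypothetical and the model ignores real-world details such as allocation granularity.

```c
#include <string.h>

#define NLOGICAL 16   /* hypothetical: blocks advertised to the host */
#define NPHYS     4   /* hypothetical: blocks actually backed by storage */
#define UNMAPPED (-1)

struct thin_pool {
    int map[NLOGICAL];      /* logical block -> physical block, or UNMAPPED */
    int phys_used[NPHYS];   /* 1 if the physical block is allocated */
};

void thin_init(struct thin_pool *p)
{
    for (int i = 0; i < NLOGICAL; i++)
        p->map[i] = UNMAPPED;
    memset(p->phys_used, 0, sizeof p->phys_used);
}

/* Writing a logical block allocates backing storage on first use.
 * Returns 0 on success, -1 if the LBA is invalid or the pool is full. */
int thin_write(struct thin_pool *p, int lba)
{
    if (lba < 0 || lba >= NLOGICAL)
        return -1;
    if (p->map[lba] != UNMAPPED)
        return 0;                   /* already backed, overwrite in place */
    for (int i = 0; i < NPHYS; i++) {
        if (!p->phys_used[i]) {
            p->phys_used[i] = 1;
            p->map[lba] = i;
            return 0;
        }
    }
    return -1;                      /* out of physical space */
}

/* UNMAP/TRIM: the host declares the block unused, so it can be reclaimed. */
void thin_unmap(struct thin_pool *p, int lba)
{
    if (lba < 0 || lba >= NLOGICAL || p->map[lba] == UNMAPPED)
        return;
    p->phys_used[p->map[lba]] = 0;
    p->map[lba] = UNMAPPED;
}

/* Number of physical blocks currently allocated. */
int thin_used(const struct thin_pool *p)
{
    int n = 0;
    for (int i = 0; i < NPHYS; i++)
        n += p->phys_used[i];
    return n;
}
```

Without thin_unmap(), deleting files on the host would never free anything in the pool, and the fifth distinct block written would fail even if the file system considered most of its space free.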
Thinly Provisioned Storage
SCSI UNMAP / WRITE SAME
  Storage space reclamation interface
  Subsequent READs of unmapped blocks:
    1   Read data should NOT change between READs
    2   Read data should NOT be retrieved from data previously
        written to any other LBA.
  Unlike with SSDs, we cannot afford to wait until we run out
  of space before reclaiming it.
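On the SCSI side, the reclamation request carries a small parameter list. The sketch below builds the UNMAP parameter list for a single block descriptor, assuming the SBC layout: an 8-byte header (UNMAP DATA LENGTH, BLOCK DESCRIPTOR DATA LENGTH, reserved bytes) followed by 16-byte descriptors, with all fields big-endian. Building and submitting the command itself is left out, and the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define UNMAP_PARAM_LEN 24   /* 8-byte header + one 16-byte descriptor */

/* SCSI fields are big-endian. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    for (int i = 3; i >= 0; i--) { p[i] = (uint8_t)v; v >>= 8; }
}

static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) { p[i] = (uint8_t)v; v >>= 8; }
}

/* Build an UNMAP parameter list describing `nblocks` logical blocks
 * starting at `lba`. */
void build_unmap_params(uint8_t buf[UNMAP_PARAM_LEN], uint64_t lba,
                        uint32_t nblocks)
{
    memset(buf, 0, UNMAP_PARAM_LEN);
    put_be16(&buf[0], UNMAP_PARAM_LEN - 2); /* UNMAP DATA LENGTH: bytes after this field */
    put_be16(&buf[2], 16);                  /* BLOCK DESCRIPTOR DATA LENGTH */
    /* bytes 4..7 are reserved */
    put_be64(&buf[8], lba);                 /* UNMAP LOGICAL BLOCK ADDRESS */
    put_be32(&buf[16], nblocks);            /* NUMBER OF LOGICAL BLOCKS */
    /* bytes 20..23 are reserved */
}
```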
Introducing Linux Discard Support
Linux Discard Implementation
   Abstraction over the two underlying specifications:
     1   ATA TRIM Command
     2   SCSI UNMAP / WRITE SAME
   API for user space
         BLKDISCARD ioctl
         Added with v2.6.27-rc9-30-gd30a260
   API for file systems
     1   sb_issue_discard()
     2   blkdev_issue_discard()
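A minimal sketch of calling the BLKDISCARD ioctl from user space. The ioctl takes an array of two 64-bit values, {start, length}, both in bytes. The 512-byte alignment check mirrors what the kernel enforces (newer kernels may require alignment to the device's logical block size instead). discard_range is a hypothetical helper name.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKDISCARD */

/* Discard `len` bytes starting at byte offset `start` on an open block
 * device. Returns 0 on success, -1 with errno set on failure. */
int discard_range(int fd, uint64_t start, uint64_t len)
{
    uint64_t range[2] = { start, len };

    /* The kernel rejects ranges that are not 512-byte aligned. */
    if ((start | len) & 511) {
        errno = EINVAL;
        return -1;
    }
    return ioctl(fd, BLKDISCARD, &range);
}
```

The block device must be opened for writing (O_WRONLY or O_RDWR), and the contents of the discarded range must be treated as lost.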
Part II
Discard Performance
Agenda
4 Testing Methodology
5 Results
Testing Methodology
What do we need to find out?
   Does discard really work? Is it reliable?
   How fast/slow is it?
   Is there any difference between devices from different vendors?
   What is the ideal discard size?
   SSD performance degradation
Testing Methodology
How do we test it?
   BLKDISCARD ioctl()
   Automatic discard of different ranges
   Different discard patterns
     1   sequential performance
     2   random IO performance
     3   discard already discarded blocks
   test-discard - discard benchmarking tool
         http://sourceforge.net/projects/test-discard/
   impression - filesystem aging tool
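The discard patterns can be sketched as different orderings of the same set of record offsets. This is a hypothetical illustration, not code from test-discard: each offset would be passed to a BLKDISCARD call, the pass timed (for example with clock_gettime()), and throughput computed from the bytes discarded. Taking 1 MB as 10^6 bytes is an assumption. Discarding already discarded blocks is simply a second pass over the same offsets.

```c
#include <stdint.h>
#include <stdlib.h>

/* Offsets of `n` records of `rec` bytes each, in sequential order. */
void pattern_sequential(uint64_t *off, size_t n, uint64_t rec)
{
    for (size_t i = 0; i < n; i++)
        off[i] = i * rec;
}

/* Same records shuffled with Fisher-Yates to model random IO.
 * Every record is still discarded exactly once. */
void pattern_random(uint64_t *off, size_t n, uint64_t rec, unsigned seed)
{
    pattern_sequential(off, n, rec);
    srand(seed);
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;   /* 0 <= j < i */
        uint64_t tmp = off[i - 1];
        off[i - 1] = off[j];
        off[j] = tmp;
    }
}

/* Throughput in MB/s (1 MB = 10^6 bytes) for `bytes` processed in `secs`. */
double throughput_mbs(uint64_t bytes, double secs)
{
    return secs > 0 ? (double)bytes / 1e6 / secs : 0.0;
}
```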
Results
Sequential discard performance
[Figure: Duration [s] (left axis) and throughput [MB/s] (right axis) vs. record size [kB], 10 to 1000 kB; series: Duration Summary, Throughput]
Results
Different modes comparison
[Figure: Throughput [MB/s] vs. record size [kB], 10 to 1000 kB; series: Throughput (sequential), Throughput (random IO), Throughput (discard2)]
Results
Difference between various vendors
[Figure: Throughput [MB/s] vs. record size [kB], 10 to 1000 kB; series: Vendor 1, Vendor 2, Vendor 3]
Results
SSD performance degradation
[Figure: Write performance [MB/s] vs. filesystem saturation [%], 0 to 400%; series: Vendor1, Vendor2, Vendor3, Vendor3 discard]
Part III
Discard Support for Linux File Systems
Agenda
6 Periodic Discard
7 Discard Batching
8 Different Approach
Periodic Discard
Periodic discard
   Easy to implement
   File system support
     1   ext4 (v2.6.27-5185-g8a0aba7)
     2   btrfs (since its upstream merge)
     3   gfs2 (v2.6.29-9-gf15ab56)
     4   fat, swap, nilfs
   mount -o discard /dev/sdc /mnt/test
   TRIM is a non-queueable command: implications?
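Because online discard is just a mount option, it is easy to check whether a file system actually has it enabled. The sketch below scans /proc/mounts with the glibc setmntent()/getmntent() API and matches the option as a whole token, so that "nodiscard" is not mistaken for "discard" while "discard=<mode>" forms still count. Both helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <mntent.h>

/* Return 1 if the comma-separated option string `opts` contains `opt`
 * as a whole option name (optionally followed by "=value"). */
int has_option(const char *opts, const char *opt)
{
    size_t len = strlen(opt);
    const char *p = opts;

    while (*p) {
        const char *end = strchr(p, ',');
        size_t tok = end ? (size_t)(end - p) : strlen(p);
        const char *eq = memchr(p, '=', tok);
        size_t key = eq ? (size_t)(eq - p) : tok;

        if (key == len && strncmp(p, opt, len) == 0)
            return 1;
        if (!end)
            break;
        p = end + 1;
    }
    return 0;
}

/* Return 1 if the file system mounted at `dir` uses online discard,
 * 0 if not (or if nothing is mounted there), -1 if /proc/mounts
 * cannot be read. The last matching entry wins, as with overmounts. */
int mounted_with_discard(const char *dir)
{
    FILE *f = setmntent("/proc/mounts", "r");
    struct mntent *m;
    int found = 0;

    if (!f)
        return -1;
    while ((m = getmntent(f)) != NULL)
        if (strcmp(m->mnt_dir, dir) == 0)
            found = has_option(m->mnt_opts, "discard");
    endmntent(f);
    return found;
}
```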
Periodic Discard
Benchmarking periodic discard
   Expectations?
   Testing methodology
     1   Metadata-intensive load
     2   Load with file removal
     3   Reasonable file size distribution
   Discard-kit
     1   Using PostMark
     2   http://sourceforge.net/projects/test-discard/files/
Periodic Discard
Ext4 performance (18% hit)
[Figure: Operations/s on ext4, nodiscard vs. discard; series: Deleted/s, Append/s, Read/s, Files-created/s, Transactions/s, Read [B/s], Write [B/s]]
Periodic Discard
Performance with various file systems
[Figure: Transactions/s with nodiscard vs. discard for ext4, btrfs and gfs2; change with discard: ext4 -18%, btrfs -7%, gfs2 -63%]
Discard Batching
Discard Batching - The idea
   Fine-grained discard is not necessarily needed
   Discarding small extents is slow
   Over time, freed extents tend to coalesce
   Disadvantages
     1 There is a price for tracking freed extents
     2 Discarding already discarded blocks should be easy, but...
     3 A daemon (in-kernel or user-space) is needed.
     4 A file-system-independent solution would most likely be painful
       to get right (if possible at all).
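The batching idea can be illustrated with a small in-memory sketch: sort the freed extents, merge those that overlap or touch, and keep only merged extents of at least minlen blocks for discarding. This is a hypothetical illustration of the idea, not how ext4 or XFS implement batched discard (they walk their own free-space metadata instead of a separate list); the names are made up.

```c
#include <stdint.h>
#include <stdlib.h>

struct extent {
    uint64_t start;   /* first block */
    uint64_t len;     /* number of blocks */
};

static int cmp_start(const void *a, const void *b)
{
    const struct extent *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* Sort freed extents, merge overlapping or adjacent ones, and keep only
 * merged extents of at least `minlen` blocks. Works in place and returns
 * the number of extents left at the front of `ext`. */
size_t coalesce_extents(struct extent *ext, size_t n, uint64_t minlen)
{
    if (n == 0)
        return 0;
    qsort(ext, n, sizeof *ext, cmp_start);

    size_t out = 0;
    struct extent cur = ext[0];
    for (size_t i = 1; i < n; i++) {
        uint64_t cur_end = cur.start + cur.len;
        if (ext[i].start <= cur_end) {        /* overlaps or touches */
            uint64_t end = ext[i].start + ext[i].len;
            if (end > cur_end)
                cur.len = end - cur.start;
        } else {
            if (cur.len >= minlen)
                ext[out++] = cur;             /* out < i, so no clobbering */
            cur = ext[i];
        }
    }
    if (cur.len >= minlen)
        ext[out++] = cur;
    return out;
}
```

Small isolated extents are dropped rather than discarded, trading some unreclaimed space for far fewer slow, small discard requests.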
Discard Batching
Batched discard support
   File-system-specific solution
   Provide an ioctl() interface - FITRIM
   Do not disturb other ongoing IO too much
     1   Prevent allocations while trimming
     2   How to handle huge file systems?
   File system support
     1   ext4 (v2.6.36-rc6-35-g7360d17)
     2   ext3 (v2.6.37-11-g9c52749)
     3   xfs (v2.6.37-rc4-63-ga46db60)
Discard Batching
FITRIM ioctl
   Ioctl with one RW parameter, defined in linux/fs.h:
   struct fstrim_range {
     __u64 start;
     __u64 len;
     __u64 minlen;
   };
   fstrim tool
        http://sourceforge.net/projects/fstrim/
   util-linux-ng
        Since v2.18-165-gd9e2d0d
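A minimal sketch of a user-space FITRIM caller. To limit how long each call keeps allocations blocked on a huge file system, it trims in fixed-size chunks rather than in one call. The chunking strategy, the caller-supplied fs_size parameter and the function name are assumptions for illustration, not part of the FITRIM interface.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FITRIM, struct fstrim_range */

/* Trim free space of the file system mounted at `mnt` in chunks of
 * `chunk` bytes, skipping free extents shorter than `minlen` bytes.
 * Returns 0 and stores the total bytes trimmed in *trimmed, or -1 with
 * errno set on failure. */
int fstrim_chunked(const char *mnt, uint64_t fs_size, uint64_t chunk,
                   uint64_t minlen, uint64_t *trimmed)
{
    if (chunk == 0) {
        errno = EINVAL;
        return -1;
    }

    /* FITRIM is issued on any open file or directory of the file system. */
    int fd = open(mnt, O_RDONLY);
    if (fd < 0)
        return -1;

    *trimmed = 0;
    for (uint64_t start = 0; start < fs_size; start += chunk) {
        struct fstrim_range range;
        range.start = start;
        range.len = (fs_size - start < chunk) ? fs_size - start : chunk;
        range.minlen = minlen;

        if (ioctl(fd, FITRIM, &range) < 0) {
            int saved = errno;
            close(fd);
            errno = saved;      /* report the ioctl error, not close()'s */
            return -1;
        }
        /* The kernel overwrites len with the number of bytes trimmed. */
        *trimmed += range.len;
    }
    close(fd);
    return 0;
}
```

FITRIM requires CAP_SYS_ADMIN. A non-zero minlen lets the file system skip small free extents, which, as noted above, are slow to discard.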
Discard Batching
Batched discard benchmark results
[Figure: Duration [s] vs. filesystem saturation [%], 0 to 100%; series: FITRIM on ext4, BLKDISCARD]
Different Approach
Alternative approach
   It is always a compromise
   The future of SSDs and thinly provisioned LUNs (???)
Part IV
Discard Support in User Space
Agenda
9 e2fsprogs
10 Other utilities
e2fsprogs
Discard in e2fsprogs tools
   Using BLKDISCARD ioctl()
   mke2fs
     1   Refresh the SSD’s garbage collector
     2   If discard zeroes data - significant speed boost
     3   mkfs.ext4 -E discard /dev/sdc
   e2fsck
     1   Discard free space after the last check
     2   Undetected file system errors? Oops
     3   fsck.ext4 -E discard /dev/sdc
   resize2fs
     1   Refresh the SSD’s garbage collector
     2   If discard zeroes data - significant speed boost
     3   resize2fs -E discard /dev/sdc
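The speed boost relies on discarded blocks reading back as zeroes, so the tool can skip writing zeroes itself. A minimal sketch of querying this property through the BLKDISCARDZEROES ioctl from linux/fs.h follows; note that recent kernels always report 0 here, and guaranteed zeroing is instead available through interfaces such as BLKZEROOUT. The helper name is hypothetical.

```c
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKDISCARDZEROES */

/* Ask the block layer whether discarded blocks are guaranteed to read
 * back as zeroes. Returns 1 or 0, or -1 with errno set on failure. */
int discard_zeroes_data(int fd)
{
    unsigned int zeroes = 0;

    if (ioctl(fd, BLKDISCARDZEROES, &zeroes) < 0)
        return -1;
    return zeroes ? 1 : 0;
}
```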
e2fsprogs
File system creation
[Figure: File system creation duration [s], nodiscard vs. discard, for EXT4 and XFS]
Other utilities
Fstrim tool
   Very simple tool to invoke the FITRIM ioctl on a mounted file
   system
   Stand-alone tool
       http://sourceforge.net/projects/fstrim/
   Part of util-linux-ng since v2.18-165-gd9e2d0d
Part V
Summary
Summary
  Linux discard support is an abstraction over the underlying
  specifications
  Exported via the BLKDISCARD ioctl to user space and via
  blkdev_issue_discard() to file systems
  Discard testing kit (Discard-kit)
    1   test-discard
    2   PostMark
  File system support
    1   Fine-grained (online) discard - mount -o discard
    2   Batched discard support - fstrim from util-linux-ng
  Support in user-space utilities
    1   File system creation (mkfs)
    2   e2fsprogs - mkfs, e2fsck, resize2fs
    3   xfsprogs - mkfs
    4   fstrim
The end.
Thanks for listening.
Useful links
   http://sourceforge.net/projects/fstrim/
   http://sourceforge.net/projects/test-discard/
   http://people.redhat.com/lczerner/discard/
