Filesystem Performance
From a Database Perspective


Mark Wong
markwkm@postgresql.org

LinuxFest Northwest 2009


April 25 - 26, 2009
How we are here today




   Generous equipment donations from HP and hosting from
   Command Prompt, Inc.
Setting Expectations


This is:
  ◮ A guide for how to understand the i/o behavior of your own
    system.
  ◮ A model of simple i/o behavior based on PostgreSQL.
  ◮ A suggestion for testing your own system.

This is not:
  ◮ A comprehensive Linux filesystem evaluation.
  ◮ A filesystem tuning guide (mkfs and mount options).
  ◮ Representative of your system (unless you have the exact same
    hardware).
  ◮ A test of reliability or durability (mostly).
__      __
         / ~~~/  . o O ( Know your system! )
   ,----(      oo    )
  /      __      __/
 /|          ( |(
^    /___ / |
   |__|   |__|-"
Contents


  ◮ Testing Criteria
      ◮ What do we want to test.
      ◮ How we test it with fio.
  ◮ Test Results
Hardware Details

  ◮ HP DL380 G5
  ◮ 2 x Quad Core Intel(R) Xeon(R) E5405
  ◮ HP 32GB Fully Buffered DIMM PC2-5300 8x4GB DR LP Memory
  ◮ HP Smart Array P800 Controller w/ 512MB Cache
  ◮ HP 72GB Hot Plug 2.5" Dual Port SAS 15,000 rpm Hard Drives

              __     __
             / ~~~/  . o O ( Thank you HP! )
       ,----(     oo    )
     /       __     __/
    /|          ( |(
   ^     /___ / |
       |__|   |__|-"
Test Criteria

PostgreSQL dictates the following (even though some of them are
actually configurable):
  ◮ 8KB block size
  ◮ No use of posix_fadvise
  ◮ No use of direct i/o
  ◮ No use of async i/o
  ◮ Uses processes, not threads.

Based on the hardware:
  ◮ 56GB data set - to minimize any caching in the 32GB of system
    memory. (We want to use 64GB, but these 72GB disks are sized in
    millions of bytes, i.e. not big enough for the 1-disk tests; see
    the arithmetic sketch below.)
  ◮ 8 processes performing i/o - one fio process per core, except
    for the sequential i/o tests, otherwise the i/o won't be
    sequential.
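
A back-of-the-envelope check of that disk-size caveat (a sketch, not
from the deck; it assumes the drive's "72GB" means 72 x 10^9 bytes,
as the slide implies):

$ # a "72GB" (decimal) disk expressed in binary gibibytes
$ echo $(( 72 * 1000**3 / 1024**3 ))
67

About 67GiB per disk; on ext2/ext3, for example, the default 5% of
reserved blocks alone would eat the headroom a 64GB data set needs,
hence 56GB.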
Linux v2.6.28-gentoo Filesystems Tested

  ◮ ext2
  ◮ ext3
  ◮ ext4
  ◮ jfs
  ◮ reiserfs
  ◮ xfs

              a8888b. . o O ( What is your favorite filesystem? )
             d888888b.
             8P"YP"Y88
             8|o||o|88
             8'     .88
             8`._.' Y8.
           d/        `8b.
         .dP      .     Y8b.
       d8:'       "  `::88b.
      d8"               `Y88b
     :8P        '        :888
      8a.       :       _a88P
   ._/"Yaa_ :        .| 88P|
         YP"        `| 8P `.
   /       ._____.d|        .'
   `--..__)888888P`._.'
Hardware RAID Configurations

These are the RAID configurations supported by the HP Smart
Array P800¹ disk controller:
  ◮ RAID 0
  ◮ RAID 1+0
  ◮ RAID 5
  ◮ RAID 6

             __     __
            / ~~~/  . o O ( Which RAID would you use? )
      ,----(     oo    )
    /       __     __/
   /|          ( |(
  ^     /___ / |
      |__|   |__|-"


¹ Firmware v5.22
Test Program

Use fio² (Flexible I/O) v1.23 by Jens Axboe to test:
  ◮ Sequential Read
  ◮ Sequential Write
  ◮ Random Read
  ◮ Random Write
  ◮ Random Read/Write

² http://brick.kernel.dk/snaps/
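
Each of the following profiles is a standalone fio job file. A minimal
sketch of how one might be driven (the file name seq-read.fio is
illustrative, not from the deck; root is needed so that exec_prerun
can write to /proc):

$ # run one job file; fio reports per-job bandwidth and latency
$ sudo fio seq-read.fio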
__      __
         / ~~~/  . o O ( fio profiles )
   ,----(      oo    )
  /      __      __/
 /|          ( |(
^    /___ / |
   |__|   |__|-"
Sequential Read


   [seq-read]
   rw=read
   size=56g
   directory=/test
   fadvise_hint=0
   blocksize=8k
   direct=0
   numjobs=1
   nrfiles=1
   runtime=1h
   time_based
   exec_prerun=echo 3 > /proc/sys/vm/drop_caches
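
The exec_prerun line, common to all of these profiles, starts each
run with cold caches. For reference, this is standard Linux behavior
rather than anything fio-specific:

$ # flush dirty pages first so dropping the caches actually frees them
$ sync
$ # 1 = page cache, 2 = dentries and inodes, 3 = both
$ echo 3 > /proc/sys/vm/drop_caches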
Sequential Write


   [seq-write]
   rw=write
   size=56g
   directory=/test
   fadvise_hint=0
   blocksize=8k
   direct=0
   numjobs=1
   nrfiles=1
   runtime=1h
   time_based
   exec_prerun=echo 3 > /proc/sys/vm/drop_caches
Random Read


  [random-read]
  rw=randread
  size=7g
  directory=/test
  fadvise_hint=0
  blocksize=8k
  direct=0
  numjobs=8
  nrfiles=1
  runtime=1h
  time_based
  exec_prerun=echo 3 > /proc/sys/vm/drop_caches
Random Write


  [random-write]
  rw=randwrite
  size=7g
  directory=/test
  fadvise_hint=0
  blocksize=8k
  direct=0
  numjobs=8
  nrfiles=1
  runtime=1h
  exec_prerun=echo 3 > /proc/sys/vm/drop_caches
Random Read/Write


  [read-write]
  rw=rw
  rwmixread=50
  size=7g
  directory=/test
  fadvise_hint=0
  blocksize=8k
  direct=0
  numjobs=8
  nrfiles=1
  runtime=1h
  time_based
  exec_prerun=echo 3 > /proc/sys/vm/drop_caches
     __      __
    / ~~~/  . o O ( Results! )
,----(      oo    )
  /      __      __/
 /|          ( |(
^    /___ / |
   |__|   |__|-"


There is more information than what is in this presentation; the raw
data is at:
http://207.173.203.223/~markwkm/community10/fio/linux-2.6.28-gentoo/

  ◮ Raw mpstat, vmstat, iostat, and sar data.
A few words about statistics...


Not enough time or data to calculate good statistics. My excuses:
  ◮ 1 hour per test
  ◮ 5 tests per filesystem
  ◮ 6 filesystems per RAID configuration (1 filesystem consistently
    halts testing, requiring manual intervention...)
  ◮ 7 RAID configurations for this presentation
  ◮ 210+ hours of continuous testing for this presentation (the
    arithmetic is sketched below).
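
That total is just the product of the test matrix (a trivial check,
not from the deck):

$ echo $(( 1 * 5 * 6 * 7 )) hours
210 hours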
Examine the following charts for a feel of how reliable the results
may be.
Random Read with Linux 2.6.28-gentoo ext2 on 1 Disk
Random Write with Linux 2.6.28-gentoo ext2 on 1 Disk
Read/Write Mix (Reads) with Linux 2.6.28-gentoo ext2 on
1 Disk
Read/Write Mix (Writes) with Linux 2.6.28-gentoo ext2
on 1 Disk
Sequential Read with Linux 2.6.28-gentoo ext2 on 1 Disk
Sequential Write with Linux 2.6.28-gentoo ext2 on 1 Disk
A couple notes about the following charts


  ◮ If you see missing data, it means the test didn't complete
    successfully.
  ◮ Charts are not all scaled the same; the focus is on trends within
    a chart. (But maybe that is not a good choice?)
Results for all the filesystems when using only a single disk...
Results for all the filesystems when striping across four disks...
Let’s look at that step by step for RAID 0.
Trends might not be the same when adding disks to other RAID
configurations. (We didn't test it because we moved on to
database system testing.)
What is the best use of disks with respect to capacity?
Filesystem Readahead

Values are in number of 512-byte sectors, set per device.
To get the current readahead size:

$ blockdev --getra /dev/sda
256


To set the readahead size to 8 MB:

$ blockdev --setra 16384 /dev/sda
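
A quick check of that arithmetic, and of the new setting (blockdev
usage as above; the sector math is standard):

$ # 16384 sectors x 512 bytes = 8 MiB
$ echo $(( 16384 * 512 / 1024 / 1024 ))
8
$ blockdev --getra /dev/sda
16384

Note that the setting does not persist across reboots; it has to be
reapplied at boot.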
What does this all mean for my database system?




It depends... Let's look at one sample of just using different
filesystems.
Raw data for comparing filesystems


Initial inspection of the results suggests we are close to being bound
by processor and storage performance, meaning we don't really know
which filesystem performs best:
  ◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/ext2.1500.2/
  ◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/ext3.1500.2/
  ◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/ext4.1500.2/
  ◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/jfs.1500.2/
  ◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/reiserfs.1500.2/
  ◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/xfs.1500.2/
Linux I/O Elevator Algorithm

To get the current algorithm (shown in []'s):

$ cat /sys/block/sda/queue/scheduler
noop anticipatory [deadline] cfq


To set a different algorithm:

$ echo cfq > /sys/block/sda/queue/scheduler
$ cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]
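
The sysfs write takes effect immediately but is per-device and lost on
reboot. On kernels of this vintage the default scheduler can also be
chosen at boot with the elevator= kernel parameter; an illustrative
GRUB entry (root device and kernel image are assumptions):

# /boot/grub/grub.conf - illustrative, adjust device and kernel version
kernel /boot/vmlinuz-2.6.28-gentoo root=/dev/sda1 elevator=deadline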
Why use deadline?³

  ◮ Attention! Database servers, especially those using "TCQ"
    disks should investigate performance with the 'deadline' IO
    scheduler. Any system with high disk performance
    requirements should do so, in fact.
  ◮ Also, users with hardware RAID controllers, doing striping,
    may find highly variable performance results with using the
    as-iosched. The as-iosched anticipatory implementation is
    based on the notion that a disk device has only one physical
    seeking head. A striped RAID controller actually has a head
    for each physical device in the logical RAID device.


³ Linux source code: Documentation/block/as-iosched.txt
Final Notes




You have to test your own system to see what it can do; don't take
someone else's word for it.
http://www.slideshare.net/markwkm
__      __
         / ~~~/  . o O ( Thank you! )
   ,----(      oo    )
  /      __      __/
 /|          ( |(
^    /___ / |
   |__|   |__|-"
Acknowledgements



  Haley Jane Wakenshaw

            __      __
           / ~~~/ 
     ,----(      oo    )
    /      __      __/
   /|          ( |(
  ^    /___ / |
     |__|   |__|-"
License




   This work is licensed under a Creative Commons Attribution 3.0
   Unported License. To view a copy of this license, (a) visit
   http://creativecommons.org/licenses/by/3.0/us/; or, (b)
   send a letter to Creative Commons, 171 2nd Street, Suite 300, San
   Francisco, California, 94105, USA.
