Exploiting Your File System to Build Robust & Efficient Workflows
1. Exploiting Your File System to Build Robust &
Efficient Workflows
Jason Johnson
jajohnson@softlayer.com
7. The Disk Array Controller
● Adaptec 5405Z
● PCIe x8
● 1.2 GHz Dual Core RAID on Chip (ROC)
● 128-1024 MB Battery-Backed DDR
● 1-4 GB NAND
● Up to 256 SATA or SAS HDDs
● arcconf
9. “...you *must* disable the
individual hard disk write cache in
order to ensure to keep the file
system intact after a power
failure.”
XFS.org FAQ
10. Initial 1MB Sector Alignment
Sector Size    Starting Sector    Drive Type
512 B          2048               SATA & SAS
2 KB           512                SSD
4 KB           256                Advanced Format & SSD
blockdev --getpbsz /dev/sdc
blockdev --getss /dev/sdc
“Aligning IO on a hard disk RAID”
http://www.mysqlperformanceblog.com/2011/06/09/aligning-io-on-a-hard-disk-raid-the-theory/
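The starting sectors in the table all follow from the same arithmetic: a 1 MB (taken here as 2**20 bytes) offset divided by the sector size. A minimal sketch of that calculation follows; the function names are illustrative and not from the talk, and the sector size would come from the blockdev commands above.

```python
MIB = 1024 * 1024  # the 1 MB offset from the slide, taken as 2**20 bytes


def starting_sector(sector_size_bytes, offset_bytes=MIB):
    """Return the sector at which a partition starts offset_bytes into the disk."""
    if sector_size_bytes <= 0 or offset_bytes % sector_size_bytes:
        raise ValueError("offset must be a whole number of sectors")
    return offset_bytes // sector_size_bytes


def is_aligned(start_sector, sector_size_bytes, boundary_bytes=MIB):
    """Check whether a partition's first sector sits on the chosen boundary."""
    return (start_sector * sector_size_bytes) % boundary_bytes == 0


# Reproduce the three rows of the table: 512 B, 2 KB and 4 KB sectors.
for size in (512, 2048, 4096):
    print(size, starting_sector(size))
```

The old DOS-style default of starting at sector 63 fails this check on 512 B sectors, which is the kind of misalignment the slide is warning about.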
30. inotify
Event Mask          Fired when...
IN_ACCESS           File was accessed (read)
IN_ATTRIB           Metadata changed
IN_CLOSE_WRITE      File opened for writing was closed
IN_CLOSE_NOWRITE    File not opened for writing was closed
IN_CREATE           File/directory created in watched directory
IN_DELETE           File/directory deleted from watched directory
IN_DELETE_SELF      Watched file/directory was itself deleted
IN_MODIFY           File was modified
IN_MOVE_SELF        Watched file/directory was itself moved
IN_MOVED_FROM       File moved out of watched directory
IN_MOVED_TO         File moved into watched directory
IN_OPEN             File was opened
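Each of these masks is a single bit in the `mask` field of the `struct inotify_event` records that the kernel returns when the inotify file descriptor is read. A minimal sketch of decoding such a buffer follows. The bit values are those in `<sys/inotify.h>` on Linux; the helper names `mask_names` and `parse_events` are illustrative, and obtaining the buffer (via `read()` on the descriptor) is left out.

```python
import os
import struct

# Bit values from <sys/inotify.h> on Linux.
EVENT_BITS = {
    0x001: "IN_ACCESS", 0x002: "IN_MODIFY", 0x004: "IN_ATTRIB",
    0x008: "IN_CLOSE_WRITE", 0x010: "IN_CLOSE_NOWRITE", 0x020: "IN_OPEN",
    0x040: "IN_MOVED_FROM", 0x080: "IN_MOVED_TO", 0x100: "IN_CREATE",
    0x200: "IN_DELETE", 0x400: "IN_DELETE_SELF", 0x800: "IN_MOVE_SELF",
}

# struct inotify_event header: int wd; uint32_t mask, cookie, len;
HEADER = struct.Struct("iIII")


def mask_names(mask):
    """Return the names of the event bits set in mask, lowest bit first."""
    return [name for bit, name in sorted(EVENT_BITS.items()) if mask & bit]


def parse_events(buf):
    """Split a raw inotify read buffer into (wd, names, cookie, filename) tuples."""
    events = []
    offset = 0
    while offset + HEADER.size <= len(buf):
        wd, mask, cookie, name_len = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        # The name is NUL-padded to name_len bytes; keep the part before the first NUL.
        raw_name = buf[offset:offset + name_len].split(b"\0", 1)[0]
        offset += name_len
        events.append((wd, mask_names(mask), cookie, os.fsdecode(raw_name)))
    return events
```

The `cookie` field is what pairs an IN_MOVED_FROM with its matching IN_MOVED_TO when a file is renamed between two watched directories.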
32. inotify in [language]
Language    Source
Python      pip install pyinotify
PHP         pecl install inotify
Go          Go's exp repository
Ruby        gem install rb-inotify
C           #include <sys/inotify.h>
38. Summary
● Caching
● File system choice
● Benchmarking w/ sysbench
● Efficiency through proper configuration
● Robustness through cooperation & decoupling
● Discovering & understanding your write pattern
● Benchmark & Graph everything
● Never Assume Anything (atime, stripe width, etc.)
39. Jason Johnson jajohnson@softlayer.com
https://github.com/jasonjohnson
http://www.slideshare.net/jasonajohnson
“A Case for Redundant Arrays of Inexpensive Disks (RAID)”
http://www.cs.cmu.edu/~garth/RAIDpaper/Patterson88.pdf
“Practical File System Design”
http://www.nobius.org/~dbg/practical-file-system-design.pdf
“XFS Papers and Documentation”
http://xfs.org/index.php/XFS_Papers_and_Documentation
“Kernel Documentation on File Systems”
https://www.kernel.org/doc/Documentation/filesystems/
“MySQL Performance Blog”
http://www.mysqlperformanceblog.com/
“MySQL DBA”
http://mysqldba.blogspot.com/
“MySQL Server System Variables”
http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html
Editor's notes
Good afternoon! Title
Begins Database server? Video Encoder? Where to go from here? Up or Down Understand the Abstraction Go Down to Physical
Common Big Virtual Disk RAID Controller Couple Drives Not “set it and forget it”
Platters Spindle Actuator Actuator Coil Actuator Arm Heads
DDR Flushes to NAND Configuration Tool Stunned Discrete GPU for your File System
Data Corruption? (hands) Fallible Disable All Caching Eliminate Class of Errors
Sector Alignment 1MB Offset Room for Partition Table Use These Tools Verify Correctness
Sectors Clearly Communicating in LBA Logical Size Offset Check. All Makes Sense Not Scary
No Caching Sysbench 16 Data-Bearing Disks Hardware Controller XFS Designed for This! But... verify through testing.
Stripe Size 256k Entire Width Cache Disabled Add Physical Drives All Controller Brands Different
What are we comparing? EXT4 vs. XFS Naive Naive External Tuned Tuned External
From Naive to Modestly Tuned Stripe Width Stride 128MB external journal Mount requires extra information
Review Graph
Review Graph
Again, From Naive to Modestly Tuned Stripe Width Stripe Unit Size External Journal Device Additional Information Needed by Mount
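The stride, stripe width and stripe unit mentioned in the tuning notes can be derived from the array geometry. The sketch below assumes the 256k stripe size from the notes is the per-disk stripe unit, uses the 16 data-bearing disks from the benchmark notes, and assumes a 4 KiB file system block, which the notes do not state. The option strings follow the `stride`/`stripe_width` extended options of mkfs.ext4 and the `su`/`sw` data options of mkfs.xfs; the function name is illustrative.

```python
def raid_geometry(stripe_unit_kib, data_disks, fs_block_kib=4):
    """Derive file system geometry hints from a RAID stripe unit and disk count."""
    if stripe_unit_kib % fs_block_kib:
        raise ValueError("stripe unit must be a multiple of the block size")
    # ext4 expresses both values in file system blocks.
    stride = stripe_unit_kib // fs_block_kib
    stripe_width = stride * data_disks
    return {
        # For: mkfs.ext4 -E <value>
        "ext4": f"stride={stride},stripe_width={stripe_width}",
        # For: mkfs.xfs -d <value> (su in bytes with a k suffix, sw in disks)
        "xfs": f"su={stripe_unit_kib}k,sw={data_disks}",
    }


# Assumed geometry: 256 KiB stripe unit across 16 data-bearing disks.
print(raid_geometry(256, 16))
```

Whatever values come out, the notes' advice still applies: verify them against the controller's reported geometry and benchmark before trusting them.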
Review Graph
Review Graph
We want 330 IOPS Our $$$
Neck and Neck Slight Advantage at 256 Threads
Noticeable Advantage at 256 Threads 5,200 IOPS Reached Practical Limit Early Fully Tuned? sysctl for XFS?
Predictable Write Pattern We Make It Efficient
InnoDB Pages Linux Pages Unit of Work XFS Allocation Groups EXT4 Metadata Groupings
Review The Configuration Plug-in Values from Sysbench Google “MySQL System Variables” Explain Values
Small Benchmark 10 Million Inserts, One Transaction Each Ramp up Threads Text EXTERNALLY Deadlock Potential Spin-locks Contending with Benchmark
Transactions per Second For the Percentage Increase Folks For the Real Figures Folks 125% increase (in some cases) High-End nearing 16,000
Before Next Section | Who has written code like this? (hands) Scanning Race Condition Creation Behind Us | There is a better way!
It Can Tell Us No Scanning or Polling No Races | IN_CLOSE_WRITE IN_MOVED_TO
Event Stream No Races File System Obeys Rules Can't Move Files Being Written
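One way a producer can cooperate with a watcher that reacts to IN_MOVED_TO is to write each file in a staging directory and rename it into the watched directory only once it is complete. This is a minimal sketch of that pattern, not the code shown in the talk; the function name and directory layout are assumptions, and both directories must be on the same file system for the rename to be atomic.

```python
import os
import tempfile


def publish(data, staging_dir, watched_dir, name):
    """Write data in staging_dir, then rename it into watched_dir in one step."""
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk first
        final_path = os.path.join(watched_dir, name)
        # A rename within one file system is atomic, so a watcher never
        # observes a partially written file in watched_dir.
        os.replace(tmp_path, final_path)
        return final_path
    except BaseException:
        # Clean up the staging file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The consumer then only needs to watch the destination directory for IN_MOVED_TO; by the time that event fires, the file content is final.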
Go's extracted from stdlib FreeBSD's kqueue
Familiar? SMTP or Maildir | Fall Over | Inbox Partial Content Source & Victim Locked Fetching Serialized
Request Must Know How to Respond But... ONE I/O THREAD?!!
Predictable, Simple Hash Tenants CAN & WILL Clobber, Though Sticky
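The "predictable, simple hash" in this note could look something like the following. This is a guess at the idea rather than the scheme from the slide: the tenant identifier format, the choice of SHA-256 and the function name are all assumptions.

```python
import hashlib


def queue_for(tenant_id, queue_count):
    """Map a tenant to a queue index that stays the same across processes."""
    if queue_count <= 0:
        raise ValueError("queue_count must be positive")
    # Python's built-in hash() is randomized per process for strings,
    # so a cryptographic digest is used to keep the mapping stable.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % queue_count
```

A mapping like this is sticky, since a tenant always lands on the same queue, but as the note warns, two busy tenants can still hash to the same queue and contend with each other.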
Notification-based Movement Serialized I/O per-queue Parallel I/O per-server Highly Available Front-End Decoupled Delivery Basic UNIX Command Maintenance | Learn From MySQL Random Writes & Random Size ZFS & ZIL kqueue vs. inotify Fix One