1. Using Isilon All-Flash Storage for SAS GRID
A Technical Deep Dive
Boni Bruno, CISSP, CISM, CGEIT
Chief Solutions Architect, Analytics
Dell EMC
2. SAS – Statistical Analysis System
• Business Intelligence, Advanced Analytics, Data Management, Predictive
Analysis
• SAS is not a relational database (RDBMS)
– SAS is an interpretive programming language
– Data is stored in SAS proprietary formatted files
• Native access to all major databases
• Application Front Ends for thick, thin, Grid, and multi-platform tiered solutions
• Used by nearly every Dell EMC Enterprise customer
– Hundreds of TB of SAS data are common
3. Why Dell EMC for SAS Analytics?
Dell EMC holds leadership positions in some of the biggest and fastest-growing categories in the IT infrastructure business, which means you can confidently source all your IT needs from one provider: Dell EMC
• Converged infrastructure1
• Traditional and all-flash storage2
• Virtualized data center infrastructure3
• Cloud IT infrastructure4
• Server virtualization and cloud systems management software (VMware)5
• Data protection6
• Software-defined storage7
1 IDC WW Quarterly Converged Systems Tracker, June 2016, Vendor Revenue — EMC FY 2015; 2 IDC WW Quarterly Enterprise Storage Systems Tracker, June 2016, Vendor Revenue — EMC CY 2015; 3 Dell EMC Annual Report, 2015; 4 IDC WW
Quarterly Cloud IT Infrastructure Tracker, Q1 June 2016, Vendor Revenue — EMC FY 2015; 5 IDC WW Virtual Machine and Cloud System Market Shares 2015, July 2016; 6 Dell EMC Pulse, Gartner Recognizes EMC as a Leader in the 2016 Data Center
Backup and Recovery Software Magic Quadrant, June 2016; 7 IDC white paper, "Software Defined Storage: A Pervasive Approach to IT Transformation Driven by the 3rd Platform," November 2015
4. F810 Overview
New All-Flash Node with Inline Data Reduction
Key Features:
• Hardware-accelerated, real-time compression
• Supports 3.8TB, 7.7TB and 15.4TB SSD capacities
• Fully supported in heterogeneous Isilon Gen6 clusters
• Dell EMC 2:1 Data Reduction Guarantee & other Storage Loyalty Program elements
Benefits:
• Ideal for demanding workloads that require extreme performance and efficiency
• Up to 33% more effective storage/PB than major competitive offerings
• Simple configuration, transparent operation
• Fully supported with all other Isilon OneFS features
5. Why Storage is Critical in Analytics…
• Analytics require massive amounts of data to meet business needs
• Speed of access to data is critical in order to “feed” increasing processing power
• Enhanced compression techniques reduce cost without hindering performance
• Easily scalable as the environment grows (modular)
• Even as analytics move to RAM, data still has to be stored somewhere and accessed quickly
• Ability to eliminate duplicate data over time, to further reduce storage
6. Typical SAS Grid Architecture
[Diagram] Users submit jobs via browser, SAS client tools, or batch (shell). SAS Grid resource management (typically IBM Platform LSF) spreads jobs (Job 1, Job 2, Job 3, …) across many grid nodes (100s of total cores). Each grid node has its own temp storage for SASWORK, dedicated to that node (fast, never shared or NFS). Shared storage, reached over high-speed fiber or network, holds customer data, customer home directories, SAS code, etc.
7. Dell EMC SAS GRID v9.4M6 Test Lab
[Diagram] 12 grid nodes (Grid Node #1 through Grid Node #12), each launching its own batch submission via shell batch scripts. Each batch is 33 SAS jobs and has its own copy of the data, with an input NFS mount and an output NFS mount for each grid node/batch. (It is common to have 10s or 100s of NFS mounts in a typical grid; groups/projects typically have one or more mounts each.) Temp storage (SASWORK) is dedicated to each grid node (20+ disk RAID-0). Each SAS node connects at 2 x 10 GbE; the 4 Isilon F8x0 nodes connect at 1 x 40 GbE per node.
Load Sharing Facility (LSF) was NOT used in this scenario to spawn jobs, in order to create a more repeatable job launch across all nodes (predictable job spread). It also helped reduce setup time. This is a common practice at SAS, partners, and customers.
8. Dell EMC SAS GRID v9.4M6 Test Lab
[Diagram] Grid Nodes 1 through 8 connect to the network over 2x10 GbE bonds (LACP 802.3ad). Two 4-node Isilon clusters (Node1–Node4 each) attach at 1x40 GbE per node, with a 40GbE storage interconnect on each cluster's backend:
• Isilon F810-4U-Single-256GB-1x1GE-2x40GE SFP+-24TB SSD, OneFS v8.1.3 – /sasdata with HW compression
• Isilon F800-4U-Single-256GB-1x1GE-2x40GE SFP+-24TB SSD, OneFS v8.2.0 – /sasdata
Each grid node has a local /saswork (22 disks, RAID 0).
PowerEdge R730 Servers: Intel Xeon CPU E5-2698 v4 2.2 GHz, 2 sockets per grid node, 20 cores per socket (40 threads), 40 total cores / 80 threads, 256 GB RAM per grid node.
FYI: the CPUs used in the F8x0 nodes are E5-2697A v4.
9. NFS CLIENT Mount Options
EXAMPLE FROM SAS GRID NODE 2
# F800 with SAS Compression
f800n2:/ifs/f800c/wrk2/multiuser /f800c nfs nfsvers=3,tcp,rw,hard,intr,retrans=2,nosuid,noatime,nodiratime 0 0
f800n3:/ifs/f800c/wrk2/sas7bdat /f800c/sas7bdat nfs nfsvers=3,tcp,rw,hard,intr,retrans=2,nosuid,noatime,nodiratime 0 0
f800n4:/ifs/f800c/wrk2/output /f800c/output nfs nfsvers=3,tcp,rw,hard,intr,retrans=2,nosuid,noatime,nodiratime 0 0
#F800 with no SAS Compression
f800n2:/ifs/f800/wrk2/multiuser /f800 nfs nfsvers=3,tcp,rw,hard,intr,retrans=2,nosuid,noatime,nodiratime 0 0
f800n3:/ifs/f800/wrk2/sas7bdat /f800/sas7bdat nfs nfsvers=3,tcp,rw,hard,intr,retrans=2,nosuid,noatime,nodiratime 0 0
f800n4:/ifs/f800/wrk2/output /f800/output nfs nfsvers=3,tcp,rw,hard,intr,retrans=2,nosuid,noatime,nodiratime 0 0
#F810 with SAS Compression and HW compression
f810n2:/ifs/f810c/wrk2/multiuser /f810c nfs nfsvers=3,tcp,rw,hard,intr,retrans=2,nosuid,noatime,nodiratime 0 0
f810n3:/ifs/f810c/wrk2/sas7bdat /f810c/sas7bdat nfs nfsvers=3,tcp,rw,hard,intr,retrans=2,nosuid,noatime,nodiratime 0 0
f810n4:/ifs/f810c/wrk2/output /f810c/output nfs nfsvers=3,tcp,rw,hard,intr,retrans=2,nosuid,noatime,nodiratime 0 0
11. Testing Focus Areas…
Performance
• How do compression and newer hardware affect runtimes?
Scalability
• We tested up to 12 Grid Nodes
• Most existing NFS clusters are 1:1
• Do runtimes for individual jobs increase (get slower)?
Compression
• SAS binary compression does help with larger files (20-50%)
• What happens when we add Isilon F810 HW compression too?
Deduplication
• Lots of replicated data in analytic systems; can we save more space?
Cost
• Can we deploy fewer nodes due to compression and maintain performance?
12. D4t4 Financial Services Workload
• Suite: Multiuser Analytic Workload
• Created By D4t4 For Financial Services Customers
• Work Patterns And Data Volumes Match Real Customer Jobs
• Simulates SAS Grid Users
• Mix Of Programs That Simulate Different User Scenarios
• Interactive And Batch SAS Jobs
• Designed To Evaluate:
• Scalability Of HW Resources (Focus On Storage Performance)
• Sustained Performance At Scale
• Monitor Response Times Of Large And Small Jobs
• Easily Adjustable To Match Customer Workload
• Ability Of A System To Achieve Customer Requirements
13. SAS IO Requirements / How Data Flows
[Diagram] Running SAS jobs flow data between CPU cores, system RAM, and storage:
• CPU core (typically 2 threads): sustained feed R+W of 100-150 MBps per core; peak feed R+W of 300-400 MBps per core
• System RAM: IO does occur here too (file cache, and more with Viya)
• Connections (network, fiber, SATA, etc.) to and from sources, RAM, and cores
• Typical IO percentage to/from each data source/target: Data on Disk (project, tables, etc.) ~40-50%; SAS Work (temporary, high speed) ~40-60%; Network (RDBMSs, streams, etc.) ~10-20%
SAS Rule: sustain IO throughput of around 150 MBps total (combined R+W) per core. Yes, cores range in speed and performance, but this is a good target throughput.
14. Multiuser Analytics Workload Execution
[Diagram] Batch launches (Batch 1 Launch, Batch 2 Launch, …) run on SAS Grid Nodes (#1, #2, …, scaling out), each node with its own local work area. Isilon shared storage holds each batch's data (Batch 1 Data, Batch 2 Data, … Batch # Data), reached over the network, with a 40GbE storage interconnect on the Isilon backend.
15. Multiuser Workload Batch Details
• Single Node Batch Includes:
• 33 SAS Programs Executed
• Staggered Launch – Timed Script to simulate onboard/real world
• Each Batch Averages ~15-20 Simultaneous Jobs at Peak
• Simulate typical 8 to 12 core SAS Grid server workload during average day
• Data Volumes Per Batch (SAS uncompressed)
• Input Data (SAS7bdat): 1.3 TB
• Output Data Created: 1.2 TB
• SASWork / Temporary Space Peak Usage: ~350 GB (grows and shrinks over period)
• Job Types:
• SAS Studio / Report User – interactive report/coding user (sleep periods are added to create
the feel of real users working on the system at random periods)
• SAS Modeler – execution of complex analytics like logistic, regression
• SAS Data Set construction in support of Modeling / Analytics (building analytics data sets)
• ETL workflow simulation, reading from remote source and populating tables (includes index
creation, merge, where, sorts)
• Advanced Analytics user – larger datasets with more advanced analytics and data
manipulation
16. Multiuser Workload Batch Details (cont.)
• SAS Procedures / Methods Used in Code
• Datasets, PRINT, MEANS, CONTENTS, SQL, HPLOGISTIC, SORT, REG, GLM, DELETE
• Data step (sequential and random read/write)
• Data Details (Uncompressed SAS & Isilon)
• Modeling Data
• User Data
• Randomly Generated With Fields That Mimic Financial Services
• In Reality, Stressing The IO Is The Key To Performance Testing For SAS Grid!
18. IO Throughput for SAS Grid – Deeper Look
• SAS requires IO throughput of 150 MBps per CPU core
• SAS grid nodes attached to NFS typically have 8 to 12 CPU cores
• Typical for a dual 10 GbE configuration
• Therefore a 12-core node needs 1800 MBps sustained throughput
• IO comes from: SASWork, Data Storage, Other (Network, RDBMS)
• IO throughput percentages across data sources are typically:
• SASWork (~50%), Data (~40%), Other (~10%) – this varies by customer! (see note below)
• If your 12-node SAS Grid has 12 CPU cores each:
• A single grid node needs ~720 MBps sustainable R+W throughput from NFS (the ~40% Data share of 1800 MBps)
• The entire grid needs ~8.6 GBps sustainable R+W throughput from NFS**
**4 x F810s served 12 Grid Nodes during the IO throughput R+W tests
NOTE: 40-node grid – average sustained IO throughput for a 12-core grid node at a major financial institution is 650 MBps with 2 x 10 GbE to NFS
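The sizing walk-through above reduces to a few lines of arithmetic. A minimal sketch (the 150 MBps/core rule and the ~40% NFS share come from this slide; the helper names are my own):

```python
# SAS sizing rule of thumb: 150 MBps combined R+W per CPU core, with
# roughly 40% of that IO landing on shared NFS storage (SASWork ~50%, Other ~10%).
MBPS_PER_CORE = 150
NFS_SHARE = 0.40  # "Data" portion of the IO mix; varies by customer

def node_total_mbps(cores):
    """Total sustained R+W throughput one grid node must drive."""
    return cores * MBPS_PER_CORE

def node_nfs_mbps(cores, nfs_share=NFS_SHARE):
    """Portion of a node's throughput the NFS back end must sustain."""
    return node_total_mbps(cores) * nfs_share

def grid_nfs_gbps(nodes, cores, nfs_share=NFS_SHARE):
    """Aggregate NFS throughput for the whole grid, in GBps."""
    return nodes * node_nfs_mbps(cores, nfs_share) / 1000

print(node_total_mbps(12))    # 1800 MBps per 12-core node
print(node_nfs_mbps(12))      # 720.0 MBps from NFS per node
print(grid_nfs_gbps(12, 12))  # 8.64 GBps for the 12-node grid
```

Plugging in a different core count or IO mix (for example a 50% SASWork share) adjusts the NFS requirement accordingly.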
19. Further Details About The Multiuser Analytic Workload
Workload High Level Concept: The Multiuser Analytic Workload was written to launch a workload like that found in a financial services SAS Grid. The workload is similar in design to SAS’s Mixed Analytic Workload, developed over the past 20+ years at SAS to simulate a typical SAS multi-user workload (SAS’s version included jobs from healthcare, government, etc.).
The multiuser workload can be run on a single SMP system or a multi-node SAS Grid environment. It is designed to be modified in order to ramp the workload up and down to stress
the system’s CPU, RAM and I/O capability based on its performance potential (size). SAS IO, being the most critical component of any customer’s SAS environment, is one of the
prime focuses of the scenario and most SAS tests.
• SAS programs in the workload includes data and functions that simulate the following SAS user personas:
• SAS Studio / Report User – interactive report/coding user (sleep periods are added to create the feel of real users working on the system at random periods)
• SAS Modeler – execution of complex analytics like logistic, regression
• SAS Data Set construction in support of Modeling / Analytics (building analytics data sets)
• ETL workflow simulation, reading from remote source and populating tables (includes index creation, merge, where, sorts)
• Advanced Analytics user – larger datasets with more advanced analytics and data manipulation
• The above jobs are launched simultaneously in a timed launch sequence to simulate users coming and going from the grid.
Run Philosophy: It is very common to run this test scenario with different mixes of the types of users (SAS jobs) in order to more closely resemble a customer's environment. This was NOT designed to behave like a TPC or SPEC benchmark, where the results are always the same and the test is run in exactly the prescribed fashion. It's meant to stress the system, especially related to I/O, in order to confirm it can achieve the recommended SAS requirements. The target IO capability as of this writing is 150 MBps per CPU core. The test is tuned up and down to ensure that under a multi-user workload that throughput can be maintained.
Goal: Meeting SAS’s Requirements for IO Throughput: SAS requires a system to be able to sustain 150 MBps per CPU core. This means the total IO (Read+Write) to temporary (SASWORK) or permanent storage locations like RDBMS, SAN and/or NAS storage devices must be able to sustain 150 MBps per CPU core at any time. For example, if 50% of your IO goes to SASWORK, then the other 50% needs to come from the permanent stores like NFS; NFS would therefore need to maintain a throughput of 75 MBps to properly support a single-CPU-core system. As a further example, for a 10-CPU-core system, the sum of the IO capability of the NFS file system would need to support 750 MBps if the other 50% was handled by SASWORK. The larger the SAS compute server is, the more IO you will need to provide.
Test Execution: Jobs are launched with a shell script on one or more machines (SMP or multi-node SAS Grid environment). The script used on each grid node launched 33 jobs in a controlled, timed launch sequence on one or more servers at the same time. Data is pre-generated (compressed or uncompressed) and duplicated on all the machines (local or shared file system). In this test scenario the data was located on NFS (shared storage – Isilon). A SASWORK local file system was created to handle 50% of the IO workload (dedicated to each grid node). The output data directory was also placed on the NFS file system. Scripts were launched on each grid node participating in the scenario, and each node used its own data copy located on the shared storage. No data was shared between grid nodes for this test (many customers do share some data, but typical analytic SAS shops create and then manage their own input/output data for individual projects). It was typical to see 16 or more simultaneous SAS jobs running on each grid node during the test at any one time. This level of simultaneous activity was chosen to simulate a typical SAS Grid node with 2 x 10 Gb Ethernet connections to NAS/NFS.
20. Performance: F810 faster than F800
SAS Job Name in Test Suite | F800, sas compression=none | F800, sas compress=binary | F810, sas compress=binary & HW compression
citi1_1 0:53:26 0:53:44 0:41:02
citi1_3 0:53:12 0:53:26 0:41:13
citi2_1 2:17:14 1:24:45 0:49:28
citi2_3 2:17:03 1:23:58 0:49:26
comp_glm_1a 0:00:39 0:00:42 0:00:37
comp_glm_4a 0:00:45 0:00:53 0:00:44
comp_glm_4b 0:00:43 0:00:51 0:00:46
etl_inbound_1 0:05:02 0:43:29 0:12:12
etl_inbound_4 0:07:41 0:40:07 0:12:37
fscheck_a 0:00:01 0:00:02 0:00:02
fscheck_c 0:00:00 0:00:01 0:00:01
fscheck_f 0:00:00 0:00:02 0:00:01
fscheck_i 0:00:01 0:00:00 0:00:00
fscheck_l 0:00:00 0:00:00 0:00:00
fscheck_m 0:00:01 0:00:05 0:00:04
hplogistic_1 0:20:30 0:09:44 0:12:25
hplogistic_2 0:17:08 0:10:23 0:12:04
rtumble_1 0:36:21 0:07:41 0:07:47
rwrw_1 0:18:25 0:54:42 0:34:05
rwrw_2 0:17:29 0:51:10 0:32:12
rwtumble_1 0:36:51 0:10:16 0:10:25
smallnoise_11b 0:01:05 0:01:04 0:00:59
smallnoise_17 0:01:09 0:01:04 0:00:59
smallnoise_18 0:01:13 0:01:09 0:00:59
smallnoise_5 0:01:18 0:01:01 0:00:59
smallnoise_6a 0:01:08 0:01:01 0:00:59
smallnoise_6 0:01:16 0:01:02 0:00:59
smallnoise_9 0:01:04 0:01:00 0:00:59
sort_1 0:20:07 0:27:55 0:03:41
where_test_1 0:10:24 0:24:30 0:02:19
wr_junk_10 1:21:08 0:52:13 0:36:34
wr_junk_1 1:25:18 0:56:18 0:37:16
wr_junk_3 1:25:16 0:56:22 0:37:19
Sum of ALL Jobs Runtimes 13:52:58 12:10:40 7:21:13
Average individual Job Runtime 25:14 22:08 13:22
Times in H:MM:SS
*Some jobs vary depending on compression type and combination, but overall, F810 with SAS binary compression is best.
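The summary rows can be cross-checked from the column totals. A quick sketch that converts H:MM:SS to seconds and divides by the 33 jobs in the suite:

```python
# Convert each "Sum of ALL Jobs Runtimes" value to seconds, then divide by the
# 33 jobs per column to reproduce the "Average individual Job Runtime" row.
def to_seconds(hms):
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s

def fmt_mmss(seconds):
    return f"{seconds // 60}:{seconds % 60:02d}"

sums = {"F800 none": "13:52:58", "F800 binary": "12:10:40", "F810 binary+HW": "7:21:13"}
for cfg, total in sums.items():
    print(cfg, fmt_mmss(to_seconds(total) // 33))  # 25:14, 22:08, 13:22
```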
21. Scalability: F810 Maintains Throughput While Adding More NFS Clients and SAS Programs
Test Scenario | SAS Programs Run | SAS Grid Nodes | Avg Job Runtime (MM:SS) | Max Job Runtime (MM:SS) | Std Deviation in Job Runtime (comparing all jobs) | Sustained Throughput at peak times on Isilon (isi stats reports, R+W)
1 | 33 | 1 | 13:12 | 49:28 | 16:58 | 650 to 750 MBps
2 | 66 | 2 | 12:51 | 47:18 | 16:12 | 1 to 1.4 GBps
3 | 132 | 4 | 13:11 | 49:20 | 16:42 | 2 to 2.5 GBps
4 | 264 | 8 | 13:02 | 49:57 | 16:28 | 4.5 to 5 GBps
5 | 396 | 12 | 12:28 | 49:30 | 15:47 | 6.5 to 7 GBps
• Average Runtime = Sum of Runtimes / Number of Jobs
• Maximum Job Runtime = slowest job in the entire scenario
• Grid Node = 12+ core Linux server with dual 10GbE to NFS
All tests run on a 4-node F810 cluster.
3:1 ratio of dual 10 GbE NFS clients to Isilon nodes for all the above test scenarios.
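One way to read the table is per-node scaling efficiency: peak Isilon throughput divided by grid node count, relative to the single-node baseline. A rough sketch (the throughput values below are my midpoints of the reported ranges, not measured figures):

```python
# Midpoints of the "Sustained Throughput" ranges from the table, in MBps.
runs = {1: 700, 2: 1200, 4: 2250, 8: 4750, 12: 6750}
base = runs[1]
for nodes, mbps in runs.items():
    per_node = mbps / nodes
    print(f"{nodes:2d} nodes: {per_node:6.1f} MBps/node, "
          f"{per_node / base:.0%} of 1-node baseline")
```

Even at 12 grid nodes the cluster delivers roughly 80% of the single-client per-node throughput, which is consistent with the flat job runtimes in the table.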
22. Performance: Test Details: F810 with SAS compression
Isilon stats during the 12-node grid run
Isilon is 42% idle even with 12 grid nodes and 396 simultaneous jobs running!
23. Scalability: Comparing IO Patterns on Grid Nodes During 2-Node and 12-Node Run Comparison
[Charts] Total IO throughput in MB/s from NMON (NFS Read, NFS Write, SASwork Read, SASwork Write, plotted over ~3300 seconds), Isilon F810 with HW and SAS compression:
• Worker 2 during the 2-node scenario (y-axis up to 3000 MB/s)
• Worker 12 during the 12-node scenario (y-axis up to 4000 MB/s)
24. Performance: NMON CPU Utilization on Grid Node – Comparison of Configurations Tested
[Charts] CPU during a single batch run of 33 SAS jobs, with graphs scaled to match for visual comparison. Configurations tested: F800 no compression, F800 SAS compress, and F810 HW compress + SAS compress. The F810 configuration shows significantly shorter runtime and better overall throughput.
25. Scalability: Bank2 Job
Simulates model/data manipulation:
• DATA step to NFS – 150,000,000 obs, 126 vars
• PROC PRINT 5 obs
• PROC DATASETS / create index on NFS
• PROC PRINT 100 obs with sum
• PROC MEANS
• DATA step to work
• PROC DATASETS / create 2nd index on NFS
Grid Nodes | F800 (H:MM) | F810 (H:MM)
1 | 1:24 | 0:49
2 | 1:25, 1:22 | 0:48, 0:49
4 | 1:26, 1:21, 1:25, 1:22 | 0:48, 0:49, 0:49, 0:47
8 | 1:25, 1:22, 1:25, 1:21, 1:24, 1:25, 1:18, 1:23 | 0:45, 0:48, 0:49, 0:45, 0:48, 0:47, 0:45, 0:46
12 | Not run | 0:49, 0:47, 0:45, 0:49, 0:46, 0:44, 0:50, 0:48, 0:50, 0:49
Predictable and repeatable runtimes as the system is scaled up.
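Averaging the reported Bank2 runtimes gives a rough F800-to-F810 speedup. A sketch (the F800 average pools all node counts listed; the F810 average uses the 8-node run):

```python
# Average H:MM runtimes in minutes, then compute the F800/F810 ratio.
def avg_minutes(times):
    mins = [int(h) * 60 + int(m) for h, m in (t.split(":") for t in times)]
    return sum(mins) / len(mins)

f800 = ["1:24", "1:25", "1:22", "1:26", "1:21", "1:25", "1:22",
        "1:25", "1:22", "1:25", "1:21", "1:24", "1:25", "1:18", "1:23"]
f810_8node = ["0:45", "0:48", "0:49", "0:45", "0:48", "0:47", "0:45", "0:46"]
print(round(avg_minutes(f800) / avg_minutes(f810_8node), 2))  # ≈ 1.78x faster on F810
```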
26. Compression: Ratio of Input Data on All Systems
f800 (no SAS compress) | f800 (with SAS compress) | f810 (HW compress, with SAS compress)
63G citi1input_1.sas7bdat 22G citi1input_1.sas7bdat 2.8G citi1input_1.sas7bdat
63G citi1input_2.sas7bdat 22G citi1input_2.sas7bdat 2.8G citi1input_2.sas7bdat
63G citi1input_3.sas7bdat 22G citi1input_3.sas7bdat 2.8G citi1input_3.sas7bdat
63G citi1input_4.sas7bdat 22G citi1input_4.sas7bdat 2.8G citi1input_4.sas7bdat
184G citi2input_1.sas7bdat 57G citi2input_1.sas7bdat 7.2G citi2input_1.sas7bdat
185G citi2input_2.sas7bdat 57G citi2input_2.sas7bdat 7.2G citi2input_2.sas7bdat
185G citi2input_3.sas7bdat 57G citi2input_3.sas7bdat 7.2G citi2input_3.sas7bdat
185G citi2input_4.sas7bdat 57G citi2input_4.sas7bdat 7.2G citi2input_4.sas7bdat
4.6M glminput_1.sas7bdat 6.3M glminput_1.sas7bdat 2.8M glminput_1.sas7bdat
4.8M glminput_2.sas7bdat 6.6M glminput_2.sas7bdat 2.8M glminput_2.sas7bdat
22G multiuser_1.sas7bdat 17G multiuser_1.sas7bdat 14G multiuser_1.sas7bdat
22G multiuser_2.sas7bdat 17G multiuser_2.sas7bdat 14G multiuser_2.sas7bdat
22G multiuser_3.sas7bdat 17G multiuser_3.sas7bdat 14G multiuser_3.sas7bdat
22G multiuser_4.sas7bdat 17G multiuser_4.sas7bdat 14G multiuser_4.sas7bdat
13G ranrw_medium_1.sas7bdat 825M ranrw_medium_1.sas7bdat 103M ranrw_medium_1.sas7bdat
13G ranrw_medium_2.sas7bdat 825M ranrw_medium_2.sas7bdat 103M ranrw_medium_2.sas7bdat
1.6G ranrw_skinny_1.sas7bdat 480M ranrw_skinny_1.sas7bdat 78M ranrw_skinny_1.sas7bdat
1.6G ranrw_skinny_2.sas7bdat 480M ranrw_skinny_2.sas7bdat 78M ranrw_skinny_2.sas7bdat
544K ranrw_small_1.sas7bdat 544K ranrw_small_1.sas7bdat 64K ranrw_small_1.sas7bdat
544K ranrw_small_2.sas7bdat 544K ranrw_small_2.sas7bdat 64K ranrw_small_2.sas7bdat
51G ranrw_wide_1.sas7bdat 1.7G ranrw_wide_1.sas7bdat 210M ranrw_wide_1.sas7bdat
51G ranrw_wide_2.sas7bdat 1.7G ranrw_wide_2.sas7bdat 210M ranrw_wide_2.sas7bdat
40G simdata_1.sas7bdat 55G simdata_1.sas7bdat 19G simdata_1.sas7bdat
16G simdata_2.sas7bdat 22G simdata_2.sas7bdat 7.3G simdata_2.sas7bdat
12G simdata_tnk_1.sas7bdat 9.6G simdata_tnk_1.sas7bdat 8.8G simdata_tnk_1.sas7bdat
12G simdata_tnk_2.sas7bdat 9.6G simdata_tnk_2.sas7bdat 8.8G simdata_tnk_2.sas7bdat
25G sortinput_1.sas7bdat 5.2G sortinput_1.sas7bdat 1.7G sortinput_1.sas7bdat
99G sortinput_2.sas7bdat 21G sortinput_2.sas7bdat 6.6G sortinput_2.sas7bdat
Totals: 1433.6 GB | 503 GB | 149 GB on disk
9.6:1 ratio vs uncompressed data on F800
3.3:1 ratio vs SAS-compressed data on F800
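The summary ratios follow directly from the on-disk totals (note that 503/149 is closer to 3.4; the slide rounds it down to 3.3:1):

```python
# Verify the slide's summary ratios from the reported on-disk totals (GB).
uncompressed, sas_compressed, f810_hw = 1433.6, 503, 149
print(round(uncompressed / f810_hw, 1))    # 9.6 : 1 vs uncompressed F800
print(round(sas_compressed / f810_hw, 1))  # 3.4 : 1 vs SAS-compressed F800
```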
27. Compression – Total Disk Space Used During Tests
Isilon Model | SAS Compress = Binary | Isilon HW Compress | SAS7bdat Data Directory (after test runs) | Output Data (after test runs)
F800 | - | - | 1331 GB | 1228 GB
F800 | Yes | - | 503 GB | 748 GB
F810 | Yes | Yes | 149 GB | 119 GB
• Increased compression over plain SAS compression
• SAS compression reduces network traffic
• Isilon compression further reduces the disk space requirement
• Sizes listed here are for a single batch run (input/output for a single 33-job run)
28. Compression: Occasionally SAS Compression Causes Issues
ETL Inbound Job – data coming from a DATABASE or other source to disk.
SAS inbound DATA step – a very common activity (simdata_tnk.sas7bdat), with follow-up DATA steps as data is modified for analytics.
Table output size: 10,000,000 obs, 112 vars
Isilon Model | SAS Compress = Binary | Isilon HW Compress | File Size (du -sh) | Runtime to create file (MM:SS) | DATA step copy file from NFS to NFS lib (MM:SS) | All steps, total SAS job (MM:SS)
F800 | - | - | 12 GB | 1:40 | 3:35 | 18:25
F800 | Yes | - | 9.6 GB | 6:08 | 30:17 | 54:42
F810 | - | Yes | 8 GB | 1:10 | 8:24 | 14:00
F810 | Yes | Yes | 8.8 GB | 8:53 | 7:35 | 34:05
• In this particular use case, compression (SAS’s) seems to cause an issue.
• The good news: you can turn SAS compression off for individual jobs!
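From the totals in the table, the SAS-compression penalty on this ETL job can be quantified (a quick sketch; roughly a 3x slowdown on the F800 and 2.4x on the F810):

```python
# Compare total job runtimes with and without SAS compression on each platform.
def secs(mmss):
    m, s = (int(x) for x in mmss.split(":"))
    return m * 60 + s

f800_slowdown = secs("54:42") / secs("18:25")  # SAS compress vs none on F800
f810_slowdown = secs("34:05") / secs("14:00")  # SAS compress vs HW-only on F810
print(round(f800_slowdown, 1), round(f810_slowdown, 1))
```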
29. Deduplication against f810c
Filesystem | Size | Used | Avail | Use% | Mounted on
BEFORE: 10.246.24.202:/ifs/f810c/wrk2/multiuser | 87T | 4.6T | 79T | 6% | /f810c
AFTER: 10.246.24.202:/ifs/f810c/wrk2/multiuser | 87T | 2.7T | 81T | 4% | /f810c
Dedup Assessment Job Run:
Job Report Details
Time: 2020-04-01 23:22:39
Event ID: 3.13524
Job ID: 1205
Job Type: DedupeAssessment
Phase: 1
Report:
Dedupe job report:{
Start time = 2020-Apr-02:01:55:03
End time = 2020-Apr-02:02:22:38
Iteration count = 1
Scanned blocks = 597296572
Sampled blocks = 36254886
Deduped blocks = 512736028
Dedupe percent = 85.8428
Created dedupe requests = 32182564
Successful dedupe requests = 32182564
Unsuccessful dedupe requests = 0
Skipped files = 1512
Previously assessed files = 0
Index entries = 4072317
Index lookup attempts = 4072317
Index lookup hits = 0
}
Elapsed time: 1655 seconds
Aborts: 0
Errors: 0
Scanned files: 455
Directories: 179
1 path:
/ifs/f810c
CPU usage: max 113% (dev 2), min 0% (dev 2), avg 43%
Virtual memory size: max 542760K (dev 2), min 430260K (dev 2), avg 498608K
Resident memory size: max 105316K (dev 1), min 21684K (dev 2), avg 53200K
Read: 27939643 ops, 228881555456 bytes (218278.5M)
Write: 2415628 ops, 19788824576 bytes (18872.1M)
Other jobs read: 53 ops, 434176 bytes (0.4M)
Other jobs write: 93379 ops, 764960768 bytes (729.5M)
Non-JE read: 1815 ops, 14868480 bytes (14.2M)
Non-JE write: 901805 ops, 7387586560 bytes (7045.4M)
Dedup Job Run Results:
Job Report Details
Time: 2020-04-02 03:32:08
Event ID: 3.13534
Job ID: 1207
Job Type: Dedupe
Phase: 1
Report:
Dedupe job report:{
Start time = 2020-Apr-02:02:34:40
End time = 2020-Apr-02:06:32:08
Iteration count = 3
Scanned blocks = 1182629476
Sampled blocks = 45504643
Deduped blocks = 528351533
Dedupe percent = 44.676
Created dedupe requests = 34065196
Successful dedupe requests = 33986741
Unsuccessful dedupe requests = 78455
Skipped files = 1195
Previously assessed files = 455
Index entries = 10387523
Index lookup attempts = 7479509
Index lookup hits = 1164297
}
Elapsed time: 14248 seconds
Aborts: 0
Errors: 0
Scanned files: 317
Directories: 179
1 path:
/ifs/f810c
CPU usage: max 194% (dev 4), min 0% (dev 1), avg 121%
Virtual memory size: max 539432K (dev 1), min 441384K (dev 3), avg 504675K
Resident memory size: max 89376K (dev 1), min 22352K (dev 2), avg 55837K
Read: 113141338 ops, 926853840896 bytes (883916.7M)
Write: 175404067 ops, 1436910116864 bytes (1370344.3M)
Other jobs read: 15 ops, 122880 bytes (0.1M)
Other jobs write: 493183 ops, 4040155136 bytes (3853.0M)
Non-JE read: 1043 ops, 8544256 bytes (8.1M)
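As a sanity check, the "Dedupe percent" figures in both reports can be reproduced from the block counts (the formula deduped/scanned is inferred from the numbers in the reports themselves, not from OneFS documentation):

```python
# Reproduce the "Dedupe percent" fields from the OneFS job reports:
# percent = deduped blocks / scanned blocks * 100.
assessment = 512736028 / 597296572 * 100   # DedupeAssessment job
actual = 528351533 / 1182629476 * 100      # Dedupe job (3 iterations)
print(round(assessment, 4), round(actual, 3))  # 85.8428 44.676
```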
30. Cost: Reduced Node Requirement
• Storage Ratio: 3 To 1 On Average
• Less Rack Space
• Performance: 3 To 1 SAS Grid Nodes To Isilon Nodes
• Older Systems Tended To Be 1:1 Or 1.5:1 With 12-Core Systems
• Deduplication: Potentially Another 20-40% Space Savings
• Further Decrease In Storage Cost (Nodes/Disks)
Editor's notes
Hello, my name is Boni Bruno, Chief Solutions Architect for Dell Technologies. I focus on analytics solutions for our UDS products and have developed various collateral around using our storage products with various technologies like Hadoop, Spark, Kafka, ML, running analytics on Isilon in Google Cloud, etc.
I’ve been working extensively on testing SAS GRID with our All-Flash Isilon Systems, specifically our Isilon F800 and F810 models. I recently gave a tech jam session on running SAS GRID with our All-Flash Isilon F800/F810 models with great interest and feedback. I’ve been asked to do a technical deep dive on my testing so that’s exactly what I will be covering in this presentation.
With that said, let’s dive right into the presentation.
SAS has been around for over 40 years with an amazing history and growth as a company, and not just financially speaking; they also provide a comprehensive suite of analytics products covering business intelligence, advanced analytics, data management, predictive analysis and more.
It’s important to understand that SAS is not a relational database, rather SAS provides an interpretive programming language and stores data in proprietary SAS formatted files.
SAS also provides native access to a variety of databases as well as big data platforms like Hadoop.
Nearly every enterprise customer we have is using SAS in one form or another so many of you will likely be engaged to present why Isilon is a good fit for SAS. I highlight why we are a good fit as we progress through this presentation.
So why consider Dell EMC for SAS analytics? At a high level, Dell EMC has provided infrastructure solutions for many SAS customers already and we know our solutions work well. We are also fortunate to hold the #1 market position in converged infrastructure, virtualized data center infrastructure, and both traditional and all-flash storage, based on IDC reports.
Dell EMC makes numerous storage solutions that have worked well with SAS; for example, our VMAX and PowerMax products, as well as XIO and VxFlex, have been deployed with SAS in the field. But lately we've seen customers looking to our scale-out Isilon NAS products to house SAS data. This is the primary reason for doing a formal performance validation of SAS GRID with Isilon.
The validation I did focuses on our Isilon all-flash storage systems and why using Isilon all-flash systems with SAS for data storage makes sense. I'll cover design considerations, performance numbers, and some new features introduced with our F810 model, and how these new features can benefit SAS customers.
The F810 model is the latest model we have in the F800 series. I'm excited to say this model has produced some excellent performance results with SAS GRID. For those of you not familiar with our F800 line, these are the all-flash models. All of our F800 series models are 4U in size, and you can equip them with 3.8 TB, 7.7 TB, or 15.4 TB SSD drives. This translates into SAS customers being able to get just under 1 PB of data storage in a 4U form factor when using the 15.4 TB SSD drives.
As with all of our Isilon models, this is a true SCALE-OUT solution providing SAS customers an easy ability to add storage nodes to support more capacity and performance as needed.
What's unique about this F810 model specifically is that it comes with a HW-acceleration FPGA card that provides in-line data compression. This is a key value proposition for SAS customers, as it significantly saves on storage space and increases I/O performance. Again, we will get into the details in the upcoming slides.
Before we dive into the tested architecture, it’s important to understand the criticality of storage for Analytics and related workloads.
Clearly the massive amounts of data needed to meet business needs are growing daily in many cases. This has led our Isilon business unit to develop enhanced compression techniques in our newer products, as well as to make them more scalable and higher performing than ever before. Isilon clusters can now grow to 244 nodes in a single cluster with a single namespace, which is truly amazing.
Even if a lot of analytics is done in RAM, customers are having to store more and more data as time goes on to shared storage.
BTW – The F810 model I mentioned earlier now has a new feature that gives our customers the ability to dedupe data as needed. I will get into the dedupe results later in the presentation.
So let’s talk about SAS GRID. SAS provides a lot of products as mentioned earlier. For our testing, we specifically wanted to test SAS GRID. A typical SAS GRID environment has users that run SAS desktop clients or thin clients or clients can simply ssh into the SAS grid to submit various jobs.
The SAS GRID Resource Manager distributes these jobs across the numerous grid nodes in the SAS GRID network. While these jobs are running, there is a lot of I/O generated for the creation of temporary and staging files as well as I/O going to and from the shared storage environment.
It’s important to understand that SAS refers to this temp/staging environment as the SAS WORK environment and the shared storage environment as the SAS DATA Environment.
An important design consideration and best practice to strictly adhere to is that SAS WORK should always be fast local block storage only, Isilon should never be used for SAS WORK, rather use Isilon for SAS DATA only. If any of you have seen my presentations on using Hadoop with Isilon, putting SAS WORK on Isilon is equivalent to putting Hadoop SCRATCH SPACE on ISILON, you simply never want to do it.
In speaking with D4T4, they recommended not using the Load Sharing Facility for our test lab and performance testing. LSF is not good when you want to control the job spread and ensure repeatable job launches. Not using LSF is a common practice for validation and I/O performance testing as we did in our SAS GRID/Isilon test lab.
With that being said, I can now discuss the test lab systems and network I built for SAS GRID and Isilon. The specific SAS GRID software version tested is version 9.4M6 with both our Isilon F800 and F810 models.
Each SAS Grid Node has 40 cores and 256GB RAM with dual 10GbE connections to the network. Each Isilon node is connected 40GbE to the access network and the private Isilon backend network is also 40GbE.
Note: For 40 cores, you really need 25GbE connections, but I digress.
Testing ranged from using a single SAS compute node in the GRID to scaling up to 12 SAS compute nodes in the GRID. The backend Isilon system stayed as a single 4-node chassis as SAS GRID compute nodes increased from 1 to 12.
Before we dive into the tested architecture, it’s important to understand the criticality of storage for Analytics and related workloads.
Clearly the massive amounts of data to meet business needs is growing daily in many cases. This has lead our Isilon business unit into developing enhanced compression techniques with our newer products as well as the need to make them more scalable and higher performing than ever before. Isilon clusters can now grow to 244 nodes in a single cluster with a single name space, truly amazing.
Even if a lot of analytics is done in RAM, customers are having to store more and more data on shared storage as time goes on.
BTW, the F810 model I mentioned earlier now has a new feature that gives our customers the ability to dedupe data as needed. I will get into the dedup results later in the presentation.
The testing focus areas for this lab environment are as follows:
1. We wanted to see how well the F800s performed with a multi-user mixed workload. I’ll talk about the workload in the next slide. We also wanted to understand the value of the new in-line compression capabilities that come with the newer F810 model.
2. Historically speaking, most NFS clusters are deployed with SAS using a ratio of one SAS compute node to one NFS storage node. We wanted to see if we could increase this ratio to 2 to 1 or even 3 to 1 using the F810 without increasing job runtimes as SAS compute nodes increased.
3. SAS offers software compression. We wanted to see what happens when you add Isilon’s HW compression, and what the benefits are in performance and space savings.
4. We also wanted to see the effectiveness of Isilon’s new dedup feature.
Lastly, if Isilon performs well in these four areas of focus, then the overall TCO will be better for our SAS customers, and we want happy SAS customers.
Now, I had the option to just do a basic SAS benchmark with Isilon, but instead I decided to engage our go-to SAS partner, D4T4. D4T4 has ex-SAS employees on staff with over 20 years of SAS experience. They developed a comprehensive multi-user analytics workload representative of various jobs typically run by our financial services customers.
In working with the senior SAS architects at D4T4, we were able to simulate a lot of users submitting real-world mixed workloads to stress test the storage I/O environment, which is a top concern for many SAS customers. I’ll get into more details in upcoming slides.
At a high level, SAS requires certain I/O performance for SAS GRID. Specifically, SAS GRID sizing guidelines call for total I/O per CPU core in the range of 100-150 MBps. This is divided among the data sources and targets, namely SAS WORK, SAS DATA, and other network connections pertaining to database connectivity, streams, etc.
The DATA on DISK, also referred to as SAS DATA, shown here in purple, is where Isilon fits in. Many of our customers may have petabytes of SAS DATA consisting of long-term project tables, so storage performance and scalability are vital. SAS DATA represents 40-50% of the overall I/O requirement on average for SAS.
As I mentioned earlier, SAS WORK should never be on Isilon. SAS WORK represents 40-60% of the overall I/O and typically leverages local NVMe or high-speed fiber-connected storage. The other network traffic makes up the rest of the I/O percentage and is typically in the 10-20% range.
The key thing to remember here is that SAS wants around 150MBps of sustained read and write throughput per core. I’ll get into what we were able to sustain with our deployed SAS GRID using Isilon shortly.
As far as running the D4T4 workload, it was easy to run batches of the workload on each node as we scaled up the number of SAS nodes. Each batch was executed on each SAS GRID compute node, and the results were recorded to determine whether repeatable and predictable I/O throughput could be achieved with Isilon.
As you will see in upcoming slides, we achieved that with no problem.
Each batch consisted of 33 SAS programs, and each batch had 1.3 TB of uncompressed SAS input data. We didn’t include an RDBMS, which production environments normally have; since an RDBMS would typically offload some of the I/O away from Isilon, not having one actually put more I/O load on Isilon, which is good and is a point in our favor.
The 33 jobs were launched through a script. As the jobs run, they use that scratch area called SAS WORK for temporary storage on the PowerEdge R730s for combines, sorts, and merges; data comes from Isilon to local SAS WORK, and the output data goes back to Isilon. This is pretty normal for bank environments, where users pull data from various sources, work heavily in the SAS WORK environment, then put it back on the permanent storage, which is perfect for our scale-out all-flash Isilon nodes.
For those of you familiar with SAS, the 33 jobs simulate everything from a modeler, to a report user who comes in and out over a period of time, to someone doing an ETL inbound data build with an analytics table; the code does some sorts, merges, and other common things you find with SAS analytics.
The workload does everything from manipulating data to running logistic regressions; jobs blow through files, merges, and sorts. The majority of the jobs were sequential, but some of the jobs did random reads and writes from Isilon.
BTW, the data generation was patterned after a SAS modeler who works at a financial services organization. Again, many kudos to D4T4 for providing this dataset and the workload scripts; it made the testing much more comprehensive and representative of actual SAS production environments.
We have a joint webinar coming up on May 19th, our marketing teams should be sending out registration links to that event next week. So keep a lookout for that.
This slide shows how the jobs were launched.
The key thing I want to point out here is that when SAS reads and writes data, it uses predefined block sizes. On average, a block size of 64K, 128K, or 256K is typical in production environments, as databases aren’t streaming large amounts of data.
So unlike Hadoop workloads, where Isilon is typically configured to use 128MB or 256MB block sizes over HDFS, with SAS, 128KB or 256KB block sizes are much more common over NFS.
SAS GRID nodes typically have 8 to 12 CPU cores with dual 10GbE configurations. So going by the SAS recommended guideline of 150MBps per CPU core, a 12-core node needs a total of 1.8GBps of sustained I/O throughput.
Recall that ~50% of the I/O comes from SAS WORK, ~40% of the I/O comes from SAS DATA, i.e. Isilon, and the remainder of the I/O comes from database connections.
Based on that, a 12-node SAS GRID with 12 CPU cores each would need an aggregate sustained R+W throughput of ~8.6GBps from Isilon, or ~720 MBps of sustained read and write throughput per SAS node from Isilon.
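The sizing arithmetic above can be sketched as a quick calculation. This is just a sketch using the guideline numbers and node counts from this lab; the SAS DATA share of ~40% is the average quoted earlier, so adjust these inputs for your own grid:

```python
# SAS GRID I/O sizing sketch, per the SAS guideline of ~150 MBps per CPU core.
# Node and core counts match the lab described above; the SAS DATA share is
# the portion of total I/O served by Isilon.
CORES_PER_NODE = 12
NODES = 12
MBPS_PER_CORE = 150      # upper end of the 100-150 MBps guideline
SAS_DATA_SHARE = 0.40    # ~40% of total I/O goes to SAS DATA (Isilon)

total_per_node = CORES_PER_NODE * MBPS_PER_CORE      # total sustained I/O per node, MBps
isilon_per_node = total_per_node * SAS_DATA_SHARE    # portion served by Isilon, MBps
isilon_aggregate = isilon_per_node * NODES / 1000    # grid-wide Isilon demand, GBps

print(f"Per-node total I/O:    {total_per_node} MBps")
print(f"Per-node from Isilon:  {isilon_per_node:.0f} MBps")
print(f"Aggregate from Isilon: {isilon_aggregate:.2f} GBps")
```

That aggregate is what makes the 9GBps measured on the next slide meaningful: it just clears the guideline requirement.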
[CLICK]
Isilon was able to sustain 9GBps with our deployed 12-node SAS GRID with the small block sizes. When D4T4 saw this, they were very happy. They have sold H500s in the past to SAS customers. Based on these results, D4T4 is now looking to use F810s with H5600s as a recommended storage architecture for SAS customers moving forward. This is great news coming from SAS experts who live and breathe SAS analytics day in and day out. Note: The CPU utilization on Isilon never exceeded 70% during all this testing. There is still room to grow, but I recommend not going beyond a 3 SAS compute node to 1 Isilon node ratio.
This slide talks about the testing methodology.
I’m writing a white paper with Tom Keefer from D4T4 covering all our SAS Grid testing and findings with using Isilon for SAS DATA.
The white paper will be available by the end of this month.
This is really an exciting slide.
Before I get into the results, it’s important to note that SAS users, or analytics people in general, don’t care about gigabytes or I/O throughput; they just care about the time it takes to run their SAS jobs.
It’s funny that in many cases they don’t even know the size of their datasets; rather, they know how many billions of records are in a table or how wide their tables are. Keeping that in mind, this slide covers the response time for a batch run of 33 jobs and how the runtimes varied with SAS DATA being on an Isilon F800 with no SAS compression, an F800 with SAS compression, and our newer F810 model that provides HW compression along with the SAS compression.
SAS compression is software based, and customers will typically have it turned on. The software compression does put a little more load on the compute node, but it also sends less traffic over the network, which benefits shared storage solutions like Isilon. When you add in the HW compression capabilities of the F810, you can see the sum of all the runtimes decreased from 12 hours and 10 minutes with SAS SW-based compression to 7 hours and 21 minutes when using SAS SW compression with the F810 HW compression. That’s a 40% decrease, which is fantastic!
From an individual user perspective, the average individual job runtime with SAS compression was 22 minutes; this went down to 13 minutes when using the F810, again roughly a 40% decrease in runtime. Our SAS partner D4T4 is very happy with these results.
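As a sanity check, the percentages above follow directly from the runtimes quoted on this slide (this is only the arithmetic behind the claim, nothing more):

```python
# Runtime reduction from adding F810 HW compression on top of SAS SW compression.
# Times are the ones quoted above: the sum of all 33 job runtimes per batch,
# and the average individual job runtime.
def pct_decrease(before_min, after_min):
    """Percentage decrease in runtime."""
    return 100 * (before_min - after_min) / before_min

batch_before = 12 * 60 + 10   # 12h10m: SAS SW compression only (F800)
batch_after = 7 * 60 + 21     # 7h21m:  SAS SW + F810 HW compression

print(f"Batch:   {pct_decrease(batch_before, batch_after):.0f}% faster")
print(f"Per job: {pct_decrease(22, 13):.0f}% faster")
```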
Our next focus area is scalability. SAS GRID customers shrink and grow their grid sizes all the time. When you grow the SAS GRID at peak times to deal with end-of-month jobs and the like, what’s critical is getting predictable runtimes as you scale.
[CLICK]
If you look at the 3rd column in this table, you can see we grew the grid size from 1 to 2 to 4 to 8 and finally to 12 SAS GRID nodes.
[CLICK]
Correspondingly, the mixed workload increased from 33 jobs to 66, to 132, to 264, and finally 396 simultaneous jobs.
As we scaled the SAS nodes and aggregate job count, we recorded both the
[CLICK] average job runtimes and [CLICK] max job runtimes. We never increased the Isilon node count during testing.
Historically speaking, we typically recommend having a ratio of one Isilon storage node to one SAS compute node. This is typical with our H500 models that we have deployed in the field.
[CLICK]
However, with the F810 results, for the first time ever, I can confidently say the F810 easily breaks the 1 to 1 ratio barrier. As the results show, the runtimes stayed consistent as we scaled from 33 jobs running on 1 SAS node to 396 jobs running across 12 SAS nodes in the grid while using just a single F810 4-node chassis. Absolutely beautiful! Our SAS integration partner is now looking to standardize on F810 models for SAS deployments in the financial services sector.
During the 12-node SAS GRID workload testing, I took Isilon statistics at different points in the testing to make sure the I/O distribution on Isilon stayed even.
[CLICK]
Here you can see a nice even distribution across the 4 nodes in the F810 chassis.
[CLICK]
This remained consistent.
[CLICK]
Throughout the various batch job runs.
[CLICK]
What was very interesting was the fact that the CPU utilization peaked at only 68% under the 12-node testing with 396 simultaneous jobs running. This means the single F810 still has room to support more load. Again, very pleased with these results!
Using NMON we can see the I/O throughput for both SAS WORK and SAS DATA on individual nodes as we scaled up the SAS nodes.
The chart on the left shows the I/O throughput on SAS node 2 when the GRID had 2 nodes with a single batch run. NMON shows both the NFS traffic and the local I/O traffic. You can see that the mixed workload generates a lot of I/O traffic for both SAS WORK and SAS DATA.
The chart on the right shows the I/O throughput on the same SAS node 2 when the GRID size was increased to 12 nodes.
What is nice here is that both NMON graphs show similar I/O patterns, which means consistent I/O throughput. If the I/O subsystem had problems, you would see longer runtimes and weird I/O wait times, but that wasn’t the case; we had a good, balanced system here with consistent I/O patterns.
This slide shows the CPU utilization on a SAS node with a single batch run when using the F800 with no compression, when using the F800 with SAS compression, and when using SAS compression combined with the HW compression of the F810.
The patterns are similar, but notice how the runtimes are better with the F810.
There is some wait time shown, and that’s because of SAS WORK and the 10GbE NICs; the local drives were spinning disks and not NVMe drives, and 40-core systems really should be using 25GbE NICs. But overall, the value of the F810 can easily be observed here, which is good.
This slide just provides more technical evidence that both the F800 and the F810 scale well as the SAS nodes increase; in this case we are highlighting the results of a specific Bank2 job. For these kinds of workloads, the HW compression of the F810 clearly makes a difference.
All the input data for the workload was ~1.4TB uncompressed. Most SAS customers will have SAS SW compression turned on, so the input dropped down to half a TB with SAS SW compression, as a lot of analytic tables compress really well. But when you add the HW compression of the F810, the input data was further reduced to 149GB, providing roughly a 3:1 compression ratio over using just SAS SW compression, which is very good.
With SAS, you have the ability to turn SW compression on and off, and it’s common to experiment with compression when doing, say, ETL jobs. Again, a lot of analytic tables are very repetitive and compress well; whether you get 2 to 1 or 3 to 1 or 4 to 1 will vary from customer to customer, but overall we are very pleased with these results.
Here we show details of the compression results on the output data from the job runs, highlighted in the last column of the table. Due to running several merges and sorts on the data, even with SAS compression turned on, the output data grew to 748GB. But when adding the HW compression of the F810, the output data was significantly lower at 119GB; that’s an 84% reduction in output data. This was checked three times because it was really impressive. Again, results will vary from SAS customer to SAS customer, but this is very promising.
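The input and output savings quoted on these slides reduce to the following arithmetic (sizes are the ones from the tables above; the "half a TB" input figure is taken as an approximate 500GB):

```python
# Compression savings from the lab results above.
# Input:  ~1.4TB raw -> ~500GB with SAS SW compression -> 149GB with F810 HW added.
# Output: 748GB with SAS SW compression -> 119GB with F810 HW compression added.
input_sw_gb = 500    # approx. "half a TB" with SAS SW compression
input_hw_gb = 149    # with F810 HW compression added

output_sw_gb = 748
output_hw_gb = 119

ratio_over_sw = input_sw_gb / input_hw_gb                   # ratio of HW+SW over SW alone
output_reduction = 100 * (1 - output_hw_gb / output_sw_gb)  # percent saved on output data

print(f"Input, HW compression over SW alone: {ratio_over_sw:.1f}:1")
print(f"Output data reduction:               {output_reduction:.0f}%")
```

The input ratio lands slightly above 3:1, consistent with the "roughly 3:1" figure quoted earlier, and the output reduction matches the 84% on this slide.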
One thing I noticed is that SAS SW compression can sometimes increase runtimes of some jobs. If you notice this with some of your jobs, just turn off SAS SW compression for those specific jobs.
Note: This has nothing to do with Isilon; this is just a SAS thing, and SAS is aware of it. Again, the good news is that you can turn SAS compression off on individual jobs.
The F810 also includes the ability to run dedup on the filesystem. In cases where you want to save even more space, you can run a dedup assessment to give you an idea of the space savings you can potentially obtain. The left side of this slide shows example output of the dedup assessment job, and the right side shows the results of an actual dedup job run on Isilon OneFS.
I just chose a sample SAS node to see the impact of running dedup. If you look at the upper right corner of this slide, you can see that prior to running dedup, the multi-user directory used 4.6TB of space; after the dedup, this went down to 2.7TB, which is a reduction of 41%.
In summary, we are very pleased with the results of our SAS GRID performance testing with the F810. For the first time ever, we were able to observe space savings using compression of 3 to 1 on average, and performance gains that allow us to support 3 SAS compute nodes per Isilon storage node. Considering many banks may have hundreds of nodes, this can provide significant cost savings with respect to storage costs. And lastly, the F810 deduplication feature can potentially save an additional 20-40% in storage space, further decreasing storage costs.
That concludes the deep dive session.
I’m currently working with D4T4 on the publication of a whitepaper based on this work. This will be available by the end of May 2020.
Thank you.