Timely genome analysis requires a fresh approach to platform design for big data problems. Louisiana State University has tested enterprise cluster deployments of Redis with a solution that allows flash memory to act as extended RAM. Learn how this solution handles large amounts of data with a fraction of the memory needed for a typical deployment.
Scaling Redis Cluster Deployments for Genome Analysis (featuring LSU) - Terry Leatherland, IBM
1. Revolutionizing the Datacenter
Join the Conversation #OpenPOWERSummit
Accelerating Genome Assembly with Power8
Seung-Jong Park, Ph.D.
School of EECS, CCT, Louisiana State University
2. Agenda
The Genome Assembly Problem
Accelerating Graph Construction with POWER8
Accelerating Graph Simplification with IBM CAPI® Flash and Redis NoSQL database
5/8/2016
7. Experimental Test Beds
System Type | IBM PKY Cluster | LSU SuperMikeII
Processor | Two 10-core IBM Power8 | Two 8-core Intel SandyBridge Xeon
Maximum #Nodes used in various experiments | 40 | 120
#Physical cores/node | 20 (8-way Simultaneous Multi-Threading) | 16 (Hyper-Threading disabled)
#vcores/node | 160 | 16
RAM/node (GB) | 256 | 32
#Disks/node | 5 | 3
#Disks/node used for shuffled data | 3 | 1
Total storage space/node used for shuffled data | 1.8 | 0.5
Network | 56Gbps InfiniBand (non-blocking) | 40Gbps InfiniBand (2:1 blocking)
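The per-node figures in the table can be combined into per-vcore and cluster-wide totals. The sketch below only restates the table arithmetically; the dictionary keys and the `summarize` helper are illustrative names, and the node counts are the maximums listed above, not necessarily the counts used in every experiment.

```python
# Per-node figures copied from the test-bed table above.
testbeds = {
    "IBM PKY Cluster (POWER8)": {"nodes": 40, "cores": 20, "threads_per_core": 8, "ram_gb": 256},
    "LSU SuperMikeII (Xeon)": {"nodes": 120, "cores": 16, "threads_per_core": 1, "ram_gb": 32},
}

def summarize(spec):
    """Derive vcore and memory totals from one test-bed row."""
    vcores = spec["cores"] * spec["threads_per_core"]
    return {
        "vcores_per_node": vcores,
        "ram_per_vcore_gb": spec["ram_gb"] / vcores,
        "cluster_vcores": vcores * spec["nodes"],
        "cluster_ram_gb": spec["ram_gb"] * spec["nodes"],
    }

summary = {name: summarize(spec) for name, spec in testbeds.items()}
for name, stats in summary.items():
    print(name, stats)
```

The 160 vcores per POWER8 node follow directly from 20 physical cores with 8 hardware threads each.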
8. Datasets
Genome data set | Input size | Shuffle data size | Output size
Rice genome | 12GB | 70GB | 50GB
Bumble bee genome | 90GB | 600GB | 95GB
Metagenome | 3.2TB | 20TB | 8.6TB
Output size: input data set to stage 2
Stage 2: key-value stores with Redis NoSQL and IBM Power8 CAPI Flash
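To see how much intermediate data each job generates relative to its input, the sizes in the datasets table can be turned into ratios. This is a small sketch using only the table values (with 1 TB taken as 1000 GB); the `amplification` helper is an illustrative name, not something from the slides.

```python
# (input, shuffle, output) sizes in GB, taken from the datasets table.
datasets = {
    "Rice genome": (12, 70, 50),
    "Bumble bee genome": (90, 600, 95),
    "Metagenome": (3200, 20000, 8600),
}

def amplification(input_gb, shuffle_gb, output_gb):
    """Return shuffle and output sizes as multiples of the input size."""
    return shuffle_gb / input_gb, output_gb / input_gb

ratios = {name: amplification(*sizes) for name, sizes in datasets.items()}
for name, (shuffle_x, output_x) in ratios.items():
    print(f"{name}: shuffle = {shuffle_x:.2f}x input, output = {output_x:.2f}x input")
```

In all three cases the shuffle data is roughly six to seven times the input, which is why disk and network capacity for shuffled data appear as separate rows in the test-bed table.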
10. Hadoop Scalability with POWER8 SMTs
Tested with small-size rice genome data on 2 nodes
Almost linear scalability with increasing SMTs
11. Rice Genome
Analyzing small size (12GB) data
Eliminate the impact of network and disk I/O
7.5X performance improvement per server
12. Bumble Bee Genome
Analyzing Medium size (90GB) Bumble Bee genome
7.5x improvement in terms of Performance/server
13. Metagenome Stage 1
Analyzing huge (3.2TB) metagenome data
Only 6.5 hours on 40-node IBM Power8 cluster
More than 9x improvement in terms of performance per server
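One common way to read a "performance per server" comparison is to normalize each run by node-hours. The sketch below assumes that definition, which the slides do not spell out. The baseline node count of 120 is only the maximum from the test-bed table, used here as a hypothetical input; the slides do not state the baseline run time or node count for this job.

```python
def per_server_speedup(time_a_h, nodes_a, time_b_h, nodes_b):
    """Per-server speedup of system A over system B on the same job.

    Assumes per-server performance is 1 / (hours * nodes), i.e. inverse node-hours.
    """
    return (time_b_h * nodes_b) / (time_a_h * nodes_a)

def baseline_hours_for(speedup, time_a_h, nodes_a, nodes_b):
    """Baseline run time that would correspond to a given per-server speedup."""
    return speedup * time_a_h * nodes_a / nodes_b

# 6.5 hours on 40 POWER8 nodes is from the slide; 120 baseline nodes is hypothetical.
hypothetical_baseline_h = baseline_hours_for(9, 6.5, 40, 120)
print(f"Baseline hours implied by a 9x per-server gain: {hypothetical_baseline_h}")
print(per_server_speedup(6.5, 40, hypothetical_baseline_h, 120))
```

Under these assumptions, a 9x per-server gain would correspond to a baseline of about 19.5 hours on 120 nodes; a different baseline node count changes that figure proportionally.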
14. IBM Data Engine for NoSQL
Performance and Value
Stage 2 requires large memory access that isn't readily available via traditional compute processing.
15. POWER8 CAPI (Coherent Accelerator Processor Interface)
[Diagram labels: POWER8; CAPP; Coherence Bus; PCIe Gen 3; FPGA or ASIC; PSL; Custom Hardware Application]
Customizable Hardware Application Accelerator
• Specific system SW, middleware, or user application
• Written to durable interface provided by PSL
PCIe Gen 3
• Transport for encapsulated messages
Processor Service Layer (PSL)
• Present robust, durable interfaces to applications
• Offload complexity / content from CAPP
Virtual Addressing
• Accelerator can work with same memory addresses that the processors use
• Pointers de-referenced same as the host application
• Removes OS & device driver overhead
Hardware Managed Cache Coherence
• Enables the accelerator to participate in "Locks" as a normal thread
• Lowers latency over IO communication model
16. Redis Labs Exploits the IBM Data Engine for NoSQL
Redis stores key-value pairs
• Key-value pairs may be variable size, in any format (Text, Document, JPEG, Video, etc.)
Basic operations are "SET" and "GET"
> SET 100001 "CAPI is Fast"
> GET 100001
"CAPI is Fast"
> ...
Database Characteristics
• 90 GB MAX Capacity, up to 10 GB RAM, and 80 GB Flash
• key-value pairs are 1,000 bytes of random data
• DB filled with ~50GB of data (42.5 million keys)
Client Characteristics
• 288 clients, randomly issuing Redis GETs or SETs
• ~50% of keys from RAM, ~50% from CAPI-Accelerated Flash
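The database and client characteristics above allow a quick back-of-envelope check. The sketch below uses only the slide's numbers with decimal gigabytes; attributing the gap between raw value bytes and the ~50GB fill to keys and per-entry metadata is an assumption, and since the fill size is approximate the per-key figure is rough.

```python
KEYS = 42_500_000      # keys loaded into the database
VALUE_BYTES = 1_000    # size of each random value
FILLED_GB = 50         # approximate fill reported on the slide
RAM_GB = 10            # RAM available to the database (upper bound)
FLASH_GB = 80          # flash available to the database

raw_value_gb = KEYS * VALUE_BYTES / 1e9
# Assumption: the remainder is key names plus per-entry bookkeeping.
implied_overhead_bytes_per_key = (FILLED_GB - raw_value_gb) * 1e9 / KEYS
fill_fraction = FILLED_GB / (RAM_GB + FLASH_GB)
max_ram_share_of_data = RAM_GB / FILLED_GB

print(f"raw values: {raw_value_gb} GB")
print(f"implied overhead: ~{implied_overhead_bytes_per_key:.0f} bytes/key")
print(f"capacity used: {fill_fraction:.0%}")
print(f"data that can sit in RAM: at most {max_ram_share_of_data:.0%}")
```

Since at most about a fifth of the data fits in RAM while roughly half of the requests are served from RAM, the client access pattern is presumably skewed toward the hot keys.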
Demo System:
• IBM Power System S812L
• 1 POWER8 Socket
• 2 IBM DataEngine for NoSQL CAPI Accelerators
• 1 FlashSystem 840
• Ubuntu 14.10
• Redis Labs Enterprise Cluster (Beta)
Demonstration Platform (POWER8 + CAPI Flash)
[Diagram labels: Set Key = Value; Retrieve Key; WWW; 10Gb Uplinks; Power8 Server; Flash Array w/ up to 56TB]
Infrastructure Attributes
- up to 192 threads in 2U server drawer
- up to 56 TB of memory-based flash per 2U drawer
- Shared memory & cache for dynamic tuning
OpenPOWER partner Redis Labs's highly differentiated product offering built on CAPI is available today.
17. IBM Data Engine for NoSQL + Redis Labs Value
Built on Open APIs
• Leverages IBM DataEngine for NoSQL APIs
Redis Labs Enterprise Cluster provides near Speed of RAM, with the Capacity of Flash
• Leverages IBM DataEngine for NoSQL CAPI Accelerator for high-speed, low-latency link to Flash
Controls use of Memory, Flash, and Cost!
• Hot Data Maintained in RAM
• Provides ISPs and MSPs up to 72% Cost Savings When 80% of Data is in Flash
Redis Labs Enterprise Cluster allows the user to select the ratio of RAM and flash with a simple slider when using POWER8 with the IBM Data Engine for NoSQL.
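The idea of keeping hot data in RAM while colder data lives in flash can be illustrated with a toy two-tier store. This is a minimal sketch, not how Redis Labs Enterprise Cluster is implemented: the `TieredStore` class, its LRU eviction policy, and the plain dictionary standing in for CAPI-attached flash are all assumptions made for illustration.

```python
from collections import OrderedDict

class TieredStore:
    """Toy key-value store: a bounded LRU 'RAM' tier in front of a 'flash' tier."""

    def __init__(self, ram_capacity):
        if ram_capacity < 1:
            raise ValueError("ram_capacity must be at least 1")
        self.ram_capacity = ram_capacity
        self.ram = OrderedDict()  # hot keys, most recently used last
        self.flash = {}           # stand-in for flash-resident data
        self.ram_hits = 0
        self.flash_hits = 0

    def _admit(self, key, value):
        """Place a key in RAM and spill the coldest keys to flash if RAM is full."""
        self.ram[key] = value
        self.ram.move_to_end(key)
        while len(self.ram) > self.ram_capacity:
            cold_key, cold_value = self.ram.popitem(last=False)
            self.flash[cold_key] = cold_value

    def set(self, key, value):
        self.flash.pop(key, None)  # drop any stale copy held in flash
        self._admit(key, value)

    def get(self, key):
        if key in self.ram:
            self.ram_hits += 1
            self.ram.move_to_end(key)
            return self.ram[key]
        if key in self.flash:
            self.flash_hits += 1
            value = self.flash.pop(key)
            self._admit(key, value)  # promote the key back into RAM
            return value
        return None

store = TieredStore(ram_capacity=2)
store.set("100001", "CAPI is Fast")
store.set("100002", "second value")
store.set("100003", "third value")  # RAM is full, so the coldest key spills to flash
print(store.get("100001"), store.ram_hits, store.flash_hits)
```

Changing `ram_capacity` plays the role of the RAM-to-flash ratio: a smaller RAM tier lowers memory cost but sends more reads to the slower tier.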
18. What CAPI Means for NoSQL Solutions
Today's NoSQL in memory (x86)
[Diagram labels: WWW; 10Gb Uplink; Load Balancer; 500GB Cache Node (x5); Backup Nodes]
Infrastructure Requirements
- Large distributed (Scale out)
- Large memory per node
- Networking bandwidth needs
- Load balancing
Differentiated NoSQL (POWER8 + FlashSystem with CAPI)
[Diagram labels: WWW; 10Gb Uplink; POWER8 Server; Flash Array w/ up to 56TB]
Infrastructure Attributes
- 192 threads in 4U server drawer
- 56 TB of flash per 2U drawer
- Shared Memory & cache for dynamic tuning
- Elimination of I/O and network overhead
- Cluster solution in a box
Power CAPI-attached FlashSystem for NoSQL regains infrastructure control and reins in the cost to deliver services.
19. Big Redis w/ CAPI Flash Offers New Performance / Cost Points
Users pick the performance / cost point that meets their solution needs, be it IOPS rate or latency requirements.
DRAM / Flash Ratio | 100% | 80% | 50% | 20% | 10%
% Implementation Savings | 0% | 18% | 45% | 72% | 81%
[Chart series: Average Latency (ms); IOPS at 1 ms Latency; IOPS at Max Throughput. Data labels: 382K, 208K, 188K, 175K, 2.5M, 366-750K, 1.35M, 483-950K, 671-1250K. Latency axis values: 1, 5, 8, 9, 10.]
*typical workload
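The implementation savings on this slide (0%, 18%, 45%, 72%, 81% at DRAM shares of 100%, 80%, 50%, 20%, 10%) are consistent with a simple linear cost model in which flash costs about one tenth as much as DRAM per unit of capacity. That ratio is an inference fitted to the published figures, not a number stated in the deck, and the function below is an illustrative sketch of that model.

```python
def implementation_savings(dram_share, flash_cost_ratio):
    """Savings relative to an all-DRAM deployment, with cost linear in capacity.

    dram_share: fraction of the data held in DRAM (the rest is in flash).
    flash_cost_ratio: cost of flash relative to DRAM per unit of capacity
        (hypothetical parameter, not given on the slide).
    """
    blended_cost = dram_share + (1 - dram_share) * flash_cost_ratio
    return 1 - blended_cost

FLASH_COST_RATIO = 0.10  # assumption chosen to fit "72% savings when 80% of data is in flash"

for share in (1.0, 0.8, 0.5, 0.2, 0.1):
    saving = implementation_savings(share, FLASH_COST_RATIO)
    print(f"{share:.0%} DRAM -> {saving:.0%} savings")
```

With a different flash-to-DRAM cost ratio the curve keeps its linear shape but the savings at each point shift, so the slide's percentages should be read as specific to the pricing assumed in the deck.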