SlideShare una empresa de Scribd logo
1 de 30
Descargar para leer sin conexión
Patrick Stuedi, IBM Research
Running Spark on a High-
Performance Cluster
using RDMA Networking
and NVMe Flash
Hardware Trends
community
target
20172010
1Gbps
50us
10Gbps
20us
100 MB/s
100ms
1000 MB/s
200us
Hardware Trends
2017
10 GB/s
50us
100Gbps
1us
community
target
our
target
20172010
1Gbps
50us
10Gbps
20us
100 MB/s
100ms
1000 MB/s
200us
User-Level APIs
Storage App
HDD
FileSystem
iSCSI
Block Layer
OS Kernel
Network App
NIC
Sockets
Ethernet
TCP/IP
OS Kernel
Storage App
SSD, NIC
SPDK
Network App
NIC
OS KernelRDMA, DPDK
Kernel
Kernel
Traditional / Kernel-based User-Level / Kernel-Bypass
Needed to
achieve
1us RTT!
Remote Data Access
512 B 4 KB 64 KB 1 MB
0
200
400
600
800
1000
1200
NVMe over Fabrics
iSCSI
data size
time[usec]
DRAM NVMe Flash
512 B 4 KB 64 KB 1 MB
0
50
100
150
200
250
RDMA
Sockets
data size
time[usec]
User-level APIs offer 2-10x better performance than
traditional kernel-based APIs
Let's Use it!
Spark
NVMe Flash
DRAM
High-Performance RDMA network
SQL Streaming MLlib GraphX
Performance!
Realtime!
Disaggregation!
Case Study: Sorting in Spark
MapCores ReduceInput Output
read & classify shuffle store
Net Ser
Net Ser
Net Ser
Sort
Sort
Sort
Experiment Setup
●
Total data size: 12.8 TB
●
Cluster size: 128 nodes
●
Cluster hardware:
– DRAM: 512 GB DDR 4
– Storage: 4x 1.2 TB NVMe SSD
– Network: 100GbE Mellanox RDMA
Flash bandwidth
per node matches
network bandwidth
How is the Network Used?
Hardware limit
0
20
40
60
80
100
0 100 200 300 400 500
Throughput[Gbit/s]
Elapsed time (seconds)
What is the Problem?
JVM
Netty
TeraSort
Sorter
Serializer
sockets
Spark
● Spark uses legacy networking and storage APIs: no kernel-bypass
● Spark itself introduces additional I/O layers: Netty, serializer, sorter, etc.
TCP/IP
Ethernet
NIC
filesystem
block layer
iSCSI
SSD
Example: Shuffle (Map)
Buffer
Buffer
Buffer
Sorter
Kryo
BufferedOutputStream
BufferJNI backend
BufferFile System Buffer Cache
Object
Example: Shuffle (Map)
Buffer
Buffer
Buffer
Sorter
Kryo
BufferJNI
BufferFS
Object
Stream
Example: Shuffle (Map+Reduce)
Buffer
Buffer
Buffer
Sorter
Kryo
BufferJNI
BufferFS
Object
Stream
Buffer
Buffer Buffer SocketBuffer
Netty
Buffer
Buffer
Kryo
Object
network
Sorter
Example: Shuffle (Map+Reduce)
Buffer
Buffer
Buffer
Sorter
Kryo
BufferJNI
BufferFS
Object
Stream
Buffer
Buffer Buffer SocketBuffer
Netty
Buffer
Buffer
Kryo
Object
network
Sorter
Difficult at
100 Gbps!
How can we fix this...
● Not just for shuffle
– Also for broadcast, RDD transport, inter-job sharing, etc.
● Not just for RDMA and NVMe hardware
– But for any possible future high-performance I/O hardware
● Not just for co-located compute/storage
– Also for resource disaggregation, heterogeneous resource distribution, etc.
● Not just improve things
– Make it perform at the hardware limit
The CRAIL Approach
Example: Crail Shuffle (Map)
Buffer
Crail Serializer
CrailOutputStream
Object
local
DRAM
remote
DRAM
local
Flash
remote
Flash
memcpy RDMA NVMe NVMeF
No
buffering
Example: Crail Shuffle (Reduce)
Buffer
Crail Serializer
CrailInputStream
local
DRAM
remote
DRAM
local
Flash
remote
Flash
memcpy RDMA NVMe NVMeF
No
buffering
Object
Example: Crail Shuffle (Reduce)
Buffer
Crail Serializer
CrailInputStream
local
DRAM
remote
DRAM
local
Flash
remote
Flash
memcpy RDMA NVMe NVMeF
No
buffering
Object
Similar for
broadcast,
Input/output,
RDD, etc.
Performance: Configuration
●
Experiments
– Memory-only: Broadcast, GroupBy, Sorting, SQL
– Memory/Flash: disaggregation, tiering
●
Cluster size: 8 nodes, except Sorting: 128 nodes
●
Cluster hardware:
– DRAM: 512 GB DDR 4
– Storage: 4x 1.2 TB NVMe SSD
– Network: 100GbE Mellanox RDMA
●
Spark 2.1.0, Alluxio 1.4, Crail 1.0
●
Hadoop.tmp.dir: RamFS for microbenchmarks, flash SSD for workloads
Spark Broadcast
val bcVar = sc.Broadcast(new Array[Byte](128))
sc.parallelize(1 to tasks, tasks).map(_ => {
bcVar.value.length
}).count
Spark
driver
Spark
Executor
/spark/broadcast/broadcast_0 Crail
Store
Broadcast writing
RPC+2xRDMA+RPC
CDF
Broadcast reading
RPC+RDMA1
2
3
4
Spark GroupBy (80M keys, 4K)
val pairs = sc.parallelize(1 to tasks, tasks).flatmap(_ => {
var values = new array[(Long,Array[Byte])](numKeys)
values = initValues(values)
}).cache().groupByKey().count()
Spark
Exec0
/shuffle/1/ f0 f1 fN
/shuffle/2/ f0 f1 fN
/shuffle/3 f0 f1 fN
Spark
Exec1
Spark
ExecN
Spark
Exec0
Spark
Exec1
Spark
ExecN
hash hash hash1
2
map: Hash
Append (file)
reduce: fetch
Bucket (dir)
Crail
Spark/
Vanilla
5x
2.5x
2x
0
20
40
60
80
100
0 10 20 30 40 50 60 70 80 90 100 110 120
Throughput(Gbit/s)
Elapsed time (seconds)
1 core
4 cores
8 cores
0
20
40
60
80
100
0 10 20 30 40 50 60 70 80 90 100 110 120
Throughput(Gbit/s)
Elapsed time (seconds)
1 core
4 cores
8 cores
Spark/
Crail
Sorting 12.8 TB on 128 nodes
Spark Spark/Crail
0
200
400
600
reduce
map
Spark/Crail Network UsageSorting Runtime
Task ID
Throughput(Gbit/s)
6x
time(sec)
How fast is this?
Spark/Crail Winner 2014 Winner 2016
Size (TB) 12.8 100 100
Time (sec) 98 1406 134
Total cores 2560 6592 10240
Network HW (Gbit/s) 100 10 100
Rate/core (GB/min) 3.13 0.66 4.4
Native C
distributed
sorting
benchmark
Spark
www.sortingbenchmark.org
Sorting rate of
Crail/Spark only 27%
slower than rate of
Winner 2016
Spark SQL JoinSpark
Exec0
/shuffle/1/ f0 f1 fN
/shuffle/2/ f0 f1 fN
/shuffle/3 f0 f1 fN
Spark
Exec1
Spark
ExecN
hash hash hash
1
2
map: Hash
Append (file)
val ds1 = sparkSession.read.parquet(“...”) //64GB dataset
val ds2 = sparkSession.read.parquet(“...”) //64GB dataset
val resultDS = ds1.joinWith(ds2, ds1(“key”) === ds2(“key”)) //~100GB
resultDS.write.format(“parquet”).mode(SaveMode.Overwrite).save(“...”)
parquet parquet parquet
read from
Hdfs/crail/alluxio
deser
sort
join
write
sort
join
write
sort
join
write
deser deser
3 Sort
Merge join
hdfs/crail/alluxio
4
CrailStore
0
10
20
30
40
50
output
join
hash
input
Time(sec)
Spark/
Alluxio
Spark/
Crail
Storage Tiering: DRAM & NVMe
Spark
Exec0
Spark
Exec1
Spark
ExecN
Memory Tier
Flash
Tier
Network
Input/Output/
Shuffle/
Spark
With Crail, replacing memory with NVMe flash
for storing input/output/shuffle reduces the 200GB sorting runtime by only 48%
0
20
40
60
80
100
120
Reduce
MapVanilla
Spark
(100%
in-memory)
memory/flash ratio
Time(sec)
sorting runtime
DRAM & Flash Disaggregation
Spark
Exec1
Spark
ExecN
Using Crail, a Spark 200GB sorting workload can be run with
memory and flash disaggregated at no extra cost
Spark
Exec0
storage
nodes
compute
nodes
0
40
80
120
160
200
Reduce
Map
Crail/
DRAM
Crail/
NVMe
Vanilla/
Alluxio
Input/Output/
Shuffle
Time(sec)
sorting runtime
Conclusions
●
Effectively using high-performance I/O hardware in
Spark is challenging
●
Crail is an attempt to re-think how data processing
frameworks (not only Spark) should interact with
network and storage hardware
– User-level I/O, storage disaggregation, memory/flash convergence
●
Spark's modular architecture allows Crail to be used
seamlessly
Crail for Spark is Open Source
●
www.crail.io
●
github.com/zrlio/spark-io
●
github.com/zrlio/crail
●
github.com/zrlio/parquetgenerator
●
github.com/zrlio/crail-terasort
Thank You.
Contributors:
Animesh Trivedi, Jonas Pfefferle, Michael Kaufmann, Bernard
Metzler, Radu Stoica, Ioannis Koltsidas, Nikolos Ioannou,
Christoph Hagleitner, Peter Hofstee, Evangelos Eleftheriou, ...

Más contenido relacionado

La actualidad más candente

Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudNoritaka Sekiyama
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimDatabricks
 
RedisConf17- Using Redis at scale @ Twitter
RedisConf17- Using Redis at scale @ TwitterRedisConf17- Using Redis at scale @ Twitter
RedisConf17- Using Redis at scale @ TwitterRedis Labs
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisDvir Volk
 
Dongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkDongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkFlink Forward
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLDatabricks
 
How to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They WorkHow to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They WorkIlya Ganelin
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberFlink Forward
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationshadooparchbook
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013Jun Rao
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance ImprovementBiju Nair
 
Dynamic Allocation in Spark
Dynamic Allocation in SparkDynamic Allocation in Spark
Dynamic Allocation in SparkDatabricks
 
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...Dremio Corporation
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersDatabricks
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroDatabricks
 
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLDatabricks
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compactionMIJIN AN
 

La actualidad más candente (20)

Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
 
RedisConf17- Using Redis at scale @ Twitter
RedisConf17- Using Redis at scale @ TwitterRedisConf17- Using Redis at scale @ Twitter
RedisConf17- Using Redis at scale @ Twitter
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Dongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkDongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of Flink
 
Apache Hadoopの未来 3系になって何が変わるのか?
Apache Hadoopの未来 3系になって何が変わるのか?Apache Hadoopの未来 3系になって何が変わるのか?
Apache Hadoopの未来 3系になって何が変わるのか?
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
 
How to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They WorkHow to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They Work
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Node Labels in YARN
Node Labels in YARNNode Labels in YARN
Node Labels in YARN
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance Improvement
 
Dynamic Allocation in Spark
Dynamic Allocation in SparkDynamic Allocation in Spark
Dynamic Allocation in Spark
 
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production users
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
 
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQL
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compaction
 

Similar a Running Spark on High-Performance Hardware using Crail

Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Виталий Стародубцев
 
Red Hat Storage Day New York - New Reference Architectures
Red Hat Storage Day New York - New Reference ArchitecturesRed Hat Storage Day New York - New Reference Architectures
Red Hat Storage Day New York - New Reference ArchitecturesRed_Hat_Storage
 
Elastify Cloud-Native Spark Application with Persistent Memory
Elastify Cloud-Native Spark Application with Persistent MemoryElastify Cloud-Native Spark Application with Persistent Memory
Elastify Cloud-Native Spark Application with Persistent MemoryDatabricks
 
Experiences with Oracle SPARC S7-2 Server
Experiences with Oracle SPARC S7-2 ServerExperiences with Oracle SPARC S7-2 Server
Experiences with Oracle SPARC S7-2 ServerJomaSoft
 
Advanced Apache Spark Meetup: How Spark Beat Hadoop @ 100 TB Daytona GraySor...
Advanced Apache Spark Meetup:  How Spark Beat Hadoop @ 100 TB Daytona GraySor...Advanced Apache Spark Meetup:  How Spark Beat Hadoop @ 100 TB Daytona GraySor...
Advanced Apache Spark Meetup: How Spark Beat Hadoop @ 100 TB Daytona GraySor...Chris Fregly
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheDavid Grier
 
Open Source Data Deduplication
Open Source Data DeduplicationOpen Source Data Deduplication
Open Source Data DeduplicationRedWireServices
 
Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...
Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...
Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...Databricks
 
Offloading for Databases - Deep Dive
Offloading for Databases - Deep DiveOffloading for Databases - Deep Dive
Offloading for Databases - Deep DiveUniFabric
 
SOUG_SDM_OracleDB_V3
SOUG_SDM_OracleDB_V3SOUG_SDM_OracleDB_V3
SOUG_SDM_OracleDB_V3UniFabric
 
Storage and performance, Whiptail
Storage and performance, Whiptail Storage and performance, Whiptail
Storage and performance, Whiptail Internet World
 
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014Philippe Fierens
 
Orcl siebel-sun-s282213-oow2006
Orcl siebel-sun-s282213-oow2006Orcl siebel-sun-s282213-oow2006
Orcl siebel-sun-s282213-oow2006Sal Marcus
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Databricks
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Databricks
 
Day 2 General Session Presentations RedisConf
Day 2 General Session Presentations RedisConfDay 2 General Session Presentations RedisConf
Day 2 General Session Presentations RedisConfRedis Labs
 
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-PremiseTackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-PremiseDatabricks
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Uwe Printz
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheNicolas Poggi
 

Similar a Running Spark on High-Performance Hardware using Crail (20)

Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
 
Red Hat Storage Day New York - New Reference Architectures
Red Hat Storage Day New York - New Reference ArchitecturesRed Hat Storage Day New York - New Reference Architectures
Red Hat Storage Day New York - New Reference Architectures
 
Elastify Cloud-Native Spark Application with Persistent Memory
Elastify Cloud-Native Spark Application with Persistent MemoryElastify Cloud-Native Spark Application with Persistent Memory
Elastify Cloud-Native Spark Application with Persistent Memory
 
Experiences with Oracle SPARC S7-2 Server
Experiences with Oracle SPARC S7-2 ServerExperiences with Oracle SPARC S7-2 Server
Experiences with Oracle SPARC S7-2 Server
 
Advanced Apache Spark Meetup: How Spark Beat Hadoop @ 100 TB Daytona GraySor...
Advanced Apache Spark Meetup:  How Spark Beat Hadoop @ 100 TB Daytona GraySor...Advanced Apache Spark Meetup:  How Spark Beat Hadoop @ 100 TB Daytona GraySor...
Advanced Apache Spark Meetup: How Spark Beat Hadoop @ 100 TB Daytona GraySor...
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cache
 
Open Source Data Deduplication
Open Source Data DeduplicationOpen Source Data Deduplication
Open Source Data Deduplication
 
Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...
Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...
Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...
 
Offloading for Databases - Deep Dive
Offloading for Databases - Deep DiveOffloading for Databases - Deep Dive
Offloading for Databases - Deep Dive
 
SOUG_SDM_OracleDB_V3
SOUG_SDM_OracleDB_V3SOUG_SDM_OracleDB_V3
SOUG_SDM_OracleDB_V3
 
Storage and performance, Whiptail
Storage and performance, Whiptail Storage and performance, Whiptail
Storage and performance, Whiptail
 
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
 
Orcl siebel-sun-s282213-oow2006
Orcl siebel-sun-s282213-oow2006Orcl siebel-sun-s282213-oow2006
Orcl siebel-sun-s282213-oow2006
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
 
Day 2 General Session Presentations RedisConf
Day 2 General Session Presentations RedisConfDay 2 General Session Presentations RedisConf
Day 2 General Session Presentations RedisConf
 
Meetup Oracle Database MAD_BCN: 4 Saborea Exadata
Meetup Oracle Database MAD_BCN: 4 Saborea ExadataMeetup Oracle Database MAD_BCN: 4 Saborea Exadata
Meetup Oracle Database MAD_BCN: 4 Saborea Exadata
 
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-PremiseTackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket Cache
 

Más de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Más de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Último

BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 

Último (20)

Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 

Running Spark on High-Performance Hardware using Crail

  • 1. Patrick Stuedi, IBM Research Running Spark on a High- Performance Cluster using RDMA Networking and NVMe Flash
  • 4. User-Level APIs Storage App HDD FileSystem iSCSI Block Layer OS Kernel Network App NIC Sockets Ethernet TCP/IP OS Kernel Storage App SSD, NIC SPDK Network App NIC OS KernelRDMA, DPDK Kernel Kernel Traditional / Kernel-based User-Level / Kernel-Bypass Needed to achieve 1us RTT!
  • 5. Remote Data Access 512 B 4 KB 64 KB 1 MB 0 200 400 600 800 1000 1200 NVMe over Fabrics iSCSI data size time[usec] DRAM NVMe Flash 512 B 4 KB 64 KB 1 MB 0 50 100 150 200 250 RDMA Sockets data size time[usec] User-level APIs offer 2-10x better performance than traditional kernel-based APIs
  • 6. Let's Use it! Spark NVMe Flash DRAM High-Performance RDMA network SQL Streaming MLlib GraphX Performance! Realtime! Disaggregation!
  • 7. Case Study: Sorting in Spark MapCores ReduceInput Output read & classify shuffle store Net Ser Net Ser Net Ser Sort Sort Sort
  • 8. Experiment Setup ● Total data size: 12.8 TB ● Cluster size: 128 nodes ● Cluster hardware: – DRAM: 512 GB DDR 4 – Storage: 4x 1.2 TB NVMe SSD – Network: 100GbE Mellanox RDMA Flash bandwidth per node matches network bandwidth
  • 9. How is the Network Used? Hardware limit 0 20 40 60 80 100 0 100 200 300 400 500 Throughput[Gbit/s] Elapsed time (seconds)
  • 10. What is the Problem? JVM Netty TeraSort Sorter Serializer sockets Spark ● Spark uses legacy networking and storage APIs: no kernel-bypass ● Spark itself introduces additional I/O layers: Netty, serializer, sorter, etc. TCP/IP Ethernet NIC filesystem block layer iSCSI SSD
  • 14. Example: Shuffle (Map+Reduce) Buffer Buffer Buffer Sorter Kryo BufferJNI BufferFS Object Stream Buffer Buffer Buffer SocketBuffer Netty Buffer Buffer Kryo Object network Sorter Difficult at 100 Gbps!
  • 15. How can we fix this... ● Not just for shuffle – Also for broadcast, RDD transport, inter-job sharing, etc. ● Not just for RDMA and NVMe hardware – But for any possible future high-performance I/O hardware ● Not just for co-located compute/storage – Also for resource disaggregation, heterogeneous resource distribution, etc. ● Not just improve things – Make it perform at the hardware limit
  • 17. Example: Crail Shuffle (Map) Buffer Crail Serializer CrailOutputStream Object local DRAM remote DRAM local Flash remote Flash memcpy RDMA NVMe NVMeF No buffering
  • 18. Example: Crail Shuffle (Reduce) Buffer Crail Serializer CrailInputStream local DRAM remote DRAM local Flash remote Flash memcpy RDMA NVMe NVMeF No buffering Object
  • 19. Example: Crail Shuffle (Reduce) Buffer Crail Serializer CrailInputStream local DRAM remote DRAM local Flash remote Flash memcpy RDMA NVMe NVMeF No buffering Object Similar for broadcast, Input/output, RDD, etc.
  • 20. Performance: Configuration ● Experiments – Memory-only: Broadcast, GroupBy, Sorting, SQL – Memory/Flash: disaggregation, tiering ● Cluster size: 8 nodes, except Sorting: 128 nodes ● Cluster hardware: – DRAM: 512 GB DDR 4 – Storage: 4x 1.2 TB NVMe SSD – Network: 100GbE Mellanox RDMA ● Spark 2.1.0, Alluxio 1.4, Crail 1.0 ● Hadoop.tmp.dir: RamFS for microbenchmarks, flash SSD for workloads
  • 21. Spark Broadcast val bcVar = sc.Broadcast(new Array[Byte](128)) sc.parallelize(1 to tasks, tasks).map(_ => { bcVar.value.length }).count Spark driver Spark Executor /spark/broadcast/broadcast_0 Crail Store Broadcast writing RPC+2xRDMA+RPC CDF Broadcast reading RPC+RDMA1 2 3 4
  • 22. Spark GroupBy (80M keys, 4K) val pairs = sc.parallelize(1 to tasks, tasks).flatmap(_ => { var values = new array[(Long,Array[Byte])](numKeys) values = initValues(values) }).cache().groupByKey().count() Spark Exec0 /shuffle/1/ f0 f1 fN /shuffle/2/ f0 f1 fN /shuffle/3 f0 f1 fN Spark Exec1 Spark ExecN Spark Exec0 Spark Exec1 Spark ExecN hash hash hash1 2 map: Hash Append (file) reduce: fetch Bucket (dir) Crail Spark/ Vanilla 5x 2.5x 2x 0 20 40 60 80 100 0 10 20 30 40 50 60 70 80 90 100 110 120 Throughput(Gbit/s) Elapsed time (seconds) 1 core 4 cores 8 cores 0 20 40 60 80 100 0 10 20 30 40 50 60 70 80 90 100 110 120 Throughput(Gbit/s) Elapsed time (seconds) 1 core 4 cores 8 cores Spark/ Crail
  • 23. Sorting 12.8 TB on 128 nodes Spark Spark/Crail 0 200 400 600 reduce map Spark/Crail Network UsageSorting Runtime Task ID Throughput(Gbit/s) 6x time(sec)
  • 24. How fast is this? Spark/Crail Winner 2014 Winner 2016 Size (TB) 12.8 100 100 Time (sec) 98 1406 134 Total cores 2560 6592 10240 Network HW (Gbit/s) 100 10 100 Rate/core (GB/min) 3.13 0.66 4.4 Native C distributed sorting benchmark Spark www.sortingbenchmark.org Sorting rate of Crail/Spark only 27% slower than rate of Winner 2016
  • 25. Spark SQL JoinSpark Exec0 /shuffle/1/ f0 f1 fN /shuffle/2/ f0 f1 fN /shuffle/3 f0 f1 fN Spark Exec1 Spark ExecN hash hash hash 1 2 map: Hash Append (file) val ds1 = sparkSession.read.parquet(“...”) //64GB dataset val ds2 = sparkSession.read.parquet(“...”) //64GB dataset val resultDS = ds1.joinWith(ds2, ds1(“key”) === ds2(“key”)) //~100GB resultDS.write.format(“parquet”).mode(SaveMode.Overwrite).save(“...”) parquet parquet parquet read from Hdfs/crail/alluxio deser sort join write sort join write sort join write deser deser 3 Sort Merge join hdfs/crail/alluxio 4 CrailStore 0 10 20 30 40 50 output join hash input Time(sec) Spark/ Alluxio Spark/ Crail
  • 26. Storage Tiering: DRAM & NVMe Spark Exec0 Spark Exec1 Spark ExecN Memory Tier Flash Tier Network Input/Output/ Shuffle/ Spark With Crail, replacing memory with NVMe flash for storing input/output/shuffle reduces the 200GB sorting runtime by only 48% 0 20 40 60 80 100 120 Reduce MapVanilla Spark (100% in-memory) memory/flash ratio Time(sec) sorting runtime
  • 27. DRAM & Flash Disaggregation Spark Exec1 Spark ExecN Using Crail, a Spark 200GB sorting workload can be run with memory and flash disaggregated at no extra cost Spark Exec0 storage nodes compute nodes 0 40 80 120 160 200 Reduce Map Crail/ DRAM Crail/ NVMe Vanilla/ Alluxio Input/Output/ Shuffle Time(sec) sorting runtime
  • 28. Conclusions ● Effectively using high-performance I/O hardware in Spark is challenging ● Crail is an attempt to re-think how data processing frameworks (not only Spark) should interact with network and storage hardware – User-level I/O, storage disaggregation, memory/flash convergence ● Spark's modular architecture allows Crail to be used seamlessly
  • 29. Crail for Spark is Open Source ● www.crail.io ● github.com/zrlio/spark-io ● github.com/zrlio/crail ● github.com/zrlio/parquetgenerator ● github.com/zrlio/crail-terasort
  • 30. Thank You. Contributors: Animesh Trivedi, Jonas Pfefferle, Michael Kaufmann, Bernard Metzler, Radu Stoica, Ioannis Koltsidas, Nikolos Ioannou, Christoph Hagleitner, Peter Hofstee, Evangelos Eleftheriou, ...