Data Orchestration Summit
www.alluxio.io/data-orchestration-summit-2019
November 7, 2019
Orchestrate a Data Symphony
Speaker:
Haoyuan Li, Alluxio
For more Alluxio events: https://www.alluxio.io/events/
8. Endless requirements
More opportunities come with more challenges
How to modernize my data infra to Cloud?
Why can’t we also support Presto for querying?
Why can’t I train my model on a public cloud?
Job is taking forever, can’t you add more resource?
How do I access remote HDFS data in Google Dataproc?
…
9. [Diagram. Labels: HDFS, Hive, Spark, NFS, TensorFlow, object store, Presto, S3, Azure, connected over WAN links]
DATA IN DISPARATE STORAGE SYSTEMS
COMPUTE SPREAD ACROSS MANY DIFFERENT FRAMEWORKS
Data silos across data centers, regions, clouds
Complex. Error prone. Time consuming.
18. A Data Orchestration Approach
[Diagram. Labels: HDFS, Hive, Spark, NFS, TensorFlow, Presto, S3, ANY DATA APP; the label DATA ORCHESTRATION repeats across the diagram]
DATA IN DISPARATE STORAGE SYSTEMS
COMPUTE SPREAD ACROSS MANY DIFFERENT FRAMEWORKS
19. Alluxio – An Open Source Implementation of Data Orchestration
Intelligent Caching
Data Management
Global Namespace
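
To make two of the capabilities named on this slide concrete, here is a minimal toy sketch in Python of a global namespace (one path tree mapped onto several backing stores) combined with read-through caching. It is not Alluxio's API or implementation: the class ToyNamespace, its methods, the mount points, and the example URIs are all hypothetical names chosen for illustration, and data management is not modeled at all.

```python
from typing import Callable, Dict


class ToyNamespace:
    """Toy illustration only; hypothetical, not Alluxio's API."""

    def __init__(self) -> None:
        self._mounts: Dict[str, str] = {}   # mount point -> backing URI prefix
        self._cache: Dict[str, bytes] = {}  # resolved URI -> cached bytes
        self.backend_reads = 0              # how often the backing store was hit

    def mount(self, mount_point: str, backing_uri: str) -> None:
        """Attach a backing store under a path in the single namespace."""
        self._mounts[mount_point.rstrip("/")] = backing_uri.rstrip("/")

    def resolve(self, path: str) -> str:
        """Translate a namespace path to a backing URI (longest mount wins)."""
        candidates = [
            m for m in self._mounts
            if path == m or path.startswith(m + "/")
        ]
        if not candidates:
            raise KeyError(f"no mount covers {path}")
        best = max(candidates, key=len)
        return self._mounts[best] + path[len(best):]

    def read(self, path: str, fetch: Callable[[str], bytes]) -> bytes:
        """Read through the cache; call fetch only on a cache miss."""
        uri = self.resolve(path)
        if uri not in self._cache:
            self.backend_reads += 1
            self._cache[uri] = fetch(uri)
        return self._cache[uri]


# Hypothetical mounts: two different storage systems behind one path tree.
ns = ToyNamespace()
ns.mount("/warehouse", "hdfs://namenode:8020/warehouse")
ns.mount("/lake", "s3://example-bucket/lake")


def fake_fetch(uri: str) -> bytes:
    # Stand-in for a remote read; a real system would call the storage client.
    return uri.encode()


first = ns.read("/lake/events/part-0", fake_fetch)
second = ns.read("/lake/events/part-0", fake_fetch)  # served from the cache
```

In this sketch an application addresses /warehouse and /lake the same way, whichever store holds the bytes, and the second read of the same path does not touch the backing store. Eviction, consistency, and tiering, which a real caching layer has to handle, are deliberately left out.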