Alluxio (formerly Tachyon):
Unified Namespace and Tiered Storage
Calvin Jia, Jiri Simsa
One of the Things to Watch at
Strata
TechCrunch article:
“… An interesting item that made the top
terms list is “alluxio,” which is the recently
renamed Tachyon project. Alluxio is a virtual
distributed storage system, and it has a
memory-centric architecture that enables
data sharing across clusters at memory
speed. …”
2
Who Are We?
• Calvin Jia
• SWE @ Alluxio, Inc.
• #1 Alluxio contributor
• Twitter: @JiaCalvin
• Jiri Simsa
• SWE @ Alluxio, Inc.
• CMU Ph.D. & Google
• Twitter: @jsimsa
3
Alluxio Inc.
• Founded by Alluxio creators and top
committers
• Formerly Tachyon Nexus, Inc.
• $7.5 million Series A by Andreessen Horowitz
• Committed to the Alluxio Open Source
Project
• Company Website: http://www.alluxio.com
4
Outline
• Alluxio Introduction
• Tiered Storage
• Unified Namespace
5
ALLUXIO:
Open Source Memory Speed
Virtual Distributed Storage
6
Memory Speed
• Memory-centric architecture designed for memory I/O
Virtual
• Abstracts persistent storage from applications
Distributed
• Designed to scale with nothing but commodity hardware
Open Source
• One of the fastest growing project communities
7
Contributor Growth
• Over 200 Contributors
– 3x growth over the last year
8
Organizations
• Over 50 Organizations
9
Alluxio Ecosystem
10
Memory is Getting Faster
11
Memory is Getting Cheaper
12
Simple Examples
• Data sharing between frameworks
• Data resilience during application crashes
• Consolidate memory usage and alleviate
GC issues
13
Data Sharing Between Frameworks
[Diagram labels: Spark Job with blocks 1 and 3 in Spark Memory (storage engine and execution engine in the same process); Hadoop MR Job on YARN; HDFS / Amazon S3 holding blocks 1, 2, 3 and 4]
Inter-process sharing slowed down by network and/or disk I/O
14
Data Sharing Between Frameworks
[Diagram labels: Spark Job with Spark Memory (storage engine and execution engine in the same process); Hadoop MR Job on YARN; Alluxio holding blocks 1, 3 and 4 in memory; HDFS / Amazon S3 holding blocks 1, 2, 3 and 4 on disk]
Inter-process sharing can happen at memory speed
15
Data Resilience during Crashes
[Diagram labels: Spark Task with a block manager holding blocks 1 and 3 in Spark Memory (storage engine and execution engine in the same process); HDFS / Amazon S3 holding blocks 1, 2, 3 and 4]
Process crash requires network and/or disk I/O to re-read the data
16
Data Resilience during Crashes
[Diagram labels: Crash; block manager holding blocks 1 and 3 in Spark Memory (storage engine and execution engine in the same process); HDFS / Amazon S3 holding blocks 1, 2, 3 and 4]
Process crash requires network and/or disk I/O to re-read the data
17
Data Resilience during Crashes
[Diagram labels: Crash (storage engine and execution engine in the same process); HDFS / Amazon S3 holding blocks 1, 2, 3 and 4]
Process crash requires network and/or disk I/O to re-read the data
18
Data Resilience during Crashes
[Diagram labels: Spark Task with a block manager in Spark Memory (storage engine and execution engine in the same process); Alluxio holding blocks 1, 3 and 4 in memory; HDFS holding blocks 1, 2, 3 and 4 on disk]
Process crash only needs memory I/O to re-read the data
19
Data Resilience during Crashes
[Diagram labels: Crash (storage engine and execution engine in the same process); Alluxio holding blocks 1, 3 and 4 in memory; HDFS holding blocks 1, 2, 3 and 4 on disk]
Process crash only needs memory I/O to re-read the data
20
Consolidating Memory
[Diagram labels: Spark Job1 with blocks 1 and 3 in Spark Memory; Spark Job2 with blocks 3 and 1 in Spark Memory (storage engine and execution engine in the same process); HDFS / Amazon S3 holding blocks 1, 2, 3 and 4]
Data duplicated at memory-level
21
Consolidating Memory
[Diagram labels: Spark Job1 and Spark Job2, each with its own Spark memory (storage engine and execution engine in the same process); Alluxio holding blocks 1, 3 and 4 in memory; HDFS / Amazon S3 holding blocks 1, 2, 3 and 4 on disk]
Data not duplicated at memory-level
22
Case Study: Barclays
Making the Impossible Possible with Tachyon: Accelerate Spark
Jobs from Hours to Seconds
• Application: SparkSQL + Spark RDDs
• Alluxio Storage Layer: MEM
• Backend Storage: None
• Result: Speeding up Spark jobs from hours to seconds
23
Common Questions
– Memory speed sharing among distributed applications?
  HDFS-compatible interface
– GC overhead introduced by in-memory caching?
  Off-heap memory management
– Data set could be larger than available memory?
  Tiered storage
24
Outline
• Alluxio Introduction
• Tiered Storage
• Unified Namespace
25
Motivation
• Memory resources are still constrained
• Alluxio data management logic is not
limited to memory
• Storage resources available on compute
clusters
26
Tiered Storage
MEM
SSD
HDD
27
Tiered Storage
• Extends Alluxio with support for SSD and/or
HDD storage
• Different tiers have different characteristics
– Keep hot data in fast but limited storage
– Keep warm data in slower but abundant storage
• Workers manage their own storage
• Data allocation and eviction are driven by
application access
28
Tiered Storage Architecture
Machine Type 1
Compute Client
Alluxio Master
Memory, SSD, HDD
Machine Type 2
Compute Client
Alluxio Worker
Memory, SSD, HDD
29
Tiered Storage Architecture
Machine Type 2
Compute Client
• Alluxio Client
Alluxio Worker
• Tiered Block Store
• Evictor
• Allocator
Memory, SSD, HDD
30
Automatic Data Migration
• Data can be evicted to lower layers if it is “cooling down”
• Data can be promoted to upper layers if it is “warming
up”
Evict stale data to
lower tier
Promote hot data to
upper tier
31
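The eviction and promotion behaviour described on the tiered storage slides can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Alluxio's implementation: the class name `TieredStore`, the per-tier block counts, and the choice of least-recently-used eviction are all illustrative. New blocks land in the top tier, a full tier pushes its least recently used block one tier down, and a read moves the block back to the top tier.

```python
from collections import OrderedDict


class TieredStore:
    """Toy model of one worker's tiered block store (illustrative only)."""

    def __init__(self, capacities):
        # capacities maps tier name -> max number of blocks, fastest tier
        # first, e.g. {"MEM": 2, "SSD": 2, "HDD": 4}.
        self.capacities = dict(capacities)
        self.order = list(capacities)
        # Each tier keeps its blocks ordered from least to most recently used.
        self.tiers = {name: OrderedDict() for name in self.order}

    def _place(self, level, block_id, data):
        """Put a block into the tier at `level`, evicting downward if full."""
        name = self.order[level]
        tier = self.tiers[name]
        if len(tier) >= self.capacities[name]:
            victim_id, victim = tier.popitem(last=False)  # least recently used
            if level + 1 < len(self.order):
                self._place(level + 1, victim_id, victim)  # evict to lower tier
            # Below the last tier the victim simply leaves the cache.
        tier[block_id] = data

    def write(self, block_id, data):
        # Drop any stale copy, then place the new data in the top tier.
        for tier in self.tiers.values():
            tier.pop(block_id, None)
        self._place(0, block_id, data)

    def read(self, block_id):
        for name in self.order:
            tier = self.tiers[name]
            if block_id in tier:
                data = tier.pop(block_id)
                self._place(0, block_id, data)  # promote to the top tier
                return data
        return None  # not cached; a real system would read backend storage

    def location(self, block_id):
        for name in self.order:
            if block_id in self.tiers[name]:
                return name
        return None
```

With `TieredStore({"MEM": 2, "SSD": 2, "HDD": 4})`, writing four blocks leaves the two oldest in SSD; reading one of them promotes it to MEM and pushes the least recently used MEM block down to SSD.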
Pluggable Policies
• Policies can be customized to suit
workloads
• Defaults provided for general scenarios
• Advanced users can optimize with
additional knowledge
– For example: Optimize for iterations
32
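One way to read "optimize for iterations" is a workload that scans the same blocks repeatedly while the data set is slightly larger than the cache. The sketch below is an assumption-laden illustration, not Alluxio's policy interface: `LRUEvictor`, `MRUEvictor`, `Cache` and `run_iterations` are hypothetical names. It shows how swapping the eviction policy behind a small interface changes the hit rate for such a workload.

```python
from collections import OrderedDict


class LRUEvictor:
    """Evict the least recently used block (a common general-purpose choice)."""

    def pick_victim(self, blocks):
        return next(iter(blocks))  # first key = least recently used


class MRUEvictor:
    """Evict the most recently used block; can help repeated sequential scans."""

    def pick_victim(self, blocks):
        return next(reversed(blocks))  # last key = most recently used


class Cache:
    """Fixed-size block cache whose eviction policy is pluggable."""

    def __init__(self, capacity, evictor):
        self.capacity = capacity
        self.evictor = evictor
        self.blocks = OrderedDict()  # block id -> None, in access order
        self.hits = 0
        self.misses = 0

    def access(self, block_id):
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return
        self.misses += 1
        if len(self.blocks) >= self.capacity:
            del self.blocks[self.evictor.pick_victim(self.blocks)]
        self.blocks[block_id] = None


def run_iterations(evictor, capacity=3, num_blocks=4, passes=5):
    """Scan blocks 0..num_blocks-1 in order, `passes` times; return (hits, misses)."""
    cache = Cache(capacity, evictor)
    for _ in range(passes):
        for block_id in range(num_blocks):
            cache.access(block_id)
    return cache.hits, cache.misses
```

With four blocks and room for three, `run_iterations(LRUEvictor())` never hits, because each block is evicted just before it is needed again, while `run_iterations(MRUEvictor())` keeps most of the working set cached. A user who knows the access pattern can exploit that knowledge by choosing the policy.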
Case Study: Baidu
Baidu Queries Data 30 Times Faster with Alluxio
• Application: Spark
• Alluxio Storage: MEM + HDD
• Backend Storage: Baidu’s File System
• 200+ nodes deployment, 2PB+ managed space
• Result: Speeding up data querying by 30x
33
Outline
• About Alluxio
• Tiered Storage
• Unified Namespace
34
Big Data Ecosystem
35
Big Data Ecosystem
36
Big Data Ecosystem
37
Motivation
• At large organizations, data spans many storage
systems (object storage, network / distributed file
systems, DBs)
• Application logic needs to integrate with different types
of storage systems
• Data needs to be moved around to work around
application limitations
• In-house storage layers are built to address limitations
of legacy storage systems
38
Transparent Naming
• Applications can transparently and efficiently interact
with remote storage through Alluxio.
• Applications do not need to use different APIs for
interacting with different storage systems.
[Diagram: the Alluxio namespace at alluxio://host:port/ contains data (reports, sales) and users (alice, bob), matching the same directories in the storage system at s3n://bucket/directory]
39
Single Namespace
• Applications can read and write different storage
systems.
• Decouples data location from application
[Diagram: the Alluxio namespace at alluxio://host:port/ contains data (reports, sales) and users (alice, bob). It is backed by two storage systems, A and B: hdfs://host:port/ provides users (alice, bob), and s3n://bucket/directory provides reports and sales]
40
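The single namespace can be pictured as a mount table that translates a path in the Alluxio namespace into a URI in the storage system that actually holds the data. The following is a minimal sketch, assuming longest-prefix matching on mount points; the `MountTable` class and its methods are hypothetical and only mirror the example paths from the slide.

```python
class MountTable:
    """Toy mount table mapping namespace paths to under-storage URIs."""

    def __init__(self):
        self.mounts = {}  # mount point in the namespace -> storage URI

    def mount(self, alluxio_path, ufs_uri):
        # Normalise trailing slashes so "/data/" and "/data" are the same mount.
        self.mounts[alluxio_path.rstrip("/") or "/"] = ufs_uri.rstrip("/")

    def resolve(self, path):
        """Translate a namespace path using the longest matching mount point."""
        best = None
        for mount_point in self.mounts:
            prefix = "" if mount_point == "/" else mount_point
            if path == mount_point or path.startswith(prefix + "/"):
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        prefix = "" if best == "/" else best
        return self.mounts[best] + path[len(prefix):]


table = MountTable()
table.mount("/data", "s3n://bucket/directory")
table.mount("/users", "hdfs://host:port/users")
print(table.resolve("/data/reports"))
print(table.resolve("/users/alice"))
```

The application only ever sees `/data/reports` and `/users/alice`; which storage system serves each path is decided by the table, which is the sense in which data location is decoupled from the application.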
Architecture
[Diagram labels: Alluxio Interface; ALLUXIO; UFS Interface; HDFS adapter, S3 adapter, Swift adapter; HDFS, S3, Swift, …]
41
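The adapter layering on the architecture slide can be sketched as one small interface with one implementation per storage system. Everything below is hypothetical: `UnderStorage`, `InMemoryAdapter` and `StorageRegistry` are illustrative names, and the adapters keep bytes in a dictionary instead of talking to HDFS, S3 or Swift.

```python
from abc import ABC, abstractmethod


class UnderStorage(ABC):
    """Hypothetical minimal interface that every storage adapter implements."""

    @abstractmethod
    def read(self, path):
        """Return the bytes stored at `path`."""

    @abstractmethod
    def write(self, path, data):
        """Store `data` at `path`."""


class InMemoryAdapter(UnderStorage):
    """Stand-in for a real HDFS, S3 or Swift adapter; keeps bytes in a dict."""

    def __init__(self):
        self.objects = {}

    def read(self, path):
        return self.objects[path]

    def write(self, path, data):
        self.objects[path] = data


class StorageRegistry:
    """Dispatches a URI to the adapter registered for its scheme."""

    def __init__(self):
        self.adapters = {}

    def register(self, scheme, adapter):
        self.adapters[scheme] = adapter

    def _split(self, uri):
        scheme, sep, rest = uri.partition("://")
        if not sep or scheme not in self.adapters:
            raise ValueError(f"no adapter registered for {uri}")
        return self.adapters[scheme], rest

    def read(self, uri):
        adapter, path = self._split(uri)
        return adapter.read(path)

    def write(self, uri, data):
        adapter, path = self._split(uri)
        adapter.write(path, data)
```

Supporting another storage system then means registering one more adapter; callers of `StorageRegistry.read` and `StorageRegistry.write` do not change.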
Alluxio Benefits
• Enable new workloads across storage systems
• Work with the framework of your choice
• Scale storage and compute independently
42
Resources
• Alluxio Project: http://www.alluxio.org
• Development: https://github.com/Alluxio/alluxio
• Meet Friends: http://www.meetup.com/Alluxio
• Alluxio Inc: http://www.alluxio.com
• Contact us: info@alluxio.com
43

Alluxio Presentation at Strata San Jose 2016

  • 1. Alluxio (formerly Tachyon): Unified Namespace and Tiered Storage Calvin Jia, Jiri Simsa
  • 2. One of the Things to Watch at Strata TechCrunch article: “… An interesting item that made the top terms list is “alluxio,” which is the recently renamed Tachyon project. Alluxio is a virtual distributed storage system, and it has a memory-centric architecture that enables data sharing across clusters at memory speed. … “ 2
  • 3. Who Are We? • Calvin Jia • SWE @ Alluxio, Inc. • #1 Alluxio contributor • Twitter: @JiaCalvin • Jiri Simsa • SWE @ Alluxio, Inc • CMU Ph.D. & Google • Twitter: @jsimsa 3
  • 4. Alluxio Inc. • Founded by Alluxio creators and top committers • Formerly Tachyon Nexus, Inc. • $7.5 million Series A by Andreessen Horowitz • Committed to the Alluxio Open Source Project • Company Website: http://www.alluxio.com 4
  • 5. Outline • Alluxio Introduction • Tiered Storage • Unified Namespace 5
  • 6. ALLUXIO: Open Source Memory Speed Virtual Distributed Storage 6
  • 7. Memory Speed • Memory-centric architecture designed for memory I/O Virtual • Abstracts persistent storage from applications Distributed • Designed to scale with nothing but commodity hardware Open Source • One of the fastest growing project communities 7
  • 8. Contributor Growth • Over 200 Contributors – 3x growth over the last year 8
  • 9. Organizations • Over 50 Organizations 9
  • 11. Memory is Getting Faster 11
  • 12. Memory is Getting Cheaper 12
  • 13. Simple Examples • Data sharing between frameworks • Data resilience during application crashes • Consolidate memory usage and alleviate GC issues 13
  • 14. Spark Job Spark Memory block 1 block 3 Hadoop MR Job YARN HDFS / Amazon S3 block 1 block 3 block 2 block 4 storage engine & execution engine same process Data Sharing Between Frameworks Inter-process sharing slowed down by network and/or disk I/O 14
  • 15. Data Sharing Between Frameworks Spark Job Spark Memory Hadoop MR Job YARN HDFS / Amazon S3 block 1 block 3 block 2 block 4 HDFS disk block 1 block 3 block 2 block 4 Alluxio In-Memory block 1 block 3 block 4 storage engine & execution engine same process Inter-process sharing can happen at memory speed 15
  • 16. Data Resilience during Crashes Spark Task Spark Memory block manager block 1 block 3 HDFS / Amazon S3 block 1 block 3 block 2 block 4 storage engine & execution engine same process Process crash requires network and/or disk I/O to re-read the data 16
  • 17. Data Resilience during Crashes Crash Spark Memory block manager block 1 block 3 HDFS / Amazon S3 block 1 block 3 block 2 block 4 storage engine & execution engine same process Process crash requires network and/or disk I/O to re-read the data 17
  • 18. HDFS / Amazon S3 Data Resilience during Crashes block 1 block 3 block 2 block 4 Crash storage engine & execution engine same process Process crash requires network and/or disk I/O to re-read the data 18
  • 19. Data Resilience during Crashes Spark Task Spark Memory block manager storage engine & execution engine same process HDFS disk block 1 block 3 block 2 block 4 Alluxio In-Memory block 1 block 3 block 4 Process crash only needs memory I/O to re-read the data 19
  • 20. Data Resilience during Crashes Crash storage engine & execution engine same process Process crash only needs memory I/O to re-read the data HDFS disk block 1 block 3 block 2 block 4 Alluxio In-Memory block 1 block 3 block 4 20
  • 21. HDFS / Amazon S3 Consolidating Memory Spark Job1 Spark Memory block 1 block 3 Spark Job2 Spark Memory block 3 block 1 block 1 block 3 block 2 block 4 storage engine & execution engine same process Data duplicated at memory-level 21
  • 22. Consolidating Memory Spark Job1 Spark mem Spark Job2 Spark mem HDFS / Amazon S3 block 1 block 3 block 2 block 4 storage engine & execution engine same process HDFS disk block 1 block 3 block 2 block 4 Alluxio In-Memory block 1 block 3 block 4 Data not duplicated at memory-level 22
  • 23. Case Study: Barclays Making the Impossible Possible with Tachyon: Accelerate Spark Jobs from Hours to Seconds • Application: SparkSQL + Spark RDDs • Alluxio Storage Layer: MEM • Backend Storage: None • Result: Speeding up Spark jobs from hours to seconds 23
  • 24. Common Questions – Memory speed sharing among distributed applications HDFS interface compatible – GC overhead introduced by in-memory caching Off-Heap Memory Management – Data set could be larger than available memory Tiered storage 24
  • 25. Outline • Alluxio Introduction • Tiered Storage • Unified Namespace 25
  • 26. Motivation • Memory resources are still constrained • Alluxio data management logic is not limited to memory • Storage resources available on compute clusters 26
  • 28. Tiered Storage • Extends Alluxio with support for SSDs and/or HDDs storage • Different tiers have different characteristics – Keep hot data in fast but limited storage – Keep warm data in slower but abundant storage • Workers manage their own storage • Data allocation and eviction is driven by application access 28
  • 29. Tiered Storage Architecture Machine Type 1 Compute Client Alluxio Master Memory, SSD, HDD Machine Type 2 Compute Client Alluxio Worker Memory, SSD, HDD 29
  • 30. Tiered Storage Architecture Machine Type 2 Compute Client • Alluxio Client Alluxio Worker • Tiered Block Store • Evictor • Allocator Memory, SSD, HDD 30
  • 31. Automatic Data Migration • Data can be evicted to lower layers if it is “cooling down” • Data can be promoted to upper layers if it is “warming up” Evict stale data to lower tier Promote hot data to upper tier 31
  • 32. Pluggable Policies • Policies can be customized to suit workloads • Defaults provided for general scenarios • Advanced users can optimize with additional knowledge – For example: Optimize for iterations 32
  • 33. Case Study: Baidu Baidu Queries Data 30 Times Faster with Alluxio • Application: Spark • Alluxio Storage: MEM + HDD • Backend Storage: Baidu’s File System • 200+ nodes deployment, 2PB+ managed space • Result: Speeding up data querying by 30x 33
  • 34. Outline • About Alluxio • Tiered Storage • Unified Namespace 34
  • 38. Motivation • At large organizations, data spans many storage systems (object storage, network / distributed file systems, DBs) • Application logic needs to integrate with different types of storage systems • Data needs to be moved around to work around application limitations • In-house storage layers are built to address limitations of legacy storage systems 38
  • 39. Transparent Naming • Applications can transparently and efficiently interact with remote storage through Alluxio. • Applications do not need to use different APIs for interacting with different storage systems. alluxio://host:port/ data users reports sales alice bob s3n://bucket/directory data users reports sales alice bob Alluxio Storage System 39
  • 40. Single Namespace • Applications can read and write different storage systems. • Decouples data location from application alluxio://host:port/ data users reports sales alice bob hdfs://host:port/ users alice bob s3n://bucket/directory reports sales Alluxio Storage System A Storage System B 40
  • 41. Architecture Alluxio Interface UFS Interface HDFSS3 Swift … S3 adapter Swift adapter HDFS adapter ALLUXIO 41
  • 42. Alluxio Benefits • Enable new workloads across storage systems • Work with the framework of your choice • Scale storage and compute independently
  • 43. Resources • Alluxio Project: http://www.alluxio.org • Development: https://github.com/Alluxio/alluxio • Meet Friends: http://www.meetup.com/Alluxio • Alluxio Inc: http://www.alluxio.com • Contact us: info@alluxio.com
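
The eviction and promotion behavior described on slides 30 to 32 can be illustrated with a small toy model. This is only a sketch and not Alluxio's implementation: the class `TieredBlockStore`, its methods, and the choice of a least-recently-used policy are all assumptions made for illustration (the slides say only that policies are pluggable and that defaults exist). Tier capacities are counted in blocks and are assumed to be positive.

```python
from collections import OrderedDict


class TieredBlockStore:
    """Toy model of a tiered store (hypothetical, not the Alluxio API).

    Tiers are listed fastest first, and each holds a fixed number of blocks.
    """

    def __init__(self, capacities):
        # capacities: list of (tier_name, max_blocks), fastest tier first.
        self.names = [name for name, _ in capacities]
        self.limits = [limit for _, limit in capacities]
        # One OrderedDict per tier; its order is the LRU order (oldest first).
        self.tiers = [OrderedDict() for _ in capacities]

    def _level_of(self, block_id):
        """Return the index of the tier holding the block, or None."""
        for level, tier in enumerate(self.tiers):
            if block_id in tier:
                return level
        return None

    def _place(self, level, block_id, data):
        """Store a block in a tier, pushing the LRU block down if it is full."""
        if level >= len(self.tiers):
            # Evicted past the lowest tier: the block leaves the store.
            return
        tier = self.tiers[level]
        if len(tier) >= self.limits[level]:
            victim_id, victim_data = tier.popitem(last=False)
            self._place(level + 1, victim_id, victim_data)
        tier[block_id] = data

    def write(self, block_id, data):
        """Write a block into the top tier, replacing any older copy."""
        level = self._level_of(block_id)
        if level is not None:
            del self.tiers[level][block_id]
        self._place(0, block_id, data)

    def read(self, block_id):
        """Return the block's data and promote it to the top tier."""
        level = self._level_of(block_id)
        if level is None:
            return None
        if level == 0:
            # Already in the top tier: just mark it as most recently used.
            self.tiers[0].move_to_end(block_id)
            return self.tiers[0][block_id]
        data = self.tiers[level].pop(block_id)
        self._place(0, block_id, data)
        return data

    def locate(self, block_id):
        """Return the name of the tier holding the block, or None."""
        level = self._level_of(block_id)
        return None if level is None else self.names[level]
```

With tiers `[("MEM", 2), ("SSD", 2), ("HDD", 2)]`, writing three blocks pushes the oldest one down to SSD, and reading that block afterwards moves it back to MEM while another block is evicted to make room. Swapping the victim selection in `_place` is where a different policy, such as one tuned for iterative workloads, would plug in.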

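The single namespace on slides 39 to 41 amounts to translating a path in the Alluxio namespace into a location in whichever underlying storage system is mounted there. The sketch below shows that translation with a longest-prefix lookup. It is a hypothetical model, not Alluxio's code: `MountTable`, `mount`, and `resolve` are names invented here, and the mount points are one reading of the slide 40 diagram, which is flattened in this transcript.

```python
class MountTable:
    """Toy mount table mapping namespace paths to under-storage URIs.

    Hypothetical illustration only; not the Alluxio API.
    """

    def __init__(self):
        self._mounts = {}

    def mount(self, alluxio_path, ufs_uri):
        """Attach an under-storage URI at a path in the unified namespace."""
        key = alluxio_path.rstrip("/") or "/"
        self._mounts[key] = ufs_uri.rstrip("/")

    def resolve(self, alluxio_path):
        """Translate a namespace path to the under-storage URI behind it."""
        path = alluxio_path.rstrip("/") or "/"
        # Try the longest mount point first so nested mounts win.
        for mount_point in sorted(self._mounts, key=len, reverse=True):
            if mount_point == "/":
                relative = path.lstrip("/")
            elif path == mount_point or path.startswith(mount_point + "/"):
                relative = path[len(mount_point):].lstrip("/")
            else:
                continue
            base = self._mounts[mount_point]
            return base + "/" + relative if relative else base
        raise KeyError("no mount point covers " + alluxio_path)


# Assumed layout, based on the slide 40 diagram.
table = MountTable()
table.mount("/data/users", "hdfs://host:port/users")
table.mount("/data/reports", "s3n://bucket/directory/reports")
```

An application asks only for `/data/users/alice` or `/data/reports/sales`; the lookup decides whether the request goes to the HDFS or the S3 adapter, which is the sense in which data location is decoupled from the application.
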
Editor's Notes

  1. Good afternoon everyone, and welcome to the Alluxio features talk. We will give an introduction to Alluxio and specifically go over two fundamental features in Alluxio. By the end of the talk, you should have a good idea as to why we believe Alluxio qualifies as a “Data Innovation”. First, could I get a show of hands of who’s already attended the Alluxio talk earlier today? Great, you guys will have a lot more insight if you watch the recording of that talk after this one.
  2. I want to start off by introducing us. I’m Calvin, the top contributor to the project. I’ve been working on the Alluxio project for a little over 3 years now, and I’m currently a software engineer at Alluxio, Inc. Joining me for this talk is my colleague, Jiri. He’s also a software engineer at Alluxio, Inc. and has experience working at Google as well as a PhD from CMU. Both of our Twitter accounts are here if you want to follow us for the latest news about the project.
  3. I mentioned we are both working at Alluxio Inc, which is a company dedicated to growing and building the Alluxio open source project. We were formerly known as Tachyon Nexus and are backed by A16Z. If you are interested in learning more about us, our company site is alluxio.com. And of course, if we’ve impressed you enough and you want to work with us, we are hiring!
  4. Now let’s dive into the talk. There will be three sections, the first of which is an introduction to Alluxio.
  5. Alluxio – Open Source Memory Speed Virtual Distributed Storage. That’s a lot of adjectives, is probably the first thing you thought. The second might be, Hey that sounds really familiar, isn’t that Tachyon? Much like how the company was originally Tachyon Nexus, the Tachyon project has recently become Alluxio with the 1.0 release.
  6. More importantly, you are probably wondering what all those adjectives mean. Let’s start with Open Source. This means the system’s source code is available for anyone to download, look at, or contribute to. We have a large community working together on the project and are growing at a rapid pace. Memory speed refers to the system architecture, which is designed to take advantage of the growing amounts of memory in machines. Virtual describes the abstraction Alluxio provides to storage systems and applications, essentially allowing the two layers to be separate from each other. And finally, distributed refers to the fact that Alluxio can scale to many machines as long as you have more commodity hardware to throw at it.
  7. Here is a more visual representation of where Alluxio sits and its function in the big data stack. Above Alluxio are many compute frameworks, such as MapReduce and Spark. These are connected by Alluxio to various storage systems, which do not necessarily need to be file systems. However, Alluxio is more than just a connection layer. It provides great benefits by acting as a storage system which provides a view of all your data but only holds what is necessary.
  8. The previous diagram implied that Alluxio is something new in the stack, not a replacement for anything. Why would a new layer emerge or be useful? Don’t applications just directly communicate with storage? To answer this question, we need to take a look at technology trends, in particular, memory. Memory is awesome. It’s super performant and allows workloads to run at blazing speeds. In the past decade, we’ve seen an exponential growth in RAM throughput and steadily declining costs. And it’s not just Alluxio which has realized this direction. Many compute frameworks have embraced the idea of being memory-centric to achieve impressive results.
  9. Adding to the exponential throughput improvements of memory I/O are the steadily declining costs. The two factors create the perfect situation for commodity technologies to be seriously designed with memory in mind.
  10. I’ll go through some simple examples of the high-level points I mentioned: data sharing between frameworks, data resilience during application crashes, and consolidating memory usage to alleviate GC issues.
  11. Video recommendation system similarity Top list