Alluxio Tech Talk
Mar 12, 2019
Speakers:
Bin Fan, Alluxio
Matt Fuller, Starburst
As data analytics needs have grown with the explosion of data, the speed of analytics and the interactivity of queries have become dramatically more important.
In this tech talk, we will introduce the Starburst Presto, Alluxio, and cloud object store stack for building a highly-concurrent and low-latency analytics platform. This stack provides a strong solution to run fast SQL across multiple storage systems including HDFS, S3, and others in public cloud, hybrid cloud, and multi-cloud environments.
You’ll learn about:
- The architecture of Presto, an open source distributed SQL engine, as well as innovations by Starburst such as its cost-based optimizer
- How Presto can query data from cloud object storage like S3 at high performance and cost-effectively with Alluxio
- How to achieve data locality and cross-job caching with Alluxio no matter where the data is persisted and reduce egress costs
In addition, we’ll present some real-world architectures & use cases from internet companies like JD.com and NetEase.com running the Presto and Alluxio stack at the scale of hundreds of nodes.
Interactive Analytics with the Starburst Presto + Alluxio stack for the Cloud
1. Interactive Big Data Analytics with the
Starburst + Alluxio Stack for the Cloud
Matt Fuller (matt@starburstdata.com) | Co-Founder, Starburst
Bin Fan (binfan@alluxio.com) | Founding Engineer, Alluxio
3. Motivation
Trends:
- Running Interactive SQL Queries over Big Data
- Cloud and object stores have become the scalable and
cost-effective way to serve massive amounts of data
Challenges:
- How to efficiently access data across multi-cloud / hybrid-cloud
environments
- Meeting SLAs despite slow or variable I/O performance
4. Starburst Presto + Alluxio
A truly separated compute
and storage stack enabling
interactive big data analytics:
• on any object store
• across clusters of HDFS
• across multiple different
storage systems
• fast interactive SQL analytics
6. About Me
Matt Fuller
Co-Founder at Starburst
Previously Teradata, Hadapt, Vertica
Email: matt@starburstdata.com
LinkedIn: https://www.linkedin.com/in/mfuller/
7. Company Overview
Founded 2017
• Founding team includes the largest committers to
the open source Presto project
• Former Teradata, Vertica, Hadapt,
Netezza, and Ab Initio
Enterprise Presto Offering
• AWS, Azure, On Premises
GCP & Kubernetes (coming soon)
Headquartered in Boston
• Locations in Boston, New York, and
Central Europe
Customers Globally
8. Starburst Offering
• Enterprise Presto
• Latest Cost Based Query Optimizer
• Fully Tested, Stable Releases
• Management
• Starburst Mission Control
• Presto Coordinator High Availability
• Autoscaling with Graceful Shutdown
• Presto Security Audit Logging
• Ecosystem
• Apache Ranger Integration
• Apache Sentry Integration
• Enterprise ODBC & JDBC drivers
• Support
• 24x7 Support SLA from the Presto
Experts
• Long Term Presto Version Support
• Hot fixes and Security Patches
• Access to Customer Success team of Data
Architects
• Starburst & Presto Roadmap Influence
10. What is Presto?
Community-driven
open source project
High performance ANSI SQL engine
• New Cost-Based Query Optimizer
• Proven scalability
• High concurrency
Separation of compute and
storage
• Scale storage and compute
independently
• No ETL or data integration
necessary to get to insights
• SQL-on-anything
No vendor lock-in
• No Hadoop distro vendor lock-in
• No storage engine vendor lock-in
• No cloud vendor lock-in
14. Nobody Knows Presto Like We Do
Presto commits by company, 2017-2018
Source: Github
15. Many Well Known Presto Users
See more at https://github.com/prestodb/presto/wiki/Presto-Users
16. Some key Presto contributions from our team
• Presto-Admin: for easy installation & management of Presto
• Security integrations: such as Kerberos, LDAP, and in-transit encryption
• ANSI SQL syntax: enhancements to fully support TPC-H and TPC-DS
• ODBC and JDBC drivers: to enable BI tools such as Power BI, Tableau, Qlik, etc.
• Presto connectors: SQL Server, Cassandra, and Kafka
• Spill to disk: capabilities for large intermediate data sets
• Cost-Based Query Optimizer: providing a query performance boost
• Improved query performance: such as Window Functions
17. “Syntactic Optimizer” (without Cost Based Optimizer)
[Query plan diagram: CUSTOMER and ORDERS combined by CROSS JOIN, then CROSS JOIN with LINEITEM, then FILTER; rewritten syntactically into JOIN ON CUSTKEY over CUSTOMER and ORDERS, followed by JOIN ON ORDERKEY with LINEITEM]
18. Cost Based Optimizer
[Query plan diagram: table cardinalities are LINEITEM 61M rows, ORDERS 15M rows, CUSTOMER 1.3M rows; the FILTER on LINEITEM keeps only 3K rows. Without statistics, the joins carry tens of millions of rows. The cost-based plan pushes the filter down, joins the 3K filtered LINEITEM rows with ORDERS on ORDERKEY, then with CUSTOMER on CUSTKEY, keeping intermediate results around 3K rows.]
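The join-order intuition in the optimizer slides above can be sketched numerically. A toy Python estimate (not Presto's actual cost model) using the cardinalities from the diagram:

```python
# Cardinalities taken from the slide: LINEITEM 61M rows, ORDERS 15M,
# CUSTOMER 1.3M, and 3K LINEITEM rows surviving the filter.
LINEITEM, ORDERS, CUSTOMER = 61_000_000, 15_000_000, 1_300_000
FILTERED = 3_000  # LINEITEM rows that pass the filter

def plan_cost(intermediates):
    """Stand-in cost: rows flowing through the largest intermediate result."""
    return max(intermediates)

# Filter applied last: both joins carry tens of millions of rows.
filter_last = plan_cost([ORDERS, LINEITEM])

# Filter pushed down first (the cost-based plan): each join carries ~3K rows.
filter_first = plan_cost([FILTERED, FILTERED])

print(f"filter last:  largest intermediate is about {filter_last:,} rows")
print(f"filter first: largest intermediate is about {filter_first:,} rows")
```

The gap (tens of millions of rows versus a few thousand) is why cardinality statistics, not query syntax, should drive the join order.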
22. About Me
• Bin Fan
• PhD CS@CMU
• Founding Engineer@Alluxio
Email: binfan@alluxio.com
Github: @apc999
Twitter: @binfan
23. Company Overview
• Founded Feb. 2015 by Haoyuan Li
• PhD research at UC Berkeley AMPLab
• Initially Tachyon Nexus
• Venture backed: Andreessen Horowitz etc.
• Open Source
• Tachyon Open Sourced in Dec. 2012
• Open source v2.0-preview Mar. 2019
• 900+ GitHub contributors, 4,000 GitHub stars
• Office in San Mateo, CA
• Team: Google, Palantir, VMware, AMD, Cisco…
24. Data Ecosystem with Alluxio
• Data Locality: move data to
where it is needed
• Data Abstraction: API
translation to different file
systems and object stores
• Data Accessibility: Unified
namespace across different
storage systems
Alluxio: a Virtual Distributed File System
[Architecture diagram: application-facing interfaces are the Java File API, HDFS interface, S3 interface, REST API, and POSIX interface; under-storage connectors include HDFS, S3, Swift, and NFS]
32. A Common File System Abstraction
• Common interface across apps
• HDFS-compatible interface:
change hdfs://foo/bar to
alluxio://foo/bar
• Other interfaces: Native Alluxio Java
FS, POSIX and S3.
• Cloud storage becomes “hidden”
to apps
• Less vendor lock-in!
[Diagram: a compute zone (standalone or managed with Mesos or Yarn) running Tensorflow, Presto, and MR against the HDFS and POSIX APIs; storage sits in a different availability zone, either on-prem or cloud]
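The "change hdfs://foo/bar to alluxio://foo/bar" point above is usually just a URI-scheme rewrite. A minimal sketch (the master hostname and port 19998 here are illustrative defaults, not from the talk):

```python
from urllib.parse import urlparse, urlunparse

def to_alluxio_uri(uri: str, alluxio_authority: str = "master:19998") -> str:
    """Rewrite an hdfs:// (or other) URI to route through an Alluxio master."""
    parts = urlparse(uri)
    if parts.scheme == "alluxio":
        return uri  # already routed through Alluxio
    # Keep the path, swap scheme and authority for the Alluxio master.
    return urlunparse(("alluxio", alluxio_authority, parts.path, "", "", ""))

print(to_alluxio_uri("hdfs://namenode:9000/foo/bar"))
# alluxio://master:19998/foo/bar
```

Because only the URI changes, the application keeps using the same HDFS-compatible client API, which is what makes the cloud store "hidden" to apps.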
33. Data Path: Improved I/O Performance
• A New Tier Above Cloud Storage for Compute
• Distributed buffer cache
• Restore locality to compute
• Read:
• Cache-hit read: served by Alluxio workers (local worker preferred)
• Cache-miss read: served by cloud storage, then cached in an Alluxio worker
• Write:
• Burst buffer, then asynchronously propagated to S3 (Alluxio 2.0)
• Challenges:
• Locality: expose location information to applications; serve local apps
through ramdisk (rather than network)
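The cache-hit / cache-miss behavior above can be sketched as follows. This is a toy model, not Alluxio's implementation:

```python
class Worker:
    def __init__(self):
        self.cache = {}  # block_id -> bytes, standing in for RAM/SSD tiers

class CloudStore:
    def __init__(self, blocks):
        self.blocks = blocks
        self.reads = 0
    def get(self, block_id):
        self.reads += 1  # each call models a slow, possibly metered GET to S3
        return self.blocks[block_id]

def read_block(block_id, worker, ufs):
    if block_id in worker.cache:   # cache hit: served by the Alluxio worker
        return worker.cache[block_id]
    data = ufs.get(block_id)       # cache miss: served by cloud storage...
    worker.cache[block_id] = data  # ...then cached for subsequent readers
    return data

ufs = CloudStore({"b1": b"rows"})
w = Worker()
read_block("b1", w, ufs)   # miss: one UFS read
read_block("b1", w, ufs)   # hit: no additional UFS read
print(ufs.reads)  # 1
```

The second read never touches the cloud store, which is also how cross-job caching reduces egress costs: the first job pays the transfer once and later jobs read from the worker.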
34. Metadata Path: Familiar Semantics
• Listing / renaming on an object store can be expensive
• Common operations for batch or SQL analytics
• Overwriting Put is eventually consistent
• Alluxio loads and manages metadata in master
• Apps can continue assuming HDFS-like semantics and performance
characteristics
• Challenges
• Data modification bypassing Alluxio: when and how to re-sync
• Slow lists in object store: batch operations
• Too many objects: off-heap metadata (Alluxio 2.0)
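The master-side metadata management above can be sketched as a listing cache with a staleness-based re-sync, a toy model (not Alluxio's code; the 30-second interval is an illustrative value):

```python
import time

class MetadataCache:
    def __init__(self, ufs_list, sync_interval_s=30.0):
        self.ufs_list = ufs_list        # slow listing call against the object store
        self.sync_interval_s = sync_interval_s
        self.entries = {}               # path -> (listing, fetched_at)

    def list_status(self, path, now=None):
        now = time.monotonic() if now is None else now
        cached = self.entries.get(path)
        if cached and now - cached[1] < self.sync_interval_s:
            return cached[0]            # served from master memory
        listing = self.ufs_list(path)   # re-sync from the under store
        self.entries[path] = (listing, now)
        return listing

calls = []
def slow_s3_list(path):
    calls.append(path)
    return ["part-0", "part-1"]

cache = MetadataCache(slow_s3_list)
cache.list_status("/warehouse", now=0.0)    # miss: hits the object store
cache.list_status("/warehouse", now=10.0)   # fresh: served from cache
cache.list_status("/warehouse", now=100.0)  # stale: re-syncs
print(len(calls))  # 2
```

The "when and how to re-sync" challenge in the slide is exactly the choice of this staleness policy when data is modified behind Alluxio's back.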
35. Performance Tuning Tips: Presto + Alluxio
• Data Locality
• Enable Locality Aware Scheduling
• Hostname matching
• Higher Parallelism
• Tune worker threads
• Tune number of splits in a batch
• Tune Alluxio client timeout
• Increase Netty timeout for Alluxio 1.8
https://www.alluxio.com/blog/top-5-performance-tuning-tips-for-running-presto-on-alluxio-1
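The knobs above map to a handful of configuration properties. A hedged sketch, using documented Presto and Alluxio 1.8 property names; the values shown are illustrative starting points, not recommendations from the talk:

```properties
# Presto catalog properties (e.g. etc/catalog/hive.properties):
# schedule splits on the node that already holds the cached data
hive.force-local-scheduling=true

# Presto etc/config.properties: raise per-worker parallelism
task.max-worker-threads=32
node-scheduler.max-splits-per-node=200

# alluxio-site.properties (Alluxio 1.8): raise the Netty data-transfer timeout
alluxio.user.network.netty.timeout=10min
```

Hostname matching means Presto workers and Alluxio workers should report identical hostnames, so the locality check can pair a split with the worker caching its data.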
36. Case Study:
- Leading Online Retailer (NASDAQ: JD)
- Building Ad-hoc SQL Query Engine
- Pain Point:
- Presto workers may read remotely from HDFS datanodes
- Large query variance
https://www.slideshare.net/Alluxio/alluxio-in-jd
40. Case Study:
- Leading Online Gaming Service Company (NASDAQ: NTES)
- Partners with Blizzard to operate “WoW” and “Hearthstone” services
- Upcoming: “Diablo Immortal”
- Building Ad-hoc SQL Query Engine
- Large data volume: ~30 TB raw data daily
- A separate satellite compute cluster
- Pain Point:
- Requirement in response time: < 15s
- Large startup latency when submitting SQL jobs as YARN apps
https://www.alluxio.com/blog/presto-on-alluxio-how-netease-games-leveraged-alluxio-to-boost-ad-hoc-sql-on-hdfs
42. Result: Smoother Response During Peak Time
[Chart: response time (ms) during peak time, Presto w/ Alluxio vs. Presto w/o Alluxio]
43. Conclusion
- Presto + Alluxio as the Stack
- Truly separated compute and storage
- Improve data and metadata performance on cloud storage
- Alluxio Architecture and Data Flow
- Master, Worker, Under Storage
- Cache-{hit, miss} reads, Sync/Async writes
- Use Cases on Presto + Alluxio
45. Metadata Path: Efficient Renames
• Renaming files on S3 can be expensive
• A common operation for MapReduce in the commit phase:
• Write results to tmp paths
• Rename tmp files to final paths (another copy, slow)
• Rename with Alluxio async writes
• t0: writes to tmp paths in Alluxio: near-compute, fast writes
• t1: rename tmp paths to final path in Alluxio: cheap renames
• t2: persist files in final paths in Alluxio to S3: 2PC to avoid partial data
• Speculative execution allowed
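The t0/t1/t2 flow above can be sketched as follows. This is a toy model of the idea, not Alluxio's implementation; the two-phase commit and speculative-execution handling are elided:

```python
class AsyncFS:
    def __init__(self, s3):
        self.files = {}   # Alluxio namespace: path -> data
        self.s3 = s3      # under store: touched only at step t2

    def write(self, path, data):     # t0: fast, near-compute write
        self.files[path] = data

    def rename(self, src, dst):      # t1: metadata-only, no data copy
        self.files[dst] = self.files.pop(src)

    def persist(self, path):         # t2: single upload of the final path
        self.s3[path] = self.files[path]

s3 = {}
fs = AsyncFS(s3)
fs.write("/tmp/job/part-0", b"rows")          # t0: lands in Alluxio only
fs.rename("/tmp/job/part-0", "/out/part-0")   # t1: cheap rename in Alluxio
fs.persist("/out/part-0")                     # t2: one S3 write, final path only
print(sorted(s3))  # ['/out/part-0']
```

Note that S3 never sees the tmp path at all: the expensive S3 "rename" (a copy plus a delete) is replaced by a metadata operation in Alluxio.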
46. Data Transformation
• Pressure in all industries to be
“data driven”
• The majority of companies are still figuring out
the transformation
• Increased collection of numerous,
low-value data
• Challenge of overcoming data silos to
convert data into business value
• Limited success of Data Warehouses,
Marts, and Lakes – the cost of
copying/moving data is substantial
• A single data plane for business
value
47. Migration to Cloud
• Decoupling of compute and
storage
• Enterprises move from turnkey
solutions to self-managed data
platforms on IaaS
• Lacking agility at the data storage
level
• Requires a storage abstraction
48. Data Path: Async Persist to S3 (Alluxio 2.0)
[Diagram: an application with an Alluxio client writes to an Alluxio worker backed by RAM / SSD / HDD; the Alluxio master coordinates asynchronous persistence from the worker to the under store]
• Async Writes
• Step 1: App writes to Alluxio
• Step 2: Alluxio writes to UFS
• Benefits
• Apps write at Alluxio speed
• Data gets persisted
• Challenges
• File rename/delete before
persist: 2PC
• Fault-tolerance: journal async
requests
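The fault-tolerance challenge above (journaling async requests) can be sketched as a write-ahead journal of pending persists, a toy model rather than Alluxio's actual journal:

```python
journal = []   # stands in for the master's write-ahead journal
ufs = {}       # under store (e.g. S3)
memory = {}    # Alluxio worker storage

def async_write(path, data):
    memory[path] = data                 # step 1: app writes at Alluxio speed
    journal.append(("persist", path))   # record the pending async persist

def flush_pending():
    # Step 2 in the normal case, or a replay after a master restart: the
    # journal tells us which files still need to reach the under store.
    for op, path in journal:
        if op == "persist" and path in memory:
            ufs[path] = memory[path]

async_write("/out/a", b"1")
# ...a crash-and-restart here loses nothing: the persist request is journaled.
flush_pending()
print(sorted(ufs))  # ['/out/a']
```

Without the journal entry, a write acknowledged "at Alluxio speed" could be lost if the master died before step 2 ran.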
49. Alluxio
• Our implementation of the data access layer – a virtual
distributed file system
• Open source project with over 900 contributors from 100s of
organizations worldwide
• Deployed in many top internet and financial companies
50. The Data Access Layer
• Abstraction layer between applications and storage systems
• Present a stable storage interface to applications, including
semantics, security, and performance
• Eliminate the weaknesses of data silos rather than the silos
themselves
• Enable transparent migration of underlying storage systems
• Enable application API to storage API translation in a single
layer