Interactive Big Data Analytics with the
Starburst + Alluxio Stack for the Cloud
Matt Fuller (matt@starburstdata.com) | Co-Founder, Starburst
Bin Fan (binfan@alluxio.com) | Founding Engineer, Alluxio
Agenda
1. Why Presto+Alluxio Stack
2. Presto Overview
3. Alluxio Overview
4. Joint Use Cases
5. Best Practices
Motivation
Trends:
- Running interactive SQL queries over big data
- Cloud and object stores are becoming the scalable and
cost-effective way to serve massive amounts of data
Challenges:
- How to efficiently access data across multi-cloud / hybrid
cloud deployments
- Meeting SLAs despite slow or variable I/O performance
Starburst Presto + Alluxio
A truly separated compute
and storage stack enabling
interactive big data analytics:
• on any object store
• across clusters of HDFS
• across multiple different
storage systems
• fast interactive SQL analytics
Download Starburst | www.starburstdata.com/presto-enterprise
Starburst Overview
About Me
Matt Fuller
Co-Founder at Starburst
Previously Teradata, Hadapt, Vertica
Email
matt@starburstdata.com
LinkedIn
https://www.linkedin.com/in/mfuller/
Company Overview
Founded 2017
• Founding team of largest committers to
open source project Presto
• Former Teradata, Vertica, Hadapt,
Netezza, and Ab Initio
Enterprise Presto Offering
• AWS, Azure, On Premises
GCP & Kubernetes (coming soon)
Headquartered Boston
• Locations in Boston, New York, and
Central Europe
Customers Globally
Starburst Offering
• Enterprise Presto
• Latest Cost Based Query Optimizer
• Fully Tested, Stable Releases
• Management
• Starburst Mission Control
• Presto Coordinator High Availability
• Autoscaling with Graceful Shutdown
• Presto Security Audit Logging
• Ecosystem
• Apache Ranger Integration
• Apache Sentry Integration
• Enterprise ODBC & JDBC drivers
• Support
• 24x7 Support SLA from the Presto
Experts
• Long Term Presto Version Support
• Hot fixes and Security Patches
• Access to Customer Success team of Data
Architects
• Starburst & Presto Roadmap Influence
Starburst: SQL on Anything
Query anything, anywhere
What is Presto?
Community-driven
open source project
High performance ANSI SQL engine
• New Cost-Based Query Optimizer
• Proven scalability
• High concurrency
Separation of compute and
storage
• Scale storage and compute
independently
• No ETL or data integration
necessary to get to insights
• SQL-on-anything
No vendor lock-in
• No Hadoop distro vendor lock-in
• No storage engine vendor lock-in
• No cloud vendor lock-in
Nobody Knows Presto Like We Do
Presto commits by company, 2017-2018
Source: Github
Many Well Known Presto Users
See more at https://github.com/prestodb/presto/wiki/Presto-Users
Some key Presto contributions from our team
• Presto-Admin: for easy installation & management of Presto
• Security integrations: such as Kerberos, LDAP, and in-transit encryption
• ANSI SQL syntax: enhancements to fully support TPC-H and TPC-DS
• ODBC and JDBC drivers: to enable BI tools such as Power BI, Tableau, Qlik, etc.
• Presto connectors: SQL Server, Cassandra, and Kafka
• Spill to disk: capabilities for large intermediate data sets
• Cost-Based Query Optimizer: providing performance boost
• Query performance: improved performance, such as window functions
“Syntactic Optimizer” (without Cost Based Optimizer)
[Figure: query plans over CUSTOMER, ORDERS, and LINEITEM. One plan applies CROSS JOINs followed by a FILTER; the other joins CUSTOMER and ORDERS ON CUSTKEY, then LINEITEM ON ORDERKEY.]
Cost Based Optimizer
[Figure: alternative join orders for ORDERS, CUSTOMER, and LINEITEM (JOIN ON CUSTKEY, JOIN ON ORDERKEY, with a FILTER), annotated with row counts of 61M, 15M, 1.3M, and 3K rows.]
Starburst Presto Architecture
[Figure: a COORDINATOR (Parser, Optimizer, Scheduler) and WORKERs, each running Processors, in front of DATA SOURCES. Data source labels: Azure, SQL Database, ADLS Gen 1 & 2, Blob Storage, S3.]
Query Execution Model
[Figure: STAGE 0 and STAGE 1, composed of TASKS, which are composed of OPERATORs.]
Alluxio Overview
Download Alluxio | www.alluxio.org/download
Questions? | www.alluxio.org/slack
About Me
• Bin Fan
• PhD CS@CMU
• Founding Engineer@Alluxio
Email: binfan@alluxio.com
Github: @apc999
Twitter: @binfan
Company Overview
• Founded Feb. 2015 by Haoyuan Li
• PhD research at UC Berkeley AMPLab
• Initially Tachyon Nexus
• Venture backed: Andreessen Horowitz etc.
• Open Source
• Tachyon open sourced in Dec. 2012
• Open source v2.0-preview Mar. 2019
• 900+ GitHub contributors, 4000 GitHub stars
• Office in San Mateo, CA
• Team: Google, Palantir, VMware, AMD, Cisco…
Data Ecosystem with Alluxio
• Data Locality: move data to
where it is needed
• Data Abstraction: API
translation to different file
systems and object stores
• Data Accessibility: Unified
namespace across different
storage systems
Alluxio: a Virtual Distributed File System
[Figure: application-facing interfaces (Java File API, HDFS Interface, S3 Interface, REST API, POSIX Interface) on top of Alluxio, with storage connectors below (HDFS Connector, S3 Connector, Swift Connector, NFS Connector).]
Production Deployments
[Figure: logos of production users, "AND MORE!" (slide dated 11/16/18)]
Alluxio Architecture
[Figure: an Alluxio Master and a Standby Master with Zookeeper; Alluxio Workers, each with RAM / SSD / HDD storage; and the Under Store. Arrows distinguish the Control Path from the Data Path.]
Read Data not Cached in Alluxio + Caching
[Figure: Application with Alluxio Client, Alluxio Worker (RAM / SSD / HDD), and Under Store, with the read steps numbered 1 to 4.]
Read Cached Data in Alluxio
[Figure: Application with Alluxio Client and Alluxio Worker (RAM / SSD / HDD), with the read steps numbered 1 to 3.]
Write data only to Alluxio
[Figure: Application with Alluxio Client and Alluxio Worker (RAM / SSD / HDD), with the write steps numbered 1 to 3.]
Write to Alluxio and Under Store Synchronously
[Figure: Application with Alluxio Client, Alluxio Worker (RAM / SSD / HDD), and Under Store, with the write steps numbered 1 to 3.]
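The four read and write paths above can be sketched with a toy model. This is a minimal illustration, not Alluxio's implementation: the class names `UnderStore` and `CachingWorker`, the `through` flag, and the use of dictionaries as storage are all assumptions made for the example.

```python
class UnderStore:
    """Stand-in for a slow backing store such as S3 or HDFS (hypothetical)."""

    def __init__(self):
        self.objects = {}
        self.reads = 0  # counts how often the backing store is actually hit

    def get(self, path):
        self.reads += 1
        return self.objects[path]

    def put(self, path, data):
        self.objects[path] = data


class CachingWorker:
    """Toy worker that keeps data in local storage (a dict here)."""

    def __init__(self, under_store):
        self.under_store = under_store
        self.cache = {}

    def read(self, path):
        # Cache hit: serve from local storage without touching the under store.
        if path in self.cache:
            return self.cache[path]
        # Cache miss: fetch from the under store, cache it, then return it.
        data = self.under_store.get(path)
        self.cache[path] = data
        return data

    def write(self, path, data, through=False):
        # Every write lands in local storage first.
        self.cache[path] = data
        # A synchronous write also persists to the under store before returning.
        if through:
            self.under_store.put(path, data)
```

In this sketch a second read of the same path never reaches the under store, and a write with `through=False` stays in the worker only, which mirrors the "Write data only to Alluxio" case.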
Alluxio, Presto, the Cloud
A Common File System Abstraction
• Common interface across apps
• HDFS-compatible interface:
change hdfs://foo/bar to
alluxio://foo/bar
• Other interfaces: Native Alluxio Java
FS, POSIX and S3.
• Cloud storage becomes “hidden”
to apps
• Less vendor lock-in!
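The scheme change mentioned above (hdfs://foo/bar to alluxio://foo/bar) can be sketched as a small helper. The function name `to_alluxio_uri` is hypothetical, and the sketch only swaps the scheme; in a real deployment the authority part would typically also need to point at the Alluxio master, which is not handled here.

```python
from urllib.parse import urlsplit, urlunsplit


def to_alluxio_uri(uri):
    """Return the URI with an hdfs scheme replaced by alluxio; others unchanged."""
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        return uri
    # Only the scheme changes; authority and path are kept as-is.
    return urlunsplit(parts._replace(scheme="alluxio"))


print(to_alluxio_uri("hdfs://foo/bar"))
```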
[Figure: a Compute Zone (standalone or managed with Mesos or YARN) running TensorFlow, Presto, and MR through the HDFS API and POSIX API, with Storage in a Different Availability Zone (either on-prem or cloud).]
Data Path: Improved I/O Performance
• A New Tier Above Cloud Storage for Compute
• Distributed buffer cache
• Restore locality to compute
• Read:
• Cache-hit read: served by Alluxio workers (local worker preferred)
• Cache-miss read: served by cloud storage, then cached on an Alluxio worker
• Write:
• Burst buffer, then async propagate to S3 (Alluxio 2.0)
• Challenges:
• Locality: expose location information to applications; serve local apps
through ramdisk (rather than network)
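The "local worker preferred" rule for cache-hit reads can be sketched as follows. This is an illustrative assumption about how such a choice could be made, not Alluxio's actual policy: the function `pick_worker` and the plain hostname comparison are hypothetical.

```python
def pick_worker(client_host, workers_with_block):
    """Prefer a worker on the client's own host; otherwise take any holder."""
    if not workers_with_block:
        # No worker holds the block: the caller would fall back to a
        # cache-miss read from cloud storage.
        return None
    for host in workers_with_block:
        # Exact string comparison: "node1" and "node1.example.com" would not
        # match, so both sides must report hostnames in the same form.
        if host == client_host:
            return host
    return workers_with_block[0]
```

The exact-match comparison is the reason the hostnames reported by the compute engine and by the cache workers need to agree for locality to take effect.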
Metadata Path: Familiar Semantics
• Listing / renaming on object store can be expensive
• Common operations for batch or SQL analytics
• Overwriting Put is eventually consistent
• Alluxio loads and manages metadata in master
• Apps can continue assuming HDFS-like semantics and performance
implications
• Challenges
• Data modification bypassing Alluxio: when and how to re-sync
• Slow lists in object store: batch operations
• Too many objects: off-heap metadata (Alluxio 2.0)
Performance Tuning Tips: Presto + Alluxio
• Data Locality
• Enable Locality Aware Scheduling
• Hostname matching
• Higher Parallelism
• Tune worker threads
• Tune number of splits in a batch
• Tune Alluxio client timeout
• Increase Netty timeout for Alluxio 1.8
https://www.alluxio.com/blog/top-5-performance-tuning-tips-for-running-presto-on-alluxio-1
Case Study:
- Leading Online Retailer (NASDAQ: JD)
- Building Ad-hoc SQL Query Engine
- Pain Point:
- Presto workers may read remotely from HDFS datanodes
- Large query variance
https://www.slideshare.net/Alluxio/alluxio-in-jd
Solution: Colocate Alluxio with Presto
[Figures: query time charts]
Case Study:
- Leading Online Gaming Service Company (NASDAQ: NTES)
- Partners with Blizzard to operate “WoW” and “Hearthstone”
- “Diablo Immortal” coming
- Building Ad-hoc SQL Query Engine
- Large data volume: ~30 TB raw data daily
- A separate satellite compute cluster
- Pain Point:
- Response time requirement: < 15s
- Large startup latency when submitting SQL jobs as YARN apps
https://www.alluxio.com/blog/presto-on-alluxio-how-netease-games-leveraged-alluxio-to-boost-ad-hoc-sql-on-hdfs
Solution: Presto + Alluxio
Result: Smoother Response During Peak Time
[Figure: response time (ms), Presto w/ Alluxio vs. Presto w/o Alluxio]
Conclusion
- Presto + Alluxio as the Stack
- Truly separated compute and storage
- Improve data and metadata performance on cloud storage
- Alluxio Architecture and Data Flow
- Master, Worker, Under Storage
- Cache-{hit, miss} reads, Sync/Async writes
- Use Cases on Presto + Alluxio
zhuanlan.zhihu.com/alluxio
www.alluxio.com
info@alluxio.com
twitter.com/alluxio
linkedIn.com/alluxio
Thank you
binfan@alluxio.com
Metadata Path: Efficient Renames
• Renaming files on S3 can be expensive
• A common operation in the MR commit phase
• Write results to tmp paths
• Rename tmp files to final paths (another copy, slow)
• Rename with Alluxio async writes
• t0: writes to tmp paths in Alluxio: near-compute, fast writes
• t1: rename tmp paths to final path in Alluxio: cheap renames
• t2: persist files in final paths in Alluxio to S3: 2PC to avoid partial data
• Speculative execution allowed
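The t0/t1/t2 sequence above can be contrasted with a direct rename on an object store in a toy model. Everything here is a simplified assumption for illustration: `ObjectStore` models rename as copy plus delete, `CacheLayer` is a hypothetical stand-in for the caching tier, and the two-phase commit mentioned at t2 is not modeled.

```python
class ObjectStore:
    """Toy object store where rename is a copy followed by a delete."""

    def __init__(self):
        self.objects = {}
        self.bytes_copied = 0

    def put(self, key, data):
        self.objects[key] = data

    def rename(self, src, dst):
        data = self.objects.pop(src)
        self.bytes_copied += len(data)  # the whole object is rewritten
        self.objects[dst] = data


class CacheLayer:
    """Toy cache in front of the object store with metadata-only renames."""

    def __init__(self, store):
        self.store = store
        self.files = {}

    def write(self, path, data):
        # t0: the write lands near compute; nothing reaches the object store.
        self.files[path] = data

    def rename(self, src, dst):
        # t1: only the path mapping changes; no data is copied.
        self.files[dst] = self.files.pop(src)

    def persist(self, path):
        # t2: upload the file once, directly under its final name.
        self.store.put(path, self.files[path])
```

Renaming directly on the toy object store copies every byte of the file, while the cached path uploads the data a single time under the final name.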
Data Transformation
• Pressure in all industries to be
“data driven”
• Majority of companies still figuring out
the transformation
• Increased collection of numerous,
low-value data
• Challenge of overcoming data silos to
convert data into business value
• Limited success of Data Warehouse,
Mart, and Lakes – cost of
copying/moving data is substantial
• Single Data Plane for Business
value
Migration to Cloud
• Decoupling of compute and
storage
• Enterprises move from turnkey
solutions to self-managed data
platforms on IaaS
• Lacking agility at Data Storage
level
• Requires Storage Abstraction
Data Path: Async Persist to S3 (Alluxio 2.0)
[Figure: Application with Alluxio Client, Alluxio Master, Alluxio Worker (RAM / SSD / HDD), and Under Store.]
• Async Writes
• Step1: App writes to Alluxio
• Step2: Alluxio writes to UFS
• Benefits
• Apps write at Alluxio speed
• Data gets persisted
• Challenges
• File rename/delete before
persist: 2PC
• Fault-tolerance: journal async
requests
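The two async-write steps and the journaling of pending requests can be sketched as below. This is a toy model under stated assumptions: `AsyncPersister` is a hypothetical class, the journal is an in-memory queue rather than a durable log, persistence is triggered by an explicit call instead of a background thread, and the two-phase commit for renames is not modeled.

```python
from collections import deque


class AsyncPersister:
    """Toy async-write path: acknowledge writes immediately, persist later."""

    def __init__(self):
        self.cache = {}         # fast storage close to the application
        self.under_store = {}   # slow durable storage (UFS stand-in)
        self.journal = deque()  # pending persist requests, in arrival order

    def write(self, path, data):
        # Step 1: the application's write completes at cache speed.
        self.cache[path] = data
        # Record the persist request so it is not forgotten.
        self.journal.append(path)

    def delete(self, path):
        self.cache.pop(path, None)

    def persist_pending(self):
        # Step 2: drain the journal and copy each file to the under store.
        persisted = []
        while self.journal:
            path = self.journal.popleft()
            if path in self.cache:  # skip files deleted before persist
                self.under_store[path] = self.cache[path]
                persisted.append(path)
        return persisted
```

The check inside `persist_pending` shows, in the simplest possible form, why deletes and renames that happen before the persist step need special handling.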
Alluxio
• Our implementation of the data access layer – a virtual
distributed file system
• Open source project with over 900 contributors from 100s of
organizations worldwide
• Deployed in many top internet and financial companies
The Data Access Layer
• Abstraction layer between applications and storage systems
• Present a stable storage interface to applications, including
semantics, security, and performance
• Eliminate the weaknesses of data silos rather than the silos
themselves
• Enable transparent migration of underlying storage systems
• Enable application API to storage API translation in a single
layer

AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAlluxio, Inc.
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAlluxio, Inc.
 
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio, Inc.
 

Interactive Analytics with the Starburst Presto + Alluxio stack for the Cloud

  • 1. Interactive Big Data Analytics with the Starburst + Alluxio Stack for the Cloud Matt Fuller (matt@starburstdata.com) | Co-Founder, Starburst Bin Fan (binfan@alluxio.com) | Founding Engineer, Alluxio
  • 2. Agenda 1. Why Presto+Alluxio Stack 2. Presto Overview 3. Alluxio Overview 4. Joint Use Cases 5. Best Practices
  • 3. Motivation Trends: - Running interactive SQL queries over big data - Cloud and object stores are becoming the scalable and cost-effective way to serve massive amounts of data Challenges: - How to efficiently access data across multi-cloud / hybrid-cloud environments - Meeting SLAs despite slow or variable I/O performance
  • 4. Starburst Presto + Alluxio A truly separated compute and storage stack enabling interactive big data analytics: • on any object store • across clusters of HDFS • across multiple different storage systems • fast interactive SQL analytics
  • 5. Download Starburst | www.starburstdata.com/presto-enterprise Starburst Overview
  • 6. About Me Matt Fuller Co-Founder at Starburst Previously Teradata, Hadapt, Vertica Email matt@starburstdata.com LinkedIn https://www.linkedin.com/in/mfuller/
  • 7. Company Overview Founded 2017 • Founding team of the largest committers to the open source project Presto • Former Teradata, Vertica, Hadapt, Netezza, and Ab Initio Enterprise Presto Offering • AWS, Azure, On Premises; GCP & Kubernetes (coming soon) Headquartered in Boston • Locations in Boston, New York, and Central Europe Customers Globally
  • 8. Starburst Offering • Enterprise Presto • Latest Cost Based Query Optimizer • Fully Tested, Stable Releases • Management • Starburst Mission Control • Presto Coordinator High Availability • Autoscaling with Graceful Shutdown • Presto Security Audit Logging • Ecosystem • Apache Ranger Integration • Apache Sentry Integration • Enterprise ODBC & JDBC drivers • Support • 24x7 Support SLA from the Presto Experts • Long Term Presto Version Support • Hot fixes and Security Patches • Access to Customer Success team of Data Architects • Starburst & Presto Roadmap Influence
  • 9. Starburst: SQL on Anything Query anything, anywhere
  • 10. What is Presto? Community-driven open source project High performance ANSI SQL engine • New Cost-Based Query Optimizer • Proven scalability • High concurrency Separation of compute and storage • Scale storage and compute independently • No ETL or data integration necessary to get to insights • SQL-on-anything No vendor lock-in • No Hadoop distro vendor lock-in • No storage engine vendor lock-in • No cloud vendor lock-in
  • 14. Nobody Knows Presto Like We Do Presto commits by company, 2017-2018 Source: Github
  • 15. Many Well Known Presto Users See more at https://github.com/prestodb/presto/wiki/Presto-Users
  • 16. Some key Presto contributions from our team Presto-Admin For easy installation & management of Presto Security Integrations Such as Kerberos, LDAP, and in-transit encryption ANSI SQL syntax Enhancements to fully support TPC-H and TPC-DS ODBC and JDBC drivers To enable BI tools such as Power BI, Tableau, Qlik, etc. Presto Connectors SQL Server, Cassandra, and Kafka Spill to disk Capabilities for large intermediate data sets Query Performance Cost-Based Query Optimizer Providing performance boost Improved performance such as Window Functions
  • 17. “Syntactic Optimizer” (without Cost Based Optimizer) [Diagram: CUSTOMER CROSS JOIN ORDERS CROSS JOIN LINEITEM followed by FILTER ..., shown alongside CUSTOMER JOIN ORDERS ON CUSTKEY, then JOIN LINEITEM ON ORDERKEY]
  • 18. Cost Based Optimizer [Diagram: alternative join orders over CUSTOMER, ORDERS, and LINEITEM (JOIN ON CUSTKEY, JOIN ON ORDERKEY, FILTER on LINEITEM), annotated with row counts of 61M, 15M, 1.3M, and 3K rows]
  • 19. Starburst Presto Architecture [Diagram: COORDINATOR (Parser, Optimizer, Scheduler), WORKERs, and Processors; DATA SOURCES: Azure SQL Database, ADLS Gen 1 & 2, Blob Storage, S3]
  • 20. Query Execution Model [Diagram: STAGE 0, STAGE 1, TASKS, OPERATOR]
  • 21. Alluxio Overview Download Alluxio | www.alluxio.org/download Questions? | www.alluxio.org/slack
  • 22. About Me • Bin Fan • PhD CS@CMU • Founding Engineer@Alluxio Email: binfan@alluxio.com GitHub: @apc999 Twitter: @binfan
  • 23. Company Overview • Founded Feb. 2015 – Haoyuan Li • PhD research at UC Berkeley AMPLab • Initially Tachyon Nexus • Venture backed: Andreessen Horowitz etc. • Open Source • Tachyon Open Sourced in Dec. 2012 • Open source v2.0-preview Mar. 2019 • 900+ GitHub contributors, 4000 GitHub stars • Office in San Mateo, CA • Team: Google, Palantir, VMware, AMD, Cisco…
  • 24. Data Ecosystem with Alluxio Alluxio: a Virtual Distributed File System • Data Locality: move data to where it is needed • Data Abstraction: API translation to different file systems and object stores • Data Accessibility: Unified namespace across different storage systems [Diagram labels: Java File API, HDFS Interface, S3 Interface, REST API, POSIX Interface; HDFS Connector, S3 Connector, Swift Connector, NFS Connector]
  • 26. Alluxio Architecture [Diagram: Alluxio Master, Standby Master, Zookeeper; Alluxio Workers with RAM / SSD / HDD; Under Store; Control Path and Data Path]
  • 27. Read Data not Cached in Alluxio + Caching [Diagram: Application, Alluxio Client, Alluxio Worker (RAM / SSD / HDD), Under Store; steps 1 to 4]
  • 28. Read Cached Data in Alluxio [Diagram: Application, Alluxio Client, Alluxio Worker (RAM / SSD / HDD); steps 1 to 3]
  • 29. Write data only to Alluxio [Diagram: Application, Alluxio Client, Alluxio Worker (RAM / SSD / HDD); steps 1 to 3]
  • 30. Write to Alluxio and Under Store Synchronously [Diagram: Application, Alluxio Client, Alluxio Worker (RAM / SSD / HDD), Under Store; steps 1 to 3]
  • 32. A Common File System Abstraction • Common interface across apps • HDFS-compatible interface: change hdfs://foo/bar to alluxio://foo/bar • Other interfaces: native Alluxio Java FS, POSIX, and S3 • Cloud storage becomes “hidden” to apps • Less vendor lock-in! [Diagram: Compute Zone (standalone or managed with Mesos or Yarn) with Tensorflow, Presto, and MR using the HDFS API and POSIX API; Storage in Different Availability Zone (either on-prem or cloud)]
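The path change mentioned on this slide can be sketched in a few lines. This is a minimal illustration, not an Alluxio API: `to_alluxio_uri` is a hypothetical helper, and it assumes the only change needed is the URI scheme. In a real deployment the authority part would normally point at the Alluxio master rather than the HDFS namenode, which this sketch does not handle.

```python
from urllib.parse import urlsplit, urlunsplit


def to_alluxio_uri(uri: str) -> str:
    """Swap an hdfs:// scheme for alluxio://, leaving other URIs untouched.

    Hypothetical helper for illustration; the authority and path are kept
    as-is, which is an assumption rather than a deployment recommendation.
    """
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        return uri  # not an HDFS path, nothing to rewrite
    return urlunsplit(parts._replace(scheme="alluxio"))


print(to_alluxio_uri("hdfs://foo/bar"))  # alluxio://foo/bar
```

Because the interface is HDFS-compatible, an application that takes its table or file locations from configuration would only need this kind of path rewrite, not code changes.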
  • 33. Data Path: Improved I/O Performance • A New Tier Above Cloud Storage for Compute • Distributed buffer cache • Restore locality to compute • Read: • Cache-hit read: served by Alluxio workers (local worker preferred) • Cache-miss read: served by cloud storage, then cached on an Alluxio worker • Write: • Burst buffer, then asynchronously propagated to S3 (Alluxio 2.0) • Challenges: • Locality: expose location information to applications; serve local apps through ramdisk (rather than network)
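The cache-hit and cache-miss read paths described on this slide follow a read-through caching pattern. The toy model below illustrates that pattern only; `ToyUnderStore` and `ToyCachingWorker` are hypothetical names, and the real worker additionally deals with tiered storage, eviction, and locality, none of which is modeled here.

```python
class ToyUnderStore:
    """Stand-in for slow cloud storage; counts how often it is read."""

    def __init__(self, objects):
        self.objects = dict(objects)
        self.reads = 0

    def read(self, path):
        self.reads += 1
        return self.objects[path]


class ToyCachingWorker:
    """Serves reads from a local cache, falling back to the under store."""

    def __init__(self, under_store):
        self.under_store = under_store
        self.cache = {}

    def read(self, path):
        if path in self.cache:
            return self.cache[path], "hit"      # cache-hit read
        data = self.under_store.read(path)      # cache-miss read
        self.cache[path] = data                 # keep a copy for next time
        return data, "miss"


store = ToyUnderStore({"/warehouse/part-0": b"rows"})
worker = ToyCachingWorker(store)
first = worker.read("/warehouse/part-0")
second = worker.read("/warehouse/part-0")
print(first[1], second[1], store.reads)  # miss hit 1
```

Only the first read reaches the under store; the repeat read is served from the worker, which is the effect the slide attributes to the new tier above cloud storage.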
  • 34. Metadata Path: Familiar Semantics • Listing / renaming on object store can be expensive • Common operations for batch or SQL analytics • Overwriting Put is eventually consistent • Alluxio loads and manages metadata in master • Apps can continue assuming HDFS-like semantics and performance implications • Challenges • Data modification bypassing Alluxio: when and how to re-sync • Slow lists in object store: batch operations • Too many objects: off-heap metadata (Alluxio 2.0)
  • 35. Performance Tuning Tips: Presto + Alluxio • Data Locality • Enable Locality Aware Scheduling • Hostname matching • Higher Parallelism • Tune worker threads • Tune number of splits in a batch • Tune Alluxio client timeout • Increase Netty timeout for Alluxio 1.8 https://www.alluxio.com/blog/top-5-performance-tuning-tips-for-running-presto-on-alluxio-1
  • 36. Case Study: - Leading Online Retailer (NASDAQ: JD) - Building Ad-hoc SQL Query Engine - Pain Point: - Presto workers may read remotely from HDFS datanodes - Large query variance https://www.slideshare.net/Alluxio/alluxio-in-jd
  • 37. Solution: Colocate Alluxio with Presto
  • 40. Case Study: - Leading Online Gaming Service Company (NASDAQ: NTES) - Partner with Blizzard to operate service of “WoW”, “Hearthstone” - Upcoming “Diablo Immortal” - Building Ad-hoc SQL Query Engine - Large data volume: ~30 TB raw data daily - A separate satellite compute cluster - Pain Point: - Response-time requirement: < 15s - Large startup latency on submitting SQL jobs as YARN app https://www.alluxio.com/blog/presto-on-alluxio-how-netease-games-leveraged-alluxio-to-boost-ad-hoc-sql-on-hdfs
  • 41. Solution: Presto + Alluxio
  • 42. Result: Smoother Response During Peak Time [Chart: response time (ms), Presto w/ Alluxio vs. Presto w/o Alluxio]
  • 43. Conclusion - Presto + Alluxio as the Stack - Truly separated compute and storage - Improve data and metadata performance on cloud storage - Alluxio Architecture and Data Flow - Master, Worker, Under Storage - Cache-{hit, miss} reads, Sync/Async writes - Use Cases on Presto + Alluxio
  • 45. Metadata Path: Efficient Renames • Renaming files on S3 can be expensive • Common operations for MR in commit phase • Write results to tmp paths • Rename tmp files to final paths (another copy, slow) • Rename with Alluxio async writes • t0: writes to tmp paths in Alluxio: near-compute, fast writes • t1: rename tmp paths to final path in Alluxio: cheap renames • t2: persist files in final paths in Alluxio to S3: 2PC to avoid partial data • Speculative execution allowed
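The t0 / t1 / t2 sequence above can be illustrated with a toy namespace in which a rename only touches metadata and the under store receives a single copy at the final path. `ToyNamespace` and its fields are hypothetical, and the two-phase commit the slide mentions for the persist step is deliberately not modeled.

```python
class ToyNamespace:
    """Toy model of write-to-tmp, rename, then persist (no 2PC modeled)."""

    def __init__(self):
        self.files = {}        # path -> bytes held near compute
        self.under_store = {}  # path -> bytes persisted (e.g. an S3 bucket)
        self.copies = 0        # number of objects copied to the under store

    def write(self, path, data):
        # t0: the write lands near compute, not in the under store
        self.files[path] = data

    def rename(self, src, dst):
        # t1: metadata-only change, no data is copied
        self.files[dst] = self.files.pop(src)

    def persist(self, path):
        # t2: one copy to the under store, directly at the final path
        self.under_store[path] = self.files[path]
        self.copies += 1


ns = ToyNamespace()
ns.write("/tmp/job-1/part-0", b"result")
ns.rename("/tmp/job-1/part-0", "/out/part-0")
ns.persist("/out/part-0")
print(sorted(ns.under_store), ns.copies)  # ['/out/part-0'] 1
```

The under store never sees the temporary path, so the copy-on-rename cost described for S3 is avoided in this model.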
  • 46. Data Transformation • Pressure in all industries to be “data driven” • Majority of companies still figuring out the transformation • Increased collection of numerous, low-value data • Challenge of overcoming data silos to convert data into business value • Limited success of Data Warehouses, Marts, and Lakes: cost of copying/moving data is substantial • Single Data Plane for Business value
  • 47. Migration to Cloud • Decoupling of compute and storage • Enterprises move from turnkey solutions to self-managed data platforms on IaaS • Lacking agility at Data Storage level • Requires Storage Abstraction
  • 48. Data Path: Async Persist to S3 (Alluxio 2.0) • Async Writes • Step 1: App writes to Alluxio • Step 2: Alluxio writes to UFS • Benefits • Apps write at Alluxio speed • Data gets persisted • Challenges • File rename/delete before persist: 2PC • Fault-tolerance: journal async requests [Diagram: Application, Alluxio Client, Alluxio Master, Alluxio Worker (RAM / SSD / HDD), Under Store]
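The two-step async write can be sketched as a write that returns once data is cached plus a queue of pending persist requests that is drained later. `ToyAsyncPersist` is a hypothetical name; the in-memory deque only stands in for the journal the slide mentions, and skipping files deleted before persist is a simplification of the 2PC handling, not a description of how Alluxio implements it.

```python
from collections import deque


class ToyAsyncPersist:
    """Toy model: writes return once cached; persistence happens later."""

    def __init__(self):
        self.cache = {}         # data held in the caching layer
        self.ufs = {}           # data persisted to the under file system
        self.pending = deque()  # ordered persist requests (journal stand-in)

    def write(self, path, data):
        # Step 1: the app's write completes at cache speed
        self.cache[path] = data
        self.pending.append(path)

    def delete(self, path):
        self.cache.pop(path, None)

    def drain(self):
        # Step 2: background persist; returns how many files were persisted
        persisted = 0
        while self.pending:
            path = self.pending.popleft()
            if path in self.cache:  # skip files deleted before persist
                self.ufs[path] = self.cache[path]
                persisted += 1
        return persisted


fs = ToyAsyncPersist()
fs.write("/logs/a", b"1")
fs.write("/logs/b", b"2")
fs.delete("/logs/b")   # removed before it was ever persisted
count = fs.drain()
print(count, sorted(fs.ufs))  # 1 ['/logs/a']
```

Keeping the pending requests in an ordered, durable log is what would let them be replayed after a failure, which is the fault-tolerance challenge listed on the slide.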
  • 49. Alluxio • Our implementation of the data access layer: a virtual distributed file system • Open source project with over 900 contributors from 100s of organizations worldwide • Deployed in many top internet and financial companies
  • 50. The Data Access Layer • Abstraction layer between applications and storage systems • Present a stable storage interface to applications, including semantics, security, and performance • Eliminate the weaknesses of data silos rather than the silos themselves • Enable transparent migration of underlying storage systems • Enable application API to storage API translation in a single layer