Tuning Performance for SQL-on-Anything Analytics
Martin Traverso, Co-creator of Presto
Kamil Bajda-Pawlikowski, CTO Starburst
@prestosql @starburstdata
Strata Data 2019
San Francisco, CA
Presto: SQL-on-Anything
Deploy Anywhere, Query Anything
Why Presto?
Community-driven open source project
High performance ANSI SQL engine
• New Cost-Based Query Optimizer
• Proven scalability
• High concurrency
Separation of compute and storage
• Scale storage and compute independently
• No ETL or data integration necessary to get to insights
• SQL-on-anything
No vendor lock-in
• No Hadoop distro vendor lock-in
• No storage engine vendor lock-in
• No cloud vendor lock-in
Project History
Community
See more at our Wiki
Presto in Production
Facebook: 10,000+ nodes, HDFS (ORC, RCFile), sharded MySQL, 1000s of users
Uber: 2,000+ nodes (several clusters on premises) with 160K+ queries daily over HDFS (Parquet/ORC)
Twitter: 2,000+ nodes (several clusters on premises and GCP), 20K+ queries daily (Parquet)
LinkedIn: 500+ nodes, 200K+ queries daily over HDFS (ORC), and ~1000 users
Lyft: (redacted due to the quiet period for the IPO)
Netflix: 300+ nodes in AWS, 100+ PB in S3 (Parquet)
Yahoo! Japan: 200+ nodes for HDFS (ORC), and ObjectStore
FINRA: 120+ nodes in AWS, 4PB in S3 (ORC), 200+ users
Starburst Data
Founded by Presto committers:
● Over 4 years of contributions to Presto
● Presto distribution for on-prem and cloud environments
● Supporting large customers in production
● Enterprise subscription add-ons (ODBC, Ranger, Sentry, Oracle, Teradata)
Notable features contributed:
● ANSI SQL syntax enhancements
● Execution engine improvements
● Security integrations
● Spill to disk
● Cost-Based Optimizer
https://www.starburstdata.com/presto-enterprise/
Performance
Built for Performance
Query Execution Engine:
● MPP-style pipelined in-memory execution
● Columnar and vectorized data processing
● Runtime query bytecode compilation
● Memory efficient data structures
● Multi-threaded multi-core execution
● Optimized readers for columnar formats (ORC and Parquet)
● Predicate and column projection pushdown
● Now also Cost-Based Optimizer
CBO in a nutshell
Presto Cost-Based Optimizer includes:
● support for statistics stored in Hive Metastore
● join reordering based on selectivity estimates and cost
● automatic join type selection (repartitioned vs broadcast)
● automatic left/right side selection for joined tables
https://www.starburstdata.com/technical-blog/
Statistics & Cost
Hive Metastore statistics:
● number of rows in a table
● number of distinct values in a column
● fraction of NULL values in a column
● minimum/maximum value in a column
● average data size for a column
Cost calculation includes:
● CPU
● Memory
● Network I/O
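The way statistics feed a join distribution choice can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not Presto's actual cost model: the `TableStats` and `Cost` classes, the cost formulas, and the equal weights are all hypothetical. It only shows how CPU, memory, and network estimates derived from row counts and data sizes could be compared for a broadcast versus a repartitioned join.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    """Hypothetical container for table-level statistics."""
    row_count: float
    avg_row_bytes: float

    @property
    def size_bytes(self) -> float:
        return self.row_count * self.avg_row_bytes


@dataclass(frozen=True)
class Cost:
    """Three cost components, mirroring the CPU / Memory / Network I/O list."""
    cpu: float
    memory: float
    network: float

    def total(self, w_cpu: float = 1.0, w_mem: float = 1.0, w_net: float = 1.0) -> float:
        # Equal weights are an assumption made for this sketch.
        return w_cpu * self.cpu + w_mem * self.memory + w_net * self.network


def broadcast_cost(probe: TableStats, build: TableStats, workers: int) -> Cost:
    # The build side is copied to every worker; the probe side stays in place.
    replicated = build.size_bytes * workers
    return Cost(cpu=probe.size_bytes + replicated, memory=replicated, network=replicated)


def repartitioned_cost(probe: TableStats, build: TableStats, workers: int) -> Cost:
    # Both sides are shuffled once by join key; the hash table is split across workers.
    shuffled = probe.size_bytes + build.size_bytes
    return Cost(cpu=shuffled, memory=build.size_bytes, network=shuffled)


def choose_join_distribution(probe: TableStats, build: TableStats, workers: int) -> str:
    b = broadcast_cost(probe, build, workers).total()
    r = repartitioned_cost(probe, build, workers).total()
    return "broadcast" if b < r else "repartitioned"


fact = TableStats(row_count=1e9, avg_row_bytes=100)
dim = TableStats(row_count=1e3, avg_row_bytes=100)
print(choose_join_distribution(fact, dim, workers=10))
print(choose_join_distribution(fact, fact, workers=10))
```

In this toy model a small build side favors broadcast because replicating it is cheaper than shuffling the large probe side, while two large inputs favor repartitioning.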
Join type selection
Join left/right side decision
Join reordering with filter
Join tree shapes
CBO off
CBO on
https://www.starburstdata.com/presto-benchmarks/
Benchmark results
● on average 7x improvement vs EMR Presto
● EMR Presto cannot execute many TPC-DS queries
● All TPC-DS queries pass on Starburst Presto
https://www.starburstdata.com/presto-aws/
Recent CBO enhancements
● Deciding on semi-join distribution type based on cost
● Support for outer joins
● Capping a broadcasted table size
● Various minor fixes in cardinality estimation
● ANALYZE table (native in Presto)
● Stats for AWS Glue Catalog (exclusive to Starburst)
Current and Future work
What’s next for Optimizer
● Stats support
○ Improved stats for Hive
○ Stats for DBMS connectors and NoSQL connectors
○ Tolerate missing / incomplete stats
● Core CBO enhancements
○ Cost more operators
○ Adjust cost model weights based on the hardware
○ Adaptive optimizations
○ Introduce Traits
● Involve connectors in optimizations
Involving Connectors in Optimization
History and Current State
● Original motivation: partition pruning for queries over Hive tables
● Simple range predicates and nullability checks passed to connectors, modeled as TupleDomain:
((col0 BETWEEN ? AND ?) OR (col0 BETWEEN ? AND ?) OR …)
AND
((col1 BETWEEN ? AND ?) OR (col1 BETWEEN ? AND ?) OR …)
AND
...
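The shape above (a union of ranges per column, ANDed across columns) can be sketched as a small data structure used for partition pruning. This is a Python stand-in written for illustration, not the Presto SPI class: `ColumnDomain`, `partition_matches`, and `prune` are hypothetical names, and ranges are assumed to be inclusive integer intervals.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Range = Tuple[int, int]  # inclusive [low, high]; an assumption of this sketch


@dataclass(frozen=True)
class ColumnDomain:
    """Allowed values for one column: a union (OR) of ranges plus a nullability flag."""
    ranges: Tuple[Range, ...]
    null_allowed: bool = False

    def contains(self, value: Optional[int]) -> bool:
        if value is None:
            return self.null_allowed
        return any(low <= value <= high for low, high in self.ranges)


def partition_matches(domains: Dict[str, ColumnDomain], partition: Dict[str, Optional[int]]) -> bool:
    # Domains are ANDed across columns. Columns that are not partition keys
    # cannot be checked here, so they are treated as unconstrained.
    return all(
        domain.contains(partition[column])
        for column, domain in domains.items()
        if column in partition
    )


def prune(domains: Dict[str, ColumnDomain], partitions: List[Dict[str, Optional[int]]]):
    """Keep only the partitions that may contain matching rows."""
    return [p for p in partitions if partition_matches(domains, p)]


# (year BETWEEN 2017 AND 2018) OR (year BETWEEN 2020 AND 2020)
domains = {"year": ColumnDomain(ranges=((2017, 2018), (2020, 2020)))}
partitions = [{"year": 2016}, {"year": 2017}, {"year": 2019}, {"year": 2020}]
print(prune(domains, partitions))
```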
History and Current State
● Partial evaluation of non-trivial expressions
○ Bind only known variables
○ Results in "true/false/null" or "can't tell". E.g.,
f(a, b) := lower(a) LIKE 'john%' AND b = 1
f('Mary', ?) → false → can prune
f('John S', ?) → b = 1 → ¯\_(ツ)_/¯
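The partial evaluation in this example can be sketched with a sentinel for unbound variables. This is a simplified illustration under stated assumptions: `UNKNOWN`, `and3`, and `can_prune` are hypothetical names, the LIKE pattern is reduced to a prefix test, and the sketch returns `UNKNOWN` for "can't tell" instead of the residual expression `b = 1` shown on the slide.

```python
UNKNOWN = object()  # stands for an unbound variable or a "can't tell" result


def like_prefix(value, prefix):
    """lower(value) LIKE '<prefix>%', reduced to a prefix test for this sketch."""
    if value is UNKNOWN:
        return UNKNOWN
    if value is None:
        return None  # SQL NULL propagates
    return value.lower().startswith(prefix)


def equals(value, constant):
    if value is UNKNOWN:
        return UNKNOWN
    if value is None:
        return None
    return value == constant


def and3(left, right):
    """SQL three-valued AND, extended with UNKNOWN."""
    if left is False or right is False:
        return False  # one false operand decides the result
    if left is UNKNOWN or right is UNKNOWN:
        return UNKNOWN  # conservative: the result cannot be decided yet
    if left is None or right is None:
        return None
    return True


def f(a, b=UNKNOWN):
    # f(a, b) := lower(a) LIKE 'john%' AND b = 1
    return and3(like_prefix(a, "john"), equals(b, 1))


def can_prune(result) -> bool:
    # Data can be skipped only when the filter is definitely not true.
    return result is False or result is None


print(can_prune(f("Mary")))    # b is unbound, but the first conjunct is already false
print(can_prune(f("John S")))  # depends on b, so nothing can be pruned
```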
Beyond Simple Filter Pushdown...
● Dereference expressions. E.g., x.a > 5
● Array/map subscript. E.g., a['key'] = 10
● Complex filters and projections
● Aggregations
● Joins
● Limit: https://github.com/prestosql/presto/pull/421
● Sampling
● Others…
https://github.com/prestosql/presto/issues/18
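[Diagram: Rule 1 rewrites the plan tree A, B, C, D, E, F, G into A, C', B', D, E, F, G. Pattern: B, C. Result: C', B'.]
[Diagram: Rule 2 rewrites the plan tree A, C', B', D, E, F, G into A, C', E', D, B'', F, G. Pattern: B, E. Result: E', B'.]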
FilterIntoScan Rule
Query:
SELECT count(*)
FROM t
WHERE x.f > 5 AND y LIKE 'a%b'
Table t
x :: row(f bigint, g bigint)
y :: varchar(10)
Before: Filter (x.f > 5 AND y LIKE 'a%b') over Scan(t_0)
Connector.applyFilter(...)
Table: t_0
Filter: x.f > 5 AND y LIKE 'a%b'
Derived Table: t_1
Filter: z > 5 [z :: bigint]
After: Filter (z > 5) over Scan(t_1)
New Connector APIs
applyFilter(ConnectorTableHandle table, Expression filter)
applyLimit(ConnectorTableHandle table, long limit)
applyAggregation(ConnectorTableHandle table, List<Aggregation> aggregates)
applySampling(ConnectorTableHandle table, double samplingRate)
...
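How an optimizer rule might call such an API can be sketched for the limit case. This is a Python illustration, not the Presto SPI: the slide gives only parameter lists, so the `Optional` return value (None meaning "no pushdown"), the `TableHandle` fields, and the `push_limit_into_scan` rule are all assumptions. The same shape would apply to applyFilter, with an expression in place of the row count.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class TableHandle:
    """Hypothetical handle; a connector records pushed-down state in it."""
    name: str
    limit: Optional[int] = None


class Connector:
    """Default behavior: no pushdown supported."""

    def apply_limit(self, table: TableHandle, limit: int) -> Optional[TableHandle]:
        return None


class LimitAwareConnector(Connector):
    def apply_limit(self, table: TableHandle, limit: int) -> Optional[TableHandle]:
        if table.limit is not None and table.limit <= limit:
            return None  # nothing new to push down
        return replace(table, limit=limit)


@dataclass(frozen=True)
class Scan:
    table: TableHandle


@dataclass(frozen=True)
class Limit:
    count: int
    source: Scan


def push_limit_into_scan(node, connector: Connector):
    """Rule sketch: Limit(Scan(t)) becomes Limit(Scan(t')) if the connector accepts the limit."""
    if not (isinstance(node, Limit) and isinstance(node.source, Scan)):
        return node
    new_table = connector.apply_limit(node.source.table, node.count)
    if new_table is None:
        return node
    # The Limit node is kept: this sketch assumes the connector's limit is
    # best-effort and the engine still enforces the final row count.
    return Limit(node.count, Scan(new_table))


plan = Limit(10, Scan(TableHandle("t")))
print(push_limit_into_scan(plan, Connector()))
print(push_limit_into_scan(plan, LimitAwareConnector()))
```

Returning None when nothing changes lets a rule be applied repeatedly until the plan stops changing.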
Performance Benefits (?)
● Better support for sophisticated backend systems
○ Druid, Pinot, Elasticsearch
○ SQL databases
● Improved performance for columnar data formats (Parquet, ORC)
ORC Performance Improvements
https://github.com/prestosql/presto/pull/555
ORC Performance Improvements - TPC-DS
Project Roadmap
● Coordinator HA
● Kubernetes
● Dynamic filtering
● Connectors
○ Phoenix
○ Iceberg
○ Druid
● TIMESTAMP semantics
● And more… https://github.com/prestosql/presto/labels/roadmap
Getting Involved
● Join us on Slack
○ Invite link: https://prestosql.io/community.html
● GitHub: https://github.com/prestosql/presto
● Website: https://prestosql.io
Further reading
https://www.starburstdata.com/presto-newsletter/
https://fivetran.com/blog/warehouse-benchmark
https://www.concurrencylabs.com/blog/starburst-presto-vs-aws-emr-sql/
http://bytes.schibsted.com/bigdata-sql-query-engine-benchmark/
https://virtuslab.com/blog/benchmarking-spark-sql-presto-hive-bi-processing-googles-cloud-dataproc/
Thank You!
@prestosql @starburstdata
www.starburstdata.com
www.prestosql.io