SlideShare una empresa de Scribd logo
1 de 18
Descargar para leer sin conexión
1
Presto PaaS offering in GCP
Ashish Tadose
Reetika Agrawal
Siddharth Maloo
2
Agenda
• Data stores @ Walmart Labs
• Need for Presto PaaS offering
• Presto in GCP
• Presto deployment & auto-scaling
• Authentication & Authorization
• Monitoring
• Best practices and tuning
Footer
3
Data stores @ Walmart Labs
Access needs are varied from team to team – one solution does not fit all….
4
Motivation for Presto..
• DataLake cluster - powered by on-prem Hadoop/HDFS
• Compute storage colocation – GOOD
• Need to ingest data from all diverse sources – CHALLENGING
• Scaling out compute with growing needs – CHALLENGING
• Need to separate storage & compute / support federated query capability – PRESTO..
• Isolated clusters in private cloud powering dedicated data-marts
Datajourney
5
Presto & Alluxio
Works well together…
Small range query response time
Lower is better
Large scan query response time
Lower is better
Concurrency
Higher is better
Presto Presto + Alluxio
6
• Leverage public clouds – GCP
• Better scalability & Effective cluster utilization by auto-scaling
• Performant query response times
• Security
– Authentication – LDAP
– Authorization – work with existing policies
• Handle sensitive data – encryption at rest & over the wire
• Efficient Monitoring & alerting
PaaS offering - requirements
7
• Cloud DataProc init scripts or optional image -
https://cloud.google.com/dataproc/docs/tutorials/presto-dataproc
– Super easy to spawn Presto cluster
– Elevated cost due to managed services such as DataProc
– Overhead of additional Hadoop components
– Difficult to source new catalog or deploy config changes
• Alluxio – no GCP managed deployment
• Presto-admin – can be used deployment and configuration not auto-scaling
• Need for lower level deployment strategy
Presto on GCP
8
• Open source Presto data-ops - https://github.com/takari/presto-dataops
• Framework to deploy and auto-scale Presto cluster in GCP
• Leverages ansible & GCP deployment manager
• Auto-scaling via configurable cluster wide CPU & memory usage threshold
• Our recent changes – will be released soon
– Alluxio deployment co-located with Presto workers
– Efficient configurability – suitable for multiple envs
– More auto-scaling configs
GCP presto auto-deployment
9
Presto Alluxio – overall stack
10
• Access interface
– Presto cli
– Lower level rest end points
– Presto jdbc
• Coordinator LDAP Auth
– authorization based AD group membership
– Consider using firewall to limit access to the HTTP endpoint
– Wrap presto cli to use HTTPS end points
• Hadoop backed Hive catalog
– Kerberos authentication and leverage impersonation fine grained authorization and auditing
– Known issue - Hive metastore impersonation is not supported requires presto explicit read access on file system
Presto authentication
11
• Caching enabled for ranger policy
• Ranger policy check for authorization happens before query parsing phase
• Ranger policy check is done against username and unix groups both.
Presto ranger authorization
12
Hive metastore and Alluxio mounting
13
• On prem monitoring - Prometheus & Grafana
• GCP stack driver integration
• GCP Stackdriver Presto MBeans integration issue
Presto monitoring
14
• Increase ulimit – setup higher open file descriptor limit
• Hive metastore availability is critical
– Consider configuring multiple Hive metastore instance
• Enable Hive metastore caching
– Critical to avoid query failure hive metastore connection timeouts
Challenges faced @ scale
15
System catalog - https://prestosql.io/docs/current/connector/system.html
• system.runtime.queries - information about currently running queries
– If queries are spending more time in queued/analysis state
– Identify long running hogging cluster resources.
• system.runtime.tasks - info about how many rows and bytes processed by each task
– Identify query performance issues by looking at split and data distribution across nodes
– Identify parallelism issues for shuffle intensive queries
Presto monitoring – key metrics
16
JMX catalog - https://prestosql.io/docs/current/connector/jmx.html
Ability to query all MBeans from all nodes in cluster.
• jmx.current."java.lang:type=memory"
Utilization of memory, heap & off-heap etc
• jmx.current."java.lang:type=operatingsystem"
Utilization of swap size, file descriptors, cpu load avg etc
Presto monitoring – key metrics
17
• ORC compression – ZLIB
– Point to point queries performs well for snappy
– Large aggregation ZLIB is better
• Enable bloom filter on frequently used columns in filters
• Enable sorting on frequently used columns (boost query perf on the cost of higher ingestion time )
• Increase ORC stripe & stride size
– ORC files are splittable on a stripe level thus affects parallelism.
– We observed 18%-22% increased in presto parallelism (after setting stripe size = 128Mb and index stride = 16k)
• Enable Table & column stats (Most important )
– Now stats can be computed via Presto - https://prestosql.io/docs/current/sql/analyze.html
ORC storage recommendations
18
THANKS!
We are HIRING..
18

Más contenido relacionado

La actualidad más candente

Bootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source ToolsBootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source Toolsbotsplash.com
 
Column and hadoop
Column and hadoopColumn and hadoop
Column and hadoopAlex Jiang
 
Elastic Stack roadmap deep dive
Elastic Stack roadmap deep diveElastic Stack roadmap deep dive
Elastic Stack roadmap deep diveElasticsearch
 
Aesop change data propagation
Aesop change data propagationAesop change data propagation
Aesop change data propagationRegunath B
 
Developing high frequency indicators using real time tick data on apache supe...
Developing high frequency indicators using real time tick data on apache supe...Developing high frequency indicators using real time tick data on apache supe...
Developing high frequency indicators using real time tick data on apache supe...Zekeriya Besiroglu
 
Lessons learned from embedding Cassandra in xPatterns
Lessons learned from embedding Cassandra in xPatternsLessons learned from embedding Cassandra in xPatterns
Lessons learned from embedding Cassandra in xPatternsClaudiu Barbura
 
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...HostedbyConfluent
 
Backup multi-cloud solution based on named pipes
Backup multi-cloud solution based on named pipesBackup multi-cloud solution based on named pipes
Backup multi-cloud solution based on named pipesLeandro Totino Pereira
 
Spark, Tachyon and Mesos internals
Spark, Tachyon and Mesos internalsSpark, Tachyon and Mesos internals
Spark, Tachyon and Mesos internalsClaudiu Barbura
 
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...Qubole
 
FOSSASIA 2016 - 7 Tips to design web centric high-performance applications
FOSSASIA 2016 - 7 Tips to design web centric high-performance applicationsFOSSASIA 2016 - 7 Tips to design web centric high-performance applications
FOSSASIA 2016 - 7 Tips to design web centric high-performance applicationsAshnikbiz
 
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...HostedbyConfluent
 
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Sparkhbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and SparkMichael Stack
 
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB Breakfast Milan -  Mainframe Offloading StrategiesMongoDB Breakfast Milan -  Mainframe Offloading Strategies
MongoDB Breakfast Milan - Mainframe Offloading StrategiesMongoDB
 
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...MongoDB
 
Automating EDB Postgres using Ansible by Sameer Kumar - Senior Solution Archi...
Automating EDB Postgres using Ansible by Sameer Kumar - Senior Solution Archi...Automating EDB Postgres using Ansible by Sameer Kumar - Senior Solution Archi...
Automating EDB Postgres using Ansible by Sameer Kumar - Senior Solution Archi...Ashnikbiz
 
Building near real-time HTAP solutions using Synapse Link for Azure Cosmos DB
Building near real-time HTAP solutions using Synapse Link for Azure Cosmos DBBuilding near real-time HTAP solutions using Synapse Link for Azure Cosmos DB
Building near real-time HTAP solutions using Synapse Link for Azure Cosmos DBTimothy McAliley
 
Distributed Data Systems
Distributed Data SystemsDistributed Data Systems
Distributed Data SystemsJared Kerim
 
Rakuten techconf2015.baiji.he.bigdataforsmallstartupandbeyond
Rakuten techconf2015.baiji.he.bigdataforsmallstartupandbeyondRakuten techconf2015.baiji.he.bigdataforsmallstartupandbeyond
Rakuten techconf2015.baiji.he.bigdataforsmallstartupandbeyondBaiji He
 

La actualidad más candente (20)

Bootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source ToolsBootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source Tools
 
Column and hadoop
Column and hadoopColumn and hadoop
Column and hadoop
 
Elastic Stack roadmap deep dive
Elastic Stack roadmap deep diveElastic Stack roadmap deep dive
Elastic Stack roadmap deep dive
 
Aesop change data propagation
Aesop change data propagationAesop change data propagation
Aesop change data propagation
 
Developing high frequency indicators using real time tick data on apache supe...
Developing high frequency indicators using real time tick data on apache supe...Developing high frequency indicators using real time tick data on apache supe...
Developing high frequency indicators using real time tick data on apache supe...
 
Lessons learned from embedding Cassandra in xPatterns
Lessons learned from embedding Cassandra in xPatternsLessons learned from embedding Cassandra in xPatterns
Lessons learned from embedding Cassandra in xPatterns
 
Presto: SQL-on-anything
Presto: SQL-on-anythingPresto: SQL-on-anything
Presto: SQL-on-anything
 
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
 
Backup multi-cloud solution based on named pipes
Backup multi-cloud solution based on named pipesBackup multi-cloud solution based on named pipes
Backup multi-cloud solution based on named pipes
 
Spark, Tachyon and Mesos internals
Spark, Tachyon and Mesos internalsSpark, Tachyon and Mesos internals
Spark, Tachyon and Mesos internals
 
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
 
FOSSASIA 2016 - 7 Tips to design web centric high-performance applications
FOSSASIA 2016 - 7 Tips to design web centric high-performance applicationsFOSSASIA 2016 - 7 Tips to design web centric high-performance applications
FOSSASIA 2016 - 7 Tips to design web centric high-performance applications
 
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
 
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Sparkhbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
 
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB Breakfast Milan -  Mainframe Offloading StrategiesMongoDB Breakfast Milan -  Mainframe Offloading Strategies
MongoDB Breakfast Milan - Mainframe Offloading Strategies
 
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
 
Automating EDB Postgres using Ansible by Sameer Kumar - Senior Solution Archi...
Automating EDB Postgres using Ansible by Sameer Kumar - Senior Solution Archi...Automating EDB Postgres using Ansible by Sameer Kumar - Senior Solution Archi...
Automating EDB Postgres using Ansible by Sameer Kumar - Senior Solution Archi...
 
Building near real-time HTAP solutions using Synapse Link for Azure Cosmos DB
Building near real-time HTAP solutions using Synapse Link for Azure Cosmos DBBuilding near real-time HTAP solutions using Synapse Link for Azure Cosmos DB
Building near real-time HTAP solutions using Synapse Link for Azure Cosmos DB
 
Distributed Data Systems
Distributed Data SystemsDistributed Data Systems
Distributed Data Systems
 
Rakuten techconf2015.baiji.he.bigdataforsmallstartupandbeyond
Rakuten techconf2015.baiji.he.bigdataforsmallstartupandbeyondRakuten techconf2015.baiji.he.bigdataforsmallstartupandbeyond
Rakuten techconf2015.baiji.he.bigdataforsmallstartupandbeyond
 

Similar a Presto PaaS GCP Offering

Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...
Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...
Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...Shubham Tagra
 
Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...
Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...
Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...Alluxio, Inc.
 
Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Jonathan Katz
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperRahul Jain
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Cask Data
 
messaging.pptx
messaging.pptxmessaging.pptx
messaging.pptxNParakh1
 
The Design, Implementation and Open Source Way of Apache Pegasus
The Design, Implementation and Open Source Way of Apache PegasusThe Design, Implementation and Open Source Way of Apache Pegasus
The Design, Implementation and Open Source Way of Apache Pegasusacelyc1112009
 
Cassandra
CassandraCassandra
Cassandraexsuns
 
Webinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyWebinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyCeph Community
 
Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?DoiT International
 
Maria DB Galera Cluster for High Availability
Maria DB Galera Cluster for High AvailabilityMaria DB Galera Cluster for High Availability
Maria DB Galera Cluster for High AvailabilityOSSCube
 
MariaDB Galera Cluster
MariaDB Galera ClusterMariaDB Galera Cluster
MariaDB Galera ClusterAbdul Manaf
 
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Cloudian
 
Presto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookPresto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookTreasure Data, Inc.
 
Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudAnshum Gupta
 
Capacity planning for your data stores
Capacity planning for your data storesCapacity planning for your data stores
Capacity planning for your data storesColin Charles
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scalethelabdude
 
Cloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inCloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inRahulBhole12
 
REST Api Tips and Tricks
REST Api Tips and TricksREST Api Tips and Tricks
REST Api Tips and TricksMaksym Bruner
 

Similar a Presto PaaS GCP Offering (20)

Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...
Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...
Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...
 
Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...
Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...
Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...
 
Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?
 
messaging.pptx
messaging.pptxmessaging.pptx
messaging.pptx
 
The Design, Implementation and Open Source Way of Apache Pegasus
The Design, Implementation and Open Source Way of Apache PegasusThe Design, Implementation and Open Source Way of Apache Pegasus
The Design, Implementation and Open Source Way of Apache Pegasus
 
Cassandra
CassandraCassandra
Cassandra
 
Elasticsearch 5.0
Elasticsearch 5.0Elasticsearch 5.0
Elasticsearch 5.0
 
Webinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyWebinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case Study
 
Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?
 
Maria DB Galera Cluster for High Availability
Maria DB Galera Cluster for High AvailabilityMaria DB Galera Cluster for High Availability
Maria DB Galera Cluster for High Availability
 
MariaDB Galera Cluster
MariaDB Galera ClusterMariaDB Galera Cluster
MariaDB Galera Cluster
 
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
 
Presto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookPresto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @Facebook
 
Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloud
 
Capacity planning for your data stores
Capacity planning for your data storesCapacity planning for your data stores
Capacity planning for your data stores
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Cloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inCloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation in
 
REST Api Tips and Tricks
REST Api Tips and TricksREST Api Tips and Tricks
REST Api Tips and Tricks
 

Último

Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 

Último (20)

Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 

Presto PaaS GCP Offering

  • 1. 1 Presto PaaS offering in GCP Ashish Tadose Reetika Agrawal Siddharth Maloo
  • 2. 2 Agenda • Data stores @ Walmart Labs • Need for Presto PaaS offering • Presto in GCP • Presto deployment & auto-scaling • Authentication & Authorization • Monitoring • Best practices and tuning Footer
  • 3. 3 Data stores @ Walmart Labs Access needs are varied from team to team – one solution does not fit all….
  • 4. 4 Motivation for Presto.. • DataLake cluster - powered by on-prem Hadoop/HDFS • Compute storage colocation – GOOD • Need to ingest data from all diverse sources – CHALLENGING • Scaling out compute with growing needs – CHALLENGING • Need to separate storage & compute / support federated query capability – PRESTO.. • Isolated clusters in private cloud powering dedicated data-marts Datajourney
  • 5. 5 Presto & Alluxio Works well together… Small range query response time Lower is better Large scan query response time Lower is better Concurrency Higher is better Presto Presto + Alluxio
  • 6. 6 • Leverage public clouds – GCP • Better scalability & Effective cluster utilization by auto-scaling • Performant query response times • Security – Authentication – LDAP – Authorization – work with existing policies • Handle sensitive data – encryption at rest & over the wire • Efficient Monitoring & alerting PaaS offering - requirements
  • 7. 7 • Cloud DataProc init scripts or optional image - https://cloud.google.com/dataproc/docs/tutorials/presto-dataproc – Super easy to spawn Presto cluster – Elevated cost due to managed services such as DataProc – Overhead of additional Hadoop components – Difficult to source new catalog or deploy config changes • Alluxio – no GCP managed deployment • Presto-admin – can be used deployment and configuration not auto-scaling • Need for lower level deployment strategy Presto on GCP
  • 8. 8 • Open source Presto data-ops - https://github.com/takari/presto-dataops • Framework to deploy and auto-scale Presto cluster in GCP • Leverages ansible & GCP deployment manager • Auto-scaling via configurable cluster wide CPU & memory usage threshold • Our recent changes – will be released soon – Alluxio deployment co-located with Presto workers – Efficient configurability – suitable for multiple envs – More auto-scaling configs GCP presto auto-deployment
  • 9. 9 Presto Alluxio – overall stack
  • 10. 10 • Access interface – Presto cli – Lower level rest end points – Presto jdbc • Coordinator LDAP Auth – authorization based AD group membership – Consider using firewall to limit access to the HTTP endpoint – Wrap presto cli to use HTTPS end points • Hadoop backed Hive catalog – Kerberos authentication and leverage impersonation fine grained authorization and auditing – Known issue - Hive metastore impersonation is not supported requires presto explicit read access on file system Presto authentication
  • 11. 11 • Caching enabled for ranger policy • Ranger policy check for authorization happens before query parsing phase • Ranger policy check is done against username and unix groups both. Presto ranger authorization
  • 12. 12 Hive metastore and Alluxio mounting
  • 13. 13 • On prem monitoring - Prometheus & Grafana • GCP stack driver integration • GCP Stackdriver Presto MBeans integration issue Presto monitoring
  • 14. 14 • Increase ulimit – setup higher open file descriptor limit • Hive metastore availability is critical – Consider configuring multiple Hive metastore instance • Enable Hive metastore caching – Critical to avoid query failure hive metastore connection timeouts Challenges faced @ scale
  • 15. 15 System catalog - https://prestosql.io/docs/current/connector/system.html • system.runtime.queries - information about currently running queries – If queries are spending more time in queued/analysis state – Identify long running hogging cluster resources. • system.runtime.tasks - info about how many rows and bytes processed by each task – Identify query performance issues by looking at split and data distribution across nodes – Identify parallelism issues for shuffle intensive queries Presto monitoring – key metrics
  • 16. 16 JMX catalog - https://prestosql.io/docs/current/connector/jmx.html Ability to query all MBeans from all nodes in cluster. • jmx.current."java.lang:type=memory" Utilization of memory, heap & off-heap etc • jmx.current."java.lang:type=operatingsystem" Utilization of swap size, file descriptors, cpu load avg etc Presto monitoring – key metrics
  • 17. 17 • ORC compression – ZLIB – Point to point queries performs well for snappy – Large aggregation ZLIB is better • Enable bloom filter on frequently used columns in filters • Enable sorting on frequently used columns (boost query perf on the cost of higher ingestion time ) • Increase ORC stripe & stride size – ORC files are splittable on a stripe level thus affects parallelism. – We observed 18%-22% increased in presto parallelism (after setting stripe size = 128Mb and index stride = 16k) • Enable Table & column stats (Most important ) – Now stats can be computed via Presto - https://prestosql.io/docs/current/sql/analyze.html ORC storage recommendations