SlideShare a Scribd company logo
1 of 24
Download to read offline
© Copyright 2019 Pivotal Software, Inc. All rights Reserved.
Shivram Mani Francisco Guerrero
@shivram @frankgh
Maximize Greenplum
For Any Use Cases
Decoupling Compute and Storage
Cover w/ Image
Agenda
■ Enterprise Data Landscape
■ Accessing External Data from
Greenplum
■ Platform Extension Framework (PXF)
■ Use Cases
■ Q+A
© Copyright 2019 Pivotal Software, Inc. All rights Reserved.© Copyright 2019 Pivotal Software, Inc. All rights Reserved.
Enterprise Data Landscape
The Wild Wild West of Data
?
5
Greenplum uses PXF
as a federated query engine
to access
external heterogeneous data.
Platform Extension Framework (PXF)
Tabular view for
heterogeneous data
Built-in connectors for
various data sources/formats Pluggable framework
Parallel high throughput
data access
Open source
Read and write
external data
7
Architecture of PXF
Master Host
External Data
Segment Host
1
seg1 seg2 seg3
PXF
Segment Host
2
seg4 seg5 seg6
PXF
8
Q: How can I access sales data residing in an S3 bucket stored in parquet format?
Greenplum External Table
CREATE EXTERNAL TABLE sales
(cust int, sku text, amount decimal, date date)
LOCATION
('pxf://s3-bucket/2018/sales/?PROFILE=s3:parquet&SERVER=s3_sales')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import')
profilepath to data server
9
How can we scale performance
when querying remote data ?
Performance - Predicate Pushdown
state=NY
state=NJ
state=CA
state=CA
{state='CA'}
SELECT item, amount FROM orders
WHERE state = 'CA'MASTER
SEGMENT
predicates :
state=CA
PXF with
JDBC
Row oriented
storage format
● Predicate information
pushed to external
system
● External engines can
support predicates for
its own queries (e.g.
JDBC)
● No filtering within PXF
itself
● Partition pruning (e.g.
Hive)
Performance - Column Projection
date:
{item:,
amount:,
state='CA'}
SELECT item, amount FROM orders
WHERE state = 'CA'MASTER
SEGMENT
columns : item, amount
predicates : state=CA
aggregates : count
PXF with
Hive/ORC
Columnar
storage format
● Propagate columns
projection metadata to
external systems
● JDBC, Parquet & ORC
● Reduces Network I/O
● Reduces Remote Disk
I/O
● Improved performance
for aggregate queries
state:
amount:
item:
© Copyright 2019 Pivotal Software, Inc. All rights Reserved.© Copyright 2019 Pivotal Software, Inc. All rights Reserved.
Use Cases
Use Case: Multi-temperature data querying
● Storage based on
operational requirements
● Can I work with data
created few second ago ?
● Can I run a report on data
from few days ago ?
● Can I inspect the data
archived months or years
ago ?
In-Memory
Database
RDBMS
dataData Lake
HOT
DATA
WARM
DATA
COLD
DATA
14
Use Case: Elastic scaling with Greenplum
● Greenplum on K8s for
elastic compute
● Elastic storage with
S3/Azure/Google
● Ability to separate
compute from storage
● On-demand data
warehouses
15
Use Case: Access Heterogenous data on multiple
clouds
● Different cloud providers
based on business
requirements
● Low cost storage
● No storage admin
● Data doesn’t need to be
copied
16
Use Case: Access Heterogenous data on multiple
clouds
Historical_Orders
xx xx
xx xx
Historical_Invoices
xx xx
xx xx
Product_Catalog
xx xx
xx xx
Historical_Orders
xx xx
xx xx
Admin migrates data from s3-
bucket-orders to Azure Blob
Storage
SELECT * FROM historical_orders o, product_catalog p
WHERE o.product_id = p.product_id
s3-bucket-orders s3-bucket-price
Historical_Invoices
xx xx
xx xx
17
SELECT * FROM historical_orders o, product_catalog p
WHERE o.product_id = p.product_id
Use Case: Access Heterogenous data on multiple
cloud
CREATE EXTERNAL TABLE historical_orders
(item int, amount money)
LOCATION
('pxf://s3-bucket-orders/path?PROFILE=s3:parquet&SERVER=s3_orders')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
CREATE EXTERNAL TABLE historical_orders
(item int, amount money)
LOCATION
('pxf://my.azuredatalakestore.net/path?PROFILE=adl:parquet&SERVER=azure')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
Historical_orders table data on S3
Historical_orders table data now on Azure Data Lake
18
Summary
Greenplum embraces the modern data landscape
● Scale and manage compute independently from storage
● Federate queries across heterogeneous data sources
● Cloud Agnostic
Data is available for analytics with Greenplum no matter its form and where it
resides!
19
#ScaleMatters
© Copyright 2019 Pivotal Software, Inc. All rights Reserved.
Cover w/ Image
Greenplum External
Table
Define an external table with the following:
● the schema of the external data
● the protocol pxf
● the location of the data in an external
system
● the profile to identify the specific connector
● The compressions_codec of the data
● the format of the external data
CREATE [READABLE|WRITABLE] EXTERNAL TABLE
table_name
( col_name data_type [,...] | LIKE other_table )
LOCATION ('pxf://<path to data>?
PROFILE=[<profile_name>|<data_store:data_type>]&
COMPRESSIONG_CODEC=[snappy|gzip|lzo|bzip2]&
[&<CUSTOM_OPTIONS>=<value>[...]]’)
FORMAT '[TEXT|CSV|CUSTOM]'
cust, sku, amount, date
1234, ABC, $9.90, 4/01
1235, CDE, $8.80, 3/30
CREATE EXTERNAL TABLE sales
(cust int, sku text, amount decimal, date date)
LOCATION
('pxf:///2018/sales.csv?PROFILE=hdfs:text')
FORMAT 'TEXT'
Cover w/ Image
PXF supports accessing multiple external datastores
simultaneously
● server identifies an external datastore
● Staging directory server/ under
${PXF_CONF}
● Contains relevant configuration files under
servers/{server_name}/
○ HDFS: core-site.xml, hdfs-site.xml, ...
○ S3: s3-site.xml containing access
properties
PXF Multi Server
CREATE [READABLE|WRITABLE] EXTERNAL TABLE
table_name
( col_name data_type [,...] | LIKE other_table )
LOCATION ('pxf://<path to data>?
PROFILE=<data_store:data_type>&
SERVER=<server_name>’)
CREATE EXTERNAL TABLE sales
(cust int, sku text, amount decimal, date date)
LOCATION ('pxf://s3-bucket-
sales/2018/sales.csv?PROFILE=s3:text&server=s3_s
ales’)
FORMAT 'TEXT'
cust, sku, amount, date
1234, ABC, $9.90, 4/01
1235, CDE, $8.80, 3/30
Performance in PXF
● Parallel access to data
● Predicate pushdown
● Column projection
23
SELECT item, amount
WHERE state = 'CA'
column projection
predicate pushdown
Performance in PXF
● Parallel access to
data
● Column Projection
● Predicate Pushdown
24
SELECT item, amount
WHERE state = 'CA'
column projection
predicate pushdown

More Related Content

What's hot

Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018Amazon Web Services
 
Idera live 2021: Keynote Presentation The Future of Data is The Data Cloud b...
Idera live 2021:  Keynote Presentation The Future of Data is The Data Cloud b...Idera live 2021:  Keynote Presentation The Future of Data is The Data Cloud b...
Idera live 2021: Keynote Presentation The Future of Data is The Data Cloud b...IDERA Software
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheDremio Corporation
 
Simplify DevOps with Microservices and Mobile Backends.pptx
Simplify DevOps with Microservices and Mobile Backends.pptxSimplify DevOps with Microservices and Mobile Backends.pptx
Simplify DevOps with Microservices and Mobile Backends.pptxssuser5faa791
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseDatabricks
 
Introduction to Greenplum
Introduction to GreenplumIntroduction to Greenplum
Introduction to GreenplumDave Cramer
 
Snowflake essentials
Snowflake essentialsSnowflake essentials
Snowflake essentialsqureshihamid
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL AdministrationEDB
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowWes McKinney
 
Architecture for building scalable and highly available Postgres Cluster
Architecture for building scalable and highly available Postgres ClusterArchitecture for building scalable and highly available Postgres Cluster
Architecture for building scalable and highly available Postgres ClusterAshnikbiz
 
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdfChris Hoyean Song
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
Keeping the Pulse of Your Data:  Why You Need Data Observability 
Keeping the Pulse of Your Data:  Why You Need Data Observability Keeping the Pulse of Your Data:  Why You Need Data Observability 
Keeping the Pulse of Your Data:  Why You Need Data Observability Precisely
 
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Introduction to DataFusion  An Embeddable Query Engine Written in RustIntroduction to DataFusion  An Embeddable Query Engine Written in Rust
Introduction to DataFusion An Embeddable Query Engine Written in RustAndrew Lamb
 
Let’s get to know Snowflake
Let’s get to know SnowflakeLet’s get to know Snowflake
Let’s get to know SnowflakeKnoldus Inc.
 
What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningDatabricks
 

What's hot (20)

Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
 
Idera live 2021: Keynote Presentation The Future of Data is The Data Cloud b...
Idera live 2021:  Keynote Presentation The Future of Data is The Data Cloud b...Idera live 2021:  Keynote Presentation The Future of Data is The Data Cloud b...
Idera live 2021: Keynote Presentation The Future of Data is The Data Cloud b...
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
 
Simplify DevOps with Microservices and Mobile Backends.pptx
Simplify DevOps with Microservices and Mobile Backends.pptxSimplify DevOps with Microservices and Mobile Backends.pptx
Simplify DevOps with Microservices and Mobile Backends.pptx
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
Introduction to Greenplum
Introduction to GreenplumIntroduction to Greenplum
Introduction to Greenplum
 
Elastic Data Warehousing
Elastic Data WarehousingElastic Data Warehousing
Elastic Data Warehousing
 
Snowflake essentials
Snowflake essentialsSnowflake essentials
Snowflake essentials
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL Administration
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
 
Architecture for building scalable and highly available Postgres Cluster
Architecture for building scalable and highly available Postgres ClusterArchitecture for building scalable and highly available Postgres Cluster
Architecture for building scalable and highly available Postgres Cluster
 
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Keeping the Pulse of Your Data:  Why You Need Data Observability 
Keeping the Pulse of Your Data:  Why You Need Data Observability Keeping the Pulse of Your Data:  Why You Need Data Observability 
Keeping the Pulse of Your Data:  Why You Need Data Observability 
 
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Introduction to DataFusion  An Embeddable Query Engine Written in RustIntroduction to DataFusion  An Embeddable Query Engine Written in Rust
Introduction to DataFusion An Embeddable Query Engine Written in Rust
 
Let’s get to know Snowflake
Let’s get to know SnowflakeLet’s get to know Snowflake
Let’s get to know Snowflake
 
What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine Learning
 

Similar to Maximize Greenplum For Any Use Cases Decoupling Compute and Storage - Greenplum Summit 2019

DDN GS7K - Easy-to-deploy, High Performance Scale-Out Parallel File System Ap...
DDN GS7K - Easy-to-deploy, High Performance Scale-Out Parallel File System Ap...DDN GS7K - Easy-to-deploy, High Performance Scale-Out Parallel File System Ap...
DDN GS7K - Easy-to-deploy, High Performance Scale-Out Parallel File System Ap...inside-BigData.com
 
Z Data Tools and APIs Overview
Z Data Tools and APIs OverviewZ Data Tools and APIs Overview
Z Data Tools and APIs OverviewHCLSoftware
 
IBM THINK 2018 - IBM Cloud SQL Query Introduction
IBM THINK 2018 - IBM Cloud SQL Query IntroductionIBM THINK 2018 - IBM Cloud SQL Query Introduction
IBM THINK 2018 - IBM Cloud SQL Query IntroductionTorsten Steinbach
 
Continuous Data Replication into Cloud Storage with Oracle GoldenGate
Continuous Data Replication into Cloud Storage with Oracle GoldenGateContinuous Data Replication into Cloud Storage with Oracle GoldenGate
Continuous Data Replication into Cloud Storage with Oracle GoldenGateMichael Rainey
 
Distributed deep learning reference architecture v3.2l
Distributed deep learning reference architecture v3.2lDistributed deep learning reference architecture v3.2l
Distributed deep learning reference architecture v3.2lGanesan Narayanasamy
 
DevOpsDaysRiga 2018: Eric Skoglund, Lars Albertsson - Kubernetes as data plat...
DevOpsDaysRiga 2018: Eric Skoglund, Lars Albertsson - Kubernetes as data plat...DevOpsDaysRiga 2018: Eric Skoglund, Lars Albertsson - Kubernetes as data plat...
DevOpsDaysRiga 2018: Eric Skoglund, Lars Albertsson - Kubernetes as data plat...DevOpsDays Riga
 
Kubernetes as data platform
Kubernetes as data platformKubernetes as data platform
Kubernetes as data platformLars Albertsson
 
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...HostedbyConfluent
 
Solving enterprise challenges through scale out storage &amp; big compute final
Solving enterprise challenges through scale out storage &amp; big compute finalSolving enterprise challenges through scale out storage &amp; big compute final
Solving enterprise challenges through scale out storage &amp; big compute finalAvere Systems
 
IBM THINK 2019 - Self-Service Cloud Data Management with SQL
IBM THINK 2019 - Self-Service Cloud Data Management with SQL IBM THINK 2019 - Self-Service Cloud Data Management with SQL
IBM THINK 2019 - Self-Service Cloud Data Management with SQL Torsten Steinbach
 
Sizing Splunk SmartStore - Spend Less and Get More Out of Splunk
Sizing Splunk SmartStore - Spend Less and Get More Out of SplunkSizing Splunk SmartStore - Spend Less and Get More Out of Splunk
Sizing Splunk SmartStore - Spend Less and Get More Out of SplunkPaula Koziol
 
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQueryCodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQueryMárton Kodok
 
Greenplum for Kubernetes PGConf india 2019
Greenplum for Kubernetes PGConf india 2019Greenplum for Kubernetes PGConf india 2019
Greenplum for Kubernetes PGConf india 2019Goutam Tadi
 
Big Data Best Practices on GCP
Big Data Best Practices on GCPBig Data Best Practices on GCP
Big Data Best Practices on GCPAllCloud
 
Big Data Best Practices on GCP
Big Data Best Practices on GCPBig Data Best Practices on GCP
Big Data Best Practices on GCPAllCloud
 
A Hybrid Cloud MultiCloud Approach to Streamline Supply Chain Data Flow
A Hybrid Cloud MultiCloud Approach to Streamline Supply Chain Data FlowA Hybrid Cloud MultiCloud Approach to Streamline Supply Chain Data Flow
A Hybrid Cloud MultiCloud Approach to Streamline Supply Chain Data Flowjagada7
 
Comparing three data ingestion approaches where Apache Kafka integrates with ...
Comparing three data ingestion approaches where Apache Kafka integrates with ...Comparing three data ingestion approaches where Apache Kafka integrates with ...
Comparing three data ingestion approaches where Apache Kafka integrates with ...HostedbyConfluent
 
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...Denodo
 
Building an analytics workflow using Apache Airflow
Building an analytics workflow using Apache AirflowBuilding an analytics workflow using Apache Airflow
Building an analytics workflow using Apache AirflowYohei Onishi
 
Discover PostGIS: Add Spatial functions to PostgreSQL
Discover PostGIS: Add Spatial functions to PostgreSQLDiscover PostGIS: Add Spatial functions to PostgreSQL
Discover PostGIS: Add Spatial functions to PostgreSQLEDB
 

Similar to Maximize Greenplum For Any Use Cases Decoupling Compute and Storage - Greenplum Summit 2019 (20)

DDN GS7K - Easy-to-deploy, High Performance Scale-Out Parallel File System Ap...
DDN GS7K - Easy-to-deploy, High Performance Scale-Out Parallel File System Ap...DDN GS7K - Easy-to-deploy, High Performance Scale-Out Parallel File System Ap...
DDN GS7K - Easy-to-deploy, High Performance Scale-Out Parallel File System Ap...
 
Z Data Tools and APIs Overview
Z Data Tools and APIs OverviewZ Data Tools and APIs Overview
Z Data Tools and APIs Overview
 
IBM THINK 2018 - IBM Cloud SQL Query Introduction
IBM THINK 2018 - IBM Cloud SQL Query IntroductionIBM THINK 2018 - IBM Cloud SQL Query Introduction
IBM THINK 2018 - IBM Cloud SQL Query Introduction
 
Continuous Data Replication into Cloud Storage with Oracle GoldenGate
Continuous Data Replication into Cloud Storage with Oracle GoldenGateContinuous Data Replication into Cloud Storage with Oracle GoldenGate
Continuous Data Replication into Cloud Storage with Oracle GoldenGate
 
Distributed deep learning reference architecture v3.2l
Distributed deep learning reference architecture v3.2lDistributed deep learning reference architecture v3.2l
Distributed deep learning reference architecture v3.2l
 
DevOpsDaysRiga 2018: Eric Skoglund, Lars Albertsson - Kubernetes as data plat...
DevOpsDaysRiga 2018: Eric Skoglund, Lars Albertsson - Kubernetes as data plat...DevOpsDaysRiga 2018: Eric Skoglund, Lars Albertsson - Kubernetes as data plat...
DevOpsDaysRiga 2018: Eric Skoglund, Lars Albertsson - Kubernetes as data plat...
 
Kubernetes as data platform
Kubernetes as data platformKubernetes as data platform
Kubernetes as data platform
 
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
 
Solving enterprise challenges through scale out storage &amp; big compute final
Solving enterprise challenges through scale out storage &amp; big compute finalSolving enterprise challenges through scale out storage &amp; big compute final
Solving enterprise challenges through scale out storage &amp; big compute final
 
IBM THINK 2019 - Self-Service Cloud Data Management with SQL
IBM THINK 2019 - Self-Service Cloud Data Management with SQL IBM THINK 2019 - Self-Service Cloud Data Management with SQL
IBM THINK 2019 - Self-Service Cloud Data Management with SQL
 
Sizing Splunk SmartStore - Spend Less and Get More Out of Splunk
Sizing Splunk SmartStore - Spend Less and Get More Out of SplunkSizing Splunk SmartStore - Spend Less and Get More Out of Splunk
Sizing Splunk SmartStore - Spend Less and Get More Out of Splunk
 
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQueryCodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
 
Greenplum for Kubernetes PGConf india 2019
Greenplum for Kubernetes PGConf india 2019Greenplum for Kubernetes PGConf india 2019
Greenplum for Kubernetes PGConf india 2019
 
Big Data Best Practices on GCP
Big Data Best Practices on GCPBig Data Best Practices on GCP
Big Data Best Practices on GCP
 
Big Data Best Practices on GCP
Big Data Best Practices on GCPBig Data Best Practices on GCP
Big Data Best Practices on GCP
 
A Hybrid Cloud MultiCloud Approach to Streamline Supply Chain Data Flow
A Hybrid Cloud MultiCloud Approach to Streamline Supply Chain Data FlowA Hybrid Cloud MultiCloud Approach to Streamline Supply Chain Data Flow
A Hybrid Cloud MultiCloud Approach to Streamline Supply Chain Data Flow
 
Comparing three data ingestion approaches where Apache Kafka integrates with ...
Comparing three data ingestion approaches where Apache Kafka integrates with ...Comparing three data ingestion approaches where Apache Kafka integrates with ...
Comparing three data ingestion approaches where Apache Kafka integrates with ...
 
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
 
Building an analytics workflow using Apache Airflow
Building an analytics workflow using Apache AirflowBuilding an analytics workflow using Apache Airflow
Building an analytics workflow using Apache Airflow
 
Discover PostGIS: Add Spatial functions to PostgreSQL
Discover PostGIS: Add Spatial functions to PostgreSQLDiscover PostGIS: Add Spatial functions to PostgreSQL
Discover PostGIS: Add Spatial functions to PostgreSQL
 

More from VMware Tanzu

What AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About ItWhat AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About ItVMware Tanzu
 
Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023VMware Tanzu
 
Enhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at ScaleEnhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at ScaleVMware Tanzu
 
Spring Update | July 2023
Spring Update | July 2023Spring Update | July 2023
Spring Update | July 2023VMware Tanzu
 
Platforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a ProductPlatforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a ProductVMware Tanzu
 
Building Cloud Ready Apps
Building Cloud Ready AppsBuilding Cloud Ready Apps
Building Cloud Ready AppsVMware Tanzu
 
Spring Boot 3 And Beyond
Spring Boot 3 And BeyondSpring Boot 3 And Beyond
Spring Boot 3 And BeyondVMware Tanzu
 
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdfSpring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdfVMware Tanzu
 
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023VMware Tanzu
 
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023VMware Tanzu
 
tanzu_developer_connect.pptx
tanzu_developer_connect.pptxtanzu_developer_connect.pptx
tanzu_developer_connect.pptxVMware Tanzu
 
Tanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - FrenchTanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - FrenchVMware Tanzu
 
Tanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - EnglishTanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - EnglishVMware Tanzu
 
Virtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - EnglishVirtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - EnglishVMware Tanzu
 
Tanzu Developer Connect - French
Tanzu Developer Connect - FrenchTanzu Developer Connect - French
Tanzu Developer Connect - FrenchVMware Tanzu
 
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023VMware Tanzu
 
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring BootSpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring BootVMware Tanzu
 
SpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software EngineerSpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software EngineerVMware Tanzu
 
SpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs PracticeSpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs PracticeVMware Tanzu
 
SpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense SolutionsSpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense SolutionsVMware Tanzu
 

More from VMware Tanzu (20)

What AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About ItWhat AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About It
 
Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023
 
Enhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at ScaleEnhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at Scale
 
Spring Update | July 2023
Spring Update | July 2023Spring Update | July 2023
Spring Update | July 2023
 
Platforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a ProductPlatforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a Product
 
Building Cloud Ready Apps
Building Cloud Ready AppsBuilding Cloud Ready Apps
Building Cloud Ready Apps
 
Spring Boot 3 And Beyond
Spring Boot 3 And BeyondSpring Boot 3 And Beyond
Spring Boot 3 And Beyond
 
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdfSpring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
 
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
 
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
 
tanzu_developer_connect.pptx
tanzu_developer_connect.pptxtanzu_developer_connect.pptx
tanzu_developer_connect.pptx
 
Tanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - FrenchTanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - French
 
Tanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - EnglishTanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - English
 
Virtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - EnglishVirtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - English
 
Tanzu Developer Connect - French
Tanzu Developer Connect - FrenchTanzu Developer Connect - French
Tanzu Developer Connect - French
 
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
 
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring BootSpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
 
SpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software EngineerSpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software Engineer
 
SpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs PracticeSpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs Practice
 
SpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense SolutionsSpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
 

Recently uploaded

Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 

Recently uploaded (20)

Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 

Maximize Greenplum For Any Use Cases Decoupling Compute and Storage - Greenplum Summit 2019

  • 1.
  • 2. © Copyright 2019 Pivotal Software, Inc. All rights Reserved. Shivram Mani Francisco Guerrero @shivram @frankgh Maximize Greenplum For Any Use Cases Decoupling Compute and Storage
  • 3. Cover w/ Image Agenda ■ Enterprise Data Landscape ■ Accessing External Data from Greenplum ■ Platform Extension Framework (PXF) ■ Use Cases ■ Q+A
  • 4. © Copyright 2019 Pivotal Software, Inc. All rights Reserved.© Copyright 2019 Pivotal Software, Inc. All rights Reserved. Enterprise Data Landscape
  • 5. The Wild Wild West of Data ? 5
  • 6. Greenplum uses PXF as a federated query engine to access external heterogeneous data.
  • 7. Platform Extension Framework (PXF) Tabular view for heterogeneous data Built-in connectors for various data sources/formats Pluggable framework Parallel high throughput data access Open source Read and write external data 7
  • 8. Architecture of PXF Master Host External Data Segment Host 1 seg1 seg2 seg3 PXF Segment Host 2 seg4 seg5 seg6 PXF 8
  • 9. Q: How can I access sales data residing in an S3 bucket stored in parquet format? Greenplum External Table CREATE EXTERNAL TABLE sales (cust int, sku text, amount decimal, date date) LOCATION ('pxf://s3-bucket/2018/sales/?PROFILE=s3:parquet&SERVER=s3_sales') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import') profilepath to data server 9
  • 10. How can we scale performance when querying remote data ?
  • 11. Performance - Predicate Pushdown state=NY state=NJ state=CA state=CA {state='CA'} SELECT item, amount FROM orders WHERE state = 'CA'MASTER SEGMENT predicates : state=CA PXF with JDBC Row oriented storage format ● Predicate information pushed to external system ● External engines can support predicates for its own queries (e.g. JDBC) ● No filtering within PXF itself ● Partition pruning (e.g. Hive)
  • 12. Performance - Column Projection date: {item:, amount:, state='CA'} SELECT item, amount FROM orders WHERE state = 'CA'MASTER SEGMENT columns : item, amount predicates : state=CA aggregates : count PXF with Hive/ORC Columnar storage format ● Propagate columns projection metadata to external systems ● JDBC, Parquet & ORC ● Reduces Network I/O ● Reduces Remote Disk I/O ● Improved performance for aggregate queries state: amount: item:
  • 13. © Copyright 2019 Pivotal Software, Inc. All rights Reserved.© Copyright 2019 Pivotal Software, Inc. All rights Reserved. Use Cases
  • 14. Use Case: Multi-temperature data querying ● Storage based on operational requirements ● Can I work with data created few second ago ? ● Can I run a report on data from few days ago ? ● Can I inspect the data archived months or years ago ? In-Memory Database RDBMS dataData Lake HOT DATA WARM DATA COLD DATA 14
  • 15. Use Case: Elastic scaling with Greenplum ● Greenplum on K8s for elastic compute ● Elastic storage with S3/Azure/Google ● Ability to separate compute from storage ● On-demand data warehouses 15
  • 16. Use Case: Access Heterogenous data on multiple clouds ● Different cloud providers based on business requirements ● Low cost storage ● No storage admin ● Data doesn’t need to be copied 16
  • 17. Use Case: Access Heterogenous data on multiple clouds Historical_Orders xx xx xx xx Historical_Invoices xx xx xx xx Product_Catalog xx xx xx xx Historical_Orders xx xx xx xx Admin migrates data from s3- bucket-orders to Azure Blob Storage SELECT * FROM historical_orders o, product_catalog p WHERE o.product_id = p.product_id s3-bucket-orders s3-bucket-price Historical_Invoices xx xx xx xx 17 SELECT * FROM historical_orders o, product_catalog p WHERE o.product_id = p.product_id
  • 18. Use Case: Access Heterogenous data on multiple cloud CREATE EXTERNAL TABLE historical_orders (item int, amount money) LOCATION ('pxf://s3-bucket-orders/path?PROFILE=s3:parquet&SERVER=s3_orders') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'); CREATE EXTERNAL TABLE historical_orders (item int, amount money) LOCATION ('pxf://my.azuredatalakestore.net/path?PROFILE=adl:parquet&SERVER=azure') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'); Historical_orders table data on S3 Historical_orders table data now on Azure Data Lake 18
  • 19. Summary Greenplum embraces the modern data landscape ● Scale and manage compute independently from storage ● Federate queries across heterogeneous data sources ● Cloud Agnostic Data is available for analytics with Greenplum no matter its form and where it resides! 19
  • 20. #ScaleMatters © Copyright 2019 Pivotal Software, Inc. All rights Reserved.
  • 21. Cover w/ Image Greenplum External Table Define an external table with the following: ● the schema of the external data ● the protocol pxf ● the location of the data in an external system ● the profile to identify the specific connector ● The compressions_codec of the data ● the format of the external data CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name ( col_name data_type [,...] | LIKE other_table ) LOCATION ('pxf://<path to data>? PROFILE=[<profile_name>|<data_store:data_type>]& COMPRESSIONG_CODEC=[snappy|gzip|lzo|bzip2]& [&<CUSTOM_OPTIONS>=<value>[...]]’) FORMAT '[TEXT|CSV|CUSTOM]' cust, sku, amount, date 1234, ABC, $9.90, 4/01 1235, CDE, $8.80, 3/30 CREATE EXTERNAL TABLE sales (cust int, sku text, amount decimal, date date) LOCATION ('pxf:///2018/sales.csv?PROFILE=hdfs:text') FORMAT 'TEXT'
  • 22. Cover w/ Image PXF supports accessing multiple external datastores simultaneously ● server identifies an external datastore ● Staging directory server/ under ${PXF_CONF} ● Contains relevant configuration files under servers/{server_name}/ ○ HDFS: core-site.xml, hdfs-site.xml, ... ○ S3: s3-site.xml containing access properties PXF Multi Server CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name ( col_name data_type [,...] | LIKE other_table ) LOCATION ('pxf://<path to data>? PROFILE=<data_store:data_type>& SERVER=<server_name>’) CREATE EXTERNAL TABLE sales (cust int, sku text, amount decimal, date date) LOCATION ('pxf://s3-bucket- sales/2018/sales.csv?PROFILE=s3:text&server=s3_s ales’) FORMAT 'TEXT' cust, sku, amount, date 1234, ABC, $9.90, 4/01 1235, CDE, $8.80, 3/30
  • 23. Performance in PXF ● Parallel access to data ● Predicate pushdown ● Column projection 23 SELECT item, amount WHERE state = 'CA' column projection predicate pushdown
  • 24. Performance in PXF ● Parallel access to data ● Column Projection ● Predicate Pushdown 24 SELECT item, amount WHERE state = 'CA' column projection predicate pushdown