SlideShare a Scribd company logo
1 of 40
Download to read offline
HTAP By AccidentHTAP By Accident
Confidential & Proprietary©Swarm64 AS, 2019 2
There's Gold In Them Thar Hills
"The stone age did not end for the lack of stone,
and the oil age will end long before the world runs out of oil."
Organizations store lots of transactional data.
They want to derive value from that data.
Fast.
Confidential & Proprietary©Swarm64 AS, 2019 3
From OLTP To HTAP
Data engineering today:
driven by the right tool for the right job
Elasticsearch for search / relevance
Hadoop for Map-Reduce
CouchDB and MongoDB for document stores
Tableau for BI analytics and visualization
Why?
Confidential & Proprietary©Swarm64 AS, 2019 4
The Vs Of Big Data
Value
Veracity
VelocityVolume
Variety
DBAs & Analysts: One DB, Two Perspectives
Confidential & Proprietary©Swarm64 AS, 2019 6
DBAs
Volume & Velocity
Care about
Install & configure databases
Monitor performance
Do capacity planning
They
Confidential & Proprietary©Swarm64 AS, 2019 7
Analysts
Veracity & ValueSolve business problems by learning from data
Convert data into information
Help businesses make better decision using
information
Care aboutThey
Confidential & Proprietary©Swarm64 AS, 2019 8
HTAP By Accident
PA UL A DA M S
VP Engineering
S E B A S T I A N DRE S S LE R
Team Lead Solution Engineering
Confidential & Proprietary©Swarm64 AS, 2019 9
The DBAs Benchmark
Standardized & comparable: TPC-H*
DWH benchmark
22 Queries
8 Tables (2 fact, 6 dimension)
Various scale factors (up to PB)
Run single or multiple streams
(*) TPC-DS is newer but there are fewer (semi-)official data points to compare to
Confidential & Proprietary©Swarm64 AS, 2019 10
Setup
Hardware
Dual Intel Xeon Gold 6140
384GB RAM
8x960 GB SSD
Software
CentOS 7.6
PostgreSQL 11.3
Swarm64 DA 2.0
TPC-H
1 TB worth of data
Biggest tables: 6bn & 1.5bn rows
Configurations
Single Node
1 Coordinator + 2 Data Nodes
2 Coordinators + 2 Data Nodes
Same as above + Swarm64 DA
Part 1: Postgres for Analytics (Single
Instance)
Confidential & Proprietary©Swarm64 AS, 2019 12
Postgres & Analytics
Postgres
One of the world's most trusted and powerful databases
Maturity built on decades of community-driven development
Well-respected for OLTP workloads
Very capable in HTAP and OLAP
Well, there is a "but"...
Let's Look At Some Analytics Queries
Confidential & Proprietary©Swarm64 AS, 2019 14
Queries By Example: TPC-H Q6
SELECT
SUM(l_extendedprice * l_discount) AS revenue
FROM
lineitem
WHERE l_shipdate >= DATE '1993-01-01'
AND l_shipdate < DATE '1993-01-01' + INTERVAL '1' YEAR
AND l_discount BETWEEN 0.05 - 0.01 AND 0.05 + 0.01
AND l_quantity < 24;
Scanning
> Parallelism helps, yet limited
> Indices may cause scatter-gather
Statistics help
> They narrow selectivity
Typical runtime: 10min
Confidential & Proprietary©Swarm64 AS, 2019 15
Queries By Example: TPC-H Q12
SELECT l_shipmode,
SUM(CASE WHEN o_orderpriority = '1-URGENT'
OR o_orderpriority = '2-HIGH'
THEN 1 ELSE 0 END)AS high_line_count,
SUM(CASE WHEN o_orderpriority <> '1-URGENT'
AND o_orderpriority <> '2-HIGH'
THEN 1 ELSE 0 END) AS low_line_count
FROM
orders, lineitem
WHERE o_orderkey = l_orderkey
AND l_shipmode IN ('TRUCK', 'AIR')
AND l_commitdate < l_receiptdate
AND l_shipdate < l_commitdate
AND l_receiptdate >= DATE '1996-01-01'
AND l_receiptdate < DATE '1996-01-01' + INTERVAL '1' YEAR
GROUP BY l_shipmode
ORDER BY l_shipmode;
Typical runtime: >10min
Expensive finalization
> Early data reduction is key
JOIN
> May scatter-gather
Filtering
> Data reduction point #1
> Parallelism helps
GROUP BY
> Data reduction point #2
Demo Time
Confidential & Proprietary©Swarm64 AS, 2019 17
Q6 Single Node
The Analyst's Point Of View
Confidential & Proprietary©Swarm64 AS, 2019 19
Analytics On Real World Datasets
Question
How much more generous are
passengers being picked up at The Dead
Rabbit than those being dropped off?
NYC Taxi
Billions of rows
Pickup location, drop-off location, tip, fare, ...
Confidential & Proprietary©Swarm64 AS, 2019 20
The Analyst's Query
SELECT
(outbound.tip - inbound.tip) / inbound.tip * 100
AS generosity_increase
FROM (
SELECT AVG(tip_amount / fare_amount) AS tip
FROM trip_dropoff_locations
JOIN trips ON trip_dropoff_locations.id = trips.id
WHERE (trip_dropoff_locations.longitude
BETWEEN -74.0114803745 AND -74.0105174615)
AND (trip_dropoff_locations.latitude
BETWEEN 40.7030212228 AND 40.7032184606)
AND tip_amount != 0
AND fare_amount != 0) inbound
CROSS JOIN (
SELECT AVG(tip_amount / fare_amount) AS tip
FROM trip_pickup_locations
JOIN trips ON trip_pickup_locations.id = trips.id
WHERE (trip_pickup_locations.longitude
BETWEEN -74.0114803745 AND -74.0105174615)
AND (trip_pickup_locations.latitude
BETWEEN 40.7030212228 AND 40.7032184606)
AND tip_amount != 0
AND fare_amount != 0) outbound;
Range-based scan
> Index helps to reduce data volume
JOIN
> May scatter-gather
Serial repetition
> Similar query, executes in sequence
Filtering
> Reduces data volume
Confidential & Proprietary©Swarm64 AS, 2019 21
Analytics On Real World Datasets
SELECT
(outbound.tip - inbound.tip) / inbound.tip * 100
AS generosity_increase
FROM (
SELECT AVG(tip_amount / fare_amount) AS tip
FROM trip_dropoff_locations
JOIN trips ON trip_dropoff_locations.id = trips.id
WHERE (trip_dropoff_locations.longitude
BETWEEN -74.0114803745 AND -74.0105174615)
AND (trip_dropoff_locations.latitude
BETWEEN 40.7030212228 AND 40.7032184606)
AND tip_amount != 0
AND fare_amount != 0) inbound
CROSS JOIN (
SELECT AVG(tip_amount / fare_amount) AS tip
FROM trip_pickup_locations
JOIN trips ON trip_pickup_locations.id = trips.id
WHERE (trip_pickup_locations.longitude
BETWEEN -74.0114803745 AND -74.0105174615)
AND (trip_pickup_locations.latitude
BETWEEN 40.7030212228 AND 40.7032184606)
AND tip_amount != 0
AND fare_amount != 0) outbound;
1min 56s
Part 2: Postgres for Analytics (Scaled-Out)
Confidential & Proprietary©Swarm64 AS, 2019 23
Scale-Out: When & Why?
Separation of concerns
I/O intensive vs. CPU intensive
Bottlenecks on your single node
I/O
CPU
RAM
Why is that?
Data grows
More concurrent users
More demanding queries
Confidential & Proprietary©Swarm64 AS, 2019 24
Scale-Out: TypicalApproach
Coordinator
Table metadata
Node to connect to & query on
Does the compute intesive part
Data Nodes
Table data
Nodes where coordinators connect to
Performs scanning and other I/O operations (e.g. filtering)
Confidential & Proprietary©Swarm64 AS, 2019 25
Scale-Out: "3rd-Party Options"
(Patroni)
... there are more.
Confidential & Proprietary©Swarm64 AS, 2019 26
Scale-Out With PG Tools
Native scale-out
Use the Postgres Foreign Data Wrapper extension (postgres_fdw)
Coordinator tables are postgres_fdw tables
Connect to data nodes where the data is actually located
Use partitions for parallelism
Pitfall
postgres_fdw not parallelized out-of-the-box, needs a patch
Confidential & Proprietary©Swarm64 AS, 2019 27
Q6 2 Coordinator + 2 Data Nodes
Confidential & Proprietary©Swarm64 AS, 2019 28
Analytics On Real World Datasets
Same query, better result?
Split computation & I/O
Data gathering on data nodes
Final computation is done on the
coordinator
1min 46s
Part 3: Software & Hardware Acceleration
Confidential & Proprietary©Swarm64 AS, 2019 30
Tuning Postgres Data For Analytics
Add Indices
They help you on point-lookups, range
queries, full text search, ...
Upside
Access data faster
Downside
They cost extra storage & CPU
They can cause non-optimal I/O patterns
Decide for fast reads over fast writes
Add Partitions
Mostly reduce data on range queries by
selecting the right partition
Upside
Lower data volume & better maintenance
(partitions can be plugged out)
Downside
Parallelism might be limited
Changing the partition scheme might be
hard
Are There Any Other Options?
Confidential & Proprietary©Swarm64 AS, 2019 32
How To Increase Parallelism?
Postgres limits parallelism...
to prevent resource over-allocation
to ensure transactional safety, even on a highly loaded system
Patched postgres_fdw
Parallelize scans on remote tables for higher throughput
Workload management
Determine and assign resources prior to query execution
Monitor system state to acknowledge change
Query rewriting
Transform query plans to be executed more efficiently
Confidential & Proprietary©Swarm64 AS, 2019
Optimized Columns
ROW- / COLUMN-HYBRID BLOCKS
UP TO 3 RANGE-INDICES
I/O transfer from storage device
WHERE
ws_order_number
BETWEEN 150
AND 15000
AND
ws_sold_date_sk
BETWEEN 2450820
AND 2452000
WHERE
ws_order_number
BETWEEN 150
AND 15000
AND
ws_sold_date_sk
BETWEEN 2450820
AND 2452000
Confidential & Proprietary©Swarm64 AS, 2019
Decompress Pick RowsPick Columns Result
FROM SELECT
Parallel Plan Optimized Columns
WHERE
Executedon the HW Accelerator
WHERE
34
+ HardwareAcceleration
Confidential & Proprietary©Swarm64 AS, 2019 35
Q6 Side-By-Side
Confidential & Proprietary©Swarm64 AS, 2019 36
Q6 2 Coordinator + 2 Data Nodes + HWAcceleration
Confidential & Proprietary©Swarm64 AS, 2019 37
Analytics On Real World Datasets
Same query, best result?
Split computation & I/O
Data gathering on data nodes
Final computation is done on the
coordinator
Plus
Higher scan & filter parallelism
Higher throughput due to compression
21s
Confidential & Proprietary©Swarm64 AS, 2019 38
Conclusions
Postgres is great for analytics!
Single node performs well
Postgres can scale-out natively
Hardware and software optimizations allow for greater parallelism and higher throughput
Postgres can be a true analytics engine
Confidential & Proprietary©Swarm64 AS, 2019 39
Got Questions?
Come and find us in the exhibitor area
PA UL A DA M S
VP Engineering
paul.adams@swarm64.co
m
@theRealPAdams
S E B A S T I A N DRE S S LE R
Team Lead Solution Engineering
sebastian@swarm64.com
@theDressler
info@swarm64.com
Follow us:
©Swarm64 AS, 2019

More Related Content

What's hot

Preview of the EDB Postgres Roadmap
Preview of the EDB Postgres RoadmapPreview of the EDB Postgres Roadmap
Preview of the EDB Postgres RoadmapEDB
 
Best Practices for Monitoring Postgres
Best Practices for Monitoring Postgres Best Practices for Monitoring Postgres
Best Practices for Monitoring Postgres EDB
 
No Time to Waste: Migrate from Oracle to Postgres in Minutes
No Time to Waste: Migrate from Oracle to Postgres in MinutesNo Time to Waste: Migrate from Oracle to Postgres in Minutes
No Time to Waste: Migrate from Oracle to Postgres in MinutesEDB
 
New Strategies for Database Modernization
New Strategies for Database ModernizationNew Strategies for Database Modernization
New Strategies for Database ModernizationEDB
 
Automating a PostgreSQL High Availability Architecture with Ansible
Automating a PostgreSQL High Availability Architecture with AnsibleAutomating a PostgreSQL High Availability Architecture with Ansible
Automating a PostgreSQL High Availability Architecture with AnsibleEDB
 
Public Sector Virtual Town Hall
Public Sector Virtual Town HallPublic Sector Virtual Town Hall
Public Sector Virtual Town HallEDB
 
No Time to Waste: Migrate from Oracle to EDB Postgres in Minutes
No Time to Waste: Migrate from Oracle to EDB Postgres in MinutesNo Time to Waste: Migrate from Oracle to EDB Postgres in Minutes
No Time to Waste: Migrate from Oracle to EDB Postgres in MinutesEDB
 
PostgreSQL to Accelerate Innovation
PostgreSQL to Accelerate InnovationPostgreSQL to Accelerate Innovation
PostgreSQL to Accelerate InnovationEDB
 
New Enterprise Cloud Database Options for 2019
New Enterprise Cloud Database Options for 2019New Enterprise Cloud Database Options for 2019
New Enterprise Cloud Database Options for 2019EDB
 
Running Highly Available Postgres Databases in Containers
Running Highly Available Postgres Databases in ContainersRunning Highly Available Postgres Databases in Containers
Running Highly Available Postgres Databases in ContainersEDB
 
Discover PostGIS: Add Spatial functions to PostgreSQL
Discover PostGIS: Add Spatial functions to PostgreSQLDiscover PostGIS: Add Spatial functions to PostgreSQL
Discover PostGIS: Add Spatial functions to PostgreSQLEDB
 
Delivering Data Science to the Business
Delivering Data Science to the BusinessDelivering Data Science to the Business
Delivering Data Science to the BusinessDataWorks Summit
 
Migrate Today: Proactive Steps to Unhook from Oracle
Migrate Today: Proactive Steps to Unhook from OracleMigrate Today: Proactive Steps to Unhook from Oracle
Migrate Today: Proactive Steps to Unhook from OracleEDB
 
EDB & ELOS Technologies - Break Free from Oracle
EDB & ELOS Technologies - Break Free from OracleEDB & ELOS Technologies - Break Free from Oracle
EDB & ELOS Technologies - Break Free from OracleEDB
 
New Approaches to Integrating Oracle and Postgres Database Strategies
New Approaches to Integrating Oracle and Postgres Database StrategiesNew Approaches to Integrating Oracle and Postgres Database Strategies
New Approaches to Integrating Oracle and Postgres Database StrategiesEDB
 
Reducing Database Pain & Costs with Postgres
Reducing Database Pain & Costs with PostgresReducing Database Pain & Costs with Postgres
Reducing Database Pain & Costs with PostgresEDB
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesDataWorks Summit
 
Not all open source is the same
Not all open source is the sameNot all open source is the same
Not all open source is the sameEDB
 
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerLogical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerDataWorks Summit
 
Doing More With Less: The Economics of Open Source Database Adoption
Doing More With Less: The Economics of Open Source Database AdoptionDoing More With Less: The Economics of Open Source Database Adoption
Doing More With Less: The Economics of Open Source Database AdoptionEDB
 

What's hot (20)

Preview of the EDB Postgres Roadmap
Preview of the EDB Postgres RoadmapPreview of the EDB Postgres Roadmap
Preview of the EDB Postgres Roadmap
 
Best Practices for Monitoring Postgres
Best Practices for Monitoring Postgres Best Practices for Monitoring Postgres
Best Practices for Monitoring Postgres
 
No Time to Waste: Migrate from Oracle to Postgres in Minutes
No Time to Waste: Migrate from Oracle to Postgres in MinutesNo Time to Waste: Migrate from Oracle to Postgres in Minutes
No Time to Waste: Migrate from Oracle to Postgres in Minutes
 
New Strategies for Database Modernization
New Strategies for Database ModernizationNew Strategies for Database Modernization
New Strategies for Database Modernization
 
Automating a PostgreSQL High Availability Architecture with Ansible
Automating a PostgreSQL High Availability Architecture with AnsibleAutomating a PostgreSQL High Availability Architecture with Ansible
Automating a PostgreSQL High Availability Architecture with Ansible
 
Public Sector Virtual Town Hall
Public Sector Virtual Town HallPublic Sector Virtual Town Hall
Public Sector Virtual Town Hall
 
No Time to Waste: Migrate from Oracle to EDB Postgres in Minutes
No Time to Waste: Migrate from Oracle to EDB Postgres in MinutesNo Time to Waste: Migrate from Oracle to EDB Postgres in Minutes
No Time to Waste: Migrate from Oracle to EDB Postgres in Minutes
 
PostgreSQL to Accelerate Innovation
PostgreSQL to Accelerate InnovationPostgreSQL to Accelerate Innovation
PostgreSQL to Accelerate Innovation
 
New Enterprise Cloud Database Options for 2019
New Enterprise Cloud Database Options for 2019New Enterprise Cloud Database Options for 2019
New Enterprise Cloud Database Options for 2019
 
Running Highly Available Postgres Databases in Containers
Running Highly Available Postgres Databases in ContainersRunning Highly Available Postgres Databases in Containers
Running Highly Available Postgres Databases in Containers
 
Discover PostGIS: Add Spatial functions to PostgreSQL
Discover PostGIS: Add Spatial functions to PostgreSQLDiscover PostGIS: Add Spatial functions to PostgreSQL
Discover PostGIS: Add Spatial functions to PostgreSQL
 
Delivering Data Science to the Business
Delivering Data Science to the BusinessDelivering Data Science to the Business
Delivering Data Science to the Business
 
Migrate Today: Proactive Steps to Unhook from Oracle
Migrate Today: Proactive Steps to Unhook from OracleMigrate Today: Proactive Steps to Unhook from Oracle
Migrate Today: Proactive Steps to Unhook from Oracle
 
EDB & ELOS Technologies - Break Free from Oracle
EDB & ELOS Technologies - Break Free from OracleEDB & ELOS Technologies - Break Free from Oracle
EDB & ELOS Technologies - Break Free from Oracle
 
New Approaches to Integrating Oracle and Postgres Database Strategies
New Approaches to Integrating Oracle and Postgres Database StrategiesNew Approaches to Integrating Oracle and Postgres Database Strategies
New Approaches to Integrating Oracle and Postgres Database Strategies
 
Reducing Database Pain & Costs with Postgres
Reducing Database Pain & Costs with PostgresReducing Database Pain & Costs with Postgres
Reducing Database Pain & Costs with Postgres
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management Challenges
 
Not all open source is the same
Not all open source is the sameNot all open source is the same
Not all open source is the same
 
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerLogical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
 
Doing More With Less: The Economics of Open Source Database Adoption
Doing More With Less: The Economics of Open Source Database AdoptionDoing More With Less: The Economics of Open Source Database Adoption
Doing More With Less: The Economics of Open Source Database Adoption
 

Similar to HTAP By Accident: Getting More From PostgreSQL Using Hardware Acceleration

DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkTimothy Spann
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...Timothy Spann
 
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudFSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudAmazon Web Services
 
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...Certus Solutions
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesJames Serra
 
Oracle Goldengate for Big Data - LendingClub Implementation
Oracle Goldengate for Big Data - LendingClub ImplementationOracle Goldengate for Big Data - LendingClub Implementation
Oracle Goldengate for Big Data - LendingClub ImplementationVengata Guruswamy
 
Cloud Computing ...changes everything
Cloud Computing ...changes everythingCloud Computing ...changes everything
Cloud Computing ...changes everythingLew Tucker
 
Loading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftLoading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftAmazon Web Services
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationDatabricks
 
Big Data on Azure Tutorial
Big Data on Azure TutorialBig Data on Azure Tutorial
Big Data on Azure Tutorialrustd
 
Loading Data into Redshift with Lab
Loading Data into Redshift with LabLoading Data into Redshift with Lab
Loading Data into Redshift with LabAmazon Web Services
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...DataStax
 
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...Amazon Web Services
 
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...Carol McDonald
 
Loading Data into Amazon Redshift
Loading Data into Amazon RedshiftLoading Data into Amazon Redshift
Loading Data into Amazon RedshiftAmazon Web Services
 
Loading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SFLoading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SFAmazon Web Services
 
Cross-Tier Application and Data Partitioning of Web Applications for Hybrid C...
Cross-Tier Application and Data Partitioning of Web Applications for Hybrid C...Cross-Tier Application and Data Partitioning of Web Applications for Hybrid C...
Cross-Tier Application and Data Partitioning of Web Applications for Hybrid C...nimak
 
OLAP on the Cloud with Azure Databricks and Azure Synapse
OLAP on the Cloud with Azure Databricks and Azure SynapseOLAP on the Cloud with Azure Databricks and Azure Synapse
OLAP on the Cloud with Azure Databricks and Azure SynapseAtScale
 

Similar to HTAP By Accident: Getting More From PostgreSQL Using Hardware Acceleration (20)

DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
 
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudFSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
 
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use Cases
 
Oracle Goldengate for Big Data - LendingClub Implementation
Oracle Goldengate for Big Data - LendingClub ImplementationOracle Goldengate for Big Data - LendingClub Implementation
Oracle Goldengate for Big Data - LendingClub Implementation
 
Cloud Computing ...changes everything
Cloud Computing ...changes everythingCloud Computing ...changes everything
Cloud Computing ...changes everything
 
Loading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftLoading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF Loft
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop Migration
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 
Big Data on Azure Tutorial
Big Data on Azure TutorialBig Data on Azure Tutorial
Big Data on Azure Tutorial
 
Loading Data into Redshift with Lab
Loading Data into Redshift with LabLoading Data into Redshift with Lab
Loading Data into Redshift with Lab
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
 
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
 
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...
 
Loading Data into Amazon Redshift
Loading Data into Amazon RedshiftLoading Data into Amazon Redshift
Loading Data into Amazon Redshift
 
Loading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SFLoading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SF
 
Cross-Tier Application and Data Partitioning of Web Applications for Hybrid C...
Cross-Tier Application and Data Partitioning of Web Applications for Hybrid C...Cross-Tier Application and Data Partitioning of Web Applications for Hybrid C...
Cross-Tier Application and Data Partitioning of Web Applications for Hybrid C...
 
OLAP on the Cloud with Azure Databricks and Azure Synapse
OLAP on the Cloud with Azure Databricks and Azure SynapseOLAP on the Cloud with Azure Databricks and Azure Synapse
OLAP on the Cloud with Azure Databricks and Azure Synapse
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 

More from EDB

Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaSCloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaSEDB
 
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr UnternehmenDie 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr UnternehmenEDB
 
Migre sus bases de datos Oracle a la nube
Migre sus bases de datos Oracle a la nube Migre sus bases de datos Oracle a la nube
Migre sus bases de datos Oracle a la nube EDB
 
EFM Office Hours - APJ - July 29, 2021
EFM Office Hours - APJ - July 29, 2021EFM Office Hours - APJ - July 29, 2021
EFM Office Hours - APJ - July 29, 2021EDB
 
Benchmarking Cloud Native PostgreSQL
Benchmarking Cloud Native PostgreSQLBenchmarking Cloud Native PostgreSQL
Benchmarking Cloud Native PostgreSQLEDB
 
Las Variaciones de la Replicación de PostgreSQL
Las Variaciones de la Replicación de PostgreSQLLas Variaciones de la Replicación de PostgreSQL
Las Variaciones de la Replicación de PostgreSQLEDB
 
NoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQLNoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQLEDB
 
Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?EDB
 
Data Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQLData Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQLEDB
 
Practical Partitioning in Production with Postgres
Practical Partitioning in Production with PostgresPractical Partitioning in Production with Postgres
Practical Partitioning in Production with PostgresEDB
 
A Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINA Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINEDB
 
IOT with PostgreSQL
IOT with PostgreSQLIOT with PostgreSQL
IOT with PostgreSQLEDB
 
A Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQLA Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQLEDB
 
Psql is awesome!
Psql is awesome!Psql is awesome!
Psql is awesome!EDB
 
EDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJEDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJEDB
 
Comment sauvegarder correctement vos données
Comment sauvegarder correctement vos donnéesComment sauvegarder correctement vos données
Comment sauvegarder correctement vos donnéesEDB
 
Cloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - ItalianoCloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - ItalianoEDB
 
New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13EDB
 
Best Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLBest Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLEDB
 
Cloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJCloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJEDB
 

More from EDB (20)

Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaSCloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
 
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr UnternehmenDie 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
 
Migre sus bases de datos Oracle a la nube
Migre sus bases de datos Oracle a la nube Migre sus bases de datos Oracle a la nube
Migre sus bases de datos Oracle a la nube
 
EFM Office Hours - APJ - July 29, 2021
EFM Office Hours - APJ - July 29, 2021EFM Office Hours - APJ - July 29, 2021
EFM Office Hours - APJ - July 29, 2021
 
Benchmarking Cloud Native PostgreSQL
Benchmarking Cloud Native PostgreSQLBenchmarking Cloud Native PostgreSQL
Benchmarking Cloud Native PostgreSQL
 
Las Variaciones de la Replicación de PostgreSQL
Las Variaciones de la Replicación de PostgreSQLLas Variaciones de la Replicación de PostgreSQL
Las Variaciones de la Replicación de PostgreSQL
 
NoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQLNoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQL
 
Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?
 
Data Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQLData Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQL
 
Practical Partitioning in Production with Postgres
Practical Partitioning in Production with PostgresPractical Partitioning in Production with Postgres
Practical Partitioning in Production with Postgres
 
A Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINA Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAIN
 
IOT with PostgreSQL
IOT with PostgreSQLIOT with PostgreSQL
IOT with PostgreSQL
 
A Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQLA Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQL
 
Psql is awesome!
Psql is awesome!Psql is awesome!
Psql is awesome!
 
EDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJEDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJ
 
Comment sauvegarder correctement vos données
Comment sauvegarder correctement vos donnéesComment sauvegarder correctement vos données
Comment sauvegarder correctement vos données
 
Cloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - ItalianoCloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - Italiano
 
New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13
 
Best Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLBest Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQL
 
Cloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJCloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJ
 

Recently uploaded

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 

Recently uploaded (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 

HTAP By Accident: Getting More From PostgreSQL Using Hardware Acceleration

  • 1. HTAP By AccidentHTAP By Accident
  • 2. Confidential & Proprietary©Swarm64 AS, 2019 2 There's Gold In Them Thar Hills "The stone age did not end for the lack of stone, and the oil age will end long before the world runs out of oil." Organizations store lots of transactional data. They want to derive value from that data. Fast.
  • 3. Confidential & Proprietary©Swarm64 AS, 2019 3 From OLTP To HTAP Data engineering today: driven by the right tool for the right job Elasticsearch for search / relevance Hadoop for Map-Reduce CouchDB and MongoDB for document stores Tableau for BI analytics and visualization Why?
  • 4. Confidential & Proprietary©Swarm64 AS, 2019 4 The Vs Of Big Data Value Veracity VelocityVolume Variety
  • 5. DBAs & Analysts: One DB, Two Perspectives
  • 6. Confidential & Proprietary©Swarm64 AS, 2019 6 DBAs Volume & Velocity Care about Install & configure databases Monitor performance Do capacity planning They
  • 7. Confidential & Proprietary©Swarm64 AS, 2019 7 Analysts Veracity & ValueSolve business problems by learning from data Convert data into information Help businesses make better decision using information Care aboutThey
  • 8. Confidential & Proprietary©Swarm64 AS, 2019 8 HTAP By Accident PA UL A DA M S VP Engineering S E B A S T I A N DRE S S LE R Team Lead Solution Engineering
  • 9. Confidential & Proprietary©Swarm64 AS, 2019 9 The DBAs Benchmark Standardized & comparable: TPC-H* DWH benchmark 22 Queries 8 Tables (2 fact, 6 dimension) Various scale factors (up to PB) Run single or multiple streams (*) TPC-DS is newer but there are fewer (semi-)official data points to compare to
  • 10. Confidential & Proprietary©Swarm64 AS, 2019 10 Setup Hardware Dual Intel Xeon Gold 6140 384GB RAM 8x960 GB SSD Software CentOS 7.6 PostgreSQL 11.3 Swarm64 DA 2.0 TPC-H 1 TB worth of data Biggest tables: 6bn & 1.5bn rows Configurations Single Node 1 Coordinator + 2 Data Nodes 2 Coordinators + 2 Data Nodes Same as above + Swarm64 DA
  • 11. Part 1: Postgres for Analytics (Single Instance)
  • 12. Confidential & Proprietary©Swarm64 AS, 2019 12 Postgres & Analytics Postgres One of the world's most trusted and powerful databases Maturity built on decades of community-driven development Well-respected for OLTP workloads Very capable in HTAP and OLAP Well, there is a "but"...
  • 13. Let's Look At Some Analytics Queries
  • 14. Confidential & Proprietary©Swarm64 AS, 2019 14 Queries By Example: TPC-H Q6 SELECT SUM(l_extendedprice * l_discount) AS revenue FROM lineitem WHERE l_shipdate >= DATE '1993-01-01' AND l_shipdate < DATE '1993-01-01' + INTERVAL '1' YEAR AND l_discount BETWEEN 0.05 - 0.01 AND 0.05 + 0.01 AND l_quantity < 24; Scanning > Parallelism helps, yet limited > Indices may cause scatter-gather Statistics help > They narrow selectivity Typical runtime: 10min
  • 15. Confidential & Proprietary©Swarm64 AS, 2019 15 Queries By Example: TPC-H Q12 SELECT l_shipmode, SUM(CASE WHEN o_orderpriority = '1-URGENT' OR o_orderpriority = '2-HIGH' THEN 1 ELSE 0 END)AS high_line_count, SUM(CASE WHEN o_orderpriority <> '1-URGENT' AND o_orderpriority <> '2-HIGH' THEN 1 ELSE 0 END) AS low_line_count FROM orders, lineitem WHERE o_orderkey = l_orderkey AND l_shipmode IN ('TRUCK', 'AIR') AND l_commitdate < l_receiptdate AND l_shipdate < l_commitdate AND l_receiptdate >= DATE '1996-01-01' AND l_receiptdate < DATE '1996-01-01' + INTERVAL '1' YEAR GROUP BY l_shipmode ORDER BY l_shipmode; Typical runtime: >10min Expensive finalization > Early data reduction is key JOIN > May scatter-gather Filtering > Data reduction point #1 > Parallelism helps GROUP BY > Data reduction point #2
  • 17. Confidential & Proprietary©Swarm64 AS, 2019 17 Q6 Single Node
  • 19. Confidential & Proprietary©Swarm64 AS, 2019 19 Analytics On Real World Datasets Question How much more generous are passengers being picked up at The Dead Rabbit than those being dropped off? NYC Taxi Billions of rows Pickup location, drop-off location, tip, fare, ...
  • 20. Confidential & Proprietary©Swarm64 AS, 2019 20 The Analyst's Query SELECT (outbound.tip - inbound.tip) / inbound.tip * 100 AS generosity_increase FROM ( SELECT AVG(tip_amount / fare_amount) AS tip FROM trip_dropoff_locations JOIN trips ON trip_dropoff_locations.id = trips.id WHERE (trip_dropoff_locations.longitude BETWEEN -74.0114803745 AND -74.0105174615) AND (trip_dropoff_locations.latitude BETWEEN 40.7030212228 AND 40.7032184606) AND tip_amount != 0 AND fare_amount != 0) inbound CROSS JOIN ( SELECT AVG(tip_amount / fare_amount) AS tip FROM trip_pickup_locations JOIN trips ON trip_pickup_locations.id = trips.id WHERE (trip_pickup_locations.longitude BETWEEN -74.0114803745 AND -74.0105174615) AND (trip_pickup_locations.latitude BETWEEN 40.7030212228 AND 40.7032184606) AND tip_amount != 0 AND fare_amount != 0) outbound; Range-based scan > Index helps to reduce data volume JOIN > May scatter-gather Serial repetition > Similar query, executes in sequence Filtering > Reduces data volume
  • 21. Confidential & Proprietary©Swarm64 AS, 2019 21 Analytics On Real World Datasets SELECT (outbound.tip - inbound.tip) / inbound.tip * 100 AS generosity_increase FROM ( SELECT AVG(tip_amount / fare_amount) AS tip FROM trip_dropoff_locations JOIN trips ON trip_dropoff_locations.id = trips.id WHERE (trip_dropoff_locations.longitude BETWEEN -74.0114803745 AND -74.0105174615) AND (trip_dropoff_locations.latitude BETWEEN 40.7030212228 AND 40.7032184606) AND tip_amount != 0 AND fare_amount != 0) inbound CROSS JOIN ( SELECT AVG(tip_amount / fare_amount) AS tip FROM trip_pickup_locations JOIN trips ON trip_pickup_locations.id = trips.id WHERE (trip_pickup_locations.longitude BETWEEN -74.0114803745 AND -74.0105174615) AND (trip_pickup_locations.latitude BETWEEN 40.7030212228 AND 40.7032184606) AND tip_amount != 0 AND fare_amount != 0) outbound; 1min 56s
  • 22. Part 2: Postgres for Analytics (Scaled-Out)
  • 23. Confidential & Proprietary©Swarm64 AS, 2019 23 Scale-Out: When & Why? Separation of concerns I/O intensive vs. CPU intensive Bottlenecks on your single node I/O CPU RAM Why is that? Data grows More concurrent users More demanding queries
  • 24. Confidential & Proprietary©Swarm64 AS, 2019 24 Scale-Out: TypicalApproach Coordinator Table metadata Node to connect to & query on Does the compute intesive part Data Nodes Table data Nodes where coordinators connect to Performs scanning and other I/O operations (e.g. filtering)
  • 25. Confidential & Proprietary©Swarm64 AS, 2019 25 Scale-Out: "3rd-Party Options" (Patroni) ... there are more.
  • 26. Confidential & Proprietary©Swarm64 AS, 2019 26 Scale-Out With PG Tools Native scale-out Use the Postgres Foreign Data Wrapper extension (postgres_fdw) Coordinator tables are postgres_fdw tables Connect to data nodes where the data is actually located Use partitions for parallelism Pitfall postgres_fdw not parallelized out-of-the-box, needs a patch
  • 27. Confidential & Proprietary©Swarm64 AS, 2019 27 Q6 2 Coordinator + 2 Data Nodes
  • 28. Confidential & Proprietary©Swarm64 AS, 2019 28 Analytics On Real World Datasets Same query, better result? Split computation & I/O Data gathering on data nodes Final computation is done on the coordinator 1min 46s
  • 29. Part 3: Software & Hardware Acceleration
  • 30. Confidential & Proprietary©Swarm64 AS, 2019 30 Tuning Postgres Data For Analytics Add Indices They help you on point-lookups, range queries, full text search, ... Upside Access data faster Downside They cost extra storage & CPU They can cause non-optimal I/O patterns Decide for fast reads over fast writes Add Partitions Mostly reduce data on range queries by selecting the right partition Upside Lower data volume & better maintenance (partitions can be plugged out) Downside Parallelism might be limited Changing the partition scheme might be hard
  • 31. Are There Any Other Options?
  • 32. Confidential & Proprietary©Swarm64 AS, 2019 32 How To Increase Parallelism? Postgres limits parallelism... to prevent resource over-allocation to ensure transactional safety, even on a highly loaded system Patched postgres_fdw Parallelize scans on remote tables for higher throughput Workload management Determine and assign resources prior to query execution Monitor system state to acknowledge change Query rewriting Transform query plans to be executed more efficiently
  • 33. Confidential & Proprietary©Swarm64 AS, 2019 Optimized Columns ROW- / COLUMN-HYBRID BLOCKS UP TO 3 RANGE-INDICES I/O transfer from storage device WHERE ws_order_number BETWEEN 150 AND 15000 AND ws_sold_date_sk BETWEEN 2450820 AND 2452000 WHERE ws_order_number BETWEEN 150 AND 15000 AND ws_sold_date_sk BETWEEN 2450820 AND 2452000
  • 34. Confidential & Proprietary©Swarm64 AS, 2019 Decompress Pick RowsPick Columns Result FROM SELECT Parallel Plan Optimized Columns WHERE Executedon the HW Accelerator WHERE 34 + HardwareAcceleration
  • 35. Confidential & Proprietary©Swarm64 AS, 2019 35 Q6 Side-By-Side
  • 36. Confidential & Proprietary©Swarm64 AS, 2019 36 Q6 2 Coordinator + 2 Data Nodes + HWAcceleration
  • 37. Confidential & Proprietary©Swarm64 AS, 2019 37 Analytics On Real World Datasets Same query, best result? Split computation & I/O Data gathering on data nodes Final computation is done on the coordinator Plus Higher scan & filter parallelism Higher throughput due to compression 21s
  • 38. Confidential & Proprietary©Swarm64 AS, 2019 38 Conclusions Postgres is great for analytics! Single node performs well Postgres can scale-out natively Hardware and software optimizations allow for greater parallelism and higher throughput Postgres can be a true analytics engine
  • 39. Confidential & Proprietary©Swarm64 AS, 2019 39 Got Questions? Come and find us in the exhibitor area PA UL A DA M S VP Engineering paul.adams@swarm64.co m @theRealPAdams S E B A S T I A N DRE S S LE R Team Lead Solution Engineering sebastian@swarm64.com @theDressler