SlideShare una empresa de Scribd logo
1 de 41
Descargar para leer sin conexión
Distributed Computing on
PostgreSQL
Marco Slot <marco@citusdata.com>
Small data architecture
Big data architecture
Records
Data warehouse
Real-time analytics
Big data architecture using postgres
Messaging
PostgreSQL is a perfect building block
for distributed systems
Features!
PostgreSQL contains many useful features for building a distributed system:
● Well-defined protocol, libpq
● Crash safety
● Concurrent execution
● Transactions
● Access controls
● 2PC
● Replication
● Custom functions
● …
Extensions!
Built-in / contrib:
● postgres_fdw
● dblink RPC!
● plpgsql
Third-party open source:
● pglogical
● pg_cron
● citus
Extensions!
Built-in / contrib:
● postgres_fdw
● dblink RPC!
● plpgsql
Third-party open source:
● pglogical
● pg_cron
● citus
Yours!
dblink
Run queries on remote postgres server
SELECT dblink_connect(node_id,
format('host=%s port=%s dbname=postgres', node_name, node_port))
FROM nodes;
SELECT dblink_send_query(node_id, $$SELECT pg_database_size('postgres')$$)
FROM nodes;
SELECT sum(size::bigint)
FROM nodes, dblink_get_result(nodes.node_id) AS r(size text);
SELECT dblink_disconnect(node_id)
FROM nodes;
RPC using dblink
For every postgres function, we can create a client-side stub using dblink.
CREATE FUNCTION func(input text)
...
CREATE FUNCTION remote_func(host text, port int, input text) RETURNS text
LANGUAGE sql AS $function$
SELECT res FROM dblink(
format('host=%s port=%s', host, port),
format('SELECT * FROM func(%L)', input))
AS res(output text);
$function$;
PL/pgSQL
Procedural language for Postgres:
CREATE FUNCTION distributed_database_size(dbname text)
RETURNS bigint LANGUAGE plpgsql AS $function$
DECLARE
total_size bigint;
BEGIN
PERFORM dblink_send_query(node_id, format('SELECT pg_database_size(%L)', dbname)
FROM nodes;
SELECT sum(size::bigint) INTO total_size
FROM nodes, dblink_get_result(nodes.node_id) AS r(size text);
RETURN total_size
END;
$function$;
Distributed system in progress...
With these extensions, we can already create a simple distributed computing
system.
Nodes
Nodes Nodes Nodes
Parallel operation using dblink
SELECT
transform_data()
Data 1 Data 2 Data 3
postgres_fdw?
pglogical / logical replication
Asynchronously replicate changes to another database.
Nodes
Nodes Nodes Nodes
pg_paxos
Consistently replicate changes between databases.
Nodes
Nodes
Nodes
pg_cron
Cron-based job scheduler for postgres:
CREATE EXTENSION pg_cron;
SELECT cron.schedule('* * * * */10', 'SELECT transform_data()');
Internally uses libpq, meaning it can also schedule jobs on other nodes.
pg_cron provides a way for nodes to act autonomously
Citus
Transparently shards tables across multiple nodes
Coordinator
E1 E4 E2 E5 E2 E5
Events
create_distributed_table('events',
'event_id');
Citus MX
Nodes can have the distributed tables too
Coordinator
E1 E4 E2 E5 E2 E5
Events
Events Events Events
How to build a distributed system
using only PostgreSQL & extensions?
Building a streaming publish-subscribe system
Producers
Postgres nodes
Consumers
topic: adclick
Storage nodes
E1 E4 E2 E5 E2 E5
Events Events Events
Coordinator
Events
CREATE TABLE
Use Citus to create a distributed table
Distributed Table Creation
$ psql -h coordinator
CREATE TABLE events (
event_id bigserial,
ingest_time timestamptz default now(),
topic_name text not null,
payload jsonb
);
SELECT create_distributed_table('events', 'event_id');
$ psql -h any-node
INSERT INTO events (topic_name, payload) VALUES ('adclick','{...}');
Sharding strategy
Shard is chosen by hashing the value in the partition column.
Application-defined:
● stream_id text not null
Optimise data distribution:
● event_id bigserial
Optimise ingest capacity and availability:
● sid int default pick_local_value()
Producers connect to a random node and perform COPY or INSERT into events
Producers
E1 E4 E2 E5 E2 E5
Events Events Events
COPY / INSERT
Consumers in a group together consume events at least / exactly once.
Consumers
E1 E4 E2 E5 E2 E5
topic: adclick%
Consumer
group
Consumers obtain leases for consuming a shard.
Lease are kept in a separate table on each node:
CREATE TABLE leases (
consumer_group text not null,
shard_id bigint not null,
owner text,
new_owner text,
last_heartbeat timestamptz,
PRIMARY KEY (consumer_group, shard_id)
);
Consumer leases
Consumers obtain leases for consuming a shard.
SELECT * FROM claim_lease('click-analytics', 'node-2', 102008);
Under the covers: Insert a new lease or set new_owner to steal lease.
CREATE FUNCTION claim_lease(group_name text, node_name text, shard_id int)
…
INSERT INTO leases (consumer_group, shard_id, owner, last_heartbeat)
VALUES (group_name, shard, node_name, now())
ON CONFLICT (consumer_group, shard_id) DO UPDATE
SET new_owner = node_name
WHERE leases.new_owner IS NULL;
Consumer leases
Distributing leases across consumers
Distributed algorithm for distributing leases across nodes
SELECT * FROM obtain_leases('click-analytics', 'node-2')
-- gets all available lease tables
-- claim all unclaimed shards
-- claim random shards until #claims >= #shards/#consumers
Not perfect, but ensures all shards are consumed with load balancing (unless C>S)
Consumers
E1 E4 E2 E5 E2 E5
leases
First consumer consumes all
obtain_leases
leases leases
Consumers
E1 E4 E2 E5 E2 E5
First consumer consumes all
leases leases leases
Consumers
E1 E4 E2 E5 E2 E5
Second consumer steals leases from first consumer
obtain_leases
leases leases leases
Consumers
E1 E4 E2 E5 E2 E5
Second consumer steals leases from first consumer
Consuming events
Consumer wants to receive all events once.
Several options:
● SQL level
● Logical decoding utility functions
● Use a replication connection
● PG10 logical replication / pglogical
Consuming events
Get a batch of events from a shard:
SELECT * FROM poll_events('click-analytics', 'node-2', 102008, 'adclick',
'<last-processed-event-id>');
-- Check if node has the lease
Set owner = new_owner if new_owner is set
-- Get all pending events (pg_logical_slot_peek_changes)
-- Progress the replication slot (pg_logical_slot_get_changes)
-- Return remaining events if still owner
Consumer loop
E1 E4 E2 E5 E2 E5
1. Call poll_events for each leased shard
2. Process events from each batch
3. Repeat with event IDs of last event in each batch
poll_events
Failure handling
Producer / consumer fails to connect to storage node:
→ Connect to different node
Storage node fails:
→ Use pick_local_value() for partition column, failover to hot standby
Consumer fails to consume batch
→ Events are repeated until confirmed
Consumer fails and does not come back
→ Consumers periodically call obtain_leases
→ Old leases expire
Use pg_cron to periodically expire leases on coordinator:
SELECT cron.schedule('* * * * *', 'SELECT expire_leases()');
CREATE FUNCTION expire_leases()
...
UPDATE leases
SET owner = new_owner, last_heartbeat = now()
WHERE last_heartbeat < now() - interval '2 minutes'
Maintenance: Lease expiration
Use pg_cron to periodically expire leases on coordinator:
$ psql -h coordinator
SELECT cron.schedule('* * * * *', 'SELECT expire_events()');
CREATE FUNCTION expire_events()
...
DELETE FROM events
WHERE ingest_time < now() - interval '1 day';
Maintenance: Delete old events
Prototyped a functional, highly available publish-subscribe systems in
https://goo.gl/R1suAo
~300 lines of code
Demo
Records
Data warehouse
Real-time analytics
Big data architecture using postgres
Messaging
Questions?
marco@citusdata.com

Más contenido relacionado

La actualidad más candente

From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
DataStax
 
Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi...
Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi...Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi...
Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi...
Databricks
 
A Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINA Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAIN
EDB
 

La actualidad más candente (20)

From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
 
GeoMesa on Apache Spark SQL with Anthony Fox
GeoMesa on Apache Spark SQL with Anthony FoxGeoMesa on Apache Spark SQL with Anthony Fox
GeoMesa on Apache Spark SQL with Anthony Fox
 
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOxInfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
 
Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi...
Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi...Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi...
Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi...
 
Scaling up data science applications
Scaling up data science applicationsScaling up data science applications
Scaling up data science applications
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
 
Extending Flux to Support Other Databases and Data Stores | Adam Anthony | In...
Extending Flux to Support Other Databases and Data Stores | Adam Anthony | In...Extending Flux to Support Other Databases and Data Stores | Adam Anthony | In...
Extending Flux to Support Other Databases and Data Stores | Adam Anthony | In...
 
Our Story With ClickHouse at seo.do
Our Story With ClickHouse at seo.doOur Story With ClickHouse at seo.do
Our Story With ClickHouse at seo.do
 
ClickHouse Analytical DBMS. Introduction and usage, by Alexander Zaitsev
ClickHouse Analytical DBMS. Introduction and usage, by Alexander ZaitsevClickHouse Analytical DBMS. Introduction and usage, by Alexander Zaitsev
ClickHouse Analytical DBMS. Introduction and usage, by Alexander Zaitsev
 
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
 
Dapper performance
Dapper performanceDapper performance
Dapper performance
 
ClickHouse 2018. How to stop waiting for your queries to complete and start ...
ClickHouse 2018.  How to stop waiting for your queries to complete and start ...ClickHouse 2018.  How to stop waiting for your queries to complete and start ...
ClickHouse 2018. How to stop waiting for your queries to complete and start ...
 
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTOClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
 
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
 
A Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINA Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAIN
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouse
 
ClickHouse new features and development roadmap, by Aleksei Milovidov
ClickHouse new features and development roadmap, by Aleksei MilovidovClickHouse new features and development roadmap, by Aleksei Milovidov
ClickHouse new features and development roadmap, by Aleksei Milovidov
 
Reactive programming on Android
Reactive programming on AndroidReactive programming on Android
Reactive programming on Android
 
Extra performance out of thin air
Extra performance out of thin airExtra performance out of thin air
Extra performance out of thin air
 
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
 

Similar a Distributed Computing on PostgreSQL | PGConf EU 2017 | Marco Slot

Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?
Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?
Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?
SegFaultConf
 
Reactive & Realtime Web Applications with TurboGears2
Reactive & Realtime Web Applications with TurboGears2Reactive & Realtime Web Applications with TurboGears2
Reactive & Realtime Web Applications with TurboGears2
Alessandro Molina
 
Skytools: PgQ Queues and applications
Skytools: PgQ Queues and applicationsSkytools: PgQ Queues and applications
Skytools: PgQ Queues and applications
elliando dias
 
Lifecycle Inference on Unreliable Event Data
Lifecycle Inference on Unreliable Event DataLifecycle Inference on Unreliable Event Data
Lifecycle Inference on Unreliable Event Data
Databricks
 
PgQ Generic high-performance queue for PostgreSQL
PgQ Generic high-performance queue for PostgreSQLPgQ Generic high-performance queue for PostgreSQL
PgQ Generic high-performance queue for PostgreSQL
elliando dias
 

Similar a Distributed Computing on PostgreSQL | PGConf EU 2017 | Marco Slot (20)

Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?
Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?
Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?
 
pgconfasia2016 plcuda en
pgconfasia2016 plcuda enpgconfasia2016 plcuda en
pgconfasia2016 plcuda en
 
20181025_pgconfeu_lt_gstorefdw
20181025_pgconfeu_lt_gstorefdw20181025_pgconfeu_lt_gstorefdw
20181025_pgconfeu_lt_gstorefdw
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
 
Kapacitor - Real Time Data Processing Engine
Kapacitor - Real Time Data Processing EngineKapacitor - Real Time Data Processing Engine
Kapacitor - Real Time Data Processing Engine
 
Greenplum Overview for Postgres Hackers - Greenplum Summit 2018
Greenplum Overview for Postgres Hackers - Greenplum Summit 2018Greenplum Overview for Postgres Hackers - Greenplum Summit 2018
Greenplum Overview for Postgres Hackers - Greenplum Summit 2018
 
Adding replication protocol support for psycopg2
Adding replication protocol support for psycopg2Adding replication protocol support for psycopg2
Adding replication protocol support for psycopg2
 
Reactive & Realtime Web Applications with TurboGears2
Reactive & Realtime Web Applications with TurboGears2Reactive & Realtime Web Applications with TurboGears2
Reactive & Realtime Web Applications with TurboGears2
 
A New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDKA New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDK
 
Building an analytics workflow using Apache Airflow
Building an analytics workflow using Apache AirflowBuilding an analytics workflow using Apache Airflow
Building an analytics workflow using Apache Airflow
 
Ansiblefest 2018 Network automation journey at roblox
Ansiblefest 2018 Network automation journey at robloxAnsiblefest 2018 Network automation journey at roblox
Ansiblefest 2018 Network automation journey at roblox
 
How to make a high-quality Node.js app, Nikita Galkin
How to make a high-quality Node.js app, Nikita GalkinHow to make a high-quality Node.js app, Nikita Galkin
How to make a high-quality Node.js app, Nikita Galkin
 
Skytools: PgQ Queues and applications
Skytools: PgQ Queues and applicationsSkytools: PgQ Queues and applications
Skytools: PgQ Queues and applications
 
Lifecycle Inference on Unreliable Event Data
Lifecycle Inference on Unreliable Event DataLifecycle Inference on Unreliable Event Data
Lifecycle Inference on Unreliable Event Data
 
Mod06 new development tools
Mod06 new development toolsMod06 new development tools
Mod06 new development tools
 
Big Data Tools in AWS
Big Data Tools in AWSBig Data Tools in AWS
Big Data Tools in AWS
 
Introduction to InfluxDB, an Open Source Distributed Time Series Database by ...
Introduction to InfluxDB, an Open Source Distributed Time Series Database by ...Introduction to InfluxDB, an Open Source Distributed Time Series Database by ...
Introduction to InfluxDB, an Open Source Distributed Time Series Database by ...
 
Balancing Power & Performance Webinar
Balancing Power & Performance WebinarBalancing Power & Performance Webinar
Balancing Power & Performance Webinar
 
PgQ Generic high-performance queue for PostgreSQL
PgQ Generic high-performance queue for PostgreSQLPgQ Generic high-performance queue for PostgreSQL
PgQ Generic high-performance queue for PostgreSQL
 
Pcp
PcpPcp
Pcp
 

Más de Citus Data

Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
Citus Data
 
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...
Citus Data
 

Más de Citus Data (20)

JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...
JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...
JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...
 
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...
 
Whats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
Whats wrong with postgres | PGConf EU 2019 | Craig KerstiensWhats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
Whats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
 
When it all goes wrong | PGConf EU 2019 | Will Leinweber
When it all goes wrong | PGConf EU 2019 | Will LeinweberWhen it all goes wrong | PGConf EU 2019 | Will Leinweber
When it all goes wrong | PGConf EU 2019 | Will Leinweber
 
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise Grandjonc
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise GrandjoncAmazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise Grandjonc
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise Grandjonc
 
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...
 
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff Davis
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff DavisDeep Postgres Extensions in Rust | PGCon 2019 | Jeff Davis
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff Davis
 
Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...
Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...
Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...
 
A story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc
A story on Postgres index types | PostgresLondon 2019 | Louise GrandjoncA story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc
A story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc
 
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
 
The Art of PostgreSQL | PostgreSQL Ukraine | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine | Dimitri FontaineThe Art of PostgreSQL | PostgreSQL Ukraine | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine | Dimitri Fontaine
 
Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...
Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...
Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...
 
When it all goes wrong (with Postgres) | RailsConf 2019 | Will Leinweber
When it all goes wrong (with Postgres) | RailsConf 2019 | Will LeinweberWhen it all goes wrong (with Postgres) | RailsConf 2019 | Will Leinweber
When it all goes wrong (with Postgres) | RailsConf 2019 | Will Leinweber
 
The Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri FontaineThe Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri Fontaine
 
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...
 
How to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
How to write SQL queries | pgDay Paris 2019 | Dimitri FontaineHow to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
How to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
 
When it all Goes Wrong |Nordic PGDay 2019 | Will Leinweber
When it all Goes Wrong |Nordic PGDay 2019 | Will LeinweberWhen it all Goes Wrong |Nordic PGDay 2019 | Will Leinweber
When it all Goes Wrong |Nordic PGDay 2019 | Will Leinweber
 
Why PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire Giordano
Why PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire GiordanoWhy PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire Giordano
Why PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire Giordano
 
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...
 
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...
 

Último

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Último (20)

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

Distributed Computing on PostgreSQL | PGConf EU 2017 | Marco Slot

  • 1. Distributed Computing on PostgreSQL Marco Slot <marco@citusdata.com>
  • 4. Records Data warehouse Real-time analytics Big data architecture using postgres Messaging
  • 5. PostgreSQL is a perfect building block for distributed systems
  • 6. Features! PostgreSQL contains many useful features for building a distributed system: ● Well-defined protocol, libpq ● Crash safety ● Concurrent execution ● Transactions ● Access controls ● 2PC ● Replication ● Custom functions ● …
  • 7. Extensions! Built-in / contrib: ● postgres_fdw ● dblink RPC! ● plpgsql Third-party open source: ● pglogical ● pg_cron ● citus
  • 8. Extensions! Built-in / contrib: ● postgres_fdw ● dblink RPC! ● plpgsql Third-party open source: ● pglogical ● pg_cron ● citus Yours!
  • 9. dblink Run queries on remote postgres server SELECT dblink_connect(node_id, format('host=%s port=%s dbname=postgres', node_name, node_port)) FROM nodes; SELECT dblink_send_query(node_id, $$SELECT pg_database_size('postgres')$$) FROM nodes; SELECT sum(size::bigint) FROM nodes, dblink_get_result(nodes.node_id) AS r(size text); SELECT dblink_disconnect(node_id) FROM nodes;
  • 10. RPC using dblink For every postgres function, we can create a client-side stub using dblink. CREATE FUNCTION func(input text) ... CREATE FUNCTION remote_func(host text, port int, input text) RETURNS text LANGUAGE sql AS $function$ SELECT res FROM dblink( format('host=%s port=%s', host, port), format('SELECT * FROM func(%L)', input)) AS res(output text); $function$;
  • 11. PL/pgSQL Procedural language for Postgres: CREATE FUNCTION distributed_database_size(dbname text) RETURNS bigint LANGUAGE plpgsql AS $function$ DECLARE total_size bigint; BEGIN PERFORM dblink_send_query(node_id, format('SELECT pg_database_size(%L)', dbname) FROM nodes; SELECT sum(size::bigint) INTO total_size FROM nodes, dblink_get_result(nodes.node_id) AS r(size text); RETURN total_size END; $function$;
  • 12. Distributed system in progress... With these extensions, we can already create a simple distributed computing system. Nodes Nodes Nodes Nodes Parallel operation using dblink SELECT transform_data() Data 1 Data 2 Data 3 postgres_fdw?
  • 13. pglogical / logical replication Asynchronously replicate changes to another database. Nodes Nodes Nodes Nodes
  • 14. pg_paxos Consistently replicate changes between databases. Nodes Nodes Nodes
  • 15. pg_cron Cron-based job scheduler for postgres: CREATE EXTENSION pg_cron; SELECT cron.schedule('* * * * */10', 'SELECT transform_data()'); Internally uses libpq, meaning it can also schedule jobs on other nodes. pg_cron provides a way for nodes to act autonomously
  • 16. Citus Transparently shards tables across multiple nodes Coordinator E1 E4 E2 E5 E2 E5 Events create_distributed_table('events', 'event_id');
  • 17. Citus MX Nodes can have the distributed tables too Coordinator E1 E4 E2 E5 E2 E5 Events Events Events Events
  • 18. How to build a distributed system using only PostgreSQL & extensions?
  • 19. Building a streaming publish-subscribe system Producers Postgres nodes Consumers topic: adclick
  • 20. Storage nodes E1 E4 E2 E5 E2 E5 Events Events Events Coordinator Events CREATE TABLE Use Citus to create a distributed table
  • 21. Distributed Table Creation $ psql -h coordinator CREATE TABLE events ( event_id bigserial, ingest_time timestamptz default now(), topic_name text not null, payload jsonb ); SELECT create_distributed_table('events', 'event_id'); $ psql -h any-node INSERT INTO events (topic_name, payload) VALUES ('adclick','{...}');
  • 22. Sharding strategy Shard is chosen by hashing the value in the partition column. Application-defined: ● stream_id text not null Optimise data distribution: ● event_id bigserial Optimise ingest capacity and availability: ● sid int default pick_local_value()
  • 23. Producers connect to a random node and perform COPY or INSERT into events Producers E1 E4 E2 E5 E2 E5 Events Events Events COPY / INSERT
  • 24. Consumers in a group together consume events at least / exactly once. Consumers E1 E4 E2 E5 E2 E5 topic: adclick% Consumer group
  • 25. Consumers obtain leases for consuming a shard. Lease are kept in a separate table on each node: CREATE TABLE leases ( consumer_group text not null, shard_id bigint not null, owner text, new_owner text, last_heartbeat timestamptz, PRIMARY KEY (consumer_group, shard_id) ); Consumer leases
  • 26. Consumers obtain leases for consuming a shard. SELECT * FROM claim_lease('click-analytics', 'node-2', 102008); Under the covers: Insert a new lease or set new_owner to steal lease. CREATE FUNCTION claim_lease(group_name text, node_name text, shard_id int) … INSERT INTO leases (consumer_group, shard_id, owner, last_heartbeat) VALUES (group_name, shard, node_name, now()) ON CONFLICT (consumer_group, shard_id) DO UPDATE SET new_owner = node_name WHERE leases.new_owner IS NULL; Consumer leases
  • 27. Distributing leases across consumers Distributed algorithm for distributing leases across nodes SELECT * FROM obtain_leases('click-analytics', 'node-2') -- gets all available lease tables -- claim all unclaimed shards -- claim random shards until #claims >= #shards/#consumers Not perfect, but ensures all shards are consumed with load balancing (unless C>S)
  • 28. Consumers E1 E4 E2 E5 E2 E5 leases First consumer consumes all obtain_leases leases leases
  • 29. Consumers E1 E4 E2 E5 E2 E5 First consumer consumes all leases leases leases
  • 30. Consumers E1 E4 E2 E5 E2 E5 Second consumer steals leases from first consumer obtain_leases leases leases leases
  • 31. Consumers E1 E4 E2 E5 E2 E5 Second consumer steals leases from first consumer
  • 32. Consuming events Consumer wants to receive all events once. Several options: ● SQL level ● Logical decoding utility functions ● Use a replication connection ● PG10 logical replication / pglogical
  • 33. Consuming events Get a batch of events from a shard: SELECT * FROM poll_events('click-analytics', 'node-2', 102008, 'adclick', '<last-processed-event-id>'); -- Check if node has the lease Set owner = new_owner if new_owner is set -- Get all pending events (pg_logical_slot_peek_changes) -- Progress the replication slot (pg_logical_slot_get_changes) -- Return remaining events if still owner
  • 34. Consumer loop E1 E4 E2 E5 E2 E5 1. Call poll_events for each leased shard 2. Process events from each batch 3. Repeat with event IDs of last event in each batch poll_events
  • 35. Failure handling Producer / consumer fails to connect to storage node: → Connect to different node Storage node fails: → Use pick_local_value() for partition column, failover to hot standby Consumer fails to consume batch → Events are repeated until confirmed Consumer fails and does not come back → Consumers periodically call obtain_leases → Old leases expire
  • 36. Use pg_cron to periodically expire leases on coordinator: SELECT cron.schedule('* * * * *', 'SELECT expire_leases()'); CREATE FUNCTION expire_leases() ... UPDATE leases SET owner = new_owner, last_heartbeat = now() WHERE last_heartbeat < now() - interval '2 minutes' Maintenance: Lease expiration
  • 37. Use pg_cron to periodically expire leases on coordinator: $ psql -h coordinator SELECT cron.schedule('* * * * *', 'SELECT expire_events()'); CREATE FUNCTION expire_events() ... DELETE FROM events WHERE ingest_time < now() - interval '1 day'; Maintenance: Delete old events
  • 38. Prototyped a functional, highly available publish-subscribe systems in https://goo.gl/R1suAo ~300 lines of code
  • 39. Demo
  • 40. Records Data warehouse Real-time analytics Big data architecture using postgres Messaging