SlideShare una empresa de Scribd logo
1 de 84
SCALING A SAAS BACKEND 
WITH POSTGRESQL – A CASE STUDY 
PostgreSQL Conference Europe 
Madrid 2014-10-24 
Oliver Seemann - Bidmanagement GmbH 
oliver.seemann@adspert.net
Gigabytes 
Terabytes 
Growing Data
We do productivity tools for 
advertisers
Significant amounts of data
Upper boundary: 
5M keywords × 365 days 
× 20 bigints/doubles 
≅ 300GB
OLTP / OLAP Duality
“Slow” OLAP data for daily batch-processing 
jobs
“Fast” OLTP data for 
human interaction
Initially separate databases 
Slow 
Data 
Fast 
Data
Data overlaps significantly 
Slow 
Data 
Fast 
Data
We went with unified approach 
Slow 
Data 
Fast 
Data
Currently: 
7 machines running PG 9.3
Currently: 
~3 TB Data
Currently: 
largest table: ~100GB
How it all started..
It began as an experiment
Design by the book 
Keywords 
PK,FK1 adgroup_id 
PK keyword_id 
Campaign 
PK campaign_id 
FK1 account_id 
Adgroup 
PK adgroup_id 
FK1 campaign_id 
Account 
PK account_id 
FK1 customer_id 
Customer 
PK customer_id 
User 
PK user_id 
FK1 customer_id 
History 
PK day 
PK,FK1 keyword_id 
PK,FK1,FK2 adgroup_id 
UserAccountAccess 
PK,FK1 account_id 
PK,FK2 user_id 
Scenario 
PK,FK1 keyword_id 
PK,FK1 adgroup_id 
PK factor
Soon tens of GB 
>100M records
All Accounts 
Account 1 – Rec 1 
Account 2 – Rec 1 
Account 1 – Rec 2 
Account 3 – Rec 1 
Account 2 – Rec 2 
Account 2 – Rec 3 
Account 1 – Rec 3 
Account 3 – Rec 2
+10fold increase per level 
Account >10 
FK Campaign >1k 
FK Ad Group >100K 
FK Keyword >10M 
FK History >100M
Partitioning, somehow 
Account 1 
Account 1 – Rec 1 
Account 1 – Rec 2 
Account 1 – Rec 3 
Account 2 
Account 2 – Rec 1 
Account 2 – Rec 2 
Account 2 – Rec 3 
Account 3 
Account 3 – Rec 1 
Account 3 – Rec 2 
Account 3 – Rec 3
Partitioning with inheritance 
Parent 
Child Child Child 
check-constraints 
SELECT 
INSERT
PG Partitioning is nifty – 
but not a match for our case
Our case: 
Little to no shared data between 
clients
Isolate accounts 
One DB Many DBs/Schemas?
Both approaches: 
+ Good horizontal scaling
Both approaches: 
+ Good tool support 
(e.g. pg_dump/restore)
Partition into databases: 
+ Easy cloning 
CREATE DATABASE foo TEMPLATE bar;
Partition into databases: 
+ Stricter isolation (security)
Partition into databases: 
- Some Overhead
Partition into databases: 
- No direct references
Partition into schemas: 
+ More lightweight
Partition into schemas: 
+ Full references
Partition into schemas: 
- No easy cloning
Partition into schemas: 
- No cascading schemas
Now: 
Several thousand databases 
on five 1TB machines
Now: 
Plus main DB server pair 
with <10GB data
Setup 
Main DB Hosts 
standalone-0 standalone-1 
standalone-2 standalone-3 
master slave 
Account DB Hosts
No replication on 
account db hosts?
Performance Problems
Too many concurrent 
full table scans
From 300MB/s to 30MB/s 
More concurrent queries 
Longer query runtime
Different apps 
Different access patterns 
Web 
Apps 
Compute 
Cluster 
Many small/ 
fast queries 
Few very slow/ 
big queries
Limit concurrent access 
with counting semaphore 
Web 
Apps 
Compute 
Cluster 
Many small/ 
fast queries 
Few very slow/ 
big queries
Implement Semaphore using 
Advisory Locks
Simpler than setting up 
Zookeeper
More performance problems: 
Bulk Inserts
Solved with common 
best practices:
COPY exclusively
Drop / Recreate indexes
COPY to new table + swap
Another problem:
CREATE DATABASE 
can take a while
Signup Delays 
Signup 
Web App 
+ 
CREATE 
DATABASE 
Can take 
up to 5-15 min
CREATE DATABASE 
performs a CHECKPOINT
Solution: 
Keep stock of 
spare databases
In general: 
Very happy with our approach
Databases are tangible
Move DBs between hosts
Painless 9.0 -> 9.3 migration
Use schemas as partitions?
Would prevent 
regular schema usage
CREATE SCHEMA foo; 
CREATE SCHEMA foo.bar; 
CREATE SCHEMA foo.bar.baz;
Schemas are crucial for us
Versioning of database code
Grown to about 
~15k SQL functions/views
Moved core algorithms 
from app to db
Previously: 
1. Read bulk raw data from DB 
2. Number crunching in app 
3. Write bulk results to DB
4x-10x faster in DB 
2x-4x RAM reduction
SQL is harder to read & write
How to test it? 
App test suite 
goes a long way.
Different production stages 
Versioning with schemas
Every 4-8 weeks: 
CREATE SCHEMA version_%d;
Assign each version a stage: 
unstable -> testing -> stable
Stage App Schema COUNT(account) 
unstable v22.4 version_22 0% - 2% 
testing v21.13 version_21 1% – 50% 
stable v20.19 version_20 50% – 100%
Watchdogs on key metrics 
alert on suspicious behaviour
Schemas are VERY important
Takeaway: 
Databases can be aptly 
used as partitions
Takeaway: 
So can schemas
Takeaway: 
Schemas can be used 
for versioning
Thanks for listening 
Questions?
Managing Schema Changes
ORM 
Can’t live with it, 
can’t live without it
PG Wishes

Más contenido relacionado

La actualidad más candente

Amazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian MeyersAmazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian Meyershuguk
 
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...Amazon Web Services
 
hbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at Tencenthbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at TencentMichael Stack
 
HBaseConAsia2018 Keynote1: Apache HBase Project Status
HBaseConAsia2018 Keynote1: Apache HBase Project StatusHBaseConAsia2018 Keynote1: Apache HBase Project Status
HBaseConAsia2018 Keynote1: Apache HBase Project StatusMichael Stack
 
HBaseConAsia2018 Track1-3: HBase at Xiaomi
HBaseConAsia2018 Track1-3: HBase at XiaomiHBaseConAsia2018 Track1-3: HBase at Xiaomi
HBaseConAsia2018 Track1-3: HBase at XiaomiMichael Stack
 
Black Friday and Cyber Monday- Best Practices for Your E-Commerce Database
Black Friday and Cyber Monday- Best Practices for Your E-Commerce DatabaseBlack Friday and Cyber Monday- Best Practices for Your E-Commerce Database
Black Friday and Cyber Monday- Best Practices for Your E-Commerce DatabaseTim Vaillancourt
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopHadoop User Group
 
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...In-Memory Computing Summit
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware ProvisioningMongoDB
 
Columnstore Customer Stories 2016 by Sunil Agarwal
Columnstore Customer Stories 2016 by Sunil AgarwalColumnstore Customer Stories 2016 by Sunil Agarwal
Columnstore Customer Stories 2016 by Sunil AgarwalBrent Ozar
 
HBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at Huawei
HBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at HuaweiHBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at Huawei
HBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at HuaweiMichael Stack
 
HUG August 2010: Best practices
HUG August 2010: Best practicesHUG August 2010: Best practices
HUG August 2010: Best practicesHadoop User Group
 
Hardware Provisioning for MongoDB
Hardware Provisioning for MongoDBHardware Provisioning for MongoDB
Hardware Provisioning for MongoDBMongoDB
 
Building Hybrid data cluster using PostgreSQL and MongoDB
Building Hybrid data cluster using PostgreSQL and MongoDBBuilding Hybrid data cluster using PostgreSQL and MongoDB
Building Hybrid data cluster using PostgreSQL and MongoDBAshnikbiz
 
HBASE by Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
HBASE by  Nicolas Liochon - Meetup HUGFR du 22 Sept 2014HBASE by  Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
HBASE by Nicolas Liochon - Meetup HUGFR du 22 Sept 2014Modern Data Stack France
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015Christopher Curtin
 

La actualidad más candente (20)

Jee conf
Jee confJee conf
Jee conf
 
Amazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian MeyersAmazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian Meyers
 
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
 
hbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at Tencenthbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at Tencent
 
HBaseConAsia2018 Keynote1: Apache HBase Project Status
HBaseConAsia2018 Keynote1: Apache HBase Project StatusHBaseConAsia2018 Keynote1: Apache HBase Project Status
HBaseConAsia2018 Keynote1: Apache HBase Project Status
 
Copy data management
Copy data managementCopy data management
Copy data management
 
HBaseConAsia2018 Track1-3: HBase at Xiaomi
HBaseConAsia2018 Track1-3: HBase at XiaomiHBaseConAsia2018 Track1-3: HBase at Xiaomi
HBaseConAsia2018 Track1-3: HBase at Xiaomi
 
Black Friday and Cyber Monday- Best Practices for Your E-Commerce Database
Black Friday and Cyber Monday- Best Practices for Your E-Commerce DatabaseBlack Friday and Cyber Monday- Best Practices for Your E-Commerce Database
Black Friday and Cyber Monday- Best Practices for Your E-Commerce Database
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with Hadoop
 
PHP and Cassandra
PHP and CassandraPHP and Cassandra
PHP and Cassandra
 
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
 
Hdfs high availability
Hdfs high availabilityHdfs high availability
Hdfs high availability
 
Columnstore Customer Stories 2016 by Sunil Agarwal
Columnstore Customer Stories 2016 by Sunil AgarwalColumnstore Customer Stories 2016 by Sunil Agarwal
Columnstore Customer Stories 2016 by Sunil Agarwal
 
HBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at Huawei
HBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at HuaweiHBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at Huawei
HBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at Huawei
 
HUG August 2010: Best practices
HUG August 2010: Best practicesHUG August 2010: Best practices
HUG August 2010: Best practices
 
Hardware Provisioning for MongoDB
Hardware Provisioning for MongoDBHardware Provisioning for MongoDB
Hardware Provisioning for MongoDB
 
Building Hybrid data cluster using PostgreSQL and MongoDB
Building Hybrid data cluster using PostgreSQL and MongoDBBuilding Hybrid data cluster using PostgreSQL and MongoDB
Building Hybrid data cluster using PostgreSQL and MongoDB
 
HBASE by Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
HBASE by  Nicolas Liochon - Meetup HUGFR du 22 Sept 2014HBASE by  Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
HBASE by Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
 

Destacado

Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesWebscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesJonathan Katz
 
Design Strategy for Data Isolation in SaaS Model
Design Strategy for Data Isolation in SaaS ModelDesign Strategy for Data Isolation in SaaS Model
Design Strategy for Data Isolation in SaaS ModelTechcello
 
Денормализованное хранение данных в PostgreSQL 9.2 (Александр Коротков)
Денормализованное хранение данных в PostgreSQL 9.2 (Александр Коротков)Денормализованное хранение данных в PostgreSQL 9.2 (Александр Коротков)
Денормализованное хранение данных в PostgreSQL 9.2 (Александр Коротков)Ontico
 
MySQL for Software-as-a-Service (SaaS)
MySQL for Software-as-a-Service (SaaS)MySQL for Software-as-a-Service (SaaS)
MySQL for Software-as-a-Service (SaaS)Mario Beck
 
Scaling Wanelo.com 100x in Six Months
Scaling Wanelo.com 100x in Six MonthsScaling Wanelo.com 100x in Six Months
Scaling Wanelo.com 100x in Six MonthsKonstantin Gredeskoul
 
From Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQL
From Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQLFrom Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQL
From Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQLKonstantin Gredeskoul
 
Best Practices for Managing SaaS Applications
Best Practices for Managing SaaS ApplicationsBest Practices for Managing SaaS Applications
Best Practices for Managing SaaS ApplicationsCorrelsense
 
12 Key Levers of SaaS Success
12 Key Levers of SaaS Success12 Key Levers of SaaS Success
12 Key Levers of SaaS SuccessDavid Skok
 
How to Develop Your SaaS Pricing Model
How to Develop Your SaaS Pricing ModelHow to Develop Your SaaS Pricing Model
How to Develop Your SaaS Pricing ModelLincoln Murphy
 
The SaaS business model and metrics
The SaaS business model and metricsThe SaaS business model and metrics
The SaaS business model and metricsDavid Skok
 

Destacado (10)

Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesWebscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
 
Design Strategy for Data Isolation in SaaS Model
Design Strategy for Data Isolation in SaaS ModelDesign Strategy for Data Isolation in SaaS Model
Design Strategy for Data Isolation in SaaS Model
 
Денормализованное хранение данных в PostgreSQL 9.2 (Александр Коротков)
Денормализованное хранение данных в PostgreSQL 9.2 (Александр Коротков)Денормализованное хранение данных в PostgreSQL 9.2 (Александр Коротков)
Денормализованное хранение данных в PostgreSQL 9.2 (Александр Коротков)
 
MySQL for Software-as-a-Service (SaaS)
MySQL for Software-as-a-Service (SaaS)MySQL for Software-as-a-Service (SaaS)
MySQL for Software-as-a-Service (SaaS)
 
Scaling Wanelo.com 100x in Six Months
Scaling Wanelo.com 100x in Six MonthsScaling Wanelo.com 100x in Six Months
Scaling Wanelo.com 100x in Six Months
 
From Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQL
From Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQLFrom Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQL
From Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQL
 
Best Practices for Managing SaaS Applications
Best Practices for Managing SaaS ApplicationsBest Practices for Managing SaaS Applications
Best Practices for Managing SaaS Applications
 
12 Key Levers of SaaS Success
12 Key Levers of SaaS Success12 Key Levers of SaaS Success
12 Key Levers of SaaS Success
 
How to Develop Your SaaS Pricing Model
How to Develop Your SaaS Pricing ModelHow to Develop Your SaaS Pricing Model
How to Develop Your SaaS Pricing Model
 
The SaaS business model and metrics
The SaaS business model and metricsThe SaaS business model and metrics
The SaaS business model and metrics
 

Similar a Scaling a SaaS backend with PostgreSQL - A case study

AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAmazon Web Services
 
Amazon RedShift - Ianni Vamvadelis
Amazon RedShift - Ianni VamvadelisAmazon RedShift - Ianni Vamvadelis
Amazon RedShift - Ianni Vamvadelishuguk
 
Learn How Dell Improved Postgres/Greenplum Performance 20x with a Database Pr...
Learn How Dell Improved Postgres/Greenplum Performance 20x with a Database Pr...Learn How Dell Improved Postgres/Greenplum Performance 20x with a Database Pr...
Learn How Dell Improved Postgres/Greenplum Performance 20x with a Database Pr...VMware Tanzu
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Bhupesh Bansal
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop User Group
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simpleDori Waldman
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...ScyllaDB
 
Evaluating Storage for VDI Projects
Evaluating Storage for VDI ProjectsEvaluating Storage for VDI Projects
Evaluating Storage for VDI ProjectsTegile Systems
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Alluxio, Inc.
 
Tech Talk Series, Part 3: Why is your CFO right to demand you scale down MySQL?
Tech Talk Series, Part 3: Why is your CFO right to demand you scale down MySQL?Tech Talk Series, Part 3: Why is your CFO right to demand you scale down MySQL?
Tech Talk Series, Part 3: Why is your CFO right to demand you scale down MySQL?Clustrix
 
DoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics PlatformDoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics Platformmartinbpeters
 
MinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with CassandraMinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with CassandraJeff Smoley
 
Catalyst - refactor large apps with it and have fun!
Catalyst - refactor large apps with it and have fun!Catalyst - refactor large apps with it and have fun!
Catalyst - refactor large apps with it and have fun!mold
 
Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Amazon Web Services
 
Configuring sql server - SQL Saturday, Athens Oct 2014
Configuring sql server - SQL Saturday, Athens Oct 2014Configuring sql server - SQL Saturday, Athens Oct 2014
Configuring sql server - SQL Saturday, Athens Oct 2014Antonios Chatzipavlis
 
Scaling Your Web Application
Scaling Your Web ApplicationScaling Your Web Application
Scaling Your Web ApplicationKetan Deshmukh
 
Handling Data in Mega Scale Systems
Handling Data in Mega Scale SystemsHandling Data in Mega Scale Systems
Handling Data in Mega Scale SystemsDirecti Group
 
Espc17 make your share point fly by tuning and optimising sql server
Espc17 make your share point  fly by tuning and optimising sql serverEspc17 make your share point  fly by tuning and optimising sql server
Espc17 make your share point fly by tuning and optimising sql serverIsabelle Van Campenhoudt
 
Make your SharePoint fly by tuning and optimizing SQL Server
Make your SharePoint  fly by tuning and optimizing SQL ServerMake your SharePoint  fly by tuning and optimizing SQL Server
Make your SharePoint fly by tuning and optimizing SQL Serverserge luca
 

Similar a Scaling a SaaS backend with PostgreSQL - A case study (20)

AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon Redshift
 
Amazon RedShift - Ianni Vamvadelis
Amazon RedShift - Ianni VamvadelisAmazon RedShift - Ianni Vamvadelis
Amazon RedShift - Ianni Vamvadelis
 
Learn How Dell Improved Postgres/Greenplum Performance 20x with a Database Pr...
Learn How Dell Improved Postgres/Greenplum Performance 20x with a Database Pr...Learn How Dell Improved Postgres/Greenplum Performance 20x with a Database Pr...
Learn How Dell Improved Postgres/Greenplum Performance 20x with a Database Pr...
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simple
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
 
Evaluating Storage for VDI Projects
Evaluating Storage for VDI ProjectsEvaluating Storage for VDI Projects
Evaluating Storage for VDI Projects
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
 
Galaxy Big Data with MariaDB
Galaxy Big Data with MariaDBGalaxy Big Data with MariaDB
Galaxy Big Data with MariaDB
 
Tech Talk Series, Part 3: Why is your CFO right to demand you scale down MySQL?
Tech Talk Series, Part 3: Why is your CFO right to demand you scale down MySQL?Tech Talk Series, Part 3: Why is your CFO right to demand you scale down MySQL?
Tech Talk Series, Part 3: Why is your CFO right to demand you scale down MySQL?
 
DoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics PlatformDoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics Platform
 
MinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with CassandraMinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with Cassandra
 
Catalyst - refactor large apps with it and have fun!
Catalyst - refactor large apps with it and have fun!Catalyst - refactor large apps with it and have fun!
Catalyst - refactor large apps with it and have fun!
 
Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”
 
Configuring sql server - SQL Saturday, Athens Oct 2014
Configuring sql server - SQL Saturday, Athens Oct 2014Configuring sql server - SQL Saturday, Athens Oct 2014
Configuring sql server - SQL Saturday, Athens Oct 2014
 
Scaling Your Web Application
Scaling Your Web ApplicationScaling Your Web Application
Scaling Your Web Application
 
Handling Data in Mega Scale Systems
Handling Data in Mega Scale SystemsHandling Data in Mega Scale Systems
Handling Data in Mega Scale Systems
 
Espc17 make your share point fly by tuning and optimising sql server
Espc17 make your share point  fly by tuning and optimising sql serverEspc17 make your share point  fly by tuning and optimising sql server
Espc17 make your share point fly by tuning and optimising sql server
 
Make your SharePoint fly by tuning and optimizing SQL Server
Make your SharePoint  fly by tuning and optimizing SQL ServerMake your SharePoint  fly by tuning and optimizing SQL Server
Make your SharePoint fly by tuning and optimizing SQL Server
 

Último

Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 

Último (20)

Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 

Scaling a SaaS backend with PostgreSQL - A case study

Notas del editor

  1. Hi, I’m Oliver, I’m a software developer, currently heading the development team at Bidmanagement GmbH in Berlin.
  2. I’m going to talk about how we’re using it as main datastore in our system Non of the solutions or approaches are .. But by using some pg features in a non-standard way certain problems can be solved quite elegantly And seeing that this works very well for some, maybe will be helpful to some of you when you have similar problems now or in the future.
  3. Mostly in the area of search engine marketing Which today is mostly adwords, however we also support other networks, for example yandex Our flagship product is a fully automatic bid management solution. Everyday we’re changing the bids on tens of millions of keywords and learn from the effects to steer campaign performance towards goals configured by the user. The philosophy is to take the mind numbing number crunching tasks away from the user, because a computer can do it better and much more efficiently, especially when you have thousands or millions of objects to analyze.
  4. - Replicate the campaign structure Provide reporting interface I don’t want to bore you with the technical details about how search engine marketing work so let’s just say we store a lot of ints and floats and especially time series data of those. To get an idea of the ballpark we’re working with let’s have a look at the upper boundary
  5. Ballpark estimates Upper bound Time series data Hierarchical data Clicks, impressions and also lots of statistical projections with confidence intervals Of course most of those values are actually zero and those can be omitted when storing the data. So it may actually only be 5 or 10% or that. However we have thousands of accounts, most of which only have a few hundred MB to a few GB. But the occasional outlier with 100GB must work just as well.
  6. The different kinds of data we store can be largely separated into two groups.
  7. One internal (batch processing), One external (web app access)
  8. Mostly the time series data So we had to either duplicate lots data and synchronize changes. Or integrate both into one and make sure different parts of the system don’t get in each other’s way.
  9. We opted for the latter because it makes for a simpler system. We just have to make sure So far it turned out well and we havent looked back.
  10. Let’s have a peek into the past in order to understand how the system evolved.
  11. Our CTO is a mathematician Skunk works project
  12. PostgreSQL supports partitioning via inheritance [insert scheme] Use CHECK constraints to tell Query Planner where to look Cannot insert into parent table, must insert into child table Lot of effort goes to application logic Tried it on one table, weren’t it conviced
  13. The database or schema as a logical unit is a central part of PG with good tool support Easy to add, easy to drop Can be Backed up Restored Moved between machines Very Tangible from an ops view
  14. MainDB still replicated To enable quick failover Here we can’t afford extended downtime
  15. Can make availability / cost trade offs here
  16. Big cheap HDDs Bottle neck is Gigabit Ethernet
  17. Capacity doubled, cost reduced 40% The more servers, the faster the restore Gbit Ethernet on backup server is limiting factor
  18. Not really feasible: We rewrite lots of data every day (crude approach, but simpler code) Complex Administration (no dedicated DBA)
  19. From sequential reads to random reads The cause of the problem is only on one side ..
  20. Webapp-queries with humans waiting are quite fast Problematic queries done by the analysis jobs Frequent full table scans Queries with huge results Need way to synchronize queries, control concurrency Could use a connection pooler Or an external synchronization mechanism e.g. Zookeeper
  21. Webapp-queries with humans waiting are quite fast Problematic queries done by the analysis jobs Frequent full table scans Queries with huge results Need way to synchronize queries, control concurrency Could use a connection pooler Or an external synchronization mechanism e.g. Zookeeper
  22. Very simple mechanism Unfair, but that’s no problem
  23. However, it’s starting to spread with a tendency to be mis-used.
  24. An ALTER INDEX foo DISABLE would come in handy;
  25. We added a self-service signup 2-minute process to add AdWords account to the system OAuth  User Info  Optimization Bootstrap Biggest problem: CREATE DATABASE can take several minutes Depends on current amount of write activity More granular checkpoint (per db) would be cool?
  26. Restrict checkpoints to databases?
  27. So all of the drawbacks that came up could be worked around, more or less elegantly. In total, we’re very happy with the way the approach has turned out. Especially the scalability and isolation aspects of it have us very pleased. So much in fact, that we also used it for a second product and it feels very natural.
  28. Databases as a unit of abstraction on a client or account level level are very much tangible. Which makes it comfortable both from an development and operations point of view. They can be connected to, renamed, cloned, copied, moved, backed up and restored. When we remove a customer from the system we just dump the account databases and put them on S3 Glacier for some amount of time, instead of keeping the 100GB in the system.
  29. To manage capacity. Currently this is still a manual process because it’s not required very often. Making it automatic would require amongst other things a means to briefly disable the app from connecting to it. Does “ALTER DATABASE set CONNECTION LIMIT 0” work?
  30. Moving between hosts means we can also move it between PG versions. We upgraded from 9.0 to 9.3 without much effort by installing both on all machines and then dumping them one after another from 9.0 and restoring into 9.3. Over a period of 2-3 months. Memory is not a problem as shared_buffers is relatively low (a few gigabytes) and most memory is used by page and buffer cache and all files continue to exist only once. Used 9.3 in development for a few months Btw, I only remember one case where we needed to adapt code for 9.3, something with the order of a query result. Otherwise the ugprade was a breeze.
  31. But, even though this works very well with using databases as partitions. Would schemas have worked the same way?
  32. The biggest problem we would have had is, that we wouldnt be able to use schemas for other purposes anymore.
  33. This is become necessary
  34. It has grown quite a bit because we started with lots of Perl code and a “dumb” data store
  35. Up to 100GB memory in step 2
  36. Only works when we can limit concurrent the batch jobs per machine (advisory locks).
  37. But it’s not all sunshine and rainbows with that approach, of course. Because SQL is much harder to write and to read than procedural code. The notion that “code is a liability” has some truth to it. So the more we move into the database, the hard it becomes to manage. Python I just much more tangible and malleable than SQL. We have to compromise between easy to debug&test and performance.
  38. But, given a bit of time and quiet one can accomplish much with little code in SQL. Testing of individual snippets can be done by calling it from the application code, as part of an integrated test suite that has test data and expects certain results. Covering most of the code in tests is not the problem, but covering most data scenarios is much more work (Div by zero sneeks in from time to time). Those cases are postponed to …
  39. The SQL code decides how to spend millions of euros in advertising money every months. We can’t afford deploying any code changes (app or db) to all account databases at the same time. So we use schemas to manage multiple versions of the optimization code.
  40. The schema is filled with all the objects from a set of source .sql files. The application software version that uses the db and schema version are identical, app sets search path. We don’t use minor versions for fixes in the db code.
  41. What we do is, assign each version a stage. Unstable, testing, stable, borrowed from Debian. And we also can assign individual client accounts a stage.
  42. Typically test accounts or one with a pathological case that is fixed by the new release. Those are closely monitored (performance, errors, log files, debugging data). Brand new unstable: few, selected (test-)accounts Testing stage for incremental roll-out