SCALING OUR SAAS BACKEND
WITH POSTGRESQL
OLIVER SEEMANN, BIDMANAGEMENT GMBH
BWB MEETUP, 2013-10-28
THIS TALK IS ABOUT …

Gigabytes

Terabytes
PRODUCTIVITY TOOLS FOR ONLINE MARKETERS

Automatic Bid Management for Auctioned Ads

“Organic” Search
SIGNIFICANT AMOUNTS OF DATA

10,000 campaigns
5 million keywords
4 million ads
per AdWords account

Full history for all objects
over their full lifetime
SLOW AND FAST DATA

“Slow” / OLAP data for batch-processing jobs
“Fast” / OLTP data for human interaction
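INITIALLY SEPARATE
[Diagram: slow data and fast data kept in separate stores]

A LOT OF OVERLAP
[Diagram: the slow and fast data sets overlap considerably]

THEN ONLY ONE
[Diagram: slow and fast data held in a single store]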
CURRENTLY

7 machines running PostgreSQL
3 terabytes of data
Thousands of databases
Largest table: 120 GB
HOW IT BEGAN

Experiment
DESIGN BY THE BOOK
[Entity-relationship diagram; PK = primary key, FK = foreign key]
Customer: customer_id (PK)
User: user_id (PK), customer_id (FK)
Account: account_id (PK), customer_id (FK)
UserAccountAccess: account_id (PK, FK), user_id (PK, FK)
Campaign: campaign_id (PK), account_id (FK)
Adgroup: adgroup_id (PK), campaign_id (FK)
Keywords: adgroup_id (PK, FK), keyword_id (PK)
History: day (PK), keyword_id (PK, FK), adgroup_id (PK, FK)
Scenario: keyword_id (PK, FK), adgroup_id (PK, FK), factor
MORE CUSTOMERS – MORE DATA
PARTITIONING
[Diagram: a single table holds the interleaved records of all accounts (Account 1 – Rec 1, Account 2 – Rec 1, Account 1 – Rec 2, and so on); after partitioning, each account's records (Rec 1, Rec 2, Rec 3) live in their own partition]
PARTITION WITH INHERITANCE
[Diagram: SELECTs go to the parent table and reach its child tables; INSERTs go directly to a child table; CHECK constraints sit on each child table]
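A minimal sketch of what inheritance partitioning asks of the application. It only builds SQL strings, assuming a hypothetical parent table `history` partitioned by an `account_id` column, with hypothetical child tables named `history_<account_id>`. Each child gets a CHECK constraint that the planner can use for constraint exclusion (with `constraint_exclusion` at its default `partition` setting), and INSERTs target the child directly, because rows inserted into the parent stay in the parent unless a trigger redirects them.

```python
# Sketch only: "history", "account_id" and the "history_<id>" naming
# scheme are hypothetical, not taken from the talk.

PARENT = "history"


def child_table(account_id: int) -> str:
    """Name of the child table holding one account's rows."""
    return f"{PARENT}_{int(account_id)}"


def create_child_ddl(account_id: int) -> str:
    """DDL for a child table that inherits all columns of the parent.

    The CHECK constraint lets the planner skip this child for queries
    that filter on a different account_id (constraint exclusion).
    """
    aid = int(account_id)  # int() keeps arbitrary text out of the SQL
    return (
        f"CREATE TABLE {child_table(aid)} ("
        f"CHECK (account_id = {aid})"
        f") INHERITS ({PARENT});"
    )


def insert_sql(account_id: int, columns: list[str]) -> str:
    """Parameterized INSERT (psycopg2-style %s) aimed at the child table.

    Inserting into the parent would not route the row to a child,
    so the application has to pick the child itself.
    """
    cols = ", ".join(columns)
    params = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {child_table(account_id)} ({cols}) VALUES ({params});"
```

For example, `create_child_ddl(42)` yields `CREATE TABLE history_42 (CHECK (account_id = 42)) INHERITS (history);`. Keeping every child in sync with this routing is the "lot of effort in application logic" mentioned in the speaker notes.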
ISOLATE ACCOUNTS
[Diagram: from one database for all accounts to many databases]
PARTITIONING VIA DATABASES

Excellent horizontal scaling
Easy cloning with pg_dump/pg_restore
Some overhead
No direct references between databases
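With one main database for non-account data and one database per account spread over several machines, the application has to know where each account lives. A minimal sketch of such a lookup, assuming a hypothetical registry that maps account IDs to a host and database name (in practice it would presumably be loaded from the main DB; here it is a plain dict, and the `account_<id>` names are invented). Since there are no direct references between databases, any join between main-DB data and account data happens in the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbLocation:
    host: str
    dbname: str

    def dsn(self, user: str = "app") -> str:
        """libpq keyword/value connection string (user name is hypothetical)."""
        return f"host={self.host} dbname={self.dbname} user={user}"


class AccountRouter:
    """Maps account IDs to the database that holds the account's data."""

    def __init__(self, registry: dict[int, DbLocation]):
        self._registry = dict(registry)

    def locate(self, account_id: int) -> DbLocation:
        try:
            return self._registry[account_id]
        except KeyError:
            raise LookupError(f"unknown account {account_id}") from None

    def move(self, account_id: int, new_host: str) -> None:
        """Record that an account DB now lives on another machine,
        e.g. after copying it there with pg_dump/pg_restore."""
        old = self.locate(account_id)
        self._registry[account_id] = DbLocation(new_host, old.dbname)
```

Usage: with `AccountRouter({2: DbLocation("machine-1", "account_2")})`, `locate(2).dsn()` gives `host=machine-1 dbname=account_2 user=app`. Moving an account to a new machine only changes its registry entry, which is what makes the horizontal scaling cheap.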
WHY NOT SCHEMAS?

More lightweight
Full references between schemas
No easy cloning
No schemas inside schemas
SETUP
[Diagram: the main database plus account databases on machine-0, machine-1 and machine-2]
DB HARDWARE

Data > RAM ⇒ high I/O
EC2?
MIGRATION TO EC2

Must migrate all or most machines (latency)
No PostgreSQL in RDS
DB instances run 24/7 ⇒ costly
EBS performance limited
EBS I/O LIMITED
[Bar chart: sequential write and sequential read throughput (MB/s, axis 0–900) for AWS instance storage SSD (RAID-0), AWS EBS (RAID-0) and real 15k SAS2 disks (RAID-10)]
DEDICATED MACHINES

Moderate CPU / RAM
Fast Disks
Battery-backed caching controller
ALTERNATIVE HW

Use bigger (and slower) SATA drives
Evaluate EC2+EBS in production
SSDs
HARDWARE FAILS

Replication
[Diagram: master replicating to slave]
Availability
Query load balancing
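A minimal sketch of the query load balancing idea, assuming the slaves run as hot standbys (available since PostgreSQL 9.0) that accept read-only queries: writes always go to the master, reads are spread round-robin over the master and its slaves. Host names and the `need_fresh` flag are hypothetical. With asynchronous streaming replication a slave can lag behind, so a read that must see a just-committed write should still go to the master.

```python
import itertools


class ReplicaRouter:
    """Picks a host per query: master for writes, round-robin for reads."""

    def __init__(self, master: str, slaves: list[str]):
        self.master = master
        # The master also serves reads; cycle through all hosts in order.
        self._reads = itertools.cycle([master, *slaves])

    def host_for(self, read_only: bool, need_fresh: bool = False) -> str:
        # Writes, and reads that cannot tolerate replication lag,
        # must go to the master.
        if not read_only or need_fresh:
            return self.master
        return next(self._reads)
```

With `ReplicaRouter("master-1", ["slave-1"])`, three read-only queries go to `master-1`, `slave-1`, `master-1`, while every write goes to `master-1`.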
REPLICATION
[Diagram: account DBs: master-1 replicates to slave-1, master-2 to slave-2; main DB: master replicates to slave]
BACKUPS
[Diagram: compressed pg_dump output stored on a backup server]
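A minimal sketch of a per-database dump loop, assuming one `pg_dump` per database in custom format (`-Fc`, which is compressed by default) written to a hypothetical directory on the backup server. It only builds argument lists. Running each one with `subprocess.run(cmd, check=True)` is left to the caller.

```python
from pathlib import PurePosixPath


def dump_commands(host: str, dbnames: list[str], target_dir: str) -> list[list[str]]:
    """One pg_dump invocation per database, custom (compressed) format.

    Output files are laid out as <target_dir>/<host>/<db>.dump
    (a hypothetical layout).
    """
    cmds = []
    for db in sorted(dbnames):
        out = PurePosixPath(target_dir) / host / f"{db}.dump"
        cmds.append(["pg_dump", "-h", host, "-Fc", "-f", str(out), db])
    return cmds
```

Custom-format dumps are also what allows a selective or parallel `pg_restore` later, which plain SQL dumps do not.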
REPLICATION
[Diagram: account DBs now run on four masters (master-1 to master-4) without slaves; main DB: master still replicates to slave]
DISASTER RESTORE
[Diagram: concurrent pg_restore from the backup server]
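The speaker notes say that the more servers there are, the faster the restore, and that the backup server's Gbit Ethernet is the limiting factor. A minimal sketch of restoring many account databases concurrently, assuming one custom-format dump file per database (hypothetical layout) and target databases that already exist. `run` is injected so the scheduling can be tried without PostgreSQL. `workers` is a hypothetical knob for keeping the backup server's link busy without overloading it. Note that `pg_restore -j N` parallelizes inside a single dump; this sketch parallelizes across databases instead.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Sequence


def restore_command(host: str, dbname: str, dump_file: str) -> list[str]:
    """pg_restore into an existing (empty) database on `host`."""
    return ["pg_restore", "-h", host, "-d", dbname, dump_file]


def run_command(cmd: list[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(cmd).returncode


def restore_all(
    jobs: Sequence[tuple[str, str, str]],  # (host, dbname, dump_file)
    workers: int = 8,
    run: Callable[[list[str]], int] = run_command,
) -> dict[str, int]:
    """Restore many databases, `workers` pg_restore processes at a time.

    Returns the exit code for each database name.
    """
    results: dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            pool.submit(run, restore_command(host, db, dump)): db
            for host, db, dump in jobs
        }
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Spreading the jobs over several target hosts lets each machine restore its share in parallel, so total time is bounded mainly by the backup server's network link.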
PERFORMANCE PROBLEMS
Too many concurrent full table scans
Throughput drops from 300 MB/s to 30 MB/s
[Diagram: feedback loop between more concurrent queries and longer query runtime]
DIFFERENT APPS
[Diagram: the web app server issues many fast queries; the compute cluster issues few, very slow queries]
DIFFERENT APPS
Semaphore
[Diagram: web app server with many fast queries, compute cluster with few very slow queries, and a semaphore controlling query concurrency]
Simple counting semaphore using advisory locks,
implemented in the application
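The talk implements a counting semaphore with advisory locks but does not show the code. A minimal sketch of one common way to build it, assuming N slots where each slot is the two-key advisory lock `(sem_id, slot)`. A job grabs any free slot with `pg_try_advisory_lock(int, int)` and releases it with `pg_advisory_unlock(int, int)`. `execute` is a hypothetical hook that runs a query on a single session of a shared database (which database holds the locks is an assumption). Session-level advisory locks are also released when the session ends, so a crashed job does not leak its slot.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator, Optional

# Hypothetical driver hook: runs one query on a SINGLE database session and
# returns the first column of the first row. Advisory locks belong to the
# session, so locking and unlocking must use the same connection.
Execute = Callable[[str, tuple], object]

TRY_LOCK = "SELECT pg_try_advisory_lock(%s, %s)"
UNLOCK = "SELECT pg_advisory_unlock(%s, %s)"


def acquire_slot(execute: Execute, sem_id: int, slots: int) -> Optional[int]:
    """Try every slot once without blocking; return the slot taken or None."""
    for slot in range(slots):
        if execute(TRY_LOCK, (sem_id, slot)):
            return slot
    return None


@contextmanager
def semaphore(
    execute: Execute,
    sem_id: int,
    slots: int,
    poll_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[int]:
    """Allow at most `slots` sessions inside the block at the same time."""
    slot = acquire_slot(execute, sem_id, slots)
    while slot is None:  # all slots busy: wait, then try again
        sleep(poll_seconds)
        slot = acquire_slot(execute, sem_id, slots)
    try:
        yield slot
    finally:
        execute(UNLOCK, (sem_id, slot))
```

A compute job would wrap its heavy query in `with semaphore(execute, sem_id=1, slots=4): ...`, so at most four such queries run at once while web app queries are not gated at all. The `sem_id` values must not collide with other advisory lock users of the same database.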
BULK INSERTS

INSERT: 20k–80k inserts per second

50M
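As a rough worked estimate, assuming the 50M on the slide means a table of 50 million rows rewritten at the quoted insert rates, the time for a full rewrite is $t = N / r$:

```latex
t = \frac{N}{r}, \qquad
\frac{50 \times 10^{6}}{80 \times 10^{3}\,\mathrm{s^{-1}}} = 625\,\mathrm{s} \approx 10.4\,\mathrm{min},
\qquad
\frac{50 \times 10^{6}}{20 \times 10^{3}\,\mathrm{s^{-1}}} = 2500\,\mathrm{s} \approx 41.7\,\mathrm{min}
```

So, under that assumption, rewriting one such table takes roughly 10 to 40 minutes.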
BULK INSERT BEST PRACTICE

COPY instead of INSERT
Drop indexes + recreate
Truncate
COPY into a new table, swap + drop
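A minimal sketch of two of the practices above, assuming a psycopg2-style driver. First, rows are serialized into COPY's tab-separated text format (NULL as `\N`; backslash, tab, newline and carriage return escaped) so they can be streamed with `COPY ... FROM STDIN`, for example via psycopg2's `cursor.copy_expert(sql, file)`. Second, a full rewrite fills a fresh table without indexes and then swaps it in. The `_new`/`_old` suffixes are hypothetical.

```python
import io
from typing import Iterable, Optional, Sequence

# Characters that must be escaped in COPY's text format (tab-delimited).
_ESCAPES = str.maketrans({"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"})


def copy_field(value: Optional[object]) -> str:
    """Encode one value for COPY ... FROM STDIN in text format."""
    if value is None:
        return "\\N"  # default NULL marker in text format
    return str(value).translate(_ESCAPES)


def copy_buffer(rows: Iterable[Sequence[Optional[object]]]) -> io.StringIO:
    """Build an in-memory COPY text stream, one line per row."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(copy_field(v) for v in row))
        buf.write("\n")
    buf.seek(0)
    return buf


def swap_statements(table: str) -> list[str]:
    """Full rewrite: fill a fresh copy, then swap it in and drop the old one.

    LIKE ... INCLUDING DEFAULTS copies columns and defaults but no indexes,
    so the COPY runs without index maintenance; indexes are created after it.
    """
    new, old = f"{table}_new", f"{table}_old"
    return [
        f"CREATE TABLE {new} (LIKE {table} INCLUDING DEFAULTS);",
        f"COPY {new} FROM STDIN;",
        # CREATE INDEX statements for the new table would go here.
        "BEGIN;",
        f"ALTER TABLE {table} RENAME TO {old};",
        f"ALTER TABLE {new} RENAME TO {table};",
        f"DROP TABLE {old};",
        "COMMIT;",
    ]
```

One caveat of the swap: views and foreign keys that point at the old table follow it through the rename (they reference the table, not its name), so `DROP TABLE` fails until they are recreated against the new table.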
SIGNUP PROBLEMS
[Diagram: a signup to the Adspert service triggers CREATE DATABASE, which takes up to 5–10 min]
PRE-CREATE DATABASES

Create DBs ahead of time
New signups rename a spare DB
Periodically create new spares
Fall back to a direct CREATE DATABASE
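A minimal sketch of the spare-database pool, assuming hypothetical hooks `execute(sql)` (runs one statement on the target host, raising on failure) and `list_databases()` (e.g. backed by `SELECT datname FROM pg_database`). The `spare_` prefix, the `account_template` template and the error class are also hypothetical. A signup renames any spare with `ALTER DATABASE ... RENAME TO ...`, which is quick compared to `CREATE DATABASE`. If two signups race for the same spare, the loser's rename fails and it simply tries the next one. Renaming also fails while anyone is connected to the spare.

```python
import re
import uuid
from typing import Callable, Iterable

SPARE_PREFIX = "spare_"
SAFE_NAME = re.compile(r"^[a-z_][a-z0-9_]{0,62}$")  # unquoted identifier, max 63 chars


class DatabaseError(Exception):
    """Stand-in for the driver's error class (hypothetical)."""


def checked(name: str) -> str:
    """Reject names that would need quoting; they are interpolated into SQL."""
    if not SAFE_NAME.match(name):
        raise ValueError(f"unsafe database name: {name!r}")
    return name


def random_spare_name() -> str:
    return SPARE_PREFIX + uuid.uuid4().hex[:12]


class SparePool:
    def __init__(
        self,
        execute: Callable[[str], None],  # runs one statement, raises DatabaseError
        list_databases: Callable[[], Iterable[str]],
        template: str = "account_template",  # hypothetical template DB
        new_name: Callable[[], str] = random_spare_name,
    ):
        self.execute = execute
        self.list_databases = list_databases
        self.template = checked(template)
        self.new_name = new_name

    def spares(self) -> list[str]:
        return sorted(d for d in self.list_databases() if d.startswith(SPARE_PREFIX))

    def claim(self, name: str) -> str:
        """Give a new signup its database; returns "renamed" or "created"."""
        checked(name)
        for spare in self.spares():
            try:
                # Fast path: a rename is quick, unlike CREATE DATABASE.
                self.execute(f"ALTER DATABASE {checked(spare)} RENAME TO {name}")
                return "renamed"
            except DatabaseError:
                continue  # e.g. a concurrent signup renamed this spare first
        # Fallback: direct create, which may take minutes under write load.
        self.execute(f"CREATE DATABASE {name} TEMPLATE {self.template}")
        return "created"

    def replenish(self, target: int) -> int:
        """Run periodically: create spares until `target` exist; returns how many were created."""
        missing = max(0, target - len(self.spares()))
        for _ in range(missing):
            self.execute(f"CREATE DATABASE {checked(self.new_name())} TEMPLATE {self.template}")
        return missing
```

Running `replenish` against a chosen host is also how the target host for new accounts can be controlled. Schema changes have to reach the template and every spare as well; a spare created from the template while a migration is running may otherwise end up with the old schema.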
CONCLUDING

Partitioning into Databases
Physical Hardware
Check out advisory locks
THANKS FOR LISTENING

QUESTIONS?

Speaker notes

  1. Hi, I’m Oliver, I’m a software developer, currently heading the development team at Bidmanagement GmbH in Berlin.
  2. I’m going to talk about PostgreSQL: not so much about the DBMS itself, but about how we use it as the main datastore in our system.
  3. About how we run a large PostgreSQL installation in our company, and how we’ve grown our setup.
  4. Very popular, billions of dollars, a very important online marketing channel.
  5. Google provides a very extensive API
  6. The different kinds of data we store can be largely separated into two groups.
  7. … And we decided to go with PostgreSQL because it has been our go-to tool for storing data for many years. There were problems from time to time, but we never looked back.
  8. But it began much smaller …
  9. A straightforward approach; nobody thought about scaling.
  10. The pilots were successful and we started to acquire customers. Soon there were more than 10 million rows in some tables, and query performance lagged (many full table scans). We did not want to scale vertically, because we aspired to much bigger growth (also: expensive).
  11. PostgreSQL supports partitioning via inheritance. CHECK constraints tell the query planner where to look. Rows inserted into the parent table are not routed to the children (unless a trigger redirects them), so the application must insert into the right child table. A lot of effort goes into application logic. We tried it on one table, but weren’t convinced.
  12. One main DB with non-account-specific data, currently ~1–2 GB. Several machines dedicated to account databases, 50–1000 DBs per machine. PostgreSQL 9.0 and 9.3 on each machine, which allows us to migrate one DB after another.
  13. The partitioning scheme allows easy horizontal scaling: more machines. But which ones? The dataset does not fit in RAM, so I/O requirements are high. AWS EC2? We would have to migrate all or most machines because of latency. DB instances run 24/7 ⇒ costly. EBS performance is limited (Gbit Ethernet). Also: not many cores.
  14. Not that much elasticity is required: as a B2B business our growth is more predictable, and the expensive backend jobs are batch-processed. One year of an EC2 instance ≅ buying one physical server. Mid-sized machines give a good price/value ratio.
  15. SATA: 600 GB vs. 3 TB. EC2: performance and latency unclear; evaluate to make an informed decision. SSDs: expensive. Reliable? RAID?
  16. But when things go awry and data gets deleted …
  17. Big cheap HDDs
  20. The main DB is still replicated to enable quick failover; here we can’t afford extended downtime.
  21. Capacity doubled, cost reduced by 40%. The more servers, the faster the restore. The Gbit Ethernet link on the backup server is the limiting factor.
  22. From sequential reads to random reads. Feedback loop: more concurrent queries ⇒ longer query runtime ⇒ even more concurrent queries.
  23. Web app queries, where humans are waiting, are quite fast. The problematic queries come from the analysis jobs: frequent full table scans and queries with huge results. We need a way to synchronize queries and control concurrency: we could use a connection pooler, or an external synchronization mechanism, e.g. ZooKeeper.
  25. We rewrite the history every day (for various reasons): conversions arrive up to 30 days later, and campaigns are added to the optimization. For most accounts that is <1M records, for some 10–100M. We achieve up to 80k inserts/sec. The network appears to be the bottleneck (not yet verified).
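  26. We use COPY for all bulk inserts, even small ones. Indexes are dropped and recreated with simple PL/pgSQL functions. For complete table rewrites, note that TRUNCATE is not MVCC-safe: concurrent transactions using a snapshot taken before the truncation see the table as empty.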
  27. We added a self-service signup: a 2-minute process to add an AdWords account to the system (OAuth, user info, optimization bootstrap). Biggest problem: CREATE DATABASE can take several minutes, depending on the current amount of write activity.
  28. We now always keep 10–20 spare databases in stock. This also lets us control the target host for new databases. Take care to avoid race conditions when applying schema changes.