Cloud Architecture Patterns
Running PostgreSQL at Scale
(when RDS won't do what you need)
Corey Huinker
Corlogic Consulting
March 2018
First, we need a problem to
solve.
This is You
You Get An Idea For a Product
You make a product! ...now you have to sell it.
To advertise the product, you need an ad...
...so you talk to an ad agency.
But placing ads has challenges
Need to find websites with visitors who:
● Would want to buy your product
● Are able to buy your product
● Would be drawn in by the creative you have designed
Websites Claims about their Visitors...
...are not always accurate.
Buying ad space on websites directly is usually not
possible; you need a broker/auction service.
So how do you know that your ad was seen?
Focal points of ad monitoring
● Number of times ad landed on a page (impressions)
● Where on the page did it land?
● Did it fit the space allotted?
● How long did the page stay up?
● Did the viewer interact with the ad in any way?
● Was the viewer a human?
● How do these numbers compare with the claims of the website?
● How do these numbers compare with the claims of the broker?
This creates a lot of data
● Not all impressions phone home (sampling rate varies by contract)
● Sampling events recorded per day (approx): 50 Billion
● Sampling events are chained together to tell the story of that impression.
● Impression data is then aggregated by date, ad campaign, browser
● After aggregation, about 500M rows are left per day.
● Each row has > 125 measures of viewability metrics
Capturing The Events
● Pixel servers
● Need to be fast, to avoid degrading the user experience or losing event data
● Need to get log data off of machine ASAP
● Approximately 500 machines
○ Low CPU workload
○ Low disk I/O workload
○ High network bandwidth
○ low latency
○ generously over-provisioned
Real-time accumulation and aggregation
● Consumes event logs from pixel servers as fast as possible.
● Each server is effectively a shard of the whole "today" database
● Custom in-memory database updating continuously
● Serving API calls continuously
● Approximately 450 machines
○ CPU load nearly 100%
○ To swap is to die
○ High network bandwidth
○ low latency
○ generously over-provisioned
What Didn't Work: MySQL
● Original DB choice
● Performed adequately when daily volume was < 1% of current volume
● Impossible to add new columns to tables
● Easier to create a new shard than to modify an existing one.
● New metrics being added every few weeks, or even days
● Dozens of shards, no consistency in their size
What Didn't Work: Redshift
● Intended to complement MySQL
● Performed adequately when daily volume was < 1% of current volume
● Needed subsecond response, was getting 30s+ response
● Was only machine that had a copy of data across all time
● HDD was slow, tried SSD instances, but had limited space
● Eventually got up to a 26 node cluster with 32 cores per node.
● Cannot distinguish a large query from a small one
● Had no insight into how the data was partitioned
● Reorganizing data according to AWS suggestions would have resulted in
vacuums taking several days.
What Didn't Work: Vertica
● Intended to complement MySQL
● Good response times over larger data volumes
● Needed local disk to perform adequately, which limited disk size
● Each cluster could only hold a few months of data
● 5 node clusters, 32 cores each.
● Could only have K-safety of 1, or else load took too long (2 hrs vs 10)
● Nodes failed daily, until glibc bug was fixed
● Expensive
What Did Work: Postgres
● Migrated OLTP MySQL DB (which held some DW tables)
● Conversion took 2 weeks with 2 programmers
● Used mysql_fdw to create migration tables
● Triggers on tables to identify modified rows
● Moved read-only workloads to postgres instance
● Migrated read-write apps in stages
● Only downtime was in final cut-over
● Single 32 core EC2 with 1-2 physical read replicas
What Did Work: Zipfian workloads
● Customers primarily care about data from today and last seven days
● About 85% of all API requests were in that date range
● Vanilla PostgreSQL instance, 32 cores, ample RAM, 5TB disk
● Data partitioned by day. Drop any partitions > 10 days old.
● Stores derivative data, so no need for backup and recovery strategy
● Focus on loading the data as quickly as possible each morning.
● Adjust apps to be aware that certain clients' data is available earlier than
others
● Codename: L7
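The L7 retention scheme above can be sketched in a few lines of DDL generation. This is an illustration only; `daily_stats` and the exact retention window are stand-ins, not the production schema:

```python
from datetime import date, timedelta

def partition_ddl(day, parent="daily_stats"):
    # DDL for one day's partition (PostgreSQL 10+ declarative partitioning).
    name = f"{parent}_{day:%Y%m%d}"
    nxt = day + timedelta(days=1)
    return (f"CREATE TABLE {name} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{day}') TO ('{nxt}');")

def expired_partitions(today, retained_days=10, parent="daily_stats"):
    # DROP statements for partitions past the retention window
    # (scan a few extra days to catch stragglers).
    return [f"DROP TABLE IF EXISTS {parent}_{today - timedelta(days=d):%Y%m%d};"
            for d in range(retained_days, retained_days + 3)]
```

Each morning the loader would create today's partition and drop whatever falls out of the window; dropping a partition is instant, unlike a bulk DELETE.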
What Did Work: Getting cute with RAID
● Engineer discovered a quirk in AWS pricing of disk by size
● Could maximize IOPS by combining 30 small drives into a RAID-0
● Same hardware as an L7 could now store ~40 days of data, but data growth
meant that that figure would shrink with time
● Same strategy as L7, just adjusted for longer date coverage
● Codename:
○ L-Month? Would sound silly when X fell below 30
○ L-More? Accurate but not catchy.
○ L-mo?
○ Elmo
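The pricing quirk can be illustrated with back-of-envelope math, assuming gp2-era EBS rules (baseline 3 IOPS per provisioned GB with a 100 IOPS floor, burst to 3,000 IOPS per volume); the actual AWS numbers have changed since, so treat these as assumptions:

```python
def gp2_baseline_iops(size_gb):
    # Assumed gp2-era rule: 3 IOPS per provisioned GB, floored at 100 IOPS
    return max(100, 3 * size_gb)

def raid0_iops(n_volumes, size_gb, burst=False):
    # RAID-0 aggregates the per-volume IOPS across all member volumes
    per_volume = 3000 if burst else gp2_baseline_iops(size_gb)
    return n_volumes * per_volume

# One 3 TB volume vs. thirty 100 GB volumes: same baseline IOPS for the
# same total capacity, but each small volume can burst to 3,000 IOPS, so
# the stripe's burst ceiling is an order of magnitude higher.
single = raid0_iops(1, 3000)              # 9000 baseline IOPS
stripe = raid0_iops(30, 100, burst=True)  # 90000 burst IOPS
```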
What Did Work: Typeahead search
● "Type-ahead" queries must return in < 100ms
● Such queries can be across arbitrary time range
● Scope of response is limited (screen real estate)
● Engineer discovers that our data compresses really well with TOAST
● Specialized instance to store all data at highest grain level, TOASTed
● pseudo-materialized views that aggregate data in search-friendly forms
● Use of "Dimension" tables as a form of compression on the matviews.
● Heavy btree_gin indexing on searchable terms and tokens in dimensions
● Single 32 core machine, abundant memory, 2 read replicas
● Rebuild from scratch would take days, so B&R strategy was needed
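The "dimension tables as compression" idea is plain dictionary encoding: repeated strings become small integer IDs plus a lookup table, and the fact rows store only the IDs. A toy sketch:

```python
def dictionary_encode(values):
    # Replace repeated strings with small integer IDs plus a lookup
    # ("dimension") table; the matview rows then store only the IDs.
    dimension = {}
    ids = []
    for v in values:
        if v not in dimension:
            dimension[v] = len(dimension) + 1
        ids.append(dimension[v])
    return ids, dimension

ids, dim = dictionary_encode(["chrome", "safari", "chrome", "firefox", "chrome"])
# ids -> [1, 2, 1, 3, 1]; dim -> {"chrome": 1, "safari": 2, "firefox": 3}
```

The dimension side is also where the btree_gin indexes on searchable terms live, so the wide fact data never needs to be scanned for a typeahead lookup.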
What Did Work: TOASTing the Kitchen Sink
● Data usage patterns guaranteed that a client usually wants most of the data
across their org for whatever date range is requested
● Putting such data in arrays guarantees TOASTing and compression.
● Compression shifts workload from scarce IOPS to abundant CPU
● Size of array chunks was heavily tuned for the EC2 instance type.
● Same RAID-0 as used in Elmo instance could now hold all customer data
● Five 32-core machines with an ETL load-sharing feature such that each one
processes a client-day, then shares it with the other nodes
● Replaced all Redshift and Vertica instances
● Codename: Marjory (the all seeing, all knowing trash heap)
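A rough illustration of the array/TOAST trade: packing many measures into fixed-size chunks and compressing them shifts work from scarce I/O to abundant CPU. Here zlib stands in for TOAST's compression (PostgreSQL actually uses pglz or LZ4) and the chunk size is arbitrary, where the real system tuned it per EC2 instance type:

```python
import struct
import zlib

def pack_measures(values, chunk_size=365):
    # Pack measures into fixed-size float8 array chunks, then compress each
    # chunk, mimicking what TOAST does to a large array column.
    chunks = []
    for i in range(0, len(values), chunk_size):
        chunk = values[i:i + chunk_size]
        raw = struct.pack(f"{len(chunk)}d", *chunk)
        chunks.append(zlib.compress(raw))
    return chunks

measures = [0.0] * 1000  # highly repetitive metrics compress very well
chunks = pack_measures(measures)
```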
What Did Work: Foreign Data Wrappers
● One FDW converted queries into API calls to the in-memory "today" database
● Another one used query quals to determine the set of client-dates that must
be fetched
● All client data stored on S3 as both .csv.gz and a compressed SQLite db
● The FDW starts a web service and launches one Lambda per SQLite file
● Each Lambda queries its SQLite file and sends results to the web service
● The web service re-issues Lambdas as needed and returns results to the FDW
● Very good for queries across long date ranges
● Codename: Frackles (the name for background monster muppets)
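One "Lambda" worker is conceptually just: open the per-client-date SQLite file, run the pushed-down query, hand the rows back. A runnable stand-in, using a local temp file in place of S3 (the table and file names here are made up):

```python
import os
import sqlite3
import tempfile

def lambda_worker(db_path, sql, params=()):
    # Stand-in for one Lambda invocation: open the per-client-date SQLite
    # file, run the pushed-down query, return rows to the web service.
    con = sqlite3.connect(db_path)
    try:
        return con.execute(sql, params).fetchall()
    finally:
        con.close()

# Build a throwaway "client-date" file (in the real system this lives on S3).
path = os.path.join(tempfile.mkdtemp(), "client_20180301.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE impressions (campaign TEXT, views INTEGER)")
con.executemany("INSERT INTO impressions VALUES (?, ?)",
                [("spring", 10), ("spring", 5), ("fall", 7)])
con.commit()
con.close()

rows = lambda_worker(path, "SELECT campaign, SUM(views) FROM impressions "
                           "GROUP BY campaign ORDER BY campaign")
# rows -> [('fall', 7), ('spring', 15)]
```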
What Did Work: PMPP
● Poor Man's Parallel Processing
● Allows an application to issue multiple queries in parallel to multiple servers,
provided all the queries have the same shape
● Returns data via a set returning function, which can then do secondary
aggregation, joins, etc.
● Any machine that talks libpq could be queried (PgSQL, Vertica, Redshift)
● Allows for partial aggregation on DW boxes
● Secondary aggregation can occur on local machine
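The shape of PMPP can be sketched with a thread pool: the same-shaped query fans out to every server in parallel, and the partial aggregates are merged locally. Stub shards replace real libpq connections here:

```python
from concurrent.futures import ThreadPoolExecutor

def pmpp(servers, run_query, sql):
    # Fan the same-shaped query out to every server in parallel...
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        partials = list(pool.map(lambda s: run_query(s, sql), servers))
    # ...then do the secondary aggregation locally (here: sum per key).
    totals = {}
    for rows in partials:
        for key, n in rows:
            totals[key] = totals.get(key, 0) + n
    return totals

# Stub shards standing in for libpq connections to Elmo/Marjory/etc. nodes.
shard_results = {
    "elmo":    [("campaignA", 10), ("campaignB", 2)],
    "marjory": [("campaignA", 5)],
}
totals = pmpp(list(shard_results),
              lambda server, sql: shard_results[server],
              "SELECT campaign, count(*) FROM hits GROUP BY campaign")
# totals -> {'campaignA': 15, 'campaignB': 2}
```

Because each shard returns a partial aggregate rather than raw rows, the expensive work stays on the DW boxes and only small result sets cross the wire.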
What Did Work: Decanters
● A place to let the data "breathe"
● Abundant CPUs, abundant memory per CPU, minimal disk
● Very small lookup tables replicated for performance reasons
● All other local tables are FDWs to OLTP database
● Mostly executes aggregation queries that use PMPP to access: Statscache,
Elmo, Marjory, Frackles, each one doing a local aggregation
● Final aggregation happens on decanter
● Can occasionally experience OOM (rather than on an important machine)
● New decanter can spin up and enter load balancer in 5 minutes
● No engineering time to be spent rescuing failed decanters
Putting it all together with PostgreSQL
[Ingest-side architecture diagram: tagged ads produce viewable events, captured
by the pixel servers; log shipping feeds the stats aggregators and S3 (CSVs and
SQLite); daily ETLs build daily summaries for the Elmo, Marjory, and search
clusters.]
Putting it all together with PostgreSQL
[Query-side architecture diagram: user stats requests land on the decanters,
which issue PMPP requests through Pg FDWs, the Frackles FDW, and the
Stats-Cache FDW to the Elmo clusters, Marjory clusters, S3 SQLite files, live
stats aggregators, the OLTP DB, and third-party DWs; searches go to the search
clusters.]
Why Not RDS?
● No ability to install custom extensions (especially partitioning modules)
● No place to do local copy operations
● Reduced insight into the server load
● Reduced ability to tune pg server
● No ability to try beta versions
● Expense
Why Not Aurora?
● Had early adopter access
● AWS Devs said that it wasn't geared for DW workloads
● Seems nice on I/O
● Nice not having to worry about which servers are read only
● Wasn't there yet
● Data volumes necessitate advanced partitioning
● Expense
Why Not Athena?
● Athena had no concept of constraint exclusion to avoid reading irrelevant files
● Costs $5/TB of data read
● Most queries would cost > $100 each
● Running thousands of queries per hour
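The cost arithmetic, with an illustrative 25 TB scan (the actual per-query scan size isn't stated in the slides):

```python
def athena_query_cost(tb_scanned, rate_per_tb=5.0):
    # Athena bills per TB of data scanned; without anything like
    # constraint exclusion, every query reads the irrelevant files too.
    return tb_scanned * rate_per_tb

cost = athena_query_cost(25)  # a 25 TB scan -> $125.00 per query
```

At thousands of queries per hour, per-query costs north of $100 rule the service out regardless of performance.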
Questions?

Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Cloud arch patterns

● Custom in-memory database updating continuously
● Serving API calls continuously
● Approximately 450 machines
○ CPU load nearly 100%
○ To swap is to die
○ High network bandwidth
○ low latency
○ generously over-provisioned

What Didn't Work: MySQL
● Original DB choice
● Performed adequately when daily volume was < 1% of current volume
● Impossible to add new columns to tables
● Easier to create a new shard than to modify an existing one
● New metrics being added every few weeks, or even days
● Dozens of shards, no consistency in their size

What Didn't Work: Redshift
● Intended to complement MySQL
● Performed adequately when daily volume was < 1% of current volume
● Needed subsecond response, was getting 30s+ response
● Was the only machine that had a copy of data across all time
● HDD was slow; tried SSD instances, but had limited space
● Eventually got up to a 26-node cluster with 32 cores per node
● Cannot distinguish a large query from a small one
● Had no insight into how the data was partitioned
● Reorganizing data according to AWS suggestions would have resulted in vacuums taking several days

What Didn't Work: Vertica
● Intended to complement MySQL
● Good response times over larger data volumes
● Needed local disk to perform adequately, which limited disk size
● Each cluster could only hold a few months of data
● 5-node clusters, 32 cores each
● Could only have K-safety of 1, or else loads took too long (2 hrs vs 10)
● Nodes failed daily until a glibc bug was fixed
● Expensive

What Did Work: Postgres
● Migrated OLTP MySQL DB (which held some DW tables)
● Conversion took 2 weeks with 2 programmers
● Used mysql_fdw to create migration tables
● Triggers on tables to identify modified rows
● Moved read-only workloads to the Postgres instance
● Migrated read-write apps in stages
● Only downtime was in the final cut-over
● Single 32-core EC2 with 1-2 physical read replicas

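The mysql_fdw piece of that migration can be sketched roughly as below. Server, credential, and table names are illustrative, not from the deck:

```sql
-- Hedged sketch: exposing a legacy MySQL table inside Postgres via
-- mysql_fdw so rows can be copied (and re-copied for trigger-flagged
-- changes) during the staged migration.
CREATE EXTENSION mysql_fdw;

CREATE SERVER legacy_mysql
    FOREIGN DATA WRAPPER mysql_fdw
    OPTIONS (host 'mysql.internal', port '3306');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER legacy_mysql
    OPTIONS (username 'migrator', password 'secret');

CREATE FOREIGN TABLE mysql_campaigns (
    campaign_id bigint,
    name        text,
    updated_at  timestamp
) SERVER legacy_mysql OPTIONS (dbname 'adstats', table_name 'campaigns');

-- Initial copy into the native table; re-run for modified rows.
INSERT INTO campaigns SELECT * FROM mysql_campaigns;
```

Because the foreign tables are queryable like any other table, read-only workloads could move over before the writers did.
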
What Did Work: Zipfian workloads
● Customers primarily care about data from today and the last seven days
● About 85% of all API requests were in that date range
● Vanilla PostgreSQL instance, 32 cores, ample RAM, 5TB disk
● Data partitioned by day; drop any partitions > 10 days old
● Stores derivative data, so no need for a backup and recovery strategy
● Focus on loading the data as quickly as possible each morning
● Adjust apps to be aware that certain clients' data is available earlier than others
● Codename: L7

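The partition-and-drop retention scheme can be sketched as follows. Table and column names are illustrative; the deck's deployments used partitioning extensions, but modern declarative syntax (PostgreSQL 10+) shows the idea:

```sql
-- Hedged sketch of L7-style retention: one partition per day,
-- with anything older than 10 days dropped.
CREATE TABLE daily_stats (
    stat_date   date NOT NULL,
    campaign_id bigint,
    impressions bigint
) PARTITION BY RANGE (stat_date);

CREATE TABLE daily_stats_2018_03_01 PARTITION OF daily_stats
    FOR VALUES FROM ('2018-03-01') TO ('2018-03-02');

-- Nightly maintenance: dropping a partition is near-instant,
-- versus DELETE + VACUUM on one monolithic table.
DROP TABLE daily_stats_2018_02_19;  -- now > 10 days old
```

Dropping whole partitions is what makes the "no backup strategy, just reload" posture workable.
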
What Did Work: Getting cute with RAID
● Engineer discovered a quirk in AWS pricing of disk by size
● Could maximize IOPS by combining 30 small drives into a RAID-0
● Same hardware as an L7 could now store ~40 days of data, but data growth meant that figure would shrink over time
● Same strategy as L7, just adjusted for longer date coverage
● Codename:
○ L-Month? Would sound silly when X fell below 30
○ L-More? Accurate but not catchy
○ L-mo?
○ Elmo

What Did Work: Typeahead search
● "Type-ahead" queries must return in < 100ms
● Such queries can be across an arbitrary time range
● Scope of the response is limited (screen real estate)
● Engineer discovered that our data compresses really well with TOAST
● Specialized instance to store all data at the highest grain level, TOASTed
● Pseudo-materialized views that aggregate data in search-friendly forms
● Use of "dimension" tables as a form of compression on the matviews
● Heavy btree_gin indexing on searchable terms and tokens in dimensions
● Single 32-core machine, abundant memory, 2 read replicas
● Rebuild from scratch would take days, so a B&R strategy was needed

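The btree_gin trick is that scalar columns can share a single GIN index with token columns, so one index serves mixed typeahead predicates. A hedged sketch, with illustrative table and column names:

```sql
-- btree_gin adds GIN operator classes for plain scalar types,
-- allowing a combined index over an equality column and a tsvector.
CREATE EXTENSION btree_gin;

CREATE TABLE dim_campaign (
    campaign_id bigint,
    client_id   bigint,
    name        text,
    tokens      tsvector   -- pre-tokenized searchable terms
);

CREATE INDEX dim_campaign_search_idx
    ON dim_campaign USING gin (client_id, tokens);

-- A typeahead query hits one index for both quals, including
-- prefix matching on the partially-typed term:
SELECT campaign_id, name
FROM dim_campaign
WHERE client_id = 42
  AND tokens @@ to_tsquery('simple', 'spring:*');
```
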
What Did Work: TOASTing the Kitchen Sink
● Data usage patterns guaranteed that a client usually wants most of the data across their org for whatever date range is requested
● Putting such data in arrays guarantees TOASTing and compression
● Compression shifts workload from scarce IOPS to abundant CPU
● Size of array chunks was heavily tuned for the EC2 instance type
● Same RAID-0 as used in the Elmo instances could now hold all customer data
● Five 32-core machines with an ETL load-sharing feature such that each one processes a client/day, then shares it with the other nodes
● Replaced all Redshift and Vertica instances
● Codename: Marjory (the all-seeing, all-knowing trash heap)

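The array-packing idea can be sketched as below. The names and chunking scheme are illustrative; the point is that array values large enough to cross the TOAST threshold (~2KB) get compressed, trading scarce IOPS for abundant CPU:

```sql
-- Hedged sketch: pack each client/day's metrics into arrays in a few
-- wide rows, split into chunks whose size is tuned per instance type.
CREATE TABLE packed_stats (
    client_id    bigint,
    stat_date    date,
    chunk_no     int,        -- arrays split into tuned-size chunks
    campaign_ids bigint[],
    impressions  bigint[],   -- one element per campaign in the chunk
    viewability  numeric[]
);

-- Reads unpack with unnest(); one TOAST fetch decompresses many
-- logical rows' worth of data.
SELECT client_id, stat_date,
       unnest(campaign_ids) AS campaign_id,
       unnest(impressions)  AS impressions
FROM packed_stats
WHERE client_id = 42
  AND stat_date BETWEEN '2018-03-01' AND '2018-03-07';
```
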
What Did Work: Foreign Data Wrappers
● One FDW converted queries into API calls to the in-memory "today" database
● Another used query quals to determine the set of client-dates that must be fetched
● All client data stored on S3 as both .csv.gz and a compressed SQLite db
● FDW starts a web service, launches one lambda per SQLite file
● Each lambda queries its SQLite file and sends results to the web service
● Web service re-issues lambdas as needed, returns results to the FDW
● Very good for queries across long date ranges
● Codename: Frackles (the name for the background monster Muppets)

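From the application's side, the custom FDW might look something like this. The wrapper name, options, and columns are hypothetical (the deck describes an in-house FDW, not a published one); only the qual-driven fan-out behavior is from the deck:

```sql
-- Hedged, hypothetical sketch of querying the S3/lambda-backed FDW.
CREATE SERVER frackles
    FOREIGN DATA WRAPPER s3_sqlite_fdw     -- hypothetical wrapper name
    OPTIONS (bucket 'adstats-archive');

CREATE FOREIGN TABLE archived_stats (
    client_id   bigint,
    stat_date   date,
    campaign_id bigint,
    impressions bigint
) SERVER frackles;

-- The FDW inspects the quals below to choose exactly which
-- client/date SQLite files to fan out to lambdas:
SELECT stat_date, sum(impressions)
FROM archived_stats
WHERE client_id = 42
  AND stat_date BETWEEN '2016-01-01' AND '2017-12-31'
GROUP BY stat_date;
```
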
What Did Work: PMPP
● Poor Man's Parallel Processing
● Allows an application to issue multiple queries in parallel to multiple servers, provided all the queries have the same shape
● Returns data via a set-returning function, which can then feed secondary aggregation, joins, etc.
● Any machine that speaks libpq could be queried (PostgreSQL, Vertica, Redshift)
● Allows for partial aggregation on the DW boxes
● Secondary aggregation can occur on the local machine

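The pattern can be sketched roughly as below. The exact pmpp API may differ from this call shape, so treat it as illustrative; the key points from the deck are that all distributed queries share one row shape, and the set-returning function's output feeds a second, local aggregation:

```sql
-- Hedged sketch of the PMPP pattern: fan out same-shaped partial
-- aggregations, then combine the partials locally.
SELECT stat_date, sum(impressions) AS impressions
FROM pmpp.distribute(
        null::daily_rollup,            -- row type shared by all queries
        'host=elmo1 dbname=stats',     -- libpq connection string
        array[
          $$ SELECT stat_date, sum(impressions) FROM stats_p1 GROUP BY 1 $$,
          $$ SELECT stat_date, sum(impressions) FROM stats_p2 GROUP BY 1 $$
        ]) AS t(stat_date, impressions)
GROUP BY stat_date;
```

Because remote boxes each do a partial GROUP BY, only small result sets cross the network.
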
What Did Work: Decanters
● A place to let the data "breathe"
● Abundant CPUs, abundant memory per CPU, minimal disk
● Very small lookup tables replicated for performance reasons
● All other local tables are FDWs to the OLTP database
● Mostly executes aggregation queries that use PMPP to access Statscache, Elmo, Marjory, and Frackles, each one doing a local aggregation
● Final aggregation happens on the decanter
● Can occasionally experience OOM (better there than on an important machine)
● A new decanter can spin up and enter the load balancer in 5 minutes
● No engineering time to be spent rescuing failed decanters

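The two-stage aggregation the decanters perform can be sketched as follows (names illustrative). The subtlety is that some aggregates don't compose naively: an average of averages is wrong, so shards must ship sums and counts separately:

```sql
-- Hedged sketch of decanter-style final aggregation.
-- Each shard runs the partial stage:
--   SELECT campaign_id, sum(viewable) AS s, count(*) AS n
--   FROM impressions GROUP BY campaign_id;

-- The decanter combines the gathered partials:
SELECT campaign_id,
       sum(s)::numeric / sum(n) AS viewability_rate
FROM combined_partials        -- rows collected from all shards
GROUP BY campaign_id;
```
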
Putting it all together with PostgreSQL
[Architecture diagram, write path: Tagged Ads generate Viewable Events, captured by the Pixel Servers; log shipping feeds the Stats Aggregators and S3 (CSVs and SQLite); daily ETLs build the Daily Summaries, Elmo clusters, Marjory clusters, and Search clusters]

Putting it all together with PostgreSQL
[Architecture diagram, read path: user stats requests and searches hit the Decanters, which issue PMPP requests to the live Stats Aggregators (Stats-Cache FDW), Elmo clusters, Marjory clusters, S3 SQLite (Frackles FDW), the OLTP DB, third-party DWs, and the Search clusters (Pg FDW)]

Why Not RDS?
● No ability to install custom extensions (esp. partitioning modules)
● No place to do local copy operations
● Reduced insight into the server load
● Reduced ability to tune the Postgres server
● No ability to try beta versions
● Expense

Why Not Aurora?
● Had early-adopter access
● AWS devs said that it wasn't geared for DW workloads
● Seems nice on I/O
● Nice not having to worry about which servers are read-only
● Wasn't there yet
● Data volumes necessitate advanced partitioning
● Expense

Why Not Athena?
● Athena had no concept of constraint exclusion to avoid reading irrelevant files
● Costs $5/TB of data read
● Most queries would cost > $100 each (i.e., scan more than 20 TB)
● Running thousands of queries per hour