SlideShare una empresa de Scribd logo
1 de 35
Scaling Marketplaces at Thumbtack
QCon SF 2017
Nate Kupp
Technical Infrastructure
Data Eng, Experimentation, Platform
Infrastructure, Security, Dev Tools
InfoQ.com: News & Community Site
• Over 1,000,000 software developers, architects and CTOs read the site world-
wide every month
• 250,000 senior developers subscribe to our weekly newsletter
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• 2 dedicated podcast channels: The InfoQ Podcast, with a focus on
Architecture and The Engineering Culture Podcast, with a focus on building
• 96 deep dives on innovative topics packed as downloadable emags and
minibooks
• Over 40 new content items per week
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
thumbtack-scalability
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon San Francisco
www.qconsf.com
Infrastructure from early beginnings
“You see that?”
my dad would say,
“That won’t last more than a
few years.
They’re going to have to rip the
whole thing out within a decade”.
You have choices. A lot of them.
There’s no right
answer. Sorry.
ME: HARDWARE SOFTWARE
Semiconductor manufacturing research
Apple iOS battery life
Data Infrastructure @ Thumbtack
Technical Infrastructure @ Thumbtack
Data, core platform, A/B testing & dev tools
Yup, those are working transistors
Growth Hacking with Team Philippines
Customer
Pro A
Pro B
Pro C
I need a
plumber!
Is this a spam
request?
SPAM DETECTION
Can we automate
quoting?
“QUOTING SERVICE”
Which pros do we
invite to quote?
MATCHING
Hacky (especially in the early days) is fine, even
necessary
Hacky is going to happen. With or without you.
“It has long been an axiom of mine
that the little things are infinitely
the most important.”
- Sherlock Holmes
“Small data”
CSV files Upload Web UI
Analysts
“Big data”
Hive/Impala
Change doesn’t happen in a vacuum.
Give people a way to get things done while
you build or migrate.
Late 2015
Starting to hit
“real” scale
Growth @ Thumbtack
2009
Thumbtack
founded
2014
Raise
$130mm
Dec. 2014
I join
Thumbtack
Now
Continued
growth
Mid-2015
Early data infrastructure
2009 - 2014
The monolith
2015
Finally on AWS! Started
using DynamoDB, Golang
microservices
The Evolution of Thumbtack’s Infrastructure
Go servicesWebsite Website
Sqoop ETL
HDFS
Event Logging
Today
Mixed clouds,
fully-containerized
infrastructure, terraform
The Evolution of Thumbtack’s Infrastructure
Production Serving Infrastructure Data Infrastructure
Application Layer
(Docker on ECS)
PHP, Go, Scala
Storage Layer
PostgreSQL, DynamoDB,
Elasticsearch
Processing
Scala/Spark on Dataproc
Storage
GCS
SQL
BigQuery
pgbouncer
Master + Hot
standbys
Production Serving Infrastructure
ECS
Go services
services
Website
Latency
Cluster
Throughput
Cluster
...
Sqoop
Cluster
WAL-E Backups
EBS + ZFS
Sqoop ETL
Application
Layer Offline Data Processing
DynamoDB ETL
Indexing Pipelines
Event Logging
PG Logging
Storage
Layer
Application Layer
Storage
Cloud
Storage
Ingest
Cloud
Pub/Sub
Analytics
BigQuery
Data Infrastructure
Spark ETL
Cloud Dataproc
Airflow
Thrift/JSON
events
Search
Indexing
Pipelines
Pipeline
Orchestration
Full table
Sqoop ETL
Batch ETL
Cloud Dataproc
Job-scoped clusters on
Dataproc 1.1 / Spark 2.0.2,
Scala
Data Retention
Data retained forever
on GCS / BigQueryCustom DynamoDB ETL
Internal DynamoDB ETL
pipelines, scales up/down
read capacity
Sqoop
Cloud Dataproc
Pipelines
ML Pipelines
And it was great!
- Costs comparable to monolithic cluster
- Creating and destroying > 600 clusters per day
- > 12,000 BigQuery queries per workday
But, getting there took some real work
HDFS
Storage
Cloud
Storage
Analytics
BigQuery
Code Changes
- Add retries everywhere (many 500/503 error codes)
- Rewrite HDFS utilities
- Change all Parquet to Avro
- Educate team (100s of users) on new SQL dialect
and web interface
And, still needed to build interfaces to managed services
Ingest
Cloud
Pub/Sub
Go services
services
Website
Analytics
BigQuery
STREAMING PIPELINEDATA
INGEST
Spark Streaming
Cloud Dataproc
Spark Streaming
Wrote custom Spark receiver for Cloud
Pub/Sub, checkpointing WAL to local
HDFS, no backpressure (yet)
End-to-end Durability
Built internal solutions; achieving end-to-end
durability guarantees here is hard
BigQuery Streaming APIs
Streaming writes of O(100s of millions)
of events / day into BigQuery tables
+ Uptime: GCP uptime has actually been great
- Unexpected BigQuery API Changes: Google occasionally makes production changes
which break us. When these happen, we can’t really do anything until fixed by Google or a
workaround is identified.
“Here be dragons”
April 2017
BigQuery Avro Ingest API Changes
Previously, a field marked as required by the Avro
schema could be loaded into a table with the field
marked nullable; this started failing.
October 2017
BigQuery Sharded Export Changes
Noticed many hung Dataproc clusters. Team identified
workaround to disable BQ sharded export by setting
mapred.bq.input.sharded.export.enable to false.
Managed services make some things easier…
But only some things.
Go in with eyes wide open.
Our first managed service: DynamoDB
Go services
Website
DynamoDB table proliferation
GCP Migration & BigQuery
Why? A few reasons...
1. Fully-managed, serverless, petabyte-scale data
warehouse
2. Nested/complex data types
3. Security & access controls
4. Self-serve interface to Google Sheets
Surprise dependencies
Exception: BigQuery job failed. Final error was:
{
'reason': 'invalid',
'message':"Could not parse '.' as int for field category_id
(position 0) starting at location 340 ",
'location': '/gdrive/home/Category taxonomy - master file'
}
When you make it really easy to deploy infrastructure...
...you get A LOT of random infrastructure
What does that mean for Thumbtack & Serverless?
Hacky is going to happen. With or without you.
Empower the team while limiting the wild west.
QUESTIONS?
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
thumbtack-scalability

Más contenido relacionado

Más de C4Media

Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsC4Media
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechC4Media
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/awaitC4Media
 
Opportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaOpportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaC4Media
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayC4Media
 
Are We Really Cloud-Native?
Are We Really Cloud-Native?Are We Really Cloud-Native?
Are We Really Cloud-Native?C4Media
 
CockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseCockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseC4Media
 
A Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with BrooklinA Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with BrooklinC4Media
 

Más de C4Media (20)

Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery Teams
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in Adtech
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/await
 
Opportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaOpportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven Utopia
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
 
Are We Really Cloud-Native?
Are We Really Cloud-Native?Are We Really Cloud-Native?
Are We Really Cloud-Native?
 
CockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseCockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL Database
 
A Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with BrooklinA Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with Brooklin
 

Último

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 

Último (20)

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 

Scaling Marketplaces at Thumbtack

  • 1. Scaling Marketplaces at Thumbtack QCon SF 2017 Nate Kupp Technical Infrastructure Data Eng, Experimentation, Platform Infrastructure, Security, Dev Tools
  • 2. InfoQ.com: News & Community Site • Over 1,000,000 software developers, architects and CTOs read the site world- wide every month • 250,000 senior developers subscribe to our weekly newsletter • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • 2 dedicated podcast channels: The InfoQ Podcast, with a focus on Architecture and The Engineering Culture Podcast, with a focus on building • 96 deep dives on innovative topics packed as downloadable emags and minibooks • Over 40 new content items per week Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ thumbtack-scalability
  • 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon San Francisco www.qconsf.com
  • 5. “You see that?” my dad would say, “That won’t last more than a few years. They’re going to have to rip the whole thing out within a decade”.
  • 6. You have choices. A lot of them.
  • 8. ME: HARDWARE SOFTWARE Semiconductor manufacturing research Apple iOS battery life Data Infrastructure @ Thumbtack Technical Infrastructure @ Thumbtack Data, core platform, A/B testing & dev tools Yup, those are working transistors
  • 9.
  • 10.
  • 11. Growth Hacking with Team Philippines Customer Pro A Pro B Pro C I need a plumber! Is this a spam request? SPAM DETECTION Can we automate quoting? “QUOTING SERVICE” Which pros do we invite to quote? MATCHING
  • 12. Hacky (especially in the early days) is fine, even necessary
  • 13. Hacky is going to happen. With or without you.
  • 14. “It has long been an axiom of mine that the little things are infinitely the most important.” - Sherlock Holmes “Small data” CSV files Upload Web UI Analysts “Big data” Hive/Impala
  • 15. Change doesn’t happen in a vacuum. Give people a way to get things done while you build or migrate.
  • 16. Late 2015 Starting to hit “real” scale Growth @ Thumbtack 2009 Thumbtack founded 2014 Raise $130mm Dec. 2014 I join Thumbtack Now Continued growth
  • 17. Mid-2015 Early data infrastructure 2009 - 2014 The monolith 2015 Finally on AWS! Started using DynamoDB, Golang microservices The Evolution of Thumbtack’s Infrastructure Go servicesWebsite Website Sqoop ETL HDFS Event Logging
  • 18. Today Mixed clouds, fully-containerized infrastructure, terraform The Evolution of Thumbtack’s Infrastructure Production Serving Infrastructure Data Infrastructure Application Layer (Docker on ECS) PHP, Go, Scala Storage Layer PostgreSQL, DynamoDB, Elasticsearch Processing Scala/Spark on Dataproc Storage GCS SQL BigQuery
  • 19. pgbouncer Master + Hot standbys Production Serving Infrastructure ECS Go services services Website Latency Cluster Throughput Cluster ... Sqoop Cluster WAL-E Backups EBS + ZFS Sqoop ETL Application Layer Offline Data Processing DynamoDB ETL Indexing Pipelines Event Logging PG Logging Storage Layer
  • 20. Application Layer Storage Cloud Storage Ingest Cloud Pub/Sub Analytics BigQuery Data Infrastructure Spark ETL Cloud Dataproc Airflow Thrift/JSON events Search Indexing Pipelines Pipeline Orchestration Full table Sqoop ETL Batch ETL Cloud Dataproc Job-scoped clusters on Dataproc 1.1 / Spark 2.0.2, Scala Data Retention Data retained forever on GCS / BigQueryCustom DynamoDB ETL Internal DynamoDB ETL pipelines, scales up/down read capacity Sqoop Cloud Dataproc Pipelines ML Pipelines
  • 21. And it was great! - Costs comparable to monolithic cluster - Creating and destroying > 600 clusters per day - > 12,000 BigQuery queries per workday
  • 22. But, getting there took some real work HDFS Storage Cloud Storage Analytics BigQuery Code Changes - Add retries everywhere (many 500/503 error codes) - Rewrite HDFS utilities - Change all Parquet to Avro - Educate team (100s of users) on new SQL dialect and web interface
  • 23. And, still needed to build interfaces to managed services Ingest Cloud Pub/Sub Go services services Website Analytics BigQuery STREAMING PIPELINEDATA INGEST Spark Streaming Cloud Dataproc Spark Streaming Wrote custom Spark receiver for Cloud Pub/Sub, checkpointing WAL to local HDFS, no backpressure (yet) End-to-end Durability Built internal solutions; achieving end-to-end durability guarantees here is hard BigQuery Streaming APIs Streaming writes of O(100s of millions) of events / day into BigQuery tables
  • 24. + Uptime: GCP uptime has actually been great - Unexpected BigQuery API Changes: Google occasionally makes production changes which break us. When these happen, we can’t really do anything until fixed by Google or a workaround is identified. “Here be dragons” April 2017 BigQuery Avro Ingest API Changes Previously, a field marked as required by the Avro schema could be loaded into a table with the field marked nullable; this started failing. October 2017 BigQuery Sharded Export Changes Noticed many hung Dataproc clusters. Team identified workaround to disable BQ sharded export by setting mapred.bq.input.sharded.export.enable to false.
  • 25. Managed services make some things easier… But only some things. Go in with eyes wide open.
  • 26. Our first managed service: DynamoDB Go services Website
  • 28. GCP Migration & BigQuery Why? A few reasons... 1. Fully-managed, serverless, petabyte-scale data warehouse 2. Nested/complex data types 3. Security & access controls 4. Self-serve interface to Google Sheets
  • 29. Surprise dependencies Exception: BigQuery job failed. Final error was: { 'reason': 'invalid', 'message':"Could not parse '.' as int for field category_id (position 0) starting at location 340 ", 'location': '/gdrive/home/Category taxonomy - master file' }
  • 30. When you make it really easy to deploy infrastructure... ...you get A LOT of random infrastructure
  • 31. What does that mean for Thumbtack & Serverless?
  • 32. Hacky is going to happen. With or without you.
  • 33. Empower the team while limiting the wild west.
  • 35. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ thumbtack-scalability