Living with SQL and
NoSQL at Craigslist
      Jeremy Zawodny
          craigslist
There is no stack
     anymore...
-- Mårten Mickos during Wednesday’s Keynote
Data Storage at craigslist
• MySQL
• Memcached
• Redis
• MongoDB
• Sphinx
• Filesystem
Choosing the Right Tool
• Durability
• Performance
• Query API
• Features
• Complexity
• Support
Request Flow (reads)
• Posting, Search, Browse requests: Browser → Load Balancer → Caching Proxy
• Caching Proxy: Perl+epoll, Memcached (proxy cache)
• Web Server: Apache, mod_perl, Memcached (posting cache)
• Async Services: Perl+epoll, Memcached
• haproxy
• Data stores
 • MongoDB: archived postings
 • Sphinx: live and archived postings
 • MySQL: live postings
Request Flow (reads)
• Image requests: Browser → Load Balancer → Caching Proxy
• Caching Proxy: Perl+epoll, Memcached (proxy cache)
• Image Storage: Apache, mod_perl, xfs+JBOD
Data Repositories
• MongoDB: OldPostings, Email Meta
• MySQL: Postings, Finance, Users, Misc Meta, Abuse, WorkQueue, Stats, Monitoring
• Filesystem: Images, Logs
• Memcached: Counters, Postings, Blobs, Objects
• Redis: Counters, Lists, Blobs, Monitoring, WorkQueue
• Sphinx: Postings, Internal, Forums, Archive
MySQL at craigslist
•   Vertical Partitioning: Clusters
    •   auth/users, abuse/spam, postings, finance
•   Sub-partitioning: Roles
    •   master, read, long read, dumper, thrash
•   Lots of SSD storage (mostly fusion-io)
    •   solved most of our performance problems
•   Few manual tasks
    •   re-cloning slaves, master swaps
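The cluster and role split above can be pictured as a two-level lookup: pick the cluster that owns the data, then the role that suits the kind of work. The following is only a sketch. The cluster and role names come from the slide, but the workload names, their mapping to roles, and the pool-naming scheme are assumptions made for illustration.

```python
# Cluster names taken from the slide (auth/users, abuse/spam, postings, finance).
CLUSTERS = ("auth", "abuse", "postings", "finance")

# ASSUMPTION: these workload labels and what each role serves are guesses;
# the slide only lists the role names themselves.
WORKLOAD_ROLE = {
    "write": "master",
    "page_read": "read",
    "slow_query": "long read",
    "backup": "dumper",
    "experiment": "thrash",
}


def pool_for(cluster: str, workload: str) -> str:
    """Return a hypothetical pool name such as 'postings-read'."""
    if cluster not in CLUSTERS:
        raise ValueError(f"unknown cluster: {cluster}")
    if workload not in WORKLOAD_ROLE:
        raise ValueError(f"unknown workload: {workload}")
    role = WORKLOAD_ROLE[workload].replace(" ", "_")
    return f"{cluster}-{role}"
```

A caller would ask for `pool_for("postings", "page_read")` and hand the resulting name to whatever proxy or connection layer resolves it to real servers.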
MySQL at craigslist
• MySQL 5.5.x
 • hoping to move to 5.6.x
    • GTID + crash-safe slaves?!?!
• InnoDB almost everywhere
 • InnoDB compression where it works well
 • Large buffer pool (48GB common)
• haproxy sits between clients and servers
MySQL at Craigslist
• Postings Database Cluster (diagram)
 • 1 write server
 • 4 read servers
 • 2 long read servers
 • 1 dumper, 1 thrash
 • haproxy between the client(s) and the servers
Why MySQL?
•   It’s the devil we know!
    •   Very reliable
    •   Lots of Admin and Dev skills
•   Durability
•   Replication
•   Support
    •   Seriously, look at this ecosystem
•   Data Model
Why memcached?
• Wickedly Fast
• Stable
• Virtually zero administration required
• Easily co-exists with CPU-intensive services
• Multi-core? Run more instances!
Memcached at craigslist
• Primary cache for rendered pages
  (compressed and full), serialized objects, and
  misc. other data
• Used for lots of transient data blobs (and
  occasional counters)
• Custom async client library
 • Some key encoding issues
• Durability via client-side mirroring (think
  RAID-1)
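Client-side mirroring in the RAID-1 sense can be sketched as "write every key to two nodes, read from the first one that still has it". The example below is an illustration only: plain dicts stand in for memcached nodes, and the class name, the hashing scheme, and the choice of the next node as the mirror are all assumptions, not the custom craigslist client.

```python
import hashlib


class MirroredCache:
    """Hypothetical client that mirrors each key onto two nodes."""

    def __init__(self, nodes):
        # Each node is a dict standing in for one memcached instance.
        if len(nodes) < 2:
            raise ValueError("mirroring needs at least two nodes")
        self.nodes = nodes

    def _pair(self, key):
        # ASSUMPTION: primary chosen by hashing the key, mirror is the next node.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        i = int.from_bytes(digest[:4], "big") % len(self.nodes)
        return self.nodes[i], self.nodes[(i + 1) % len(self.nodes)]

    def set(self, key, value):
        # Write to both copies, like a RAID-1 write.
        for node in self._pair(key):
            node[key] = value

    def get(self, key):
        # Read the primary first and fall back to the mirror.
        for node in self._pair(key):
            if key in node:
                return node[key]
        return None
```

If the primary node for a key is restarted and comes back empty, `get` still finds the value on the mirror, which is the durability property the slide alludes to.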
Redis at craigslist
• Primary repository of posting activity
  metadata used in analysis tasks
• Remote replication in 2nd data center
• 80+% of data in sorted sets (ZSETS)
• Sharded multi-node cluster
 • See: http://bit.ly/I4XUCj
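To make the combination of sorted sets and sharding concrete, here is a small in-memory sketch. It is not the craigslist implementation (the linked talk covers that); the class name and the CRC32-modulo sharding scheme are assumptions, and nested dicts stand in for Redis nodes. The method names mirror the Redis commands ZADD and ZRANGEBYSCORE.

```python
import zlib


class ShardedSortedSets:
    """Hypothetical stand-in for sorted sets spread over several nodes."""

    def __init__(self, shard_count):
        # Each shard maps a key to a {member: score} dict.
        self.shards = [dict() for _ in range(shard_count)]

    def _shard(self, key):
        # ASSUMPTION: simple hash-modulo placement of whole keys.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def zadd(self, key, score, member):
        # Adding an existing member updates its score, as in Redis.
        self._shard(key).setdefault(key, {})[member] = score

    def zrangebyscore(self, key, lo, hi):
        # Members with lo <= score <= hi, ordered by score then member.
        zset = self._shard(key).get(key, {})
        ordered = sorted(zset.items(), key=lambda kv: (kv[1], kv[0]))
        return [member for member, score in ordered if lo <= score <= hi]
```

Using a timestamp as the score turns a range query into "all activity for this key between two points in time", one plausible fit for analysis tasks like those mentioned above.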
Why Redis?
• Features
• Performance
• Flexible Persistence
• Excellent but simple API
• Project Vision
• Multi-core? Run more instances!
MongoDB at craigslist
•   Repository of 2.5+ billion archived postings
    •   growing and growing and growing
•   3 shards across 3 node replica sets
    •   duplicate config in 2nd data center
•   ~6TB of data, sized up to 12TB
•   Biggest challenge was data migration
•   Previous talks:
    •   http://bit.ly/HEYJ57 (before)
    •   http://bit.ly/Hr2qMf (after)
Why MongoDB?
• Schema free
• Active community
• Commercial support
• Perl client!
• Ease of scaling
  • Yay! for built-in sharding support
• Fewer single points of failure
  • Replica sets are awesome
Sphinx at craigslist
• Full-text indexing and search of
 • all live postings
 • all archived postings
 • all forums (in progress)
• 300+ million daily queries
Why Sphinx?
• Performance
• Friendly API
• Flexibility in deployment model
• Commercial support
Filesystem at craigslist
• All uploaded images are stored in XFS
• Multiple image sizes, resized upon upload
Why Filesystem?
• Reliable (and Simple)
 • We use XFS for images and databases
 • Proven technology
• Fast
 • Some other filesystems have had
    performance issues
• Easy to move data around
• No other metadata/indexes to worry about
So Many Data Stores...
• Can be hard for developers if you don’t have
  good APIs or abstractions in place!
  • We built an object layer for our MongoDB
    migration
  • It speaks MySQL, Sphinx, MongoDB,
    Memcached
• Relational vs. Non-Relational?
  • In practice, we often just don’t care
  • NoSQL is a stupid label
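One way to picture an object layer like this is a single `get` call that hides which store actually holds a posting. The sketch below is an assumption-laden illustration, not the craigslist code: `DictStore` and `PostingLayer` are hypothetical names, dicts stand in for MySQL, MongoDB, and Memcached, and the lookup order (cache, then live store, then archive) is a guess.

```python
from typing import Optional


class DictStore:
    """Stand-in for a real backend (e.g. MySQL for live, MongoDB for archive)."""

    def __init__(self, name: str, rows: dict):
        self.name = name
        self.rows = rows

    def get(self, posting_id: int) -> Optional[dict]:
        return self.rows.get(posting_id)


class PostingLayer:
    """Hypothetical object layer: callers never name a backend."""

    def __init__(self, cache: dict, stores: list):
        self.cache = cache    # stand-in for Memcached
        self.stores = stores  # ASSUMPTION: tried in order, live before archive

    def get(self, posting_id: int) -> Optional[dict]:
        if posting_id in self.cache:
            return self.cache[posting_id]
        for store in self.stores:
            posting = store.get(posting_id)
            if posting is not None:
                self.cache[posting_id] = posting  # fill the cache on a miss
                return posting
        return None
```

With this shape, moving old postings from one backend to another changes the contents of the stores but not the code that calls `get`.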
Craigslist Tech FAQs
• Self-hosted (no virtualization or “cloud”)
• Mix of hardware (2 main vendors)
 • Blades
 • Larger multi-U multi-disk RAID boxes
• Mostly local storage (SAN for backups)
• Virtually all open source infrastructure
  tools
• Famously small (but growing) tech team
Craigslist is Hiring!
• Developers
 • Back-end
 • Front-end
• Systems Administrators
• Network Engineers
• Email: z@craiglist.org plain text resume!

Living with SQL and NoSQL at craigslist, a Pragmatic Approach
