Data Modeling for Performance

Mongo Boulder, January 21, 2010
Michael Dwan, Snapjoy
i’m michael dwan
 @michaeldwan on the twitter
the project
  Company X
• find business details (web + api)
• search by category/keyword + geo (web + api)
• update (api)



                                   application spec
• 100,000 geo areas
• 30,000 partners
• 100,000,000 tags
• 2,300 categories
• 15,000,000 businesses
• 2,000,000 requests daily
• 24,000,000 urls in sitemap

                          why is this interesting?
• infrequent changes
• monthly updates w/ 12M monthly changes
• “zero downtime”



                                           updates
the problem
 mo’ data, mo’ problems
complexity
[schema diagram of the relational tables: providers, mappings, phone_numbers, zips, assets, businesses_phone_numbers, cities, categorizations, businesses, states, categories, businesses_neighborhoods, taggings, users, tags, neighborhoods]
architecture
read performance
downtime
solr
solr getting fussy
downtime
migrations
the solution
> gem install acts_as_web_scale
{
    "_id" : ObjectId("4ce838ef4a882579960001b9"),
    "name" : "Acme Glass Co",
    "tagline" : "Your trusty glass hole",
    "description" : "Glass repair...",
    "hours" : "Mon Fri 8 5",
    "url" : "http://acmeglasshole.biz"
}




                                        a business...
{
    "_id" : ObjectId("4ce838ef4a882579960001b9"),
    "name" : "Acme Glass Co",
    "tagline" : "Your trusty glass hole",
    "description" : "Glass repair...",
    "hours" : "Mon Fri 8 5",
    "url" : "http://acmeglasshole.biz",
    "phone_numbers" : [
      "5035550091",
      "8005555456"
    ]
}


            a business... has many phone numbers
{
    "_id" : ObjectId("4ce838ef4a882579960001b9"),
    "name" : "Acme Glass Co",
    "tagline" : "Your trusty glass hole",
    "description" : "Glass repair...",
    "hours" : "Mon Fri 8 5",
    "url" : "http://acmeglasshole.biz",
    "phone_numbers" : [
       "5035550091",
       "8005555456"
    ],
    "coordinates" : [
       45.559294,
       -122.644053
    ]
}



                      a business... has coordinates
{
    ...
    "url" : "http://acmeglasshole.biz",
    "phone_numbers" : [
       "5035550091",
       "8005555456"
    ],
    "coordinates" : [
       45.559294,
       -122.644053
    ],
    "tags" : [
       "glass",
       "mirrors",
       "flat glass"
    ]
}



                        a business... has many tags
{
    ...
    "coordinates" : [
       45.559294,
       -122.644053
    ],
    "tags" : [
       "glass",
       "mirrors",
       "flat glass"
    ],
    "location" : {
       "street_address" : "2035 NE Alberta St"
    }
}




                         a business... has an address
belongs to?
{
    "_id" : ObjectId("4ce82937961552247900000f"),
    "name" : "Illinois",
    "slug" : "il",
    ...
}




                                             a state
{
    ...
    "tags" : [
       "glass",
       "mirrors",
       "flat glass"
    ],
    "location" : {
       "street_address" : "2035 NE Alberta St",
       "state" : {
         "_id" : ObjectId("4ce829379615522479000026"),
         "meta" : {
            "slug" : "or"
         },
         "display_name" : "Oregon"
       }
    }
}


                     a business... belongs to a state
{
    ...
    "location" : {
        ...
        "state" : {
           "_id" : ObjectId("4ce829379615522479000026"),
           "meta" : {
              "slug" : "or"
           },
           "display_name" : "Oregon"
        },
        "city" : {
           "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"),
           "meta" : {
              "slug" : "portland"
           },
           "display_name" : "Portland, OR"
        }
    }
}

                          a business... belongs to a city
{
    ...
    "location" : {
        ...
        "state" : {
           ...
           "display_name" : "Oregon"
        },
        "city" : {
           "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"),
           "meta" : {
              "slug" : "portland"
           },
           "display_name" : "Portland, OR"
        },
        "zip" : {
           "_id" : ObjectId("4ce82c29d3dfaa116b006dfa"),
           "display_name" : "97211"
        }
    }
}

                     a business... belongs to a zip code
many-to-many?
{
    "_id" : ObjectId("4ce82e64d3dfaa16360014eb"),
    "name" : "Auto Glass",
    "slug" : "3063-auto-glass",
    "tags" : [
       "windshields"
    ],
    ...
}




                                       a category
{
    ...
    "location" : {
       ...
    },
    "categories" : [
       {
          "_id" : ObjectId("4ce82e50d3dfaa16360004f2"),
          "meta" : {
             "slug" : "282-glass",
             "tags" : [ "windows" ]
          },
          "display_name" : "Glass"
       },
       {
          "_id" : ObjectId("4ce82e64d3dfaa16360014eb"),
          "meta" : {
             "slug" : "3063-auto-glass",
             "tags" : [ "windshields" ]
          },
          "display_name" : "Auto Glass"
       }
    ]
}

               a business... belongs to many categories
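
The embedded state, city, zip and category entries all carry only a few fields copied from the full document. A minimal sketch of that copy step for a category, assuming the field mapping implied by the slides (`name` becomes `display_name`, `slug` and `tags` move under `meta`). The helper name `embedCategory` is hypothetical, and `_id` is a plain string so the snippet runs outside the mongo shell:

```javascript
// Hypothetical helper: copy only the fields a business page needs from a
// full category document into the embedded shape used in the business.
function embedCategory(category) {
  return {
    _id: category._id,
    meta: { slug: category.slug, tags: category.tags.slice() },
    display_name: category.name
  };
}

// Full category document, with the values shown on the "a category" slide.
const autoGlass = {
  _id: "4ce82e64d3dfaa16360014eb",
  name: "Auto Glass",
  slug: "3063-auto-glass",
  tags: ["windshields"]
};

const business = {
  name: "Acme Glass Co",
  categories: [embedCategory(autoGlass)]
};

console.log(JSON.stringify(business.categories[0], null, 2));
```

The cost of this layout is that renaming a category means rewriting every business that embeds it, which is tolerable here because changes are infrequent and arrive in monthly batches.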
queries & indexes
    know what you want
#1 find a business
    I want *that* one
// single business
db.businesses.findOne({
   _id: ObjectId("4ce838ef4a882579960001b9")
})




                                 find a business
#2 find by location
  Businesses in San Francisco, CA
// find all within state
db.businesses.find({
   "location.state._id": ObjectId("4ce82937961552247900000f")
})

// find all within city
db.businesses.find({
   "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95")
})

// find all within zip
db.businesses.find({
   "location.zip._id": ObjectId("4ce82b5ed3dfaa116b0026f0")
})




                       find businesses by state/city/zip
// the indexes
db.businesses.ensureIndex({"location.city._id": 1})
db.businesses.ensureIndex({"location.zip._id": 1})



                         1.5GB each




    skip “location.state._id” -- only 51 possibilities


                                                 indexes
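
One way to read the "only 51 possibilities" remark is as back-of-the-envelope arithmetic: an index on a field with few distinct values still leaves a huge result set per value. The sketch below assumes businesses are spread evenly across values, which is my simplification (real data is skewed), and uses the 35,000 cities figure quoted later in the deck:

```javascript
// Average number of businesses per distinct value, assuming an even
// spread (an assumption; real data is skewed toward big states and cities).
const totalBusinesses = 15000000; // from the application spec

function avgDocsPerValue(distinctValues) {
  return Math.round(totalBusinesses / distinctValues);
}

console.log("per state:", avgDocsPerValue(51));    // 294118
console.log("per city: ", avgDocsPerValue(35000)); // 429
```

Note that `ensureIndex` is the shell helper of the time; current shells use `createIndex` for the same job.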
#3 find by category
 Businesses in the Auto Repair category
// find by category id
db.businesses.find({
   "categories._id": ObjectId("4ce82e50d3dfaa16360004f2")
})


// the index
db.businesses.ensureIndex({
   "categories._id":1
})




                               businesses by category
#4 - find by category + location
   Businesses in the Plumbing category in Chicago, IL
// find by city id and category id
db.businesses.find({
   "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95"),
   "categories._id": ObjectId("4ce82e50d3dfaa16360004f2")
})




                         businesses by category + city
// city id
{"location.city._id":1}

~ or ~

// category id
{"categories._id":1}

answer: both suck
we need a compound index

         which index should we use?
db.businesses.ensureIndex({
    "location.city._id" : 1, "categories._id" : 1
 })

                     ~ or ~
 db.businesses.ensureIndex({
    "categories._id" : 1, "location.city._id" : 1
 })


      35,000 cities & 2,500 categories


   answer: cities, then categories
create one for zip codes and categories too!

                                          which order?
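
Key order matters because a compound index can also answer queries on a leading subset of its keys. The sketch below is a deliberately simplified model of that prefix rule for equality queries only; `servedByPrefix` is a hypothetical helper, not how the query planner is implemented:

```javascript
// Simplified model of the compound-index prefix rule: an index can serve
// an equality query when the queried fields cover a leading run of its keys.
// Sorts, ranges and the real query planner are ignored.
function servedByPrefix(indexKeys, queryFields) {
  if (queryFields.length === 0) return false;
  const prefix = indexKeys.slice(0, queryFields.length);
  return queryFields.every(f => prefix.includes(f));
}

const cityFirst = ["location.city._id", "categories._id"];
const categoryFirst = ["categories._id", "location.city._id"];

console.log(servedByPrefix(cityFirst, ["location.city._id"]));                   // true
console.log(servedByPrefix(categoryFirst, ["location.city._id"]));               // false
console.log(servedByPrefix(cityFirst, ["categories._id", "location.city._id"])); // true
```

With city first, the same index covers both the city-only query from #2 and the city + category query from #4.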
{"location.city._id" : 1}
 {"location.city._id" : 1, "categories._id" : 1}




                 answer: yes

db.businesses.dropIndex("location.city._id_1")




              don’t we have 2 indexes on city id?
#5 - find by keyword
  “something awesome” in Boulder, CO
{
    "_id" : ObjectId("4ce838ef4a882579960001b9"),
    "name" : "Acme Glass Co",
    "keywords" : [
      "glass",
      "repair",
      "acme",
      ...
    ]
}



db.businesses.ensureIndex({
   "location.city._id":1,
   "keywords":1
})



db.businesses.find({
   "location.city._id":ObjectId("4ce82aa0d3dfaa10f8004a95"),
   "keywords":/glass/i
})




             find businesses in city by keyword
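
The slides do not show how the `keywords` array was built. A minimal sketch, assuming it was derived by lowercasing and splitting the name and description; the helper `extractKeywords` and its rules (ASCII only, words of two or more characters) are hypothetical:

```javascript
// Hypothetical tokenizer: lowercase, split on anything that is not a
// letter or digit, drop one-character fragments, remove duplicates.
function extractKeywords(...texts) {
  const words = texts
    .join(" ")
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(w => w.length > 1);
  return [...new Set(words)];
}

const keywords = extractKeywords("Acme Glass Co", "Glass repair...");
console.log(keywords); // [ 'acme', 'glass', 'co', 'repair' ]
```

Storing lowercased tokens allows an exact match such as `{"keywords": "glass"}`, which can use the index directly. A case-insensitive, unanchored regex like `/glass/i` generally cannot use the index efficiently, which is one likely reason this approach struggled as a search engine.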
me: we’re switching from postgres+solr to mongo
kyle: oh wow, you can replace solr with mongo?
me: with some creativity
kyle: seems like it’d still be hard to get just right
me: it works well
kyle: gotcha



                                chat with Kyle Banker
i was wrong, kyle was right
        I’ll never leave you again

...until MongoDB supports full text later this year :)
aggregation
map/reduce to the rescue
sitemaps
big list of every url
• xml files containing each unique url ~ 24M
• 50,000 urls per file, about 500 files
• urls are generated from live data
• http://companyx.com/sitemaps/1.xml


                                              sitemaps
>> "hello!".hash % 6 #=> 5

>> "/ny/new-york/c/apartments".hash % 6 #=> 5




    returns an integer from 0 up to (but not including)
              the number specified




                   partition by consistent hash
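
For the partitions to stay put between runs, the hash has to be deterministic. Ruby's `String#hash` is seeded per process in current Ruby versions, so the values above only repeat within one process. A sketch with a different, deterministic hash swapped in (32-bit FNV-1a); the function names are hypothetical:

```javascript
// 32-bit FNV-1a, used here in place of Ruby's String#hash.
// It hashes UTF-16 code units, which equals the byte-wise hash for ASCII urls.
function fnv1a(str) {
  let h = 0x811c9dc5;                    // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;  // multiply by the FNV prime, keep unsigned 32-bit
  }
  return h;
}

// Partition index in the range 0 .. partitions - 1.
function partitionFor(url, partitions) {
  return fnv1a(url) % partitions;
}

const p = partitionFor("/ny/new-york/c/apartments", 6);
console.log(p);
```

Changing the number of partitions moves most urls to a different partition, so the count is best fixed up front (about 500 here).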
1. map each url in the site to a partition
2. reduce all partitions to a single document containing
   all urls in that partition
3. save to a permanent collection




                                             map/reduce
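
The three steps can be imitated in memory to show the shape of the map and reduce functions. This is a sketch, not the job from the talk: `emit` and the grouping loop stand in for what MongoDB does server-side, and `partitionOf` is a placeholder hash:

```javascript
const PARTITIONS = 6;

// Placeholder partition function (assumption): any stable string hash works.
function partitionOf(url) {
  let h = 0;
  for (const ch of url) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % PARTITIONS;
}

// map: one emit per url, keyed by its partition.
function map(url, emit) {
  emit(partitionOf(url), { total: 1, urls: [url] });
}

// reduce: merge partial results. The output has the same shape as the
// emitted values, so it can be fed back into reduce again.
function reduce(key, values) {
  const out = { total: 0, urls: [] };
  for (const v of values) {
    out.total += v.total;
    out.urls = out.urls.concat(v.urls);
  }
  return out;
}

// Group emitted values by key, then reduce each group to one document.
function mapReduce(urls) {
  const groups = new Map();
  for (const url of urls) {
    map(url, (k, v) => {
      if (!groups.has(k)) groups.set(k, []);
      groups.get(k).push(v);
    });
  }
  const result = [];
  for (const [k, values] of groups) {
    result.push({ _id: k, value: reduce(k, values) });
  }
  return result;
}

const sitemaps = mapReduce([
  "/il/chicago/c/pizza",
  "/ny/new-york/c/apartments",
  "/14076500-bayside-marina"
]);
console.log(JSON.stringify(sitemaps, null, 2));
```

Keeping the reduce output in the same shape as the mapped values matters because MongoDB may call reduce more than once for the same key.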
url                                       partition
/il/chicago/c/pizza                       4
/ny/new-york/c/apartments                 1
/nd/rugby/c/apartments                    6
/14076500-bayside-marina                  2
/13401000-comtrak-logistics-inc           3
/12347500-allstate-auto-insurance         1
/il/downers-grove/c/computer-web-design   6
/1009500-heidelberg-lodges                5
/mn/redwood-falls/c/food-service          4
/14077000-bank-of-america                 5
/mn/savage/c/audio-visual-equipment       1
...


                                             map
{
    "total" : 2,
    "urls" : [
      "/12347500-allstate-auto-insurance",
      "/ny/new-york/c/apartments"
    ]
}

{
    "total" : 1,
    "urls" : [
      "/mn/savage/c/audio-visual-equipment"
    ]
}

{
    "_id" : 1,
    "value" : {
      "total" : 3,
      "urls" : [
        "/12347500-allstate-auto-insurance",
        "/mn/savage/c/audio-visual-equipment",
        "/ny/new-york/c/apartments"
      ]
    }
}

                                             reduce
db.sitemaps.findOne({_id:1}).value.urls




[
    "/12347500-allstate-auto-insurance",
    "/mn/savage/c/audio-visual-equipment",
    "/ny/new-york/c/apartments"
]




                                             usage
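
Each stored url list then only needs to be wrapped in the sitemap XML envelope when `/sitemaps/1.xml` is requested. A sketch of that rendering step, assuming the `http://companyx.com` host from the example url; `renderSitemap` and `escapeXml` are hypothetical helpers:

```javascript
const BASE = "http://companyx.com"; // host taken from the example url in the slides

// Escape the five characters that are special in XML text.
function escapeXml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;")
          .replace(/"/g, "&quot;").replace(/'/g, "&apos;");
}

// Wrap one partition's urls in a sitemap <urlset> document.
function renderSitemap(urls) {
  const entries = urls.map(u => `  <url><loc>${escapeXml(BASE + u)}</loc></url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>'
  ].join("\n");
}

const xml = renderSitemap([
  "/12347500-allstate-auto-insurance",
  "/mn/savage/c/audio-visual-equipment",
  "/ny/new-york/c/apartments"
]);
console.log(xml);
```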
wrap up
115ms average response times


                        2 months later
thank you
 @michaeldwan

Modeling for Performance

  • 1. Data Modeling for Performance Mongo Boulder Michael Dwan January 21, 2010 Snapjoy
  • 2. i’m michael dwan @michaeldwan on the twitter
  • 3. the project Company X
  • 4. • find business details (web + api) • search by category/keyword + geo (web + api) • update (api) application spec
  • 5. 100,000 30,000 100,000,000 geo areas tags partners 2,300 15,000,000 categories businesses 2,000,000 requests daily 24,000,000 urls in sitemap why is this interesting?
  • 6. • infrequent changes • monthly updates w/ 12M monthly changes • “zero downtime” updates
  • 7. the problem mo’ data, mo’ problems
  • 9. providers mappings phone_numbers zips assets businesses _phone_numbers cities categorizations businesses states categories businesses_neighborhoods taggings users tags neighborhoods
  • 10. x xx x architecture
  • 12. dow n ti me solr
  • 14. dow n ti me migrations
  • 16. > gem install acts_as_web_scale
  • 19. { "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", } a business...
  • 20. { "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", } a business... has many phone numbers
  • 21. { "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ] } a business... has many phone numbers
  • 22. "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ] } a business... has coordinates
  • 23. "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ], "coordinates" : [ 45.559294, -122.644053 ] } a business... has coordinates
  • 24. "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ], "coordinates" : [ 45.559294, -122.644053 ] } a business... has many tags
  • 25. "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ], "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ] } a business... has many tags
  • 26. "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ] } a business... has an address
  • 27. "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St" } } a business... has an address
  • 29. { "_id" : ObjectId("4ce82937961552247900000f"), "name" : "Illinois", "slug" : "il", ... } a state
  • 30. "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St" } } a business... belongs to a state
  • 31. "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St" } } a business... belongs to a state
  • 32. "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St", "state" : { "_id" : ObjectId("4ce829379615522479000026"), "meta" : { "slug" : "or" }, "display_name" : "Oregon" } } } a business... belongs to a state
  • 33. "state" : { "_id" : ObjectId("4ce829379615522479000026"), "meta" : { "slug" : "or" }, "display_name" : "Oregon" } } } a business... belongs to a city
  • 34. "state" : { "_id" : ObjectId("4ce829379615522479000026"), "meta" : { "slug" : "or" }, "display_name" : "Oregon" }, "city" : { "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"), "meta" : { "slug" : "portland", }, "display_name" : "Portland, OR" }, } } a business... belongs to a city
  • 35. }, "display_name" : "Oregon" }, "city" : { "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"), "meta" : { "slug" : "portland", }, "display_name" : "Portland, OR" }, } } a business... belongs to a zip code
  • 36. }, "display_name" : "Oregon" }, "city" : { "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"), "meta" : { "slug" : "portland", }, "display_name" : "Portland, OR" }, "zip" : { "_id" : ObjectId("4ce82c29d3dfaa116b006dfa"), "display_name" : "97211" } } } a business... belongs to a zip code
  • 38. { "_id" : ObjectId("4ce82e64d3dfaa16360014eb"), "name" : "Auto Glass", "slug" : "3063-auto-glass", "tags" : [ "windshields" ], ... } a category
  • 39. "meta" : { "slug" : "or" }, "display_name" : "Oregon" }, "city" : { "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"), "meta" : { "slug" : "portland", }, "display_name" : "Portland, OR" }, "zip" : { "_id" : ObjectId("4ce82c29d3dfaa116b006dfa"), "display_name" : "97211" } } } a business... belongs to a zip code
  • 40. } } } a business... belongs to many categories
  • 41. } }, "categories" : [ { "_id" : ObjectId("4ce82e50d3dfaa16360004f2"), "meta" : { "slug" : "282-glass", "tags" : [ "windows" ], }, "display_name" : "Glass" }, { "_id" : ObjectId("4ce82e64d3dfaa16360014eb"), "meta" : { "slug" : "3063-auto-glass", "tags" : [ "windshields" ], }, "display_name" : "Auto Glass" } ] } a business... belongs to many categories
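
The slides above build the business document by embedding a small copy of each related record (state, city, zip, categories) rather than joining at read time. A minimal sketch of that step in plain JavaScript follows. The helper `embedRef` and its `metaFields` parameter are hypothetical names, and ids are plain strings here instead of `ObjectId` values so the snippet runs without a database.

```javascript
// Hypothetical helper: build the compact copy of a reference document
// (state, city, zip, category) that is embedded inside a business.
// Assumes the source record carries `name` and, optionally, `display_name`.
function embedRef(doc, metaFields = []) {
  const embedded = { _id: doc._id, display_name: doc.display_name || doc.name };
  const meta = {};
  for (const field of metaFields) {
    if (doc[field] !== undefined) meta[field] = doc[field];
  }
  // Only attach `meta` when there is something to put in it (a zip has none).
  if (Object.keys(meta).length > 0) embedded.meta = meta;
  return embedded;
}

// Reference records, with values taken from the slides.
const oregon = { _id: "4ce829379615522479000026", name: "Oregon", slug: "or" };
const autoGlass = {
  _id: "4ce82e64d3dfaa16360014eb",
  name: "Auto Glass",
  slug: "3063-auto-glass",
  tags: ["windshields"],
};

const business = {
  name: "Acme Glass Co",
  location: {
    street_address: "2035 NE Alberta St",
    state: embedRef(oregon, ["slug"]),
  },
  categories: [embedRef(autoGlass, ["slug", "tags"])],
};

console.log(JSON.stringify(business, null, 2));
```

The trade-off is that a renamed category or city has to be rewritten in every business that embeds it, which fits the workload described earlier: infrequent changes and read-heavy traffic.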
  • 42. queries & indexes know what you want
  • 43. #1 find a business I want *that* one
  • 44. find a business
      // single business
      db.businesses.findOne({ _id: ObjectId("4ce838ef4a882579960001b9") })
  • 45. #2 find by location Businesses in San Francisco, CA
  • 46. find businesses by state/city/zip
      // find all within state
      db.businesses.find({ "location.state._id": ObjectId("4ce82937961552247900000f") })
  • 47. find businesses by state/city/zip
      // find all within state
      db.businesses.find({ "location.state._id": ObjectId("4ce82937961552247900000f") })
      // find all within city
      db.businesses.find({ "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95") })
  • 48. find businesses by state/city/zip
      // find all within state
      db.businesses.find({ "location.state._id": ObjectId("4ce82937961552247900000f") })
      // find all within city
      db.businesses.find({ "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95") })
      // find all within zip
      db.businesses.find({ "location.zip._id": ObjectId("4ce82b5ed3dfaa116b0026f0") })
  • 49. indexes
      // the indexes
      db.businesses.ensureIndex({"location.city._id": 1})
      db.businesses.ensureIndex({"location.zip._id": 1})
      1.5GB each
      skip “location.state._id” -- only 51 possibilities
  • 50. #3 find by category Businesses in the Auto Repair category
  • 51. businesses by category
      // find by category id
      db.businesses.find({ "categories._id": ObjectId("4ce82e50d3dfaa16360004f2") })
      // the index
      db.businesses.ensureIndex({ "categories._id": 1 })
  • 52. #4 - find by category + location Businesses in the Plumbing category in Chicago, IL
  • 53. businesses by category + city
      // find by city id and category id
      db.businesses.find({
        "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95"),
        "categories._id": ObjectId("4ce82e50d3dfaa16360004f2")
      })
  • 54. which index should we use?
      // city id
      {"location.city._id": 1}
      ~ or ~
      // category id
      {"categories._id": 1}
      answer: both suck, we need a compound index
  • 55. which order?
      db.businesses.ensureIndex({ "location.city._id": 1, "categories._id": 1 })
      ~ or ~
      db.businesses.ensureIndex({ "categories._id": 1, "location.city._id": 1 })
      35,000 cities & 2,500 categories
      answer: cities → categories
      create one for zip codes and categories too!
  • 56. don’t we have 2 indexes on city id?
      {"location.city._id": 1}
      {"location.city._id": 1, "categories._id": 1}
      answer: yes
      db.businesses.dropIndex("location.city._id_1")
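
Why the single-field city index can be dropped: a compound index is kept sorted by its first field, then its second, so any lookup on the leading field alone can reuse it. The toy model below illustrates that idea in plain JavaScript. It is not how MongoDB stores a B-tree, and the names `cmp`, `comparePrefix`, and `lookup` plus the sample entries are made up for the example.

```javascript
// Three-way comparison for strings.
function cmp(x, y) {
  return x < y ? -1 : x > y ? 1 : 0;
}

// Toy "compound index": [city, category, businessId] entries,
// sorted by city first and category second.
const index = [
  ["chicago", "glass", "b3"],
  ["portland", "auto-glass", "b2"],
  ["chicago", "plumbing", "b4"],
  ["portland", "glass", "b1"],
  ["boulder", "glass", "b5"],
].sort((a, b) => cmp(a[0], b[0]) || cmp(a[1], b[1]));

// Compare only the leading fields named in `prefix`.
function comparePrefix(entry, prefix) {
  for (let i = 0; i < prefix.length; i++) {
    const c = cmp(entry[i], prefix[i]);
    if (c !== 0) return c;
  }
  return 0;
}

// Binary-search to the first matching entry, then read the contiguous run.
function lookup(sortedIndex, prefix) {
  let lo = 0;
  let hi = sortedIndex.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (comparePrefix(sortedIndex[mid], prefix) < 0) lo = mid + 1;
    else hi = mid;
  }
  const ids = [];
  for (let i = lo; i < sortedIndex.length && comparePrefix(sortedIndex[i], prefix) === 0; i++) {
    ids.push(sortedIndex[i][2]);
  }
  return ids;
}

console.log(lookup(index, ["portland"]));          // city only: uses the prefix
console.log(lookup(index, ["portland", "glass"])); // city + category
```

A category-only lookup gets no such help from this ordering, because entries for one category are scattered across every city. That is why the separate `categories._id` index from the earlier slide stays.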
  • 57. #5 - find by keyword “something awesome” in Boulder, CO
  • 58. find businesses in city by keyword
      {
        "_id" : ObjectId("4ce838ef4a882579960001b9"),
        "name" : "Acme Glass Co",
        "keywords" : [ "glass", "repair", "acme", ... ]
      }
      db.businesses.ensureIndex({ "location.city._id": 1, "keywords": 1 })
      db.businesses.find({
        "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95"),
        "keywords": /glass/i
      })
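
The slide does not show how the `keywords` array gets filled. One plausible approach, sketched here as an assumption rather than the method used in the talk, is to lowercase and tokenize a few text fields when the document is saved. `buildKeywords` and the choice of source fields (`name` and `tags`) are hypothetical.

```javascript
// Hypothetical: derive a de-duplicated, lowercase keyword list from
// the business name and tags. Splits on anything that is not a-z or 0-9.
function buildKeywords(business) {
  const sources = [business.name, ...(business.tags || [])];
  const words = new Set();
  for (const text of sources) {
    for (const word of String(text).toLowerCase().split(/[^a-z0-9]+/)) {
      if (word.length > 0) words.add(word);
    }
  }
  return [...words].sort();
}

const keywords = buildKeywords({
  name: "Acme Glass Co",
  tags: ["glass", "mirrors", "flat glass"],
});

console.log(keywords);
```

With keywords normalized this way, a query could match an exact lowercase term instead of a case-insensitive regular expression, which generally cannot use the index efficiently. Even then this remains simple token matching with no stemming or relevance ranking.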
  • 59. chat with Kyle Banker
      me: we’re switching from postgres+solr to mongo
      kyle: oh wow, you can replace solr with mongo?
      me: with some creativity
      kyle: seems like it’d still be hard to get just right
      me: it works well
      kyle: gotcha
  • 60. i was wrong, kyle was right
  • 61. I’ll never leave you again ...until MongoDB supports full text later this year :)
  • 63. sitemaps big list of every url
  • 64. sitemaps
      • xml files containing each unique url ~ 24M
      • 50,000 urls per file, about 500 files
      • urls are generated from live data
      • http://companyx.com/sitemaps/1.xml
  • 65. partition by consistent hash
      >> "hello!".hash % 6 #=> 5
      >> "/ny/new-york/c/apartments".hash % 6 #=> 5
      returns an integer from 0 up to, but not including, the number specified
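
For the partition number to stay stable, the hash has to return the same value on every run and every machine. Current Ruby versions salt `String#hash` per process, so a real job would probably use an explicit hash function. The sketch below uses 32-bit FNV-1a as one such choice. That is an assumption for illustration, not what the talk used, and `fnv1a` and `partitionFor` are made-up names.

```javascript
// 32-bit FNV-1a over the string's UTF-16 code units.
// Deterministic: the same input always yields the same unsigned integer.
function fnv1a(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep 32 bits
  }
  return hash;
}

// Map a url to a partition number in the range 0 .. partitions - 1.
function partitionFor(url, partitions) {
  return fnv1a(url) % partitions;
}

console.log(partitionFor("/ny/new-york/c/apartments", 6));
console.log(partitionFor("/il/chicago/c/pizza", 6));
```

With about 24M urls and 50,000 urls per file, roughly 500 partitions would be needed. A plain modulo spreads urls only approximately evenly, so some headroom below the 50,000 limit seems prudent.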
  • 66. map/reduce
      1. map each url in the site to a partition
      2. reduce all partitions to a single document containing all urls in that partition
      3. save to a permanent collection
  • 67. map (url → partition, partitions 1 to 6)
      /il/chicago/c/pizza → 4
      /ny/new-york/c/apartments → 1
      /nd/rugby/c/apartments → 6
      /14076500-bayside-marina → 2
      /13401000-comtrak-logistics-inc → 3
      /12347500-allstate-auto-insurance → 1
      /il/downers-grove/c/computer-web-design → 6
      /1009500-heidelberg-lodges → 5
      /mn/redwood-falls/c/food-service → 4
      /14077000-bank-of-america → 5
      /mn/savage/c/audio-visual-equipment → 1
      ...
  • 68. reduce
      inputs for partition 1:
      { "total" : 2, "urls" : [ "/12347500-allstate-auto-insurance", "/ny/new-york/c/apartments" ] }
      { "total" : 1, "urls" : [ "/mn/savage/c/audio-visual-equipment" ] }
      result:
      { "_id" : 1, "value" : { "total" : 3, "urls" : [ "/12347500-allstate-auto-insurance", "/mn/savage/c/audio-visual-equipment", "/ny/new-york/c/apartments" ] } }
  • 69. usage
      db.sitemaps.findOne({_id:1}).value.urls
      [ "/12347500-allstate-auto-insurance", "/mn/savage/c/audio-visual-equipment", "/ny/new-york/c/apartments" ]
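
The map and reduce steps can be simulated in plain JavaScript to show the shape of the data. This is a sketch of the technique, not the job from the talk: `partitionOf`, `map`, `reduce`, and `buildSitemaps` are illustrative names, and the partition numbers are copied from the map slide instead of being computed from a hash.

```javascript
// Partition assignments copied from the map slide (assumed for the example).
const partitionOf = new Map([
  ["/il/chicago/c/pizza", 4],
  ["/ny/new-york/c/apartments", 1],
  ["/14076500-bayside-marina", 2],
  ["/12347500-allstate-auto-insurance", 1],
  ["/mn/savage/c/audio-visual-equipment", 1],
]);

// Map: emit one (partition, value) pair per url. The value already has the
// same shape as the reduce output, so partial results can be reduced again.
function map(url) {
  return { key: partitionOf.get(url), value: { total: 1, urls: [url] } };
}

// Reduce: merge any number of values for one partition into a single value.
function reduce(key, values) {
  const merged = { total: 0, urls: [] };
  for (const v of values) {
    merged.total += v.total;
    merged.urls.push(...v.urls);
  }
  merged.urls.sort();
  return merged;
}

// Group the emitted values by key, then reduce each group to one document.
function buildSitemaps(urls) {
  const grouped = new Map();
  for (const url of urls) {
    const { key, value } = map(url);
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  }
  return [...grouped].map(([key, values]) => ({ _id: key, value: reduce(key, values) }));
}

const sitemaps = buildSitemaps([...partitionOf.keys()]);
const first = sitemaps.find((doc) => doc._id === 1);
console.log(first.value.urls);
```

Keeping the emitted value and the reduced value in the same `{ total, urls }` shape matters because MongoDB may call the reduce function more than once for a key, feeding earlier results back in as inputs.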
  • 71. 115ms average response times 2 months later