SPS – Scale to 15k RPS
      Patrice Pelland
        Microsoft
Overview and Goals of SPS
• SPS (Shared Personalization Service)
• It is a backend storage and service
• Enables following scenarios:
  • Explicit personalization
  • Implicit content optimization
  • Geo based customization
Scenario #1
WL Anonymous ID and Machine Anonymous ID based explicit personalization
Examples: locations for weather, news, events, favorite sports team, personal shopping list, customized page settings, etc.
Scenario #2
WL Anonymous ID and Machine Anonymous ID based implicit content optimization
Examples: content optimization and/or personalization based on user demographics and behavior (e.g. personal recommendations)
Scenario #3
GEO based customization
SPS provides a Geolookup service that allows partners to enable IP-based customizations (e.g. default location, location-based content, geo-fencing, etc.)
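The IP-based customizations in Scenario #3 can be sketched as a range lookup. This is only an illustration: the table contents, the function names and the fallback location are assumptions, and the slides do not describe how the real Geolookup service stores or matches its data.

```python
import ipaddress
from typing import Optional

# Hypothetical IP-range table. The networks are reserved documentation
# ranges (RFC 5737) and the locations are made up for illustration.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), {"country": "US", "city": "Redmond"}),
    (ipaddress.ip_network("198.51.100.0/24"), {"country": "FR", "city": "Paris"}),
]

# Assumed fallback when the IP is not in the table.
DEFAULT_LOCATION = {"country": "US", "city": None}


def geo_lookup(ip: str) -> Optional[dict]:
    """Return the location record whose network contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for network, location in GEO_TABLE:
        if addr in network:
            return location
    return None


def default_location_for(ip: str) -> dict:
    """Default location: use the looked-up location, else a fixed fallback."""
    return geo_lookup(ip) or DEFAULT_LOCATION


def is_inside_fence(ip: str, allowed_countries: set) -> bool:
    """Geo-fencing: allow content only for the listed countries."""
    location = geo_lookup(ip)
    return location is not None and location["country"] in allowed_countries
```

A partner page could call `default_location_for` to pick a default weather location and `is_inside_fence` to decide whether region-restricted content may be shown.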
Scaling? Availability? Perf?
• Why? 150 million users visit the US Home Page per month, with peaks of 15,000 RPS, and up to 75 million users visit the other home pages.
• Latency goals: read < 25 ms, update < 50 ms
• Pages have to be up: downtime means lost revenue
• Need to be stateless
Overall Architecture
[Architecture diagram: SPS. Components shown:]
• CMS Rendering System
• Partner web server
• Load Balancer
• SPS FE Cluster: SPSAdapter (SPS MSN CMS service wrapper), WCFService, SPS Logic, Geo Service, Cached Data, Cache Access, Webstore DB Access, Database Access
• AppFabric Cache Cluster: multiple Cache Boxes
• Lookup System
• Database Partitions
• Webstore Config Server
• SPS Configuration, SPS Deployment Data, Lookup Deployment Data
How?
• Everything is stateless
• Windows AppFabric Caching service with many nodes: reliable and redundant
  – Similar to memcached
  – 240 GB of memory cache in the US
• SQL Server DB partitioning with a lookup system, with master/backup at each level
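A minimal sketch of partitioning behind a lookup system with a master and a backup per partition. The class names and the "least-loaded partition" rule for new users are assumptions made for illustration; the slides do not say how users are assigned to partitions.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """One data partition with a primary and a secondary server."""
    name: str
    primary_up: bool = True
    secondary_up: bool = True

    def active_server(self) -> str:
        # Prefer the primary; fail over to the secondary when it is down.
        if self.primary_up:
            return f"{self.name}-primary"
        if self.secondary_up:
            return f"{self.name}-secondary"
        raise RuntimeError(f"partition {self.name} is unavailable")


@dataclass
class LookupSystem:
    """Maps each user id to the partition that holds that user's records."""
    partitions: list
    assignments: dict = field(default_factory=dict)

    def partition_for(self, user_id: str) -> Partition:
        if user_id not in self.assignments:
            # Assumption: new users go to the least-loaded partition.
            counts = {p.name: 0 for p in self.partitions}
            for name in self.assignments.values():
                counts[name] += 1
            self.assignments[user_id] = min(counts, key=counts.get)
        name = self.assignments[user_id]
        return next(p for p in self.partitions if p.name == name)
```

Because existing users keep their recorded partition, appending a new `Partition` to the list only affects where new users land, which is one way adding a partition can stay cheap.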
Facts
• Availability
   – Designed with no single point of failure
   – Web: multiple web servers behind a load balancer (LB)
   – DB
      • Each DB partition has a primary and a secondary DB set up in a multi-master topology.
      • SQL Server transactional replication keeps the primary and secondary in sync. If a primary DB server goes down, requests are handled by the secondary DB server.
   – File share: WAN Sync replicates critical files across the primary and secondary file servers. A VIP ensures automatic read availability for the SPS service when the primary goes down. Write availability for backend services is ensured by manual failover.
   – Throttling prevents outages from abnormal traffic. It is configurable at both server level and partner level. Partner-level throttling is set at around 200% of normal peak traffic.
   – The load balancer also has a secondary backup
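Partner-level throttling at around 200% of normal peak traffic could look like the sketch below. The fixed one-second window, the class name and the rule that unknown partners get no budget are assumptions; the slides only state the 200% figure and that limits are configurable per server and per partner.

```python
from collections import defaultdict


class PartnerThrottle:
    """Fixed one-second-window request counter per partner."""

    def __init__(self, normal_peak_rps: dict, factor: float = 2.0):
        # Limit per partner = factor x that partner's normal peak RPS.
        self.limits = {p: int(rps * factor) for p, rps in normal_peak_rps.items()}
        self.window = {}                # partner -> current window (whole second)
        self.counts = defaultdict(int)  # partner -> requests seen in that window

    def allow(self, partner: str, now: float) -> bool:
        """Return True if the request fits in the partner's budget."""
        second = int(now)
        if self.window.get(partner) != second:
            # A new second has started: reset the partner's counter.
            self.window[partner] = second
            self.counts[partner] = 0
        if self.counts[partner] >= self.limits.get(partner, 0):
            return False
        self.counts[partner] += 1
        return True
```

The caller passes the current time explicitly (for example `time.time()`), which keeps the class easy to exercise with fixed timestamps.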

• Scalability
   – Web and AppFabric cache: scalability is achieved by adding new nodes. Everything is stateless.
   – DB: databases are hosted as a Webstore application. Scalability is achieved by partitioning; adding a data partition is very easy.

• Live site metrics
   – Latency: 10 ms read, 30 ms update, 12 ms async update
   – US: 39 web servers, 15 AppFabric caching servers, 10 SQL lookup servers and 12 SQL backend (data) servers
   – Asia: 17 web servers, 8 AppFabric caching servers, 8 SQL lookup servers and 10 SQL backend (data) servers
   – Europe: 16 web servers, 8 AppFabric caching servers, 8 SQL lookup servers and 10 SQL backend (data) servers
   – Current peak RPS per web box in the US is 375 (14.7K RPS across the US) at 40% peak CPU. Server capacity is around 600 RPS at 70% CPU.
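The US capacity figures can be checked with a little arithmetic. The sketch below uses only the numbers quoted above; 39 x 375 gives 14,625 RPS, in line with the roughly 14.7K quoted. The helper names are illustrative, and the "servers that can fail" estimate assumes load spreads evenly across the remaining boxes.

```python
# Figures taken from the US live-site metrics above.
WEB_SERVERS_US = 39
PEAK_RPS_PER_BOX = 375
CAPACITY_RPS_PER_BOX = 600  # at roughly 70% CPU


def fleet_rps(servers: int, rps_per_box: int) -> int:
    """Total requests per second across a fleet of identical boxes."""
    return servers * rps_per_box


peak = fleet_rps(WEB_SERVERS_US, PEAK_RPS_PER_BOX)          # 14,625
capacity = fleet_rps(WEB_SERVERS_US, CAPACITY_RPS_PER_BOX)  # 23,400
headroom = capacity / peak                                  # 1.6


def servers_that_can_fail(servers: int, peak_rps: int, per_box_capacity: int) -> int:
    """How many boxes can be lost while the rest still carry the peak."""
    needed = -(-peak_rps // per_box_capacity)  # ceiling division
    return servers - needed
```

With these numbers, 25 boxes are enough to carry the current peak at the stated per-box capacity, so `servers_that_can_fail(39, peak, 600)` comes out at 14.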




High-level Features
• Support shared namespace definitions, which reduces the number of calls
• Support multiple levels of access control on shared namespaces
   – Behind corp firewall
• Plug-in smart defaults per namespace
   – Smart defaults return faster when the user doesn’t have customizations yet.


• Plug-in smart data validation per namespace
   – Small pre-compiled DLLs validate data on the server
• Bulk upload of implicit user preferences or clustering info
• Geolookup service: a one-stop shop that reduces calls
• Supports both netTCP calls and WCF calls; within the same DC, netTCP is 35% faster than normal TCP
• Service is available globally (US, Europe and Asia), closer to the user.
• Introduction of an API for async update
   – Designed to support implicit updates or storing session data. In this case, the user makes no explicit effort to update a setting. Instead, just browsing a page or clicking a link stores the corresponding settings on SPS.
      • Examples: recent stock list from stock quotes on the MSN Money site, search history, list of articles where the user clicked thumbs up/down, etc.
   – Two-stage updates: 1) data from the client request is first saved in cache; then 2) updates are written to the DB in batches, allowing a faster response to the client. Optimized for writes.
   – Data is in memory for a short period before being written to the DB. AppFabric high availability mode (i.e. dual cache copy) is used to minimize potential data loss. Data loss may occur only if both cache servers are down at the same time.
   – Async update can be turned on or off at the attribute level via the admin UI. E.g. the user’s preferred locations do not use async update, but Money Recent Quotes may.
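The attribute-level on/off switch for async update can be pictured as a routing step in front of the two update paths. This is a sketch only: the settings dictionary stands in for whatever the admin UI stores, and the attribute names and the "unknown attributes stay synchronous" rule are assumptions.

```python
# Hypothetical per-attribute settings, standing in for the admin UI.
ASYNC_ENABLED = {
    "PreferredLocations": False,
    "MoneyRecentQuotes": True,
}


def route_update(attribute: str) -> str:
    """Pick the update path for one attribute; unknown attributes stay sync."""
    return "async" if ASYNC_ENABLED.get(attribute, False) else "sync"


def split_updates(updates: dict) -> tuple:
    """Split one request's updates into (sync, async) groups."""
    sync_part, async_part = {}, {}
    for attribute, value in updates.items():
        target = async_part if route_update(attribute) == "async" else sync_part
        target[attribute] = value
    return sync_part, async_part
```

Defaulting to the synchronous path keeps explicitly chosen settings durable unless someone has opted an attribute into the faster, slightly riskier path.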
Anatomy of a Get API Call
[Diagram. Components: Partner, SPS Endpoint (WCF, CF), AppFabric Cache, Partition Lookup, Core partitions, MSN Geo Lookup Service, Smart Defaults Provider. Numbered steps:]
(0) Partner Request
(1) Lookup Data in Cache [AppFabric Cache]
(2) Return Data Found in Cache [AppFabric Cache]
(3) UserId for lookup (cache miss) [Partition Lookup]
(4) Core Partition Information for User Record [Partition Lookup]
(5) Query for records [Core]
(6) User Records [Core]
(7) User IP [MSN Geo Lookup Service]
(8) User Location/Connection Info [MSN Geo Lookup Service]
(9) User RevIPInfo and Data missing from DB [Smart Defaults Provider]
(10) Defaults for Missing Data [Smart Defaults Provider]
(11) Write to Cache [AppFabric Cache]
(12) Response
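The Get call steps can be sketched as a cache-aside read. Plain dictionaries and callables stand in for the AppFabric cache, the partition lookup, the Core partitions, the geo service and the smart defaults provider; all names are hypothetical. As simplifications, a cache hit returns the cached record as is, and the geo lookup runs only when data is missing.

```python
class GetService:
    """Cache-aside read: cache, then partition lookup + DB, then defaults."""

    def __init__(self, cache, lookup, partitions, geo, defaults):
        self.cache = cache            # dict: user_id -> record (stands in for AppFabric)
        self.lookup = lookup          # dict: user_id -> partition name
        self.partitions = partitions  # dict: partition name -> {user_id: record}
        self.geo = geo                # callable: ip -> location dict
        self.defaults = defaults      # callable: (location, missing keys) -> dict

    def get(self, user_id, ip, wanted):
        # Steps 1-2: a cache hit is returned directly.
        if user_id in self.cache:
            return self.cache[user_id]
        # Steps 3-6: find the user's partition, then query it.
        partition = self.lookup.get(user_id)
        stored = self.partitions.get(partition, {}).get(user_id, {})
        record = {k: stored[k] for k in wanted if k in stored}
        # Steps 7-10: fill the gaps with location-aware defaults.
        missing = [k for k in wanted if k not in record]
        if missing:
            record.update(self.defaults(self.geo(ip), missing))
        # Steps 11-12: populate the cache and respond.
        self.cache[user_id] = record
        return record
```

A user with no stored data still gets a complete response built from defaults, which matches the idea that smart defaults serve users who have no customizations yet.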
Anatomy of an Update API Call
[Diagram. Components: Partner, SPS Endpoint (WCF, CF), AppFabric Cache, Partition Lookup, Core partitions, Smart Defaults Provider, Smart Validator Provider. Numbered steps:]
(0) Partner Request
(1) Validate Request [Smart Validator Provider]
(2) Success/Fail [Smart Validator Provider]
(3) UserId for lookup [Partition Lookup]
(4) User Not Found [Partition Lookup]
(5) Create lookup record [Partition Lookup]
(6) Core Partition Information for User Record [Partition Lookup]
(7) Invalidate Cache [AppFabric Cache]
(8) Write records [Core]
(9) Success/Fail [Core]
(10) Response
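The Update call steps can be sketched the same way, with dictionaries standing in for the cache, the lookup system and the Core partitions. The names and the rule for placing new users (the partition with the fewest records) are assumptions made for illustration.

```python
class UpdateService:
    """Validate, resolve the partition, invalidate the cache, then write."""

    def __init__(self, cache, lookup, partitions, validate):
        self.cache = cache            # dict: user_id -> cached record
        self.lookup = lookup          # dict: user_id -> partition name
        self.partitions = partitions  # dict: partition name -> {user_id: record}
        self.validate = validate      # callable: updates -> bool

    def update(self, user_id, updates):
        # Steps 1-2: reject invalid data before touching any store.
        if not self.validate(updates):
            return False
        # Steps 3-6: find the partition, creating a lookup record for new users.
        if user_id not in self.lookup:
            # Assumption: new users go to the partition with the fewest records.
            self.lookup[user_id] = min(
                self.partitions, key=lambda p: len(self.partitions[p])
            )
        partition = self.partitions[self.lookup[user_id]]
        # Step 7: drop the stale cached copy so the next read reloads it.
        self.cache.pop(user_id, None)
        # Steps 8-9: write the records and report the outcome.
        partition.setdefault(user_id, {}).update(updates)
        return True
```

Invalidating rather than updating the cached copy keeps the write path simple: the next Get call repopulates the cache from the database.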
Anatomy of an Async Write Call
[Diagram. Components: SPS Endpoint (WCF, CF), Main Cache, Async Cache, CacheSweeper, Core partitions. Steps:]
1. Async Write Request
2. Invalidate Main cache
3. Write to Async cache
4. Return (success)
5. Response
CacheSweeper:
a. Batch Read for DB Loading
b. DB Load
c. Invalidate Async cache
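The async write path and the CacheSweeper amount to a write-behind cache. The single-threaded sketch below uses dictionaries for the main cache, the async cache and the database; the class and method names are hypothetical, and merging repeated writes for the same user is an assumption. Concurrency between writers and the sweeper is not modelled.

```python
class AsyncWriter:
    """Write-behind: acknowledge from cache, load into the DB in batches."""

    def __init__(self):
        self.main_cache = {}   # user_id -> record served to readers
        self.async_cache = {}  # user_id -> updates not yet in the DB
        self.db = {}           # user_id -> durable record

    def async_write(self, user_id, updates):
        # Step 2: the main-cache copy is now stale.
        self.main_cache.pop(user_id, None)
        # Step 3: park the update in the async cache, merging repeated writes.
        self.async_cache.setdefault(user_id, {}).update(updates)
        # Steps 4-5: respond without waiting for the database.
        return True

    def sweep(self):
        """CacheSweeper pass: batch read, load into the DB, then invalidate."""
        batch = dict(self.async_cache)          # a. batch read
        for user_id, updates in batch.items():  # b. DB load
            self.db.setdefault(user_id, {}).update(updates)
        for user_id in batch:                   # c. invalidate async cache
            self.async_cache.pop(user_id, None)
        return len(batch)
```

Until `sweep` runs, the data exists only in the async cache, which is why the cache's high availability mode matters for this path.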
Anatomy of an Async Read Call
[Diagram. Components: SPS Endpoint (WCF, CF), Async Cache, Main Cache, Partition Lookup, Core partitions. Steps:]
1. Read Request
2. Read from Async cache
3. Cache miss from Async cache
4. Read from Main Cache
5. Cache miss from Main cache
6. UserId for lookup (cache miss) [Partition Lookup]
7. Core Partition Information for User Record [Partition Lookup]
8. Query for records [Core]
9. User Records [Core]
10. Write to Main Cache
11. Response
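The async read path checks two caches before going to the database. A minimal sketch with dictionaries as stand-ins follows; the function name and parameters are hypothetical. The slide shows only the double-miss path, so returning the cached value unchanged on a hit in either cache is an assumption.

```python
def async_read(user_id, async_cache, main_cache, lookup, partitions):
    """Read path for an async-enabled attribute, following the numbered steps."""
    # Steps 2-3: try the async cache first; it holds writes not yet in the DB.
    if user_id in async_cache:
        return async_cache[user_id]
    # Steps 4-5: then the main cache.
    if user_id in main_cache:
        return main_cache[user_id]
    # Steps 6-7: both missed, so resolve the user's partition.
    partition = partitions.get(lookup.get(user_id), {})
    # Steps 8-9: query the partition for the user's records.
    record = partition.get(user_id, {})
    # Step 10: populate the main cache for later reads.
    main_cache[user_id] = record
    # Step 11: respond.
    return record
```

Checking the async cache first means a reader sees a recent implicit update even before the sweeper has written it to the database.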
Q&A
Thank you!

Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
 
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
 
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
 
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
 
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
 
100500 способов кэширования в Oracle Database или как достичь максимальной ск...
100500 способов кэширования в Oracle Database или как достичь максимальной ск...100500 способов кэширования в Oracle Database или как достичь максимальной ск...
100500 способов кэширования в Oracle Database или как достичь максимальной ск...
 
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
 
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
 

Último

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

Shared personalization service. How to scale to 15 k rps (Patrice Pelland)

  • 1. SPS – Scale to 15k RPS. Patrice Pelland, Microsoft
  • 2. Overview and Goals of SPS
      • SPS (Shared Personalization Service)
      • It is a backend storage and service
      • Enables the following scenarios:
          • Explicit personalization
          • Implicit content optimization
          • Geo-based customization
  • 3. Scenario #1 – Explicit personalization based on WL Anonymous ID and Machine Anonymous ID. Examples: locations for weather, news, and events; favorite sports team; personal shopping list; customized page settings; etc.
  • 4.
  • 5. Scenario #2 – Implicit content optimization based on WL Anonymous ID and Machine Anonymous ID. Examples: content optimization and/or personalization based on user demographics and behavior (e.g. personal recommendations).
  • 6.
  • 7. Scenario #3 – Geo-based customization. SPS provides a Geolookup service that allows partners to enable IP-based customizations (e.g. default location, location-based content, geofencing, etc.).
  • 8.
  • 9. Scaling? Availability? Perf?
      • Why? 150 million users visit the US home page per month, with peaks of 15,000 RPS, and up to 75 million users visit the other home pages.
      • Latency goals: read < 25 ms, update < 50 ms
      • Pages have to be up: $$$ loss if not
      • Need to be stateless
  • 10. Overall Architecture (diagram). Components shown:
      • Partner web server / CMS Rendering System, calling SPS through the SPSAdapter (SPS CMS service wrapper)
      • Load Balancer
      • SPS FE Cluster: SPS WCF Service (Cache Access, Database Access Logic, Webstore DB Access) and the MSN Geo Lookup Service
      • AppFabric Cache Cluster: Cache Boxes holding Cached Data
      • Webstore Config Server
      • Database Partitions: Lookup and Data
      • SPS Configuration and SPS Deployment
  • 11. How?
      • Everything is stateless
      • Windows AppFabric Caching service with many nodes
          – Reliable and redundant
          – Similar to memcached
          – 240 GB of memory cache in the US
      • SQL Server DB partitioning with a lookup system; master/backup at each level
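The partitioning with a lookup system mentioned on this slide can be sketched roughly as follows. This is only an illustration under assumptions: the hash choice, the partition counts, the round-robin placement, and the names `lookup_partition_for` and `PartitionDirectory` are hypothetical and are not taken from the SPS implementation. The idea shown is that a stable hash of the user ID selects a lookup partition, and the lookup record stored there names the core (data) partition that holds the user's records.

```python
import hashlib


def lookup_partition_for(user_id: str, lookup_partitions: int) -> int:
    """Map a user ID to a lookup partition with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % lookup_partitions


class PartitionDirectory:
    """Hypothetical in-memory stand-in for the lookup databases."""

    def __init__(self, lookup_partitions: int, core_partitions: int):
        self.core_partitions = core_partitions
        # One dict per lookup partition: user_id -> core partition number.
        self.lookup = [dict() for _ in range(lookup_partitions)]
        self._next_core = 0

    def core_partition_for(self, user_id: str) -> int:
        """Return the user's core partition, creating a lookup record if absent."""
        shard = self.lookup[lookup_partition_for(user_id, len(self.lookup))]
        if user_id not in shard:
            # Assumed placement policy: round-robin over the core partitions.
            shard[user_id] = self._next_core
            self._next_core = (self._next_core + 1) % self.core_partitions
        return shard[user_id]


directory = PartitionDirectory(lookup_partitions=10, core_partitions=12)
first = directory.core_partition_for("anon-123")
again = directory.core_partition_for("anon-123")  # same partition as `first`
```

One plausible benefit of an explicit lookup record, compared with hashing straight to a data partition, is that existing users keep their assignment when a data partition is added. The slides do not say whether this is the reason for the design.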
  • 12. Facts
      • Availability
          – Designed with no single point of failure
          – Web: multiple web servers behind a load balancer (LB)
          – DB
              • Each DB partition has a primary and a secondary DB set up in a multi-master topology.
              • SQL Server transactional replication keeps the primary and secondary in sync. If a primary DB server goes down, requests are handled by the secondary DB server.
          – File share: WAN Sync replicates critical files across the primary and secondary file servers. A VIP ensures automatic read availability for the SPS service when the primary goes down. Write availability for backend services is ensured by manual failover.
          – Throttling to prevent outages from abnormal traffic: throttling is configurable at both the server level and the partner level. Partner-level throttling is set at around 200% of normal peak traffic.
          – The load balancer also has a secondary backup
      • Scalability
          – Web and AppFabric cache: scale by adding new nodes. Everything is stateless.
          – DB: databases are hosted as a Webstore application. Scale by partitioning; adding a data partition is very easy.
      • Live site metrics
          – Latency: 10 ms read, 30 ms update, 12 ms async update
          – US: 39 web servers, 15 AppFabric caching servers, 10 SQL lookup servers, and 12 SQL backend (data) servers
          – Asia: 17 web servers, 8 AppFabric caching servers, 8 SQL lookup servers, and 10 SQL backend (data) servers
          – Europe: 16 web servers, 8 AppFabric caching servers, 8 SQL lookup servers, and 10 SQL backend (data) servers
          – Current peak RPS per web box in the US is 375 (14.7K RPS across the US), with peak CPU at 40%. Server capacity is around 600 RPS at 70% CPU.
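The capacity figures and the partner-level throttle on this slide can be checked with a small sketch. The server counts and per-box rates are the ones quoted above. Everything else is an assumption made for illustration: the fixed one-second window, the `PartnerThrottle` class, and the example partner peak are hypothetical, and the slides do not describe how SPS actually implements throttling.

```python
import math

# Figures quoted on the slide for the US data center.
US_WEB_SERVERS = 39
PEAK_RPS_PER_BOX = 375
CAPACITY_RPS_PER_BOX = 600  # quoted at roughly 70% CPU


def fleet_rps(servers: int, rps_per_box: int) -> int:
    """Total requests per second across a fleet of identical web boxes."""
    return servers * rps_per_box


class PartnerThrottle:
    """Fixed-window throttle: allow up to factor x normal peak requests per second."""

    def __init__(self, normal_peak_rps: float, factor: float = 2.0):
        self.limit = int(normal_peak_rps * factor)  # about 200% of normal peak
        self._window = None
        self._count = 0

    def allow(self, now: float) -> bool:
        window = int(now)  # one-second windows (assumed granularity)
        if window != self._window:
            self._window, self._count = window, 0
        if self._count >= self.limit:
            return False
        self._count += 1
        return True


peak = fleet_rps(US_WEB_SERVERS, PEAK_RPS_PER_BOX)          # 14625, near the quoted 14.7K
capacity = fleet_rps(US_WEB_SERVERS, CAPACITY_RPS_PER_BOX)  # 23400
boxes_for_15k = math.ceil(15_000 / CAPACITY_RPS_PER_BOX)    # 25

throttle = PartnerThrottle(normal_peak_rps=5)  # hypothetical partner peak
allowed = sum(throttle.allow(0.5) for _ in range(12))  # 10 of the 12 requests pass
```

By these numbers the US fleet runs at about 62% of its stated capacity at peak, which is consistent with the slide's claim of headroom but is only arithmetic on the quoted figures.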
  • 13. High-level Features
      • Support shared namespace definition
          – Reduces the number of calls
      • Support multiple levels of access control for shared namespaces
          – Behind the corporate firewall
      • Plug-in smart defaults for a namespace
          – Smart defaults return faster in cases where the user doesn’t have customizations yet.
  • 14. High-level Features
      • Plug-in smart data validation for a namespace
          – Small, pre-compiled DLLs validate data on the server
      • Bulk upload of implicit user preferences or clustering info
      • Geolookup service
          – One-stop shop: reduces calls
      • Support both netTCP calls and WCF calls
          – Within the same data center, netTCP is 35% faster than normal TCP
      • Service is available globally: US, Europe, and Asia
          – Closer to the user
  • 15. High-level Features
      • Introduction of an API for async update
          – Designed to support implicit updates or storing session data. In this case, the user does not make an explicit effort to update his/her settings. Instead, just by browsing a page or clicking a link, the corresponding settings are stored in SPS.
              • Examples: recent stock list from looking up stock quotes on the MSN Money site, search history, list of articles where the user clicked thumbs up/down, etc.
          – Two-stage updates: 1) data from the client request is first saved in the cache; then 2) updates are written to the DB in batches, which allows a faster response to the client. Optimized for writes.
          – Data stays in memory for a short period of time before being written to the DB. AppFabric high-availability mode (i.e. dual cache copies) is used to minimize potential data loss. Data loss may occur only if both cache servers are down at the same time.
          – Async update can be turned on/off at the attribute level via the admin UI. E.g. a user’s preferred locations do not use async update, but Money Recent Quotes may.
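The two-stage async update described on this slide is a form of write-behind caching, and a minimal sketch may make the flow concrete. The class and attribute names below are hypothetical, plain dictionaries stand in for the AppFabric cache and the SQL partitions, and the sketch is single-threaded. It leaves out the main-cache invalidation and the dual cache copies mentioned in the slides.

```python
class AsyncUpdateStore:
    """Two-stage update: stage 1 writes to a cache, stage 2 batches to the DB."""

    def __init__(self, async_attributes):
        # Attribute-level on/off switch (set via an admin UI in the slides).
        self.async_attributes = set(async_attributes)
        self.async_cache = {}  # stand-in for the async cache
        self.database = {}     # stand-in for the SQL partitions

    def update(self, user_id, attribute, value):
        key = (user_id, attribute)
        if attribute in self.async_attributes:
            # Stage 1: save in cache only, so the caller gets a fast response.
            self.async_cache[key] = value
            return "queued"
        # Attributes without async update are written through synchronously.
        self.database[key] = value
        return "written"

    def sweep(self, batch_size=100):
        """Stage 2: move up to batch_size pending entries from cache to DB."""
        batch = list(self.async_cache.items())[:batch_size]
        for key, value in batch:
            self.database[key] = value
            del self.async_cache[key]
        return len(batch)


store = AsyncUpdateStore(async_attributes={"money.recent_quotes"})
status_async = store.update("anon-123", "money.recent_quotes", ["MSFT"])
status_sync = store.update("anon-123", "weather.locations", ["Seattle"])
pending_before = len(store.async_cache)
flushed = store.sweep()
```

Anything still sitting in `async_cache` when the cache is lost has not reached the database. That is the exposure the slide addresses by keeping two cache copies.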
  • 16. Anatomy of a Get API Call (diagram). Components: SPS Endpoint (WCF, CF), AppFabric Cache, Partition Lookup, Core partitions, MSN Geo Lookup Service, Smart Defaults Providers. Labeled steps:
      (0) Partner request
      (1) Look up data in cache
      (2) Return data found in cache
      (3) UserId for lookup (cache miss)
      (4) Core partition information for user record
      (5) Query for records
      (6) User records
      (7) User IP
      (8) User location/connection info
      (9) User RevIPInfo and data missing from DB
      (10) Defaults for missing data
      (11) Write to cache
      (12) Response
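The Get call in this diagram follows a cache-aside pattern with defaults filled in for missing data. The sketch below illustrates that pattern only. The function name, the dictionary-based cache and database, and the provider signature are all assumptions, and the geo lookup and partition lookup steps are left out for brevity.

```python
def get_settings(user_id, namespace, cache, database, default_providers):
    """Cache-aside read: cache first, then DB, then smart defaults for gaps."""
    key = (user_id, namespace)
    if key in cache:
        # Cache hit: return the cached record without touching the DB.
        return cache[key]

    # Cache miss: load whatever the DB holds for this user and namespace.
    record = dict(database.get(key, {}))

    # Fill attributes missing from the DB using the namespace's defaults provider.
    provider = default_providers.get(namespace)
    if provider is not None:
        for name, value in provider(user_id).items():
            record.setdefault(name, value)

    # Write the assembled record to the cache for subsequent reads.
    cache[key] = record
    return record


cache = {}
database = {("anon-123", "weather"): {"units": "metric"}}
providers = {"weather": lambda user_id: {"units": "imperial", "location": "Seattle"}}

result = get_settings("anon-123", "weather", cache, database, providers)
new_user = get_settings("anon-999", "weather", cache, database, providers)
```

Stored values win over defaults, so `result` keeps the user's `"metric"` setting, while a user with no record gets the provider's values only.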
  • 17. Anatomy of an Update API Call (diagram). Components: SPS Endpoint (WCF, CF), AppFabric Cache, Partition Lookup, Core partitions, Smart Validator Provider, Smart Defaults Providers. Labeled steps:
      (0) Partner request
      (1) Validate request
      (2) Success/fail
      (3) UserId for lookup
      (4) User not found
      (5) Create lookup record
      (6) Core partition information for user record
      (7) Invalidate cache
      (8) Write records
      (9) Success/fail
      (10) Response
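The Update call in this diagram validates the request, invalidates the cached copy, and then writes the records. A minimal sketch of that sequence follows. The names `update_settings` and `ValidationError`, the validator signature, and the dictionary stand-ins for the cache and database are hypothetical. The partition lookup and lookup-record creation are collapsed into a single `setdefault` call.

```python
class ValidationError(ValueError):
    """Raised when a namespace validator rejects an update (hypothetical type)."""


def update_settings(user_id, namespace, values, cache, database, validators):
    """Validate, invalidate the cached copy, then write to the database."""
    validator = validators.get(namespace)
    if validator is not None and not validator(values):
        raise ValidationError(f"invalid update for namespace {namespace!r}")

    key = (user_id, namespace)
    # Invalidate before writing, in the order the diagram labels suggest,
    # so that the next read reloads the record from the database.
    cache.pop(key, None)

    # Creates the record on first write for a user that was not found.
    database.setdefault(key, {}).update(values)
    return True


cache = {("anon-123", "weather"): {"units": "metric"}}
database = {("anon-123", "weather"): {"units": "metric"}}
validators = {"weather": lambda values: values.get("units") in {"metric", "imperial"}}

ok = update_settings("anon-123", "weather", {"units": "imperial"},
                     cache, database, validators)
```

In a concurrent system the ordering of invalidation and write matters, because a read that lands between them can repopulate the cache with stale data. The sketch is single-threaded and does not address that.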
  • 18. Anatomy of an Async Write Call (diagram). Components: SPS Endpoint (WCF, CF), Main Cache, Async Cache, CacheSweeper, Core partitions. Labeled steps:
      1. Async write request
      2. Invalidate Main cache
      3. Write to Async cache
      4. Return (success)
      5. Response
    CacheSweeper (background):
      a. Batch read for DB loading
      b. DB load
      c. Invalidate Async cache
  • 19. Anatomy of an Async Read Call (diagram). Components: SPS Endpoint (WCF, CF), Main Cache, Async Cache, Partition Lookup, Core partitions. Labeled steps:
      1. Read request
      2. Read from Async cache
      3. Cache miss from Async cache
      4. Read from Main cache
      5. Cache miss from Main cache
      6. UserId for lookup (cache miss)
      7. Core partition information for user record
      8. Query for records
      9. User records
      10. Write to Main cache
      11. Response
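The read order in the last diagram (async cache, then main cache, then database) can be sketched as below. The function name and the dictionary stand-ins are assumptions, and the partition lookup is folded into a single `database.get` call. The async cache is consulted first because, per the async write diagram, a pending write invalidates the main cache and lives only in the async cache until the CacheSweeper loads it into the DB.

```python
def read_attribute(key, async_cache, main_cache, database):
    """Read path for async-enabled data: async cache, main cache, then DB."""
    if key in async_cache:
        # A pending async write is the freshest copy available.
        return async_cache[key]
    if key in main_cache:
        return main_cache[key]

    # Miss in both caches: query the database and populate the main cache.
    value = database.get(key)
    if value is not None:
        main_cache[key] = value
    return value


async_cache = {"k1": "pending"}
main_cache = {"k1": "stale", "k2": "cached"}
database = {"k3": "stored"}

from_async = read_attribute("k1", async_cache, main_cache, database)
from_main = read_attribute("k2", async_cache, main_cache, database)
from_db = read_attribute("k3", async_cache, main_cache, database)
missing = read_attribute("k4", async_cache, main_cache, database)
```

The `"stale"` entry for `k1` in the main cache is contrived to show the precedence. In the flow described by the slides, the main-cache entry would already have been invalidated when the async write arrived.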

Editor's notes

  1. SPS stands for Shared Personalization Service. SPS is a service created by MSN to stop the proliferation of profiles. It is used by many teams at Microsoft, mostly in MSN. It provides backend storage for user customizations and optimization keys, and a service backbone that offers different entry points to the data.
  2. Anonymous: there is no way to identify the person from this ID.