SlideShare una empresa de Scribd logo
1 de 33
Descargar para leer sin conexión
abstractions at scale
            our experiences at twitter

            marius a. eriksen
            @marius
            QConSF, November 2010




                                         TM




Sunday, November 14, 2010
twitter
         ‣     real-time information network
               ‣    70M tweets/day (800/s)
               ‣    150M users
               ‣    70k API calls/s




Sunday, November 14, 2010
agenda
         ‣     scale & scalability
         ‣     the role of abstraction
         ‣     good abstractions, bad abstractions
         ‣     abstractions & scale
         ‣     examples
         ‣     “just right” APIs
         ‣     conclusions



Sunday, November 14, 2010
scale & scalability



         “Scalability is a desirable property of a system, a
         network, or a process, which indicates its ability to
         either handle growing amounts of work in a
         graceful manner or to be readily enlarged”

         (Wikipedia)




Sunday, November 14, 2010
scale & scalability (cont’d)

         ‣     only “horizontal” scaling allows unbounded
               growth
               ‣    not entirely true: eg. due to network effects
               ‣    not a panacea
         ‣     “vertical” scaling is often desirable & required
               ‣    contain costs
               ‣    curtail network effects


Sunday, November 14, 2010
scale & scalability (cont’d)
         ‣     the target architecture is the datacenter
               ‣     network is a critical component
               ‣     deeper storage hierarchy
               ‣     higher performance variance
               ‣     complex failure modes
               ‣     but our programming models don’t account
                     for these resource & failure models explicitly




Sunday, November 14, 2010
abstraction
         “freedom from representational qualities”


         ‣     the chief purpose of abstraction is to manage
               complexity & provide composability
         ‣     in software, abstraction is manifested through
               common interfaces
               ‣     explicit semantics
               ‣     implicit “contracts”


Sunday, November 14, 2010
abstraction (cont’d)



         ‣     as systems become more complex, abstraction
               becomes increasingly important
               ‣    especially as number of engineers grow
         ‣     modern systems are highly complex and are
               highly abstracted




Sunday, November 14, 2010
type systems
         ‣     [static] type systems can encode some of the
               contracts for us
               ‣     giving us static guarantees
               ‣     academia is pushing the envelope here with
                     dependent types
               ‣     they also compose


         ‣     the line between type & program becomes
               blurred


Sunday, November 14, 2010
good abstraction? your CPU
         ‣     x86-64 is a spec
               ‣     you don’t care if it’s provided by AMD or Intel
         ‣     excepting a few compiler & OS authors, most of
               you don’t think about
               ‣     pipelining
               ‣     out of order & speculative execution
               ‣     branch prediction
               ‣     cache coherency
               ‣     etc…

Sunday, November 14, 2010
good …? your memory hierarchy
         ‣     you don’t interface with it directly
         ‣     purist view: addressable memory cells
         ‣     reality: has scary-good optimizations for
               common access patterns. highly optimized.
         ‣     you don’t think (often) about:
               ‣     cache locality
               ‣     TLB effects
               ‣     MMU ops scheduling


Sunday, November 14, 2010
bad abstraction? ia64
         ‣     (at least initially) compilers couldn’t live up to it
               ‣     hardware promise was delegated to the
                     compiler
               ‣     compilers failed to reliably produce sufficiently
                     fast code
               ‣     abstraction was broken
               ‣     good for certain scientific computing domains




Sunday, November 14, 2010
a lens
         ‣     scaling issues occur when abstractions become
               leaky
               ‣     RDBMS fails to perform sophisticated queries
                     on highly normalized data
               ‣     your GC thrashes after a certain allocation
                     volume
               ‣     OS thread scheduling becomes unviable after
                     N × 1000 threads are created




Sunday, November 14, 2010
¶ threads
         ‣     threads offer a familiar and linear model of execution
               ‣     scheduling overhead becomes important after a
                     certain amount of parallelism
               ‣     stack allocation can become troublesome
               ‣     fails to be explicit about latency, backpressure
         ‣     alternative: asynchronous programming
               ‣     makes queuing, latency explicit
               ‣     allows SEDA-style control
         ‣     a compromise? LWT

Sunday, November 14, 2010
¶ sequence abstractions
         ‣     produces concise, beautiful, composable code


              trait Places extends Seq[Place]
              places.chunks(5000).map(_.toList).parForeach { chunk =>
                   …
              }


         ‣     access patterns aren’t propagated down the
               stack
         ‣     missed optimizations

Sunday, November 14, 2010
¶ RDBMS
         ‣     are [by definition] generic
         ‣     encourage normalized data storage
               ‣     very powerful data model
               ‣     little need to know access patterns a priori
         ‣     provide general (magical) querying mechanics
               ‣     bag of tricks: query planning, table statistics,
                     covering indices




Sunday, November 14, 2010
¶ RDBMS
         ‣     at scale, the most viable strategy is: What You
               Serve Is What You Store (WYSIWYS)
               ‣     or at least very close
         ‣     this brings about a whole host of new problems
               ‣     data (in)consistency
               ‣     multiple indices
               ‣     “re-normalization”




Sunday, November 14, 2010
¶ RDBMS
         ‣     at-scale, querying is highly predictable, most of
               the time:
               ‣     don’t need fancy query planning
               ‣     don’t need statistics
         ‣     in fact, we know a-priori how to efficiently query
               the underlying datastructures
               ‣     wish: don’t give me a query engine, give me
                     primitives!
               ‣     maybe there’s a “just right” API here


Sunday, November 14, 2010
¶ in-memory representations
         ‣     having tight control over representation is often
               crucial to resource utilization
               ‣     [space vs. time] memory bandwidth is
                     precious, CPU is plentiful
               ‣     cache locality can often make an enormous
                     difference — even to the point of less code is
                     better than more efficient code(!)
         ‣     at odds with modern GC’d languages automatic
               memory management & layout



Sunday, November 14, 2010
¶ in-memory representations
         ‣     optimize memory layout
         ‣     pack data
         ‣     compression
               ‣     varint, difference, zigzag, etc.
         ‣     L1:main memory latency ≈ 1:200 (!)


         ‣     example: geometry of Canada ~ jts normalized,
               vs. WKB
               ‣     wkb is ≈ 600 KB, JTS representation ≈ 2-3MB

Sunday, November 14, 2010
¶ garbage collection
         ‣     we love garbage collection
         ‣     attempts to encode common patterns:
               generational hypothesis
               ‣     not always quite right
               ‣     the application almost always has some idea
                     about object lifetime & semantics
         ‣     proposal: talk to each other!
               ‣     backpressure, thresholding, application-guided
                     GC


Sunday, November 14, 2010
¶ virtual memory


                               “You’re Doing it Wrong”
                      Poul-Henning Kamp, ACM Queue, June 2010




          "… Varnish does not ignore the fact that memory is virtual;
                           it actively exploits it”




Sunday, November 14, 2010
¶ virtual memory


         ‣     maybe he is doing it wrong?
         ‣     varnish uses data structures designed to
               anticipate virtual memory layout & behavior
               ‣    translates application semantics (eg. LRU)
         ‣     instead, you could have direct control over those
               resources




Sunday, November 14, 2010
“just right” abstractions
         ‣     high level abstractions are absolutely necessary
               to deal with today’s complex systems
         ‣     but providing good abstractions is hard
         ‣     what are the “just right” abstractions?
               ‣     exploit common patterns
               ‣     give enough degrees of freedom to the
                     underlying platform
               ‣     usually target a narrow(er) domain
               ‣     retain high level interfaces

Sunday, November 14, 2010
¶ mapreduce
               def map(datum):
                 words = {}
                 for word in parse_words(datum):
                   word[word] += 1
                 for (word, count) in words.items():
                   output(word, count)

               def reduce(key, values):
                 output(key, mean(values))
              ‣     much freedom is given to the scheduler
              ‣     exploits data locality (predictably)
Sunday, November 14, 2010
¶ shared-nothing web apps



               def handle(request):
                 return Response(
                   “hello %s!” % request.get_user())


              ‣     eg: google’s app engine, django, rails, etc




Sunday, November 14, 2010
¶ bigtable
         ‣     very simple data model
               ‣     but composable — effectively every other
                     database squeezes (more) sophisticated data
                     models down to 1 dimensional storage(s)
         ‣     explicit memory hierarchy (pinning column
               families to memory)
         ‣     provides load balancer/scheduler much freedom
         ‣     only magic: compactions. challenge: resource
               isolation.


Sunday, November 14, 2010
¶ LWT
                 lwt ai = Lwt_lib.getaddrinfo
                   "localhost" "8080"
                   [Unix.AI_FAMILY Unix.PF_INET;
                    Unix.AI_SOCKTYPE Unix.SOCK_STREAM] in

                 lwt (input, output) =
                   match ai with
                     | [] -> fail Not_found
                     | a :: _ -> Lwt_io.open_connection
                                   a.Unix.ai_addr in

                 Lwt_io.write output "GET / HTTP/1.1rnrn" >>
                 Lwt_io.read input




Sunday, November 14, 2010
theme
         ‣     provide a programming model that provide a
               narrow (but flexible) interface to resources
               ‣     mapreduce
               ‣     shared-nothing web apps
         ‣     provide a programming model that make
               resources explicit
               ‣     bigtable
               ‣     LWT



Sunday, November 14, 2010
meta pattern(s)


         ‣     addressing separation of concerns:
               ‣    (asynchronous) execution policy vs.
                    (synchronous) application logic
               ‣    data locality vs. data operations
               ‣    data model vs. data distribution
               ‣    data locality vs. data model



Sunday, November 14, 2010
the future?
         ‣     database systems
         ‣     search systems
         ‣     ... or any online query system?


         ‣     some academic work already in this area:
               ‣     OPIS (distributed arrows w/ combinators)
               ‣     ypnos (grid compiler)
               ‣     skywriting (scripted dataflow)


Sunday, November 14, 2010
conclusions
         ‣     we need high level abstractions
               ‣     they are simply necessary
               ‣     allows us to develop faster and safer
         ‣     many high level abstractions aren’t “just right”
               ‣     can become highly inoptimal (often orders of
                     magnitudes can be reclaimed)
         ‣     some systems do provide good compromises
               ‣     makes resources explicit
         ‣     the future is exciting!

Sunday, November 14, 2010
that’s it!



         ‣     follow me: @marius
         ‣     marius@twitter.com




Sunday, November 14, 2010

Más contenido relacionado

Similar a Abstractions at Scale – Our Experiences at Twitter

AMW43 - Unba.se, Distributed database for human interaction
AMW43 - Unba.se, Distributed database for human interactionAMW43 - Unba.se, Distributed database for human interaction
AMW43 - Unba.se, Distributed database for human interactionDaniel Norman
 
Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architecturesJason Riedy
 
Riak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard ProblemsRiak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard ProblemsAndy Gross
 
Challenges in Maintaining a High Performance Search Engine Written in Java
Challenges in Maintaining a High Performance Search Engine Written in JavaChallenges in Maintaining a High Performance Search Engine Written in Java
Challenges in Maintaining a High Performance Search Engine Written in Javalucenerevolution
 
Mark Little Fence Sitting Soa Geek
Mark Little Fence Sitting Soa GeekMark Little Fence Sitting Soa Geek
Mark Little Fence Sitting Soa Geekdeimos
 
At the Crossroads of HPC and Cloud Computing with Openstack
At the Crossroads of HPC and Cloud Computing with OpenstackAt the Crossroads of HPC and Cloud Computing with Openstack
At the Crossroads of HPC and Cloud Computing with OpenstackRyan Aydelott
 
Inside the Atlassian OnDemand Private Cloud
Inside the Atlassian OnDemand Private CloudInside the Atlassian OnDemand Private Cloud
Inside the Atlassian OnDemand Private CloudAtlassian
 
Solving k8s persistent workloads using k8s DevOps style
Solving k8s persistent workloads using k8s DevOps styleSolving k8s persistent workloads using k8s DevOps style
Solving k8s persistent workloads using k8s DevOps styleMayaData
 
An Overview of Flash Storage for Databases
An Overview of Flash Storage for DatabasesAn Overview of Flash Storage for Databases
An Overview of Flash Storage for DatabasesConFoo
 
Simon Peyton Jones: Managing parallelism
Simon Peyton Jones: Managing parallelismSimon Peyton Jones: Managing parallelism
Simon Peyton Jones: Managing parallelismSkills Matter
 
Peyton jones-2011-parallel haskell-the_future
Peyton jones-2011-parallel haskell-the_futurePeyton jones-2011-parallel haskell-the_future
Peyton jones-2011-parallel haskell-the_futureTakayuki Muranushi
 
NoSQL Database
NoSQL DatabaseNoSQL Database
NoSQL DatabaseSteve Min
 
Nuxeo World Session: Scaling Nuxeo Applications
Nuxeo World Session: Scaling Nuxeo ApplicationsNuxeo World Session: Scaling Nuxeo Applications
Nuxeo World Session: Scaling Nuxeo ApplicationsNuxeo
 
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016Alluxio, Inc.
 
Container Attached Storage with OpenEBS - CNCF Paris Meetup
Container Attached Storage with OpenEBS - CNCF Paris MeetupContainer Attached Storage with OpenEBS - CNCF Paris Meetup
Container Attached Storage with OpenEBS - CNCF Paris MeetupMayaData Inc
 
Logical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetupLogical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetupGianmario Spacagna
 
JavaScript for Enterprise Applications
JavaScript for Enterprise ApplicationsJavaScript for Enterprise Applications
JavaScript for Enterprise ApplicationsPiyush Katariya
 
Non-Relational Databases: This hurts. I like it.
Non-Relational Databases: This hurts. I like it.Non-Relational Databases: This hurts. I like it.
Non-Relational Databases: This hurts. I like it.Onyxfish
 

Similar a Abstractions at Scale – Our Experiences at Twitter (20)

AMW43 - Unba.se, Distributed database for human interaction
AMW43 - Unba.se, Distributed database for human interactionAMW43 - Unba.se, Distributed database for human interaction
AMW43 - Unba.se, Distributed database for human interaction
 
Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architectures
 
Riak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard ProblemsRiak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard Problems
 
DEVIEW 2013
DEVIEW 2013DEVIEW 2013
DEVIEW 2013
 
Challenges in Maintaining a High Performance Search Engine Written in Java
Challenges in Maintaining a High Performance Search Engine Written in JavaChallenges in Maintaining a High Performance Search Engine Written in Java
Challenges in Maintaining a High Performance Search Engine Written in Java
 
Mark Little Fence Sitting Soa Geek
Mark Little Fence Sitting Soa GeekMark Little Fence Sitting Soa Geek
Mark Little Fence Sitting Soa Geek
 
At the Crossroads of HPC and Cloud Computing with Openstack
At the Crossroads of HPC and Cloud Computing with OpenstackAt the Crossroads of HPC and Cloud Computing with Openstack
At the Crossroads of HPC and Cloud Computing with Openstack
 
!Chaos Control System
!Chaos Control System!Chaos Control System
!Chaos Control System
 
Inside the Atlassian OnDemand Private Cloud
Inside the Atlassian OnDemand Private CloudInside the Atlassian OnDemand Private Cloud
Inside the Atlassian OnDemand Private Cloud
 
Solving k8s persistent workloads using k8s DevOps style
Solving k8s persistent workloads using k8s DevOps styleSolving k8s persistent workloads using k8s DevOps style
Solving k8s persistent workloads using k8s DevOps style
 
An Overview of Flash Storage for Databases
An Overview of Flash Storage for DatabasesAn Overview of Flash Storage for Databases
An Overview of Flash Storage for Databases
 
Simon Peyton Jones: Managing parallelism
Simon Peyton Jones: Managing parallelismSimon Peyton Jones: Managing parallelism
Simon Peyton Jones: Managing parallelism
 
Peyton jones-2011-parallel haskell-the_future
Peyton jones-2011-parallel haskell-the_futurePeyton jones-2011-parallel haskell-the_future
Peyton jones-2011-parallel haskell-the_future
 
NoSQL Database
NoSQL DatabaseNoSQL Database
NoSQL Database
 
Nuxeo World Session: Scaling Nuxeo Applications
Nuxeo World Session: Scaling Nuxeo ApplicationsNuxeo World Session: Scaling Nuxeo Applications
Nuxeo World Session: Scaling Nuxeo Applications
 
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
 
Container Attached Storage with OpenEBS - CNCF Paris Meetup
Container Attached Storage with OpenEBS - CNCF Paris MeetupContainer Attached Storage with OpenEBS - CNCF Paris Meetup
Container Attached Storage with OpenEBS - CNCF Paris Meetup
 
Logical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetupLogical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetup
 
JavaScript for Enterprise Applications
JavaScript for Enterprise ApplicationsJavaScript for Enterprise Applications
JavaScript for Enterprise Applications
 
Non-Relational Databases: This hurts. I like it.
Non-Relational Databases: This hurts. I like it.Non-Relational Databases: This hurts. I like it.
Non-Relational Databases: This hurts. I like it.
 

Último

Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 

Último (20)

Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 

Abstractions at Scale – Our Experiences at Twitter

  • 1. abstractions at scale our experiences at twitter marius a. eriksen @marius QConSF, November 2010 TM Sunday, November 14, 2010
  • 2. twitter ‣ real-time information network ‣ 70M tweets/day (800/s) ‣ 150M users ‣ 70k API calls/s Sunday, November 14, 2010
  • 3. agenda ‣ scale & scalability ‣ the role of abstraction ‣ good abstractions, bad abstractions ‣ abstractions & scale ‣ examples ‣ “just right” APIs ‣ conclusions Sunday, November 14, 2010
  • 4. scale & scalability “Scalability is a desirable property of a system, a network, or a process, which indicates its ability to either handle growing amounts of work in a graceful manner or to be readily enlarged” (Wikipedia) Sunday, November 14, 2010
  • 5. scale & scalability (cont’d) ‣ only “horizontal” scaling allows unbounded growth ‣ not entirely true: eg. due to network effects ‣ not a panacea ‣ “vertical” scaling is often desirable & required ‣ contain costs ‣ curtail network effects Sunday, November 14, 2010
  • 6. scale & scalability (cont’d) ‣ the target architecture is the datacenter ‣ network is a critical component ‣ deeper storage hierarchy ‣ higher performance variance ‣ complex failure modes ‣ but our programming models don’t account for these resource & failure models explicitly Sunday, November 14, 2010
  • 7. abstraction “freedom from representational qualities” ‣ the chief purpose of abstraction is to manage complexity & provide composability ‣ in software, abstraction is manifested through common interfaces ‣ explicit semantics ‣ implicit “contracts” Sunday, November 14, 2010
  • 8. abstraction (cont’d) ‣ as systems become more complex, abstraction becomes increasingly important ‣ especially as number of engineers grow ‣ modern systems are highly complex and are highly abstracted Sunday, November 14, 2010
  • 9. type systems ‣ [static] type systems can encode some of the contracts for us ‣ giving us static guarantees ‣ academia is pushing the envelope here with dependent types ‣ they also compose ‣ the line between type & program becomes blurred Sunday, November 14, 2010
  • 10. good abstraction? your CPU ‣ x86-64 is a spec ‣ you don’t care if it’s provided by AMD or Intel ‣ excepting a few compiler & OS authors, most of you don’t think about ‣ pipelining ‣ out of order & speculative execution ‣ branch prediction ‣ cache coherency ‣ etc… Sunday, November 14, 2010
  • 11. good …? your memory hierarchy ‣ you don’t interface with it directly ‣ purist view: addressable memory cells ‣ reality: has scary-good optimizations for common access patterns. highly optimized. ‣ you don’t think (often) about: ‣ cache locality ‣ TLB effects ‣ MMU ops scheduling Sunday, November 14, 2010
  • 12. bad abstraction? ia64 ‣ (at least initially) compilers couldn’t live up to it ‣ hardware promise was delegated to the compiler ‣ compilers failed to reliably produce sufficiently fast code ‣ abstraction was broken ‣ good for certain scientific computing domains Sunday, November 14, 2010
  • 13. a lens ‣ scaling issues occur when abstractions become leaky ‣ RDBMS fails to perform sophisticated queries on highly normalized data ‣ your GC thrashes after a certain allocation volume ‣ OS thread scheduling becomes unviable after N × 1000 threads are created Sunday, November 14, 2010
  • 14. ¶ threads ‣ threads offer a familiar and linear model of execution ‣ scheduling overhead becomes important after a certain amount of parallelism ‣ stack allocation can become troublesome ‣ fails to be explicit about latency, backpressure ‣ alternative: asynchronous programming ‣ makes queuing, latency explicit ‣ allows SEDA-style control ‣ a compromise? LWT Sunday, November 14, 2010
  • 15. ¶ sequence abstractions ‣ produces concise, beautiful, composable code trait Places extends Seq[Place] places.chunks(5000).map(_.toList).parForeach { chunk => … } ‣ access patterns aren’t propagated down the stack ‣ missed optimizations Sunday, November 14, 2010
  • 16. ¶ RDBMS ‣ are [by definition] generic ‣ encourage normalized data storage ‣ very powerful data model ‣ little need to know access patterns a priori ‣ provide general (magical) querying mechanics ‣ bag of tricks: query planning, table statistics, covering indices Sunday, November 14, 2010
  • 17. ¶ RDBMS ‣ at scale, the most viable strategy is: What You Serve Is What You Store (WYSIWYS) ‣ or at least very close ‣ this brings about a whole host of new problems ‣ data (in)consistency ‣ multiple indices ‣ “re-normalization” Sunday, November 14, 2010
  • 18. ¶ RDBMS ‣ at-scale, querying is highly predictable, most of the time: ‣ don’t need fancy query planning ‣ don’t need statistics ‣ in fact, we know a-priori how to efficiently query the underlying datastructures ‣ wish: don’t give me a query engine, give me primitives! ‣ maybe there’s a “just right” API here Sunday, November 14, 2010
  • 19. ¶ in-memory representations ‣ having tight control over representation is often crucial to resource utilization ‣ [space vs. time] memory bandwidth is precious, CPU is plentiful ‣ cache locality can often make an enormous difference — even to the point of less code is better than more efficient code(!) ‣ at odds with modern GC’d languages automatic memory management & layout Sunday, November 14, 2010
  • 20. ¶ in-memory representations ‣ optimize memory layout ‣ pack data ‣ compression ‣ varint, difference, zigzag, etc. ‣ L1:main memory latency ≈ 1:200 (!) ‣ example: geometry of Canada ~ jts normalized, vs. WKB ‣ wkb is ≈ 600 KB, JTS representation ≈ 2-3MB Sunday, November 14, 2010
  • 21. ¶ garbage collection ‣ we love garbage collection ‣ attempts to encode common patterns: generational hypothesis ‣ not always quite right ‣ the application almost always has some idea about object lifetime & semantics ‣ proposal: talk to each other! ‣ backpressure, thresholding, application-guided GC Sunday, November 14, 2010
  • 22. ¶ virtual memory “You’re Doing it Wrong” Poul-Henning Kamp, ACM Queue, June 2010 "… Varnish does not ignore the fact that memory is virtual; it actively exploits it” Sunday, November 14, 2010
  • 23. ¶ virtual memory ‣ maybe he is doing it wrong? ‣ varnish uses data structures designed to anticipate virtual memory layout & behavior ‣ translates application semantics (eg. LRU) ‣ instead, you could have direct control over those resources Sunday, November 14, 2010
  • 24. “just right” abstractions ‣ high level abstractions are absolutely necessary to deal with today’s complex systems ‣ but providing good abstractions is hard ‣ what are the “just right” abstractions? ‣ exploit common patterns ‣ give enough degrees of freedom to the underlying platform ‣ usually target a narrow(er) domain ‣ retain high level interfaces Sunday, November 14, 2010
  • 25. ¶ mapreduce def map(datum): words = {} for word in parse_words(datum): word[word] += 1 for (word, count) in words.items(): output(word, count) def reduce(key, values): output(key, mean(values)) ‣ much freedom is given to the scheduler ‣ exploits data locality (predictably) Sunday, November 14, 2010
  • 26. ¶ shared-nothing web apps def handle(request): return Response( “hello %s!” % request.get_user()) ‣ eg: google’s app engine, django, rails, etc Sunday, November 14, 2010
  • 27. ¶ bigtable ‣ very simple data model ‣ but composable — effectively every other database squeezes (more) sophisticated data models down to 1 dimensional storage(s) ‣ explicit memory hierarchy (pinning column families to memory) ‣ provides load balancer/scheduler much freedom ‣ only magic: compactions. challenge: resource isolation. Sunday, November 14, 2010
  • 28. ¶ LWT lwt ai = Lwt_lib.getaddrinfo "localhost" "8080" [Unix.AI_FAMILY Unix.PF_INET; Unix.AI_SOCKTYPE Unix.SOCK_STREAM] in lwt (input, output) = match ai with | [] -> fail Not_found | a :: _ -> Lwt_io.open_connection a.Unix.ai_addr in Lwt_io.write output "GET / HTTP/1.1rnrn" >> Lwt_io.read input Sunday, November 14, 2010
  • 29. theme ‣ provide a programming model that provide a narrow (but flexible) interface to resources ‣ mapreduce ‣ shared-nothing web apps ‣ provide a programming model that make resources explicit ‣ bigtable ‣ LWT Sunday, November 14, 2010
  • 30. meta pattern(s) ‣ addressing separation of concerns: ‣ (asynchronous) execution policy vs. (synchronous) application logic ‣ data locality vs. data operations ‣ data model vs. data distribution ‣ data locality vs. data model Sunday, November 14, 2010
  • 31. the future? ‣ database systems ‣ search systems ‣ ... or any online query system? ‣ some academic work already in this area: ‣ OPIS (distributed arrows w/ combinators) ‣ ypnos (grid compiler) ‣ skywriting (scripted dataflow) Sunday, November 14, 2010
  • 32. conclusions ‣ we need high level abstractions ‣ they are simply necessary ‣ allows us to develop faster and safer ‣ many high level abstractions aren’t “just right” ‣ can become highly inoptimal (often orders of magnitudes can be reclaimed) ‣ some systems do provide good compromises ‣ makes resources explicit ‣ the future is exciting! Sunday, November 14, 2010
  • 33. that’s it! ‣ follow me: @marius ‣ marius@twitter.com Sunday, November 14, 2010