Building Distributed Systems in Scala
A presentation to Emerging Technologies for the Enterprise
April 8, 2010 – Philadelphia, PA




About @al3x
‣   At Twitter since 2007
‣   Working on the Web
    since 1995
‣   Co-author of
    Programming Scala
    (O’Reilly, 2009)
‣   Into programming
    languages,
    distributed systems.
About Twitter
‣   Social messaging – a
    new way to
    communicate
‣   Launched in
    mid-2006
‣   Hit the mainstream in
    2008
‣   50+ million tweets per
    day (600+ per
    second)
‣   Millions of users
    worldwide
Technologies Used At Twitter
Languages                         Frameworks
‣   Ruby, JavaScript              ‣   Rails
‣   Scala                         ‣   jQuery
‣   lil’ bit of C, Python, Java


Data Storage                      Misc.
‣   MySQL                         ‣   memcached
‣   Cassandra                     ‣   ZooKeeper
‣   HBase (Hadoop)                ‣   Jetty
                                  ‣   so much more!
Why Scala?
‣   A language that’s both fun and productive.
‣   Great performance (on par with Java).
‣   Object-oriented and functional programming,
    together.
‣   Ability to reuse existing Java libraries.
‣   Flexible concurrency (Actors, threads, events).
‣   A smart community with infectious momentum.
Hawkwind
A case study in (re)building
a distributed system in Scala.
Requirements
‣   Search for people by name, username, eventually
    by other attributes.
‣   Order the results some sensible way (ex: by
    number of followers).
‣   Offer suggestions for misspellings/alternate names.
‣   Handle case-folding and other text normalization
    concerns on the query string.
‣   Return results in about a second, preferably less.
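The case-folding and normalization requirement can be sketched in a few lines. This is an illustration only, not Hawkwind's code: the `QueryNormalizer` object and its `normalize` method are hypothetical names, and the approach (Unicode NFD decomposition followed by stripping combining marks) is one common way to remove accents on the JVM.

```scala
import java.text.Normalizer
import java.util.Locale

object QueryNormalizer {
  // Matches the combining marks (accents) that NFD decomposition splits off.
  private val CombiningMarks = "\\p{M}+".r
  private val Whitespace = "\\s+".r

  /** Strips accents, collapses whitespace, and lower-cases a query string. */
  def normalize(query: String): String = {
    val decomposed = Normalizer.normalize(query, Normalizer.Form.NFD)
    val unaccented = CombiningMarks.replaceAllIn(decomposed, "")
    Whitespace.replaceAllIn(unaccented.trim, " ").toLowerCase(Locale.ROOT)
  }
}
```

Applying the same function to both the indexed attributes and the incoming query keeps the two sides comparable.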
Finding People on Twitter
[Screenshot: people-search results page, annotated with "speedy!", "suggestion", and "results"]
First Attempt: acts_as_solr
‣   Crunched on time, so we wanted the fastest
    route to working user search.
‣   Uses the Solr distribution/platform from Apache
    Lucene.
‣   Tries to make Rails integration straightforward
    and idiomatic.
‣   Easy to get running, hard to operationalize.
In the Interim: A Move to SOA
‣   Stopped thinking of our architecture as just a
    Rails app and the components that orbit it.
‣   Started building isolated services that
    communicate with the rest of the system via
    Thrift (an RPC and server framework).
‣   Allows us freedom to change the underlying
    implementation of services without modifying the
    rest of the system.
Thrift Example
   struct Results {
     1: list<i64> people
     2: string suggestion
     3: i32 processingTime /* milliseconds */
     4: list<i32> timings
     5: i32 totalResults
   }

   service NameSearch {
     Results find(1: string name, 2: i32 maxResults, 3: bool wantSuggestion)

     Results find_with_ranking(1: string name, 2: i32 maxResults,
                               3: bool wantSuggestion, 4: Ranker ranking)
   }
Second Attempt: Hawkwind 1
‣   A quick (three weeks) bespoke Scala project to
    “stop the bleeding”.
‣   Vertically but not horizontally scalable: no
    sharding, no failover, machine-level redundancy.
‣   Ran into memory and disk space limits.
‣   Reused Java code but didn’t offer nice Scala
    wrappers or rewrites.
‣   Still, planned to grow 10x, grew 25x!
Goals for Hawkwind 2
‣   Horizontally scalable: sharded corpus,
    replication of shards, easy to grow the service.
‣   Faster.
‣   Higher-quality results.
‣   Better use of Scala (language features,
    programming style).
‣   Maintainable code base, make it easy to add
    features.
High-Level Concepts
‣   Shards: pieces of the user corpus.
‣   Replicas: copies of shards.
‣   Document Servers.
‣   Merge Servers.
‣   Every machine gets the same code, can be
    either a Document Server or a Merge Server.
Hawkwind 2 High-Level Architecture

                                  Internet
                                     |
                      queries for users, API requests
                                     |
                               Rails Cluster
                                     |
                 Thrift call to semi-random Merge Server
                                     |
                Merge Server    Merge Server    Merge Server
                                     |
            Thrift calls to semi-random replica of each shard
                                     |
   Shard 1      Shard 1      Shard 2      Shard 2      Shard 3      Shard 3
  Doc Server   Doc Server   Doc Server   Doc Server   Doc Server   Doc Server
                                     |
               periodic deliveries of sharded user corpus
                                     |
                              Hadoop (HBase)
Taking Care of Data
‣   A Hadoop job gathers up the user data and slices it
    into shards.
‣   A cron job fetches these data dumps several times
    per day.
‣   To load a new corpus on a Document Server, simply
    restart the process.
‣   Redundancy and staggered scheduling keep the
    system from running too hot while restarts are in
    progress.
What a Document Server does
‣   On startup, load Thrift serialized User objects.
‣   Populate an Inverted Index, Map, and Trie with
    normalized attributes of those User objects.
‣   Once ready, listen for queries.
‣   Answering a query basically means looking
    stuff up in those pre-populated data structures.
‣   Maintains a connection pool for Thrift requests,
    wrapping org.apache.commons.pool.
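The load-then-look-up pattern above can be sketched as follows. Everything here is an assumption made for illustration: the `User` fields, the `Trie` and `Index` classes, and the choice to index name tokens in the inverted index and usernames in the trie. The real Document Server loads Thrift-serialized objects and its data structures are not shown in the slides.

```scala
import scala.collection.mutable

object DocumentIndexSketch {
  // Hypothetical user record standing in for the Thrift-generated User.
  case class User(id: Long, username: String, name: String, followers: Int)

  // A minimal mutable trie: each node remembers the IDs of keys passing through it.
  final class Trie {
    private val children = mutable.Map.empty[Char, Trie]
    private val ids = mutable.Set.empty[Long]

    def insert(key: String, id: Long): Unit = {
      var node = this
      for (c <- key) {
        node = node.children.getOrElseUpdate(c, new Trie)
        node.ids += id
      }
    }

    def withPrefix(prefix: String): Set[Long] = {
      var node: Option[Trie] = Some(this)
      for (c <- prefix) node = node.flatMap(_.children.get(c))
      node.map(_.ids.toSet).getOrElse(Set.empty[Long])
    }
  }

  final class Index(users: Seq[User]) {
    private val byId: Map[Long, User] = users.map(u => u.id -> u).toMap
    private val inverted = mutable.Map.empty[String, mutable.Set[Long]]
    private val usernames = new Trie

    // Populate everything once, at startup.
    for (u <- users) {
      for (token <- u.name.toLowerCase.split("\\s+") if token.nonEmpty) {
        inverted.getOrElseUpdate(token, mutable.Set.empty[Long]) += u.id
      }
      usernames.insert(u.username.toLowerCase, u.id)
    }

    /** Users whose name has the query as a word, or whose username starts with it. */
    def find(query: String, maxResults: Int): Seq[User] = {
      val q = query.trim.toLowerCase
      val ids = inverted.getOrElse(q, mutable.Set.empty[Long]).toSet ++ usernames.withPrefix(q)
      ids.toSeq
        .flatMap(id => byId.get(id))
        .sortBy(u => (-u.followers, u.id)) // most-followed first, ID breaks ties
        .take(maxResults)
    }
  }
}
```

Because the index is built once and never mutated afterwards, queries need no locking; loading a new corpus means building a new `Index`, which matches the restart-to-reload approach described earlier.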
What a Merge Server does
‣   Gets queries.
‣   Fans out queries to Document Servers.
‣   Waits for queries to come back using a custom
    ParallelFuture class, which wraps a number of
    java.util.concurrent classes.
‣   Merges together the result sets, re-ranks them,
    and ships ‘em back to the requesting client.
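A rough sketch of the fan-out and merge step, using `java.util.concurrent` directly. The custom ParallelFuture class is not shown in the slides, so this is not its implementation; `MergeSketch`, `Hit`, and `fanOut` are hypothetical names, and each shard is modeled as a plain function rather than a Thrift client.

```scala
import java.util.concurrent.{Callable, Executors, TimeUnit}

object MergeSketch {
  // Hypothetical per-shard result: a user ID with a ranking score.
  case class Hit(userId: Long, score: Int)

  /** Queries every shard in parallel, then merges and re-ranks the hits. */
  def fanOut(shards: Seq[String => Seq[Hit]], query: String,
             maxResults: Int, timeoutMs: Long): Seq[Hit] = {
    // A real service would reuse one pool; creating it per call keeps the sketch short.
    val pool = Executors.newFixedThreadPool(math.max(1, shards.size))
    try {
      // Submit one task per shard so the calls run concurrently.
      val futures = shards.toList.map { shard =>
        pool.submit(new Callable[Seq[Hit]] {
          def call(): Seq[Hit] = shard(query)
        })
      }
      // Wait up to timeoutMs for each shard's answer (throws on timeout).
      val perShard = futures.map(_.get(timeoutMs, TimeUnit.MILLISECONDS))
      // Merge the result sets and re-rank: highest score first, ID breaks ties.
      perShard.flatten.sortBy(h => (-h.score, h.userId)).take(maxResults)
    } finally {
      pool.shutdown()
    }
  }
}
```

Each shard only needs to return its own top `maxResults` hits, since no hit below that cutoff on a shard can make the merged top `maxResults`.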
How to model a distributed system?
‣   Literal decomposition: classes for all
    architectural components (Shard, Replica, etc.).
‣   Each component knows/does as little as
    possible.
‣   Isolate mutable state, test carefully.
‣   Cleanly delegate calls.
Literal Decomposition: Replica
case class Replica(shard: Shard, server: Server) {
  private val log = Logger.get
  val BACKEND_ERROR = Stats.getCounter("backend_timeout")

  // Time the whole call, then delegate to the server's Thrift client.
  def query(q: Query): DocResults = w3c.time("replica-query") {
    server.thriftCall { client =>
      // logic goes here
    }
  }

  def ping(): Boolean = server.thriftCall { client =>
    log.debug("calling ping via thrift for %s", server)
    val rv = client.ping()
    log.debug("ping returned %s from %s", rv, server)
    rv
  }
}
Literal Decomposition: Server
case class Server(hostname: String, port: Int) {
  val pool = ConnectionPool(hostname, port)
  private val log = Logger.get

  // Borrow a pooled client and run f against it.
  def thriftCall[A](f: Client => A) = {
    log.debug("making thriftCall for server %s", this)
    pool.withClient { client => f(client) }
  }

  def replica: Replica = {
    Replica(ShardMap.serversToShards(this), this)
  }
}
Hawkwind 2 Query Call Graph

    MergeLayer.query
      -> ShardMap.query                     (what's this?)
      -> shard.replicaManager ! query
      -> shard.query
      -> randomReplica()
      -> replica.query
      -> server.thriftCall
      -> NameSearchDocumentLayerClient.find
ShardMap: Isolating Mutable State
‣   A singleton and an Actor.
‣   Contains a map from Servers to their
    corresponding Shards.
‣   Also contains a map from Shards to the Replicas
    of those shards.
‣   Responsible for populating and managing
    those maps.
‣   Send it a message to evict or reinsert a Replica.
‣   Fans out queries to Shards.
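One way to picture "isolating mutable state" is a map that only one thread ever touches. The sketch below is a stand-in, not the original: the slides describe a Scala Actor, while this uses a single-threaded executor as the mailbox and blocks for each reply. `ShardMapSketch`, its `Replica` shape, and the method names `evict`, `reinsert`, and `randomReplicas` are all hypothetical.

```scala
import java.util.concurrent.{Callable, Executors, ThreadFactory, TimeUnit}
import scala.util.Random

object ShardMapSketch {
  case class Replica(shardId: Int, host: String)

  final class ShardMap(initial: Seq[Replica], rng: Random = new Random) {
    // One thread owns the mutable map, so updates and reads never interleave.
    private val mailbox = Executors.newSingleThreadExecutor(new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r, "shard-map")
        t.setDaemon(true) // do not keep the JVM alive for this sketch
        t
      }
    })
    private var live: Map[Int, Vector[Replica]] = initial.toVector.groupBy(_.shardId)

    // Runs body on the mailbox thread and waits for its result.
    private def ask[A](body: => A): A =
      mailbox.submit(new Callable[A] { def call(): A = body }).get(1, TimeUnit.SECONDS)

    def evict(r: Replica): Unit = ask {
      live = live.updated(r.shardId, live.getOrElse(r.shardId, Vector.empty).filterNot(_ == r))
    }

    def reinsert(r: Replica): Unit = ask {
      val current = live.getOrElse(r.shardId, Vector.empty)
      if (!current.contains(r)) live = live.updated(r.shardId, current :+ r)
    }

    /** Picks one live replica per shard at random; shards with none are skipped. */
    def randomReplicas(): Seq[Replica] = ask {
      live.toSeq.sortBy(_._1).collect {
        case (_, replicas) if replicas.nonEmpty => replicas(rng.nextInt(replicas.size))
      }
    }

    def shutdown(): Unit = mailbox.shutdown()
  }
}
```

Callers never see the map itself, only messages in and snapshots out, which is what keeps the rest of the code free of locks.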
ReplicaHealthChecker
‣   Much like the ShardMap, a singleton and an
    Actor.
‣   Maintains mutable lists of unhealthy Replicas
    (“the penalty box”).
‣   Constantly checking to see if evicted Replicas
    are healthy again (back online).
‣   Sends messages to itself – an effective Actor
    technique.
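The "sends messages to itself" technique can be shown with a deliberately simplified, single-threaded mailbox. This is an assumption-laden sketch rather than the original Actor: `HealthCheckSketch`, the `Evicted` and `Recheck` messages, and the `ping` and `onRecovered` callbacks are hypothetical, and messages are processed by calling `step()` explicitly so the behavior is easy to follow.

```scala
import scala.collection.mutable

object HealthCheckSketch {
  sealed trait Message
  case class Evicted(replica: String) extends Message
  case object Recheck extends Message

  // ping probes a replica; onRecovered would ask the ShardMap to reinsert it.
  final class HealthChecker(ping: String => Boolean, onRecovered: String => Unit) {
    private val mailbox = mutable.Queue.empty[Message]
    private val penaltyBox = mutable.LinkedHashSet.empty[String]

    def !(msg: Message): Unit = mailbox.enqueue(msg)

    def penalized: Set[String] = penaltyBox.toSet

    /** Handles one queued message; returns false when the mailbox is empty. */
    def step(): Boolean = {
      if (mailbox.isEmpty) false
      else {
        mailbox.dequeue() match {
          case Evicted(replica) =>
            val wasIdle = penaltyBox.isEmpty
            penaltyBox += replica
            // Start the recheck loop only if one is not already pending.
            if (wasIdle) this ! Recheck
          case Recheck =>
            val recovered = penaltyBox.filter(ping).toList
            recovered.foreach { r => penaltyBox -= r; onRecovered(r) }
            // Message to self: keep checking while anything is still penalized.
            if (penaltyBox.nonEmpty) this ! Recheck
        }
        true
      }
    }
  }
}
```

A production version would add a delay between rechecks and run the mailbox on its own thread; the self-addressed `Recheck` message is the part that carries over.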
Challenges, Large and Small
‣   Fast importing of huge serialized Thrift object
    dumps.
‣   Testing the ShardMap and ReplicaHealthChecker
    (mutable state wants to hurt you).
‣   Efficient accent normalization and filtering for
    special characters.
‣   Working with the Apache Commons object pool.
‣   Breaking out different ranking mechanisms in a
    clean, reusable way.
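For the ranking challenge, one possible shape is to treat each ranking mechanism as an `Ordering` and compose them. This is only a sketch of that idea: the `Candidate` fields and the `ByFollowers` and `ExactMatchFirst` rankers are invented for illustration, and the slides do not say how Hawkwind's rankers are actually structured.

```scala
object RankingSketch {
  // Hypothetical candidate shape; field names are illustrative.
  case class Candidate(userId: Long, followers: Int, exactMatch: Boolean)

  /** A ranking mechanism exposed as an Ordering, so rankers can be swapped or combined. */
  trait Ranker { def ordering: Ordering[Candidate] }

  object ByFollowers extends Ranker {
    val ordering: Ordering[Candidate] = Ordering.by((c: Candidate) => -c.followers)
  }

  object ExactMatchFirst extends Ranker {
    // false sorts before true, so negate to put exact matches first.
    val ordering: Ordering[Candidate] = Ordering.by((c: Candidate) => !c.exactMatch)
  }

  /** Applies rankers in priority order: later rankers only break ties. */
  def rank(candidates: Seq[Candidate], rankers: Ranker*): Seq[Candidate] = {
    val combined = new Ordering[Candidate] {
      def compare(a: Candidate, b: Candidate): Int =
        rankers.iterator.map(_.ordering.compare(a, b)).find(_ != 0).getOrElse(0)
    }
    candidates.sorted(combined)
  }
}
```

Adding a new ranking mechanism then means adding one small object, without touching the merge code that calls `rank`.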
Libraries & Tools
Things that make working in Scala
way more productive.
sbt – the Simple Build Tool
‣   Scala’s answer to Ant and Maven.
‣   Sets up new projects.
‣   Maintains project configuration, build tasks,
    and dependencies in pure Scala. Totally
    open-ended.
‣   Interactive console.
‣   Will run tasks as soon as files in your project
    change – automatically compile and run tests!
Ostrich
‣   Gather statistics about your application.
‣   Counters, gauges, and timings.
‣   Share stats via JMX, a plain-text socket, a web
    interface, or log files.
‣   Ex:
          Stats.time("foo") {
            timeConsumingOperation()
          }
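To show why a call like `Stats.time("foo") { ... }` reads so naturally in Scala, here is a small stand-alone timer built on a by-name parameter. It is not Ostrich's implementation; `TimingSketch`, `time`, and `count` are hypothetical names used only for this illustration.

```scala
import scala.collection.mutable

object TimingSketch {
  // Recorded durations in milliseconds, keyed by timer name.
  private val timings = mutable.Map.empty[String, mutable.ArrayBuffer[Long]]

  /** Runs the block, records how long it took under `name`, and returns its result. */
  def time[A](name: String)(block: => A): A = {
    val start = System.nanoTime()
    try block
    finally {
      val elapsedMs = (System.nanoTime() - start) / 1000000L
      timings.synchronized {
        timings.getOrElseUpdate(name, mutable.ArrayBuffer.empty[Long]) += elapsedMs
      }
    }
  }

  /** Number of timings recorded so far under `name`. */
  def count(name: String): Int =
    timings.synchronized { timings.get(name).map(_.size).getOrElse(0) }
}
```

The second parameter list takes the block by name, so the caller's code runs inside `time` and the timing is recorded even if the block throws.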
Configgy
‣   Manages configuration files and logging.
‣   Flexible file format, can include files in other files.
‣   Inheritance, variable substitution.
‣   Tunable logging, logging with Scribe.
‣   Subscription API: push and validate
    configuration changes to running processes.
‣   Ex:
      val foo = config.getString("foo")
Specs + xrayspecs
 ‣   A behavior-driven development (BDD) testing
     framework for Scala.
 ‣   Elegant, readable, fun-to-write tests.
 ‣   Support for several mocking frameworks (we
     like Mockito).
 ‣   Test concurrent operations, time, much more.
 ‣   Ex:
"suggestion with a List of null does not blow up" in {
  MergeLayer.suggestion("steve", List(null)) mustEqual None
}
Questions?
Follow me at twitter.com/al3x

Learn with us at engineering.twitter.com
Work with us at jobs.twitter.com





Speedy, Stable, and Secure: Better Web Apps Through Functional LanguagesAlex Payne
 
Mind The Tools
Mind The ToolsMind The Tools
Mind The ToolsAlex Payne
 
Strange Loop 2009 Keynote: Minimalism in Computing
Strange Loop 2009 Keynote: Minimalism in ComputingStrange Loop 2009 Keynote: Minimalism in Computing
Strange Loop 2009 Keynote: Minimalism in ComputingAlex Payne
 
The Business Value of Twitter
The Business Value of TwitterThe Business Value of Twitter
The Business Value of TwitterAlex Payne
 
Twitter API 2.0
Twitter API 2.0Twitter API 2.0
Twitter API 2.0Alex Payne
 
The Interaction Design Of APIs
The Interaction Design Of APIsThe Interaction Design Of APIs
The Interaction Design Of APIsAlex Payne
 
Why Scala for Web 2.0?
Why Scala for Web 2.0?Why Scala for Web 2.0?
Why Scala for Web 2.0?Alex Payne
 
The Twitter API: A Presentation to Adobe
The Twitter API: A Presentation to AdobeThe Twitter API: A Presentation to Adobe
The Twitter API: A Presentation to AdobeAlex Payne
 
Protecting Public Hotspots
Protecting Public HotspotsProtecting Public Hotspots
Protecting Public HotspotsAlex Payne
 
Twitter at BarCamp 2008
Twitter at BarCamp 2008Twitter at BarCamp 2008
Twitter at BarCamp 2008Alex Payne
 
Securing Rails
Securing RailsSecuring Rails
Securing RailsAlex Payne
 
Designing Your API
Designing Your APIDesigning Your API
Designing Your APIAlex Payne
 
Scaling Twitter - Railsconf 2007
Scaling Twitter - Railsconf 2007Scaling Twitter - Railsconf 2007
Scaling Twitter - Railsconf 2007Alex Payne
 

Más de Alex Payne (17)

Splitting up your web app
Splitting up your web appSplitting up your web app
Splitting up your web app
 
The perils and rewards of working on stuff that matters
The perils and rewards of working on stuff that mattersThe perils and rewards of working on stuff that matters
The perils and rewards of working on stuff that matters
 
Emerging Languages: A Tour of the Horizon
Emerging Languages: A Tour of the HorizonEmerging Languages: A Tour of the Horizon
Emerging Languages: A Tour of the Horizon
 
Speedy, Stable, and Secure: Better Web Apps Through Functional Languages
Speedy, Stable, and Secure: Better Web Apps Through Functional LanguagesSpeedy, Stable, and Secure: Better Web Apps Through Functional Languages
Speedy, Stable, and Secure: Better Web Apps Through Functional Languages
 
Mind The Tools
Mind The ToolsMind The Tools
Mind The Tools
 
Strange Loop 2009 Keynote: Minimalism in Computing
Strange Loop 2009 Keynote: Minimalism in ComputingStrange Loop 2009 Keynote: Minimalism in Computing
Strange Loop 2009 Keynote: Minimalism in Computing
 
The Business Value of Twitter
The Business Value of TwitterThe Business Value of Twitter
The Business Value of Twitter
 
Twitter API 2.0
Twitter API 2.0Twitter API 2.0
Twitter API 2.0
 
The Interaction Design Of APIs
The Interaction Design Of APIsThe Interaction Design Of APIs
The Interaction Design Of APIs
 
Why Scala for Web 2.0?
Why Scala for Web 2.0?Why Scala for Web 2.0?
Why Scala for Web 2.0?
 
The Twitter API: A Presentation to Adobe
The Twitter API: A Presentation to AdobeThe Twitter API: A Presentation to Adobe
The Twitter API: A Presentation to Adobe
 
Protecting Public Hotspots
Protecting Public HotspotsProtecting Public Hotspots
Protecting Public Hotspots
 
Twitter at BarCamp 2008
Twitter at BarCamp 2008Twitter at BarCamp 2008
Twitter at BarCamp 2008
 
Securing Rails
Securing RailsSecuring Rails
Securing Rails
 
Why Scala?
Why Scala?Why Scala?
Why Scala?
 
Designing Your API
Designing Your APIDesigning Your API
Designing Your API
 
Scaling Twitter - Railsconf 2007
Scaling Twitter - Railsconf 2007Scaling Twitter - Railsconf 2007
Scaling Twitter - Railsconf 2007
 

Último

IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
Introduction to Quantum Computing
Introduction to Quantum ComputingIntroduction to Quantum Computing
Introduction to Quantum ComputingGDSC PJATK
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncObject Automation
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?SANGHEE SHIN
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 

Último (20)

IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
Introduction to Quantum Computing
Introduction to Quantum ComputingIntroduction to Quantum Computing
Introduction to Quantum Computing
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation Inc
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 

Building Distributed Systems in Scala

  • 1. Building Distributed Systems in Scala A presentation to Emerging Technologies for the Enterprise April 8, 2010 – Philadelphia, PA
  • 2. About @al3x ‣ At Twitter since 2007 ‣ Working on the Web since 1995 ‣ Co-author of Programming Scala (O’Reilly, 2009) ‣ Into programming languages, distributed systems.
  • 3. About Twitter ‣ Social messaging – a new way to communicate ‣ Launched in mid-2006 ‣ Hit the mainstream in 2008 ‣ 50+ million tweets per day (600+ per second) ‣ Millions of users worldwide
  • 4. Technologies Used At Twitter Languages Frameworks ‣ Ruby, JavaScript ‣ Rails ‣ Scala ‣ jQuery ‣ lil’ bit of C, Python, Java Data Storage Misc. ‣ MySQL ‣ memcached ‣ Cassandra ‣ ZooKeeper ‣ HBase (Hadoop) ‣ Jetty ‣ so much more!
  • 5. Why Scala? ‣ A language that’s both fun and productive. ‣ Great performance (on par with Java). ‣ Object-oriented and functional programming, together. ‣ Ability to reuse existing Java libraries. ‣ Flexible concurrency (Actors, threads, events). ‣ A smart community with infectious momentum.
  • 6. Hawkwind A case study in (re)building a distributed system in Scala.
  • 7. Requirements ‣ Search for people by name, username, eventually by other attributes. ‣ Order the results some sensible way (ex: by number of followers). ‣ Offer suggestions for misspellings/alternate names. ‣ Handle case-folding and other text normalization concerns on the query string. ‣ Return results in about a second, preferably less.
  • 9. Finding People on Twitter [image; callout: results]
  • 10. Finding People on Twitter [image; callouts: suggestion, results]
  • 11. Finding People on Twitter [image; callouts: speedy!, suggestion, results]
  • 12. First Attempt: acts_as_solr ‣ Crunched on time, so we wanted the fastest route to working user search. ‣ Uses the Solr distribution/platform from Apache Lucene. ‣ Tries to make Rails integration straightforward and idiomatic. ‣ Easy to get running, hard to operationalize.
  • 13. In the Interim: A Move to SOA ‣ Stopped thinking of our architecture as just a Rails app and the components that orbit it. ‣ Started building isolated services that communicate with the rest of the system via Thrift (an RPC and server framework). ‣ Allows us freedom to change the underlying implementation of services without modifying the rest of the system.
  • 14. Thrift Example
        struct Results {
          1: list<i64> people
          2: string suggestion
          3: i32 processingTime /* milliseconds */
          4: list<i32> timings
          5: i32 totalResults
        }

        service NameSearch {
          Results find(1: string name, 2: i32 maxResults, 3: bool wantSuggestion)
          Results find_with_ranking(1: string name, 2: i32 maxResults, 3: bool wantSuggestion, 4: Ranker ranking)
        }
  • 15. Second Attempt: Hawkwind 1 ‣ A quick (three weeks) bespoke Scala project to “stop the bleeding”. ‣ Vertically but not horizontally scalable: no sharding, no failover, machine-level redundancy. ‣ Ran into memory and disk space limits. ‣ Reused Java code but didn’t offer nice Scala wrappers or rewrites. ‣ Still, planned to grow 10x, grew 25x!
  • 16. Goals for Hawkwind 2 ‣ Horizontally scalable: sharded corpus, replication of shards, easy to grow the service. ‣ Faster. ‣ Higher-quality results. ‣ Better use of Scala (language features, programming style). ‣ Maintainable code base, make it easy to add features.
  • 17. High-Level Concepts ‣ Shards: pieces of the user corpus. ‣ Replicas: copies of shards. ‣ Document Servers. ‣ Merge Servers. ‣ Every machine gets the same code, can be either a Document Server or a Merge Server.
  • 18. Hawkwind 2 High-Level Architecture
        Internet: queries for users, API requests
        Rails Cluster
          (Thrift call to semi-random Merge Server)
        Merge Servers (three shown)
          (Thrift calls to semi-random replica of each shard)
        Doc Servers: two each for Shard 1, Shard 2, Shard 3
          (periodic deliveries of sharded user corpus)
        Hadoop (HBase)
  • 19. Taking Care of Data ‣ A Hadoop job gathers up the user data and slices it into shards. ‣ A cron job fetches these data dumps several times per day. ‣ To load a new corpus on a Document Server, simply restart the process. ‣ Redundancy and staggered scheduling keep the system from running too hot while restarts are in progress.
  • 20. What a Document Server does ‣ On startup, load Thrift-serialized User objects. ‣ Populate an Inverted Index, Map, and Trie with normalized attributes of those User objects. ‣ Once ready, listen for queries. ‣ Answering a query basically means looking stuff up in those pre-populated data structures. ‣ Maintains a connection pool for Thrift requests, wrapping org.apache.commons.pool.
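The startup-then-lookup flow on the Document Server slide can be sketched roughly as follows. This is a minimal illustration, not Hawkwind's code: `User`, `TrieNode`, and `NameIndex` are hypothetical names, the `User` case class stands in for the Thrift-generated type, and ranking by follower count is borrowed from the requirements slide as an assumption.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the Thrift-generated User type.
final case class User(id: Long, name: String, username: String, followers: Int)

// A small trie node: children keyed by character, plus the IDs of every
// user that has a token passing through this node.
final class TrieNode {
  val children = mutable.Map.empty[Char, TrieNode]
  val ids = mutable.Set.empty[Long]
}

// Built once at startup; queries afterwards only read these structures.
final class NameIndex(users: Seq[User]) {
  private val byId: Map[Long, User] = users.map(u => u.id -> u).toMap
  private val inverted = mutable.Map.empty[String, Set[Long]]
  private val root = new TrieNode

  // Tokens are the lowercased words of the name plus the username.
  private def tokens(u: User): Seq[String] =
    (u.name.split("\\s+").toSeq :+ u.username).map(_.toLowerCase).filter(_.nonEmpty)

  // Populate the inverted index (token -> IDs) and the trie (prefix -> IDs).
  for (u <- users; t <- tokens(u)) {
    inverted(t) = inverted.getOrElse(t, Set.empty[Long]) + u.id
    var node = root
    for (c <- t) {
      node = node.children.getOrElseUpdate(c, new TrieNode)
      node.ids += u.id
    }
  }

  // Assumed ranking: most followers first.
  private def ranked(ids: Iterable[Long], max: Int): List[Long] =
    ids.toList.sortBy(id => -byId(id).followers).take(max)

  def exact(term: String, max: Int): List[Long] =
    ranked(inverted.getOrElse(term.toLowerCase, Set.empty[Long]), max)

  def prefix(p: String, max: Int): List[Long] = {
    var node: Option[TrieNode] = Some(root)
    for (c <- p.toLowerCase) node = node.flatMap(_.children.get(c))
    ranked(node.map(_.ids.toSet).getOrElse(Set.empty[Long]), max)
  }
}
```

Because everything is populated in the constructor, loading a new corpus means building a new `NameIndex`, which matches the slide's "simply restart the process" approach.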
  • 21. What a Merge Server does ‣ Gets queries. ‣ Fans out queries to Document Servers. ‣ Waits for queries to come back using a custom ParallelFuture class, which wraps a number of java.util.concurrent classes. ‣ Merges together the result sets, re-ranks them, and ships ‘em back to the requesting client.
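The fan-out, wait, merge, and re-rank steps can be sketched as below. Note the substitution: the talk describes a custom ParallelFuture class over java.util.concurrent, which is not shown in the slides, so this sketch uses the standard library's `scala.concurrent.Future` instead. `Hit`, `MergeSketch`, and the function-typed `shards` parameter are hypothetical; each function stands in for a Thrift call to one replica of a shard.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.FiniteDuration
import scala.util.Try

// Hypothetical per-shard result: a user ID plus the value used for re-ranking.
final case class Hit(userId: Long, followers: Int)

object MergeSketch {
  // `shards` stands in for Thrift calls, one function per shard.
  def query(shards: Seq[String => Seq[Hit]], q: String, max: Int, timeout: FiniteDuration)
           (implicit ec: ExecutionContext): Seq[Hit] = {
    // Fan out: one Future per shard. A shard that throws contributes no hits.
    val calls = shards.map { shard =>
      Future(shard(q)).recover { case _: Exception => Seq.empty[Hit] }
    }
    // Wait for every shard; if the overall wait times out, return nothing.
    val perShard = Try(Await.result(Future.sequence(calls), timeout))
      .getOrElse(Seq.empty[Seq[Hit]])
    // Merge and re-rank: most followers first, user ID as a tie-breaker.
    perShard.flatten.sortBy(h => (-h.followers, h.userId)).take(max)
  }
}
```

Returning an empty result on a global timeout is a simplification; a production merge layer would more likely keep whatever shards answered in time.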
  • 22. How to model a distributed system? ‣ Literal decomposition: classes for all architectural components (Shard, Replica, etc.). ‣ Each component knows/does as little as possible. ‣ Isolate mutable state, test carefully. ‣ Cleanly delegate calls.
  • 23. Literal Decomposition: Replica
        case class Replica(val shard: Shard, val server: Server) {
          private val log = Logger.get
          val BACKEND_ERROR = Stats.getCounter("backend_timeout")

          def query(q: Query): DocResults = w3c.time("replica-query") {
            server.thriftCall { client =>
              // logic goes here
            }
          }

          def ping(): Boolean = server.thriftCall { client =>
            log.debug("calling ping via thrift for %s", server)
            val rv = client.ping()
            log.debug("ping returned %s from %s", rv, server)
            rv
          }
        }
  • 24. Literal Decomposition: Server
        case class Server(val hostname: String, val port: Int) {
          val pool = ConnectionPool(hostname, port)
          private val log = Logger.get

          def thriftCall[A](f: Client => A) = {
            log.debug("making thriftCall for server %s", this)
            pool.withClient { client => f(client) }
          }

          def replica: Replica = {
            Replica(ShardMap.serversToShards(this), this)
          }
        }
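The `pool.withClient { client => f(client) }` call on the Server slide is an instance of the loan pattern: borrow a resource, run the caller's function, always give the resource back. The slides do not show ConnectionPool itself (the talk says it wraps org.apache.commons.pool), so the following is only a sketch of the idea with a hypothetical `SimplePool` built on `java.util.concurrent.LinkedBlockingQueue`.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical minimal pool, not the talk's ConnectionPool.
final class SimplePool[C](create: () => C, size: Int) {
  private val idle = new LinkedBlockingQueue[C]()
  (1 to size).foreach(_ => idle.put(create()))

  // Loan pattern: borrow a client, run `f`, and return the client
  // to the pool even if `f` throws.
  def withClient[A](f: C => A): A = {
    val client = idle.take() // blocks until a client is free
    try f(client) finally idle.put(client)
  }

  // Number of clients currently not on loan.
  def available: Int = idle.size
}
```

Callers never hold a client outside the function they pass in, which is what lets `thriftCall[A]` stay a one-liner that is generic in its result type.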
  • 25. Hawkwind 2 Query Call Graph
        MergeLayer.query
        ShardMap.query
        shard.replicaManager ! query
        shard.query
        randomReplica()
        replica.query
        server.thriftCall
        NameSearchDocumentLayerClient.find
  • 26. Hawkwind 2 Query Call Graph
        MergeLayer.query
        (callout: what’s this?)
        ShardMap.query
        shard.replicaManager ! query
        shard.query
        randomReplica()
        replica.query
        server.thriftCall
        NameSearchDocumentLayerClient.find
  • 27. ShardMap: Isolating Mutable State ‣ A singleton and an Actor. ‣ Contains a map from Servers to their corresponding Shards. ‣ Also contains a map from Shards to the Replicas of those shards. ‣ Responsible for populating and managing those maps. ‣ Send it a message to evict or reinsert a Replica. ‣ Fans out queries to Shards.
  • 28. ReplicaHealthChecker ‣ Much like the ShardMap, a singleton and an Actor. ‣ Maintains mutable lists of unhealthy Replicas (“the penalty box”). ‣ Constantly checking to see if evicted Replicas are healthy again (back online). ‣ Sends messages to itself – an effective Actor technique.
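The "penalty box" idea can be illustrated without Actors. The talk's ReplicaHealthChecker is an Actor that sends messages to itself; since its code is not in the slides, the sketch below swaps in a plain synchronized class whose `recheck()` would be called on a timer. `ReplicaRef` and `PenaltyBox` are hypothetical names, and `ping` stands in for the Thrift ping call shown on the Replica slide.

```scala
import scala.util.Try

// Hypothetical replica handle; `ping` stands in for the Thrift ping call.
final case class ReplicaRef(name: String, ping: () => Boolean)

// A synchronized class used here in place of an Actor, to keep the
// sketch dependency-free. All mutable state lives in one private field.
final class PenaltyBox(all: Set[ReplicaRef]) {
  private var unhealthy = Set.empty[ReplicaRef]

  // Put a replica in the penalty box (for example after a failed query).
  def evict(r: ReplicaRef): Unit = synchronized { unhealthy += r }

  // Replicas that queries may currently be routed to.
  def healthy: Set[ReplicaRef] = synchronized { all -- unhealthy }

  // One health-check pass: re-admit any evicted replica that answers ping.
  // A ping that throws counts as still unhealthy.
  def recheck(): Unit = synchronized {
    unhealthy = unhealthy.filterNot(r => Try(r.ping()).getOrElse(false))
  }
}
```

Keeping the set private and exposing only `evict`, `healthy`, and `recheck` mirrors the slide's point about isolating mutable state in one place.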
  • 29. Challenges, Large and Small ‣ Fast importing of huge serialized Thrift object dumps. ‣ Testing the ShardMap and ReplicaHealthChecker (mutable state wants to hurt you). ‣ Efficient accent normalization and filtering for special characters. ‣ Working with the Apache Commons object pool. ‣ Breaking out different ranking mechanisms in a clean, reusable way.
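For the accent normalization and special-character filtering item, one common JVM approach is Unicode decomposition via `java.text.Normalizer`. The slides do not say how Hawkwind does it, so treat this as an assumed technique; `QueryNormalizer` and its exact filtering rules are hypothetical.

```scala
import java.text.Normalizer
import java.util.Locale

// Hypothetical helper: folds case, strips accents, and replaces
// anything other than letters, digits, and spaces with a space.
object QueryNormalizer {
  def normalize(raw: String): String = {
    // NFD splits an accented letter into a base letter plus combining marks.
    val decomposed = Normalizer.normalize(raw, Normalizer.Form.NFD)
    decomposed
      .replaceAll("\\p{M}+", "")              // drop combining marks (accents)
      .toLowerCase(Locale.ROOT)               // case-fold independent of default locale
      .replaceAll("[^\\p{L}\\p{Nd} ]+", " ")  // special characters become spaces
      .trim
      .replaceAll(" +", " ")                  // collapse repeated spaces
  }
}
```

Applying the same function to both the indexed attributes and the incoming query string keeps lookups consistent. The regex passes are simple rather than fast; the slide's "efficient" suggests the real implementation needed something tighter.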
  • 30. Libraries & Tools Things that make working in Scala way more productive.
  • 31. sbt – the Simple Build Tool ‣ Scala’s answer to Ant and Maven. ‣ Sets up new projects. ‣ Maintains project configuration, build tasks, and dependencies in pure Scala. Totally open-ended. ‣ Interactive console. ‣ Will run tasks as soon as files in your project change – automatically compile and run tests!
  • 32. Ostrich ‣ Gather statistics about your application. ‣ Counters, gauges, and timings. ‣ Share stats via JMX, a plain-text socket, a web interface, or log files. ‣ Ex: Stats.time("foo") { timeConsumingOperation() }
  • 33. Configgy ‣ Manages configuration files and logging. ‣ Flexible file format, can include files in other files. ‣ Inheritance, variable substitution. ‣ Tunable logging, logging with Scribe. ‣ Subscription API: push and validate configuration changes to running processes. ‣ Ex: val foo = config.getString("foo")
  • 34. Specs + xrayspecs ‣ A behavior-driven development (BDD) testing framework for Scala. ‣ Elegant, readable, fun-to-write tests. ‣ Support for several mocking frameworks (we like Mockito). ‣ Test concurrent operations, time, much more. ‣ Ex: "suggestion with a List of null does not blow up" in { MergeLayer.suggestion("steve", List(null)) mustEqual None }
  • 35. Questions? Follow me at twitter.com/al3x Learn with us at engineering.twitter.com Work with us at jobs.twitter.com

Editor's Notes

  1. This is literally all there is to this class!