Beyond Batch
   Drill & Storm

   Brad Anderson


©MapR Technologies
whoami
• Brad Anderson

• Solutions Architect at MapR (Atlanta)

• ATLHUG co-chair

• 'boorad' most places (Twitter, GitHub)

• banderson@maprtech.com
MapR - Faster and More Scalable

Benchmark                        MapR 2.1.1       CDH 4.1.1        MapR Speed Increase

Terasort (1x replication, compression disabled)
  Total                          13m 35s          26m 6s           1.9x
  Map                            7m 58s           21m 8s           2.7x
  Reduce                         13m 32s          23m 37s          1.7x

DFSIO throughput/node
  Read                           1003 MB/s        656 MB/s         1.5x
  Write                          924 MB/s         654 MB/s         1.4x

YCSB (50% read, 50% update)
  Throughput                     36,584.4 op/s    12,500.5 op/s    2.9x
  Runtime                        3.80 hr          11.11 hr         2.9x

YCSB (95% read, 5% update)
  Throughput                     24,704.3 op/s    10,776.4 op/s    2.3x
  Runtime                        0.56 hr          1.29 hr          2.3x

Benchmark hardware configuration:
10 servers, 12 x 2 cores (2.4 GHz), 12 x 2TB, 48 GB, 1 x 10GbE

            MapR/Google    Apache Hadoop
Time        54 sec         62 sec
Nodes       1,003          1,460
Disks       1,003          5,840
Cores       4,012          11,680
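The "MapR Speed Increase" column appears to be a plain ratio of the two measurements. The sketch below recomputes three of the cells under that assumption: CDH time divided by MapR time for elapsed times, and MapR rate divided by CDH rate for throughput. The class and method names are illustrative and come from neither vendor.

```java
import java.util.Locale;

// Recomputes a few "MapR Speed Increase" cells from the benchmark table.
// Assumption: the column is a simple ratio rounded to one decimal place.
final class SpeedupCheck {
    static int seconds(int minutes, int secs) {
        return minutes * 60 + secs;
    }

    // Elapsed time, lower is better: speedup = CDH time / MapR time.
    static double timeSpeedup(double maprTime, double cdhTime) {
        return cdhTime / maprTime;
    }

    // Throughput, higher is better: speedup = MapR rate / CDH rate.
    static double throughputSpeedup(double maprRate, double cdhRate) {
        return maprRate / cdhRate;
    }

    static String format(double speedup) {
        return String.format(Locale.ROOT, "%.1fx", speedup);
    }

    public static void main(String[] args) {
        System.out.println("Terasort total: "
                + format(timeSpeedup(seconds(13, 35), seconds(26, 6))));
        System.out.println("DFSIO read:     "
                + format(throughputSpeedup(1003, 656)));
        System.out.println("YCSB 50/50:     "
                + format(throughputSpeedup(36584.4, 12500.5)));
    }
}
```

Under this assumption the three values come out as 1.9x, 1.5x and 2.9x, matching the table.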
Beyond Batch
                HBase & M7

                Apache Drill

                Storm

                Solr & Elastic Search

Latency Matters

          Batch      Interactive   Streaming




Big Data Picture
                        Batch processing    Interactive analysis     Stream processing

Query runtime           Minutes to hours   Milliseconds to minutes     Never-ending

Data volume                TBs to PBs           GBs to PBs           Continuous stream

Programming model         MapReduce               Queries                  DAG

Users                      Developers      Analysts and Developers      Developers

Google project            MapReduce               Dremel

Open source project    Hadoop MapReduce         Apache Drill             Storm, S4
Interactive SQL Initiatives for Hadoop
[Figure: project logos grouped under these labels]
• SQL based OLTP
• SQL based analytics
• Real-time interactive queries
• Impala*: real-time interactive queries
• SQL conversion to MapReduce
* Does not work with other distributions
Google Dremel
• Interactive analysis of large-scale datasets
      • Trillion records at interactive speeds
      • Complementary to MapReduce
      • Used by thousands of Google employees
      • Paper published at VLDB 2010
• Model
      • Nested data model with schema
           • Most data at Google is stored/transferred in Protocol Buffers
      • SQL-like query language with nested data support
• Implementation
      • Column-based storage and processing
      • In-situ data access (GFS and Bigtable)
      • Tree architecture as in Web search (and databases)
Google BigQuery
• Hosted Dremel (Dremel as a Service)
• CLI (bq) and Web UI
• Import data from Google Cloud Storage or local files
           • Files must be in CSV format
           • Nested data not supported [yet] except built-in datasets
           • Schema definition required




Drill Design Principles
Flexible
• Pluggable query languages
• Extensible execution engine
• Pluggable data formats
   • Columns and Rows
   • Schema and Schema-less
• Pluggable data sources

Easy
• Unzip and run
• Zero configuration
• Reverse DNS not needed
• IP addresses can change
• Clear and concise log messages

Fast
• C/C++ core with Java support
   • Google C++ style guide
• Min latency and max throughput (limited only by hardware)

Dependable
• No SPOF
• Instant recovery from crashes


DrQL Example

Input record:
 DocId: 10
 Links
  Forward: 20
  Forward: 40
  Forward: 60
 Name
  Language
    Code: 'en-us'
    Country: 'us'
  Language
    Code: 'en'
  Url: 'http://A'
 Name
  Url: 'http://B'
 Name
  Language
    Code: 'en-gb'
    Country: 'gb'

Query:
 SELECT DocId AS Id,
  COUNT(Name.Language.Code) WITHIN Name AS Cnt,
  Name.Url + ',' + Name.Language.Code AS Str
 FROM t
 WHERE REGEXP(Name.Url, '^http') AND DocId < 20;

Output:
 Id: 10
 Name
  Cnt: 2
  Language
    Str: 'http://A,en-us'
    Str: 'http://A,en'
 Name
  Cnt: 0

* Example from the Dremel paper
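
To make the WITHIN semantics concrete, here is a minimal Java sketch that evaluates the sample query by hand over the sample record. The types and method names (DrqlExample, Name, Language, evaluate) are hypothetical and are not part of Drill or Dremel. A real engine would work on columnar data rather than on objects like these.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hand evaluation of the sample DrQL query; every type here is illustrative.
final class DrqlExample {
    static final class Language {
        final String code;
        final String country; // optional, may be null
        Language(String code, String country) {
            this.code = code;
            this.country = country;
        }
    }

    static final class Name {
        final String url; // optional, may be null
        final List<Language> languages;
        Name(String url, Language... languages) {
            this.url = url;
            this.languages = Arrays.asList(languages);
        }
    }

    private static final Pattern HTTP = Pattern.compile("^http");

    // Returns the output record as indented text lines.
    static List<String> evaluate(int docId, List<Name> names) {
        List<String> out = new ArrayList<>();
        if (docId >= 20) return out;                 // WHERE DocId < 20
        out.add("Id: " + docId);
        for (Name name : names) {
            // WHERE REGEXP(Name.Url, '^http'): Name groups without a match are dropped
            if (name.url == null || !HTTP.matcher(name.url).find()) continue;
            int cnt = 0;                             // COUNT(...) WITHIN Name: one count per Name
            for (Language lang : name.languages) {
                if (lang.code != null) cnt++;
            }
            out.add("Name");
            out.add("  Cnt: " + cnt);
            if (cnt > 0) {
                out.add("  Language");
                for (Language lang : name.languages) {
                    if (lang.code != null) {
                        out.add("    Str: '" + name.url + "," + lang.code + "'");
                    }
                }
            }
        }
        return out;
    }

    // The three Name groups of the sample record (DocId 10).
    static List<Name> sampleNames() {
        return Arrays.asList(
                new Name("http://A", new Language("en-us", "us"), new Language("en", null)),
                new Name("http://B"),
                new Name(null, new Language("en-gb", "gb")));
    }

    public static void main(String[] args) {
        evaluate(10, sampleNames()).forEach(System.out::println);
    }
}
```

The third Name group has no Url, so the filter drops it, which is why the output contains only two Name groups.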
Data Flow




Extensibility
• Nested query languages
      • DrQL
      • Mongo Query Language
      • Cascading, Hive, Pig


• Distributed execution engine
      • Extensible model (eg, Dryad)
      • Low-latency
      • Fault tolerant



Extensibility
Nested data formats
      • Pluggable model
        • Column-based (ColumnIO/Dremel, Trevni, RCFile)
        • Row-based (RecordIO, Avro, JSON, CSV)
        • Schema (Protocol Buffers, Avro, CSV)
        • Schema-less (JSON, BSON)

Scalable data sources
      • Pluggable model
      • Hadoop
      • HBase

Drill Architecture
   Client:  Driver
   Cluster: Parser, Compiler, Execution Engine, Data Source

   Driver --Query (text)--> Parser --AST (text)--> Compiler --Plan (text)--> Execution Engine --API--> Data Source


   Public interfaces enable extensibility
    –    Add a new query language by implementing a parser
    –    Add a new data source by implementing an API
    –    Provide a plan directly to the execution engine to control execution
   Each level of the plan has a human readable representation
    –    Facilitates debugging and development
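
The three extension points above can be sketched as a toy pipeline in plain Java. Everything here is hypothetical: the interface names (Parser, DataSource), the "mini" query language and the text formats for the AST and plan are invented for illustration and are not Drill's actual interfaces. The sketch only shows the shape of the idea, with text at every level and plug-ins registered by name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a pluggable query pipeline; not Drill's real API.
final class DrillPipelineSketch {
    /** Hypothetical parser plug-in: query text in, AST text out. */
    interface Parser { String toAst(String queryText); }

    /** Hypothetical data source plug-in: returns the rows of a table. */
    interface DataSource { List<String> read(String table); }

    private final Map<String, Parser> parsers = new HashMap<>();
    private final Map<String, DataSource> sources = new HashMap<>();

    void addParser(String language, Parser parser) { parsers.put(language, parser); }
    void addDataSource(String name, DataSource source) { sources.put(name, source); }

    // Compiler stage: AST text in, plan text out (toy rule: wrap the AST).
    String compile(String ast) { return "plan[" + ast + "]"; }

    // Execution stage. Accepting plan text directly is the third extension point.
    // Toy plan format: plan[scan <source> <table>]
    List<String> execute(String plan) {
        String[] parts = plan.substring("plan[".length(), plan.length() - 1).split(" ");
        if (parts.length != 3 || !parts[0].equals("scan")) {
            throw new IllegalArgumentException("unsupported plan: " + plan);
        }
        DataSource source = sources.get(parts[1]);
        if (source == null) throw new IllegalArgumentException("unknown source: " + parts[1]);
        return source.read(parts[2]);
    }

    List<String> run(String language, String queryText) {
        Parser parser = parsers.get(language);
        if (parser == null) throw new IllegalArgumentException("unknown language: " + language);
        return execute(compile(parser.toAst(queryText)));
    }

    // A pipeline with one toy language ("FROM <source>.<table>") and one in-memory source.
    static DrillPipelineSketch demo() {
        DrillPipelineSketch drill = new DrillPipelineSketch();
        drill.addParser("mini", query -> {
            String[] target = query.substring("FROM ".length()).split("\\.");
            return "scan " + target[0] + " " + target[1];
        });
        drill.addDataSource("mem", table -> Arrays.asList(table + ":row1", table + ":row2"));
        return drill;
    }

    public static void main(String[] args) {
        DrillPipelineSketch drill = demo();
        System.out.println(drill.run("mini", "FROM mem.docs"));     // through the parser
        System.out.println(drill.execute("plan[scan mem docs]"));   // plan supplied directly
    }
}
```

Because each stage exchanges readable text, the intermediate AST and plan can be printed and inspected while debugging.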
Drill Architecture (2)
[Diagram components]
• DrQL Clients: Driver
• Cascading/Pig/... Clients: Driver, Intermediate Parser
• Drill Query Servers: DrQL Parser, Other Parser, Compiler
• Drill Workers (several)
Query Components
• Query components:
      •   SELECT
      •   FROM
      •   WHERE
      •   GROUP BY
      •   HAVING
      •   JOIN

• Key logical operators:
      •   Scan
      •   Filter
      •   Aggregate
      •   Join
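
As a rough illustration of how the four logical operators compose, the sketch below runs a scan, filter, join and aggregate over in-memory data using Java streams. The Order type, the sample data and the method names are all made up for this example. A real engine would execute these operators in a distributed fashion rather than in one process.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Scan -> Filter -> Join -> Aggregate over in-memory rows (illustrative only).
final class LogicalOperators {
    static final class Order {
        final int customerId;
        final int amount;
        Order(int customerId, int amount) {
            this.customerId = customerId;
            this.amount = amount;
        }
    }

    // Roughly: SELECT name, SUM(amount) FROM orders JOIN customers
    //          WHERE amount > minAmount GROUP BY name
    static Map<String, Integer> totalsByCustomer(List<Order> orders,
                                                 Map<Integer, String> customers,
                                                 int minAmount) {
        return orders.stream()                                    // Scan
                .filter(o -> o.amount > minAmount)                // Filter (WHERE)
                .filter(o -> customers.containsKey(o.customerId)) // Join (inner, hash lookup)
                .collect(Collectors.groupingBy(                   // Aggregate (GROUP BY + SUM)
                        o -> customers.get(o.customerId),
                        TreeMap::new,
                        Collectors.summingInt(o -> o.amount)));
    }

    static List<Order> sampleOrders() {
        return Arrays.asList(new Order(1, 5), new Order(1, 20),
                new Order(2, 30), new Order(2, 15), new Order(3, 50));
    }

    static Map<Integer, String> sampleCustomers() {
        Map<Integer, String> customers = new HashMap<>();
        customers.put(1, "ann");
        customers.put(2, "bob");
        return customers;
    }

    public static void main(String[] args) {
        // Customer 3 has no match in the join and the 5-unit order fails the filter.
        System.out.println(totalsByCustomer(sampleOrders(), sampleCustomers(), 10));
    }
}
```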

Scan Operators
• Drill supports multiple data formats by having per-format scan operators
• Queries involving multiple data formats/sources are supported

• Fields and predicates can be pushed down into the scan operator

• Scan operators may have adaptive side-effects (database cracking)
• Produce ColumnIO from RecordIO
• Google PowerDrill stores materialized expressions with the data

                         Scan with schema                           Scan without schema

 Operator output         Protocol Buffers                           JSON-like (MessagePack)

 Supported data formats  ColumnIO (column-based protobuf/Dremel)    JSON
                         RecordIO (row-based protobuf)              HBase
                         CSV

 SELECT … FROM …         ColumnIO(proto URI, data URI)              Json(data URI)
                         RecordIO(proto URI, data URI)              HBase(table name)
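
Pushing fields and predicates down into the scan means the scan itself skips non-matching records and returns only the requested fields, so later operators never see the rest. The following is a minimal sketch of that idea over in-memory rows. The scan signature, the row helper and the sample data are assumptions made for illustration, not Drill's scan operator interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// A scan that accepts a projection (fields) and a predicate (illustrative only).
final class PushdownScan {
    static List<Map<String, Object>> scan(List<Map<String, Object>> stored,
                                          List<String> fields,
                                          Predicate<Map<String, Object>> predicate) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : stored) {
            if (!predicate.test(row)) continue;      // predicate applied inside the scan
            Map<String, Object> projected = new LinkedHashMap<>();
            for (String field : fields) {
                projected.put(field, row.get(field)); // only requested fields leave the scan
            }
            out.add(projected);
        }
        return out;
    }

    // Builds a row from alternating field names and values.
    static Map<String, Object> row(Object... keyValues) {
        Map<String, Object> r = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            r.put((String) keyValues[i], keyValues[i + 1]);
        }
        return r;
    }

    static List<Map<String, Object>> sampleRows() {
        return Arrays.asList(
                row("docId", 10, "url", "http://A", "size", 120),
                row("docId", 30, "url", "http://B", "size", 75));
    }

    public static void main(String[] args) {
        // Roughly: SELECT docId, url FROM t WHERE docId < 20
        System.out.println(scan(sampleRows(),
                Arrays.asList("docId", "url"),
                r -> (Integer) r.get("docId") < 20));
    }
}
```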
Execution Engine Layers
• Drill execution engine has two layers
      • Operator layer is serialization-aware
           • Processes individual records
      • Execution layer is not serialization-aware
           • Processes batches of records (blobs)
           • Responsible for communication, dependencies and fault tolerance
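
The split between the two layers can be sketched in plain Java as follows. The operator decodes a batch and looks at individual records, while the execution layer only passes opaque byte arrays from one stage to the next. The batch encoding used here (newline-separated UTF-8 records) and all names are assumptions for illustration, not Drill's wire format.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Two-layer sketch: record-aware operators, blob-only execution layer.
final class EngineLayers {
    // Operator layer: serialization-aware, processes individual records.
    static byte[] keepWithPrefix(byte[] batch, String prefix) {
        String[] records = new String(batch, StandardCharsets.UTF_8).split("\n");
        StringBuilder kept = new StringBuilder();
        for (String record : records) {
            if (!record.isEmpty() && record.startsWith(prefix)) {
                kept.append(record).append('\n');
            }
        }
        return kept.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Execution layer: never decodes a batch, it only moves blobs through an operator.
    static List<byte[]> runStage(List<byte[]> batches, UnaryOperator<byte[]> operator) {
        List<byte[]> out = new ArrayList<>();
        for (byte[] batch : batches) {
            out.add(operator.apply(batch));
        }
        return out;
    }

    static byte[] encode(String... records) {
        return (String.join("\n", records) + "\n").getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] batch) {
        return new String(batch, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<byte[]> input = Arrays.asList(encode("http://A", "ftp://B"), encode("http://C"));
        for (byte[] batch : runStage(input, b -> keepWithPrefix(b, "http"))) {
            System.out.print(decode(batch));
        }
    }
}
```

Keeping the execution layer unaware of the record format means communication and retry logic can be written once, whatever serialization the operators use.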




Hadoop Integration
• Hadoop data sources
      • Hadoop FileSystem API (HDFS/MapR-FS)
      • HBase
• Hadoop data formats
      • Apache Avro
      • RCFile
• MapReduce-based tools to create column-based formats
• Table registry in HCatalog
• Run long-running services in YARN


Fully Open




Momentum
Over 200 people on the Drill mailing list
Over 200 members of the Bay Area Drill User Group
Over 100 participants at the first meetup in Sunnyvale, CA
 • MapR, Cisco, Intel, eBay, Google, Yahoo!, LinkedIn, …
Drill meetups across the US and Europe
OpenDremel team and source code merged with Apache Drill
Simba Technologies (ODBC inventor) is developing a Drill ODBC driver
 • Tableau, MicroStrategy, Excel, SAP Crystal Reports, …
Storm




Before Storm




                     Queues   Workers


Example




                     (simplified)
Storm

                     Guaranteed data processing
                     Horizontal scalability
                     Fault-tolerance
                     No intermediate message brokers!
                     Higher level abstraction than message passing
                     “Just works”
Concepts




Streams



   Tuple              Tuple   Tuple   Tuple   Tuple   Tuple   Tuple




                     Unbounded sequence of tuples
Spouts




                     Source of streams

Spouts

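// As shown on the slide; the full Storm interface also declares activate() and deactivate()
public interface ISpout extends Serializable {
    // Called once when the spout task starts; keep the collector for emitting tuples
    void open(Map conf, TopologyContext context, SpoutOutputCollector collector);
    // Called when the spout is being shut down
    void close();
    // Called repeatedly; emit the next tuple (if any) through the collector
    void nextTuple();
    // The tuple emitted with this message id was fully processed
    void ack(Object msgId);
    // The tuple emitted with this message id failed to be fully processed
    void fail(Object msgId);
}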
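
The ack and fail callbacks are what let a spout replay messages that were not fully processed. The sketch below models that bookkeeping in plain Java, without the Storm library, so it can run on its own. ReplayingSource and its fields are hypothetical names; only the method names nextTuple, ack and fail mirror ISpout. A real spout would emit through the SpoutOutputCollector instead of returning a value.

```java
import java.util.AbstractMap;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Storm-free model of a reliable source: failed messages are re-queued.
final class ReplayingSource {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Map<Long, String> pending = new HashMap<>();
    private long nextId = 0;

    ReplayingSource(List<String> messages) {
        queue.addAll(messages);
    }

    // Emits the next message tagged with a fresh id, or returns null when idle.
    Map.Entry<Long, String> nextTuple() {
        String message = queue.poll();
        if (message == null) return null;
        Long id = nextId++;
        pending.put(id, message);           // remember it until it is acked or failed
        return new AbstractMap.SimpleEntry<>(id, message);
    }

    // Fully processed downstream: forget the message.
    void ack(Object msgId) {
        pending.remove(msgId);
    }

    // Processing failed: put the message back so it is emitted again.
    void fail(Object msgId) {
        String message = pending.remove(msgId);
        if (message != null) queue.add(message);
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        ReplayingSource source = new ReplayingSource(Arrays.asList("a", "b"));
        Map.Entry<Long, String> first = source.nextTuple();
        Map.Entry<Long, String> second = source.nextTuple();
        source.fail(first.getKey());        // "a" goes back on the queue
        source.ack(second.getKey());        // "b" is done
        System.out.println("replayed: " + source.nextTuple().getValue());
    }
}
```

This gives at-least-once behavior: a message can be emitted more than once, so downstream processing has to tolerate duplicates.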



Bolts



                               Tuple   Tuple   Tuple   Tuple




Processes input streams and produces new streams

Bolts
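import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// Package names are from pre-1.0 Storm; Apache Storm 1.0+ uses org.apache.storm.*
public class DoubleAndTripleBolt extends BaseRichBolt {
    private OutputCollector _collector;

    // Called once before any tuples arrive; keep the collector for later emits
    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
    }

    // Emit (2x, 3x) anchored to the input tuple, then ack the input
    @Override
    public void execute(Tuple input) {
        int val = input.getInteger(0);
        _collector.emit(input, new Values(val * 2, val * 3));
        _collector.ack(input);
    }

    // Name the two fields of each emitted tuple
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("double", "triple"));
    }
}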



Topologies




                     Network of spouts and bolts
Trident
Cascading for Storm




Trident
TridentTopology topology = new TridentTopology();
TridentState wordCounts =
    topology.newStream("spout1", spout)
        .each(new Fields("sentence"),
              new Split(),
              new Fields("word"))
        .groupBy(new Fields("word"))
        .persistentAggregate(new MemoryMapState.Factory(),
                             new Count(),
                             new Fields("count"))
        .parallelismHint(6);
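
For readers new to Trident, the computation this topology describes is a streaming word count. The plain Java sketch below shows the same three steps (split, group, count) over a fixed list of sentences. It does not use Storm or Trident: WordCountSketch and the sample sentences are illustrative, and the sketch ignores what Trident actually adds, namely continuous input, distributed execution and persistent state.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Batch equivalent of the split / groupBy / count steps (no Storm involved).
final class WordCountSketch {
    static Map<String, Long> wordCounts(List<String> sentences) {
        return sentences.stream()
                .flatMap(s -> Arrays.stream(s.split("\\s+")))   // like each(..., new Split(), "word")
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(                 // like groupBy("word") + Count()
                        Function.identity(),
                        TreeMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                "the cow jumped over the moon",
                "the man went to the store");
        System.out.println(wordCounts(sentences));
    }
}
```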




Interoperability




Spouts
          Kafka (with transactions)
          Kestrel
          JMS
          AMQP



Bolts
 Functions
 Filters
 Aggregation
 Joins
 Talk to databases, Hadoop write-behind


Storm
[Diagram labels: Raw Data, Parallel Cluster Ingest, Queue, realtime processes, Hadoop, batch processes, Apps, Business Value]
Storm
[Diagram labels: Raw Data, Georg, Queue, TailSpout, realtime processes, Hadoop, batch processes, Apps, Business Value]
Georg and TailSpout
Get Involved!
• Slides
      • http://slideshare.net/boorad/phillydb

• Join the Apache Drill mailing list
      • drill-dev-subscribe@incubator.apache.org


• Watch TailSpout & Georg development
      • https://github.com/{tdunning | boorad | rlankenau}/mapr-spout


• Join MapR
      • jobs@mapr.com
      • banderson@maprtech.com

• @boorad
©MapR Technologies

More Related Content

What's hot

Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, ClouderaHadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, ClouderaCloudera, Inc.
 
Apache Drill - Why, What, How
Apache Drill - Why, What, HowApache Drill - Why, What, How
Apache Drill - Why, What, Howmcsrivas
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep divet3rmin4t0r
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingGreat Wide Open
 
Cmu-2011-09.pptx
Cmu-2011-09.pptxCmu-2011-09.pptx
Cmu-2011-09.pptxTed Dunning
 
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive DataSpark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive DataJetlore
 
HBase @ Twitter
HBase @ TwitterHBase @ Twitter
HBase @ Twitterctrezzo
 
How to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop ClusterHow to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop ClusterAltoros
 
Hadoop for Scientific Workloads__HadoopSummit2010
Hadoop for Scientific Workloads__HadoopSummit2010Hadoop for Scientific Workloads__HadoopSummit2010
Hadoop for Scientific Workloads__HadoopSummit2010Yahoo Developer Network
 
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...DataWorks Summit
 
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingAdvanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingImpetus Technologies
 
Design, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for HadoopDesign, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for Hadoopmcsrivas
 
Compression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsCompression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsDataWorks Summit
 

What's hot (20)

Hadoop, Taming Elephants
Hadoop, Taming ElephantsHadoop, Taming Elephants
Hadoop, Taming Elephants
 
Spark vstez
Spark vstezSpark vstez
Spark vstez
 
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, ClouderaHadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
 
10c introduction
10c introduction10c introduction
10c introduction
 
Apache Drill - Why, What, How
Apache Drill - Why, What, HowApache Drill - Why, What, How
Apache Drill - Why, What, How
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduce
 
Using R with Hadoop
Using R with HadoopUsing R with Hadoop
Using R with Hadoop
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed Debugging
 
Cmu-2011-09.pptx
Cmu-2011-09.pptxCmu-2011-09.pptx
Cmu-2011-09.pptx
 
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive DataSpark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
 
HBase @ Twitter
HBase @ TwitterHBase @ Twitter
HBase @ Twitter
 
How to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop ClusterHow to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop Cluster
 
Hadoop for Scientific Workloads__HadoopSummit2010
Hadoop for Scientific Workloads__HadoopSummit2010Hadoop for Scientific Workloads__HadoopSummit2010
Hadoop for Scientific Workloads__HadoopSummit2010
 
HBase with MapR
HBase with MapRHBase with MapR
HBase with MapR
 
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
 
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingAdvanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
 
Design, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for HadoopDesign, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for Hadoop
 
Apache Spark
Apache SparkApache Spark
Apache Spark
 
Compression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsCompression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of Tradeoffs
 

Similar to PhillyDB Talk - Beyond Batch

Drill Bay Area HUG 2012-09-19
Drill Bay Area HUG 2012-09-19Drill Bay Area HUG 2012-09-19
Drill Bay Area HUG 2012-09-19jasonfrantz
 
Sep 2012 HUG: Apache Drill for Interactive Analysis
Sep 2012 HUG: Apache Drill for Interactive Analysis Sep 2012 HUG: Apache Drill for Interactive Analysis
Sep 2012 HUG: Apache Drill for Interactive Analysis Yahoo Developer Network
 
Hadoop on Azure, Blue elephants
Hadoop on Azure,  Blue elephantsHadoop on Azure,  Blue elephants
Hadoop on Azure, Blue elephantsOvidiu Dimulescu
 
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemWhy Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemCloudera, Inc.
 
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...Amazon Web Services
 
Hug france-2012-12-04
Hug france-2012-12-04Hug france-2012-12-04
Hug france-2012-12-04Ted Dunning
 
Putting Apache Drill into Production
Putting Apache Drill into ProductionPutting Apache Drill into Production
Putting Apache Drill into ProductionMapR Technologies
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing enginebigdatagurus_meetup
 
Hadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 MarchHadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 MarchMapR Technologies
 
Understanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillUnderstanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillDataWorks Summit
 
Tez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaTez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaData Con LA
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez Hortonworks
 
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Deanna Kosaraju
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingBikas Saha
 
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)BigDataEverywhere
 
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionTugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionCodemotion
 

Similar to PhillyDB Talk - Beyond Batch (20)

HUG France - Apache Drill
HUG France - Apache DrillHUG France - Apache Drill
HUG France - Apache Drill
 
Drill dchug-29 nov2012
Drill dchug-29 nov2012Drill dchug-29 nov2012
Drill dchug-29 nov2012
 
Drill Bay Area HUG 2012-09-19
Drill Bay Area HUG 2012-09-19Drill Bay Area HUG 2012-09-19
Drill Bay Area HUG 2012-09-19
 
Sep 2012 HUG: Apache Drill for Interactive Analysis
Sep 2012 HUG: Apache Drill for Interactive Analysis Sep 2012 HUG: Apache Drill for Interactive Analysis
Sep 2012 HUG: Apache Drill for Interactive Analysis
 
Hadoop on Azure, Blue elephants
Hadoop on Azure,  Blue elephantsHadoop on Azure,  Blue elephants
Hadoop on Azure, Blue elephants
 
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemWhy Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
 
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
 
Hug france-2012-12-04
Hug france-2012-12-04Hug france-2012-12-04
Hug france-2012-12-04
 
Hug france-2012-12-04
Hug france-2012-12-04Hug france-2012-12-04
Hug france-2012-12-04
 
Putting Apache Drill into Production
Putting Apache Drill into ProductionPutting Apache Drill into Production
Putting Apache Drill into Production
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing engine
 
Drill at the Chicago Hug
Drill at the Chicago HugDrill at the Chicago Hug
Drill at the Chicago Hug
 
Hadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 MarchHadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 March
 
Understanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillUnderstanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache Drill
 
Tez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaTez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_saha
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
 
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
 
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
 
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionTugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
 

More from boorad

Big Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and SolrBig Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and Solrboorad
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013boorad
 
Hadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talkHadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talkboorad
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Stormboorad
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Casesboorad
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Stormboorad
 
Large Scale Data Analysis Tools
Large Scale Data Analysis ToolsLarge Scale Data Analysis Tools
Large Scale Data Analysis Toolsboorad
 
DevNexus 2011
DevNexus 2011DevNexus 2011
DevNexus 2011boorad
 
DevNation Atlanta
DevNation AtlantaDevNation Atlanta
DevNation Atlantaboorad
 
NOSQL, CouchDB, and the Cloud
NOSQL, CouchDB, and the CloudNOSQL, CouchDB, and the Cloud
NOSQL, CouchDB, and the Cloudboorad
 
Why Erlang? - Bar Camp Atlanta 2008
Why Erlang?  - Bar Camp Atlanta 2008Why Erlang?  - Bar Camp Atlanta 2008
Why Erlang? - Bar Camp Atlanta 2008boorad
 

More from boorad (11)

Big Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and SolrBig Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and Solr
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013
 
Hadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talkHadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talk
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Storm
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Storm
 
Large Scale Data Analysis Tools
Large Scale Data Analysis ToolsLarge Scale Data Analysis Tools
Large Scale Data Analysis Tools
 
DevNexus 2011
DevNexus 2011DevNexus 2011
DevNexus 2011
 
DevNation Atlanta
DevNation AtlantaDevNation Atlanta
DevNation Atlanta
 
NOSQL, CouchDB, and the Cloud
NOSQL, CouchDB, and the CloudNOSQL, CouchDB, and the Cloud
NOSQL, CouchDB, and the Cloud
 
Why Erlang? - Bar Camp Atlanta 2008
Why Erlang?  - Bar Camp Atlanta 2008Why Erlang?  - Bar Camp Atlanta 2008
Why Erlang? - Bar Camp Atlanta 2008
 

PhillyDB Talk - Beyond Batch

  • 1. Beyond Batch Drill & Storm Brad Anderson ©MapR Technologies
  • 2. whoami • Brad Anderson • Solutions Architect at MapR (Atlanta) • ATLHUG co-chair • 'boorad' most places (twitter, github) • banderson@maprtech.com
  • 3. MapR - Faster and More Scalable

      Benchmark       MapR 2.1.1       CDH 4.1.1        MapR Speed Increase
      Terasort (1x replication, compression disabled)
        Total         13m 35s          26m 6s           1.9x
        Map           7m 58s           21m 8s           2.7x
        Reduce        13m 32s          23m 37s          1.7x
      DFSIO throughput/node
        Read          1003 MB/s        656 MB/s         1.5x
        Write         924 MB/s         654 MB/s         1.4x
      YCSB (50% read, 50% update)
        Throughput    36,584.4 op/s    12,500.5 op/s    2.9x
        Runtime       3.80 hr          11.11 hr         2.9x
      YCSB (95% read, 5% update)
        Throughput    24,704.3 op/s    10,776.4 op/s    2.3x
        Runtime       0.56 hr          1.29 hr          2.3x

      Benchmark hardware configuration: 10 servers, 12 x 2 cores (2.4 GHz), 12 x 2TB, 48 GB, 1 x 10GbE

                      MapR/Google      Apache Hadoop
        Time          54 sec           62 sec
        Nodes         1,003            1,460
        Disks         1,003            5,840
        Cores         4,012            11,680
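The "MapR Speed Increase" column for the Terasort rows can be checked by hand. The sketch below assumes the column is simply the CDH runtime divided by the MapR runtime (the slide does not state the formula); the class and method names are illustrative only.

```java
// Sketch: recompute the Terasort "speed increase" values from the slide's timings.
// Assumption: speed increase = CDH runtime / MapR runtime.
final class TerasortSpeedup {
    /** Converts a "Xm Ys" timing to total seconds. */
    static int seconds(int minutes, int secs) {
        return minutes * 60 + secs;
    }

    /** Ratio of the baseline (slower) runtime to the candidate (faster) runtime. */
    static double speedup(int baselineSeconds, int candidateSeconds) {
        return (double) baselineSeconds / candidateSeconds;
    }

    public static void main(String[] args) {
        // Timings copied from the benchmark slide: CDH 4.1.1 first, MapR 2.1.1 second.
        System.out.printf("Total:  %.1fx%n", speedup(seconds(26, 6), seconds(13, 35)));
        System.out.printf("Map:    %.1fx%n", speedup(seconds(21, 8), seconds(7, 58)));
        System.out.printf("Reduce: %.1fx%n", speedup(seconds(23, 37), seconds(13, 32)));
    }
}
```

Under that assumption the three ratios round to the 1.9x, 2.7x and 1.7x figures shown on the slide.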
  • 4. Beyond Batch • HBase & M7 • Apache Drill • Storm • Solr & Elastic Search
  • 5. Latency Matters • Batch • Interactive • Streaming
  • 6. Big Data Picture

                            Batch processing    Interactive analysis       Stream processing
      Query runtime         Minutes to hours    Milliseconds to minutes    Never-ending
      Data volume           TBs to PBs          GBs to PBs                 Continuous stream
      Programming model     MapReduce           Queries                    DAG
      Users                 Developers          Analysts and Developers    Developers
      Google project        MapReduce           Dremel
      Open source project   Hadoop MapReduce    Apache Drill               Storm, S4
  • 7. Interactive SQL Initiatives for Hadoop • SQL based OLTP • SQL based analytics • Real-time interactive queries • Impala* • Real-time interactive queries • SQL conversion to MapReduce • * Does not work with other distributions
  • 9. Google Dremel • Interactive analysis of large-scale datasets • Trillion records at interactive speeds • Complementary to MapReduce • Used by thousands of Google employees • Paper published at VLDB 2010 • Model • Nested data model with schema • Most data at Google is stored/transferred in Protocol Buffers • SQL-like query language with nested data support • Implementation • Column-based storage and processing • In-situ data access (GFS and Bigtable) • Tree architecture as in Web search (and databases)
  • 10. Google BigQuery • Hosted Dremel (Dremel as a Service) • CLI (bq) and Web UI • Import data from Google Cloud Storage or local files • Files must be in CSV format • Nested data not supported [yet] except built-in datasets • Schema definition required
  • 11. Drill Design Principles
      Flexible
        • Pluggable query languages
        • Extensible execution engine
        • Pluggable data formats
          • Columns and Rows
          • Schema and Schema-less
        • Pluggable data sources
      Easy
        • Unzip and run
        • Zero configuration
        • Reverse DNS not needed
        • IP addresses can change
        • Clear and concise log messages
      Fast
        • C/C++ core with Java support
          • Google C++ style guide
        • Min latency and max throughput (limited only by hardware)
      Dependable
        • No SPOF
        • Instant recovery from crashes
  • 12. DrQL Example (example from the Dremel paper)

      Input record:
        DocId: 10
        Links
          Forward: 20
          Forward: 40
          Forward: 60
        Name
          Language
            Code: 'en-us'
            Country: 'us'
          Language
            Code: 'en'
          Url: 'http://A'
        Name
          Url: 'http://B'
        Name
          Language
            Code: 'en-gb'
            Country: 'gb'

      Query:
        SELECT DocId AS Id,
               COUNT(Name.Language.Code) WITHIN Name AS Cnt,
               Name.Url + ',' + Name.Language.Code AS Str
        FROM t
        WHERE REGEXP(Name.Url, '^http') AND DocId < 20;

      Output record:
        Id: 10
        Name
          Cnt: 2
          Language
            Str: 'http://A,en-us'
          Language
            Str: 'http://A,en'
        Name
          Cnt: 0
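To make the WITHIN aggregation and the WHERE pruning in the DrQL example concrete, here is a minimal plain-Java sketch that evaluates the same query over the same record. It is not Drill or Dremel code: the record types, `example()` and `query()` are hypothetical names introduced for illustration, the Links.Forward values are omitted because the query never reads them, and Java 16 or later is assumed for `record`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch only: hand-evaluates the slide's DrQL query over an in-memory nested record.
final class DrqlWithinSketch {
    record Language(String code, String country) {}
    record Name(String url, List<Language> languages) {}
    record Doc(int docId, List<Name> names) {}
    /** One output Name group: the per-Name count plus one Str per Language. */
    record NameOut(int cnt, List<String> strs) {}

    private static final Pattern HTTP = Pattern.compile("^http");

    /** The input record from the slide (Links.Forward left out). */
    static Doc example() {
        return new Doc(10, List.of(
            new Name("http://A", List.of(new Language("en-us", "us"), new Language("en", null))),
            new Name("http://B", List.of()),
            new Name(null, List.of(new Language("en-gb", "gb")))));
    }

    static List<NameOut> query(Doc d) {
        List<NameOut> out = new ArrayList<>();
        if (d.docId() >= 20) {
            return out; // WHERE DocId < 20
        }
        for (Name n : d.names()) {
            // WHERE REGEXP(Name.Url, '^http'): a Name without a matching Url is pruned.
            if (n.url() == null || !HTTP.matcher(n.url()).find()) {
                continue;
            }
            List<String> strs = new ArrayList<>();
            for (Language l : n.languages()) {
                strs.add(n.url() + "," + l.code()); // Name.Url + ',' + Name.Language.Code
            }
            // COUNT(Name.Language.Code) WITHIN Name: one count per surviving Name.
            out.add(new NameOut(n.languages().size(), strs));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(query(example()));
    }
}
```

The third Name has no Url, so the WHERE clause drops it, which is why the output record contains only two Name groups.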
  • 14. Extensibility • Nested query languages • DrQL • Mongo Query Language • Cascading, Hive, Pig • Distributed execution engine • Extensible model (e.g., Dryad) • Low-latency • Fault tolerant
  • 15. Extensibility • Nested data formats • Pluggable model • Column-based (ColumnIO/Dremel, Trevni, RCFile) • Row-based (RecordIO, Avro, JSON, CSV) • Schema (Protocol Buffers, Avro, CSV) • Schema-less (JSON, BSON) • Scalable data sources • Pluggable model • Hadoop • HBase
  • 16. Drill Architecture
      Client: Driver
      Cluster: Parser, Compiler, Execution Engine, Data Source
      Driver --Query (text)--> Parser --AST (text)--> Compiler --Plan (text)--> Execution Engine --API--> Data Source
      • Public interfaces enable extensibility
        - Add a new query language by implementing a parser
        - Add a new data source by implementing an API
        - Provide a plan directly to the execution engine to control execution
      • Each level of the plan has a human readable representation
        - Facilitates debugging and development
  • 17. Drill Architecture (2) (diagram labels) • DrQL Clients • Driver • Drill Query Servers • DrQL Parser • Compiler • Drill Worker • Drill Worker • Drill Worker • Cascading/Pig/... Clients • Driver • Other Parser • Intermediate Parser
  • 18. Query Components • Query components: • SELECT • FROM • WHERE • GROUP BY • HAVING • JOIN • Key logical operators: • Scan • Filter • Aggregate • Join
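As a rough illustration of how the query components map onto the logical operators, the sketch below chains Scan, Filter and Aggregate over an in-memory table using Java streams. It corresponds to something like `SELECT country, COUNT(*) FROM t WHERE bytes > 100 GROUP BY country`; the `Row` type, column names and method names are hypothetical, Join is left out for brevity, and none of this reflects Drill's actual operator interfaces.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch only: logical operators modelled as functions over a stream of rows.
final class LogicalOperatorsSketch {
    record Row(String country, int bytes) {}

    /** Scan: produce rows from a data source (here, an in-memory list). */
    static Stream<Row> scan(List<Row> table) {
        return table.stream();
    }

    /** Filter: keep the rows matching the WHERE predicate. */
    static Stream<Row> filter(Stream<Row> in, Predicate<Row> where) {
        return in.filter(where);
    }

    /** Aggregate: GROUP BY country with COUNT(*). */
    static Map<String, Long> aggregate(Stream<Row> in) {
        return in.collect(Collectors.groupingBy(Row::country, TreeMap::new, Collectors.counting()));
    }

    /** Composes the operators bottom-up, the way a logical plan would. */
    static Map<String, Long> run(List<Row> table, int minBytes) {
        return aggregate(filter(scan(table), r -> r.bytes() > minBytes));
    }

    public static void main(String[] args) {
        List<Row> t = List.of(new Row("us", 50), new Row("us", 200), new Row("gb", 300), new Row("us", 400));
        System.out.println(run(t, 100));
    }
}
```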
  • 19. Scan Operators
      • Drill supports multiple data formats by having per-format scan operators
      • Queries involving multiple data formats/sources are supported
      • Fields and predicates can be pushed down into the scan operator
      • Scan operators may have adaptive side-effects (database cracking)
        • Produce ColumnIO from RecordIO
        • Google PowerDrill stores materialized expressions with the data

                                Scan with schema                           Scan without schema
      Operator output           Protocol Buffers                           JSON-like (MessagePack)
      Supported data formats    ColumnIO (column-based protobuf/Dremel)    JSON
                                RecordIO (row-based protobuf)              HBase
                                CSV
      SELECT … FROM …           ColumnIO(proto URI, data URI)              Json(data URI)
                                RecordIO(proto URI, data URI)              HBase(table name)
  • 20. Execution Engine Layers • Drill execution engine has two layers • Operator layer is serialization-aware • Processes individual records • Execution layer is not serialization-aware • Processes batches of records (blobs) • Responsible for communication, dependencies and fault tolerance
  • 21. Hadoop Integration • Hadoop data sources • Hadoop FileSystem API (HDFS/MapR-FS) • HBase • Hadoop data formats • Apache Avro • RCFile • MapReduce-based tools to create column-based formats • Table registry in HCatalog • Run long-running services in YARN
  • 23. Momentum • Over 200 people on the Drill mailing list • Over 200 members of the Bay Area Drill User Group • Over 100 participants at the first meetup in Sunnyvale, CA • MapR, Cisco, Intel, eBay, Google, Yahoo!, LinkedIn, … • Drill meetups across the US and Europe • OpenDremel team and source code merged with Apache Drill • Simba Technologies (ODBC inventor) is developing a Drill ODBC driver • Tableau, MicroStrategy, Excel, SAP Crystal Reports, …
  • 25. Before Storm • Queues • Workers
  • 27. Storm • Guaranteed data processing • Horizontal scalability • Fault-tolerance • No intermediate message brokers! • Higher level abstraction than message passing • "Just works"
  • 29. Streams • Unbounded sequence of tuples (diagram: Tuple, Tuple, Tuple, ...)
  • 30. Spouts • Source of streams
  • 31. Spouts
      public interface ISpout extends Serializable {
          void open(Map conf, TopologyContext context, SpoutOutputCollector collector);
          void close();
          void nextTuple();
          void ack(Object msgId);
          void fail(Object msgId);
      }
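The `ack` and `fail` callbacks are what let a spout support the "guaranteed data processing" claim: the spout can remember each emitted message until it is acknowledged and replay it on failure. The following is a dependency-free sketch of that bookkeeping, not a Storm class. `ReplayingSourceSketch` and its methods are hypothetical stand-ins that mirror the `nextTuple`/`ack`/`fail` shape of `ISpout`, and they return ids directly instead of emitting through a collector.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Sketch only: the pending/replay bookkeeping a reliable spout might keep.
final class ReplayingSourceSketch {
    private final Queue<String> ready = new ArrayDeque<>();      // messages waiting to be emitted
    private final Map<Long, String> pending = new HashMap<>();   // emitted but not yet acked
    private long nextId = 0;

    /** Adds a message to the source (stands in for an external queue). */
    void offer(String message) {
        ready.add(message);
    }

    /** Mirrors nextTuple(): emits one message and tracks it. Returns its id, or -1 when idle. */
    long nextTuple() {
        String message = ready.poll();
        if (message == null) {
            return -1;
        }
        long id = nextId++;
        pending.put(id, message);
        return id;
    }

    /** Mirrors ack(msgId): the message was fully processed, so forget it. */
    void ack(long msgId) {
        pending.remove(msgId);
    }

    /** Mirrors fail(msgId): processing failed, so queue the message for replay. */
    void fail(long msgId) {
        String message = pending.remove(msgId);
        if (message != null) {
            ready.add(message);
        }
    }

    int pendingCount() {
        return pending.size();
    }

    int readyCount() {
        return ready.size();
    }
}
```

A replayed message gets a fresh id on its next emission, so a late `ack` or `fail` for the old id is ignored.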
  • 32. Bolts • Processes input streams and produces new streams (diagram: Tuple, Tuple, Tuple, Tuple)
  • 33. Bolts
      public class DoubleAndTripleBolt extends BaseRichBolt {
          // Collector used to emit and ack tuples; handed over in prepare().
          private OutputCollector _collector;

          public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
              _collector = collector;
          }

          public void execute(Tuple input) {
              int val = input.getInteger(0);
              // Anchor the output tuple to the input, then ack the input.
              _collector.emit(input, new Values(val*2, val*3));
              _collector.ack(input);
          }

          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("double", "triple"));
          }
      }
  • 34. Topologies • Network of spouts and bolts
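To show what "a network of spouts and bolts" means in terms of data flow, here is a single-threaded, dependency-free sketch that pushes tuples from a source through a linear chain of bolts. It deliberately uses none of Storm's classes: the `Bolt` interface, `ints`, `run`, and the `SUM` bolt are hypothetical, and real topologies are graphs executed in parallel across workers rather than a simple loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch only: a linear spout -> bolt -> bolt chain simulated in one thread.
final class MiniTopologySketch {
    /** A bolt consumes one tuple and may emit any number of new tuples. */
    interface Bolt {
        void execute(List<Object> tuple, Consumer<List<Object>> emit);
    }

    /** Same logic as the slide's DoubleAndTripleBolt: emits (val*2, val*3). */
    static final Bolt DOUBLE_AND_TRIPLE = (tuple, emit) -> {
        int val = (Integer) tuple.get(0);
        emit.accept(List.of(val * 2, val * 3));
    };

    /** Hypothetical downstream bolt: adds the two fields of its input tuple. */
    static final Bolt SUM = (tuple, emit) ->
        emit.accept(List.of((Integer) tuple.get(0) + (Integer) tuple.get(1)));

    /** Stands in for a spout: wraps each int in a one-field tuple. */
    static List<List<Object>> ints(int... values) {
        List<List<Object>> tuples = new ArrayList<>();
        for (int v : values) {
            tuples.add(List.of(v));
        }
        return tuples;
    }

    /** Feeds every tuple through each bolt in order and returns the final stream. */
    static List<List<Object>> run(List<List<Object>> spoutTuples, List<Bolt> chain) {
        List<List<Object>> current = spoutTuples;
        for (Bolt bolt : chain) {
            List<List<Object>> next = new ArrayList<>();
            for (List<Object> tuple : current) {
                bolt.execute(tuple, next::add);
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(run(ints(1, 2), List.of(DOUBLE_AND_TRIPLE, SUM)));
    }
}
```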
  • 36. Trident
      TridentTopology topology = new TridentTopology();
      TridentState wordCounts =
          topology.newStream("spout1", spout)
              .each(new Fields("sentence"), new Split(), new Fields("word"))
              .groupBy(new Fields("word"))
              .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
              .parallelismHint(6);
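For readers unfamiliar with Trident, the pipeline above computes a running word count. The plain-Java sketch below shows the same split, group, and count steps over a finite list of sentences. It is only an analogy: `WordCountSketch` is a hypothetical name, splitting on single spaces is an assumption about what `Split` does, and Trident maintains the counts incrementally in persistent state over an unbounded stream rather than in one batch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Sketch only: the batch equivalent of the Trident word-count pipeline.
final class WordCountSketch {
    static Map<String, Long> wordCounts(List<String> sentences) {
        return sentences.stream()
            // each(new Fields("sentence"), new Split(), new Fields("word"))
            .flatMap(sentence -> Arrays.stream(sentence.split(" ")))
            .filter(word -> !word.isEmpty())
            // groupBy(new Fields("word")) followed by the Count aggregator
            .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(wordCounts(List.of("the cow jumped", "the moon")));
    }
}
```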
  • 38. Spouts Kafka (with transactions) Kestrel JMS AMQP ©MapR Technologies
  • 39. Bolts Functions Filters Aggregation Joins Talk to databases, Hadoop write- behind ©MapR Technologies
  • 40. Storm realtime processes Apps Queue Raw Busines Data s Value Hadoop Parallel Cluster Ingest batch processes ©MapR Technologies
  • 41. Storm realtime processes Apps TailSpout Queue Raw Busines Data s Georg Value Hadoop batch processes ©MapR Technologies
  • 43. Get Involved! • Slides • http://slideshare.net/boorad/phillydb • Join the Apache Drill mailing list • drill-dev-subscribe@incubator.apache.org • Watch TailSpout & Georg development • https://github.com/{tdunning | boorad | rlankenau}/mapr-spout • Join MapR • jobs@mapr.com • banderson@maprtech.com • @boorad