The venerable MapReduce framework has allowed Hadoop to prove its worth in the big data space, and to store and analyze much larger data sets than was possible before. But much of the current activity in the big data ecosystem surrounds other major categories of workflows beyond batch.
These emerging tools include low-latency I/O (HBase), interactive queries (Drill), stream processing (Storm), and text processing / indexing (Solr). This talk discusses some of the more interesting developments in Drill and Storm, their capabilities, and how they are being put to use in real-world situations.
7. Interactive SQL Initiatives for Hadoop
[Diagram: categories of SQL systems: SQL-based OLTP; SQL-based analytics; real-time interactive queries (Impala*); real-time interactive queries; SQL conversion to MapReduce]
* Does not work with other distributions
16. Drill Architecture
[Diagram: Client (Driver) and Cluster (Parser, Compiler, Execution Engine, Data Source)]
Driver → Query (text) → Parser → AST (text) → Compiler → Plan (text) → Execution Engine → API → Data Source
Public interfaces enable extensibility
– Add a new query language by implementing a parser
– Add a new data source by implementing an API
– Provide a plan directly to the execution engine to control execution
Each level of the plan has a human readable representation
– Facilitates debugging and development
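The extensibility points listed above can be illustrated with a toy pipeline. This is a minimal sketch, not Drill's actual API: every name here (`Plan`, `Parser`, `DataSource`, `TinySelectParser`, `InMemorySource`, `Engine`) is hypothetical, and the "query language" handles only `SELECT a, b FROM src`. It shows a parser and a data source being plugged in behind small interfaces, a plan being handed straight to the engine, and a plan that prints as readable text.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Protocol

Row = Dict[str, object]


@dataclass
class Plan:
    """Hypothetical plan: scan one source, then project some columns."""
    source: str
    columns: List[str]

    def to_text(self) -> str:
        # Human-readable form, useful when debugging a query.
        return f"scan({self.source}) -> project({', '.join(self.columns)})"


class Parser(Protocol):
    """Extension point: implement this to add a new query language."""
    def parse(self, query: str) -> Plan: ...


class DataSource(Protocol):
    """Extension point: implement this to add a new data source."""
    def scan(self) -> Iterable[Row]: ...


class TinySelectParser:
    """Toy parser that understands only 'SELECT a, b FROM src'."""
    def parse(self, query: str) -> Plan:
        tokens = query.strip().split()
        from_idx = [t.upper() for t in tokens].index("FROM")
        column_text = " ".join(tokens[1:from_idx])
        columns = [c.strip() for c in column_text.split(",")]
        return Plan(source=tokens[from_idx + 1], columns=columns)


class InMemorySource:
    """Toy data source backed by a list of dictionaries."""
    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def scan(self) -> Iterable[Row]:
        return iter(self.rows)


class Engine:
    """Toy execution engine with pluggable parsers and data sources."""
    def __init__(self) -> None:
        self.parsers: Dict[str, Parser] = {}
        self.sources: Dict[str, DataSource] = {}

    def register_parser(self, language: str, parser: Parser) -> None:
        self.parsers[language] = parser

    def register_source(self, name: str, source: DataSource) -> None:
        self.sources[name] = source

    def execute_plan(self, plan: Plan) -> List[Row]:
        # A caller may skip parsing and supply a plan directly.
        rows = self.sources[plan.source].scan()
        return [{c: row[c] for c in plan.columns} for row in rows]

    def execute_query(self, language: str, query: str) -> List[Row]:
        return self.execute_plan(self.parsers[language].parse(query))
```

In this sketch, registering a second parser under a different language name, or a second source under a different name, requires no change to `Engine`, which is the property the slide attributes to the public interfaces. Calling `execute_plan` with a hand-built `Plan` corresponds to providing a plan directly to the execution engine.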
23. Momentum
Over 200 people on the Drill mailing list
Over 200 members of the Bay Area Drill User Group
Over 100 participants at the first meetup in Sunnyvale, CA
• MapR, Cisco, Intel, eBay, Google, Yahoo!, LinkedIn, …
Drill meetups across the US and Europe
OpenDremel team and source code merged with Apache Drill
Simba Technologies – ODBC inventor developing a Drill ODBC driver
• Tableau, MicroStrategy, Excel, SAP Crystal Reports, …