1. Beyond Batch
HBase, Drill, & Storm
Brad Anderson
©MapR Technologies
2. whoami
• Brad Anderson
• Solutions Architect at MapR (Atlanta)
• ATLHUG co-chair
• ‘boorad’ most places (twitter, github)
• banderson@maprtech.com
3. • The open enterprise-grade distribution for Hadoop
• Easy, dependable and fast
• Open source with standards-based extensions
• MapR is deployed at thousands of companies
• From small Internet startups to the world’s largest enterprises
• MapR customers analyze massive amounts of data:
• Hundreds of billions of events daily
• 90% of the world’s Internet population monthly
• $1 trillion in retail purchases annually
• MapR Cloud Partners
• Google to provide Hadoop on Google Compute Engine
• Amazon for Elastic Map Reduce + instances
6. HBase Issues
Reliability
• Compactions disrupt operations
• Very slow crash recovery
• Unreliable splitting
Business continuity
• Common hardware/software issues cause downtime
• Administration requires downtime
• No point-in-time recovery
• Complex backup process
Performance
• Many bottlenecks result in low throughput
• Limited data locality
• Limited # of tables
Manageability
• Compactions, splits and merges must be done manually (in reality)
• Basic operations like backup or table rename are complex
7. M7
An integrated system for unstructured and structured data
– Unified namespace for files and tables
– Data management
– Data protection
– Disaster recovery
– No additional administration
An architecture that delivers reliability and performance
– Fewer layers
– No compactions
– Seamless splits
– Automatic merges
– Single network hop
– Instant recovery
– Reduced read and write amplification
8. Unified Namespace
$ pwd
/mapr/default/user/boorad
$ ls
file1 file2 table1 table2
$ hbase shell
hbase(main):003:0> create '/user/boorad/table3', 'cf1', 'cf2', 'cf3'
0 row(s) in 0.1570 seconds
$ ls
file1 file2 table1 table2 table3
$ hadoop fs -ls /user/boorad
Found 5 items
-rw-r--r-- 3 mapr mapr 16 2012-09-28 08:34 /user/boorad/file1
-rw-r--r-- 3 mapr mapr 22 2012-09-28 08:34 /user/boorad/file2
trwxr-xr-x 3 mapr mapr 2 2012-09-28 08:32 /user/boorad/table1
trwxr-xr-x 3 mapr mapr 2 2012-09-28 08:33 /user/boorad/table2
trwxr-xr-x 3 mapr mapr 2 2012-09-28 08:38 /user/boorad/table3
10. No RegionServers?
One network hop
No daemons to manage
One cache
14. Instant Recovery
Apache HBase experiences an outage when any node crashes
– Each RegionServer replays its WAL before any region can be recovered
– All regions served by that RegionServer cannot be accessed
M7 provides instant recovery
– M7 uses small WALs
• Multiple WALs per region vs. 1 per RegionServer (1000 regions)
– Instant recovery on put
– 1000-10000x faster recovery on get
How?
– M7 leverages unique MapR-FS capabilities, not impacted by HDFS limitations
• Append support
• No limit to # of files
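The recovery claim above is easiest to see as arithmetic: with one shared WAL per RegionServer, every region waits on a replay of the whole log before it can serve again; with small per-region WALs, a region waits only on its own tail. The sketch below is illustrative only (the 64 MB figure and the method name are assumptions, not measurements):

```java
// Illustrative WAL-replay arithmetic, not a benchmark.
class WalRecoverySketch {
    // Bytes that must be replayed before the first region can serve again.
    static long replayBeforeFirstServe(boolean perRegionWals,
                                       int regions,
                                       long walBytesPerRegion) {
        // One shared WAL: every region waits on the whole log.
        // Per-region WALs: a region waits only on its own tail.
        return perRegionWals ? walBytesPerRegion
                             : (long) regions * walBytesPerRegion;
    }

    public static void main(String[] args) {
        long shared = replayBeforeFirstServe(false, 1000, 64L << 20); // assume 64 MB/region
        long small  = replayBeforeFirstServe(true,  1000, 64L << 20);
        System.out.println(shared / small); // 1000x less data before first serve
    }
}
```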
15. LSMT (FTW)
Traditional disk-based index structures like B-Trees are expensive to maintain in real time
Log Structured Merge Trees reduce the cost by deferring and batching index changes
Writes
– Writes go to an in-memory index
• And a commit log, in case the node crashes and recovery is needed
– The in-memory index is occasionally merged into the disk-based index
• This may trigger a compaction
Reads
– Reads hit the in-memory index and the disk-based index
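The write and read paths above can be mimicked with a toy LSM structure: a sorted in-memory index (memtable) that is frozen into immutable segments when it fills, with reads consulting the memtable first and then segments newest-first. This is a minimal sketch of the idea, not M7 or HBase code; the class and threshold are hypothetical, and a real system would also write a WAL and compact segments.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.TreeMap;

// Toy log-structured merge tree: writes land in a sorted in-memory
// index; when it grows past a threshold it is frozen into an immutable
// sorted segment, deferring and batching disk-index maintenance.
class ToyLsm {
    private final int flushThreshold;
    private TreeMap<String, String> memtable = new TreeMap<>();
    private final Deque<TreeMap<String, String>> segments = new ArrayDeque<>();

    ToyLsm(int flushThreshold) { this.flushThreshold = flushThreshold; }

    void put(String key, String value) {
        memtable.put(key, value);          // in-memory index (plus a commit log in a real system)
        if (memtable.size() >= flushThreshold) {
            segments.addFirst(memtable);   // freeze as an immutable "on-disk" segment
            memtable = new TreeMap<>();
        }
    }

    String get(String key) {
        String v = memtable.get(key);      // reads hit the in-memory index first...
        if (v != null) return v;
        for (TreeMap<String, String> seg : segments) { // ...then segments, newest first
            v = seg.get(key);
            if (v != null) return v;
        }
        return null;
    }
}
```

Merging segments together (the compaction the slide mentions) is exactly the step the next slide prices out in terms of write and read amplification.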
16. Storage Subsystem Performance
What does it cost to merge the in-memory index into the disk-based index?

                      HBase-style                 LevelDB-style     M7
Examples              BigTable, HBase, Cassandra  Cassandra, Riak   M7
WAF                   Low                         High              Low
RAF                   High                        Low               Low
I/O storms            Yes                         No                No
Disk space overhead   High (2x)                   Low               Low
Skewed data handling  Bad                         Good              Good
Rewrite large values  Yes                         Yes               No

Terminology:
Write-amplification factor (WAF): the ratio between writes to disk and application writes. Note that data must be rewritten in every indexed structure.
Read-amplification factor (RAF): the ratio between reads from disk and application reads.
Skewed data handling: behavior when inserting values with similar keys (e.g., increasing keys)
17. Other M7 Features
Smaller disk footprint
– HBase stores key & column name for every version of
every cell
– M7 never repeats the key or column name
Columnar layout
– HBase supports 2-3 column families in practice
– M7 supports 64 column families
Online schema changes
– No need to disable table to add/remove/modify
column families
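The footprint claim can be made concrete with back-of-envelope arithmetic: repeating the row key and column name per version versus storing them once. The byte counts and method names below are illustrative assumptions; real on-disk formats add length fields, timestamps, and block-level compression.

```java
// Back-of-envelope: bytes to store N versions of one cell when the row
// key and column name are repeated per version versus stored once.
// Illustrative only; not the actual HBase or M7 on-disk layout.
class FootprintSketch {
    // Key and column name repeated for every version (per-cell KeyValue style)
    static long repeated(int versions, int keyLen, int colLen, int valLen) {
        return (long) versions * (keyLen + colLen + valLen);
    }

    // Key and column name stored once, values per version
    static long elided(int versions, int keyLen, int colLen, int valLen) {
        return keyLen + colLen + (long) versions * valLen;
    }

    public static void main(String[] args) {
        // 100 versions, 40-byte row key, 20-byte column name, 8-byte value
        System.out.println(repeated(100, 40, 20, 8)); // 6800 bytes
        System.out.println(elided(100, 40, 20, 8));   // 860 bytes
    }
}
```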
19. Big Data Picture
                     Batch processing   Interactive analysis      Stream processing
Query runtime        Minutes to hours   Milliseconds to minutes   Never-ending
Data volume          TBs to PBs         GBs to PBs                Continuous stream
Programming model    MapReduce          Queries                   DAG
Users                Developers         Analysts and Developers   Developers
Google project       MapReduce          Dremel
Open source project  Hadoop MapReduce                             Storm, S4
20. Big Data Picture
Apache Drill
21. Google Dremel
• Interactive analysis of large-scale datasets
• Trillion records at interactive speeds
• Complementary to MapReduce
• Used by thousands of Google employees
• Paper published at VLDB 2010
• Model
• Nested data model with schema
• Most data at Google is stored/transferred in Protocol Buffers
• Normalization (to relational) is prohibitive
• SQL-like query language with nested data support
• Implementation
• Column-based storage and processing
• In-situ data access (GFS and Bigtable)
• Tree architecture as in Web search (and databases)
22. Google BigQuery
• Hosted Dremel (Dremel as a Service)
• CLI (bq) and Web UI
• Import data from Google Cloud Storage or local files
• Files must be in CSV format
• Nested data not supported [yet] except built-in datasets
• Schema definition required
23. DrQL Example
Input record:
DocId: 10
Links
  Forward: 20
  Forward: 40
  Forward: 60
Name
  Language
    Code: 'en-us'
    Country: 'us'
  Language
    Code: 'en'
  Url: 'http://A'
Name
  Url: 'http://B'
Name
  Language
    Code: 'en-gb'
    Country: 'gb'

Query:
SELECT DocId AS Id,
  COUNT(Name.Language.Code) WITHIN Name AS Cnt,
  Name.Url + ',' + Name.Language.Code AS Str
FROM t
WHERE REGEXP(Name.Url, '^http') AND DocId < 20;

Result:
Id: 10
Name
  Cnt: 2
  Language
    Str: 'http://A,en-us'
    Str: 'http://A,en'
Name
  Cnt: 0
* Example from the Dremel paper
25. Extensibility
• Nested query languages
• Pluggable model
• DrQL
• Mongo Query Language
• Cascading
• Distributed execution engine
• Extensible model (e.g., Dryad)
• Low-latency
• Fault tolerant
26. Extensibility
• Nested data formats
• Pluggable model
• Column-based (ColumnIO/Dremel, Trevni, RCFile)
• Row-based (RecordIO, Avro, JSON, CSV)
• Schema (Protocol Buffers, Avro, CSV)
• Schema-less (JSON, BSON)
• Scalable data sources
• Pluggable model
• Hadoop
• HBase
27. Architecture
• Only the execution engine knows the physical attributes of the
cluster
• # nodes, hardware, file locations, …
• Public interfaces enable extensibility
• Developers can build parsers for new query languages
• Developers can provide an execution plan directly
• Each level of the plan has a human readable representation
• Facilitates debugging and unit testing
29. Query Components
• Query components:
• SELECT
• FROM
• WHERE
• GROUP BY
• HAVING
• (JOIN)
• Key logical operators:
• Scan
• Filter
• Aggregate
• (Join)
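The mapping from query components to logical operators can be sketched as a pipeline over in-memory rows: Scan produces rows, Filter implements WHERE, and Aggregate implements GROUP BY with COUNT. This is a plain-Java illustration of the operator composition, not Drill's API; the class and method names are hypothetical, and Drill's real operators work on record batches across nodes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Scan -> Filter -> Aggregate as a pipeline over in-memory rows.
class LogicalPlanSketch {
    // SELECT url, COUNT(*) FROM urls WHERE url LIKE 'http%' GROUP BY url
    static Map<String, Long> urlCounts(List<String> urls) {
        return urls.stream()                        // Scan: read the source
            .filter(u -> u.startsWith("http"))      // Filter: the WHERE clause
            .collect(Collectors.groupingBy(         // Aggregate: GROUP BY url
                Function.identity(),
                Collectors.counting()));            //            with COUNT(*)
    }

    public static void main(String[] args) {
        System.out.println(
            urlCounts(Arrays.asList("http://A", "http://A", "ftp://B")));
    }
}
```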
30. Execution Engine Layers
• Drill execution engine has two layers
• Operator layer is serialization-aware
• Processes individual records
• Execution layer is not serialization-aware
• Processes batches of records (blobs)
• Responsible for communication, dependencies and fault tolerance
31. Design Principles
Flexible
• Pluggable query languages
• Extensible execution engine
• Pluggable data formats
  • Column-based and row-based
  • Schema and schema-less

Easy
• Unzip and run
• Zero configuration
• Reverse DNS not needed
• IP addresses can change
• Clear and concise log messages

Fast
• C/C++ core with Java support
• Google C++ style guide
• Min latency and max throughput (limited only by hardware)

Dependable
• No SPOF
• Instant recovery from crashes
32. Hadoop Integration
• Hadoop data sources
• Hadoop FileSystem API (HDFS/MapR-FS)
• HBase
• Hadoop data formats
• Apache Avro
• RCFile
• MapReduce-based tools to create column-based
formats
37. Storm
Guaranteed data processing
Horizontal scalability
Fault-tolerance
No intermediate message brokers!
Higher level abstraction than message passing
“Just works”
39. Streams
Unbounded sequence of tuples
40. Spouts
Source of streams
41. Spouts
public interface ISpout extends Serializable {
    // Called once when the spout is initialized on a worker
    void open(Map conf,
              TopologyContext context,
              SpoutOutputCollector collector);
    // Called when the spout is shut down
    void close();
    // Emit the next tuple, or return if none is available
    void nextTuple();
    // The tuple tree anchored on msgId was fully processed
    void ack(Object msgId);
    // The tuple tree anchored on msgId failed or timed out
    void fail(Object msgId);
}
42. Bolts
Processes input streams and produces new streams
43. Bolts
public class DoubleAndTripleBolt extends BaseRichBolt {
    private OutputCollectorBase _collector;

    // Called once when the bolt is initialized on a worker
    public void prepare(Map conf,
                        TopologyContext context,
                        OutputCollectorBase collector) {
        _collector = collector;
    }

    // Called for every input tuple: emit (2x, 3x) anchored on the
    // input, then ack it so Storm knows it was fully processed
    public void execute(Tuple input) {
        int val = input.getInteger(0);
        _collector.emit(input, new Values(val * 2, val * 3));
        _collector.ack(input);
    }

    // Declares the schema of the tuples this bolt emits
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("double", "triple"));
    }
}
44. Topologies
Network of spouts and bolts
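The "network of spouts and bolts" dataflow can be mimicked without the Storm jars: a driver pulls tuples from a spout-like source and pushes each one through a chain of bolt-like transforms. This is a toy sketch of the shape only (all names here are hypothetical); real Storm wires components with TopologyBuilder and stream groupings, and adds distribution, acking, and fault tolerance.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

// Toy "topology": a source of tuples (spout) connected to a chain of
// per-tuple transforms (bolts). Shows the dataflow shape only.
class ToyTopology {
    static List<Integer> run(Iterator<Integer> spout,
                             List<UnaryOperator<Integer>> bolts) {
        List<Integer> out = new ArrayList<>();
        while (spout.hasNext()) {            // spout: emits the next tuple
            Integer tuple = spout.next();
            for (UnaryOperator<Integer> bolt : bolts) {
                tuple = bolt.apply(tuple);   // each bolt consumes and re-emits
            }
            out.add(tuple);                  // terminal output of the network
        }
        return out;
    }
}
```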
46. Trident
TridentTopology topology = new TridentTopology();
TridentState wordCounts =
topology.newStream("spout1", spout)
.each(new Fields("sentence"),
new Split(),
new Fields("word"))
.groupBy(new Fields("word"))
.persistentAggregate(new MemoryMapState.Factory(),
new Count(),
new Fields("count"))
.parallelismHint(6);
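What the Trident topology above computes can be stated in plain Java: split each sentence into words and keep a running count per word. Trident does this incrementally over an unbounded stream with persistent state; the sketch below is batch-style and in-memory, and its names are illustrative.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java equivalent of the Trident word count: Split() becomes
// String.split, and persistentAggregate(Count()) becomes a running
// per-word tally in a map.
class WordCountSketch {
    static Map<String, Integer> count(List<String> sentences) {
        Map<String, Integer> counts = new HashMap<>();
        for (String sentence : sentences) {
            for (String word : sentence.split(" ")) {  // the Split() function
                counts.merge(word, 1, Integer::sum);   // the Count() aggregate
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("the cow", "the dog")));
    }
}
```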
48. Spouts
• Kafka (with transactions)
• Kestrel
• JMS
• AMQP
• Beanstalkd
50. Storm
[diagram: raw data → queue → Storm (realtime processes) and Hadoop (batch processes) → apps → business value]
51. Storm
[diagram: raw data → queue → Storm (realtime processes) and Hadoop via parallel cluster ingest (batch processes) → apps → business value]
53. Storm
[diagram: raw data → Storm (realtime processes) and Hadoop (batch processes) directly, with no queue → apps → business value]
54. Get Involved!
• Get more details on M7
• http://mapr.com/products/mapr-editions/m7-edition
• Join the Apache Drill mailing list
• drill-dev-subscribe@incubator.apache.org
• Watch TailSpout development
• https://github.com/{tdunning | boorad}/mapr-spout
• Join MapR
• jobs@mapr.com
• banderson@maprtech.com
• @boorad
Speaker notes
• HBase: random reads/writes; on ~45% of all Hadoop clusters
• Drill: remove the schema requirement; in-situ for real, since we'll support multiple formats; note that MapReduce is still needed for big joins, so to speak
• Drill will support nested data; no schema required
• Protocol Buffers are the conceptual data model; will support multiple data models; will have to define a way to describe data format (filtering, fields, etc.); schema-less will carry a performance penalty; HBase will be one format
• Likely to support these query languages; could add HiveQL and more, and could even be clever and route HiveQL to MR or Drill based on the query; Pig as well
• Pluggability: data format, query language
• Something of alpha quality in 6-9 months; community driven, I can't speak for the project
• MapR: FS gives better chunk size control; NFS support may make small test drivers easier; unified namespace will allow multi-cluster access; might even have a Drill component that auto-formats data
• Read-only model
• Example query that Drill should support; need to talk more here about what Dremel does
• Load data into Drill (optional); could just use as-is in "row" format; multiple query languages; pluggability very important
• Note: we have an already partially built execution engine
• Be prepared for Apache questions: committer vs. committee vs. contributor; if you can't answer a question, ask them to answer and contribute; Lisa: need landing page; references to the paper and such at the end
• Scaling is painful; poor fault tolerance; coding is hard
• Stream examples: tweets, stock ticks, manufacturing machine data, sensor messages
• DAG; runs continuously
• Abstractions like Cascading, Hive, Pig make MR approachable; code size reduction
• Kestrel via Thrift; Kafka: transactional topologies, idempotency, process only once; ActiveMQ
• Current architecture; data ingest tool for Hadoop (avoid Flume madness)