MapR M7: Providing an enterprise quality Apache HBase API
©MapR Technologies - Confidential
MapR: Lights-Out Data Center Ready
Reliable Compute
• Automated stateful failover
• Automated re-replication
• Self-healing from HW and SW failures
• Load balancing
• Rolling upgrades
• No lost jobs or data
• 99.999% ("five nines") uptime
Dependable Storage
• Business continuity with snapshots and mirrors
• Recover to a point in time
• End-to-end checksumming
• Strong consistency
• Built-in compression
• Mirror between two sites by RTO policy
MapR does MapReduce (fast)
TeraSort Record
1 TB in 54 seconds
1003 nodes
MinuteSort Record
1.5 TB in 59 seconds
2103 nodes
HBase Table Architecture
Tables are divided into key ranges (regions)
Regions are served by nodes (RegionServers)
Columns are divided into access groups (column families)
[Diagram: a grid of regions R1-R4 against column families CF1-CF5]
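The key-range-to-region mapping above can be sketched as a sorted-range lookup. This is a simplified model for illustration, not HBase's actual META-table code; the region start keys and names are hypothetical:

```python
import bisect

# Hypothetical regions: each serves the key range [start, next start).
region_starts = ["", "d", "m", "t"]
region_names = ["R1", "R2", "R3", "R4"]

def region_for(row_key):
    # The rightmost region whose start key is <= row_key serves this row.
    i = bisect.bisect_right(region_starts, row_key) - 1
    return region_names[i]

print(region_for("apple"))   # R1
print(region_for("monkey"))  # R3
```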
HBase Architecture is Better
Strong consistency model
– when a write returns, all readers will see the same value
– "eventually consistent" is often "eventually inconsistent"
Scan works
– does not broadcast to every node
– ring-based NoSQL databases (e.g., Cassandra, Riak) suffer on scans
Scales automatically
– Splits when regions become too large
– Uses HDFS to spread data, manage space
Integrated with Hadoop
– MapReduce on HBase is straightforward
MapR M7 Tables
Binary compatible with Apache HBase
– no recompilation needed to access M7 tables
– just set the CLASSPATH
– including the HBase CLI
M7 tables accessed via pathname
– openTable("hello") … uses HBase
– openTable("/hello") … uses M7
– openTable("/user/srivas/hello") … uses M7
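The name-based dispatch can be modeled in a few lines. This is a sketch of the rule as stated, not MapR's client code; `resolve_table` is a hypothetical helper:

```python
def resolve_table(name):
    # Per the rule above: names that look like filesystem paths are
    # M7 tables; bare names fall through to Apache HBase.
    engine = "M7" if name.startswith("/") else "HBase"
    return engine, name

print(resolve_table("hello"))               # ('HBase', 'hello')
print(resolve_table("/user/srivas/hello"))  # ('M7', '/user/srivas/hello')
```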
Binary Compatible
HBase applications work "as is" with M7
– no need to recompile, just set the CLASSPATH
Can run M7 and HBase side-by-side on the same cluster
– e.g., during a migration
– can access both M7 tables and HBase tables in the same program
Use the standard Apache HBase CopyTable tool to copy a table
from HBase to M7 or vice versa:
% hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
    --new.name=/user/srivas/mytable oldtable
Features
Unlimited number of tables
– HBase deployments typically hold 10-20 tables (100 max)
No compaction
Instant-On
– zero recovery time
8x insert/update performance
10x random scan performance
10x faster with flash (special flash support)
M7 tables in a MapR Cluster
M7 tables integrated into storage
– always available on every node
– no separate process to start/stop/monitor
– zero administration
– no tuning parameters … just works
M7 tables work 'as expected'
– first copy is local to the writing client
– snapshots and mirrors
– quotas, replication factor, data placement
Unified Namespace for Files and Tables
$ pwd
/mapr/default/user/dave
$ ls
file1 file2 table1 table2
$ hbase shell
hbase(main):003:0> create '/user/dave/table3', 'cf1', 'cf2', 'cf3'
0 row(s) in 0.1570 seconds
$ ls
file1 file2 table1 table2 table3
$ hadoop fs -ls /user/dave
Found 5 items
-rw-r--r-- 3 mapr mapr 16 2012-09-28 08:34 /user/dave/file1
-rw-r--r-- 3 mapr mapr 22 2012-09-28 08:34 /user/dave/file2
trwxr-xr-x 3 mapr mapr 2 2012-09-28 08:32 /user/dave/table1
trwxr-xr-x 3 mapr mapr 2 2012-09-28 08:33 /user/dave/table2
trwxr-xr-x 3 mapr mapr 2 2012-09-28 08:38 /user/dave/table3
Tables for End Users
Users can create and manage their own tables
– Unlimited # of tables
– first copy local
Tables can be created in any directory
– Tables count towards volume and user quotas
No admin intervention needed
– changes happen on the fly; no stopping or restarting servers
Automatic data protection and disaster recovery
– Users can recover from snapshots/mirrors on their own
M7 combines the best of LSM trees and B-trees
LSM trees reduce insert cost by deferring and batching index changes
– compact too rarely and read performance suffers
– compact too often and write performance suffers
B-trees are great for reads
– but expensive to update in real time
Can we combine both ideas?
Writes cannot be done in less than W = 2.5x:
write to the log + write the data itself + update the metadata
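The 2.5x floor can be made concrete with a little arithmetic. The 0.5x figure for the amortized metadata update is an assumption chosen to match the slide's total:

```python
def write_cost(user_bytes):
    # Bytes that actually hit disk per byte the user writes:
    log = 1.0 * user_bytes        # write-ahead log
    data = 1.0 * user_bytes       # the data itself, somewhere
    metadata = 0.5 * user_bytes   # amortized metadata/index update (assumed)
    return log + data + metadata

print(write_cost(1024) / 1024)  # 2.5
```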
M7 from MapR
Twisting B-trees
– leaves are variable size (8 KB - 8 MB or larger)
– can stay unbalanced for long periods of time
• more inserts will balance it eventually
• updates to interior B-tree nodes are automatically throttled
– M7 inserts "close to" where the data is supposed to go
Reads
– uses the B-tree structure to get "close" very fast
• very high branching with key-prefix compression
– utilizes a separate lower-level index to find it exactly
• updated in place
• bloom filters for gets, range maps for scans
Overhead
– a 1K-record read transfers about 32K from disk in logN seeks
Apache HBase HFile Structure
– key-value pairs are laid out in increasing order
– 64 KB blocks are compressed
– an index into the compressed blocks is created as a B-tree
– each cell is an individual key + value; a row repeats the key for each column
HBase Region Operation
Typical region size is a few GB, sometimes even 10 GB or 20 GB
The RegionServer holds data in memory until full, then writes a new HFile
– the logical view of the database is constructed by layering these files
for the region's key range, with the newest on top and the oldest at the bottom
HBase Read Amplification
When a get/scan comes in, all the files have to be examined
– schema-less, so which file holds the column?
– done in memory; does not change what's on disk
• bloom filters do not help in scans
With 7 files (newest on top, oldest at the bottom), a 1K-record get() takes
about 30 seeks, 7 block decompressions, and a total data transfer of about
130K from HDFS.
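A rough cost model reproduces the slide's numbers. The per-file constants (4 seeks per HFile, ~16 KB compressed block plus ~2 KB of index per probe) are assumptions chosen to match the stated totals, not measured values:

```python
def get_cost(n_files=7, seeks_per_file=4, block_kb=16, index_kb=2):
    # Every HFile in the region must be probed for the key.
    seeks = n_files * seeks_per_file + 2           # +2 assumed for meta checks
    transfer_kb = n_files * (block_kb + index_kb)  # block + index per file
    return seeks, transfer_kb

seeks, kb = get_cost()
print(seeks, kb)  # 30 seeks, 126 KB (the slide rounds to about 130K)
```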
HBase Write Amplification
To reduce the read amplification, HBase merges the HFiles
periodically
– a process called compaction
– runs automatically when there are too many files
– usually turned off due to I/O storms
– and kicked off manually on weekends instead
Compaction reads all files and merges
into a single HFile
HBase Compaction Analysis
Assume 10 GB per region, writing 10% per day, growing 10% per week
– 1 GB of writes per day
– after 7 days: 7 files of 1 GB plus 1 file of 10 GB
Compaction
– total reads: 17 GB (= 7 x 1 GB + 1 x 10 GB)
– total writes: 25 GB (= 7 GB WAL + 7 GB flush + 11 GB new HFile)
500 regions
– read 8.5 TB, write 12.5 TB: a major outage on the node
– compacting more aggressively to keep fewer HFiles only makes it worse
Best practice: serve < 500 GB per node (50 regions)
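The totals above follow directly from the stated assumptions:

```python
region_gb = 10       # region size
daily_write_gb = 1   # 10% of the region per day
days = 7
growth_gb = 1        # 10% growth per week

# Compaction reads the seven daily 1 GB HFiles plus the 10 GB base file:
reads_gb = days * daily_write_gb + region_gb                    # 17 GB
# Writes: 7 GB of WAL + 7 GB of memstore flushes + the 11 GB merged HFile:
writes_gb = (days * daily_write_gb) + (days * daily_write_gb) \
          + (region_gb + growth_gb)                             # 25 GB

regions = 500
print(regions * reads_gb / 1000, "TB read")      # 8.5 TB
print(regions * writes_gb / 1000, "TB written")  # 12.5 TB
```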
LevelDB
Tiered, logarithmic increase
– L1: 2 x 1 MB files
– L2: 10 x 1 MB
– L3: 100 x 1 MB
– L4: 1,000 x 1 MB, etc.
Compaction overhead
– avoids I/O storms (I/O done in smaller increments of ~10 MB)
– but uses significantly more bandwidth than HBase
Read overhead is still high
– 10-15 seeks, perhaps more if the lowest level is very large
– 40 KB - 60 KB read from disk to retrieve a 1 KB record
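The tiering in the slide follows a simple rule (L1 holds two 1 MB files; each deeper level holds 10x the previous), which is also where the roughly-one-seek-per-level read estimate comes from:

```python
def files_at_level(n):
    # L1: 2 files; L2: 10; L3: 100; L4: 1000; ...
    return 2 if n == 1 else 10 ** (n - 1)

print([files_at_level(n) for n in range(1, 5)])  # [2, 10, 100, 1000]

# A read may probe about one file per level, so a tree several levels
# deep costs on the order of the slide's 10-15 seeks.
```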
B-tree analysis
Reads find data directly; proven to be fastest
– interior nodes hold only keys
– very large branching factor
– values only at leaves
– thus caches work well
– R = logN seeks if nothing is cached
– a 1K-record read transfers about logN blocks from disk
Writes are slow on inserts
– data is inserted into the correct place right away
– otherwise reads would not find it
– requires the B-tree to be continuously rebalanced
– causes extreme random I/O in the insert path
– W = 2.5x + logN seeks if nothing is cached
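The logN seek count is just the tree height; with the very large branching factors mentioned above, the height stays tiny even for huge key counts. The numbers below are illustrative:

```python
def btree_height(n_keys, branching):
    # One seek per level when nothing is cached.
    height, capacity = 1, branching
    while capacity < n_keys:
        height += 1
        capacity *= branching
    return height

print(btree_height(10**9, 1000))  # 3: a billion keys in three levels
print(btree_height(10**9, 100))   # 5
```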
MapR M7 Accelerates HBase Applications
HDD cluster (2 x Intel Xeon E5645 2.40 GHz, 12 cores; 48 GB RAM;
12 x 3 TB 7200 RPM disks; 1 KB records; 2 TB data; CentOS 6.2):

Benchmark             MapR 3.0.1 (M7)  CDH 4.3.0 (HBase)  MapR Increase
50% read, 50% update             8000               1695           5.5x
95% read, 5% update              3716                602             6x
Reads                            5520                764           7.2x
Scans (50 rows)                  1080                156           6.9x

SSD cluster (2 x Intel Xeon E5620 2.40 GHz, 8 cores; 24 GB RAM;
1 x 1.2 TB Fusion-io ioDrive2; 1 KB records; 600 GB data; CentOS 6.3):

Benchmark             MapR 3.0.1 (M7)  CDH 4.3.0 (HBase)  MapR Increase
50% read, 50% update            21328               2547           8.4x
95% read, 5% update             13455               2660             5x
Reads                           18206               1605          11.3x
Scans (50 rows)                  1298                116          11.2x

MapR speedup with HDDs: 5x-7x; with SSDs: 5x-11.3x
M7: Fileservers Serve Regions
Region lives entirely inside a container
– Does not coordinate through ZooKeeper
Containers support distributed transactions
– with replication built-in
Only coordination in the system is for splits
– between the region map and the data container
– MapR-FS already solved this problem for files and their chunks
Server Reboot
Full container reports are tiny
– the CLDB needs 2 GB of DRAM for a 1000-node cluster
Volumes come online very fast
– each volume is independent of the others
– online as soon as the min-repl number of containers is ready
– does not wait for the whole cluster
(e.g., HDFS waits for 99.9% of blocks to report)
A 1000-node cluster restarts in under 5 minutes
M7 provides Instant Recovery
0-40 microWALs per region
– idle WALs go to zero quickly, so most are empty
– region is up before all microWALs are recovered
– recovers region in background in parallel
– when a key is accessed, that microWAL is recovered inline
– 1000-10000x faster recovery
Why doesn't HBase do this?
– M7 leverages unique MapR-FS capabilities, not impacted by HDFS
limitations
– No limit to # of files on disk
– No limit to # open files
– I/O path translates random writes to sequential writes on disk
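The recovery policy described above (region online immediately, microWALs replayed in the background, inline replay when a read hits an unrecovered key range first) can be sketched as follows. The class and names are hypothetical, not MapR code:

```python
class Region:
    def __init__(self, microwals):
        # microwals: {key_range: [(key, value), ...]} still to be replayed.
        self.pending = dict(microwals)
        self.store = {}
        self.online = True           # up before any microWAL is replayed

    def _replay(self, key_range):
        for key, value in self.pending.pop(key_range, []):
            self.store[key] = value

    def recover_in_background(self):
        # Replays remaining microWALs after the region is already serving.
        for key_range in list(self.pending):
            self._replay(key_range)

    def get(self, key_range, key):
        if key_range in self.pending:   # inline recovery on first access
            self._replay(key_range)
        return self.store.get(key)

region = Region({"a-m": [("apple", 1)], "n-z": [("pear", 2)]})
print(region.online)               # True immediately after restart
print(region.get("a-m", "apple"))  # 1 (that microWAL replayed inline)
```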