2. How to perform Incremental Backup/Restore?
• HBase ships with a handful of useful tools
– CopyTable
– Export / Import
3. CopyTable
• Purpose:
– Copy part or all of a table, either to the same cluster or to another cluster
• Usage:
– bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] <tablename>
• Options:
– starttime: Beginning of the time range.
– endtime: End of the time range. If omitted, everything from starttime onward is copied.
– new.name: New table's name.
– peer.adr: Address of the peer cluster, given in the format hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent
– families: Comma-separated list of ColumnFamilies to copy.
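Putting these options together, here is a minimal sketch of an incremental copy to a peer cluster. The table names, column families, ZooKeeper address, and timestamps below are illustrative placeholders, not values from the slides; starttime/endtime are epoch milliseconds (here, July 1–2, 2012 UTC):

  $ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
      --starttime=1341100800000 --endtime=1341187200000 \
      --new.name=usertable_backup \
      --peer.adr=zk1.example.com:2181:/hbase \
      --families=cf1,cf2 \
      usertable

Because only cells written inside the [starttime, endtime) window are copied, running this with a sliding window is what makes the copy incremental.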
4. CopyTable (cont.)
• Limitations
– Can only back up to another HBase table (implemented as Scan + Put)
– Rows inserted or updated while the copy is running are not captured atomically, so concurrent edits can leave the destination inconsistent with the source
5. Export
• Purpose:
– Dump the contents of a table to HDFS as a sequence file
• Usage:
– $ bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
• Options:
– *tablename: The name of the table to export
– *outputdir: The location in HDFS to store the exported data
– versions: The number of cell versions to export (defaults to 1)
– starttime: Beginning of the time range
– endtime: End of the time range for the scan
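As a concrete sketch, exporting one day of edits from a table (the table name, output path, and timestamps are illustrative assumptions; the timestamps are epoch milliseconds for July 1–2, 2012 UTC):

  $ bin/hbase org.apache.hadoop.hbase.mapreduce.Export \
      usertable /backup/usertable/2012/07/01 \
      1 1341100800000 1341187200000

Here 1 is the number of versions to keep, and the output path anticipates the date-based hierarchy suggested in the conclusion below.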
6. Export (cont.)
• Limitations
– Can only back up to HDFS as a sequence file (implemented as Scan + write to HDFS)
– Rows inserted or updated while the Export is running are not captured atomically, so concurrent edits can leave the dump inconsistent with the source
7. Import
• Purpose:
– Load previously exported data back into HBase
• Usage
– $ bin/hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
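For example, restoring the day exported in the sketch above (same illustrative path; note that Import does not create tables, so the target table must already exist on the cluster):

  $ bin/hbase org.apache.hadoop.hbase.mapreduce.Import \
      usertable /backup/usertable/2012/07/01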
8. Conclusion
• Regular (e.g., daily) incremental backups
– Use Export and organize the output dir as a meaningful hierarchy (a cron-ready sketch follows at the end of this section)
• /table_name
    /2012 (year)
      /07 (month)
        /01 (date)
        /02
        …
        /31
          /01 (hour)
          …
          /24
– Perform Import to restore the data on demand
• To reduce overhead, avoid running the restore during peak hours
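A minimal cron-ready sketch of such a daily incremental Export, assuming the hierarchy above; the table name, backup root, and time window are illustrative assumptions, not part of the slides:

  #!/bin/bash
  # Export the last 24 hours of edits for one table (illustrative sketch).
  TABLE=usertable                             # assumed table name
  OUTDIR=/backup/${TABLE}/$(date +%Y/%m/%d)   # matches the /year/month/date layout
  ENDTIME=$(($(date +%s) * 1000))             # now, in epoch milliseconds
  STARTTIME=$((ENDTIME - 86400000))           # 24 hours earlier

  bin/hbase org.apache.hadoop.hbase.mapreduce.Export \
      "$TABLE" "$OUTDIR" 1 "$STARTTIME" "$ENDTIME"

Scheduled off-peak (e.g., via cron at 02:00), each run scans only one day's window, which keeps the MapReduce job small and the cluster impact low.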