1500 JIRAs in 20 minutes - HBaseCon 2013

1500 JIRAs in 20 Minutes
The Evolution of HBase, 2012-2013
Ian Varley, Salesforce.com
@thefutureian

It's been a year since the
first HBaseCon.
What's changed?

It's been a year since the
first HBaseCon.
What's changed?
(besides my beard length)

One lens on the evolution of
HBase is through JIRA
(issue tracking system).

HBase has a lot of activity.
Total JIRAs, all time: ~8700

Opened in last year: ~2500

Fixed in last year: 1638

Fixed in last year: 1638
resolved >= 2012-05-23
AND resolved <= 2013-05-24
AND resolution in (Fixed, Implemented)

So we're going to talk about
them all. One by one.

First, let's get rid of the nonfunctional changes:

Test:

307

Test: 307
Build: 55

Test: 307
Build: 55
Doc: 107

Test: 307
Build: 55
Doc: 107
Ports: 62

Test: 307
Build: 55
Doc: 107
Ports: 62
Total: 503

(some overlap)

Test: 307
Build: 55
Doc: 107
Ports: 62
Total: 503

(some overlap)

"test", "junit", etc.
"pom", "classpath", "mvn", "build", etc.
"book", "[site]", "[refGuide]", "javadoc", etc.
"backport", "forward port", etc.

That leaves 1135 functional
changes to go over.
(In 18 minutes.)

Break what's left into 2 parts:
● Big Topics (20+ JIRAs on same issue)
● Indie Hits (Cool for some other reason)

Top 10 "big topics":
Snapshots:

82

Snapshots:
Replication:

82
58

Snapshots:
Replication:
Compaction:

82
58
54

Snapshots:
Replication:
Compaction:
Metrics:

82
58
54
53

Snapshots:
Replication:
Compaction:
Metrics:
Assignment:

82
58
54
53
44

Snapshots:
Replication:
Compaction:
Metrics:
Assignment:
Hadoop 2:

82
58
54
53
44
37

Snapshots:
Replication:
Compaction:
Metrics:
Assignment:
Hadoop 2:
Protobufs:

82
58
54
53
44
37
34

Snapshots:
Replication:
Compaction:
Metrics:
Assignment:
Hadoop 2:
Protobufs:
Security:

82
58
54
53
44
37
34
28

Snapshots:
Replication:
Compaction:
Metrics:
Assignment:
Hadoop 2:
Protobufs:
Security:
Bulk Loading:

82
58
54
53
44
37
34
28
23

Snapshots:
Replication:
Compaction:
Metrics:
Assignment:
Hadoop 2:
Protobufs:
Security:
Bulk Loading:
Modularization:

82
58
54
53
44
37
34
28
23
21

Snapshots:
Replication:
Compaction:
Metrics:
Assignment:
Hadoop 2:
Protobufs:
Security:
Bulk Loading:
Modularization:

82
58
54
53
44
37
34
28
23
21
416

(some overlap)

(305 functional, 111 non-functional)

Snapshots:
Replication:
Compaction:
Metrics:
Assignment:
Hadoop 2:
Let's
Protobufs: dive
Security:
Bulk Loading:
Modularization:

82
58
54
53
44
37
in to
34
28
23
21
416

(some overlap)

the top 3.


Snapshots
The gist: Take advantage of the fact that files in HDFS are already immutable
to get fast "snapshots" of tables that you can roll back to. This is pretty tricky
when you consider HBase is a distributed system and you want a point in time.
Top contributors: Matteo B, Jonathan H, Ted Y, Jesse Y, Enis S
Main JIRAs:
● HBASE-6055 - Offline Snapshots: Take a snapshot after first disabling
the table
● HBASE-7290 - Online Snapshots: Take a snapshot of a live, running
table by splitting the memstore.
● HBASE-7360 - Backport Snapshots to 0.94

Replication
The gist: use asynchronous WAL shipping to replay all edits on a different
(possibly remote) cluster, for Disaster Recovery or other operational purposes.
Top contributors: J-D Cryans, Himanshu V, Chris T, Devaraj D, Lars H
Main JIRAs:
● HBASE-1295 - Multi-data-center replication: Top level issue. Real meat
was actually implemented in 0.90 (Jan 2010), so not a new feature.
● HBASE-8207 - Data loss when machine name contains "-". Doh.
● HBASE-2611 - Handle RS failure while processing failure of another:
This was an ugly issue that took a while to fix. Corner cases matter!

Replication
Main JIRAs:

Theme: corner cases!

Replication
Main JIRAs:

Theme: corner cases!
Plug: stick around next while Chris Trezzo tweets about Replication!!

Compaction
The gist: In an LSM store, if you don't compact the store files, you end up with
lots of 'em, which makes reads slower. Not a new feature, just improvements.
Top contributors: Sergey S, Elliott C, Jimmy X, stack, Matteo B, Jesse Y
Main JIRAs:
● HBASE-7516 - Make compaction policy pluggable: allow users to
customize which files are included for compaction.
● HBASE-2231 - Compaction events should be written to HLog: deal with
the case when regions have been reassigned since compaction started.
Look for cool stuff to come in the next year with tiered (aka "leveled")
compaction policies, so you could do stuff like (e.g.) put "recent" data into
smaller files that'll be hit frequently, and the older "long tail" data into bigger
files that'll be hit less frequently.

Compaction
The gist: In an LSM store, if you don't compact the store files, you end up with
lots of 'em, which makes reads slower. Not a new feature, just improvements.
Top contributors: Sergey S, Elliott C, Jimmy X, stack, Matteo B, Jesse Y
Main JIRAs:
● HBASE-7516 - Make compaction policy pluggable: allow users to
customize which files are included for compaction.
● HBASE-2231 - Compaction events should be written to HLog: deal with
corner case!
the case when regions have been reassigned since compaction started.
Look for cool stuff to come in the next year with tiered (aka "leveled")
compaction policies, so you could do stuff like (e.g.) put "recent" data into
smaller files that'll be hit frequently, and the older "long tail" data into bigger
files that'll be hit less frequently.

Snapshots:
Replication:
Compaction:
Metrics
Assignment
Hadoop 2
Protobufs
Security
Bulk Loading
Modularization
416

(some overlap)


Snapshots:
Replication:
Compaction:
Metrics: move to metrics2.
Assignment
Hadoop 2
Protobufs
Security
Bulk Loading
Modularization
416 (305 functional, 111 non-functional)
(some overlap)

Snapshots:
Replication:
Compaction:
Assignment: it's tricky, yo.
Hadoop 2
Protobufs
Security
Bulk Loading
Modularization
(some overlap)

Snapshots:
Replication:
Compaction:
Hadoop 2: support it for HA NN.
Protobufs
Security
Bulk Loading
Modularization
(some overlap)

Snapshots:
Replication:
Compaction:
Protobufs: wire compatibility!
Security
Bulk Loading
Modularization
(some overlap)

Snapshots:
Replication:
Compaction:
Security: kerberos, in the core.
Bulk Loading
Modularization
(some overlap)

Snapshots:
Replication:
Compaction:
Bulk Loading: pop in an HFile.
Modularization
(some overlap)

Snapshots:
Replication:
Compaction:
Bulk Loading: pop in an HFile.
Modularization: break up the code.
(some overlap)

Now on to the
"Indie Hits JIRAs".

What's left? About half.
1638 total - (503 Non-Functional + 305 Categorized Functional) = 830 Remaining

Blocker:
Critical:
Major:
Minor:
Trivial:

31
88
455
206
52
830

What's left? About half.
1638 total - (503 Non-Functional + 305 Categorized Functional) = 830 Remaining

Let's cut out these:

Blocker:
31
Critical:
88
Major:
455
Minor:
206
Trivial:
52
830 573

We can't cover 573 issues.
Let's just hit a few cool ones.

HBASE-5416
HBASE-4676
HBASE-7403
HBASE-1212
HBASE-7801
HBASE-4072
HBASE-3171
HBASE-6868

HBASE-5416

Improve perf of scans with some kinds of filters

By: Max Lapan for original idea & patch, Sergey Shelukhin for final impl

Interesting because: most commented JIRA (200+ human comments!)
What? Avoid loading non-essential CFs until after filters run, big perf gain.
How?
+++ Filter.java:
+ abstract public boolean isFamilyEssential (byte[] name);
+++ HRegion.java:
KeyValueScanner scanner = store.getScanner(scan, entry.getValue());
scanners.add(scanner);
+
if (this.filter == null || !scan.doLoadColumnFamiliesOnDemand()
+
|| this.filter.isFamilyEssential(entry.getKey())) {
+
scanners.add(scanner);
+
} else {
+
joinedScanners .add(scanner);
+
}

To save you some time, allow
me to summarize.

Reenactment ...
Feb 2012:
● Max Lapan: Hey guys, here's a cool patch!

Reenactment ...
Feb 2012:
● Nicolas S: This should be an app detail, not in core.

Reenactment ...
Feb 2012:
● Ted Yu: I fixed your typos while you were asleep!

Reenactment ...
Feb 2012:
● Nick: Not enough utest coverage to put this in core.
● Max: Agree, but I can't find any other way to do this.

Reenactment ...
Feb 2012:
● Nick: Not enough utest coverage to put this in core.
● Max: Agree, but I can't find any other way to do this.
● Kannan: Why don't you try 2-phase w/ multiget?
● Max: OK, ok, I'll try it.

Reenactment ...
May 2012:
● Max: Ran in prod w/ 160-node 300TB cluster. Runs like
a champ, 20x the 2-phase approach. Boom.

Reenactment ...
May 2012:
● Ted: Holy guacamole that's a big patch.

Reenactment ...
May 2012:
July 2012:
● Max: Anybody there? Here's a perf test.
● Ted: Cool!

Reenactment ...
May 2012:
July 2012:
● Max: Anybody there? Here's a perf test.
● Ted: Cool!
Oct 2012:
● Anoop: A coprocessor would make faster.
● Max: We're on 0.90 and can't use CP.
● Stack: -1, FB guys are right about needing more tests.

Reenactment ...
Dec 2012:
● Sergey: I'm on it guys. Rebased on trunk, added the
ability to configure, and integration tests.

Reenactment ...
Dec 2012:
● Stack: Still not enough tests. Some new code even
when disabled? Who's reviewing? Go easy lads.

Reenactment ...
Dec 2012:
● Stack: Still not enough tests. Some new code even
when disabled? Who's reviewing? Go easy lads.
● Ram: I'm on it. Couple improvements, but looks good.

Reenactment ...
Dec 31st, 2012 (while everyone else is partying):
● Lars: Ooh, let's pull this into 0.94! I made a patch.

Reenactment ...
● Lars: ... hold the phone! This slows down a tight loop
case (even when disabled) by 10-20%.

Reenactment ...
● Lars: ... hold the phone! This slows down a tight loop
case (even when disabled) by 10-20%.
● Ted: I optimized the disabled path.
● Lars: Sweet.

Reenactment ...
Jan, 2013:
● Ram: +1, let's commit.
● Ted: Committed to trunk
● Lars: Committed to 0.94.

Reenactment ...
Jan, 2013:
● Ram: +1, let's commit.
● Ted: Committed to trunk
● Lars: Committed to 0.94.
And there was much rejoi....

Reenactment ...
Feb, 2013:
● Dave Latham: Stop the presses! This breaks rolling
upgrade for me b/c I directly implement Filter.

Reenactment ...
Feb, 2013:
● All: Crapface.

Reenactment ...
Feb, 2013:
● All: Crapface.
● Stack: We should back this out. SOMA pride!! Also,
Dave is running world's biggest HBase cluster, FYI.

Reenactment ...
Feb, 2013:
● All: Crapface.
● Lars: Filter is internal. Extend FilterBase maybe?
● Ted: If we take it OUT now, it's also a regression.

Reenactment ...
Feb, 2013:
● All: Crapface.
● Dave: Chill dudes, we can fix by changing our client.

Reenactment ...
Feb, 2013:
● All: Crapface.
● All: Uhh ... change it? Keep it? Change it?

Reenactment ...
Feb, 2013:
● All: Crapface.
● All: Uhh ... change it? Keep it? Change it?
Resolution: Change it (HBASE-7920)

Moral of the story?
● JIRA comments are a great way to learn.
● Do the work to keep new features from
destabilizing core code paths.
● Careful with changing interfaces.

HBASE-4676

Prefix Compression - Trie data block encoding

By: Matt Corgan

Interesting because: most watched (42 watchers), and biggest patch.
What? An optimization to compress what we store for key/value prefixes.
How? ~8000 new lines added! (Originally written in git repo, here)

At SFDC, James Taylor reported seeing 5-15x improvement in Phoenix,
with no degradation in scan performance. Woot!

HBASE-7403

Online Merge

By: Chunhui Shen

Interesting because: It's a cool feature. And went through 33 revisions!
What? The ability to merge regions online and transactionally, just like we
do with splitting regions.
How? The master moves the regions together (on the same regionserver)
and send MERGE RPC to regionserver. Merge happens in a transaction.
Example:
RegionMergeTransaction mt = new RegionMergeTransaction(conf, parent,
midKey)
if (!mt.prepare(services)) return;
try {
mt.execute(server, services);
} catch (IOException ioe) {
try {
mt.rollback(server, services);
return;
} catch (RuntimeException e) {
myAbortable.abort("Failed merge, abort");
}
}

HBASE-1212

Merge tool expects regions to have diff seq ids

By: Jean-Marc Spaggiari

Interesting because: Oldest issue (Feb, 2009) resolved w/ patch this year.
What? With aggregated hfile format, sequence id is written into file, not
along side. In rare case where two store files have same sequence id and
we want to merge the regions, it wouldn't work.
How? In conjucntion with HBASE-7287, removes the code that did this:
--- HRegion.java
List<StoreFile> srcFiles = es.getValue();
if (srcFiles.size() == 2) {
long seqA = srcFiles.get(0).getMaxSequenceId();
long seqB = srcFiles.get(1).getMaxSequenceId();
if (seqA == seqB) {
// Can't have same sequenceid since on open store, this is
what
// distingushes the files (see the map of stores how its keyed
by
// sequenceid).
throw new IOException("Files have same sequenceid: " + seqA);
}
}

HBASE-1212

Merge tool expects regions to have diff seq ids

By: Jean-Marc Spaggiari

Interesting because: Oldest issue (Feb, 2009) resolved w/ patch this year.
What? With aggregated hfile format, sequence id is written into file, not
e!
as
along side. In rare case where two store files have same sequence id and ner c
r
co
we want to merge the regions, it wouldn't work.
How? In conjucntion with HBASE-7287, removes the code that did this:
--- HRegion.java
List<StoreFile> srcFiles = es.getValue();
if (srcFiles.size() == 2) {
long seqA = srcFiles.get(0).getMaxSequenceId();
long seqB = srcFiles.get(1).getMaxSequenceId();
if (seqA == seqB) {
// Can't have same sequenceid since on open store, this is
what
// distingushes the files (see the map of stores how its keyed
by
// sequenceid).
throw new IOException("Files have same sequenceid: " + seqA);
}
}

HBASE-7801

Allow a deferred sync option per Mutation

By: Lars Hofhansl

Interesting because: has durability implications worth blogging about.
What? Previously, you could only turn WAL writing off completely, per table
or edit. Now you can choose "none", "async", "sync" or "fsync".
How?
+++ Mutation.java
+ public void setDurability(Durability d) {
+
setAttribute(DURABILITY_ID_ATTR, Bytes.toBytes(d.ordinal()));
+
this.writeToWAL = d != Durability.SKIP_WAL;
+ }
+++ HRegion.java
+ private void syncOrDefer(long txid, Durability durability) {
+
switch(durability) { ...
+
case SKIP_WAL: // nothing to do
+
break;
+
case ASYNC_WAL: // defer the sync, unless we globally can't
+
if (this.deferredLogSyncDisabled) { this.log.sync(txid); }
+
break;
+
case SYNC_WAL:
+
case FSYNC_WAL:
+
// sync the WAL edit (SYNC and FSYNC treated the same for now)
+
this.log.sync(txid);
+
break;
+
}

HBASE-7801

Allow a deferred sync option per Mutation

By: Lars Hofhansl

Interesting because: has durability implications worth blogging about.
What? Previously, you could only turn WAL writing off completely, per table
or edit. Now you can choose "none", "async", "sync" or "fsync".
How?
+++ Mutation.java
+ public void setDurability(Durability d) {
+
setAttribute(DURABILITY_ID_ATTR, Bytes.toBytes(d.ordinal()));
+
this.writeToWAL = d != Durability.SKIP_WAL;
+ }
+++ HRegion.java
+ private void syncOrDefer(long txid, Durability durability) {
+
switch(durability) { ...
+
case SKIP_WAL: // nothing to do
+
break;
+
case ASYNC_WAL: // defer the sync, unless we globally can't
+
if (this.deferredLogSyncDisabled) { this.log.sync(txid); }
Wha ... ?
+
break;
Oh. See HADOOP-6313
+
case SYNC_WAL:
+
case FSYNC_WAL:
+
// sync the WAL edit (SYNC and FSYNC treated the same for now)
+
this.log.sync(txid);
+
break;
+
}

HBASE-4072

Disable reading zoo.cfg files

By: Harsh J

Interesting because: Biggest facepalm.
What? Used to be, if two system both use ZK and one needed to override
values, the zoo.cfg values would always win. Caused a lot of goofy bugs in
hbase utils like import/export, integration with other systems like flume.
How? Put reading it behind a config that defaults to false.
+
+
+
+

if (conf.getBoolean(HBASE_CONFIG_READ_ZOOKEEPER_CONFIG, false)) {
LOG.warn(
"Parsing zoo.cfg is deprecated. Place all ZK related HBase " +
"configuration under the hbase-site.xml");

HBASE-4072

Disable reading zoo.cfg files

By: Harsh J

Interesting because: Biggest facepalm.
What? Used to be, if two system both use ZK and one needed to override
e!
as
values, the zoo.cfg values would always win. Caused a lot of goofy bugs in r c
ne
or
hbase utils like import/export, integration with other systems like flume.
c
How? Put reading it behind a config that defaults to false.
+
+
+
+

if (conf.getBoolean(HBASE_CONFIG_READ_ZOOKEEPER_CONFIG, false)) {
LOG.warn(
"Parsing zoo.cfg is deprecated. Place all ZK related HBase " +
"configuration under the hbase-site.xml");

HBASE-3171

Drop ROOT, store META location in ZooKeeper

By: J-D Cryans

Interesting because: Only HBase JIRA with a downfall parody.
What? The ROOT just tells you where the META table is. That's silly.
How? Pretty big patch (59 files changed, 580 insertions(+), 1749 deletions(-))

http://www.youtube.com/watch?v=tuM9MYDssvg

HBASE-6868

Avoid double checksumming blocks

By: Lars Hofhansl

Interesting because: tiny fix, but marked as a blocker, and sunk 0.94.2 RC1.
What? since HBASE-5074 (checksums), sometimes we double checksum.
How? 3 line patch to default to skip checksum if not local fs.

r

ne
or

c

+++ HFileSystem.java
// Incorrect data is read and HFileBlocks won't be able to read
// their header magic numbers. See HBASE-5885
if (useHBaseChecksum && !(fs instanceof LocalFileSystem)) {
+
conf = new Configuration(conf);
+
conf.setBoolean("dfs.client.read.shortcircuit.skip.checksum",
true);
this.noChecksumFs = newInstanceFileSystem(conf);
...
+++ HRegionServer.java
// If hbase checksum verification enabled, automatically
//switch off hdfs checksum verification.
this.useHBaseChecksum = conf.getBoolean(
HConstants.HBASE_CHECKSUM_VERIFICATION, true);
+
HConstants.HBASE_CHECKSUM_VERIFICATION, false);

e!

s
ca

What's it all mean?
Active codebase. Good!
Complexity increasing. Bad!

credit: https://www.ohloh.net/p/hbase

One more interesting stat:
"Good on you"s

One more interesting stat:
"Good on you"s
stack

everyone
else

Takeaways?
Busy community.
New features!
Fixing corner cases.

BTW: How did I do this?
JIRA API +
Phoenix on HBase +
http://github.com/ivarley/jirachi

1500 JIRAs in 20 minutes - HBaseCon 2013

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to 1500 JIRAs in 20 minutes - HBaseCon 2013

Similar to 1500 JIRAs in 20 minutes - HBaseCon 2013 (20)

Recently uploaded

Recently uploaded (20)

1500 JIRAs in 20 minutes - HBaseCon 2013