1500 JIRAs in 20 Minutes
The Evolution of HBase, 2012-2013
Ian Varley, Salesforce.com
@thefutureian
It's been a year since the
first HBaseCon.
What's changed?
(besides my beard length)
One lens on the evolution of
HBase is through JIRA
(issue tracking system).
HBase has a lot of activity.
Total JIRAs, all time: ~8700
Opened in last year: ~2500
Fixed in last year: 1638
resolved >= 2012-05-23
AND resolved <= 2013-05-24
AND resolution in (Fixed, Implemented)
So we're going to talk about
them all. One by one.
We need to narrow it down.
First, let's get rid of the nonfunctional changes:
Test: 307 ("test", "junit", etc.)
Build: 55 ("pom", "classpath", "mvn", "build", etc.)
Doc: 107 ("book", "[site]", "[refGuide]", "javadoc", etc.)
Ports: 62 ("backport", "forward port", etc.)
Total: 503 (some overlap)
That leaves 1135 functional
changes to go over.
(In 18 minutes.)
Break what's left into 2 parts:
● Big Topics (20+ JIRAs on same issue)
● Indie Hits (Cool for some other reason)
Top 10 "big topics":
Snapshots: 82
Replication: 58
Compaction: 54
Metrics: 53
Assignment: 44
Hadoop 2: 37
Protobufs: 34
Security: 28
Bulk Loading: 23
Modularization: 21
Total: 416 (some overlap; 305 functional, 111 non-functional)

Let's dive in to the top 3.
Snapshots
The gist: Take advantage of the fact that files in HDFS are already immutable
to get fast "snapshots" of tables that you can roll back to. This is pretty tricky
when you consider HBase is a distributed system and you want a point in time.
Top contributors: Matteo B, Jonathan H, Ted Y, Jesse Y, Enis S
Main JIRAs:
● HBASE-6055 - Offline Snapshots: Take a snapshot after first disabling
the table
● HBASE-7290 - Online Snapshots: Take a snapshot of a live, running
table by splitting the memstore.
● HBASE-7360 - Backport Snapshots to 0.94
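Since the slide only gestures at why immutable HDFS files make snapshots cheap, here's a toy sketch (plain Java, emphatically not HBase's actual snapshot code): a snapshot is just a recorded list of references to the current store files, and restore means pointing the table back at that list. No data is copied, because the files themselves never change.

```java
import java.util.*;

// Toy model: a "table" owns immutable store files; a snapshot is just
// a copy of the current file list (the files themselves never change).
class SnapshotDemo {
    static List<String> storeFiles =
        new ArrayList<>(Arrays.asList("hfile-1", "hfile-2"));
    static Map<String, List<String>> snapshots = new HashMap<>();

    static void snapshot(String name) {
        // O(number of files), no data copy: record references only.
        snapshots.put(name, new ArrayList<>(storeFiles));
    }

    static void restore(String name) {
        // Point the table back at the snapshotted file set.
        storeFiles = new ArrayList<>(snapshots.get(name));
    }
}
```

Flushes and splits only ever add new files, so the recorded references stay valid; the hard part the slide alludes to (and real HBase solves) is coordinating a point-in-time cut across many regionservers, plus keeping referenced files from being cleaned up.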
Replication
The gist: use asynchronous WAL shipping to replay all edits on a different
(possibly remote) cluster, for Disaster Recovery or other operational purposes.
Top contributors: J-D Cryans, Himanshu V, Chris T, Devaraj D, Lars H
Main JIRAs:
● HBASE-1295 - Multi-data-center replication: Top level issue. Real meat
was actually implemented in 0.90 (Jan 2010), so not a new feature.
● HBASE-8207 - Data loss when machine name contains "-". Doh.
● HBASE-2611 - Handle RS failure while processing failure of another:
This was an ugly issue that took a while to fix. Corner cases matter!

Theme: corner cases!
Plug: stick around next, while Chris Trezzo tweets about Replication!!
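The mechanism in a few lines of toy Java (not HBase's classes): the source cluster appends every edit to its WAL as usual; a shipper tails that log and replays entries on the peer in order. Because shipping is asynchronous, the peer can lag, but the source never blocks on it.

```java
import java.util.*;

// Toy model of async WAL shipping: writes land in the source WAL
// immediately; a separate "shipper" drains the backlog to the peer.
class ReplicationDemo {
    static List<String> sourceWal = new ArrayList<>();
    static List<String> peer = new ArrayList<>();
    static int shippedUpTo = 0; // the shipper's position in the WAL

    static void write(String edit) {
        // Returns to the client without waiting on the peer at all.
        sourceWal.add(edit);
    }

    static void runShipperOnce() {
        // Replay everything appended since the last run, in order.
        while (shippedUpTo < sourceWal.size()) {
            peer.add(sourceWal.get(shippedUpTo++));
        }
    }
}
```

The corner cases the slide mentions live exactly here: what happens to `shippedUpTo` (in real HBase, queue state in ZooKeeper) when the shipping regionserver dies mid-drain.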
Compaction
The gist: In an LSM store, if you don't compact the store files, you end up with
lots of 'em, which makes reads slower. Not a new feature, just improvements.
Top contributors: Sergey S, Elliott C, Jimmy X, stack, Matteo B, Jesse Y
Main JIRAs:
● HBASE-7516 - Make compaction policy pluggable: allow users to
customize which files are included for compaction.
● HBASE-2231 - Compaction events should be written to HLog: deal with
the case when regions have been reassigned since compaction started.
(corner case!)
Look for cool stuff to come in the next year with tiered (aka "leveled")
compaction policies, so you could do stuff like (e.g.) put "recent" data into
smaller files that'll be hit frequently, and the older "long tail" data into bigger
files that'll be hit less frequently.
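To make the "pluggable policy" idea concrete, here's a minimal sketch (toy Java; HBASE-7516's real interface is richer): a read has to consult every store file, and the policy's only job is deciding which files to merge so that count stays bounded.

```java
import java.util.*;

// Toy pluggable compaction: each Integer is a store file's size. Reads
// must touch every file, so fewer files = less read amplification.
class CompactionDemo {
    interface Policy { List<Integer> select(List<Integer> files); }

    // Example policy: once there are more than 3 files, merge the 3 smallest.
    static Policy mergeSmallest = files -> {
        if (files.size() <= 3) return Collections.emptyList();
        List<Integer> sorted = new ArrayList<>(files);
        Collections.sort(sorted);
        return new ArrayList<>(sorted.subList(0, 3));
    };

    static List<Integer> compact(List<Integer> files, Policy p) {
        List<Integer> picked = p.select(files);
        if (picked.isEmpty()) return files; // policy says: nothing to do
        List<Integer> out = new ArrayList<>(files);
        int merged = 0;
        for (int f : picked) {
            out.remove(Integer.valueOf(f));
            merged += f;
        }
        out.add(merged); // one bigger file replaces the merged ones
        return out;
    }
}
```

A tiered/leveled policy, in these terms, is just a `select` that groups files by age or size band instead of always grabbing the smallest ones.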
Top 10 "big topics":
Snapshots:
Replication:
Compaction:
Metrics: move to metrics2.
Assignment: it's tricky, yo.
Hadoop 2: support it for HA NN.
Protobufs: wire compatibility!
Security: kerberos, in the core.
Bulk Loading: pop in an HFile.
Modularization: break up the code.
416 total (some overlap; 305 functional, 111 non-functional)
Now on to the
"Indie Hits JIRAs".
What's left? About half.
1638 total - (503 Non-Functional + 305 Categorized Functional) = 830 Remaining

Blocker: 31
Critical: 88
Major: 455
Minor: 206
Trivial: 52
Total: 830

Let's cut out some of these: 830 → 573
We can't cover 573 issues.
Let's just hit a few cool ones.
HBASE-5416
HBASE-4676
HBASE-7403
HBASE-1212
HBASE-7801
HBASE-4072
HBASE-3171
HBASE-6868
HBASE-5416

Improve perf of scans with some kinds of filters

By: Max Lapan for original idea & patch, Sergey Shelukhin for final impl

Interesting because: most commented JIRA (200+ human comments!)
What? Avoid loading non-essential CFs until after filters run, big perf gain.
How?
+++ Filter.java:
+ abstract public boolean isFamilyEssential(byte[] name);
+++ HRegion.java:
  KeyValueScanner scanner = store.getScanner(scan, entry.getValue());
- scanners.add(scanner);
+ if (this.filter == null || !scan.doLoadColumnFamiliesOnDemand()
+     || this.filter.isFamilyEssential(entry.getKey())) {
+   scanners.add(scanner);
+ } else {
+   joinedScanners.add(scanner);
+ }
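In toy form (not the real scanner code), the win looks like this: run the filter against only the "essential" column families, and pay the cost of fetching the rest of the row only for rows that pass.

```java
import java.util.*;

// Toy model of HBASE-5416's idea: rows have an "essential" CF (small,
// enough for the filter) and a "joined" CF (big, loaded on demand).
class LazyCfDemo {
    static int expensiveLoads = 0; // how many times we touched the big CF

    static List<String> scan(Map<String, Integer> essentialCf,
                             Map<String, String> joinedCf,
                             int threshold) {
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, Integer> row : essentialCf.entrySet()) {
            // Filter runs on the essential CF only.
            if (row.getValue() < threshold) continue; // joined CF never loaded
            expensiveLoads++;                         // only now touch the big CF
            results.add(joinedCf.get(row.getKey()));
        }
        return results;
    }
}
```

If the filter is selective, `expensiveLoads` stays far below the row count, which is the "big perf gain" the slide claims.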
200 comments? Srsly?
From whom?
To save you some time, allow
me to summarize.
Reenactment ...
Feb 2012:
● Max Lapan: Hey guys, here's a cool patch!
Reenactment ...
Feb 2012:
● Max Lapan: Hey guys, here's a cool patch!
● Nicolas S: This should be an app detail, not in core.
Reenactment ...
Feb 2012:
● Max Lapan: Hey guys, here's a cool patch!
● Nicolas S: This should be an app detail, not in core.
● Ted Yu: I fixed your typos while you were asleep!
Reenactment ...
Feb 2012:
● Max Lapan: Hey guys, here's a cool patch!
● Nicolas S: This should be an app detail, not in core.
● Ted Yu: I fixed your typos while you were asleep!
● Nick: Not enough utest coverage to put this in core.
● Max: Agree, but I can't find any other way to do this.
Reenactment ...
Feb 2012:
● Max Lapan: Hey guys, here's a cool patch!
● Nicolas S: This should be an app detail, not in core.
● Ted Yu: I fixed your typos while you were asleep!
● Nick: Not enough utest coverage to put this in core.
● Max: Agree, but I can't find any other way to do this.
● Kannan: Why don't you try 2-phase w/ multiget?
● Max: OK, ok, I'll try it.
Reenactment ...
May 2012:
● Max: Ran in prod w/ 160-node 300TB cluster. Runs like
a champ, 20x the 2-phase approach. Boom.
● Ted: Holy guacamole that's a big patch.
July 2012:
● Max: Anybody there? Here's a perf test.
● Ted: Cool!
Oct 2012:
● Anoop: A coprocessor would make this faster.
● Max: We're on 0.90 and can't use CP.
● Stack: -1, FB guys are right about needing more tests.
Reenactment ...
Dec 2012:
● Sergey: I'm on it guys. Rebased on trunk, added the
ability to configure, and integration tests.
● Stack: Still not enough tests. Some new code even
when disabled? Who's reviewing? Go easy lads.
● Ram: I'm on it. Couple improvements, but looks good.
Reenactment ...
Dec 31st, 2012 (while everyone else is partying):
● Lars: Ooh, let's pull this into 0.94! I made a patch.
● Lars: ... hold the phone! This slows down a tight loop
case (even when disabled) by 10-20%.
● Ted: I optimized the disabled path.
● Lars: Sweet.
Reenactment ...
Jan, 2013:
● Ram: +1, let's commit.
● Ted: Committed to trunk.
● Lars: Committed to 0.94.
And there was much rejoi....
Reenactment ...
Feb, 2013:
● Dave Latham: Stop the presses! This breaks rolling
upgrade for me b/c I directly implement Filter.
● All: Crapface.
● Stack: We should back this out. SOMA pride!! Also,
Dave is running world's biggest HBase cluster, FYI.
● Lars: Filter is internal. Extend FilterBase maybe?
● Ted: If we take it OUT now, it's also a regression.
● Dave: Chill dudes, we can fix by changing our client.
● All: Uhh ... change it? Keep it? Change it?
Resolution: Change it (HBASE-7920)
Moral of the story?
● JIRA comments are a great way to learn.
● Do the work to keep new features from
destabilizing core code paths.
● Careful with changing interfaces.
HBASE-4676

Prefix Compression - Trie data block encoding

By: Matt Corgan

Interesting because: most watched (42 watchers), and biggest patch.
What? An optimization to compress what we store for key/value prefixes.
How? ~8000 new lines added! (Originally developed in a separate git repo.)

At SFDC, James Taylor reported seeing 5-15x improvement in Phoenix,
with no degradation in scan performance. Woot!
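The core trick behind the numbers: HFile keys are sorted, so each key shares a long prefix with its predecessor, and you can store just the shared-prefix length plus the differing suffix. Here's a hedged sketch of that idea in its simplest form, front coding (Matt Corgan's actual patch builds a trie and is far more sophisticated):

```java
import java.util.*;

// Front coding for a sorted key list: emit (sharedPrefixLen, suffix)
// instead of each full key. HBASE-4676 generalizes this to a trie.
class PrefixCodingDemo {
    static List<String> encode(List<String> sortedKeys) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (String k : sortedKeys) {
            // Count how many leading chars this key shares with the previous one.
            int shared = 0;
            while (shared < Math.min(prev.length(), k.length())
                   && prev.charAt(shared) == k.charAt(shared)) {
                shared++;
            }
            out.add(shared + ":" + k.substring(shared));
            prev = k;
        }
        return out;
    }
}
```

For typical HBase row keys (long, shared prefixes like timestamps or tenant ids), almost every key collapses to a count plus a couple of bytes, which is where the 5-15x figure comes from.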
HBASE-7403

Online Merge

By: Chunhui Shen

Interesting because: It's a cool feature. And went through 33 revisions!
What? The ability to merge regions online and transactionally, just like we
do with splitting regions.
How? The master moves the regions together (onto the same regionserver)
and sends a MERGE RPC to the regionserver. The merge happens in a transaction.
Example:
RegionMergeTransaction mt =
    new RegionMergeTransaction(conf, parent, midKey);
if (!mt.prepare(services)) return;
try {
  mt.execute(server, services);
} catch (IOException ioe) {
  try {
    mt.rollback(server, services);
    return;
  } catch (RuntimeException e) {
    myAbortable.abort("Failed merge, abort");
  }
}
HBASE-1212

Merge tool expects regions to have diff seq ids

By: Jean-Marc Spaggiari

Interesting because: Oldest issue (Feb, 2009) resolved w/ patch this year.
What? With the aggregated hfile format, the sequence id is written into the file,
not alongside it. In the rare case where two store files have the same sequence id
and we want to merge the regions, it wouldn't work. (corner case!)
How? In conjunction with HBASE-7287, removes the code that did this:
--- HRegion.java
List<StoreFile> srcFiles = es.getValue();
if (srcFiles.size() == 2) {
  long seqA = srcFiles.get(0).getMaxSequenceId();
  long seqB = srcFiles.get(1).getMaxSequenceId();
  if (seqA == seqB) {
    // Can't have same sequenceid since on open store, this is what
    // distinguishes the files (see the map of stores how its keyed by
    // sequenceid).
    throw new IOException("Files have same sequenceid: " + seqA);
  }
}
HBASE-7801

Allow a deferred sync option per Mutation

By: Lars Hofhansl

Interesting because: has durability implications worth blogging about.
What? Previously, you could only turn WAL writing off completely, per table
or edit. Now you can choose "none", "async", "sync" or "fsync".
How?
+++ Mutation.java
+ public void setDurability(Durability d) {
+   setAttribute(DURABILITY_ID_ATTR, Bytes.toBytes(d.ordinal()));
+   this.writeToWAL = d != Durability.SKIP_WAL;
+ }
+++ HRegion.java
+ private void syncOrDefer(long txid, Durability durability) {
+   switch (durability) { ...
+     case SKIP_WAL: // nothing to do
+       break;
+     case ASYNC_WAL: // defer the sync, unless we globally can't
+       if (this.deferredLogSyncDisabled) { this.log.sync(txid); }
+       break;
+     case SYNC_WAL:
+     case FSYNC_WAL:
+       // sync the WAL edit (SYNC and FSYNC treated the same for now)
+       this.log.sync(txid);
+       break;
+   }
(SYNC and FSYNC the same? Wha ... ? Oh. See HADOOP-6313.)
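To see what the per-edit decision buys you, here's a toy model of the `syncOrDefer` logic above (plain Java, not HBase's class), counting how many WAL syncs a batch of writes actually pays for:

```java
// Toy model of syncOrDefer: decide per-edit whether to sync the WAL now.
class DurabilityDemo {
    enum Durability { SKIP_WAL, ASYNC_WAL, SYNC_WAL, FSYNC_WAL }

    static int syncs = 0; // number of immediate (blocking) WAL syncs

    static void syncOrDefer(Durability d, boolean deferredSyncDisabled) {
        switch (d) {
            case SKIP_WAL:
                break;                             // edit never hits the WAL
            case ASYNC_WAL:
                if (deferredSyncDisabled) syncs++; // can't defer: sync now
                break;                             // else a background thread syncs later
            case SYNC_WAL:
            case FSYNC_WAL:
                syncs++;                           // SYNC and FSYNC treated the same for now
                break;
        }
    }
}
```

The tradeoff is exactly what the slide hints at: ASYNC_WAL edits return faster but can be lost if the regionserver dies before the deferred sync runs, so you'd reserve it for data you can afford to replay.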
HBASE-4072

Disable reading zoo.cfg files

By: Harsh J

Interesting because: Biggest facepalm.
What? Used to be, if two systems both use ZK and one needed to override
values, the zoo.cfg values would always win. Caused a lot of goofy bugs in
hbase utils like import/export, and integration with other systems like flume.
(corner case!)
How? Put reading it behind a config that defaults to false.
+ if (conf.getBoolean(HBASE_CONFIG_READ_ZOOKEEPER_CONFIG, false)) {
+   LOG.warn(
+     "Parsing zoo.cfg is deprecated. Place all ZK related HBase " +
+     "configuration under the hbase-site.xml");
HBASE-3171

Drop ROOT, store META location in ZooKeeper

By: J-D Cryans

Interesting because: Only HBase JIRA with a Downfall parody.
What? The ROOT table just tells you where the META table is. That's silly.
How? Pretty big patch (59 files changed, 580 insertions(+), 1749 deletions(-))

http://www.youtube.com/watch?v=tuM9MYDssvg
HBASE-6868

Avoid double checksumming blocks

By: Lars Hofhansl

Interesting because: tiny fix, but marked as a blocker, and sunk 0.94.2 RC1.
What? Since HBASE-5074 (checksums), sometimes we double checksum. (corner case!)
How? 3-line patch to default to skipping checksums when not on the local fs.
+++ HFileSystem.java
  // Incorrect data is read and HFileBlocks won't be able to read
  // their header magic numbers. See HBASE-5885
  if (useHBaseChecksum && !(fs instanceof LocalFileSystem)) {
+   conf = new Configuration(conf);
+   conf.setBoolean("dfs.client.read.shortcircuit.skip.checksum", true);
    this.noChecksumFs = newInstanceFileSystem(conf);
  ...
+++ HRegionServer.java
  // If hbase checksum verification enabled, automatically
  // switch off hdfs checksum verification.
  this.useHBaseChecksum = conf.getBoolean(
-   HConstants.HBASE_CHECKSUM_VERIFICATION, true);
+   HConstants.HBASE_CHECKSUM_VERIFICATION, false);
What's it all mean?
Active codebase. Good!
Complexity increasing. Bad!

credit: https://www.ohloh.net/p/hbase
One more interesting stat:
"Good on you"s
(chart: stack vs. everyone else)
Takeaways?
Busy community.
New features!
Fixing corner cases.
BTW: How did I do this?
JIRA API +
Phoenix on HBase +
http://github.com/ivarley/jirachi
Thanks!
@thefutureian

Ensuring Quality in Data Lakes (D&D Meetup Feb 22)lakeFS
 
Keynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseKeynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseHBaseCon
 
Big data hadooop analytic and data warehouse comparison guide
Big data hadooop analytic and data warehouse comparison guideBig data hadooop analytic and data warehouse comparison guide
Big data hadooop analytic and data warehouse comparison guideDanairat Thanabodithammachari
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideDanairat Thanabodithammachari
 
S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?Hortonworks
 
Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP vinoth kumar
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010nzhang
 
Hadoop and Pig at Twitter__HadoopSummit2010
Hadoop and Pig at Twitter__HadoopSummit2010Hadoop and Pig at Twitter__HadoopSummit2010
Hadoop and Pig at Twitter__HadoopSummit2010Yahoo Developer Network
 
Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & HadoopEdureka!
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDataWorks Summit
 
Refactoring to Scala DSLs and LiftOff 2009 Recap
Refactoring to Scala DSLs and LiftOff 2009 RecapRefactoring to Scala DSLs and LiftOff 2009 Recap
Refactoring to Scala DSLs and LiftOff 2009 RecapDave Orme
 
On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay PlatformsOn Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay PlatformsTokyo University of Science
 
Bringing OLTP woth OLAP: Lumos on Hadoop
Bringing OLTP woth OLAP: Lumos on HadoopBringing OLTP woth OLAP: Lumos on Hadoop
Bringing OLTP woth OLAP: Lumos on HadoopDataWorks Summit
 

Similar to 1500 JIRAs in 20 minutes - HBaseCon 2013 (20)

NameNode Analytics - Querying HDFS Namespace in Real Time
NameNode Analytics - Querying HDFS Namespace in Real TimeNameNode Analytics - Querying HDFS Namespace in Real Time
NameNode Analytics - Querying HDFS Namespace in Real Time
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo ppt
 
Data Virtualization: Revolutionizing data cloning
Data Virtualization: Revolutionizing data cloningData Virtualization: Revolutionizing data cloning
Data Virtualization: Revolutionizing data cloning
 
Agile Database Modeling with Grails - Preview of GORM 1.4 - SF Grails Meetup ...
Agile Database Modeling with Grails - Preview of GORM 1.4 - SF Grails Meetup ...Agile Database Modeling with Grails - Preview of GORM 1.4 - SF Grails Meetup ...
Agile Database Modeling with Grails - Preview of GORM 1.4 - SF Grails Meetup ...
 
Ensuring Quality in Data Lakes (D&D Meetup Feb 22)
Ensuring Quality in Data Lakes  (D&D Meetup Feb 22)Ensuring Quality in Data Lakes  (D&D Meetup Feb 22)
Ensuring Quality in Data Lakes (D&D Meetup Feb 22)
 
The Future of Hbase
The Future of HbaseThe Future of Hbase
The Future of Hbase
 
Keynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseKeynote: The Future of Apache HBase
Keynote: The Future of Apache HBase
 
Big data hadooop analytic and data warehouse comparison guide
Big data hadooop analytic and data warehouse comparison guideBig data hadooop analytic and data warehouse comparison guide
Big data hadooop analytic and data warehouse comparison guide
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guide
 
S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?
 
Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
 
Hadoop and Pig at Twitter__HadoopSummit2010
Hadoop and Pig at Twitter__HadoopSummit2010Hadoop and Pig at Twitter__HadoopSummit2010
Hadoop and Pig at Twitter__HadoopSummit2010
 
Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & Hadoop
 
HDInsight for Architects
HDInsight for ArchitectsHDInsight for Architects
HDInsight for Architects
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
 
Refactoring to Scala DSLs and LiftOff 2009 Recap
Refactoring to Scala DSLs and LiftOff 2009 RecapRefactoring to Scala DSLs and LiftOff 2009 Recap
Refactoring to Scala DSLs and LiftOff 2009 Recap
 
On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay PlatformsOn Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
 
Bringing OLTP woth OLAP: Lumos on Hadoop
Bringing OLTP woth OLAP: Lumos on HadoopBringing OLTP woth OLAP: Lumos on Hadoop
Bringing OLTP woth OLAP: Lumos on Hadoop
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 

Recently uploaded

Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 

Recently uploaded (20)

Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 

1500 JIRAs in 20 minutes - HBaseCon 2013

  • 1. 1500 JIRAs in 20 Minutes The Evolution of HBase, 2012-2013 Ian Varley, Salesforce.com @thefutureian
  • 2. It's been a year since the first HBaseCon. What's changed?
  • 3. It's been a year since the first HBaseCon. What's changed? (besides my beard length)
  • 4. One lens on the evolution of HBase is through JIRA (issue tracking system).
  • 5. HBase has a lot of activity.
  • 6. HBase has a lot of activity. Total JIRAs, all time: ~8700
  • 7. HBase has a lot of activity. Total JIRAs, all time: ~8700 Opened in last year: ~2500
  • 8.
  • 9. HBase has a lot of activity. Total JIRAs, all time: ~8700 Opened in last year: ~2500 Fixed in last year: 1638
  • 10. HBase has a lot of activity. Total JIRAs, all time: ~8700 Opened in last year: ~2500 Fixed in last year: 1638 resolved >= 2012-05-23 AND resolved <= 2013-05-24 AND resolution in (Fixed, Implemented)
  • 11. So we're going to talk about them all. One by one.
  • 12.
  • 13. We need to narrow it down.
  • 14. First, let's get rid of the nonfunctional changes:
  • 15. First, let's get rid of the nonfunctional changes: Test: 307
  • 16. First, let's get rid of the nonfunctional changes: Test: 307 Build: 55
  • 17. First, let's get rid of the nonfunctional changes: Test: 307 Build: 55 Doc: 107
  • 18. First, let's get rid of the nonfunctional changes: Test: 307 Build: 55 Doc: 107 Ports: 62
  • 19. First, let's get rid of the nonfunctional changes: Test: 307 Build: 55 Doc: 107 Ports: 62 Total: 503 (some overlap)
  • 20. First, let's get rid of the nonfunctional changes: Test: 307 Build: 55 Doc: 107 Ports: 62 Total: 503 (some overlap) "test", "junit", etc. "pom", "classpath", "mvn", "build", etc. "book", "[site]", "[refGuide]", "javadoc", etc. "backport", "forward port", etc.
  • 21. That leaves 1135 functional changes to go over. (In 18 minutes.)
  • 22. Break what's left into 2 parts: ● Big Topics (20+ JIRAs on same issue) ● Indie Hits (Cool for some other reason)
  • 23. Top 10 "big topics":
  • 24. Top 10 "big topics":
  • 25. Top 10 "big topics": Snapshots: 82
  • 26. Top 10 "big topics": Snapshots: 82 Replication: 58
  • 27. Top 10 "big topics": Snapshots: 82 Replication: 58 Compaction: 54
  • 28. Top 10 "big topics": Snapshots: 82 Replication: 58 Compaction: 54 Metrics: 53
  • 29. Top 10 "big topics": Snapshots: 82 Replication: 58 Compaction: 54 Metrics: 53 Assignment: 44
  • 30. Top 10 "big topics": Snapshots: 82 Replication: 58 Compaction: 54 Metrics: 53 Assignment: 44 Hadoop 2: 37
  • 31. Top 10 "big topics": Snapshots: 82 Replication: 58 Compaction: 54 Metrics: 53 Assignment: 44 Hadoop 2: 37 Protobufs: 34
  • 32. Top 10 "big topics": Snapshots: 82 Replication: 58 Compaction: 54 Metrics: 53 Assignment: 44 Hadoop 2: 37 Protobufs: 34 Security: 28
  • 33. Top 10 "big topics": Snapshots: 82 Replication: 58 Compaction: 54 Metrics: 53 Assignment: 44 Hadoop 2: 37 Protobufs: 34 Security: 28 Bulk Loading: 23
  • 34. Top 10 "big topics": Snapshots: 82 Replication: 58 Compaction: 54 Metrics: 53 Assignment: 44 Hadoop 2: 37 Protobufs: 34 Security: 28 Bulk Loading: 23 Modularization: 21
  • 35. Top 10 "big topics": Snapshots: 82 Replication: 58 Compaction: 54 Metrics: 53 Assignment: 44 Hadoop 2: 37 Protobufs: 34 Security: 28 Bulk Loading: 23 Modularization: 21 Total: 416 (some overlap) (305 functional, 111 non-functional)
  • 36. Top 10 "big topics": Snapshots: 82 Replication: 58 Compaction: 54 Metrics: 53 Assignment: 44 Hadoop 2: 37 Protobufs: 34 Security: 28 Bulk Loading: 23 Modularization: 21 Total: 416 (some overlap) (305 functional, 111 non-functional) Let's dive in to the top 3.
  • 37. Snapshots The gist: Take advantage of the fact that files in HDFS are already immutable to get fast "snapshots" of tables that you can roll back to. This is pretty tricky when you consider HBase is a distributed system and you want a point in time. Top contributors: Matteo B, Jonathan H, Ted Y, Jesse Y, Enis S Main JIRAs: ● HBASE-6055 - Offline Snapshots: Take a snapshot after first disabling the table ● HBASE-7290 - Online Snapshots: Take a snapshot of a live, running table by splitting the memstore. ● HBASE-7360 - Backport Snapshots to 0.94
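To make the "immutable files" point concrete, here is a minimal, hypothetical Java sketch (invented names, not HBase internals): because store files are never rewritten, a snapshot can be just a saved manifest of file references, and a restore re-links that manifest without copying any data.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not HBase code: store files are immutable, so a
// snapshot can be a manifest of file names -- no data is copied.
public class SnapshotSketch {
    static Set<String> table = new HashSet<>(List.of("hfile-1", "hfile-2"));
    static Map<String, Set<String>> snapshots = new HashMap<>();

    static void snapshot(String name) {
        // Record which immutable files the table references right now.
        snapshots.put(name, new HashSet<>(table));
    }

    static void restore(String name) {
        // Roll back by re-linking the old file set; nothing is rewritten.
        table = new HashSet<>(snapshots.get(name));
    }

    public static void main(String[] args) {
        snapshot("snap1");
        table.add("hfile-3");   // later flushes/compactions add new files
        restore("snap1");       // point-in-time rollback
        System.out.println(table);
    }
}
```

The hard part the JIRAs deal with is doing this consistently across a distributed cluster; the sketch shows only the single-node intuition.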
  • 38. Replication The gist: use asynchronous WAL shipping to replay all edits on a different (possibly remote) cluster, for Disaster Recovery or other operational purposes. Top contributors: J-D Cryans, Himanshu V, Chris T, Devaraj D, Lars H Main JIRAs: ● HBASE-1295 - Multi-data-center replication: Top level issue. Real meat was actually implemented in 0.90 (Jan 2010), so not a new feature. ● HBASE-8207 - Data loss when machine name contains "-". Doh. ● HBASE-2611 - Handle RS failure while processing failure of another: This was an ugly issue that took a while to fix. Corner cases matter!
  • 39. Replication The gist: use asynchronous WAL shipping to replay all edits on a different (possibly remote) cluster, for Disaster Recovery or other operational purposes. Top contributors: J-D Cryans, Himanshu V, Chris T, Devaraj D, Lars H Main JIRAs: ● HBASE-1295 - Multi-data-center replication: Top level issue. Real meat was actually implemented in 0.90 (Jan 2010), so not a new feature. ● HBASE-8207 - Data loss when machine name contains "-". Doh. ● HBASE-2611 - Handle RS failure while processing failure of another: This was an ugly issue that took a while to fix. Corner cases matter! Theme: corner cases!
  • 40. Replication The gist: use asynchronous WAL shipping to replay all edits on a different (possibly remote) cluster, for Disaster Recovery or other operational purposes. Top contributors: J-D Cryans, Himanshu V, Chris T, Devaraj D, Lars H Main JIRAs: ● HBASE-1295 - Multi-data-center replication: Top level issue. Real meat was actually implemented in 0.90 (Jan 2010), so not a new feature. ● HBASE-8207 - Data loss when machine name contains "-". Doh. ● HBASE-2611 - Handle RS failure while processing failure of another: This was an ugly issue that took a while to fix. Corner cases matter! Theme: corner cases! Plug: stick around next while Chris Trezzo tweets about Replication!!
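The WAL-shipping idea can be sketched in a few lines of hypothetical Java (invented names, not HBase code): writes append to a local WAL and return immediately, while a shipper asynchronously replays pending edits on the peer cluster, tracking the last shipped offset so it can resume after a failure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not HBase internals: asynchronous WAL shipping.
public class ReplicationSketch {
    static List<String> wal = new ArrayList<>();   // source cluster's WAL
    static List<String> peer = new ArrayList<>();  // peer cluster's state
    static int shipped = 0;                        // offset of last shipped edit

    static void put(String edit) {
        wal.add(edit);   // client returns here; replication happens later
    }

    static void ship() {
        // Replay everything not yet shipped, in WAL order.
        while (shipped < wal.size()) {
            peer.add(wal.get(shipped));
            shipped++;
        }
    }

    public static void main(String[] args) {
        put("row1=a");
        put("row2=b");
        ship();          // peer catches up
        put("row3=c");   // written, but not replicated yet
        System.out.println("peer=" + peer + " lag=" + (wal.size() - shipped));
    }
}
```

The corner cases in the JIRAs above are exactly what this sketch ignores: what happens to `shipped` state when the region server doing the shipping dies mid-stream.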
  • 41. Compaction The gist: In an LSM store, if you don't compact the store files, you end up with lots of 'em, which makes reads slower. Not a new feature, just improvements. Top contributors: Sergey S, Elliott C, Jimmy X, stack, Matteo B, Jesse Y Main JIRAs: ● HBASE-7516 - Make compaction policy pluggable: allow users to customize which files are included for compaction. ● HBASE-2231 - Compaction events should be written to HLog: deal with the case when regions have been reassigned since compaction started. Look for cool stuff to come in the next year with tiered (aka "leveled") compaction policies, so you could do stuff like (e.g.) put "recent" data into smaller files that'll be hit frequently, and the older "long tail" data into bigger files that'll be hit less frequently.
  • 42. Compaction The gist: In an LSM store, if you don't compact the store files, you end up with lots of 'em, which makes reads slower. Not a new feature, just improvements. Top contributors: Sergey S, Elliott C, Jimmy X, stack, Matteo B, Jesse Y Main JIRAs: ● HBASE-7516 - Make compaction policy pluggable: allow users to customize which files are included for compaction. ● HBASE-2231 - Compaction events should be written to HLog: deal with the case when regions have been reassigned since compaction started. (corner case!) Look for cool stuff to come in the next year with tiered (aka "leveled") compaction policies, so you could do stuff like (e.g.) put "recent" data into smaller files that'll be hit frequently, and the older "long tail" data into bigger files that'll be hit less frequently.
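A pluggable policy's core job is file selection. Here is a deliberately simplified, hypothetical Java sketch of ratio-based selection (invented names; the real HBase policies are more involved): walking from largest to smallest, pick the suffix of files where each file is at most `ratio` times the combined size of the smaller files after it, so similar-sized files get merged while one huge old file is left alone.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of ratio-based compaction selection (simplified,
// not the real HBase implementation).
public class CompactionSketch {
    static List<Long> select(List<Long> sizes, double ratio) {
        List<Long> sorted = new ArrayList<>(sizes);
        sorted.sort(Collections.reverseOrder());  // largest first
        for (int i = 0; i < sorted.size(); i++) {
            long smallerSum = 0;
            for (int j = i + 1; j < sorted.size(); j++) {
                smallerSum += sorted.get(j);
            }
            // A file qualifies if it isn't much bigger than everything
            // smaller than it; take that whole suffix for compaction.
            if (sorted.get(i) <= ratio * smallerSum) {
                return sorted.subList(i, sorted.size());
            }
        }
        return List.of();  // nothing worth compacting
    }

    public static void main(String[] args) {
        // The 100-unit file is skipped; the three similar files are merged.
        System.out.println(select(List.of(100L, 12L, 10L, 8L), 1.2));
    }
}
```

HBASE-7516's point is that this selection logic becomes a swappable class, so policies like the tiered ones mentioned above can be dropped in.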
  • 43. Top 10 "big topics": Snapshots, Replication, Compaction (covered above); Metrics, Assignment, Hadoop 2, Protobufs, Security, Bulk Loading, Modularization. Total: 416 (some overlap) (305 functional, 111 non-functional)
  • 44. Top 10 "big topics": Snapshots, Replication, Compaction (covered above); Metrics: move to metrics2. Assignment, Hadoop 2, Protobufs, Security, Bulk Loading, Modularization. Total: 416 (some overlap) (305 functional, 111 non-functional)
  • 45. Top 10 "big topics": Snapshots, Replication, Compaction (covered above); Metrics: move to metrics2. Assignment: it's tricky, yo. Hadoop 2, Protobufs, Security, Bulk Loading, Modularization. Total: 416 (some overlap) (305 functional, 111 non-functional)
  • 46. Top 10 "big topics": Snapshots, Replication, Compaction (covered above); Metrics: move to metrics2. Assignment: it's tricky, yo. Hadoop 2: support it for HA NN. Protobufs, Security, Bulk Loading, Modularization. Total: 416 (some overlap) (305 functional, 111 non-functional)
  • 47. Top 10 "big topics": Snapshots, Replication, Compaction (covered above); Metrics: move to metrics2. Assignment: it's tricky, yo. Hadoop 2: support it for HA NN. Protobufs: wire compatibility! Security, Bulk Loading, Modularization. Total: 416 (some overlap) (305 functional, 111 non-functional)
  • 48. Top 10 "big topics": Snapshots, Replication, Compaction (covered above); Metrics: move to metrics2. Assignment: it's tricky, yo. Hadoop 2: support it for HA NN. Protobufs: wire compatibility! Security: kerberos, in the core. Bulk Loading, Modularization. Total: 416 (some overlap) (305 functional, 111 non-functional)
  • 49. Top 10 "big topics": Snapshots, Replication, Compaction (covered above); Metrics: move to metrics2. Assignment: it's tricky, yo. Hadoop 2: support it for HA NN. Protobufs: wire compatibility! Security: kerberos, in the core. Bulk Loading: pop in an HFile. Modularization. Total: 416 (some overlap) (305 functional, 111 non-functional)
  • 50. Top 10 "big topics": Snapshots, Replication, Compaction (covered above); Metrics: move to metrics2. Assignment: it's tricky, yo. Hadoop 2: support it for HA NN. Protobufs: wire compatibility! Security: kerberos, in the core. Bulk Loading: pop in an HFile. Modularization: break up the code. Total: 416 (some overlap) (305 functional, 111 non-functional)
  • 51. Now on to the "Indie Hits JIRAs".
  • 52. What's left? About half. 1638 total - (503 Non-Functional + 305 Categorized Functional) = 830 Remaining. Blocker: 31 Critical: 88 Major: 455 Minor: 206 Trivial: 52 Total: 830
  • 53. What's left? About half. 1638 total - (503 Non-Functional + 305 Categorized Functional) = 830 Remaining. Let's cut out these: Blocker: 31 Critical: 88 Major: 455 Minor: 206 Trivial: 52 (830 total) → 573 remaining
  • 54. We can't cover 573 issues. Let's just hit a few cool ones.
  • 57. HBASE-5416 Improve perf of scans with some kinds of filters
  By: Max Lapan for original idea & patch, Sergey Shelukhin for final impl
  Interesting because: most commented JIRA (200+ human comments!)
  What? Avoid loading non-essential CFs until after filters run, big perf gain.
  How?
  +++ Filter.java:
  +   abstract public boolean isFamilyEssential(byte[] name);
  +++ HRegion.java:
      KeyValueScanner scanner = store.getScanner(scan, entry.getValue());
  -   scanners.add(scanner);
  +   if (this.filter == null || !scan.doLoadColumnFamiliesOnDemand()
  +       || this.filter.isFamilyEssential(entry.getKey())) {
  +     scanners.add(scanner);
  +   } else {
  +     joinedScanners.add(scanner);
  +   }
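The behavioral effect of that patch can be shown with a small, hypothetical Java sketch (invented names, not HBase internals): the filter runs against only the "essential" column family, and the heavy non-essential family is loaded lazily, only for rows the filter accepts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the HBASE-5416 idea: lazy loading of
// non-essential column families during a filtered scan.
public class LazyCfSketch {
    static int heavyReads = 0;  // how many times the big CF was read

    static Map<String, String> readEssential(String row) {
        // Small CF the filter actually needs.
        return Map.of("meta", row.startsWith("keep") ? "yes" : "no");
    }

    static Map<String, String> readHeavy(String row) {
        heavyReads++;  // expensive: large values live here
        return Map.of("blob", "big-value-for-" + row);
    }

    static List<Map<String, String>> scan(List<String> rows) {
        List<Map<String, String>> results = new ArrayList<>();
        for (String row : rows) {
            Map<String, String> cells = readEssential(row);
            if (!"yes".equals(cells.get("meta"))) {
                continue;  // filtered out without touching the heavy CF
            }
            Map<String, String> full = new HashMap<>(cells);
            full.putAll(readHeavy(row));  // "joined scanner": load the rest
            results.add(full);
        }
        return results;
    }

    public static void main(String[] args) {
        scan(List.of("keep-1", "drop-1", "drop-2", "keep-2"));
        System.out.println("heavy CF reads: " + heavyReads + " of 4 rows");
    }
}
```

When most rows are filtered out, the expensive family is barely touched, which is where the "20x" number later in the thread comes from.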
  • 60. To save you some time, allow me to summarize.
  • 61. Reenactment ... Feb 2012: ● Max Lapan: Hey guys, here's a cool patch!
  • 62. Reenactment ... Feb 2012: ● Max Lapan: Hey guys, here's a cool patch! ● Nicolas S: This should be an app detail, not in core.
  • 63. Reenactment ... Feb 2012: ● Max Lapan: Hey guys, here's a cool patch! ● Nicolas S: This should be an app detail, not in core. ● Ted Yu: I fixed your typos while you were asleep!
  • 64. Reenactment ... Feb 2012: ● Max Lapan: Hey guys, here's a cool patch! ● Nicolas S: This should be an app detail, not in core. ● Ted Yu: I fixed your typos while you were asleep! ● Nick: Not enough utest coverage to put this in core. ● Max: Agree, but I can't find any other way to do this.
  • 65. Reenactment ... Feb 2012: ● Max Lapan: Hey guys, here's a cool patch! ● Nicolas S: This should be an app detail, not in core. ● Ted Yu: I fixed your typos while you were asleep! ● Nick: Not enough utest coverage to put this in core. ● Max: Agree, but I can't find any other way to do this. ● Kannan: Why don't you try 2-phase w/ multiget? ● Max: OK, ok, I'll try it.
  • 66. Reenactment ... May 2012: ● Max: Ran in prod w/ 160-node 300TB cluster. Runs like a champ, 20x the 2-phase approach. Boom.
  • 67. Reenactment ... May 2012: ● Max: Ran in prod w/ 160-node 300TB cluster. Runs like a champ, 20x the 2-phase approach. Boom.
  • 68. Reenactment ... May 2012: ● Max: Ran in prod w/ 160-node 300TB cluster. Runs like a champ, 20x the 2-phase approach. Boom. ● Ted: Holy guacamole that's a big patch.
  • 69. Reenactment ... May 2012: ● Max: Ran in prod w/ 160-node 300TB cluster. Runs like a champ, 20x the 2-phase approach. Boom. ● Ted: Holy guacamole that's a big patch. July 2012: ● Max: Anybody there? Here's a perf test. ● Ted: Cool!
  • 70. Reenactment ... May 2012: ● Max: Ran in prod w/ 160-node 300TB cluster. Runs like a champ, 20x the 2-phase approach. Boom. ● Ted: Holy guacamole that's a big patch. July 2012: ● Max: Anybody there? Here's a perf test. ● Ted: Cool! Oct 2012: ● Anoop: A coprocessor would make this faster. ● Max: We're on 0.90 and can't use CP. ● Stack: -1, FB guys are right about needing more tests.
  • 71. Reenactment ... Dec 2012: ● Sergey: I'm on it guys. Rebased on trunk, added the ability to configure, and integration tests.
  • 72. Reenactment ... Dec 2012: ● Sergey: I'm on it guys. Rebased on trunk, added the ability to configure, and integration tests. ● Stack: Still not enough tests. Some new code even when disabled? Who's reviewing? Go easy lads.
  • 73. Reenactment ... Dec 2012: ● Sergey: I'm on it guys. Rebased on trunk, added the ability to configure, and integration tests. ● Stack: Still not enough tests. Some new code even when disabled? Who's reviewing? Go easy lads. ● Ram: I'm on it. Couple improvements, but looks good.
  • 77. Reenactment ...
Dec 31st, 2012 (while everyone else is partying):
● Lars: Ooh, let's pull this into 0.94! I made a patch.
● Lars: ... hold the phone! This slows down a tight loop case (even when disabled) by 10-20%.
● Ted: I optimized the disabled path.
● Lars: Sweet.
  • 79. Reenactment ...
Jan, 2013:
● Ram: +1, let's commit.
● Ted: Committed to trunk.
● Lars: Committed to 0.94.
And there was much rejoi....
  • 86. Reenactment ...
Feb, 2013:
● Dave Latham: Stop the presses! This breaks rolling upgrade for me b/c I directly implement Filter.
● All: Crapface.
● Stack: We should back this out. SOMA pride!! Also, Dave is running the world's biggest HBase cluster, FYI.
● Lars: Filter is internal. Extend FilterBase maybe?
● Ted: If we take it OUT now, it's also a regression.
● Dave: Chill, dudes, we can fix by changing our client.
● All: Uhh ... change it? Keep it? Change it?
Resolution: Change it (HBASE-7920)
  • 87. Moral of the story? ● JIRA comments are a great way to learn. ● Do the work to keep new features from destabilizing core code paths. ● Careful with changing interfaces.
  • 89. HBASE-4676 Prefix Compression - Trie data block encoding By: Matt Corgan Interesting because: most watched (42 watchers), and biggest patch. What? An optimization to compress what we store for key/value prefixes. How? ~8000 new lines added! (Originally written in git repo, here) At SFDC, James Taylor reported seeing 5-15x improvement in Phoenix, with no degradation in scan performance. Woot!
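To get a feel for what prefix encoding buys, here is a minimal standalone Java sketch of a *flat* prefix scheme over sorted row keys: each key is stored as the length of the prefix it shares with the previous key, plus its remaining suffix. This is a deliberately simplified cousin of the trie encoding in the actual patch; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrefixEncodeDemo {

    // Encode sorted keys as (sharedPrefixLen, suffix) pairs. HBase row keys
    // in a data block are sorted, so consecutive keys tend to share long
    // prefixes -- that shared part is what we avoid storing repeatedly.
    static List<Object[]> encode(List<String> sortedKeys) {
        List<Object[]> out = new ArrayList<>();
        String prev = "";
        for (String k : sortedKeys) {
            int shared = 0;
            int max = Math.min(prev.length(), k.length());
            while (shared < max && prev.charAt(shared) == k.charAt(shared)) {
                shared++;
            }
            out.add(new Object[]{shared, k.substring(shared)});
            prev = k;
        }
        return out;
    }

    public static void main(String[] args) {
        // Typical monotonically-increasing row keys compress well:
        for (Object[] e : encode(Arrays.asList("row0001", "row0002", "row0010"))) {
            System.out.println(e[0] + ":" + e[1]);
        }
        // prints 0:row0001, then 6:2, then 5:10
    }
}
```

The real trie encoding goes much further (it also shares suffixes of qualifiers and reorders structure for fast seeks), which is part of why the patch ran to ~8000 lines.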
  • 91. HBASE-7403 Online Merge
By: Chunhui Shen
Interesting because: It's a cool feature. And went through 33 revisions!
What? The ability to merge regions online and transactionally, just like we do with splitting regions.
How? The master moves the regions together (onto the same regionserver) and sends a MERGE RPC to the regionserver. The merge happens in a transaction. Example:
  RegionMergeTransaction mt = new RegionMergeTransaction(conf, parent, midKey);
  if (!mt.prepare(services)) return;
  try {
    mt.execute(server, services);
  } catch (IOException ioe) {
    try {
      mt.rollback(server, services);
      return;
    } catch (RuntimeException e) {
      myAbortable.abort("Failed merge, abort");
    }
  }
  • 93. HBASE-1212 Merge tool expects regions to have diff seq ids
By: Jean-Marc Spaggiari
Interesting because: Oldest issue (Feb, 2009) resolved w/ patch this year.
What? With the aggregated hfile format, the sequence id is written into the file, not alongside it. In the rare case where two store files have the same sequence id and we want to merge the regions, it wouldn't work.
How? In conjunction with HBASE-7287, removes the code that did this:
  --- HRegion.java
  List<StoreFile> srcFiles = es.getValue();
  if (srcFiles.size() == 2) {
    long seqA = srcFiles.get(0).getMaxSequenceId();
    long seqB = srcFiles.get(1).getMaxSequenceId();
    if (seqA == seqB) {
      // Can't have same sequenceid since on open store, this is what
      // distingushes the files (see the map of stores how its keyed by
      // sequenceid).
      throw new IOException("Files have same sequenceid: " + seqA);
    }
  }
  • 96. HBASE-7801 Allow a deferred sync option per Mutation
By: Lars Hofhansl
Interesting because: has durability implications worth blogging about.
What? Previously, you could only turn WAL writing off completely, per table or edit. Now you can choose "none", "async", "sync" or "fsync".
How?
  +++ Mutation.java
  + public void setDurability(Durability d) {
  +   setAttribute(DURABILITY_ID_ATTR, Bytes.toBytes(d.ordinal()));
  +   this.writeToWAL = d != Durability.SKIP_WAL;
  + }
  +++ HRegion.java
  + private void syncOrDefer(long txid, Durability durability) {
  +   switch(durability) { ...
  +     case SKIP_WAL: // nothing to do
  +       break;
  +     case ASYNC_WAL: // defer the sync, unless we globally can't
  +       if (this.deferredLogSyncDisabled) { this.log.sync(txid); }
  +       break;
  +     case SYNC_WAL:
  +     case FSYNC_WAL:
  +       // sync the WAL edit (SYNC and FSYNC treated the same for now)
  +       // Wha ... ? Oh. See HADOOP-6313
  +       this.log.sync(txid);
  +       break;
  +   }
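The dispatch above can be modeled as a tiny standalone Java sketch. The enum values and method name mirror the patch, but this is an illustration of the control flow, not HBase code; the return strings are made up to label each outcome.

```java
public class DurabilityDemo {
    enum Durability { USE_DEFAULT, SKIP_WAL, ASYNC_WAL, SYNC_WAL, FSYNC_WAL }

    // Model of the per-Mutation durability dispatch: returns what happens
    // to the WAL edit for each setting.
    static String syncOrDefer(Durability d, boolean deferredLogSyncDisabled) {
        switch (d) {
            case SKIP_WAL:
                return "no-sync";            // nothing written to the WAL sync path
            case ASYNC_WAL:
                // defer the sync, unless deferral is globally disabled
                return deferredLogSyncDisabled ? "sync" : "deferred";
            case SYNC_WAL:
            case FSYNC_WAL:
                return "sync";               // SYNC and FSYNC treated the same for now
            default:
                // USE_DEFAULT would consult the table's setting; simplified here
                return "sync";
        }
    }

    public static void main(String[] args) {
        System.out.println(syncOrDefer(Durability.SKIP_WAL, false));   // no-sync
        System.out.println(syncOrDefer(Durability.ASYNC_WAL, false));  // deferred
        System.out.println(syncOrDefer(Durability.ASYNC_WAL, true));   // sync
    }
}
```

Note how the ASYNC_WAL branch quietly upgrades to a synchronous sync when deferral is disabled cluster-wide, which is the subtlety the HADOOP-6313 callout on the slide is pointing at.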
  • 99. HBASE-4072 Disable reading zoo.cfg files
By: Harsh J
Interesting because: Biggest facepalm.
What? Used to be, if two systems both use ZK and one needed to override values, the zoo.cfg values would always win. Caused a lot of goofy bugs in hbase utils like import/export, and in integration with other systems like Flume.
How? Put reading it behind a config that defaults to false.
  + if (conf.getBoolean(HBASE_CONFIG_READ_ZOOKEEPER_CONFIG, false)) {
  +   LOG.warn("Parsing zoo.cfg is deprecated. Place all ZK related HBase " +
  +     "configuration under the hbase-site.xml");
  • 102. HBASE-3171 Drop ROOT, store META location in ZooKeeper
By: J-D Cryans
Interesting because: Only HBase JIRA with a Downfall parody.
What? The ROOT just tells you where the META table is. That's silly.
How? Pretty big patch (59 files changed, 580 insertions(+), 1749 deletions(-))
http://www.youtube.com/watch?v=tuM9MYDssvg
  • 104. HBASE-6868 Avoid double checksumming blocks
By: Lars Hofhansl
Interesting because: tiny fix, but marked as a blocker, and sunk 0.94.2 RC1.
What? Since HBASE-5074 (checksums), sometimes we double checksum.
How? 3-line patch to default to skipping the checksum if not on the local fs.
  +++ HFileSystem.java
    // Incorrect data is read and HFileBlocks won't be able to read
    // their header magic numbers. See HBASE-5885
    if (useHBaseChecksum && !(fs instanceof LocalFileSystem)) {
  +   conf = new Configuration(conf);
  +   conf.setBoolean("dfs.client.read.shortcircuit.skip.checksum", true);
      this.noChecksumFs = newInstanceFileSystem(conf);
  ...
  +++ HRegionServer.java
    // If hbase checksum verification enabled, automatically
    // switch off hdfs checksum verification.
    this.useHBaseChecksum = conf.getBoolean(
  -   HConstants.HBASE_CHECKSUM_VERIFICATION, true);
  +   HConstants.HBASE_CHECKSUM_VERIFICATION, false);
  • 105. What's it all mean? Active codebase. Good! Complexity increasing. Bad! credit: https://www.ohloh.net/p/hbase
  • 107. One more interesting stat: "Good on you"s
(chart: stack vs. everyone else)
  • 110. BTW: How did I do this? JIRA API + Phoenix on HBase + http://github.com/ivarley/jirachi