Apache HBase – Where we've been and what's upcoming

Jonathan Hsieh | @jmhsieh
Tech Lead / Software Engineer at Cloudera | HBase PMC Member

Hadoop Users Group UK
April 10, 2014
Who Am I?

•  Cloudera:
   •  Tech Lead, HBase Team
   •  Software Engineer
   •  Apache HBase committer / PMC
   •  Apache Flume founder / PMC
•  U of Washington:
   •  Research in Distributed Systems
What is Apache HBase?

Apache HBase is a reliable, column-oriented data store that provides consistent, low-latency, random read/write access.

(Diagram: applications and MapReduce jobs talking to HBase, which sits on ZooKeeper and HDFS.)
HBase Provides Low-latency Random Access

•  Writes:
   •  1-3 ms; 1k-20k writes/sec per node
•  Reads:
   •  0-3 ms cached, 10-30 ms from disk
   •  10k-40k reads/sec per node from cache
•  Cell size:
   •  0 B - 3 MB
•  Read, write, and insert data anywhere in the table (see the client sketch below)
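To make the random read/write path concrete, here is a minimal sketch using the 0.96-era Java client; the table name, row key, family, and column are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomAccessSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml from the classpath
    HTable table = new HTable(conf, "usertable");       // hypothetical table with family "d"
    try {
      // Random write: lands in the WAL and memstore of whichever region owns the row.
      Put put = new Put(Bytes.toBytes("user-00000042"));
      put.add(Bytes.toBytes("d"), Bytes.toBytes("name"), Bytes.toBytes("Ada"));
      table.put(put);

      // Random read of the same row, typically a few milliseconds when cached.
      Result result = table.get(new Get(Bytes.toBytes("user-00000042")));
      System.out.println(Bytes.toString(
          result.getValue(Bytes.toBytes("d"), Bytes.toBytes("name"))));
    } finally {
      table.close();
    }
  }
}
```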
(Diagram: a table's sorted keyspace, e.g. rows 0000000000 through 7777777777, split into regions 1-5.)

Core Properties

•  ACID guarantees on a row
   •  Writes are durable
   •  Strong consistency first, then availability
   •  After a failure, recover and return the current value instead of returning a stale value
   •  CAS and atomic increments can be efficient (see the sketch below)
•  Sorted by primary key
   •  Short scans are efficient
•  Partitioned by primary key
•  Log-structured merge tree
   •  Writes are extremely efficient
   •  Reads are efficient
   •  Periodic layout optimizations ("compactions") are required to keep reads fast
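A short sketch of the row-level CAS and atomic increment operations mentioned above, again using the 0.96-era client API; the table and column names are made up.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class RowAtomicitySketch {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "accounts");  // hypothetical table
    try {
      byte[] row = Bytes.toBytes("alice");
      byte[] cf  = Bytes.toBytes("d");

      // Atomic increment, applied by the region server that owns the row.
      long logins = table.incrementColumnValue(row, cf, Bytes.toBytes("logins"), 1L);

      // Compare-and-swap: apply the Put only if "status" still equals "open".
      Put close = new Put(row);
      close.add(cf, Bytes.toBytes("status"), Bytes.toBytes("closed"));
      boolean applied = table.checkAndPut(row, cf, Bytes.toBytes("status"),
          Bytes.toBytes("open"), close);

      System.out.println("logins=" + logins + " cas-applied=" + applied);
    } finally {
      table.close();
    }
  }
}
```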
An HBase History
Where We've Been
Apache HBase Timeline
•  Nov '06: Google BigTable paper at OSDI '06
•  Apr '07: First Apache HBase commit, as a Hadoop contrib project
•  Jan '08: Promoted to Hadoop subproject
•  Summer '09: StumbleUpon goes production on HBase ~0.20
•  Apr '10: Apache HBase becomes a top-level project
•  Apr '11: CDH3 GA with HBase 0.90.1
•  Summer '11: Messages on HBase; Web Crawl Cache
•  Nov '11: Cassini on HBase
•  Jan '12: 0.92.0
•  May '12: 0.94.0; HBaseCon 2012
•  Jan '13: Phoenix on HBase
•  Jun '13: HBaseCon 2013
•  Oct '13: 0.96.0
•  Feb '14: 0.98.0

Developer Community

•  Active community!
•  Diverse committers from many organizations
Apache HBase "Nascar" Slide
Apache HBase Core Development

•  Vendors
•  Self service

Apache HBase Sample Users
•  Inbox
•  Storage
•  Web
•  Search
•  Analytics
•  Monitoring

Apache HBase Ecosystem Projects
What's Here and New Today?
Today: Apache 0.96.2 / 0.98.1
Critical Features

Disaster Recovery
•  Cluster replication
•  Table snapshots (see the sketch below)
•  CopyTable
•  Import / Export of tables
•  Metadata corruption repair tool (hbck)

Administrative and Continuity
•  Kerberos-based authentication
•  ACL-based authorization
•  Config change via rolling restart
•  Within-version rolling upgrade
•  Protobuf-based wire protocol for RPC future-proofing
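As an illustration of the snapshot-based disaster-recovery tooling, a sketch using the era's HBaseAdmin API; the table and snapshot names are made up, and the String-based overloads are assumed here.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class SnapshotSketch {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    try {
      // Point-in-time snapshot of an online table.
      admin.snapshot("orders_snap_20140410", "orders");

      // Materialize the snapshot as a separate table, e.g. for recovery or testing.
      admin.cloneSnapshot("orders_snap_20140410", "orders_restored");
    } finally {
      admin.close();
    }
  }
}
```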
Hardened for 0.96

Table Administration
•  Online schema change
•  Online region merging
•  Continuous fault-injection testing with "Chaos Monkey"

Performance Tuning
•  Alternate key encodings for efficient memory usage (see the sketch below)
•  Exploring compaction policy minimizes compaction storms
•  Smart, adaptive stochastic region load balancer
•  Fast split policy for new tables
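For the alternate key encodings, here is a hedged sketch of switching an existing column family to FAST_DIFF encoding via an online schema change; the table and family names are hypothetical, and the Admin overloads assumed are the 0.94/0.96-era ones.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

public class KeyEncodingSketch {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    try {
      // Re-declare the "d" family of a hypothetical table with FAST_DIFF key encoding;
      // online schema change rolls the new descriptor out without disabling the table.
      HColumnDescriptor family = new HColumnDescriptor("d");
      family.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);
      admin.modifyColumn("webtable", family);
    } finally {
      admin.close();
    }
  }
}
```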
MR over Table Snapshots (0.98, CDH5.0)

•  Previously, MapReduce jobs over HBase required an online full-table scan
•  Idea: take a snapshot and run the MR job over the snapshot files (see the sketch below)
   •  Doesn't use the HBase client
   •  Avoids affecting HBase caches
•  3-5x performance boost
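A sketch of wiring a job to a snapshot through TableMapReduceUtil, as the 0.98/CDH5 API looks to me; the snapshot name, scratch directory, and mapper are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class SnapshotScanJob {

  // Receives rows rebuilt from the snapshot's HFiles, not from region servers.
  public static class CountingMapper extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable rowKey, Result row, Context ctx) {
      ctx.getCounter("snapshot", "rows").increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "scan-from-snapshot");
    job.setJarByClass(SnapshotScanJob.class);

    TableMapReduceUtil.initTableSnapshotMapperJob(
        "orders_snap",                       // existing snapshot (hypothetical)
        new Scan(),                          // which rows/columns to read
        CountingMapper.class,
        NullWritable.class, NullWritable.class,
        job,
        true,                                // ship HBase jars with the job
        new Path("/tmp/snapshot-restore"));  // scratch dir for the restored snapshot layout

    job.setOutputFormatClass(NullOutputFormat.class);
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```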
(Diagram: map and reduce tasks reading directly from the snapshot files instead of from the live table.)

Mean Time to Recovery (MTTR)

•  Machine failures happen in distributed systems
•  MTTR: the average unavailability when automatically recovering from a failure
•  Recovery time for an unclean data-center power cycle
(Recovery timeline: detect, repair, notify, recovered. The region is unavailable, then available with clients unaware, then available with clients aware.)

Fast Notification and Detection (0.96)

•  Proactive notification of HMaster failure (0.96)
•  Proactive notification of region server failure (0.96)
•  Notify clients on recovery (0.96)
•  Fast server failover (hardware)
(Recovery timeline: detect, split, assign, replay, recovered; HDFS is involved at each stage, and the region only becomes available for reads and writes at the end.)

Distributed Log Replay (experimental in 0.96)

•  Previously, recovery had two IO-intensive passes:
   •  Log splitting to intermediate files
   •  Assign and log replay
•  Now just one IO-heavy pass: assign first, then split+replay
•  Improves read and write recovery times
•  Off by default currently* (see the configuration note below)
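Since the feature is off by default, enabling it is a configuration change. The property name below is my best recollection of the 0.96 switch and should be checked against your release; in practice it belongs in hbase-site.xml on the master and region servers, and is shown programmatically only to make the knob concrete.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class DistributedLogReplaySwitch {
  public static void main(String[] args) {
    // Assumed property name for the 0.96-era distributed log replay switch.
    Configuration conf = HBaseConfiguration.create();
    conf.setBoolean("hbase.master.distributed.log.replay", true);
    System.out.println(conf.get("hbase.master.distributed.log.replay"));
  }
}
```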
(Recovery timeline with distributed log replay: detect, assign, split + replay, recovered; the region becomes available for replay writes early, and for full read/write once replay finishes.)

*Caveat: if you override timestamps you could see READ REPEATED isolation violations (use tags to fix this).

Cell Tags (0.98, experimental)

•  Mechanism for attaching arbitrary metadata to cells
•  Motivation: finer-grained isolation
•  Used for Accumulo-style cell-level visibility (see the sketch below)
   •  Main feature for 0.98
•  Other uses:
   •  Add sequence numbers to enable correct fast read/write recovery
   •  Potential for schema tags
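One concrete use of tags in 0.98 is cell-level visibility. The sketch below assumes the VisibilityController coprocessor is enabled and that the labels have already been defined by an administrator; the table, column, and label names are hypothetical.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.security.visibility.Authorizations;
import org.apache.hadoop.hbase.security.visibility.CellVisibility;
import org.apache.hadoop.hbase.util.Bytes;

public class CellVisibilitySketch {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "records");  // hypothetical table

    // The visibility expression travels with the cell as a tag.
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("d"), Bytes.toBytes("note"), Bytes.toBytes("sensitive value"));
    put.setCellVisibility(new CellVisibility("secret & clinician"));
    table.put(put);

    // Only readers presenting matching authorizations see the cell.
    Scan scan = new Scan();
    scan.setAuthorizations(new Authorizations("secret", "clinician"));
    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
      System.out.println(r);
    }
    scanner.close();
    table.close();
  }
}
```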
HTrace (0.96, experimental)

•  Problem: where is time being spent inside HBase?
•  Solution: the HTrace framework
   •  Inspired by Google's Dapper
   •  Threaded through HBase and HDFS
   •  Tracks time spent in calls in a distributed system by tracking spans* on different machines (see the sketch below)

*Some assembly still required.
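A sketch of wrapping a client call in a trace span. HBase 0.96 bundled a pre-Apache HTrace (the org.cloudera.htrace packages); the class and method names below are from memory of that era's API and may differ in your build, and the table/span names are made up.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;
import org.cloudera.htrace.Sampler;
import org.cloudera.htrace.Trace;
import org.cloudera.htrace.TraceScope;

public class TracedGet {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "usertable");  // hypothetical table
    // Everything between startSpan() and close() is recorded as one span; the
    // client propagates the span over its RPCs, so server-side work shows up
    // as child spans (collected by a span receiver such as the Zipkin one).
    TraceScope scope = Trace.startSpan("get-user-row", Sampler.ALWAYS);
    try {
      table.get(new Get(Bytes.toBytes("user-00000042")));
    } finally {
      scope.close();
      table.close();
    }
  }
}
```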
HTrace: Distributed Tracing in HBase and HDFS

•  Framework inspired by Google's Dapper
•  Tracks time spent in RPC calls across different machines
•  Threaded through HBase (0.96) and, in the future, HDFS
(Diagram: an HBase client call fanning out as RPCs to hbase:meta, a region server, ZooKeeper, and the HDFS NameNode and DataNodes; each hop is recorded as a span.)

Zipkin – Visualizing Spans

•  UI + visualization system
•  Written by Twitter
•  Zipkin HBase storage backend
•  Zipkin HTrace integration
•  View where the time for a specific call is spent in HBase, HDFS, and ZK
A Future HBase
What's Upcoming
Outline

•  Improved mean time to recovery (MTTR)
•  Improved predictability
•  Improved usability
•  Improved multitenancy
Faster Read Recovery
Improving MTTR Further
Distributed Log Replay (experimental in 0.96)

•  Previously, recovery had two IO-intensive passes:
   •  Log splitting to intermediate files
   •  Assign and log replay
•  Now just one IO-heavy pass: assign first, then split+replay
•  Improves read and write recovery times
•  Off by default currently*
(Recovery timeline with distributed log replay: detect, assign, split + replay, recovered; the region becomes available for replay writes early, and for full read/write once replay finishes.)

*Caveat: if you override timestamps you could see READ REPEATED isolation violations (use tags to fix this).

Distributed Log Replay with Fast Write Recovery
•  Writes in HBase do not incur reads
•  With distributed log replay, regions are already open for writes during recovery
•  So allow fresh writes while replaying the old logs*

(Recovery timeline: detect, assign, split + replay, recovered; the region becomes available for all writes as soon as it is assigned, and for reads and writes once replay finishes.)

*Caveat: if you override timestamps you could see READ REPEATED isolation violations (use tags to fix this).

Fast Read Recovery (proposed)

•  Idea: pristine-region fast read recovery
   •  If a region has not been edited, it is consistent and can recover read/write immediately
•  Idea: shadow regions for fast read recovery
   •  A shadow region tails the WAL of the primary region
   •  The shadow memstore is one HDFS block behind; catch up, then recover read/write
•  Currently some progress toward trunk
(Recovery timeline: detect, assign, recovered. If we can guarantee a region has no new edits, or that we already have all of its edits, it becomes available for all reads and writes immediately.)

Improving the 99th Percentile
Improving Predictability
Common Causes of Performance Variability

•  Locality loss → favored nodes, HDFS block affinity
•  Compaction → exploring compaction policy
•  GC → off-heap cache
•  Hardware hiccups → multi-WAL, HDFS speculative (hedged) reads (see the sketch below)
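The "HDFS speculative read" mitigation is the HDFS hedged-read feature covered a couple of slides further on. A sketch of the two client-side knobs involved; in practice they belong in the region servers' hbase-site.xml (the region server is the HDFS client doing the reads), and they are shown programmatically only to make the property names concrete.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HedgedReadSettings {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("dfs.client.hedged.read.threadpool.size", 20);    // > 0 enables hedged reads
    conf.setLong("dfs.client.hedged.read.threshold.millis", 10L); // start a second read after 10 ms
    System.out.println(conf.get("dfs.client.hedged.read.threadpool.size"));
  }
}
```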
Performance Degraded After Recovery

•  After recovery, reads suffer a performance hit
   •  Regions have lost locality
•  To maintain performance after failover, we need to regain locality
   •  Compact the region to regain locality
   •  We can do better by using HDFS features
(Timeline: after recovery the service is back but read performance is degraded; performance recovers only once compaction restores locality.)

Read Throughput: Favored Nodes (experimental in 0.96)

•  Control and track where block replicas are placed
   •  All files for a region are created so that their blocks go to the same set of favored nodes
   •  When failing over, assign the region to one of those favored nodes
•  Currently a preview feature in 0.96
   •  Disabled by default because it doesn't work well with the latest balancer or with splits
   •  Will likely use upcoming HDFS block affinity for better operability
•  Originally on Facebook's 0.89 branch, ported to 0.96
(Timeline: the service recovers and performance is sustained because the region is assigned to a favored node.)

Read Latency: HDFS Hedged Reads (CDH5.0)

•  HBase's region servers use the HDFS client to read one of the three HDFS block replicas
•  If you chose the slow node, your reads are slow
•  If a read is taking too long, speculatively go to another replica that may be faster
(Diagram: a region server reading one of three HDFS replicas; when the chosen replica is slow, a hedged read goes to another replica.)

Read Latency: Read Replicas (in progress)

•  The HBase client reads from the primary region server for a region
•  If you chose a slow node, your reads are slow
•  Idea: read replicas of the region assigned to other region servers; replicas periodically catch up (via snapshots or shadow region memstores)
•  The client specifies whether a stale read is OK; if a read is taking too long, speculatively go to another replica that may be faster
(Diagram: an HBase client reading from the primary region replica; when it is slow, the client reads a possibly stale secondary replica.)

Write Latency: Multiple WALs (in progress)

•  HBase's HDFS client writes three replicas of the write-ahead log
•  Minimum write latency is bounded by the slowest of the three replicas
•  Idea: if a write is taking too long, duplicate it on another set of replicas that may be faster
(Diagram: a region server writing WAL blocks to three HDFS replicas; when the pipeline is slow, the write is duplicated to another set of replicas.)

Improving Usability
Autotuning, Tracing, and SQL
Making HBase Easier to Use and Tune

•  It is difficult to see what is happening inside HBase
•  It is easy to make poor design decisions early without realizing it
•  New developments:
   •  Memory auto-tuning
   •  HTrace + Zipkin
   •  Frameworks for schema design
Memory Use Auto-tuning (trunk)

•  Region server memory is divided between:
   •  the memstore (used for serving recent writes)
   •  the block cache (used for read hot spots)
•  Need to choose the balance for the workload (see the sketch below)
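Today that balance is fixed by hand; the sketch below shows the 0.96-era properties that the auto-tuning work would adjust at runtime. The property names are my recollection and normally live in hbase-site.xml rather than code.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class MemoryBalanceSettings {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Write-heavy tilt: more heap for memstores, less for the read block cache.
    conf.setFloat("hbase.regionserver.global.memstore.upperLimit", 0.45f);
    conf.setFloat("hbase.regionserver.global.memstore.lowerLimit", 0.40f);
    conf.setFloat("hfile.block.cache.size", 0.30f);
    System.out.println(conf.get("hfile.block.cache.size"));
  }
}
```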
(Diagram: three example heap splits between memstore and block cache, corresponding to read-heavy, balanced, and write-heavy workloads.)

HBase Schemas

•  HBase application developers must iterate to find a suitable HBase schema
•  Schema is critical for performance at scale
•  How can we make this easier?
•  How can we reduce the expertise required to do this?
•  Today:
   •  Lots of tuning knobs
   •  Developers need to understand column families, rowkey design, data encoding, …
   •  Some choices are expensive to change after the fact
How Should I Arrange My Data?

•  Isomorphic data representations! (A client-side sketch follows the tables below.)
Tall skinny table with a compound rowkey:

  rowkey   | d:
  bob-col1 | aaaa
  bob-col2 | bbbb
  bob-col3 | cccc
  bob-col4 | dddd
  jon-col1 | eeee
  jon-col2 | ffff
  jon-col3 | gggg
  jon-col4 | hhhh

Short fat table using column qualifiers:

  rowkey | d:col1 | d:col2 | d:col3 | d:col4
  bob    | aaaa   | bbbb   | cccc   | dddd
  jon    | eeee   | ffff   | gggg   | hhhh

Short fat table using column families:

  rowkey | col1: | col2: | col3: | col4:
  bob    | aaaa  | bbbb  | cccc  | dddd
  jon    | eeee  | ffff  | gggg  | hhhh
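A sketch of what the isomorphism in the tables above looks like from the client side: the same two attributes of "bob" written once as a short fat row and once as tall skinny rows with a compound rowkey. The table names and the "-" rowkey separator are arbitrary choices for illustration.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SchemaShapes {
  public static void main(String[] args) throws Exception {
    byte[] d = Bytes.toBytes("d");

    // Short fat: one row per entity, one qualifier per attribute.
    HTable fat = new HTable(HBaseConfiguration.create(), "users_fat");   // hypothetical table
    Put bobFat = new Put(Bytes.toBytes("bob"));
    bobFat.add(d, Bytes.toBytes("col1"), Bytes.toBytes("aaaa"));
    bobFat.add(d, Bytes.toBytes("col2"), Bytes.toBytes("bbbb"));
    fat.put(bobFat);
    fat.close();

    // Tall skinny: the attribute name moves into a compound rowkey, so each
    // attribute becomes its own row stored under an empty qualifier in "d".
    HTable tall = new HTable(HBaseConfiguration.create(), "users_tall"); // hypothetical table
    Put bobCol1 = new Put(Bytes.toBytes("bob-col1"));
    bobCol1.add(d, Bytes.toBytes(""), Bytes.toBytes("aaaa"));
    Put bobCol2 = new Put(Bytes.toBytes("bob-col2"));
    bobCol2.add(d, Bytes.toBytes(""), Bytes.toBytes("bbbb"));
    tall.put(bobCol1);
    tall.put(bobCol2);
    tall.close();
  }
}
```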
With great power comes great responsibility!
How can we make this easier for users?

Impala

•  Scalable, low-latency SQL querying for HDFS (and HBase!)
•  ODBC/JDBC driver interface
•  Highlights:
   •  Uses the Hive metastore and its Hive-HBase connector configuration conventions
   •  Native-code implementation; uses JIT compilation for query execution optimization
   •  Kerberos-based authentication support
•  Open sourced by Cloudera
•  https://github.com/cloudera/impala
Phoenix

•  A SQL skin over HBase targeting low-latency queries
•  JDBC SQL interface (see the sketch below)
•  Highlights:
   •  Adds types
   •  Handles compound row key encoding
   •  Secondary indexes in development
   •  Provides some pushdown aggregations (via coprocessors)
•  Open sourced by Salesforce.com
   •  Work from James Taylor, Jesse Yates, et al.
•  https://github.com/forcedotcom/phoenix
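A sketch of the JDBC surface: typed columns, a compound primary key, and aggregation pushed down to the region servers through plain SQL. It assumes the Phoenix client jar is on the classpath and that "localhost" stands in for the cluster's ZooKeeper quorum; the table and data are made up.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixSketch {
  public static void main(String[] args) throws Exception {
    Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
    Statement stmt = conn.createStatement();

    // Compound, typed primary key encoded into the HBase rowkey by Phoenix.
    stmt.execute("CREATE TABLE IF NOT EXISTS metrics ("
        + " host VARCHAR NOT NULL, ts DATE NOT NULL, value DOUBLE"
        + " CONSTRAINT pk PRIMARY KEY (host, ts))");

    stmt.executeUpdate("UPSERT INTO metrics VALUES ('web01', CURRENT_DATE(), 0.75)");
    conn.commit();  // Phoenix connections do not auto-commit by default

    // The GROUP BY aggregation runs server-side via coprocessors.
    ResultSet rs = stmt.executeQuery("SELECT host, AVG(value) FROM metrics GROUP BY host");
    while (rs.next()) {
      System.out.println(rs.getString(1) + " -> " + rs.getDouble(2));
    }
    conn.close();
  }
}
```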
Kite (née Cloudera Development Kit / CDK)

•  APIs that provide a Dataset abstraction
   •  Provides a get/put/delete API over Avro objects
   •  HBase support in progress
•  Highlights:
   •  Supports multiple components of the Hadoop distros (Flume, Morphlines, Hive, Crunch, HCatalog)
   •  Provides types, using Avro and Parquet formats for encoding entities
   •  Manages schema evolution
•  Open sourced by Cloudera
•  https://github.com/kite-sdk/kite
Multi-tenancy
Many Apps and Users in a Single Cluster
Growing HBase

•  Pre-0.96.0: scaling up HBase for single HBase applications
   •  Essentially a single user running a single app
   •  Ex: Facebook Messages — one application, many HBase clusters
   •  Shard users to different pods
•  Focused on continuity and disaster-recovery features:
   •  Cross-cluster replication
   •  Table snapshots
   •  Rolling upgrades
(Diagram: scalability plotted as number of clusters vs. number of isolated applications — one giant application spread over multiple clusters.)

Growing HBase

•  In 0.96 we introduce primitives for supporting multitenancy
•  Many users, many applications, one HBase cluster
•  Need some control over the interactions different users cause
•  Ex: manage MR analytics and low-latency serving in one cluster
(Diagram: the same scalability axes, now with a multitenancy region — many applications in one shared cluster.)

Namespaces (0.96)

•  Namespaces provide an abstraction for multiple tenants to create and manage their own tables within a large HBase instance (see the sketch below)
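A sketch of the tenant-facing side of this in the 0.96 Java Admin API: create a namespace, then create tables addressed by their namespace-qualified name. The namespace, table, and family names are hypothetical.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.NamespaceDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class NamespaceSketch {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    try {
      // Each tenant gets its own namespace...
      admin.createNamespace(NamespaceDescriptor.create("blue").build());

      // ...and creates tables inside it, addressed as "<namespace>:<table>".
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("blue", "orders"));
      desc.addFamily(new HColumnDescriptor("d"));
      admin.createTable(desc);
    } finally {
      admin.close();
    }
  }
}
```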
(Diagram: tables grouped into namespaces "blue", "green", and "orange".)

Multitenancy Goals

•  Security (0.96)
   •  Separate admin ACLs for different sets of tables
•  Quotas (in progress)
   •  Max tables, max regions
•  Performance isolation (in progress)
   •  Limit the performance impact that load on one table has on others
•  Priority (future)
   •  Prioritize some workloads/tables/users before others
Isolation with Region Server Groups (in progress)
(Diagram: region assignment distribution without region server groups — regions from the "blue", "green", and "orange" namespaces are spread across all region servers.)

Isolation with Region Server Groups (in progress)
(Diagram: region assignment distribution with region server groups (RSG) — the "blue" namespace is pinned to its own group, while "green" and "orange" share another.)

Conclusions
Summary by Version

  Area         | 0.90 (CDH3)               | 0.92 / 0.94 (CDH4)           | 0.96 (CDH5)                                 | Next (0.98 / 1.0.0)
  New features | Stability                 | Reliability                  | Continuity                                  | Multitenancy
  MTTR         | Recovery in hours         | Recovery in minutes          | Writes in seconds, reads in 10s of seconds  | Recovery in seconds (reads + writes)
  Perf         | Baseline                  | Better throughput            | Optimizing performance                      | Predictable performance
  Usability    | HBase developer expertise | HBase operational experience | Distributed-systems admin experience        | Application developer experience
Questions?
@jmhsieh

Más contenido relacionado

La actualidad más candente

High Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureHigh Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureDataWorks Summit
 
Meet HBase 1.0
Meet HBase 1.0Meet HBase 1.0
Meet HBase 1.0enissoz
 
Mapreduce over snapshots
Mapreduce over snapshotsMapreduce over snapshots
Mapreduce over snapshotsenissoz
 
Operating and supporting HBase Clusters
Operating and supporting HBase ClustersOperating and supporting HBase Clusters
Operating and supporting HBase Clustersenissoz
 
HBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseHBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseCloudera, Inc.
 
HBaseCon 2015: Elastic HBase on Mesos
HBaseCon 2015: Elastic HBase on MesosHBaseCon 2015: Elastic HBase on Mesos
HBaseCon 2015: Elastic HBase on MesosHBaseCon
 
HBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBaseCon
 
HBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and SparkHBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and SparkHBaseCon
 
HBase Sizing Guide
HBase Sizing GuideHBase Sizing Guide
HBase Sizing Guidelarsgeorge
 
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...Cloudera, Inc.
 
HBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table SnapshotsHBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table SnapshotsCloudera, Inc.
 
CBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFSCBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFSDataWorks Summit
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars GeorgeJAX London
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practicelarsgeorge
 
HBase Backups
HBase BackupsHBase Backups
HBase BackupsHBaseCon
 
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.Cloudera, Inc.
 
State of HBase: Meet the Release Managers
State of HBase: Meet the Release ManagersState of HBase: Meet the Release Managers
State of HBase: Meet the Release ManagersHBaseCon
 

La actualidad más candente (19)

HBase Accelerated: In-Memory Flush and Compaction
HBase Accelerated: In-Memory Flush and CompactionHBase Accelerated: In-Memory Flush and Compaction
HBase Accelerated: In-Memory Flush and Compaction
 
High Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureHigh Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and Future
 
Meet HBase 1.0
Meet HBase 1.0Meet HBase 1.0
Meet HBase 1.0
 
Mapreduce over snapshots
Mapreduce over snapshotsMapreduce over snapshots
Mapreduce over snapshots
 
Operating and supporting HBase Clusters
Operating and supporting HBase ClustersOperating and supporting HBase Clusters
Operating and supporting HBase Clusters
 
HBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseHBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBase
 
HBaseCon 2015: Elastic HBase on Mesos
HBaseCon 2015: Elastic HBase on MesosHBaseCon 2015: Elastic HBase on Mesos
HBaseCon 2015: Elastic HBase on Mesos
 
HBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial Industry
 
HBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and SparkHBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and Spark
 
HBase Sizing Guide
HBase Sizing GuideHBase Sizing Guide
HBase Sizing Guide
 
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
 
HBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table SnapshotsHBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table Snapshots
 
CBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFSCBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFS
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars George
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practice
 
HBase Backups
HBase BackupsHBase Backups
HBase Backups
 
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
 
State of HBase: Meet the Release Managers
State of HBase: Meet the Release ManagersState of HBase: Meet the Release Managers
State of HBase: Meet the Release Managers
 

Similar a Apache HBase: Where We've Been and What's Upcoming

HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and FutureDataWorks Summit
 
A glimpse into the Future of Hadoop & Big Data
A glimpse into the Future of Hadoop & Big DataA glimpse into the Future of Hadoop & Big Data
A glimpse into the Future of Hadoop & Big DataSaurav Kumar Sinha
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestHBaseCon
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDYVenneladonthireddy1
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceChris Nauroth
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
Aziksa hadoop architecture santosh jha
Aziksa hadoop architecture santosh jhaAziksa hadoop architecture santosh jha
Aziksa hadoop architecture santosh jhaData Con LA
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3tcloudcomputing-tw
 
Hbase status quo apache-con europe - nov 2012
Hbase status quo   apache-con europe - nov 2012Hbase status quo   apache-con europe - nov 2012
Hbase status quo apache-con europe - nov 2012Chris Huang
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop OverviewBrian Enochson
 
Apache hadoop: POSH Meetup Palo Alto, CA April 2014
Apache hadoop: POSH Meetup Palo Alto, CA April 2014Apache hadoop: POSH Meetup Palo Alto, CA April 2014
Apache hadoop: POSH Meetup Palo Alto, CA April 2014Kevin Crocker
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem pptsunera pathan
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystemsunera pathan
 
hadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxhadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxraghavanand36
 

Similar a Apache HBase: Where We've Been and What's Upcoming (20)

HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and Future
 
A glimpse into the Future of Hadoop & Big Data
A glimpse into the Future of Hadoop & Big DataA glimpse into the Future of Hadoop & Big Data
A glimpse into the Future of Hadoop & Big Data
 
Hadoop pycon2011uk
Hadoop pycon2011ukHadoop pycon2011uk
Hadoop pycon2011uk
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ Pinterest
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduce
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Aziksa hadoop architecture santosh jha
Aziksa hadoop architecture santosh jhaAziksa hadoop architecture santosh jha
Aziksa hadoop architecture santosh jha
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
Hbase status quo apache-con europe - nov 2012
Hbase status quo   apache-con europe - nov 2012Hbase status quo   apache-con europe - nov 2012
Hbase status quo apache-con europe - nov 2012
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop Overview
 
Apache hadoop: POSH Meetup Palo Alto, CA April 2014
Apache hadoop: POSH Meetup Palo Alto, CA April 2014Apache hadoop: POSH Meetup Palo Alto, CA April 2014
Apache hadoop: POSH Meetup Palo Alto, CA April 2014
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem ppt
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
Unit IV.pdf
Unit IV.pdfUnit IV.pdf
Unit IV.pdf
 
hadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxhadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptx
 

Más de huguk

Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, TrifactaData Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, Trifactahuguk
 
ether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp introether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp introhuguk
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoophuguk
 
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...huguk
 
Extracting maximum value from data while protecting consumer privacy. Jason ...
Extracting maximum value from data while protecting consumer privacy.  Jason ...Extracting maximum value from data while protecting consumer privacy.  Jason ...
Extracting maximum value from data while protecting consumer privacy. Jason ...huguk
 
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM WatsonIntelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watsonhuguk
 
Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink huguk
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLhuguk
 
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...huguk
 
Jonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & PitchingJonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & Pitchinghuguk
 
Signal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News MonitoringSignal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News Monitoringhuguk
 
Dean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your StartupDean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your Startuphuguk
 
Peter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapultPeter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapulthuguk
 
Cytora: Real-Time Political Risk Analysis
Cytora:  Real-Time Political Risk AnalysisCytora:  Real-Time Political Risk Analysis
Cytora: Real-Time Political Risk Analysishuguk
 
Cubitic: Predictive Analytics
Cubitic: Predictive AnalyticsCubitic: Predictive Analytics
Cubitic: Predictive Analyticshuguk
 
Bird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made SocialBird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made Socialhuguk
 
Aiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine IntelligenceAiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine Intelligencehuguk
 
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive huguk
 
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...huguk
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthyhuguk
 

Más de huguk (20)

Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, TrifactaData Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
 
ether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp introether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp intro
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
 
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
 
Extracting maximum value from data while protecting consumer privacy. Jason ...
Extracting maximum value from data while protecting consumer privacy.  Jason ...Extracting maximum value from data while protecting consumer privacy.  Jason ...
Extracting maximum value from data while protecting consumer privacy. Jason ...
 
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM WatsonIntelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
 
Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
 
Jonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & PitchingJonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & Pitching
 
Signal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News MonitoringSignal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News Monitoring
 
Dean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your StartupDean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your Startup
 
Peter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapultPeter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapult
 
Cytora: Real-Time Political Risk Analysis
Cytora:  Real-Time Political Risk AnalysisCytora:  Real-Time Political Risk Analysis
Cytora: Real-Time Political Risk Analysis
 
Cubitic: Predictive Analytics
Cubitic: Predictive AnalyticsCubitic: Predictive Analytics
Cubitic: Predictive Analytics
 
Bird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made SocialBird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made Social
 
Aiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine IntelligenceAiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine Intelligence
 
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive
 
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
 

Último

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 

Último (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 

Apache HBase: Where We've Been and What's Upcoming

• 1. Apache HBase – Where we've been and what's upcoming. Jonathan Hsieh | @jmhsieh. Tech lead / Software Engineer at Cloudera | HBase PMC Member. Hadoop Users Group UK, April 10, 2014
• 2. Who Am I? • Cloudera: Tech Lead, HBase Team; Software Engineer • Apache HBase committer / PMC • Apache Flume founder / PMC • U of Washington: research in distributed systems
• 3. What is Apache HBase? Apache HBase is a reliable, column-oriented data store that provides consistent, low-latency, random read/write access. [Architecture diagram: application and MapReduce clients on top of HBase, backed by ZooKeeper and HDFS]
• 4. HBase provides low-latency random access • Writes: 1-3ms, 1k-20k writes/sec per node • Reads: 0-3ms cached, 10-30ms from disk; 10k-40k reads/sec per node from cache • Cell size: 0B-3MB • Read, write, and insert data anywhere in the table. [Diagram: sorted row keys partitioned across regions 1-5]
• 5. Core Properties • ACID guarantees on a row • Writes are durable • Strong consistency first, then availability: after a failure, recover and return the current value instead of a stale value • CAS and atomic increments can be efficient • Sorted by primary key: short scans are efficient • Partitioned by primary key • Log-structured merge tree: writes are extremely efficient, reads are efficient, but periodic layout optimization for reads ("compactions") is required
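To make the row-level CAS and atomic increment claims concrete, here is a minimal client-side sketch against the 0.94/0.96-era Java API; the table, family, and qualifier names are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class RowAtomicityExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "counters");   // hypothetical table name

    byte[] row = Bytes.toBytes("user-42");
    byte[] cf  = Bytes.toBytes("d");

    // Atomic increment: no read-modify-write round trip in the client.
    long hits = table.incrementColumnValue(row, cf, Bytes.toBytes("hits"), 1L);

    // Check-and-put (CAS): applies the Put only if "state" still equals "open".
    Put close = new Put(row);
    close.add(cf, Bytes.toBytes("state"), Bytes.toBytes("closed"));
    boolean applied = table.checkAndPut(row, cf, Bytes.toBytes("state"),
        Bytes.toBytes("open"), close);

    System.out.println("hits=" + hits + " closed=" + applied);
    table.close();
  }
}
```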
• 6. An HBase History: Where We've Been
• 7. Apache HBase Timeline • Nov '06: Google BigTable paper (OSDI '06) • Apr '07: First Apache HBase commit as a Hadoop contrib project • Jan '08: Promoted to Hadoop subproject • Summer '09: StumbleUpon goes production on HBase ~0.20 • Apr '10: Apache HBase becomes a top-level project • Apr '11: CDH3 GA with HBase 0.90.1 • Summer '11: Messages on HBase; Web Crawl Cache • Nov '11: Cassini on HBase • Jan '12: 0.92.0 • May '12: 0.94.0; HBaseCon 2012 • Jan '13: Phoenix on HBase • Jun '13: HBaseCon 2013 • Oct '13: 0.96.0 • Feb '14: 0.98.0
• 8. Developer Community • Active community! • Diverse committers from many organizations
• 9. Apache HBase "NASCAR" Slide
• 10. Apache HBase Core Development • Vendors • Self-service
• 11. Apache HBase Sample Users • Inbox • Storage • Web • Search • Analytics • Monitoring
• 12. Apache HBase Ecosystem Projects
• 13. What's here and new today? Today: Apache HBase 0.96.2 / 0.98.1
• 14. Critical Features • Disaster recovery: cluster replication; table snapshots; copy table; import/export tables; metadata corruption repair tool (hbck) • Administrative and continuity: Kerberos-based authentication; ACL-based authorization; config change via rolling restart; within-version rolling upgrade; protobuf-based wire protocol for RPC future-proofing
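To make the snapshot workflow from the list above concrete, here is a minimal sketch using the HBaseAdmin API of that era; the table and snapshot names are made up, and the exact method overloads differ slightly between 0.96 and 0.98.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class SnapshotExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Take a point-in-time snapshot of an (assumed) online table.
    admin.snapshot("orders_snap_20140410", TableName.valueOf("orders"));

    // Materialize the snapshot as a new table without copying data up front.
    admin.cloneSnapshot("orders_snap_20140410", TableName.valueOf("orders_restored"));

    admin.close();
  }
}
```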
• 15. Hardened for 0.96 • Table administration: online schema change; online region merging; continuous fault-injection testing with "Chaos Monkey" • Performance tuning: alternate key encodings for efficient memory usage; exploring-compactor policy minimizes compaction storms; smart, adaptive stochastic region load balancer; fast split policy for new tables
• 16. MR over Table Snapshots (0.98, CDH5.0) • Previously, MapReduce jobs over HBase required an online full table scan • Idea: take a snapshot and run the MR job over the snapshot files • Doesn't use the HBase client • Avoids affecting HBase caches • 3-5x performance boost. [Diagram: map and reduce tasks reading snapshot files directly rather than scanning through region servers]
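A sketch of wiring this up with TableSnapshotInputFormat via TableMapReduceUtil (available from 0.98 / CDH5.0); the snapshot name, restore directory, and mapper are assumptions for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class SnapshotScanJob {
  // Trivial mapper: counts rows read straight from the snapshot's HFiles.
  static class RowCounter extends TableMapper<NullWritable, LongWritable> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context ctx) {
      ctx.getCounter("snapshot", "rows").increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "scan-over-snapshot");
    job.setJarByClass(SnapshotScanJob.class);

    // Reads go against snapshot files in HDFS, bypassing region servers.
    TableMapReduceUtil.initTableSnapshotMapperJob(
        "orders_snap_20140410",             // assumed snapshot name
        new Scan(),                         // full scan of the snapshot
        RowCounter.class,
        NullWritable.class, LongWritable.class,
        job, true,
        new Path("/tmp/snapshot-restore")); // scratch dir for restored refs

    job.setOutputFormatClass(NullOutputFormat.class);
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```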
• 17. Mean Time to Recovery (MTTR) • Machine failures happen in distributed systems • MTTR is the average unavailability while automatically recovering from a failure • Also: recovery time from an unclean data-center power cycle. [Timeline diagram: detect → repair → notify → recovered; the region is unavailable, then available with the client unaware, then available with the client aware]
• 18. Fast notification and detection (0.96) • Proactive notification of HMaster failure (0.96) • Proactive notification of RS failure (0.96) • Notify the client on recovery (0.96) • Fast server failover (hardware). [Timeline diagram: detect → split → assign → replay → recovered, each stage backed by HDFS; the region is unavailable until it is available for read/write]
• 19. Distributed log replay (experimental in 0.96) • Previously there were two IO-intensive passes: log splitting to intermediate files, then assign and log replay • Now just one IO-heavy pass: assign first, then split + replay • Improves read and write recovery times • Off by default currently*. [Timeline diagram: detect → assign → split + replay → recovered; the region becomes available for replay writes before it is available for read/write] *Caveat: if you override timestamps you could see repeatable-read isolation violations (use tags to fix this)
• 20. Cell Tags (experimental in 0.98) • A mechanism for attaching arbitrary metadata to cells • Motivation: finer-grained isolation • Used for Accumulo-style cell-level visibility (the main feature of 0.98) • Other uses: adding sequence numbers to enable correct fast read/write recovery; potential for schema tags
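As a sketch of how the tag-backed visibility feature surfaces in the 0.98 client API, assuming the VisibilityController coprocessor is enabled and the labels below have already been defined by an admin; the table, labels, and expression are illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.security.visibility.Authorizations;
import org.apache.hadoop.hbase.security.visibility.CellVisibility;
import org.apache.hadoop.hbase.util.Bytes;

public class CellVisibilityExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "patients");   // hypothetical table

    // Write a cell carrying a visibility expression, stored as a cell tag.
    Put put = new Put(Bytes.toBytes("row1"));
    put.setCellVisibility(new CellVisibility("MEDICAL & !INTERN"));
    put.add(Bytes.toBytes("d"), Bytes.toBytes("diagnosis"), Bytes.toBytes("..."));
    table.put(put);

    // Read back only the cells visible to the supplied authorizations.
    Scan scan = new Scan();
    scan.setAuthorizations(new Authorizations("MEDICAL"));
    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
      System.out.println(r);
    }
    scanner.close();
    table.close();
  }
}
```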
• 21. HTrace (experimental in 0.96) • Problem: where is time being spent inside HBase? • Solution: the HTrace framework • Inspired by Google Dapper • Threaded through HBase and HDFS • Tracks time spent in calls in a distributed system by tracking spans* on different machines. *Some assembly still required.
• 22. HTrace: Distributed Tracing in HBase and HDFS • Framework inspired by Google Dapper • Tracks time spent in RPC calls across different machines • Threaded through HBase (0.96) and, in the future, HDFS. [Diagram: a span following RPC calls from the HBase client through the meta table, a region server, ZooKeeper, and the HDFS NameNode/DataNode]
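A rough sketch of opening a client-side root span around an HBase call. Note that the HTrace package name changed across releases (org.cloudera.htrace in the 0.96 era, org.htrace later, and eventually org.apache.htrace), so the imports below are an assumption; the span name and table are illustrative, and a span receiver (e.g. one that feeds Zipkin) still has to be configured separately.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;
import org.htrace.Sampler;
import org.htrace.Trace;
import org.htrace.TraceScope;

public class TracedGetExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "orders");   // hypothetical table

    // Open a root span; child spans created inside the HBase/HDFS client
    // RPCs are stitched to it and can be shipped to a configured receiver.
    TraceScope scope = Trace.startSpan("get-order-42", Sampler.ALWAYS);
    try {
      table.get(new Get(Bytes.toBytes("order-42")));
    } finally {
      scope.close();   // closes the span and records its duration
    }
    table.close();
  }
}
```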
• 23. Zipkin – Visualizing Spans • UI + visualization system • Written by Twitter • Zipkin HBase storage backend • Zipkin/HTrace integration • View where the time for a specific call is spent in HBase, HDFS, and ZK.
• 24. A Future HBase: What's Upcoming
• 25. Outline • Improved mean time to recovery (MTTR) • Improved predictability • Improved usability • Improved multitenancy
• 26. Faster read recovery: Improving MTTR Further
• 27. Distributed log replay (experimental in 0.96) • Previously there were two IO-intensive passes: log splitting to intermediate files, then assign and log replay • Now just one IO-heavy pass: assign first, then split + replay • Improves read and write recovery times • Off by default currently*. [Timeline diagram: detect → assign → split + replay → recovered; the region becomes available for replay writes before it is available for read/write] *Caveat: if you override timestamps you could see repeatable-read isolation violations (use tags to fix this)
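Since the feature is off by default, it has to be switched on explicitly. To the best of my recollection the flag is hbase.master.distributed.log.replay; a minimal sketch of setting it (in practice this goes into hbase-site.xml on the master and region servers rather than into code):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class DistributedLogReplayConfig {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Assumed key for the 0.96-era experimental feature; normally placed in
    // hbase-site.xml, shown programmatically here only for illustration.
    conf.setBoolean("hbase.master.distributed.log.replay", true);
    System.out.println("distributed log replay enabled: "
        + conf.getBoolean("hbase.master.distributed.log.replay", false));
  }
}
```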
• 28. Distributed log replay with fast write recovery • Writes in HBase do not incur reads • With distributed log replay, regions are already open for write • So allow fresh writes while replaying old logs*. [Timeline diagram: detect → assign → split + replay → recovered; the region becomes available for all writes before it is available for read/write] *Caveat: if you override timestamps you could see repeatable-read isolation violations (use tags to fix this)
• 29. Fast Read Recovery (proposed) • Idea: pristine-region fast read recovery: if the region has not been edited it is consistent and can recover read/write immediately • Idea: shadow regions for fast read recovery: a shadow region tails the WAL of the primary region; the shadow memstore is one HDFS block behind, so it catches up and recovers read/write • Currently some progress in trunk. [Timeline diagram: detect → assign → recovered; if we can guarantee there are no new edits, or that we have all edits, the region is available for all read/write]
• 30. Improving the 99th percentile: Improving Predictability
• 31. Common causes of performance variability (and mitigations) • Locality loss → favored nodes, HDFS block affinity • Compaction → exploring compactor • GC* → off-heap cache • Hardware hiccups → multi-WAL, HDFS speculative read
• 32. Performance degraded after recovery • After recovery, reads suffer a performance hit because regions have lost locality • To maintain performance after failover, we need to regain locality • Compacting the region regains locality • We can do better by using HDFS features. [Timeline diagram: service recovered with degraded performance; performance recovers once compaction restores locality]
• 33. Read Throughput: Favored Nodes (experimental in 0.96) • Control and track where block replicas are placed • All files for a region are created such that their blocks go to the same set of favored nodes • When failing over, assign the region to one of those favored nodes • Currently a preview feature in 0.96 • Disabled by default because it doesn't work well with the latest balancer or with splits • Will likely use upcoming HDFS block affinity for better operability • Originally on Facebook's 0.89, ported to 0.96. [Timeline diagram: service recovered with performance sustained because the region is assigned to a favored node]
• 34. Read latency: HDFS hedged reads (CDH5.0) • HBase region servers use the HDFS client to read one of the three HDFS block replicas • If you chose the slow node, your reads are slow • If a read is taking too long, speculatively go to another replica that may be faster. [Diagram: a region server's slow read against one replica is hedged by reading another replica]
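The knobs behind this live in the HDFS client embedded in the region servers; as I recall they are dfs.client.hedged.read.threadpool.size and dfs.client.hedged.read.threshold.millis. A sketch with illustrative values (in practice these go into the region servers' hbase-site.xml / hdfs-site.xml):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HedgedReadConfig {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Threads dedicated to hedged (speculative) reads; 0 disables the feature.
    conf.setInt("dfs.client.hedged.read.threadpool.size", 20);
    // Wait this long on the first replica before hedging to another one.
    conf.setLong("dfs.client.hedged.read.threshold.millis", 10);
    System.out.println("hedged read threads = "
        + conf.getInt("dfs.client.hedged.read.threadpool.size", 0));
  }
}
```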
• 35. Read latency: Read Replicas (in progress) • The HBase client reads from the primary region server • If you chose the slow node, your reads are slow • Idea: read replicas assigned to other region servers; replicas periodically catch up (via snapshots or shadow-region memstores) • The client specifies whether a stale read is OK; if a read is taking too long, speculatively go to another replica that may be faster. [Diagram: a slow read against the primary region is hedged by a stale read against a region replica]
• 36. Write latency: Multiple WALs (in progress) • HBase's HDFS client writes 3 replicas • Minimum write latency is bounded by the slowest of the 3 replicas • Idea: if a write is taking too long, duplicate it on another set of replicas that may be faster. [Diagram: a slow WAL write to one replica set is hedged by writing to a second replica set]
• 37. Improving Usability: Autotuning, Tracing, and SQL
• 38. Making HBase easier to use and tune • It is difficult to see what is happening inside HBase • It is easy to make poor design decisions early without realizing it • New developments: memory auto-tuning; HTrace + Zipkin; frameworks for schema design
• 39. Memory Use Auto-tuning (trunk) • Memory is divided between the memstore (used for serving recent writes) and the block cache (used for read hot spots) • Need to choose the balance for the workload. [Diagram: memstore vs. block cache split for read-heavy, balanced, and write-heavy workloads]
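Until auto-tuning lands, that split is set statically. A sketch using the configuration keys I believe were current in the 0.96/0.98 era (hfile.block.cache.size and hbase.regionserver.global.memstore.upperLimit), with illustrative values for a read-heavy region server:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HeapBalanceConfig {
  public static void main(String[] args) {
    // Normally set in hbase-site.xml; shown programmatically for illustration.
    Configuration conf = HBaseConfiguration.create();
    // Read-heavy profile: larger block cache, smaller global memstore pool.
    conf.setFloat("hfile.block.cache.size", 0.5f);                          // ~50% of heap
    conf.setFloat("hbase.regionserver.global.memstore.upperLimit", 0.3f);   // ~30% of heap
    System.out.println("block cache fraction = "
        + conf.getFloat("hfile.block.cache.size", 0.25f));
  }
}
```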
• 40. HBase Schemas • HBase application developers must iterate to find a suitable HBase schema • Schema is critical for performance at scale • How can we make this easier? • How can we reduce the expertise required to do this? • Today: lots of tuning knobs; developers need to understand column families, rowkey design, data encoding, …; some choices are expensive to change after the fact
• 41. How should I arrange my data? • Isomorphic data representations! The same data can be laid out as: a tall, skinny table with a compound rowkey (rows bob-col1 … jon-col4 in a single column d:); a short, fat table using column qualifiers (rows bob and jon with columns d:col1–d:col4); or a short, fat table using column families (rows bob and jon with families col1: – col4:)
• 42. How should I arrange my data? • The same three isomorphic layouts (tall skinny with a compound rowkey; short fat using column qualifiers; short fat using column families) • With great power comes great responsibility! How can we make this easier for users?
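To ground the two row-shape choices, a small client-side sketch of writing the same logical data both ways; the table, family, and qualifier names are made up.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SchemaLayoutExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    byte[] d = Bytes.toBytes("d");

    // Tall-skinny: the column name is folded into a compound rowkey,
    // so each (user, column) pair becomes its own row.
    HTable tall = new HTable(conf, "tall_table");   // hypothetical
    Put p1 = new Put(Bytes.toBytes("bob-col1"));
    p1.add(d, Bytes.toBytes(""), Bytes.toBytes("aaaa"));
    tall.put(p1);
    tall.close();

    // Short-fat: one row per user, one qualifier per column.
    HTable fat = new HTable(conf, "fat_table");     // hypothetical
    Put p2 = new Put(Bytes.toBytes("bob"));
    p2.add(d, Bytes.toBytes("col1"), Bytes.toBytes("aaaa"));
    p2.add(d, Bytes.toBytes("col2"), Bytes.toBytes("bbbb"));
    fat.put(p2);
    fat.close();
  }
}
```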
• 43. Impala • Scalable, low-latency SQL querying for HDFS (and HBase!) • ODBC/JDBC driver interface • Highlights: uses the Hive metastore and its Hive-HBase connector configuration conventions; native-code implementation, uses JIT for query-execution optimization; authorization via Kerberos support • Open sourced by Cloudera • https://github.com/cloudera/impala
• 44. Phoenix • A SQL skin over HBase targeting low-latency queries • JDBC SQL interface • Highlights: adds types; handles compound row-key encoding; secondary indices in development; provides some pushdown aggregations (via coprocessors) • Open sourced by Salesforce.com • Work from James Taylor, Jesse Yates, et al. • https://github.com/forcedotcom/phoenix
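A short sketch of what the Phoenix JDBC surface looks like; the ZooKeeper quorum, table, and columns are invented for illustration, and the DDL syntax is from memory of that era's releases. Note how the compound primary key maps onto an HBase compound rowkey, and how the GROUP BY aggregation is the kind of work pushed down via coprocessors.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixQueryExample {
  public static void main(String[] args) throws Exception {
    // Phoenix connection URL form: jdbc:phoenix:<zookeeper quorum>
    Connection conn = DriverManager.getConnection("jdbc:phoenix:zk1,zk2,zk3");
    Statement stmt = conn.createStatement();

    // A typed table whose compound primary key becomes the HBase rowkey.
    stmt.execute("CREATE TABLE IF NOT EXISTS WEB_STAT ("
        + " HOST VARCHAR NOT NULL, DATE DATE NOT NULL, HITS BIGINT"
        + " CONSTRAINT PK PRIMARY KEY (HOST, DATE))");

    // Aggregation query; much of the work runs server-side in coprocessors.
    ResultSet rs = stmt.executeQuery(
        "SELECT HOST, SUM(HITS) FROM WEB_STAT GROUP BY HOST");
    while (rs.next()) {
      System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
    }
    conn.close();
  }
}
```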
• 45. Kite (née Cloudera Development Kit / CDK) • APIs that provide a Dataset abstraction • Provides a get/put/delete API over Avro objects • HBase support in progress • Highlights: supports multiple components of the Hadoop distros (Flume, Morphlines, Hive, Crunch, HCatalog); provides types using Avro and Parquet formats for encoding entities; manages schema evolution • Open sourced by Cloudera • https://github.com/kite-sdk/kite
• 46. Multi-tenancy: Many apps and users in a single cluster
• 47. Growing HBase • Pre-0.96.0: scaling up HBase for single HBase applications • Essentially a single user for a single app • Example: Facebook Messages is one application over many HBase clusters, with users sharded to different pods • Focused on continuity and disaster-recovery features: cross-cluster replication, table snapshots, rolling upgrades. [Diagram: scalability plotted as number of clusters vs. number of isolated applications; one giant application across multiple clusters]
• 48. Growing HBase • In 0.96 we introduce primitives for supporting multitenancy • Many users, many applications, one HBase cluster • Need some control over the interactions different users cause • Example: manage MR analytics and low-latency serving in one cluster. [Diagram: the multitenancy axis adds many applications in one shared cluster]
• 49. Namespaces (0.96) • Namespaces provide an abstraction for multiple tenants to create and manage their own tables within a large HBase instance. [Diagram: a cluster partitioned into namespaces blue, green, and orange]
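A minimal sketch of the 0.96-era admin calls behind this; the namespace, table, and family names are illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.NamespaceDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class NamespaceExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Each tenant gets its own namespace...
    admin.createNamespace(NamespaceDescriptor.create("blue").build());

    // ...and creates tables inside it, addressed as "namespace:table".
    HTableDescriptor desc =
        new HTableDescriptor(TableName.valueOf("blue", "orders"));
    desc.addFamily(new HColumnDescriptor("d"));
    admin.createTable(desc);

    admin.close();
  }
}
```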
• 50. Multitenancy goals • Security (0.96): separate admin ACLs for different sets of tables • Quotas (in progress): max tables, max regions • Performance isolation (in progress): limit the performance impact load on one table has on others • Priority (future): prioritize some workloads/tables/users over others
• 51. Isolation with Region Server Groups (in progress). [Diagram: region assignment distribution without region server groups; regions from namespaces blue, green, and orange are mixed across all region servers]
• 52. Isolation with Region Server Groups (in progress). [Diagram: region assignment distribution with Region Server Groups (RSG); namespace blue maps to RSG blue, while namespaces green and orange share the green/orange RSG]
• 54. Summary by Version (columns: 0.90/CDH3, 0.92-0.94/CDH4, 0.96/CDH5, Next 0.98/1.0.0) • New features: Stability | Reliability | Continuity | Multitenancy • MTTR: Recovery in hours | Recovery in minutes | Recovery of writes in seconds, reads in tens of seconds | Recovery in seconds (reads + writes) • Perf: Baseline | Better throughput | Optimizing performance | Predictable performance • Usability: HBase developer expertise | HBase operational experience | Distributed-systems admin experience | Application developer experience
• 55. Questions? @jmhsieh