Slide 1: HBase and M7 Technical Overview

Jim Fiori
Senior Solutions Architect
MapR Technologies
April 2013
Slide 2: Who Am I?

§  Background
§  "a 3-hour tour"
§  Early Hadoop fire-fight
§  Big Data
Slide 3: Agenda

§  Apache HBase
§  MapR
§  M7
Slide 4: HBase

Google BigTable paper (2006):

A sparse, distributed, persistent, indexed, and sorted map

OR

A NoSQL database

OR

A columnar data store
Slide 5: Key-Value Store

§  Row key
   –  Binary, sortable value
§  Row content key (analogous to a column)
   –  Column family (string)
   –  Column qualifier (binary)
   –  Version/timestamp (number)
§  A row key, column family, column qualifier, and version uniquely identify a particular cell (modeled in the sketch below)
   –  A cell contains a single binary value
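To make that addressing model concrete, here is a minimal, illustrative Java sketch (not HBase code) that models a table as nested sorted maps keyed by row key, column family, column qualifier, and version. It uses String keys for readability even though HBase keys are binary; all names in it are hypothetical:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// A cell address is (row key, column family, column qualifier, version);
// the value at that address is a single binary value (byte[]).
public class SortedCellMap {
    // row -> family -> qualifier -> version -> value, all kept in sorted order
    private final NavigableMap<String,
        NavigableMap<String,
            NavigableMap<String,
                NavigableMap<Long, byte[]>>>> table = new TreeMap<>();

    public void put(String row, String family, String qualifier,
                    long version, byte[] value) {
        table.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(family, f -> new TreeMap<>())
             .computeIfAbsent(qualifier, q -> new TreeMap<>())
             .put(version, value);
    }

    // "Largest version wins" mirrors HBase's default of returning the
    // most recent version when no version is specified.
    public byte[] getLatest(String row, String family, String qualifier) {
        NavigableMap<Long, byte[]> versions =
            table.getOrDefault(row, new TreeMap<>())
                 .getOrDefault(family, new TreeMap<>())
                 .getOrDefault(qualifier, new TreeMap<>());
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }
}
```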
  
Slide 6: A Row

[Diagram: a row holds values Value1…ValueN under columns C0…CN; each value is
addressed by the tuple (row key, column family, column qualifier, version).]
Slide 7: Not a Traditional RDBMS

§  Weakly typed and schema-less (unstructured, or perhaps semi-structured)
   –  Almost everything is binary
§  No constraints
   –  You can put any binary value in any cell
   –  You can even put incompatible types in two different instances of the same
      column family:column qualifier
§  Columns (qualifiers) are created implicitly
§  Different rows can have different columns
§  No transactions/no ACID
   –  The only unit of atomic operation is a single row
Slide 8: API

§  APIs for querying (get), scanning, and updating (put) — see the client sketch below
   –  Operate on row key, column family, qualifier, version, and values
   –  Can partially specify and will retrieve the union of results
      •  If you specify just a row key, you get all values for it (with column family and qualifier)
   –  By default only the largest version (most recent, if a timestamp) is returned
      •  Specifying a row key and column family to get retrieves all values for that row and
         column family
   –  Scanning is just a get over a range of row keys
§  Version
   –  While it defaults to a timestamp, any integer is acceptable
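As a concrete illustration of these calls, here is a small client sketch using the circa-2013 HBase Java API (HTable, Put, Get, Scan); the table, row, and column names are hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class ClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");   // hypothetical table

        // put: row key + column family + qualifier (+ implicit timestamp version)
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("cf1"), Bytes.toBytes("col1"), Bytes.toBytes("value1"));
        table.put(put);

        // get: specifying only the row key returns all cells for that row;
        // by default only the most recent version of each cell comes back
        Result row = table.get(new Get(Bytes.toBytes("row1")));
        byte[] value = row.getValue(Bytes.toBytes("cf1"), Bytes.toBytes("col1"));

        // scan: a get over a range of row keys [startRow, stopRow)
        Scan scan = new Scan(Bytes.toBytes("row0"), Bytes.toBytes("row9"));
        ResultScanner scanner = table.getScanner(scan);
        for (Result r : scanner) {
            System.out.println(Bytes.toString(r.getRow()));
        }
        scanner.close();
        table.close();
    }
}
```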
  
Slide 9: Columnar

§  Rather than storing table rows linearly on disk, with each row stored
   as a single byte range with fixed-size fields, store the columns of a
   row separately
   –  Very efficient storage for sparse data sets (NULL is free)
   –  Compression works better on similar data
   –  Fetches of only subsets of a row are very efficient (less disk IO)
   –  No fixed size on column values
   –  No requirement to even define columns
§  Columns are grouped together into column families
   –  Basically a file on disk
   –  A unit of optimization
   –  In HBase, adding a column is implicit; adding a column family is explicit
      (see the sketch below)
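A brief sketch of that asymmetry with the circa-2013 admin API: column families are declared up front when the table is created, while qualifiers appear simply by writing to them. The table, family, and qualifier names here are hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class FamiliesVsQualifiers {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // Explicit: column families are part of the table definition
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor desc = new HTableDescriptor("mytable"); // hypothetical
        desc.addFamily(new HColumnDescriptor("cf1"));
        admin.createTable(desc);

        // Implicit: any qualifier can be written without declaring it first
        HTable table = new HTable(conf, "mytable");
        Put p = new Put(Bytes.toBytes("row1"));
        p.add(Bytes.toBytes("cf1"), Bytes.toBytes("brand-new-qualifier"),
              Bytes.toBytes("value"));
        table.put(p);
        table.close();
        admin.close();
    }
}
```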
  
Slide 10: HBase Table Architecture

§  Tables are divided into key ranges (regions)
§  Regions are served by nodes (RegionServers)
§  Columns are divided into access groups (column families)

[Diagram: a table laid out as row ranges R1–R4 against column families CF1–CF5.]
Slide 11: HBase Architecture

[Architecture diagram]
Slide 12: Storage Model Highlights

§  Data is stored in sorted order
   –  A table contains rows
   –  A sequence of rows is grouped together into a region
      •  A region consists of various files related to those rows and is loaded into a region
         server
      •  Regions are stored in HDFS for high availability
   –  A single region server manages multiple regions
      •  Region assignment can change – load balancing, failures, etc.
§  Clients connect to tables
   –  The HBase runtime transparently determines the region (based on key ranges)
      and contacts the appropriate region server (see the lookup sketch below)
§  At any given time exactly one region server provides access to a region
   –  Master region servers (with ZooKeeper) manage that
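Conceptually, that client-side region lookup is a floor search over sorted region start keys. A minimal, illustrative sketch (not the actual HBase client code; the server names and key ranges are hypothetical):

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class RegionLookup {
    // region start key -> region server address; hypothetical example data
    private final NavigableMap<String, String> regionsByStartKey = new TreeMap<>();

    public RegionLookup() {
        regionsByStartKey.put("",  "rs1.example.com");  // ["", "g")
        regionsByStartKey.put("g", "rs2.example.com");  // ["g", "p")
        regionsByStartKey.put("p", "rs3.example.com");  // ["p", +inf)
    }

    // The region covering a row key is the one with the greatest
    // start key <= the row key.
    public String serverFor(String rowKey) {
        return regionsByStartKey.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        RegionLookup lookup = new RegionLookup();
        System.out.println(lookup.serverFor("hbase"));  // rs2.example.com
    }
}
```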
  
Slide 13: What's Great About This?

§  Very scalable
§  Easy to add region servers
§  Easy to move regions around
§  Scans are efficient
   –  Unlike hashing-based models
§  Access via row key is very efficient
   –  Note: there are no secondary indexes
§  No schema; can store whatever you want, when you want
§  Strong consistency
§  Integrated with Hadoop
   –  MapReduce on HBase is straightforward
   –  HDFS/MapR-FS provides data replication
Slide 14: Data Storage Architecture

§  Data from a region column family is stored in an HFile
   – An HFile contains row key:column qualifier:version:value entries
   – Index at the end into the data – 64KB "blocks" by default
§  Update
   – New value is written persistently to the Write Ahead Log (WAL)
   – Cached in memory (MemStore)
   – When memory fills, write out a new HFile
§  Read
   – Checks in memory, then all of the HFiles
   – Read data is cached in memory
§  Delete
   – Create a tombstone record (purged at major compaction)

(A minimal sketch of this write/read path follows.)
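To make the update and read paths concrete, here is a minimal, single-node sketch of the WAL + MemStore + immutable-file pattern. It is illustrative only (real HFiles are block-indexed, compressed files on HDFS) and every name in it is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class MiniLsmStore {
    private final List<String> wal = new ArrayList<>();          // stand-in for the WAL
    private NavigableMap<String, String> memStore = new TreeMap<>();
    private final List<NavigableMap<String, String>> hFiles = new ArrayList<>(); // newest last
    private final int flushThreshold = 4;

    public void put(String key, String value) {
        wal.add(key + "=" + value);      // 1. persist to the log first
        memStore.put(key, value);        // 2. then cache in memory
        if (memStore.size() >= flushThreshold) {
            hFiles.add(memStore);        // 3. when memory fills, write an immutable "HFile"
            memStore = new TreeMap<>();  //    and start a fresh MemStore
        }
    }

    // Read path: check memory first, then the files from newest to oldest.
    public String get(String key) {
        String v = memStore.get(key);
        if (v != null) return v;
        for (int i = hFiles.size() - 1; i >= 0; i--) {
            v = hFiles.get(i).get(key);
            if (v != null) return v;
        }
        return null;
    }
}
```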
  
Slide 15: Apache HBase HFile Structure

[Diagram callouts:]
–  64Kbyte blocks are compressed
–  An index into the compressed blocks is created as a btree
–  Key-value pairs are laid out in increasing order
–  Each cell is an individual key + value
   –  a row repeats the key for each column
Slide 16: HBase Region Operation

§  Typical region size is a few GB, sometimes even 10G or 20G
§  A RegionServer holds data in memory in a MemStore until full, then
   writes a new HFile
   –  The logical view of the database is constructed by layering these files, with the
      latest on top

[Diagram: a stack of HFiles, newest to oldest, covering the key range represented by this region.]
Slide 17: HBase Read Amplification

§  When a get/scan comes in, all the files have to be examined
   –  Schema-less, so where is the column?
   –  Done in memory and does not change what's on disk
      •  Bloom filters do not help in scans

With 7 files, a 1K-record get() potentially takes about 30 seeks,
7 block fetches, and decompressions, from HDFS. Even with the index in memory,
7 seeks and 7 block fetches are required.
Slide 18: HBase Write Amplification

§  To reduce the read amplification, HBase merges the HFiles periodically
   –  A process called compaction
   –  Runs automatically when there are too many files
   –  Usually turned off due to I/O storms which interfere with client access
   –  And kicked off manually on weekends

Major compaction reads all files and merges them into a single HFile
(see the merge sketch below).
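At its core, a major compaction is a merge of several sorted files into one sorted file, with the newest value for each key winning. A minimal, illustrative sketch that ignores versions, deletes/tombstones, and block layout; all names are hypothetical:

```java
import java.util.*;

public class CompactionSketch {
    // Merge several sorted "HFiles" (newest first) into one sorted file.
    // When the same key appears in several files, the newest value wins.
    static NavigableMap<String, String> majorCompact(
            List<NavigableMap<String, String>> filesNewestFirst) {
        NavigableMap<String, String> merged = new TreeMap<>();
        // Iterate oldest -> newest so newer entries overwrite older ones.
        for (int i = filesNewestFirst.size() - 1; i >= 0; i--) {
            merged.putAll(filesNewestFirst.get(i));
        }
        return merged;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> newer = new TreeMap<>(Map.of("a", "2", "c", "9"));
        NavigableMap<String, String> older = new TreeMap<>(Map.of("a", "1", "b", "5"));
        // {a=2, b=5, c=9}: reads now touch one file instead of two
        System.out.println(majorCompact(List.of(newer, older)));
    }
}
```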
  
Slide 20: WAL File

§  A persistent record of every update/insert in sequence order
   –  Shared by all regions on one region server
   –  WAL files are periodically rolled to limit size, but older WALs are still needed
   –  A WAL file is no longer needed once every region with updates in that WAL file has
      flushed those from memory to an HFile
      •  Remember that more HFiles slow the read path!
§  Must be replayed as part of the recovery process, since in-memory
   updates are "lost" (a replay sketch follows)
   –  This is very expensive and delays bringing a region back online
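Recovery conceptually re-applies every logged edit that had not yet been flushed to an HFile. A minimal, illustrative sketch, reusing the hypothetical "key=value" log format from the MiniLsmStore sketch above:

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class WalReplaySketch {
    // Rebuild the lost MemStore by replaying "key=value" log entries in
    // sequence order; later entries overwrite earlier ones.
    static NavigableMap<String, String> replay(List<String> walEntries) {
        NavigableMap<String, String> memStore = new TreeMap<>();
        for (String entry : walEntries) {
            int eq = entry.indexOf('=');
            memStore.put(entry.substring(0, eq), entry.substring(eq + 1));
        }
        return memStore;
    }

    public static void main(String[] args) {
        // {a=3, b=2}: the region cannot serve reads until replay completes,
        // which is why long WALs mean slow recovery
        System.out.println(replay(List.of("a=1", "b=2", "a=3")));
    }
}
```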
  
Slide 21: What's Not So Good

Reliability
•  Complex coordination between ZK, HDFS, the HBase
   Master, and the Region Server during region movement
•  Compactions disrupt operations
•  Very slow crash recovery because of
   •  Coordination complexity
   •  WAL log reading (one log/server)

Business continuity
•  Many administrative actions require downtime
•  Not well integrated into MapR-FS mirroring and
   snapshot functionality
Slide 22: What's Not So Good

Performance
•  Very long read/write path
•  Significant read and write amplification
•  Multiple JVMs in the read/write path – GC delays!

Manageability
•  Compactions, splits, and merges must be done manually (in reality)
•  Lots of "well known" problems maintaining a reliable
   cluster – splitting, compactions, region assignment, etc.
•  Practical limits on the number of regions per region server and
   on the size of regions – can make it hard to fully utilize hardware
Slide 23: Region Assignment in Apache HBase

[Diagram]
Slide 24: Apache HBase on MapR

Limited data management, data protection, and disaster recovery for tables.
Slide 25: Agenda

§  HBase
§  MapR
§  M7
§  Containers
Slide 27: MapR Distribution for Apache Hadoop

§  Complete Hadoop distribution
§  Comprehensive management suite
§  Industry-standard interfaces
§  Enterprise-grade dependability
§  Higher performance
Slide 28: MapR: The Enterprise Grade Distribution

[Diagram]
Slide 29: One Platform for Big Data

[Diagram: one platform supporting a broad range of applications (MapReduce,
file-based applications, SQL, database, search, stream processing) across
batch, interactive, and real-time workloads, on a base of 99.999% HA, data
protection, disaster recovery, scalability & performance, enterprise
integration, and multi-tenancy. Example applications: recommendation engines,
fraud detection, billing, logistics, risk modeling, market segmentation,
inventory forecasting.]
Slide 32: The Cloud Leaders Pick MapR

§  Google chose MapR to provide Hadoop on Google Compute Engine
§  Amazon EMR is the largest Hadoop provider in revenue and # of clusters
§  MinuteSort record: 1.5 TB in 60 seconds, on 2103 nodes
Slide 34: MapR Editions

M5 Edition
§  Control System
§  NFS Access
§  Performance
§  High Availability
§  Snapshots & Mirroring
§  24 x 7 Support
§  Annual Subscription

M3 Edition
§  Control System
§  NFS Access
§  Performance
§  Unlimited Nodes
§  Free

M7 Edition
§  All the Features of M5
§  Simplified Administration for HBase
§  Increased Performance
§  Consistent Low Latency
§  Unified Snapshots, Mirroring

Also available through: Google Compute Engine
Slide 35: Agenda

§  HBase
§  MapR
§  M7
Slide 37: Introducing MapR M7

§  An integrated system
   –  Unified namespace for files and tables
   –  Built-in data management & protection
   –  No extra administration
§  Architected for reliability and performance
   –  Fewer layers
   –  Single hop to data
   –  No compactions, low I/O amplification
   –  Seamless splits, automatic merges
   –  Instant recovery
Slide 38: M7: Remove Layers, Simplify

[Diagram: the MapR M7 stack. Take note! No JVM!]
Slide 39: Binary Compatible with HBase APIs

§  HBase applications work "as is" with M7
   –  No need to recompile (binary compatible)
§  Can run M7 and HBase side by side on the same cluster
   –  e.g., during a migration
   –  Can access both an M7 table and an HBase table in the same program
      (see the sketch below)
§  Use the standard Apache HBase CopyTable tool to copy a table
   from HBase to M7 or vice versa

   % hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=/user/srivas/mytable oldtable
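Since M7 tables live in the file namespace (see slide 42), the only visible difference to HBase client code is the table name. A hedged sketch of the "both in one program" point, assuming an M7 table named by path and an ordinary HBase table name (both hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class SideBySide {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // M7 table, addressed by its path in the unified namespace
        HTable m7Table = new HTable(conf, "/user/dave/table3");
        // Apache HBase table on the same cluster, addressed by plain name
        HTable hbaseTable = new HTable(conf, "oldtable");

        // Same API, same program, two different table implementations
        Result fromM7 = m7Table.get(new Get(Bytes.toBytes("row1")));
        Result fromHBase = hbaseTable.get(new Get(Bytes.toBytes("row1")));
        System.out.println(fromM7.isEmpty() + " " + fromHBase.isEmpty());

        m7Table.close();
        hbaseTable.close();
    }
}
```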
  
Slide 40: M7: No Master and No RegionServers

§  No extra daemons to manage
§  One hop to data
§  Unified cache
§  No JVM problems
Slide 41: Region Assignment in Apache HBase

None of this complexity is present in MapR M7.
Slide 42: Unified Namespace for Files and Tables

$ pwd
/mapr/default/user/dave

$ ls
file1  file2  table1  table2

$ hbase shell
hbase(main):003:0> create '/user/dave/table3', 'cf1', 'cf2', 'cf3'
0 row(s) in 0.1570 seconds

$ ls
file1  file2  table1  table2  table3

$ hadoop fs -ls /user/dave
Found 5 items
-rw-r--r--   3 mapr mapr         16 2012-09-28 08:34 /user/dave/file1
-rw-r--r--   3 mapr mapr         22 2012-09-28 08:34 /user/dave/file2
trwxr-xr-x   3 mapr mapr          2 2012-09-28 08:32 /user/dave/table1
trwxr-xr-x   3 mapr mapr          2 2012-09-28 08:33 /user/dave/table2
trwxr-xr-x   3 mapr mapr          2 2012-09-28 08:38 /user/dave/table3
Slide 43: Tables for End Users

§  Users can create and manage their own tables
   –  Unlimited # of tables
§  Tables can be created in any directory
   –  Tables count towards volume and user quotas
§  No admin intervention needed
   –  I can create a file or a directory without opening a ticket with the
      admin team, so why not a table?
   –  Do stuff on the fly; no stopping/restarting servers
§  Automatic data protection and disaster recovery
   –  Users can recover from snapshots/mirrors on their own
Slide 44: M7 – An Integrated System

[Diagram]
Slide 45: M7

Comparative analysis with Apache HBase, LevelDB, and a BTree
Slide 46: HBase Write Amplification Analysis

§  Assume 10G per region, write 10% per day, grow 10% per week
   –  1G of writes per day
   –  After 7 days: 7 files of 1G and 1 file of 10G (only 1G of the 7G written is growth)
§  IO cost
   –  Wrote 7G to the WAL + 7G to HFiles
   –  Compaction adds still more
      •  read: 17G (= 7 x 1G + 1 x 10G)
      •  write: 11G written to the new HFile
   –  Write amplification: wrote 7G "for real," but the actual disk IO after compaction
      is 17G read + 25G written, and that's assuming no application reads!
      (See the arithmetic below.)
§  IO cost of 1000 regions is similar to the above
   –  read 17T, write 25T → major impact on the node
§  Best practice is to limit the # of regions per node → can't fully utilize storage
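A small Java snippet reproducing the slide's arithmetic makes the amplification explicit; all quantities are in GB and come straight from the assumptions above:

```java
public class WriteAmplification {
    public static void main(String[] args) {
        double regionSize  = 10.0;   // GB, the pre-existing HFile
        double dailyWrites = 1.0;    // GB/day = 10% of the region
        int days = 7;

        double walWrites   = days * dailyWrites;   // 7G to the WAL
        double flushWrites = days * dailyWrites;   // 7G across 7 flushed HFiles

        // Major compaction: read every file, write one merged file.
        double compactionRead  = days * dailyWrites + regionSize;  // 17G
        double compactionWrite = regionSize + 1.0;                 // 11G (only 1G is net growth)

        double totalWritten = walWrites + flushWrites + compactionWrite; // 25G
        System.out.printf("logical writes: %.0fG%n", walWrites);
        System.out.printf("disk read: %.0fG, disk written: %.0fG%n",
                          compactionRead, totalWritten);
        // => 7G of "real" data costs 17G of reads and 25G of writes on disk
    }
}
```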
  
Slide 47: Alternative: LevelDB

§  Tiered, logarithmic increase
   –  L1: 2 x 1M files
   –  L2: 10 x 1M
   –  L3: 100 x 1M
   –  L4: 1,000 x 1M, etc.
§  Compaction overhead
   –  Avoids IO storms (I/O done in smaller increments of ~10M)
   –  But significantly more bandwidth compared to HBase
§  Read overhead is still high
   –  10-15 seeks, perhaps more if the lowest level is very large
   –  40K-60K read from disk to retrieve a 1K record
Slide 48: BTree Analysis

§  A read finds data directly; proven to be fastest
   –  Interior nodes only hold keys
   –  Very large branching factor
   –  Values only at leaves
   –  Thus index caches work
   –  R = logN seeks, if no caching
   –  A 1K record read will transfer about logN blocks from disk
§  Writes are slow on inserts
   –  Inserted into the correct place right away
   –  Otherwise a read will not find it
   –  Requires the btree to be continuously rebalanced
   –  Causes extreme random I/O in the insert path
   –  W = 2.5x + logN seeks if no caching
   (the cost model is summarized below)
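Restating the cost model from this slide and the next in one place (a sketch in the deck's own notation, where N is the number of keys and the 2.5x term is unpacked on the next slide as log write + data write + metadata update):

```latex
\begin{align*}
R_{\mathrm{BTree}} &\approx \log N \ \text{seeks (uncached)}\\
W_{\mathrm{BTree}} &\approx 2.5x + \log N \ \text{seeks (uncached)}\\
W_{\min} &= 2.5x \quad \text{(write log + write data + update metadata)}
\end{align*}
```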
  
Slide 49: Log-Structured Merge Trees

§  LSM trees reduce insert cost by deferring and batching index changes
   –  If you don't compact often, read performance is impacted
   –  If you compact too often, write performance is impacted
§  B-trees are great for reads
   –  But expensive to update in real time

[Diagram: writes go to a log plus an in-memory index; reads consult the
in-memory index and the on-disk index.]

Can we combine both ideas?

Writes cannot be done better than W = 2.5x:
write to log + write data to somewhere + update meta-data
Slide 50: M7 from MapR

§  Twisting BTrees
   –  Leaves are variable size (8K-8M or larger)
   –  Can stay unbalanced for long periods of time
      •  More inserts will balance it eventually
      •  Automatically throttles updates to interior btree nodes
   –  M7 inserts "close to" where the data is supposed to go
§  Reads
   –  Uses the BTree structure to get "close" very fast
      •  Very high branching with key-prefix compression
   –  Utilizes a separate lower-level index to find it exactly
      •  Updated "in place": bloom filters for gets, range maps for scans
§  Overhead
   –  A 1K record read will transfer about 32K from disk in logN seeks
Slide 51: M7 Provides Instant Recovery

§  Instead of having one WAL per region server, or even one per region,
   we have many micro-WALs per region
§  0-40 microWALs per region
   –  Idle WALs are "compacted," so most are empty
   –  The region is up before all microWALs are recovered
   –  Recovers the region in the background, in parallel
   –  When a key is accessed, that microWAL is recovered inline
   –  1000-10000x faster recovery
§  Never performs the equivalent of an HBase major or minor compaction
§  Why doesn't HBase do this? M7 uses MapR-FS, not HDFS
   –  No limit to the # of files on disk
   –  No limit to the # of open files
   –  The I/O path translates random writes to sequential writes on disk
Slide 53: M7: Fileservers Serve Regions

§  A region lives entirely inside a container
   –  Does not coordinate through ZooKeeper
§  Containers support distributed transactions
   –  With replication built in
§  The only coordination in the system is for splits
   –  Between the region-map and the data-container
   –  Already solved this problem for files and their chunks
Slide 57: M7 Containers

§  A container holds many files
   – regular, dir, symlink, btree, chunk-map, region-map, …
   – all random-write capable
§  A container is replicated to servers
   – the unit of resynchronization
§  A region lives entirely inside 1 container
   – all files + WALs + btrees + bloom filters + range maps
Slide 63: Other M7 Features

§  Smaller disk footprint
   – M7 never repeats the key or column name
§  Columnar layout
   – M7 supports 64 column families
   – in-memory column families
§  Online admin
   – M7 schema changes on the fly
   – delete/rename/redistribute tables
§  Run MapReduce and tables on the same cluster
§  UI: hbase shell, MCS GUI, maprcli
Slide 64: Thank You!

Questions?
