Hadoop is dead, long live Hadoop!
Lars George | EMEA Chief Architect
@larsgeorge
A Eulogy and Proclamation

What the Press Says…
Source: http://blogs.the451group.com/information_management/2012/07/09/hadoop-is-dead-long-live-hadoop/

Big Data… WTH?
A brief reasoning for Hadoop’s existence.

- Bubble Buddy, Head of IT

Big Data – A Misnomer
•  Misleading: invites quick assumptions
•  Current challenges are driven by many things, not just the size of data
•  ANY company can use the Big Data principles to improve specific business metrics
•  Increased data retention
•  Access to all the data
•  Machine learning for pattern detection, recommendations
•  But what has happened to cause all this?

Explosive Data Growth
1.8 trillion gigabytes of data was created in 2011…
§  More than 90% is unstructured data
§  Approx. 500 quadrillion files
§  Quantity doubles every 2 years
[Chart: gigabytes of data created (in billions), 2005 to 2015, y-axis 0 to 10,000; series: structured data, unstructured data]
Source: IDC 2011
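To make the doubling claim concrete, here is a minimal sketch that projects the 2011 figure forward. It assumes steady doubling every two years and treats 1.8 trillion gigabytes as 1.8 zettabytes; the function name and parameters are illustrative and do not come from the IDC study.

```python
def projected_volume_zb(year, base_year=2011, base_zb=1.8, doubling_years=2.0):
    """Project yearly data creation in zettabytes, assuming steady doubling.

    base_zb: 1.8 trillion GB created in 2011 (per the slide) is about 1.8 ZB.
    doubling_years: the slide states that the quantity doubles every 2 years.
    """
    return base_zb * 2 ** ((year - base_year) / doubling_years)


# Print the projection for a few years covered by the chart.
for year in (2011, 2013, 2015):
    print(year, round(projected_volume_zb(year), 1), "ZB")
```

Under these assumptions the 2015 value lands in the same range as the top of the chart's axis (10,000 billion gigabytes).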
  
The ‘Big Data’ Phenomenon
Big Data Drivers:
§  The proliferation of data capture and creation technologies
§  Increased “interconnectedness” drives consumption (creating more data)
§  Inexpensive storage makes it possible to keep more, longer
§  Innovative software and analysis tools turn data into information
Big Data encompasses not only the content itself, but how it’s consumed.
[Diagram: More Devices, More Consumption, More Content, New & Better Information]
§  Every gigabyte of stored content can generate a petabyte or more of transient data*
§  The information about you is much greater than the information you create
*Source: IDC 2011

The Current Solutions
Current Database Solutions are designed for structured data.
§  Optimized to answer known questions quickly
§  Schemas dictate form/context
§  Difficult to adapt to new data types and new questions
§  Expensive at Petabyte scale
[Chart: gigabytes of data created (in billions), 2005 to 2015, y-axis 0 to 10,000; series: structured data, unstructured data; callout: 10%]

Data Management Strategies Have Stayed the Same
•  Raw data on SAN, NAS and tape
•  Data moved from storage to compute
•  Relational models with predesigned schemas

Too Much Data, Too Many Sources
•  Can’t ingest fast enough
•  Costs too much to store
•  Exists in different places
•  Archived data is lost

Can’t Use It The Way You Want To
•  Analysis and processing take too long
•  Data exists in silos
•  Can’t ask new questions
•  Can’t analyze unstructured data

The Big Data Challenge
Big Data Contains Limitless Insights…
WEB LOGS, SOCIAL MEDIA, TRANSACTIONAL DATA, SMART GRIDS, OPERATIONAL DATA, DIGITAL CONTENT, R&D DATA, AD IMPRESSIONS, FILES
BUT
VOLUME, VARIETY, VELOCITY DEMANDS A NEW APPROACH

Big Data Challenges
•  Cost-effectively managing the volume, velocity and variety of data
•  Deriving value across structured and unstructured data
•  Adapting to context changes and integrating new data sources and types

Big Data Solution Requirements
•  Cost-effectively manage the volume, variety and velocity of data
•  Process and analyze large, complex data sets…quickly
•  Flexibly adapt to context changes and new data types

Google’s Approach to Big Data
Hadoop’s Pedigree

A Timeline View #1

Google File System (Storage)
•  Foundation of scalable, fail-safe, self-healing storage
•  One central place of truth
•  Cost-effective hardware finally available
•  19” rack servers with decent amount of disk space
•  Handling of failures built in
•  Components or entire servers
•  At scale there are always hardware faults
•  Simple file system interface
•  Finally no need for expensive, proprietary systems

MapReduce (Processing)
•  First take on distributed data processing framework
•  Same concepts as Google File System, i.e.
•  Fail-safe and scalable
•  Handles a wide range of data processing problems
•  BUT not all of them (more later)
•  Simple API reading and writing Key/Value pairs
•  Framework handles heavy task of data movement
•  Core concept is data locality, heavy I/O
•  Brings code to data, not the opposite (i.e. no HPC)
•  Accessible in many programming languages
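The key/value API described above can be illustrated with a single-process sketch of the programming model. This is not Google's or Hadoop's implementation: `run_job`, `map_words` and `reduce_counts` are hypothetical names, and the shuffle that a real framework distributes across machines is simulated here with an in-memory dictionary.

```python
from collections import defaultdict


def map_words(_key, line):
    """Map step: emit a (word, 1) pair for every word in the input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum all counts that were emitted for one word."""
    yield word, sum(counts)


def run_job(records, mapper, reducer):
    """Run map, group by key (the 'shuffle'), then reduce, all in one process."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    results = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            results[out_key] = out_value
    return results


# Input records are (key, value) pairs; here the key is just a line number.
lines = [(0, "hadoop is dead"), (1, "long live hadoop")]
word_counts = run_job(lines, map_words, reduce_counts)
print(word_counts)
```

The user only writes the two small functions; grouping and data movement are the framework's job, which is the point the slide makes.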
  
BigTable (Random Access)
•  Adds database-like random access to data
•  Effectively a Key/Value store with table semantics
•  Used for small data points
•  Usually less than a megabyte per Key/Value
•  Forfeits advanced concepts for ease of scalability
•  No transactions, no query language
•  Powers many applications at Google
•  Uses Google File System as storage layer
•  Tight integration with MapReduce for batch processing
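As a rough illustration of "a Key/Value store with table semantics", the sketch below keeps cells sorted by (row, column) so that both point lookups and row-range scans are possible. `TinyTable` is a hypothetical in-memory toy, not the BigTable API; it has no persistence, versions, or distribution.

```python
import bisect


class TinyTable:
    """Toy sorted key/value store addressed by (row, column)."""

    def __init__(self):
        self._keys = []   # sorted list of (row, column) tuples
        self._cells = {}  # (row, column) -> value

    def put(self, row, column, value):
        key = (row, column)
        if key not in self._cells:
            bisect.insort(self._keys, key)  # keep keys in sorted order
        self._cells[key] = value

    def get(self, row, column):
        """Random access to a single cell; returns None if it is absent."""
        return self._cells.get((row, column))

    def scan(self, start_row, stop_row):
        """Yield (row, column, value) for start_row <= row < stop_row."""
        i = bisect.bisect_left(self._keys, (start_row, ""))
        while i < len(self._keys) and self._keys[i][0] < stop_row:
            row, column = self._keys[i]
            yield row, column, self._cells[(row, column)]
            i += 1


table = TinyTable()
table.put("user2", "name", "Bob")
table.put("user1", "name", "Ann")
table.put("user1", "city", "Berlin")
table.put("user3", "name", "Cy")
scanned = list(table.scan("user1", "user3"))
print(scanned)
```

There are no transactions and no query language here either; the only operations are put, get and an ordered scan.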
  
Dremel, Tenzing, Pregel (Query API)
•  Dremel adds specific file format and query language
•  Used for highly selective queries, data exploration
•  File layout is optimized for very effective scanning
•  Runs alongside MapReduce and the file system
•  Tenzing adds SQL over various data sources
•  Can query raw files, Dremel files, or BigTable data etc.
•  Brings “known” paradigm to stored data
•  Pregel adds graph processing API

Percolator, Megastore (Transactions)
•  Additions to BigTable to add “missing” features
•  Percolator uses BigTable to update the search index incrementally, which needs transactions
•  Distributes updates with multi-phase commits
•  Megastore drives Google App Engine and also adds transactions for the user API
•  Uses ranges of rows as entity groups
•  Reduces locking to small subsets
•  Optimistic, roll-forward only transactions
•  Java layer over BigTable API
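The idea of optimistic transactions scoped to an entity group can be sketched as a version check at commit time. This is only an illustration of optimistic concurrency in general, not Megastore's actual protocol; `EntityGroup` and `ConflictError` are hypothetical names.

```python
class ConflictError(Exception):
    """Raised when another transaction committed to the group first."""


class EntityGroup:
    """A small set of rows that share one version counter."""

    def __init__(self):
        self.version = 0
        self.rows = {}

    def begin(self):
        """Start a transaction: remember the version and take a snapshot."""
        return self.version, dict(self.rows)

    def commit(self, read_version, updates):
        """Apply updates only if nobody else committed since begin()."""
        if read_version != self.version:
            raise ConflictError("entity group changed since it was read")
        self.rows.update(updates)
        self.version += 1


group = EntityGroup()
v1, _snapshot1 = group.begin()
v2, _snapshot2 = group.begin()      # a second, concurrent transaction
group.commit(v1, {"balance": 10})   # first writer wins
try:
    group.commit(v2, {"balance": 20})
    second_committed = True
except ConflictError:
    second_committed = False        # the loser would retry with fresh data
print(group.rows, second_committed)
```

Because the check only covers one group, transactions on different groups never block each other, which is the "small subsets" point above.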
  
Spanner, F1 (World-Wide Data)
•  Future of Google’s distributed storage and processing system
•  Spanner is a scalable, multi-version, globally-distributed, and synchronously-replicated database
•  Replicates across datacenters
•  Uses TrueTime (atomic clocks) for synchronization
•  Uses Colossus for storage (a GFS successor)
•  F1 replaced MySQL for AdWords service
•  SQL over data stored in Spanner
•  Colocated with Spanner processes

The Hadoop Story
A Eulogy

What is Apache Hadoop?
Apache Hadoop is an open source platform for data storage and processing that is…
✓  Scalable
✓  Fault tolerant
✓  Distributed
CORE HADOOP SYSTEM COMPONENTS
Hadoop Distributed File System (HDFS): Self-Healing, High Bandwidth Clustered Storage
MapReduce/YARN: Distributed Computing Framework
Has the Flexibility to Store and Mine Any Type of Data
§  Ask questions across structured and unstructured data that were previously impossible to ask or solve
§  Not bound by a single schema
Excels at Processing Complex Data
§  Scale-out architecture divides workloads across multiple nodes
§  Flexible file system eliminates ETL bottlenecks
Scales Economically
§  Can be deployed on commodity hardware
§  Open source platform guards against vendor lock-in

Core Hadoop: HDFS
Self-healing, high bandwidth clustered storage.
[Diagram: a file split into blocks 1 to 5; each block is stored on three of five nodes]
HDFS breaks incoming files into blocks and stores them redundantly across the cluster.
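The block-and-replica idea can be sketched in a few lines. The placement below is a simple round-robin chosen for illustration; real HDFS uses much larger blocks and a rack-aware placement policy, and the function names, node names and sizes here are hypothetical.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


def surviving_blocks(placement, failed_node):
    """Return the blocks that still have a replica after one node fails."""
    return {
        block_id
        for block_id, holders in placement.items()
        if any(node != failed_node for node in holders)
    }


blocks = split_into_blocks(b"x" * 50, block_size=10)
nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_replicas(len(blocks), nodes)
print(placement)
print(surviving_blocks(placement, "node1"))
```

With three replicas per block, losing any single node still leaves every block readable, which is what makes re-replication ("self-healing") possible.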
  
Core Hadoop: MapReduce
Distributed computing framework.
[Diagram: a job split into tasks 1 to 5, running on the nodes that hold the matching blocks]
Processes large jobs in parallel across many nodes and combines the results.

Why Hadoop Was Created
New opportunities to derive value from all your data.
Exploding Data Volumes & Types Driving The Need For A Flexible, Scalable Solution
It’s difficult to handle data this diverse, at this scale.
Traditional platforms can’t keep pace.
WEB LOGS, SOCIAL MEDIA, TRANSACTIONAL DATA, SMART GRIDS, OPERATIONAL DATA, DIGITAL CONTENT, R&D DATA, AD IMPRESSIONS, FILES
BIG DATA
•  Any Kind
•  From Any Source
•  Structured & Unstructured
•  At Scale
HARD PROBLEMS
•  Deep Analysis
•  Exhaustive & Detailed
•  Sophisticated Algorithms
•  Generate Results Quickly
NEW OPPORTUNITIES
•  Extract More Value
•  From More Data
•  More Cost Effectively
•  With Greater Flexibility

The Core Values of Hadoop
A platform for
1
§  Designed to store and process data at petabyte scale
§  Scale-out architecture increases capacity and processing power linearly
§  Perform operations in parallel across the entire cluster
2
§  Store data in any format – free from rigid schemas
§  Define context at the time you ask the question
§  Process and analyze data using virtually any programming language
3
§  Build out your cluster on your hardware of choice
§  Open source software guards against vendor lock-in
§  Wide integration ensures investment protection
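"Define context at the time you ask the question" is often called schema-on-read. A minimal sketch, assuming raw events are stored as JSON lines: the records and field names below are invented for illustration, and `read_with_schema` is a hypothetical helper.

```python
import json

# Raw records are stored as-is; they do not all share the same fields.
raw_events = [
    '{"user": "ann", "action": "click", "ms": 120}',
    '{"user": "bob", "action": "view"}',
    '{"user": "ann", "action": "view", "ms": 300, "referrer": "mail"}',
]


def read_with_schema(lines, fields):
    """Apply a schema at read time: keep the requested fields, default to None."""
    for line in lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


# Question 1: how many clicks per user? Only 'user' and 'action' matter.
clicks_per_user = {}
for row in read_with_schema(raw_events, ["user", "action"]):
    if row["action"] == "click":
        clicks_per_user[row["user"]] = clicks_per_user.get(row["user"], 0) + 1

# Question 2, asked later: which latencies were recorded? Same raw data, new schema.
latencies = [
    row["ms"] for row in read_with_schema(raw_events, ["ms"]) if row["ms"] is not None
]
print(clicks_per_user, latencies)
```

Neither question required reloading or restructuring the stored data, only a different view of it at read time.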
Hadoop In Practice

Cloudera Software Stack
Turnkey solution for Big Data and Advanced Analytics use-cases
CDH: 100% OPEN SOURCE HADOOP DISTRIBUTION
CORE PROJECTS: HDFS, MAPREDUCE, FLUME, HCATALOG, HIVE, HUE, MAHOUT, OOZIE, PIG, SQOOP, WHIRR, ZOOKEEPER
PREMIUM PROJECTS: HBASE, IMPALA, SEARCH (BETA)
CONNECTORS: MICROSTRATEGY, NETEZZA, ORACLE, QLIKVIEW, TABLEAU, TERADATA
CLOUDERA MANAGER: END-TO-END SYSTEM MANAGEMENT
DEPLOYMENT, MONITORING, API, SNMP, CONFIG ROLLBACKS, PHONE HOME, SERVICE MGMT, DIAGNOSTICS, ROLLING UPGRADES, LDAP, REPORTING, BACKUP/DR
CLOUDERA NAVIGATOR: END-TO-END DATA MANAGEMENT
ACCESS MGMT, DATA AUDIT
CLOUDERA SUPPORT: BEST-IN-CLASS TECHNICAL SUPPORT, COMMUNITY ADVOCACY & INDEMNIFICATION

Spin some YARN!
Reborn again!

Back to the Press again…
Source: http://gigaom.com/2012/07/07/why-the-days-are-numbered-for-hadoop-as-we-know-it/

A Timeline View #2

First: What is MapReduce 1?

Motivations to Change MR1
•  Scaling >4000 nodes
•  Fewer, larger clusters
•  No single source of truth, data in “silos” again
•  HA of Job Tracker difficult
•  Large, complex state
•  Poor resource utilization
•  Slots in MR1 are for either map or reduce

YARN: Yet Another Resource Negotiator

Split of Responsibilities
The Job Tracker is split into:
Resource Manager
•  One per Cluster
•  Long-lived
•  App-level
Application Master
•  One per app instance
•  Short-lived
•  Task-level scheduling and monitoring

Fine-grained Resource Control
•  Node Manager is a generalized Task Tracker
•  Task Tracker
•  Fixed number of map and reduce slots
•  Node Manager
•  Containers with variable resource limits
Node Manager: Containers

YARN + MapReduce 2
•  YARN “runs” MapReduce as an application
•  MR is user space
•  YARN is kernel

YARN Applications
•  Distributed shell
•  Open MPI
•  Master-worker
•  Apache Giraph, Hama
•  Spark

Summary
What the future may hold

Enterprise Data Evolution
[Chart: business impact against amount of data, across the 1980s, 2000s and 2010s]
Stages: RDBMS/EDW; HADOOP-OPTIMIZED INFRASTRUCTURE; NEXT-GEN DATA COMPUTING PLATFORM; DATA-DRIVEN ORGANIZATION
•  Data collection & reporting
•  Process data faster
•  Store data more cost-effectively
•  Simplify infrastructure
•  Combine data from across the business
•  Ask new questions immediately
•  Enable new real-time applications
Goals: IMPROVE OPERATIONAL EFFICIENCY; CREATE COMPETITIVE ADVANTAGE

Playing Catchup
•  Improve overall performance
•  Google’s code is kernel modules and C++, as low-level as possible
•  Hadoop is Java, for ease of development in open source
•  Maybe rewrite parts of the stack?
•  Overall goal: saturate machine specs (I/O, CPU, RAM)
•  Add missing features
•  Everything is based on “hearsay”, aka research papers and presentations
•  Add what is necessary or for the sake of it?

Further Extend or Invent?
•  YARN is a good example of what can be done
•  Look at every component and evaluate
•  Work with researchers, universities, and companies to drive new development
•  What else can be done with all that data?

- Jim Gray, Computer Scientist

From Framework to Platform to Commodity
•  Hadoop distributions are already a commodity
•  Move up the stack to reach commercial space
•  Simplify data processing
•  Continuuity
•  WibiData (Kiji)
•  Cloudera CDK
•  Pure Hadoop Solutions
•  DataMeer
•  Platfora

Hadoop… live long and prosper!
Lars George, EMEA Chief Architect, Cloudera
@larsgeorge
Thank you!

More Related Content

What's hot

Speeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachSpeeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Vault Automation at the Bijenkorf
Data Vault Automation at the BijenkorfData Vault Automation at the Bijenkorf
Data Vault Automation at the BijenkorfRob Winters
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardParis Data Engineers !
 
Business Intelligence with SQL Server
Business Intelligence with SQL ServerBusiness Intelligence with SQL Server
Business Intelligence with SQL ServerPeter Gfader
 
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...Amr Awadallah
 
Worst Practices in Data Warehouse Design
Worst Practices in Data Warehouse DesignWorst Practices in Data Warehouse Design
Worst Practices in Data Warehouse DesignKent Graziano
 
Modernizing Data Management Through Metadata
Modernizing Data Management Through MetadataModernizing Data Management Through Metadata
Modernizing Data Management Through MetadataMANTA
 
Scaling Databricks to Run Data and ML Workloads on Millions of VMs
Scaling Databricks to Run Data and ML Workloads on Millions of VMsScaling Databricks to Run Data and ML Workloads on Millions of VMs
Scaling Databricks to Run Data and ML Workloads on Millions of VMsMatei Zaharia
 
How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017Rittman Analytics
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...Rittman Analytics
 
Design Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseDesign Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseRob Winters
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...Databricks
 
Simplifying Change Data Capture using Databricks Delta
Simplifying Change Data Capture using Databricks DeltaSimplifying Change Data Capture using Databricks Delta
Simplifying Change Data Capture using Databricks DeltaDatabricks
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Spark with Delta Lake
Spark with Delta LakeSpark with Delta Lake
Spark with Delta LakeKnoldus Inc.
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game ChangerCaserta
 
What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningDatabricks
 

What's hot (20)

Speeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachSpeeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT Approach
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Vault Automation at the Bijenkorf
Data Vault Automation at the BijenkorfData Vault Automation at the Bijenkorf
Data Vault Automation at the Bijenkorf
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
 
Business Intelligence with SQL Server
Business Intelligence with SQL ServerBusiness Intelligence with SQL Server
Business Intelligence with SQL Server
 
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
 
Worst Practices in Data Warehouse Design
Worst Practices in Data Warehouse DesignWorst Practices in Data Warehouse Design
Worst Practices in Data Warehouse Design
 
Modernizing Data Management Through Metadata
Modernizing Data Management Through MetadataModernizing Data Management Through Metadata
Modernizing Data Management Through Metadata
 
Scaling Databricks to Run Data and ML Workloads on Millions of VMs
Scaling Databricks to Run Data and ML Workloads on Millions of VMsScaling Databricks to Run Data and ML Workloads on Millions of VMs
Scaling Databricks to Run Data and ML Workloads on Millions of VMs
 
How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
 
Design Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseDesign Principles for a Modern Data Warehouse
Design Principles for a Modern Data Warehouse
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
 
Simplifying Change Data Capture using Databricks Delta
Simplifying Change Data Capture using Databricks DeltaSimplifying Change Data Capture using Databricks Delta
Simplifying Change Data Capture using Databricks Delta
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Spark with Delta Lake
Spark with Delta LakeSpark with Delta Lake
Spark with Delta Lake
 
Data lake
Data lakeData lake
Data lake
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer
 
What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine Learning
 

Viewers also liked

Ysance conference - cloud computing - aws - 3 mai 2010
Ysance   conference - cloud computing - aws - 3 mai 2010Ysance   conference - cloud computing - aws - 3 mai 2010
Ysance conference - cloud computing - aws - 3 mai 2010Ysance
 
Social Networks and the Richness of Data
Social Networks and the Richness of DataSocial Networks and the Richness of Data
Social Networks and the Richness of Datalarsgeorge
 
Introduction sur les problématiques d'une architecture distribuée
Introduction sur les problématiques d'une architecture distribuéeIntroduction sur les problématiques d'une architecture distribuée
Introduction sur les problématiques d'une architecture distribuéeKhanh Maudoux
 
From Batch to Realtime with Hadoop - Berlin Buzzwords - June 2012
From Batch to Realtime with Hadoop - Berlin Buzzwords - June 2012From Batch to Realtime with Hadoop - Berlin Buzzwords - June 2012
From Batch to Realtime with Hadoop - Berlin Buzzwords - June 2012larsgeorge
 
HBase Applications - Atlanta HUG - May 2014
HBase Applications - Atlanta HUG - May 2014HBase Applications - Atlanta HUG - May 2014
HBase Applications - Atlanta HUG - May 2014larsgeorge
 
HBase Sizing Notes
HBase Sizing NotesHBase Sizing Notes
HBase Sizing Noteslarsgeorge
 
Big Data is not Rocket Science
Big Data is not Rocket ScienceBig Data is not Rocket Science
Big Data is not Rocket Sciencelarsgeorge
 
Phoenix - A High Performance Open Source SQL Layer over HBase
Phoenix - A High Performance Open Source SQL Layer over HBasePhoenix - A High Performance Open Source SQL Layer over HBase
Phoenix - A High Performance Open Source SQL Layer over HBaseSalesforce Developers
 
HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017larsgeorge
 
HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012larsgeorge
 
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014larsgeorge
 
Sept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionSept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionAdam Muise
 
HBase Sizing Guide
HBase Sizing GuideHBase Sizing Guide
HBase Sizing Guidelarsgeorge
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBasedave_revell
 
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv larsgeorge
 
Parquet - Data I/O - Philadelphia 2013
Parquet - Data I/O - Philadelphia 2013Parquet - Data I/O - Philadelphia 2013
Parquet - Data I/O - Philadelphia 2013larsgeorge
 

Viewers also liked (20)

Ysance conference - cloud computing - aws - 3 mai 2010
Ysance   conference - cloud computing - aws - 3 mai 2010Ysance   conference - cloud computing - aws - 3 mai 2010
Ysance conference - cloud computing - aws - 3 mai 2010
 
Social Networks and the Richness of Data
Social Networks and the Richness of DataSocial Networks and the Richness of Data
Social Networks and the Richness of Data
 
Hadoop unit
Hadoop unitHadoop unit
Hadoop unit
 
Introduction sur les problématiques d'une architecture distribuée
Introduction sur les problématiques d'une architecture distribuéeIntroduction sur les problématiques d'une architecture distribuée
Introduction sur les problématiques d'une architecture distribuée
 
From Batch to Realtime with Hadoop - Berlin Buzzwords - June 2012
From Batch to Realtime with Hadoop - Berlin Buzzwords - June 2012From Batch to Realtime with Hadoop - Berlin Buzzwords - June 2012
From Batch to Realtime with Hadoop - Berlin Buzzwords - June 2012
 
Présentation Club STORM
Présentation Club STORMPrésentation Club STORM
Présentation Club STORM
 
Hadoop is dead, long live Hadoop! The rise of Big Data and the solution of Hadoop

  • 1. Hadoop is dead, long live Hadoop!
      A Eulogy and Proclamation
      Lars George | EMEA Chief Architect | @larsgeorge
  • 2. What the Press Says…
      Source: http://blogs.the451group.com/information_management/2012/07/09/hadoop-is-dead-long-live-hadoop/
  • 3. Big Data… WTH?
      A brief reasoning for Hadoop’s existence.
  • 4. [Quote slide] Bubble Buddy, Head of IT
  • 5. Big Data – A Misnomer
      • Misleading, invites quick assumptions
      • Current challenges are driven by many things, not just the size of data
      • ANY company can use the Big Data principles to improve specific business metrics
          • Increased data retention
          • Access to all the data
          • Machine learning for pattern detection, recommendations
      • But what has happened to cause this all?
  • 6. Explosive Data Growth
      1.8 trillion gigabytes of data was created in 2011…
      • More than 90% is unstructured data
      • Approx. 500 quadrillion files
      • Quantity doubles every 2 years
      [Chart: gigabytes of data created (in billions), 2005 to 2015, structured vs. unstructured data; y-axis 0 to 10,000]
      Source: IDC 2011
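As a rough sanity check on the growth figures in slide 6, the sketch below extrapolates from the 2011 volume using the slide's "doubles every 2 years" rule. The function name `projected_gigabytes` and the chosen years are illustrative assumptions; the IDC study itself may use a different model.

```python
BASE_YEAR = 2011
BASE_GIGABYTES = 1.8e12      # 1.8 trillion GB created in 2011 (from the slide)
DOUBLING_PERIOD_YEARS = 2    # "quantity doubles every 2 years" (from the slide)


def projected_gigabytes(year: int) -> float:
    """Extrapolate the data created in `year`, assuming steady doubling."""
    periods = (year - BASE_YEAR) / DOUBLING_PERIOD_YEARS
    return BASE_GIGABYTES * 2 ** periods


for year in (2011, 2013, 2015):
    billions = projected_gigabytes(year) / 1e9
    print(f"{year}: about {billions:,.0f} billion GB")
```

Under this assumption the 2015 projection is about 7,200 billion GB, which falls inside the chart's 0 to 10,000 billion range.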
  • 7. The ‘Big Data’ Phenomenon
      Big Data Drivers:
      • The proliferation of data capture and creation technologies
      • Increased “interconnectedness” drives consumption (creating more data)
      • Inexpensive storage makes it possible to keep more, longer
      • Innovative software and analysis tools turn data into information
      Big Data encompasses not only the content itself, but how it’s consumed.
      [Diagram: more devices, more consumption, more content, new and better information]
      • Every gigabyte of stored content can generate a petabyte or more of transient data*
      • The information about you is much greater than the information you create
      *Source: IDC 2011
  • 8. The Current Solutions
      Current database solutions are designed for structured data.
      • Optimized to answer known questions quickly
      • Schemas dictate form/context
      • Difficult to adapt to new data types and new questions
      • Expensive at petabyte scale
      [Chart: gigabytes of data created (in billions), 2005 to 2015, structured vs. unstructured data; y-axis 0 to 10,000; 10% callout]
  • 9. Data Management Strategies Have Stayed the Same
      • Raw data on SAN, NAS and tape
      • Data moved from storage to compute
      • Relational models with predesigned schemas
  • 10. Too Much Data, Too Many Sources
      • Can’t ingest fast enough
  • 11. Too Much Data, Too Many Sources
      • Can’t ingest fast enough
      • Costs too much to store
  • 12. Too Much Data, Too Many Sources
      • Can’t ingest fast enough
      • Costs too much to store
      • Exists in different places
  • 13. Too Much Data, Too Many Sources
      • Can’t ingest fast enough
      • Costs too much to store
      • Exists in different places
      • Archived data is lost
  • 14. Can’t Use It The Way You Want To
      • Analysis and processing takes too long
  • 15. Can’t Use It The Way You Want To
      • Analysis and processing takes too long
      • Data exists in silos
  • 16. Can’t Use It The Way You Want To
      • Analysis and processing takes too long
      • Data exists in silos
      • Can’t ask new questions
  • 17. Can’t Use It The Way You Want To
      • Analysis and processing takes too long
      • Data exists in silos
      • Can’t ask new questions
      • Can’t analyze unstructured data
  • 18. The Big Data Challenge
      Big Data contains limitless insights… BUT its volume, variety and velocity demand a new approach.
      Data sources: web logs, social media, transactional data, smart grids, operational data, digital content, R&D data, ad impressions, files
  • 19. Big Data Challenges
      • Cost-effectively managing the volume, velocity and variety of data
      • Deriving value across structured and unstructured data
      • Adapting to context changes and integrating new data sources and types
  • 20. Big Data Solution Requirements
      • Cost-effectively manage the volume, variety and velocity of data
      • Process and analyze large, complex data sets… quickly
      • Flexibly adapt to context changes and new data types
  • 21. Google’s Approach to Big Data
      Hadoop’s Pedigree
  • 22. A Timeline View #1
  • 23. Google File System (Storage)
      • Foundation of scalable, fail-safe, self-healing storage
      • One central place of truth
      • Cost-effective hardware finally available
          • 19” rack servers with decent amount of disk space
      • Handling of failures built in
          • Components or entire servers
          • At scale there are always hardware faults
      • Simple file system interface
      • Finally no need for expensive, proprietary systems
  • 24. MapReduce (Processing)
      • First take on a distributed data processing framework
      • Same concepts as Google File System, i.e. fail-safe and scalable
      • Handles a wide range of data processing problems
          • BUT not all of them (more later)
      • Simple API reading and writing key/value pairs
          • Framework handles heavy task of data movement
      • Core concept is data locality, heavy I/O
          • Brings code to data, not the opposite (i.e. no HPC)
      • Accessible in many programming languages
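The key/value API described in slide 24 can be illustrated with a minimal, single-process sketch of a word count. This is not Google's or Hadoop's actual API: the names `map_words`, `reduce_counts` and `run_job` are hypothetical, and the in-memory dictionary only stands in for the shuffle step that a real framework performs across machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_words(_offset: int, line: str) -> Iterator[tuple[str, int]]:
    """Map: read one (offset, line) pair and emit (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce: read one key with all its values and emit (word, total)."""
    return word, sum(counts)


def run_job(lines: list[str]) -> dict[str, int]:
    """Toy driver: map every record, group values by key, then reduce."""
    grouped = defaultdict(list)
    for offset, line in enumerate(lines):
        for key, value in map_words(offset, line):
            grouped[key].append(value)  # stand-in for the shuffle
    return dict(reduce_counts(key, values) for key, values in sorted(grouped.items()))


print(run_job(["hadoop is dead", "long live hadoop"]))
```

The user code only sees key/value pairs going in and out; grouping and data movement are left to the driver, which mirrors the division of labour the slide describes.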
  • 25. BigTable (Random Access)
      • Adds database-like random access to data
      • Effectively a key/value store with table semantics
      • Used for small data points
          • Usually less than a megabyte per key/value
      • Forfeits advanced concepts for ease of scalability
          • No transactions, no query language
      • Powers many applications at Google
      • Uses Google File System as storage layer
      • Tight integration with MapReduce for batch processing
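To make "a key/value store with table semantics" in slide 25 more concrete, here is a toy in-memory sketch: cells are addressed by (row, column) and kept in sorted key order so that row ranges can be scanned. The class `TinyTable` and its methods are hypothetical and are not the BigTable or HBase API; versions/timestamps, persistence and distribution are deliberately left out.

```python
import bisect


class TinyTable:
    """Toy sorted key/value store: (row, column) -> value."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column) keys
        self._cells = {}   # (row, column) -> value

    def put(self, row, column, value):
        key = (row, column)
        if key not in self._cells:
            bisect.insort(self._keys, key)  # keep keys sorted for scans
        self._cells[key] = value

    def get(self, row, column):
        """Random access to a single cell; None if it does not exist."""
        return self._cells.get((row, column))

    def scan(self, start_row, stop_row):
        """Yield cells with start_row <= row < stop_row, in key order."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < stop_row:
            key = self._keys[i]
            yield key, self._cells[key]
            i += 1


table = TinyTable()
table.put("user2", "info:name", "Bob")
table.put("user1", "info:name", "Ann")
table.put("user3", "info:name", "Cy")
print(table.get("user1", "info:name"))
print([key for key, _ in table.scan("user1", "user3")])
```

Note what is missing: there is only put, get and scan, with no transactions and no query language, which is the trade-off the slide names.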
  • 26. Dremel, Tenzing, Pregel (Query API)
      • Dremel adds specific file format and query language
          • Used for highly selective queries, data exploration
          • File layout is optimized for very effective scanning
          • Runs alongside MapReduce and File System
      • Tenzing adds SQL over various data sources
          • Can query raw files, Dremel files, or BigTable data etc.
          • Brings “known” paradigm to stored data
      • Pregel adds graph processing API
  • 27. Percolator, Megastore (Transactions)
      • Additions to BigTable to add “missing” features
      • Percolator uses BigTable to update the search index incrementally, needs transactions
          • Distributes updates with multi-phase commits
      • Megastore drives Google App Engine to also add transactions for user API
          • Uses ranges of rows as entity groups
          • Reduces locking to small subsets
          • Optimistic, roll-forward only transactions
          • Java layer over BigTable API
  • 28. Spanner, F1 (World-Wide Data)
      • Future of Google’s distributed storage and processing system
      • Spanner is a scalable, multi-version, globally-distributed, and synchronously-replicated database
          • Replicates across datacenters
          • Uses TrueTime (atomic clocks) for synchronization
          • Uses Colossus for storage (a GFS successor)
      • F1 replaced MySQL for AdWords service
          • SQL over data stored in Spanner
          • Colocated with Spanner processes
  • 29. The Hadoop Story
      A Eulogy
  • 30. What is Apache Hadoop?
      Apache Hadoop is an open source platform for data storage and processing that is…
      • Scalable
      • Fault tolerant
      • Distributed
      Core Hadoop system components:
      • Hadoop Distributed File System (HDFS): self-healing, high-bandwidth clustered storage
      • MapReduce/YARN: distributed computing framework
      Has the flexibility to store and mine any type of data
      • Ask questions across structured and unstructured data that were previously impossible to ask or solve
      • Not bound by a single schema
      Excels at processing complex data
      • Scale-out architecture divides workloads across multiple nodes
      • Flexible file system eliminates ETL bottlenecks
      Scales economically
      • Can be deployed on commodity hardware
      • Open source platform guards against vendor lock-in
  • 31. Core Hadoop: HDFS
      Self-healing, high bandwidth
      HDFS breaks incoming files into blocks and stores them redundantly across the cluster.
      [Diagram: a file split into blocks 1 to 5; five nodes hold blocks {2, 4, 5}, {1, 2, 5}, {1, 3, 4}, {2, 3, 5} and {1, 3, 4}, so every block is stored three times]
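The idea in slide 31 (split a file into blocks, store each block several times on different nodes) can be sketched as follows. This is a simplified illustration, not HDFS code: the round-robin placement, the tiny block size and the names `split_into_blocks`, `place_replicas` and `survives_failure` are all assumptions made for the example, and real HDFS placement follows its own policy (for instance, it takes racks into account).

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the payload into fixed-size blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def survives_failure(placement: dict[int, list[str]], failed_node: str) -> bool:
    """True if every block still has a replica on some other node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


nodes = ["node1", "node2", "node3", "node4", "node5"]
blocks = split_into_blocks(b"0123456789", block_size=2)  # 5 tiny blocks
placement = place_replicas(len(blocks), nodes)
print(placement[0])
print(survives_failure(placement, "node3"))
```

Because each block lives on three different nodes, losing any single node leaves at least two copies of every block, which is what makes re-replication ("self-healing") possible.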
  • 32. Core Hadoop: MapReduce
      Distributed computing framework.
      Processes large jobs in parallel across many nodes and combines the results.
      [Diagram: the same five nodes and block layout as in the HDFS slide, labelled MR]
  • 33. Why Hadoop Was Created
      It’s difficult to handle data this diverse, at this scale. Traditional platforms can’t keep pace.
      Exploding data volumes & types (BIG DATA)
      • Any kind
      • From any source
      • Structured & unstructured
      • At scale
      Driving the need for a flexible, scalable solution (HARD PROBLEMS)
      • Deep analysis
      • Exhaustive & detailed
      • Sophisticated algorithms
      • Generate results quickly
      New opportunities to derive value from all your data (NEW OPPORTUNITIES)
      • Extract more value
      • From more data
      • More cost effectively
      • With greater flexibility
      Data sources: web logs, social media, transactional data, smart grids, operational data, digital content, R&D data, ad impressions, files
  • 34. The Core Values of Hadoop
      A platform for
      1.
          • Designed to store and process data at petabyte scale
          • Scale-out architecture increases capacity and processing power linearly
          • Perform operations in parallel across the entire cluster
      2.
          • Store data in any format, free from rigid schemas
          • Define context at the time you ask the question
          • Process and analyze data using virtually any programming language
      3.
          • Build out your cluster on your hardware of choice
          • Open source software guards against vendor lock-in
          • Wide integration ensures investment protection
  • 36. Cloudera Software Stack
      Turnkey solution for Big Data and Advanced Analytics use-cases
      • CDH: 100% open source Hadoop distribution
          • Core projects: HDFS, MapReduce, Flume, HCatalog, Hive, Hue, Mahout, Oozie, Pig, Sqoop, Whirr, ZooKeeper
          • Premium projects: HBase, Impala, Search (beta)
          • Connectors: MicroStrategy, Netezza, Oracle, QlikView, Tableau, Teradata
      • Cloudera Manager: end-to-end system management
          • Deployment, monitoring, API, SNMP, config rollbacks, phone home, service mgmt, diagnostics, rolling upgrades, LDAP, reporting, backup/DR
      • Cloudera Navigator: end-to-end data management
          • Access mgmt, data audit
      • Cloudera Support: best-in-class technical support, community advocacy & indemnification
      [Diagram labels: Core Hadoop projects, Cloudera Manager, Cloudera Navigator, HBase, Impala]
  • 37. Spin some YARN!
      Reborn again!
  • 38. Back to the Press again…
      Source: http://gigaom.com/2012/07/07/why-the-days-are-numbered-for-hadoop-as-we-know-it/
  • 39. A Timeline View #2
  • 40. First: What is MapReduce 1?
  • 41. Motivations to Change MR1
      • Scaling >4000 nodes
      • Fewer, larger clusters
          • No single source of truth, data in “silos” again
      • HA of Job Tracker difficult
          • Large, complex state
      • Poor resource utilization
          • Slots in MR1 are for either map or reduce
  • 42. YARN: Yet Another Resource Negotiator
  • 43. Split of Responsibilities
      The Job Tracker is split into:
      • Resource Manager
          • One per cluster
          • Long-lived
          • App-level
      • Application Master
          • One per app instance
          • Short-lived
          • Task-level scheduling and monitoring
  • 44. Fine-grained Resource Control
      • Node Manager is a generalized Task Tracker
      • Task Tracker
          • Fixed number of map and reduce slots
      • Node Manager
          • Containers with variable resource limits
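The difference between fixed slots and variable-size containers in slides 41 and 44 can be shown with a small back-of-the-envelope model. It is only a sketch: the slot counts and memory sizes are made-up numbers, the function names are hypothetical, and the container model considers memory alone, whereas a real YARN scheduler takes more into account.

```python
def run_with_slots(pending_maps: int, pending_reduces: int,
                   map_slots: int, reduce_slots: int) -> int:
    """MR1-style Task Tracker: a slot only runs its own kind of task."""
    return min(pending_maps, map_slots) + min(pending_reduces, reduce_slots)


def run_with_containers(requests_mb: list[int], node_memory_mb: int) -> int:
    """YARN-style Node Manager: grant containers while memory remains."""
    running, free_mb = 0, node_memory_mb
    for request in requests_mb:
        if request <= free_mb:
            free_mb -= request
            running += 1
    return running


# A map-only phase: 8 map tasks are waiting, no reduce tasks yet.
print(run_with_slots(pending_maps=8, pending_reduces=0, map_slots=4, reduce_slots=4))
print(run_with_containers([1024] * 8, node_memory_mb=8192))
```

In this toy scenario the slot-based node runs 4 tasks while its 4 reduce slots sit idle, whereas the container-based node fits all 8 tasks into the same machine, which is the "poor resource utilization" point from the MR1 slide.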
  • 46. YARN + MapReduce 2
      • YARN “runs” MapReduce as an application
          • MR is user space
          • YARN is kernel
  • 47. YARN Applications
      • Distributed shell
      • Open MPI
      • Master-worker
      • Apache Giraph, Hama
      • Spark
  • 48. Summary
      What the future may hold
  • 49. Enterprise Data Evolution
      [Chart: business impact versus amount of data. Stages: RDBMS/EDW, Hadoop-optimized infrastructure, next-gen data computing platform, data-driven organization. Timeline: 1980s, 2000s, 2010s. Goals: improve operational efficiency, create competitive advantage]
      • Data collection & reporting
      • Process data faster
      • Store data more cost-effectively
      • Simplify infrastructure
      • Combine data from across the business
      • Ask new questions immediately
      • Enable new real-time applications
  • 50. Playing Catchup
      • Improve overall performance
          • Google’s code is kernel module, C++, as low as possible
          • Hadoop is Java, for ease of development in open-source
          • Maybe rewrite parts of the stack?
          • Overall goal: saturate machine specs (I/O, CPU, RAM)
      • Add missing features
          • Everything is based on “hearsay”, aka research papers and presentations
          • Add what is necessary or for the sake of it?
  • 51. Further Extend or Invent?
      • YARN is a good example of what can be done
      • Look at every component and evaluate
      • Work with research and universities, companies to drive new development
      • What else can be done with all that data?
  • 52. [Quote slide] Jim Gray, Computer Scientist
  • 53. From Framework to Platform to Commodity
      • Hadoop distributions are already a commodity
      • Move up the stack to reach commercial space
      • Simplify data processing
          • Continuuity
          • WibiData (Kiji)
          • Cloudera CDK
      • Pure Hadoop Solutions
          • Datameer
          • Platfora
  • 54. Hadoop… live long and prosper!
  • 55. Thank you!
      Lars George, EMEA Chief Architect, Cloudera | @larsgeorge