Adam Muise, Hortonworks
  

WELCOME TO HADOOP

Who am I?

Why are we here?
  
Data

“Big Data” is the marketing term of the decade

What lurks behind the hype is the democratization of Data.

You need to deal with Data.

You’re probably not as good at that as you think.

Put it away, delete it, tweet it, compress it, shred it, wikileak-it, put it in a database, put it in SAN/NAS, put it in the cloud, hide it in tape…

Let’s talk challenges…
  
Volume
[Slide graphic: the word "Volume" repeated across the slides]
  
Storage, Management, Processing all become challenges with Data at Volume

Traditional technologies adopt a divide, drop, and conquer approach
  
The solution?
[Diagram: Data divided among separate stores: EDW, OLTP, Analytical DB, Another EDW, Yet Another EDW]

Ummm…you dropped something
[Diagram: the same stores (EDW, OLTP, Analytical DB, Another EDW, Yet Another EDW) surrounded by loose Data]
  
Analyzing the data usually raises more interesting questions…

…which leads to more data

Wait, you’ve seen this before.
  

[Diagram: Data, a "Sausage Factory", and still more Data]
  
Data begets Data.

What keeps us from Data?

“Prices, Stupid passwords, and Boring Statistics.”
- Hans Rosling
http://www.youtube.com/watch?v=hVimVzgtD6w
  
Your data silos are lonely places.
[Diagram: separate silos labeled EDW, Accounts, Customers, and Web Properties, each holding its own Data]
  
…Data likes to be together.
[Diagram: Data from EDW, Accounts, Customers, and Web Properties gathered together]
  
Data likes to socialize too.
[Diagram: Data from EDW, Accounts, Customers, and Web Properties joined by outside sources: CDR, Machine Data, Facebook, Twitter, Weather Data]
  
New types of data don’t quite fit into your pristine view of the world.
[Diagram: Logs and Machine Data next to "My Little Data Empire", with question marks among the Data]
  
To resolve this, some people take hints from Lord Of The Rings...

…and create One-Schema-To-Rule-Them-All…
[Diagram: Data sources passing through ETL jobs into a single EDW Schema]
  

…but that has its problems too.
[Diagram: Data, ETL jobs, and the EDW Schema]
  
So what is the answer?

Enter the Hadoop.
http://www.fabulouslybroke.com/2011/05/ninja-elephants-and-other-awesome-stories/
  
Hadoop was created because Big IT never cut it for the Internet Properties like Google, Yahoo, Facebook, Twitter, and LinkedIn

Traditional architecture didn’t scale enough…
  
[Diagram: groups of App servers, DB servers, and SAN storage]
  
Databases become bloated and useless

Traditional architectures cost too much at that volume…
[Slide labels: $upercomputing, $/TB, $pecial Hardware]
  
How would you fix this?

If you could design a system that would handle this, what would it look like?

It would probably need a highly resilient, self-healing, cost-efficient, distributed file system…
[Diagram: a grid of Storage nodes]
  
It would probably need a completely parallel processing framework that took tasks to the data…
[Diagram: a grid of nodes, each pairing Processing with Storage]
  
It would probably run on commodity hardware, virtualized machines, and common OS platforms
[Diagram: the same grid of Processing and Storage nodes]
  
It would probably be open source so innovation could happen as quickly as possible

It would need a critical mass of users

It would be Apache Hadoop

{Processing + Storage} = {MapReduce/Tez/YARN + HDFS}
  
HDFS stores data in blocks and replicates those blocks
[Diagram: block1, block2, and block3, each stored three times across the Processing/Storage nodes]
  
If a copy of a block fails, HDFS still has the other copies and heals itself by copying the block again
[Diagram: the same cluster with one failure marked X; block1, block2, and block3 each still appear three times]
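The replication and self-healing idea on these two slides can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not HDFS code: the function names `place_blocks` and `fail_node` and the node names are hypothetical, placement is random rather than HDFS's real rack-aware policy, and the replication factor of 3 matches both the diagram and the HDFS default.

```python
import random

REPLICATION = 3  # three copies per block, as in the diagram (also the HDFS default)


def place_blocks(blocks, nodes, replication=REPLICATION, seed=0):
    """Toy placement: put each block on `replication` distinct nodes at random."""
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication)) for block in blocks}


def fail_node(placement, nodes, dead, replication=REPLICATION, seed=0):
    """Remove `dead` from the cluster and re-copy any under-replicated block."""
    rng = random.Random(seed)
    live = [n for n in nodes if n != dead]
    healed = {}
    for block, holders in placement.items():
        holders = holders - {dead}            # copies that survived the failure
        spare = [n for n in live if n not in holders]
        missing = replication - len(holders)  # how many copies must be rebuilt
        holders |= set(rng.sample(spare, min(missing, len(spare))))
        healed[block] = holders
    return healed


# Hypothetical five-node cluster holding the three blocks from the slide.
nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(["block1", "block2", "block3"], nodes)
healed = fail_node(placement, nodes, dead="node2")
```

After `fail_node`, every block is back to three copies on live nodes, which is the "heals itself" behaviour the slide describes.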
MapReduce is a programming paradigm that is completely parallel
[Diagram: input Data split across five Mappers, whose output feeds three Reducers that write the result Data]
  
MapReduce has three phases: Map, Sort/Shuffle, Reduce
[Diagram: Key, Value pairs entering Mappers, being regrouped between Mappers and Reducers, and leaving the Reducers as Key, Value pairs]
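The three phases can be illustrated with a word count in plain Python. This is a single-process sketch, not the Hadoop MapReduce API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_job`) and the sample input are made up for illustration, and a real job would run many mappers and reducers in parallel across the cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (key, value) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Sort/Shuffle: group all values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single (key, value) pair."""
    return key, sum(values)


def run_job(lines):
    """Run the three phases in order over a list of input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = run_job(["data begets data", "big data"])
print(counts)
```

Each mapper only sees its own line and each reducer only sees one key, which is what lets the framework spread both phases across many machines.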
  
MapReduce applies to a lot of data processing problems
[Diagram: input Data, five Mappers, three Reducers, output Data]
  
MapReduce goes a long way, but not all data processing and analytics are solved the same way

Sometimes your data application needs parallel processing and inter-process communication
[Diagram: several Processes, each with its own Data]

…like Complex Event Processing in Apache Storm
  
Sometimes your machine learning data application needs to process in memory and iterate
[Diagram: Data passing through a series of Processes]

…like Machine Learning in Spark
  
Introducing YARN

YARN = Yet Another Resource Negotiator

YARN abstracts resource management so you can run more than just MapReduce
  
[Diagram: MapReduce V2, MapReduce V?, Storm, Giraph, Tez, MPI, HBase, Spark, … and more, all on top of YARN and HDFS2]
  
Resource Manager + Node Managers = YARN
[Diagram: a Resource Manager with its Scheduler, and Node Managers hosting Containers and AppMasters for Pig, Storm, and MapReduce]
  
YARN turns Hadoop into a smart phone: An App Ecosystem
hortonworks.com/yarn/

Check out the book too…
Preview at: hortonworks.com/yarn/

YARN is an essential part of a balanced breakfast in Hadoop 2.x
  
Introducing Tez

Tez is a YARN application, like MapReduce is a YARN application

Tez is the Lego set for your data application

Tez provides a layer for abstract tasks: these could be mappers, reducers, customized stream processes, in-memory structures, etc.

Tez can chain tasks together into one job to get Map-Reduce-Reduce jobs suitable for things like Hive SQL projections, group by, and order by
  
[Diagram: input Data flowing through TezMap tasks, then successive TezReduce tasks, to output Data]
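A Map-Reduce-Reduce chain for a projection, a group by, and an order by can be sketched as three stages run back to back. The sketch below is plain Python, not the Tez API: the stage names, the `run_chain` helper, and the sample rows are hypothetical. The point it illustrates is that the output of one stage is handed straight to the next inside a single job, where plain MapReduce would need a second job for the second reduce.

```python
from collections import defaultdict
from functools import reduce


def map_project(rows):
    """Map stage: project each row down to (customer, amount)."""
    return [(row["customer"], row["amount"]) for row in rows]


def reduce_group_by(pairs):
    """First reduce stage: GROUP BY customer with SUM(amount)."""
    totals = defaultdict(int)
    for customer, amount in pairs:
        totals[customer] += amount
    return list(totals.items())


def reduce_order_by(totals):
    """Second reduce stage: ORDER BY total, largest first."""
    return sorted(totals, key=lambda item: item[1], reverse=True)


def run_chain(stages, data):
    """Run the stages in order, passing each output directly to the next stage."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


# Hypothetical input rows; "region" is dropped by the projection.
rows = [
    {"customer": "a", "amount": 5, "region": "east"},
    {"customer": "b", "amount": 9, "region": "west"},
    {"customer": "a", "amount": 7, "region": "west"},
]
result = run_chain([map_project, reduce_group_by, reduce_order_by], rows)
print(result)
```

Because the chain is just an ordered list of stages, adding or swapping a stage does not change the runner, which is roughly the "Lego set" idea from the earlier slide.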
  
Tez can provide long-running containers for applications like Hive to side-step batch process startups you would have with MapReduce

Hadoop has other open source projects…
  
Hive = {SQL -> Tez || MapReduce}
SQL-IN-HADOOP

Pig = {PigLatin -> Tez || MapReduce}

HCatalog = {metadata* for MapReduce, Hive, Pig, HBase}
*metadata = tables, columns, partitions, types

Oozie = Job::{Task, Task, if Task, then Task, final Task}
  
Falcon
[Diagram: Feeds in a Hadoop cluster, with Replication of Feeds to a second Hadoop cluster for DR]

Knox
[Diagram: REST Clients, the Knox Gateway, Hadoop Clusters, and Enterprise LDAP]

Flume
[Diagram: Files, JMS, Weblogs, and Events passing through Flume agents into Hadoop]

Sqoop
[Diagram: DBs, Sqoop, and Hadoop]
  
Ambari = {install, manage, monitor}

HBase = {real-time, distributed-map, big-tables}

Storm = {Complex Event Processing, Near-Real-Time, Provisioned by YARN}
  
Apache Hadoop
[Diagram: HDFS, YARN, MapReduce, Tez, Storm, Pig, Hive, HCatalog, HBase, Ambari, Knox, Sqoop, Falcon, Flume]

Hortonworks Data Platform
[Diagram: the same projects: HDFS, YARN, MapReduce, Tez, Storm, Pig, Hive, HCatalog, HBase, Ambari, Knox, Sqoop, Falcon, Flume]
  
What else are we working on?
hortonworks.com/labs/

Hadoop is the new Modern Data Architecture
  

Más contenido relacionado

La actualidad más candente

What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...Edureka!
 
Intro to HDFS and MapReduce
Intro to HDFS and MapReduceIntro to HDFS and MapReduce
Intro to HDFS and MapReduceRyan Tabora
 
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | EdurekaHadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | EdurekaEdureka!
 
Hadoop Administration Training | Hadoop Administration Tutorial | Hadoop Admi...
Hadoop Administration Training | Hadoop Administration Tutorial | Hadoop Admi...Hadoop Administration Training | Hadoop Administration Tutorial | Hadoop Admi...
Hadoop Administration Training | Hadoop Administration Tutorial | Hadoop Admi...Edureka!
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and HadoopEdureka!
 
Hadoop Training For Beginners | Hadoop Tutorial | Big Data Training |Edureka
Hadoop Training For Beginners | Hadoop Tutorial | Big Data Training |EdurekaHadoop Training For Beginners | Hadoop Tutorial | Big Data Training |Edureka
Hadoop Training For Beginners | Hadoop Tutorial | Big Data Training |EdurekaEdureka!
 
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...Edureka!
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Mahantesh Angadi
 
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Edureka!
 
Webinar : Talend : The Non-Programmer's Swiss Knife for Big Data
Webinar  : Talend : The Non-Programmer's Swiss Knife for Big DataWebinar  : Talend : The Non-Programmer's Swiss Knife for Big Data
Webinar : Talend : The Non-Programmer's Swiss Knife for Big DataEdureka!
 
Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascienceAdam Muise
 
Introduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -IIntroduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -IEdureka!
 
Learn Big Data & Hadoop
Learn Big Data & Hadoop Learn Big Data & Hadoop
Learn Big Data & Hadoop Edureka!
 
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...Simplilearn
 
Big data with Hadoop - Introduction
Big data with Hadoop - IntroductionBig data with Hadoop - Introduction
Big data with Hadoop - IntroductionTomy Rhymond
 
Rob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopRob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopGhassan Al-Yafie
 
Big Data Analytics for Non-Programmers
Big Data Analytics for Non-ProgrammersBig Data Analytics for Non-Programmers
Big Data Analytics for Non-ProgrammersEdureka!
 
Hadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and MoreHadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and MoreTrendwise Analytics
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Simplilearn
 

La actualidad más candente (20)

What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
 
Intro to HDFS and MapReduce
Intro to HDFS and MapReduceIntro to HDFS and MapReduce
Intro to HDFS and MapReduce
 
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | EdurekaHadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
 
Hadoop Administration Training | Hadoop Administration Tutorial | Hadoop Admi...
Hadoop Administration Training | Hadoop Administration Tutorial | Hadoop Admi...Hadoop Administration Training | Hadoop Administration Tutorial | Hadoop Admi...
Hadoop Administration Training | Hadoop Administration Tutorial | Hadoop Admi...
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
 
Hadoop Training For Beginners | Hadoop Tutorial | Big Data Training |Edureka
Hadoop Training For Beginners | Hadoop Tutorial | Big Data Training |EdurekaHadoop Training For Beginners | Hadoop Tutorial | Big Data Training |Edureka
Hadoop Training For Beginners | Hadoop Tutorial | Big Data Training |Edureka
 
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
 
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
 
Webinar : Talend : The Non-Programmer's Swiss Knife for Big Data
Webinar  : Talend : The Non-Programmer's Swiss Knife for Big DataWebinar  : Talend : The Non-Programmer's Swiss Knife for Big Data
Webinar : Talend : The Non-Programmer's Swiss Knife for Big Data
 
Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascience
 
Introduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -IIntroduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -I
 
Learn Big Data & Hadoop
Learn Big Data & Hadoop Learn Big Data & Hadoop
Learn Big Data & Hadoop
 
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
 
Big data with Hadoop - Introduction
Big data with Hadoop - IntroductionBig data with Hadoop - Introduction
Big data with Hadoop - Introduction
 
Rob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopRob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoop
 
Big Data Analytics for Non-Programmers
Big Data Analytics for Non-ProgrammersBig Data Analytics for Non-Programmers
Big Data Analytics for Non-Programmers
 
Hadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and MoreHadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and More
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
 
Hadoop(Term Paper)
Hadoop(Term Paper)Hadoop(Term Paper)
Hadoop(Term Paper)
 

Destacado

Hadoop - Introduzione all’architettura ed approcci applicativi
Hadoop - Introduzione all’architettura ed approcci applicativiHadoop - Introduzione all’architettura ed approcci applicativi
Hadoop - Introduzione all’architettura ed approcci applicativilostrettodigitale
 
Introduzione ai Big Data e alla scienza dei dati - Big Data
Introduzione ai Big Data e alla scienza dei dati - Big DataIntroduzione ai Big Data e alla scienza dei dati - Big Data
Introduzione ai Big Data e alla scienza dei dati - Big DataVincenzo Manzoni
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Sparkdatamantra
 
Introduction to Apache Spark Developer Training
Introduction to Apache Spark Developer TrainingIntroduction to Apache Spark Developer Training
Introduction to Apache Spark Developer TrainingCloudera, Inc.
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkRahul Jain
 
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...SlideShare
 
2015 Upload Campaigns Calendar - SlideShare
2015 Upload Campaigns Calendar - SlideShare2015 Upload Campaigns Calendar - SlideShare
2015 Upload Campaigns Calendar - SlideShareSlideShare
 
What to Upload to SlideShare
What to Upload to SlideShareWhat to Upload to SlideShare
What to Upload to SlideShareSlideShare
 
How to Make Awesome SlideShares: Tips & Tricks
How to Make Awesome SlideShares: Tips & TricksHow to Make Awesome SlideShares: Tips & Tricks
How to Make Awesome SlideShares: Tips & TricksSlideShare
 
Getting Started With SlideShare
Getting Started With SlideShareGetting Started With SlideShare
Getting Started With SlideShareSlideShare
 


2014 feb 5_what_ishadoop_mda

  • 1. Adam Muise – Hortonworks: WELCOME TO HADOOP
  • 3. Why are we here?
  • 5. “Big Data” is the marketing term of the decade
  • 6. What lurks behind the hype is the democratization of Data.
  • 7. You need to deal with Data.
  • 8. You’re probably not as good at that as you think.
  • 9. Put it away, delete it, tweet it, compress it, shred it, wikileak-it, put it in a database, put it in SAN/NAS, put it in the cloud, hide it in tape…
  • 12. Volume (the word repeated across the slide)
  • 13. Volume (repeated more densely)
  • 14. Volume (repeated more densely still)
  • 15. Storage, Management, Processing all become challenges with Data at Volume
  • 16. Traditional technologies adopt a divide, drop, and conquer approach
  • 17. The solution? [Diagram labels: OLTP, EDW, Analytical DB, Another EDW, Yet Another EDW, Data]
  • 18. Ummm…you dropped something [Diagram labels: OLTP, EDW, Analytical DB, Another EDW, Yet Another EDW, plus many more “Data” items]
  • 19. Analyzing the data usually raises more interesting questions…
  • 20. …which leads to more data
  • 21. Wait, you’ve seen this before. [Diagram labels: Sausage Factory, Data]
  • 23. What keeps us from Data?
  • 24. “Prices, Stupid passwords, and Boring Statistics.” - Hans Rosling http://www.youtube.com/watch?v=hVimVzgtD6w
  • 25. Your data silos are lonely places. [Diagram labels: EDW, Accounts, Customers, Web Properties, Data]
  • 26. … Data likes to be together. [Diagram labels: EDW, Accounts, Customers, Web Properties, Data]
  • 27. Data likes to socialize too. [Diagram labels: CDR, Machine Data, Facebook, Weather Data, Twitter, EDW, Accounts, Web Properties, Customers, Data]
  • 28. New types of data don’t quite fit into your pristine view of the world. [Diagram labels: Logs, Machine Data, My Little Data Empire, Data, ?]
  • 29. To resolve this, some people take hints from Lord Of The Rings...
  • 30. …and create One-Schema-To-Rule-Them-All… [Diagram labels: EDW, Schema, Data]
  • 31. …but that has its problems too. [Diagram labels: ETL, EDW, Schema, Data]
  • 32. So  what  is  the  answer?  
  • 33. Enter  the  Hadoop.   ………   hYp://www.fabulouslybroke.com/2011/05/ninja-­‐elephants-­‐and-­‐other-­‐awesome-­‐stories/  
  • 34. Hadoop was created because Big IT never cut it for the Internet Properties like Google, Yahoo, Facebook, Twitter, and LinkedIn
  • 35. Traditional architecture didn't scale enough… [Diagram: App tiers on top of DB and SAN stacks]
  • 36. Databases become bloated and useless
  • 37. Traditional architectures cost too much at that volume… [Diagram: $/TB, $upercomputing, $pecial Hardware]
  • 38. How would you fix this?
  • 39. If you could design a system that would handle this, what would it look like?
  • 40. It would probably need a highly resilient, self-healing, cost-efficient, distributed file system… [Diagram: a grid of Storage nodes]
  • 41. It would probably need a completely parallel processing framework that took tasks to the data… [Diagram: Processing paired with Storage on every node]
  • 42. It would probably run on commodity hardware, virtualized machines, and common OS platforms
  • 43. It would probably be open source so innovation could happen as quickly as possible
  • 44. It would need a critical mass of users
  • 45. It would be Apache Hadoop
  • 46. {Processing + Storage} = {MapReduce/Tez/YARN + HDFS}
  • 47. HDFS stores data in blocks and replicates those blocks [Diagram: block1, block2, and block3, each stored on three different Storage nodes]
  • 48. If a copy of a block fails, HDFS has the other copies and heals itself [Diagram: one copy of block3 is lost (X) and a new copy of block3 appears on another Storage node]
  • 49. MapReduce is a programming paradigm that is completely parallel [Diagram: Data -> Mappers -> Reducers -> Data]
  • 50. MapReduce has three phases: Map, Sort/Shuffle, Reduce [Diagram: Key, Value pairs flow into Mappers, then on to Reducers]
  • 51. MapReduce applies to a lot of data processing problems
  • 52. MapReduce goes a long way, but not all data processing and analytics are solved the same way
  • 53. Sometimes your data application needs parallel processing and inter-process communication [Diagram: Processes exchanging Data with each other]
  • 54. …like Complex Event Processing in Apache Storm
  • 55. Sometimes your machine learning data application needs to process in memory and iterate [Diagram: Data passing repeatedly through Processes]
  • 56. …like Machine Learning in Spark
  • 58. YARN = Yet Another Resource Negotiator
  • 59. YARN abstracts resource management so you can run more than just MapReduce [Diagram: MapReduce V2, MapReduce V?, Storm, Giraph, Tez, MPI, HBase, Spark, … and more, all on YARN over HDFS2]
  • 60. Resource Manager + Node Managers = YARN [Diagram: a Resource Manager with a Scheduler; Node Managers hosting Containers and the AppMasters for Pig, Storm, and MapReduce]
  • 61. YARN turns Hadoop into a smart phone: An App Ecosystem hortonworks.com/yarn/
  • 62. Check out the book too… Preview at: hortonworks.com/yarn/
  • 63. YARN is an essential part of a balanced breakfast in Hadoop 2.x
  • 65. Tez is a YARN application, like MapReduce is a YARN application
  • 66. Tez is the Lego set for your data application
  • 67. Tez provides a layer for abstract tasks; these could be mappers, reducers, customized stream processes, in-memory structures, etc.
  • 68. Tez can chain tasks together into one job to get Map – Reduce – Reduce jobs suitable for things like Hive SQL projections, group by, and order by [Diagram: Data -> TezMap -> TezReduce -> TezReduce -> Data]
  • 69. Tez can provide long-running containers for applications like Hive to side-step the batch process startups you would have with MapReduce
  • 70. Hadoop has other open source projects…
  • 71. Hive = {SQL -> Tez || MapReduce} SQL-IN-HADOOP
  • 72. Pig = {PigLatin -> Tez || MapReduce}
  • 73. HCatalog = {metadata* for MapReduce, Hive, Pig, HBase} *metadata = tables, columns, partitions, types
  • 74. Oozie = Job::{Task, Task, if Task, then Task, final Task}
  • 75. Falcon [Diagram: Feeds, Hadoop, Hadoop DR, Feed Replication]
  • 76. Knox [Diagram: REST Clients -> Knox Gateway -> Hadoop Clusters, with Enterprise LDAP]
  • 77. Flume [Diagram: Files, JMS, Weblogs, and Events -> Flume -> Hadoop]
  • 78. Sqoop [Diagram: DBs connected to Hadoop through Sqoop]
  • 79. Ambari = {install, manage, monitor}
  • 80. HBase = {real-time, distributed-map, big-tables}
  • 81. Storm = {Complex Event Processing, Near-Real-Time, Provisioned by YARN}
  • 82. Apache Hadoop [Diagram: HDFS, YARN, MapReduce, Tez, Storm, Pig, Hive, HCatalog, HBase, Ambari, Knox, Sqoop, Falcon, Flume]
  • 83. Hortonworks Data Platform [Diagram: HDFS, YARN, MapReduce, Tez, Storm, Pig, Hive, HCatalog, HBase, Ambari, Knox, Sqoop, Falcon, Flume]
  • 84. What else are we working on? hortonworks.com/labs/
  • 85. Hadoop is the new Modern Data Architecture
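
The three MapReduce phases named on slide 50 (Map, Sort/Shuffle, Reduce) can be sketched in a few lines of Python. This is a single-process illustration only, not Hadoop's Java MapReduce API: every function name below (map_phase, shuffle_sort_phase, reduce_phase, run_job, and the word-count mapper and reducer) is hypothetical and chosen for this example, and word count is assumed as the sample problem because the slides do not name one.

```python
from collections import defaultdict

# Illustrative, single-process sketch of the three phases.
# All names here are hypothetical; real Hadoop distributes each
# phase across many nodes and moves the tasks to the data.


def map_phase(records, mapper):
    """Map: apply the mapper to each record, emitting (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle_sort_phase(pairs):
    """Sort/Shuffle: group values by key, then order the groups by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped, reducer):
    """Reduce: collapse each key's list of values into one result."""
    return [reducer(key, values) for key, values in grouped]


def word_count_mapper(line):
    """Emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word, counts):
    """Sum the counts emitted for one word."""
    return word, sum(counts)


def run_job(records, mapper, reducer):
    """Chain the three phases into one job."""
    pairs = map_phase(records, mapper)
    grouped = shuffle_sort_phase(pairs)
    return reduce_phase(grouped, reducer)


if __name__ == "__main__":
    lines = ["data likes to be together", "data likes to socialize too"]
    for word, count in run_job(lines, word_count_mapper, word_count_reducer):
        print(word, count)
```

Because each mapper call sees only its own record and each reducer call sees only its own key, both phases can run in parallel; only the Sort/Shuffle step needs to bring data from different mappers together.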