Marc Cluet – Lynx Consultants
What's behind Big Data
What we'll cover
• Understand Hadoop components
• Understand the different technologies involved
• Embrace Big Data!
Lynx Consultants © 2013
What is Big Data?
• SQL has a limited ability to process changing data
  ◦ SQL schemas are the truth; the data needs to fit them
• Big Data is the solution!
  ◦ Data can be truly dynamic
  ◦ Designed to handle terabytes of data
  ◦ Designed for fault tolerance and securing data
  ◦ Designed around exploiting hardware to the fullest
  ◦ Designed around Map/Reduce
Who runs Big Data?
• A few small companies
What is Hadoop?
• Hadoop is one of the big players in Big Data
  ◦ Developed as an open source implementation of the ideas in Google's GFS and MapReduce papers (Google BigTable is the model for HBase)
  ◦ Mainly developed at Yahoo!
  ◦ Current companies behind it: Hortonworks and Cloudera
What are the features of Hadoop?
• HDFS – Hadoop Distributed File System
  ◦ HDFS is a filesystem distributed across many nodes
  ◦ Keeps several copies of your data (default: 3)
  ◦ If one node goes down, it makes sure the lost copies are re-created on the remaining nodes
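The replication behaviour above can be sketched with a toy model. This is only an illustration under stated assumptions: the node names, block ids and the random placement are made up, and real HDFS uses a rack-aware placement policy that is not modelled here.

```python
import random

REPLICATION = 3  # default number of copies, as stated on the slide


def place_blocks(blocks, nodes, replication=REPLICATION, seed=0):
    """Assign each block to `replication` distinct nodes (toy placement)."""
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication)) for block in blocks}


def handle_node_failure(placement, nodes, dead, replication=REPLICATION, seed=0):
    """Drop `dead` from every replica set and copy under-replicated blocks elsewhere."""
    rng = random.Random(seed)
    live = [n for n in nodes if n != dead]
    repaired = {}
    for block, holders in placement.items():
        holders = holders - {dead}
        candidates = [n for n in live if n not in holders]
        missing = replication - len(holders)
        # Pick as many new holders as are missing (zero if the block was unaffected).
        holders |= set(rng.sample(candidates, min(missing, len(candidates))))
        repaired[block] = holders
    return repaired


# Hypothetical cluster of five nodes holding three blocks.
nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(["blk_1", "blk_2", "blk_3"], nodes)
after = handle_node_failure(placement, nodes, dead="node2")
```

After the simulated failure every block is back at three copies, none of them on the dead node.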
  
• HBase – Hadoop NoSQL Database
  ◦ Schemaless key-value storage
  ◦ All data exportable as JSON
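A minimal sketch of what "schemaless key-value storage" means in practice, using a plain in-memory dictionary. This is not the HBase client API; the `put` and `get` helpers, the row keys and the `family:qualifier` column names are assumptions chosen for illustration.

```python
import json
from collections import defaultdict

# Toy model: row key -> {"family:qualifier": value}. Rows need not share columns.
table = defaultdict(dict)


def put(row_key, column, value):
    """Store one cell; the column does not have to exist on any other row."""
    table[row_key][column] = value


def get(row_key):
    """Return a copy of all cells stored for a row (empty if the row is unknown)."""
    return dict(table.get(row_key, {}))


put("user#1", "info:name", "Ada")
put("user#1", "info:email", "ada@example.com")
put("user#2", "info:name", "Linus")
put("user#2", "stats:logins", 42)  # a column that user#1 never defined

# Because rows are just nested key-value maps, exporting them as JSON is direct.
exported = json.dumps(table, sort_keys=True)
```

Each row carries only the columns that were written to it, which is the contrast with a fixed SQL schema drawn earlier in the deck.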
  
• Map/Reduce – The key to it all
  ◦ This was invented by Google
  ◦ Given a dataset, we Map all the records that match a criterion
  ◦ Then we Reduce these to a result
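The two steps above can be sketched as a single-process word count. This is a teaching sketch, not Hadoop's Java API: the function names are assumptions, and the `shuffle` step stands in for the grouping that the framework performs between the two phases on a real cluster.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Run the mapper over every record and stream out its (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into one result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def count_reducer(_word, counts):
    """Sum the ones emitted for a word."""
    return sum(counts)


lines = ["big data is big", "hadoop handles big data"]
counts = reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)
```

Because each mapper call sees one record and each reducer call sees one key, both phases can be spread across many machines, which is what makes the model fit a cluster.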
  
• Hive – SQL for NoSQL
  ◦ Hive provides a SQL-like language called HiveQL
  ◦ Provides a good entry point for SQL users :)
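To give SQL users a feel for what happens underneath, here is a rough sketch of how a simple aggregate query breaks down into a map step and a reduce step. The query, the `employees` rows and the column names are hypothetical, and the code only mimics the idea in one process; it is not what Hive actually generates.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical query being illustrated:
#   SELECT dept, COUNT(*) FROM employees WHERE salary > 50000 GROUP BY dept;
employees = [
    {"name": "ana", "dept": "ops", "salary": 60000},
    {"name": "bob", "dept": "ops", "salary": 40000},
    {"name": "cy", "dept": "dev", "salary": 70000},
    {"name": "di", "dept": "dev", "salary": 80000},
]

# Map: apply the WHERE filter and emit (GROUP BY key, 1) for each surviving row.
mapped = [(row["dept"], 1) for row in employees if row["salary"] > 50000]

# Shuffle/sort: bring equal keys together, as the framework would between phases.
mapped.sort(key=itemgetter(0))

# Reduce: COUNT(*) per key.
result = {
    dept: sum(n for _, n in pairs)
    for dept, pairs in groupby(mapped, key=itemgetter(0))
}
```

The WHERE clause lands in the map step, the GROUP BY column becomes the key, and the aggregate becomes the reducer.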
  
• Pig – Map/Reduce made easy
  ◦ Produces data results from a small scripting language (Pig Latin)
  ◦ Reinvents SQL, in a way
• Flume – Fault-tolerant transport
  ◦ Divided into Sources, Channels and Sinks
  ◦ You can have multiples of everything, which makes it fault tolerant
  ◦ Many sources!
    ▪ Avro, Exec, JMS, Syslog, HTTP, NetCat, your own (Java)
  ◦ Many channels!
    ▪ Memory, File, your own (Java)
  ◦ Many sinks!
    ▪ Avro, HDFS, Logger, IRC, File, HBase, ElasticSearch, S3, community sinks, your own (Java)
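The source, channel and sink split can be sketched in a few lines. This is a toy in-memory pipeline, not Flume's Java API or its configuration format; `MemoryChannel`, `run_source` and `drain_sink` are names invented for the illustration.

```python
from collections import deque


class MemoryChannel:
    """Buffers events between a source and a sink (toy stand-in for a channel)."""

    def __init__(self):
        self._events = deque()

    def put(self, event):
        self._events.append(event)

    def take(self):
        """Return the oldest buffered event, or None when the channel is empty."""
        return self._events.popleft() if self._events else None


def run_source(lines, channels):
    """Source: turn incoming lines into events and copy each one to every channel."""
    for line in lines:
        for channel in channels:
            channel.put({"body": line})


def drain_sink(channel, deliver):
    """Sink: take events off its channel and hand them to `deliver`."""
    delivered = 0
    while (event := channel.take()) is not None:
        deliver(event)
        delivered += 1
    return delivered


# One source fanned out to two channels, each drained by its own sink.
to_storage, to_log = MemoryChannel(), MemoryChannel()
run_source(["login ok", "disk full"], [to_storage, to_log])

stored, logged = [], []
drain_sink(to_storage, stored.append)
drain_sink(to_log, logged.append)
```

Since the channel sits between the two ends, a slow or temporarily unavailable sink only delays draining; the events wait in the channel instead of being lost at the source.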
  
What Hadoop looks like in a DC
• Components
  ◦ Primary Namenode
    ▪ Controls the whole cluster and knows where the data resides
    ▪ Runs the job tracker to keep track of Map/Reduce jobs
    ▪ Biggest point of failure; shadowing it is a potential option
  ◦ Secondary Namenode
    ▪ Performs secondary housekeeping operations (periodic checkpoints of the Namenode's metadata); it is not a hot standby
  ◦ Data Node
    ▪ Stores all the information
    ▪ Runs Map/Reduce
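The division of labour between the Namenode and the Data Nodes can be sketched as follows. This is a toy model under stated assumptions: the class names, the round-robin placement, the replication of 2 and the file path are invented for the example and do not reflect Hadoop's real protocols.

```python
class DataNode:
    """Stores block contents; knows nothing about file names."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> data


class NameNode:
    """Holds only metadata: which blocks make up a file and where each block lives."""

    def __init__(self):
        self.file_blocks = {}      # path -> ordered list of block ids
        self.block_locations = {}  # block id -> list of DataNode objects

    def add_file(self, path, chunks, datanodes, replication=2):
        ids = []
        for i, chunk in enumerate(chunks):
            block_id = f"{path}#blk{i}"
            # Toy round-robin placement across the available data nodes.
            holders = [datanodes[(i + r) % len(datanodes)] for r in range(replication)]
            for node in holders:
                node.blocks[block_id] = chunk
            self.block_locations[block_id] = holders
            ids.append(block_id)
        self.file_blocks[path] = ids


def read_file(namenode, path):
    """Client read: ask the namenode where the blocks are, then fetch them from data nodes."""
    data = []
    for block_id in namenode.file_blocks[path]:
        node = namenode.block_locations[block_id][0]
        data.append(node.blocks[block_id])
    return "".join(data)


datanodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
namenode = NameNode()
namenode.add_file("/logs/app.log", ["hello ", "big ", "data"], datanodes)
text = read_file(namenode, "/logs/app.log")
```

The sketch also shows why the Namenode is the critical piece: the data nodes still hold every block, but without the Namenode's two lookup tables nothing can tell which blocks form a file or in what order.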
  
Questions?

Más contenido relacionado

La actualidad más candente

Soft-Shake 2013 : Enabling Realtime Queries to End Users
Soft-Shake 2013 : Enabling Realtime Queries to End UsersSoft-Shake 2013 : Enabling Realtime Queries to End Users
Soft-Shake 2013 : Enabling Realtime Queries to End UsersBenoit Perroud
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online TrainingLearntek1
 
Data Warehouse on Hadoop Based System In Action
Data Warehouse on Hadoop Based System In ActionData Warehouse on Hadoop Based System In Action
Data Warehouse on Hadoop Based System In ActionFrank Y
 
Nashville analytics summit aug9 no sql mike king dell v1.5
Nashville analytics summit aug9 no sql mike king dell v1.5Nashville analytics summit aug9 no sql mike king dell v1.5
Nashville analytics summit aug9 no sql mike king dell v1.5Mike King
 
Webinar 5-reasons-object-storage.pptx
Webinar 5-reasons-object-storage.pptxWebinar 5-reasons-object-storage.pptx
Webinar 5-reasons-object-storage.pptxCloudian
 
Stockage des données : quel système pour quel usage ?
Stockage des données : quel système pour quel usage ?Stockage des données : quel système pour quel usage ?
Stockage des données : quel système pour quel usage ?Zouheir Cadi
 
Realtime Analytics with Hadoop and HBase
Realtime Analytics with Hadoop and HBaseRealtime Analytics with Hadoop and HBase
Realtime Analytics with Hadoop and HBaselarsgeorge
 
Protect your private data with ORC column encryption
Protect your private data with ORC column encryptionProtect your private data with ORC column encryption
Protect your private data with ORC column encryptionOwen O'Malley
 
PLNOG 9: Ron Broersma - Enterprise IPv6 Deployment
PLNOG 9: Ron Broersma - Enterprise IPv6 Deployment PLNOG 9: Ron Broersma - Enterprise IPv6 Deployment
PLNOG 9: Ron Broersma - Enterprise IPv6 Deployment PROIDEA
 
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Cloudera, Inc.
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityDataWorks Summit
 
Red hatpartner2013edb futureofdatabase
Red hatpartner2013edb futureofdatabaseRed hatpartner2013edb futureofdatabase
Red hatpartner2013edb futureofdatabaseEDB
 
What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications
What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web ApplicationsWhat Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications
What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web ApplicationsTodd Hoff
 
Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010Gavin Heavyside
 
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Hortonworks
 
C* Summit 2013: The Perils and Triumphs of using Cassandra at a .NET/Microsof...
C* Summit 2013: The Perils and Triumphs of using Cassandra at a .NET/Microsof...C* Summit 2013: The Perils and Triumphs of using Cassandra at a .NET/Microsof...
C* Summit 2013: The Perils and Triumphs of using Cassandra at a .NET/Microsof...DataStax Academy
 
Doing More With Less: The Economics of Open Source Database Adoption
Doing More With Less: The Economics of Open Source Database AdoptionDoing More With Less: The Economics of Open Source Database Adoption
Doing More With Less: The Economics of Open Source Database AdoptionEDB
 

La actualidad más candente (19)

SQL Saturday San Diego
SQL Saturday San DiegoSQL Saturday San Diego
SQL Saturday San Diego
 
Soft-Shake 2013 : Enabling Realtime Queries to End Users
Soft-Shake 2013 : Enabling Realtime Queries to End UsersSoft-Shake 2013 : Enabling Realtime Queries to End Users
Soft-Shake 2013 : Enabling Realtime Queries to End Users
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online Training
 
Data Warehouse on Hadoop Based System In Action
Data Warehouse on Hadoop Based System In ActionData Warehouse on Hadoop Based System In Action
Data Warehouse on Hadoop Based System In Action
 
Big data Hadoop
Big data  Hadoop   Big data  Hadoop
Big data Hadoop
 
Nashville analytics summit aug9 no sql mike king dell v1.5
Nashville analytics summit aug9 no sql mike king dell v1.5Nashville analytics summit aug9 no sql mike king dell v1.5
Nashville analytics summit aug9 no sql mike king dell v1.5
 
Webinar 5-reasons-object-storage.pptx
Webinar 5-reasons-object-storage.pptxWebinar 5-reasons-object-storage.pptx
Webinar 5-reasons-object-storage.pptx
 
Stockage des données : quel système pour quel usage ?
Stockage des données : quel système pour quel usage ?Stockage des données : quel système pour quel usage ?
Stockage des données : quel système pour quel usage ?
 
Realtime Analytics with Hadoop and HBase
Realtime Analytics with Hadoop and HBaseRealtime Analytics with Hadoop and HBase
Realtime Analytics with Hadoop and HBase
 
Protect your private data with ORC column encryption
Protect your private data with ORC column encryptionProtect your private data with ORC column encryption
Protect your private data with ORC column encryption
 
PLNOG 9: Ron Broersma - Enterprise IPv6 Deployment
PLNOG 9: Ron Broersma - Enterprise IPv6 Deployment PLNOG 9: Ron Broersma - Enterprise IPv6 Deployment
PLNOG 9: Ron Broersma - Enterprise IPv6 Deployment
 
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Red hatpartner2013edb futureofdatabase
Red hatpartner2013edb futureofdatabaseRed hatpartner2013edb futureofdatabase
Red hatpartner2013edb futureofdatabase
 
What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications
What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web ApplicationsWhat Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications
What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications
 
Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010
 
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
 
C* Summit 2013: The Perils and Triumphs of using Cassandra at a .NET/Microsof...
C* Summit 2013: The Perils and Triumphs of using Cassandra at a .NET/Microsof...C* Summit 2013: The Perils and Triumphs of using Cassandra at a .NET/Microsof...
C* Summit 2013: The Perils and Triumphs of using Cassandra at a .NET/Microsof...
 
Doing More With Less: The Economics of Open Source Database Adoption
Doing More With Less: The Economics of Open Source Database AdoptionDoing More With Less: The Economics of Open Source Database Adoption
Doing More With Less: The Economics of Open Source Database Adoption
 

Destacado

Innovation in the Cloud - Rackspace Zurich Event
Innovation in the Cloud - Rackspace Zurich EventInnovation in the Cloud - Rackspace Zurich Event
Innovation in the Cloud - Rackspace Zurich EventMarc Cluet
 
Hadoop operations
Hadoop operationsHadoop operations
Hadoop operationsMarc Cluet
 
Autoscaling Best Practices - WebPerf Barcelona Oct 2014
Autoscaling Best Practices - WebPerf Barcelona Oct 2014Autoscaling Best Practices - WebPerf Barcelona Oct 2014
Autoscaling Best Practices - WebPerf Barcelona Oct 2014Marc Cluet
 
Puppet Camp London Fall 2015 - Service Discovery and Puppet
Puppet Camp London Fall 2015 - Service Discovery and PuppetPuppet Camp London Fall 2015 - Service Discovery and Puppet
Puppet Camp London Fall 2015 - Service Discovery and PuppetMarc Cluet
 
Ssh that wonderful thing
Ssh that wonderful thingSsh that wonderful thing
Ssh that wonderful thingMarc Cluet
 
Networking & dns 101
Networking & dns 101Networking & dns 101
Networking & dns 101Marc Cluet
 
Puppet and your Metadata - PuppetCamp London 2015
Puppet and your Metadata - PuppetCamp London 2015Puppet and your Metadata - PuppetCamp London 2015
Puppet and your Metadata - PuppetCamp London 2015Marc Cluet
 

Destacado (7)

Innovation in the Cloud - Rackspace Zurich Event
Innovation in the Cloud - Rackspace Zurich EventInnovation in the Cloud - Rackspace Zurich Event
Innovation in the Cloud - Rackspace Zurich Event
 
Hadoop operations
Hadoop operationsHadoop operations
Hadoop operations
 
Autoscaling Best Practices - WebPerf Barcelona Oct 2014
Autoscaling Best Practices - WebPerf Barcelona Oct 2014Autoscaling Best Practices - WebPerf Barcelona Oct 2014
Autoscaling Best Practices - WebPerf Barcelona Oct 2014
 
Puppet Camp London Fall 2015 - Service Discovery and Puppet
Puppet Camp London Fall 2015 - Service Discovery and PuppetPuppet Camp London Fall 2015 - Service Discovery and Puppet
Puppet Camp London Fall 2015 - Service Discovery and Puppet
 
Ssh that wonderful thing
Ssh that wonderful thingSsh that wonderful thing
Ssh that wonderful thing
 
Networking & dns 101
Networking & dns 101Networking & dns 101
Networking & dns 101
 
Puppet and your Metadata - PuppetCamp London 2015
Puppet and your Metadata - PuppetCamp London 2015Puppet and your Metadata - PuppetCamp London 2015
Puppet and your Metadata - PuppetCamp London 2015
 

Similar a Introduction to hadoop

Semantic web meetup 14.november 2013
Semantic web meetup 14.november 2013Semantic web meetup 14.november 2013
Semantic web meetup 14.november 2013Jean-Pierre König
 
Introduction to hadoop V2
Introduction to hadoop V2Introduction to hadoop V2
Introduction to hadoop V2TarjeiRomtveit
 
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Uwe Printz
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopAmir Shaikh
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Andrew Brust
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Jonathan Seidman
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014hadooparchbook
 
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Andrew Brust
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big DataAndrew Brust
 
Polyglot Persistence - Two Great Tastes That Taste Great Together
Polyglot Persistence - Two Great Tastes That Taste Great TogetherPolyglot Persistence - Two Great Tastes That Taste Great Together
Polyglot Persistence - Two Great Tastes That Taste Great TogetherJohn Wood
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3tcloudcomputing-tw
 
Hortonworks Big Data & Hadoop
Hortonworks Big Data & HadoopHortonworks Big Data & Hadoop
Hortonworks Big Data & HadoopMark Ginnebaugh
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprisesmarkgrover
 
Run Your First Hadoop 2.x Program
Run Your First Hadoop 2.x ProgramRun Your First Hadoop 2.x Program
Run Your First Hadoop 2.x ProgramSkillspeed
 
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
 How to use Hadoop for operational and transactional purposes by RODRIGO MERI... How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...Big Data Spain
 
DBA to Data Scientist
DBA to Data ScientistDBA to Data Scientist
DBA to Data Scientistpasalapudi
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopSlim Baltagi
 
Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...
Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...
Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...Frank Munz
 
Demystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWDemystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWKent Graziano
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database RoundtableEric Kavanagh
 

Similar a Introduction to hadoop (20)

Semantic web meetup 14.november 2013
Semantic web meetup 14.november 2013Semantic web meetup 14.november 2013
Semantic web meetup 14.november 2013
 
Introduction to hadoop V2
Introduction to hadoop V2Introduction to hadoop V2
Introduction to hadoop V2
 
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014
 
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data
 
Polyglot Persistence - Two Great Tastes That Taste Great Together
Polyglot Persistence - Two Great Tastes That Taste Great TogetherPolyglot Persistence - Two Great Tastes That Taste Great Together
Polyglot Persistence - Two Great Tastes That Taste Great Together
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
Hortonworks Big Data & Hadoop
Hortonworks Big Data & HadoopHortonworks Big Data & Hadoop
Hortonworks Big Data & Hadoop
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprises
 
Run Your First Hadoop 2.x Program
Run Your First Hadoop 2.x ProgramRun Your First Hadoop 2.x Program
Run Your First Hadoop 2.x Program
 
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
 How to use Hadoop for operational and transactional purposes by RODRIGO MERI... How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
 
DBA to Data Scientist
DBA to Data ScientistDBA to Data Scientist
DBA to Data Scientist
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise Hadoop
 
Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...
Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...
Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...
 
Demystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWDemystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFW
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 

Más de Marc Cluet

Your Kernel and You
Your Kernel and YouYour Kernel and You
Your Kernel and YouMarc Cluet
 
Managing DevOps teams, staying alive
Managing DevOps teams, staying aliveManaging DevOps teams, staying alive
Managing DevOps teams, staying aliveMarc Cluet
 
The DevOps journey - How to get there painlessly
The DevOps journey - How to get there painlesslyThe DevOps journey - How to get there painlessly
The DevOps journey - How to get there painlesslyMarc Cluet
 
Elastic Beanstalk, usos prácticos y conceptos
Elastic Beanstalk, usos prácticos y conceptosElastic Beanstalk, usos prácticos y conceptos
Elastic Beanstalk, usos prácticos y conceptosMarc Cluet
 
Service discovery and puppet
Service discovery and puppetService discovery and puppet
Service discovery and puppetMarc Cluet
 
Consul First Steps
Consul First StepsConsul First Steps
Consul First StepsMarc Cluet
 
Microservices and the Cloud - DevOps Cardiff Meetup
Microservices and the Cloud - DevOps Cardiff MeetupMicroservices and the Cloud - DevOps Cardiff Meetup
Microservices and the Cloud - DevOps Cardiff MeetupMarc Cluet
 
Microservices and the Cloud
Microservices and the CloudMicroservices and the Cloud
Microservices and the CloudMarc Cluet
 
How to implement microservices
How to implement microservicesHow to implement microservices
How to implement microservicesMarc Cluet
 
A Metadata Ocean in Chef and Puppet
A Metadata Ocean in Chef and PuppetA Metadata Ocean in Chef and Puppet
A Metadata Ocean in Chef and PuppetMarc Cluet
 
Autoscaling Best Practices
Autoscaling Best PracticesAutoscaling Best Practices
Autoscaling Best PracticesMarc Cluet
 
Rackspace Hack Night - Vagrant & Packer
Rackspace Hack Night - Vagrant & PackerRackspace Hack Night - Vagrant & Packer
Rackspace Hack Night - Vagrant & PackerMarc Cluet
 
Introduction to DevOps - Rackspace tech night
Introduction to DevOps - Rackspace tech nightIntroduction to DevOps - Rackspace tech night
Introduction to DevOps - Rackspace tech nightMarc Cluet
 
Juju + Puppet (Puppetconf 2011)
Juju + Puppet (Puppetconf 2011)Juju + Puppet (Puppetconf 2011)
Juju + Puppet (Puppetconf 2011)Marc Cluet
 
Scalable, good, cheap
Scalable, good, cheapScalable, good, cheap
Scalable, good, cheapMarc Cluet
 

Introduction to Hadoop

  • 1. Marc Cluet – Lynx Consultants: What's behind Big Data
  • 2. What we'll cover
      ◦ Understand Hadoop components
      ◦ Understand the different technologies involved
      ◦ Embrace Big Data!
      Lynx Consultants © 2013
  • 3. What is Big Data?
  • 4. What is Big Data?
      ◦ SQL has a limited ability to process changing data
          ▪ SQL schemas are the truth; data needs to fit them
  • 5-9. What is Big Data?
      ◦ Big Data is the solution!
          ▪ Data can be truly dynamic
          ▪ Designed to handle terabytes of data
          ▪ Designed for fault tolerance and securing data
          ▪ Designed around exploiting hardware to the fullest
          ▪ Designed around Map/Reduce
  • 10-21. Who runs Big Data?
      ◦ A few small companies
  • 22. What is Hadoop?
  • 23-25. What is Hadoop?
      ◦ Hadoop is one of the big players for Big Data
          ▪ Developed as an open source implementation of the ideas in Google's MapReduce and Google File System papers (HBase is the part modeled on Google BigTable)
          ▪ Mainly developed at Yahoo!
          ▪ Current companies behind it: Hortonworks and Cloudera
  • 26. What are the features of Hadoop?
      ◦ HDFS – Hadoop Distributed File System
          ▪ HDFS is a filesystem distributed across many nodes
          ▪ Keeps several copies of your data (default: 3)
          ▪ If a node goes down, the copies it held are re-created on other nodes
  • 27. What are the features of Hadoop?
      ◦ HDFS – Hadoop Distributed File System
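The replication behaviour described for HDFS can be illustrated with a toy model. This is a minimal sketch, not HDFS code: the node names, block ids, round-robin placement, and both function names are assumptions made for illustration, and real HDFS placement is rack-aware and far more involved.

```python
import itertools

REPLICATION = 3  # the default number of copies mentioned on the slide


def place_blocks(blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    ring = itertools.cycle(nodes)
    # Consecutive picks from the ring are distinct because replication <= len(nodes).
    return {block: {next(ring) for _ in range(replication)} for block in blocks}


def handle_node_failure(placement, failed, live_nodes, replication=REPLICATION):
    """Forget the failed node and copy each affected block to another live node."""
    for holders in placement.values():
        holders.discard(failed)
        candidates = [n for n in live_nodes if n not in holders]
        while len(holders) < replication and candidates:
            holders.add(candidates.pop(0))
    return placement


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical datanode names
    placement = place_blocks(["blk_0", "blk_1"], nodes)
    handle_node_failure(placement, "dn1", ["dn2", "dn3", "dn4"])
    for block, holders in placement.items():
        print(block, sorted(holders))
```

After the simulated failure every block is back to three copies, none of them on the failed node, which is the property the slide is describing.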
  • 28. What are the features of Hadoop?
      ◦ HDFS – Hadoop Distributed File System
      ◦ HBase – Hadoop NoSQL Database
          ▪ Schemaless key-value storage
          ▪ All data exportable in JSON
  • 29. What are the features of Hadoop?
      ◦ HDFS – Hadoop Distributed File System
      ◦ HBase – Hadoop NoSQL Database
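To make "schemaless key-value storage" concrete, here is a small sketch of a table where each row carries its own set of columns and the whole table can be dumped as JSON. It does not use the HBase API: the `SchemalessTable` class, its methods, and the `family:qualifier` column names are hypothetical. Real HBase stores raw bytes and groups columns into column families declared on the table.

```python
import json


class SchemalessTable:
    """Toy row store: row key -> {column name: value}, with no fixed schema."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, column, value):
        # Any row may introduce any column; nothing is declared up front.
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        return dict(self._rows.get(row_key, {}))

    def export_json(self):
        return json.dumps(self._rows, sort_keys=True)


if __name__ == "__main__":
    users = SchemalessTable()
    users.put("user1", "info:name", "Ada")
    users.put("user1", "info:email", "ada@example.com")
    users.put("user2", "info:name", "Bob")
    users.put("user2", "stats:logins", 7)  # a column user1 does not have
    print(users.export_json())
```

The two rows hold different columns, which is what a fixed SQL schema would not allow without altering the table.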
  • 30. What are the features of Hadoop?
      ◦ HDFS – Hadoop Distributed File System
      ◦ HBase – Hadoop NoSQL Database
      ◦ Map/Reduce – The key to it all
          ▪ Introduced by Google as a programming model for large clusters (2004)
          ▪ Given a dataset, we Map every record that matches a criterion to a key/value pair
          ▪ Then we Reduce the values collected for each key to a result
  • 31. What are the features of Hadoop?
      ◦ Map/Reduce – The key to it all
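The Map and Reduce steps can be sketched in a few lines of single-process Python. This is only an illustration of the model, assuming a word-count job; the function names and the `predicate` parameter (standing in for the "criterion") are made up here. Hadoop runs the same phases distributed across data nodes.

```python
from collections import defaultdict


def map_phase(lines, predicate):
    """Map: emit a (word, 1) pair for every word that matches the criterion."""
    for line in lines:
        for word in line.lower().split():
            if predicate(word):
                yield word, 1


def shuffle(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, predicate=lambda word: True):
    return reduce_phase(shuffle(map_phase(lines, predicate)))


if __name__ == "__main__":
    print(word_count(["big data", "big hadoop"]))
```

Because each Map call looks at one record and each Reduce call looks at one key, both phases can be split across many machines without coordination beyond the shuffle.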
  • 32. What are the features of Hadoop?
      ◦ HDFS – Hadoop Distributed File System
      ◦ HBase – Hadoop NoSQL Database
      ◦ Map/Reduce – The key to it all
      ◦ Hive – SQL for NoSQL
          ▪ Hive provides a SQL-like language called HiveQL
          ▪ Provides a good entry point for SQL users :)
  • 33. What are the features of Hadoop?
      ◦ HDFS – Hadoop Distributed File System
      ◦ HBase – Hadoop NoSQL Database
      ◦ Map/Reduce – The key to it all
      ◦ Hive – SQL for NoSQL
      ◦ Pig – Map/Reduce made easy
          ▪ Produces results from scripts written in a small dataflow language (Pig Latin)
          ▪ Reinvents SQL, in a way
  • 34. What are the features of Hadoop?
      ◦ Hive
  • 35. What are the features of Hadoop?
      ◦ Pig
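The appeal of Hive and Pig is that a declarative query replaces a hand-written Map/Reduce job. The sketch below shows that contrast using Python's built-in `sqlite3` as a stand-in: it is not Hive and the SQL is not HiveQL. The `events` table, its columns, and both function names are assumptions for illustration.

```python
import sqlite3
from collections import Counter

# Hypothetical event log: (username, action)
ROWS = [("alice", "login"), ("bob", "login"), ("alice", "purchase"), ("alice", "login")]


def count_with_sql(rows):
    """Declarative version: say what you want and let the engine plan it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (username TEXT, action TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT username, COUNT(*) FROM events GROUP BY username ORDER BY username"
    ).fetchall()
    conn.close()
    return result


def count_by_hand(rows):
    """Hand-written version: the grouping and counting are spelled out."""
    counts = Counter(username for username, _action in rows)
    return sorted(counts.items())


if __name__ == "__main__":
    print(count_with_sql(ROWS) == count_by_hand(ROWS))
```

Both functions return the same per-user counts. On a cluster, Hive and Pig translate the high-level form into jobs that run over the data in HDFS.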
  • 36. What are the features of Hadoop?
      ◦ HDFS – Hadoop Distributed File System
      ◦ HBase – Hadoop NoSQL Database
      ◦ Map/Reduce – The key to it all
      ◦ Hive – SQL for NoSQL
      ◦ Pig – Map/Reduce made easy
      ◦ Flume – Fault-tolerant transport
  • 37-39. What are the features of Hadoop?
      ◦ Flume
          ▪ Divided into Sources, Channels, and Sinks
          ▪ You can have several of each, which makes it fault tolerant
          ▪ Many sources!
              · Avro, Exec, JMS, Syslog, HTTP, NetCat, Your Own (Java)
          ▪ Many channels!
              · Memory, File, Your Own (Java)
          ▪ Many sinks!
              · Avro, HDFS, Logger, IRC, File, HBase, ElasticSearch, S3, community sinks, Your Own (Java)
  • 40. What are the features of Hadoop?
      ◦ Flume
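The Source, Channel, Sink split can be sketched as a small in-memory pipeline. This is a toy model rather than Flume itself, which is configured through properties files and extended in Java. The `MemoryChannel` class, `run_source`, and `drain` are hypothetical names, and copying every event to every channel is an assumption of this sketch.

```python
from collections import deque


class MemoryChannel:
    """Toy channel: a FIFO buffer between a source and a sink."""

    def __init__(self):
        self._events = deque()

    def put(self, event):
        self._events.append(event)

    def take(self):
        # Returns None when the channel is empty.
        return self._events.popleft() if self._events else None


def run_source(lines, channels):
    """Source: read events and copy each one into every channel."""
    for line in lines:
        for channel in channels:
            channel.put(line)


def drain(channel, sink):
    """Move every buffered event from the channel to the sink; return the count."""
    delivered = 0
    event = channel.take()
    while event is not None:
        sink(event)
        delivered += 1
        event = channel.take()
    return delivered


if __name__ == "__main__":
    to_hdfs, to_search = MemoryChannel(), MemoryChannel()
    hdfs_sink, search_sink = [], []  # lists standing in for real sinks
    run_source(["event-1", "event-2"], [to_hdfs, to_search])
    drain(to_hdfs, hdfs_sink.append)
    drain(to_search, search_sink.append)
    print(hdfs_sink, search_sink)
```

Because each sink drains its own channel, a slow or failed sink leaves the other path unaffected, which is the sense in which having multiples of everything adds fault tolerance.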
  • 41-44. What Hadoop looks like in a DC
      ◦ Components
          ▪ Primary Namenode
              · Controls the whole cluster; knows where the data resides
              · Runs the job tracker to keep track of Map/Reduce jobs
              · Biggest single point of failure; shadowing it is a potential option
          ▪ Secondary Namenode
              · Performs secondary cleanup operations (periodic checkpoints of the filesystem metadata); it is not a hot standby
          ▪ Data Node
              · Stores all the information
              · Runs Map/Reduce
  • 45. What Hadoop looks like in a DC
      ◦ Components
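The division of labour between the namenode (knows where data lives) and the data nodes (hold the data and run the work) can be sketched as follows. This is a simplified model under stated assumptions: the `NameNode` class, `schedule_map_tasks`, the block and node names, and the least-loaded tie-break are all hypothetical. The only part taken from Hadoop is the idea of running a map task on a node that already holds the block.

```python
class NameNode:
    """Toy metadata service: remembers which data nodes hold each block."""

    def __init__(self):
        self._block_locations = {}

    def register(self, block, datanodes):
        self._block_locations[block] = list(datanodes)

    def locate(self, block):
        return list(self._block_locations[block])


def schedule_map_tasks(namenode, blocks):
    """Job-tracker-style scheduling: run each map task where its block is stored."""
    assignments = {}
    load = {}  # node -> number of tasks assigned so far
    for block in blocks:
        holders = namenode.locate(block)
        # Pick the least-loaded holder; ties go to the first one listed.
        node = min(holders, key=lambda n: load.get(n, 0))
        assignments[block] = node
        load[node] = load.get(node, 0) + 1
    return assignments


if __name__ == "__main__":
    nn = NameNode()
    nn.register("blk_0", ["dn1", "dn2"])
    nn.register("blk_1", ["dn1", "dn3"])
    print(schedule_map_tasks(nn, ["blk_0", "blk_1"]))
```

Moving the computation to the data, rather than the data to the computation, is why the data nodes both store information and run Map/Reduce.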