stef-bauer.com/2012/12/10/you-need-a-zetta-what
“Big Data”

Hadoop Introduction



     Stefan Bauer
A little about me…

   Data Warehouse Administrator
       Architect (logical/physical)
       DBA (monitoring, space management, etc)
       SSIS Developer (build it… run it… support it)
       SSAS/SSRS (performance tuning, supporting)
       Performance monitoring (is it all working?)
       I am a geek (Some people have pointed that out about me…
        judge for yourself)
What we will cover
   Why you care (or at least why you should)
   General overview
   Basic terms (get us on the same page)
   A look at some of the technology (aka demo)




   All of the technical parts are in a multi-part
    series on my blog
What kind of data do you sort
        through?
   Interesting technology… might not be for you
   Getting there… might be something interesting to start working out the details…
   You have big data… and you know it!
What is that Hadoop thing I
       keep hearing about?
   A Framework (collection of technologies)
   Complex processing
   Massively parallel
   Large amounts of data
   Commodity hardware
Hadoop… what it is not

   Ad hoc analytics
   Low latency between data arrival,
    analysis, and query usage
   “fast” (speed is a relative thing)
       Facebook has interactive queries on Hadoop
        framework
   Good for small data
Terms
   Cloud
   Cluster
   Hadoop
   Hadoop Distributed File System (HDFS)
   Hue (Web Interface for MapReduce/Oozie)
   MapReduce
       Job Tracker
       Task Trackers (on Data Nodes)
   Oozie (Workflow Management)
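
To make the MapReduce term concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce steps, using a word count. It is an illustration only, not how Hadoop is invoked: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and on a real cluster the Job Tracker would hand the map and reduce work to Task Trackers on many Data Nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

Each call to `map_phase` and `reduce_phase` depends only on its own input, which is what would let a framework spread the work across a cluster.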
Terms
   Pig (Distributed Transformation Scripting)
   Beeswax (Wrapper for Hive)
   Hive
       EDW on 10s, 100s, or 1000s of servers
       HiveQL (Based on ANSI SQL)
       Reporting Tools/Business Analytics
   Name Node
       Data Nodes
   ZooKeeper (Distributed Configuration Management)
   Cloudera/MapR/Amazon/Hortonworks …
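
The Name Node / Data Nodes split can be pictured with a toy model: the Name Node keeps the map of which blocks of a file live on which Data Nodes, and each block is stored on several nodes. The sketch below assumes a simple round-robin placement and a hypothetical `place_blocks` helper. Real HDFS placement is rack-aware and more involved, so treat this as a picture of the idea rather than the actual algorithm.

```python
def place_blocks(file_size, block_size, data_nodes, replication=3):
    """Return {block_index: [data_node, ...]} for one file.

    Toy round-robin placement (an assumption for illustration);
    real HDFS uses a rack-aware policy.
    """
    if replication > len(data_nodes):
        raise ValueError("need at least as many data nodes as replicas")
    # Ceiling division: a partial final block still occupies a block.
    num_blocks = -(-file_size // block_size)
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            data_nodes[(block + replica) % len(data_nodes)]
            for replica in range(replication)
        ]
    return placement


if __name__ == "__main__":
    # Hypothetical node names and sizes (in MB), chosen for the example.
    nodes = ["dn1", "dn2", "dn3", "dn4"]
    for block, holders in place_blocks(300, 128, nodes).items():
        print(block, holders)
```

Because every block lives on more than one Data Node, losing a single commodity server does not lose data.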
HDFS
Cloudera
Hive
Questions?

Stef-Bauer.com


@stefbauer


Stef_Bauer@hotmail.com
