Hadoop: the Big Answer to the Big Question of the Big Data

by Victor Haydin (eleks)
DevTalks #1
Gordon Moore
                                         1975             2012
Cost of 1 TB storage                     $208 000 000     $110
Cost of 1 GFLOPS computing facility      $62 000 000      $1.50
Number of network hosts                  57               > 1 000 000 000
World’s data amount                      ~130 GB          ~2.9 ZB
1 ZB = 1 000 000 000 000 000 000 000 B (10^21 B)
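To put the 1975 vs 2012 figures side by side, the change factors can be computed directly. This is only arithmetic on the numbers quoted above; the variable names are illustrative, and the 2012 host count is treated as a lower bound because the slide gives it as "> 1 000 000 000".

```python
# Figures copied from the 1975 vs 2012 comparison above.
storage_1975, storage_2012 = 208_000_000, 110   # USD per 1 TB of storage
compute_1975, compute_2012 = 62_000_000, 1.50   # USD per 1 GFLOPS
hosts_1975, hosts_2012 = 57, 1_000_000_000      # 2012 value is a lower bound
data_1975, data_2012 = 130e9, 2.9e21            # bytes (~130 GB, ~2.9 ZB)

ratios = {
    "storage cost drop": storage_1975 / storage_2012,
    "compute cost drop": compute_1975 / compute_2012,
    "network hosts growth": hosts_2012 / hosts_1975,
    "data volume growth": data_2012 / data_1975,
}

for name, factor in ratios.items():
    print(f"{name}: ~{factor:.1e}x")
```

On these figures the world's data volume grew by roughly ten orders of magnitude, while the cost of a terabyte fell by about six.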
Commodity Hardware
Wikipedia: “Apache Hadoop is a software
framework that supports data-intensive
distributed applications”
Main Contributors
HDFS: Hadoop Distributed File System

• Hardware Failure
• Streaming Data Access
• Large Data Sets
• Simple Coherency Model (write-once, read-many)
• Portability
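The "Hardware Failure" goal is met in HDFS by splitting files into blocks and keeping several replicas of each block on different machines. The toy model below is a sketch of that idea only: the function names, node names, the 128 MB block size and the round-robin placement are assumptions made for illustration (real HDFS placement is rack-aware), while a replication factor of 3 matches the HDFS default.

```python
import itertools

def place_blocks(file_size, block_size, nodes, replication=3):
    """Split a file into blocks and give each block `replication` distinct nodes.

    Round-robin placement is a simplification chosen for brevity.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = -(-file_size // block_size)  # ceiling division
    ring = itertools.cycle(nodes)
    return {b: [next(ring) for _ in range(replication)] for b in range(n_blocks)}

def readable_after_failure(placement, failed):
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )

MB = 1024 * 1024
nodes = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical cluster
placement = place_blocks(file_size=300 * MB, block_size=128 * MB, nodes=nodes)

print(placement)
print(readable_after_failure(placement, {"node1", "node2"}))
```

With three replicas per block, the file stays readable after any two nodes fail; it can only be lost when all three holders of some block go down together.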
Moving Computation is cheaper than moving Data
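A back-of-the-envelope calculation shows why. The sketch below compares shipping a dataset to the code with shipping the code to the dataset; the 1 TB dataset, the 10 MB job package and the 1 Gbit/s link are assumed numbers, and the model ignores latency and protocol overhead.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Idealized transfer time: payload size divided by link bandwidth."""
    return size_bytes * 8 / link_bits_per_second

TB = 10**12
MB = 10**6
link = 10**9  # assumed 1 Gbit/s network link

move_data = transfer_seconds(1 * TB, link)   # ship the dataset to the code
move_code = transfer_seconds(10 * MB, link)  # ship the code to the dataset

print(f"moving data: {move_data / 3600:.1f} h")
print(f"moving code: {move_code:.2f} s")
```

Under these assumptions the data transfer takes hours and the code transfer a fraction of a second, which is why the scheduler tries to run each task on a node that already holds the input block.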
MapReduce
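Map(k1, v1) → list(k2, v2)
// C#; assumes: using System; using System.Collections.Generic;
// Emit (word, 1) for every word in the input text.
IEnumerable<KeyValuePair<string, int>> Map(string key, string value)
{
    foreach (var w in value.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        yield return new KeyValuePair<string, int>(w, 1);
}

Reduce(k2, list(v2)) → list(v3)
// Sum the partial counts collected for one word.
KeyValuePair<string, int> Reduce(string key, int[] values)
{
    int sum = 0;
    foreach (var count in values)
        sum += count;
    return new KeyValuePair<string, int>(key, sum);
}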
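The map and reduce functions above leave out the step in between: grouping all values emitted under the same key before reduce is called. A minimal single-process sketch of the whole word-count pipeline might look like this; `run_mapreduce` and the sample documents are hypothetical, and nothing here uses the Hadoop API or runs in parallel.

```python
from collections import defaultdict

def map_fn(key, value):
    """Map(k1, v1) -> list(k2, v2): emit (word, 1) for each word."""
    for word in value.split():
        yield word, 1

def reduce_fn(key, values):
    """Reduce(k2, list(v2)) -> (k2, v3): sum the counts for one word."""
    return key, sum(values)

def run_mapreduce(records, map_fn, reduce_fn):
    """Run map, group values by key (the shuffle), then reduce each group."""
    groups = defaultdict(list)
    for key, value in records:
        for k2, v2 in map_fn(key, value):
            groups[k2].append(v2)
    return dict(reduce_fn(k2, v2s) for k2, v2s in sorted(groups.items()))

docs = [("doc1", "big data big answer"), ("doc2", "big question")]
counts = run_mapreduce(docs, map_fn, reduce_fn)
print(counts)
```

In a real cluster the map calls run on the nodes holding the input blocks and the grouping happens over the network; the in-memory dictionary here stands in for that shuffle.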
Demo
Ecosystem




ZooKeeper
45K nodes, 180-200 PB




3K+ nodes, 36+ PB
powered by
Future
Core:
• HDFS: high-availability and scalability
• MapReduce: modularity and alternative ways to perform queries
Ecosystem development:
• Apache Bigtop: consolidation project
• HBase, Hive, Pig, ZooKeeper, Avro, Sqoop: stabilization and interoperability
• Incubator: Flume, Oozie, Whirr
Demo
Q&A
