Data Warehousing with Hadoop
and the MapReduce Paradigm
Ismel Martínez Díaz
Motivation and Background of Hadoop
• High data volumes
• Variety of formats
• High data generation velocity
• Distributed database systems
• Functional programming
History
• Creator: Doug Cutting
• 2002: Nutch
• 2004: GFS and MapReduce
• 2006 – 2008: Hadoop (HDFS and MapReduce)
• Today: Cloudera and Hortonworks (Hive, Pig, HBase, etc.)
Hadoop
• Open source project
• Processing of large amounts of data
• Distributed computing
• Scalable, reliable, efficient, and economical
HDFS
• Hardware failures and use of heartbeats
• Millions of files and a single namespace
• Portability
• Write once, read many
• Files split into blocks, with replication
• Direct access to the data, and validation
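
The block, replication, and validation ideas above can be illustrated with a small sketch. It is not HDFS code: the function names are hypothetical, and the constants (128 MB blocks, 3 replicas, one CRC32 checksum per 512 bytes) are the typical values quoted in the editor's notes for this slide, used here as assumptions.

```python
import zlib
from typing import List

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size (the notes cite 64 MB or 128 MB)
REPLICATION = 3                 # assumed replica count (the notes cite "normally 3")
CHUNK_SIZE = 512                # bytes covered by each checksum, per the notes


def block_count(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks needed to hold file_size bytes (ceiling division)."""
    return (file_size + block_size - 1) // block_size


def raw_storage(file_size: int, replication: int = REPLICATION) -> int:
    """Bytes consumed across the cluster once every block is replicated."""
    return file_size * replication


def chunk_checksums(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[int]:
    """CRC32 of each chunk_size-byte chunk, as a writing client would compute."""
    return [zlib.crc32(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def read_with_validation(replicas: List[bytes], checksums: List[int]) -> bytes:
    """Return the first replica whose checksums match, trying the others on failure."""
    for data in replicas:
        if chunk_checksums(data) == checksums:
            return data
    raise IOError("all replicas failed checksum validation")
```

Under these assumptions a 300 MB file occupies three blocks, and a reader that receives a corrupted replica falls back to the next one instead of returning bad data.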
MapReduce
• Functional programming for distributed computing
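
A minimal sketch of why the functional style suits distribution, assuming the combining operation is associative and has an identity element (a monoid, as the editor's notes mention). The helper names are hypothetical. Folding each slice independently and then merging the partial results gives the same answer as folding everything in one place.

```python
from functools import reduce
from operator import add


def map_reduce(values, mapper, combiner, identity):
    """Apply mapper to each value, then fold the results with combiner."""
    return reduce(combiner, map(mapper, values), identity)


def distributed_map_reduce(values, mapper, combiner, identity, n_parts):
    """Same computation on contiguous slices, merging the partial results."""
    size = max(1, -(-len(values) // n_parts))  # ceiling division, at least 1
    parts = [values[i:i + size] for i in range(0, len(values), size)]
    partials = [map_reduce(p, mapper, combiner, identity) for p in parts]
    return reduce(combiner, partials, identity)


words = ["hadoop", "hdfs", "map", "reduce", "hive"]
total_local = map_reduce(words, len, add, 0)
total_split = distributed_map_reduce(words, len, add, 0, n_parts=3)
```

Both totals agree because addition is associative. The slices are kept contiguous so that the same holds for associative operations that are not commutative, such as string concatenation.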
HDFS and MapReduce
• Useful
• Simple
• Functional and distributed thinking
[Diagram, repeated across the animation slides: MapReduce job flow. Labels: Client, JobTracker, TaskTracker (MapTask), Input Files, split, InputFormat, map, combine(), RAM, region, read, sort, reduce(), OutputFormat, Output File.]
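
The job flow in the diagram can be imitated in a single process. This is a toy sketch, not Hadoop's implementation: the function names, the word-count task, and the partitioning rule are assumptions chosen for illustration. Each split is mapped, combined on the map side, and written into one region per reducer. Each reducer then reads its region, sorts it by key, and reduces.

```python
from collections import defaultdict


def map_fn(line):
    """map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """combine(): partial reduction on the map side, summing counts per key."""
    partial = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return list(partial.items())


def partition(key, n_reducers):
    """Pick the region for a key; a stable toy rule, not Hadoop's partitioner."""
    return sum(key.encode("utf-8")) % n_reducers


def run_job(splits, n_reducers=2):
    """Run the map and reduce phases over the splits; one dict per reducer."""
    # Map phase: one map task per split, each writing into n_reducers regions.
    regions = [[] for _ in range(n_reducers)]
    for split in splits:
        pairs = [pair for line in split for pair in map_fn(line)]
        for key, value in combine(pairs):
            regions[partition(key, n_reducers)].append((key, value))
    # Reduce phase: each reducer reads its region, sorts by key, and reduces.
    output = []
    for region in regions:
        grouped = defaultdict(list)
        for key, value in sorted(region):
            grouped[key].append(value)
        output.append({key: sum(values) for key, values in grouped.items()})
    return output


splits = [["hadoop map reduce", "map map"], ["reduce hadoop hdfs"]]
result = run_job(splits)
```

Because every occurrence of a key is routed to the same region, each word ends up in exactly one reducer's output, and merging the per-reducer dictionaries gives the full word count.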
Example
HDFS and MapReduce
Hadoop Ecosystem
Professional Services
RDBMS and Hadoop
RDBMS and Hadoop
Data Warehousing with Hadoop
Data Warehousing with Hadoop
Hive
Thank you


Editor's notes

  1. Introduce yourself.
  2. Problems of Big Data processing: large volumes of data in varied formats are stored, and they are generated at high speed. Background: specifically, distributed databases, and functional programming as it relates to the algebra of monoids.
  3. Nutch is an open source search engine project that ran into scalability problems in deployment. In 2004 Google presented its distributed file system and the first MapReduce programming proposal to the open source community. Nutch adopted these Google technologies. In 2006 the open source Hadoop project was created, and it was consolidated in 2008 with its own distributed file system and MapReduce implementation. Professional services and management tools. Today: the Hadoop ecosystem.
  4. Characteristics of the software and of the project. Scalable up to
  5. HDFS assumes that hardware fails, hence the redundancy. DataNodes send a heartbeat to the NameNode once every 3 seconds so that failures can be detected. High data volumes: petabytes of information. It runs on any platform. Data can only be appended to existing files, not deleted. In the future, files will not be modifiable. Blocks are normally 64 MB or 128 MB. Several replicas are kept on different DataNodes, normally 3. Once the NameNode has indicated where the data is, the client accesses it directly. Checksums are used to validate data, using CRC32 (cyclic redundancy check). File creation: the client computes a checksum per 512 bytes and the DataNode stores the checksum. File access: the client retrieves the data and the checksum from the DataNode; if validation fails, the client tries other replicas.
  6. Parallel and real-time processing.
  7. The client sends the configuration (the names of the map and reduce functions), the input and output directories, and the Java classes that will be used to process the data.
  8. The JobTracker returns a JobID to the client and starts assigning map tasks to the TaskTrackers that report themselves available (pull model), based on proximity to the data: same node, same rack, same network switch.
  9. The input is extracted and split using RecordReader and InputFormat.
  10. The map function is then invoked and emits sets of key/value pairs.
  11. Partial reductions and partial sorts can sometimes take place during the map() phase, to make better use of the buffers.
  12. When several TaskTrackers have finished their map phases, the JobTracker starts assigning reduce() tasks (pull model again).
  13. When several TaskTrackers have finished their map phases, the JobTracker starts assigning reduce() tasks (pull model again).
  14. When all map tasks have completed, the JobTracker tells all TaskTrackers to proceed with the final reduce phase.
  15. Finally, the output files are formatted and written to HDFS.
  16. Word Count in Spark (Python). Input is read from a file in HDFS and output is written to a file in HDFS.
  17. YARN is the latest version of MapReduce; it supports batch, scripting, SQL and NoSQL, streaming, and in-memory processing.
  18. Hive: a data warehouse that provides an SQL interface. HBase: a column-oriented database.
  19. ZooKeeper: cluster coordination. HBase: a NoSQL database for real-time queries. Hive: batch-oriented, SQL-style processing. Professional services and management and administration tools: Cloudera and Hortonworks.
  20. Comparison between relational database systems and MapReduce (RDBMS vs. MapReduce): proprietary vs. open source; expensive vs. cheap; structured data vs. unstructured data; relational semantics vs. indirect support for relational semantics; indirect support for complex data structures vs. deep support for complex data structures; support for transactional processing vs. support for iterations.
  21. Ways of connecting Hadoop with relational database management systems so that business intelligence tools can be used.
  22. Raw data exists in HDFS; ETL is required.
  23. Different queries can be run and several views obtained. Processing is parallel and the data is distributed; it may be structured or unstructured.
  24. Hive allows tables to be defined over structured data and runs SQL queries that are transformed into MapReduce operations, for example to query or obtain a view of the hashtags that appear in tweets.
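
The hashtag view in the last note can be pictured as a grouped count, along the lines of `SELECT hashtag, COUNT(*) ... GROUP BY hashtag` over a hypothetical table of tweet hashtags. The sketch below shows the shape of the map and reduce steps such a query corresponds to. It does not reproduce how Hive actually plans or compiles queries, and the function names and the hashtag pattern are assumptions.

```python
import re
from itertools import groupby
from operator import itemgetter

HASHTAG = re.compile(r"#(\w+)")  # assumed, simplified hashtag pattern


def map_tweet(text):
    """map: emit a (hashtag, 1) pair for each hashtag in one tweet."""
    return [(tag.lower(), 1) for tag in HASHTAG.findall(text)]


def reduce_counts(pairs):
    """Sort the pairs by key (the shuffle), then sum the values of each group."""
    ordered = sorted(pairs, key=itemgetter(0))
    return {tag: sum(value for _, value in group)
            for tag, group in groupby(ordered, key=itemgetter(0))}


def hashtag_view(tweets):
    """Count hashtag occurrences across tweets, like a GROUP BY with COUNT."""
    pairs = [pair for text in tweets for pair in map_tweet(text)]
    return reduce_counts(pairs)
```

The grouping key of the query becomes the key emitted by the map step, and the aggregate becomes the reduce step.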