SlideShare una empresa de Scribd logo
1 de 9
HADOOP & DISTRIBUTED CLOUD
COMPUTING
DATA PROCESSING IN CLOUD




 Presentation By : Rajan Kumar Upadhyay || rajan24oct@gmail.com
CLOUD COMPUTING ?

Cloud computing is a virtual setup box that includes
following
- Delivery of computing as a service rather than product
 - Shared resources are software, utility, hardware provided over a network ( Typically
Internet )

                                   Delivery of computing


                                        Public Utilities


                                     Shared Resources
DISTRIBUTED CLOUD COMPUTING

As the name explains : Distributed computing in cloud
Examples:
• Distributed computing is nothing more than utilizing many networked computers to partition
(split it into many smaller pieces) a question or problem and allow the network to solve the
issue piecemeal
• Software like Hadoop. Written in Java, Hadoop is a scalable, efficient, distributed software
platform designed to process enormous amounts of data. Hadoop can scale to thousands of
computers across many clusters.
• Another instance of distributed computing, for storage instead of processing power, is
bittorrent. A torrent is a file that is split into many pieces and stored on many computers
around the internet. When a local machine wants to access that file, the small pieces are
retrieved and rebuilt.
• P2P network, that send communication/data packages into multiple pieces across multiple
network routes. Then assemble them in receivers end.
Distributed computing on cloud is nothing but next generation framework to utilize the
maximum value of resources over distributed architecure
WHAT IS HADOOP
Flexible infrastructure for large scale computation and data processing on a network of
commodity hardware.
Why Hadoop?
A common infrastructure pattern extracted from building distributed systems


•Scale                                          • Apache.org Open Source project
•Incremental growth                             • Yahoo !, Facebook, Google, Fox, Amazon, IBM,
•Cost                                           NY times uses it for their core infrastructure
•Flexibility                                    • Widely Adopted A valuable and reusable skill set
• Distributed File System                             Taught at major universities
• Distributed Processing Framework                    Easier to hire for
                                                      Easier to train on
                                                      Portable across projects, groups
HOW IT WORKS

HDFS: Hadoop Distributed File System
A distributed file system for large data
• Your data in triplicate ( one local and two remote copies)
• Built-in redundancy, resiliency to large scale failures
 (automated restart and re-allocation )
• Intelligent distribution, striping across racks
• Accommodates very large data sizes On commodity hardware
PROGRAMMING MODEL

There are various programming model for Hadoop
developments. I personally like & experienced with
Map/Reduce

Why Map/Reduce:
•Simple programming technique:
         •   Map(anything)->key, value
         •   Sort, partition on key
         •   Reduce(key,value)->key, value
• No parallel processing / message passing semantics
• Programmable in Java or any other language




                                                       Continued …
PROGRAMMING MODEL



                                                       Gather output of
Create/Allocate                  Move computation       map, sort or
    cluster                          to Data           partition on key




   Put Data                                                  Run          Results of job
                                    Program                reduce          stored on
   into File
                                    Execution               task             HDFS
   System

                                    Your Map code
               Data is split        is copied to the
               into                 allocated nodes,
               blocks, store        preferring nodes
               d in triplicate      that contain
               across your          copies of your
                                    data
               cluster
PRACTICES

Put large data source into HDFS
Perform aggregations, transformations, normalizations on
the data
Load into RDBMS
THANK YOU

Thank you for reading this. I hope you find it useful. Please contact me to
rajan24oct@gmail.com if you have any queries/feedback. My Name is Rajan
Kumar Upadhyay, I have more than 10 years of collective IT experience as a
techie.
If you have anything to share/looking for consulting etc. Please feel free to contact
me.

Más contenido relacionado

La actualidad más candente

Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : BeginnersShweta Patnaik
 
Introduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopIntroduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopGERARDO BARBERENA
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with HadoopNalini Mehta
 
Performance Issues on Hadoop Clusters
Performance Issues on Hadoop ClustersPerformance Issues on Hadoop Clusters
Performance Issues on Hadoop ClustersXiao Qin
 
Hadoop 101
Hadoop 101Hadoop 101
Hadoop 101EMC
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceCsaba Toth
 
Introduction to Hadoop - The Essentials
Introduction to Hadoop - The EssentialsIntroduction to Hadoop - The Essentials
Introduction to Hadoop - The EssentialsFadi Yousuf
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14John Sing
 
Hadoop disaster recovery
Hadoop disaster recoveryHadoop disaster recovery
Hadoop disaster recoverySandeep Singh
 
Selective Data Replication with Geographically Distributed Hadoop
Selective Data Replication with Geographically Distributed HadoopSelective Data Replication with Geographically Distributed Hadoop
Selective Data Replication with Geographically Distributed HadoopDataWorks Summit
 
Hadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big DataHadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big DataWANdisco Plc
 
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for womenHadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for womenmaharajothip1
 
Jstorm introduction-0.9.6
Jstorm introduction-0.9.6Jstorm introduction-0.9.6
Jstorm introduction-0.9.6longda feng
 

La actualidad más candente (19)

Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
Introduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopIntroduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to Hadoop
 
Hadoop
Hadoop Hadoop
Hadoop
 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture
 
Hadoop Technology
Hadoop TechnologyHadoop Technology
Hadoop Technology
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with Hadoop
 
Cppt
CpptCppt
Cppt
 
Performance Issues on Hadoop Clusters
Performance Issues on Hadoop ClustersPerformance Issues on Hadoop Clusters
Performance Issues on Hadoop Clusters
 
Hadoop 101
Hadoop 101Hadoop 101
Hadoop 101
 
Presentation
PresentationPresentation
Presentation
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce
 
Introduction to Hadoop - The Essentials
Introduction to Hadoop - The EssentialsIntroduction to Hadoop - The Essentials
Introduction to Hadoop - The Essentials
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14
 
Hadoop disaster recovery
Hadoop disaster recoveryHadoop disaster recovery
Hadoop disaster recovery
 
Selective Data Replication with Geographically Distributed Hadoop
Selective Data Replication with Geographically Distributed HadoopSelective Data Replication with Geographically Distributed Hadoop
Selective Data Replication with Geographically Distributed Hadoop
 
Hadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big DataHadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big Data
 
Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
 
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for womenHadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
 
Jstorm introduction-0.9.6
Jstorm introduction-0.9.6Jstorm introduction-0.9.6
Jstorm introduction-0.9.6
 

Similar a Hadoop & distributed cloud computing

Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?sudhakara st
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduceDerek Chen
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An OverviewC. Scyphers
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxDr.Florence Dayana
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟datastack
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and HadoopMr. Ankit
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHitendra Kumar
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud ComputingFarzad Nozarian
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache HadoopChristopher Pezza
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreModern Data Stack France
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introductionChirag Ahuja
 

Similar a Hadoop & distributed cloud computing (20)

Cppt Hadoop
Cppt HadoopCppt Hadoop
Cppt Hadoop
 
Cppt
CpptCppt
Cppt
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
Bigdata and Hadoop Introduction
Bigdata and Hadoop IntroductionBigdata and Hadoop Introduction
Bigdata and Hadoop Introduction
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce
 
Hadoop seminar
Hadoop seminarHadoop seminar
Hadoop seminar
 
Hadoop
HadoopHadoop
Hadoop
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An Overview
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop and Big Data
Hadoop and Big DataHadoop and Big Data
Hadoop and Big Data
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud Computing
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache Hadoop
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScore
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Hadoop ppt2
Hadoop ppt2Hadoop ppt2
Hadoop ppt2
 

Más de Rajan Kumar Upadhyay

Más de Rajan Kumar Upadhyay (8)

Speed Up RPA Deployment 10 times faster
Speed Up RPA Deployment 10 times fasterSpeed Up RPA Deployment 10 times faster
Speed Up RPA Deployment 10 times faster
 
RPA & Supply Chain
RPA  &  Supply ChainRPA  &  Supply Chain
RPA & Supply Chain
 
Tango with django
Tango with djangoTango with django
Tango with django
 
Features of globalization and india in global economy
Features of globalization and india in global economyFeatures of globalization and india in global economy
Features of globalization and india in global economy
 
State of Retail E-commerce In India
State of Retail E-commerce In IndiaState of Retail E-commerce In India
State of Retail E-commerce In India
 
Nextop Cloud computing Platform
Nextop Cloud computing PlatformNextop Cloud computing Platform
Nextop Cloud computing Platform
 
Data analysis & decisions
Data analysis & decisionsData analysis & decisions
Data analysis & decisions
 
Business Intelligence & its Best Practices
Business Intelligence & its Best PracticesBusiness Intelligence & its Best Practices
Business Intelligence & its Best Practices
 

Último

DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 

Último (20)

DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

Hadoop & distributed cloud computing

  • 1. HADOOP & DISTRIBUTED CLOUD COMPUTING DATA PROCESSING IN CLOUD Presentation By : Rajan Kumar Upadhyay || rajan24oct@gmail.com
  • 2. CLOUD COMPUTING ? Cloud computing is a virtual setup box that includes following - Delivery of computing as a service rather than product - Shared resources are software, utility, hardware provided over a network ( Typically Internet ) Delivery of computing Public Utilities Shared Resources
  • 3. DISTRIBUTED CLOUD COMPUTING As the name explains : Distributed computing in cloud Examples: • Distributed computing is nothing more than utilizing many networked computers to partition (split it into many smaller pieces) a question or problem and allow the network to solve the issue piecemeal • Software like Hadoop. Written in Java, Hadoop is a scalable, efficient, distributed software platform designed to process enormous amounts of data. Hadoop can scale to thousands of computers across many clusters. • Another instance of distributed computing, for storage instead of processing power, is bittorrent. A torrent is a file that is split into many pieces and stored on many computers around the internet. When a local machine wants to access that file, the small pieces are retrieved and rebuilt. • P2P network, that send communication/data packages into multiple pieces across multiple network routes. Then assemble them in receivers end. Distributed computing on cloud is nothing but next generation framework to utilize the maximum value of resources over distributed architecure
  • 4. WHAT IS HADOOP Flexible infrastructure for large scale computation and data processing on a network of commodity hardware. Why Hadoop? A common infrastructure pattern extracted from building distributed systems •Scale • Apache.org Open Source project •Incremental growth • Yahoo !, Facebook, Google, Fox, Amazon, IBM, •Cost NY times uses it for their core infrastructure •Flexibility • Widely Adopted A valuable and reusable skill set • Distributed File System Taught at major universities • Distributed Processing Framework Easier to hire for Easier to train on Portable across projects, groups
  • 5. HOW IT WORKS HDFS: Hadoop Distributed File System A distributed file system for large data • Your data in triplicate ( one local and two remote copies) • Built-in redundancy, resiliency to large scale failures (automated restart and re-allocation ) • Intelligent distribution, striping across racks • Accommodates very large data sizes On commodity hardware
  • 6. PROGRAMMING MODEL There are various programming model for Hadoop developments. I personally like & experienced with Map/Reduce Why Map/Reduce: •Simple programming technique: • Map(anything)->key, value • Sort, partition on key • Reduce(key,value)->key, value • No parallel processing / message passing semantics • Programmable in Java or any other language Continued …
  • 7. PROGRAMMING MODEL Gather output of Create/Allocate Move computation map, sort or cluster to Data partition on key Put Data Run Results of job Program reduce stored on into File Execution task HDFS System Your Map code Data is split is copied to the into allocated nodes, blocks, store preferring nodes d in triplicate that contain across your copies of your data cluster
  • 8. PRACTICES Put large data source into HDFS Perform aggregations, transformations, normalizations on the data Load into RDBMS
  • 9. THANK YOU Thank you for reading this. I hope you find it useful. Please contact me to rajan24oct@gmail.com if you have any queries/feedback. My Name is Rajan Kumar Upadhyay, I have more than 10 years of collective IT experience as a techie. If you have anything to share/looking for consulting etc. Please feel free to contact me.