Introduction to Big Data and Hadoop
www.edureka.co/big-data-and-hadoop
CMC Contact: aparna.jaiswal@cmcltd.com
Edureka Contact: corp@edureka.co
Slide 2
Objectives
At the end of this session, you will understand:
• Big Data Introduction
• Use Cases of Big Data in Multiple Industry Verticals
• Hadoop and Its Eco-System
• Hadoop Architecture
• Learning Path for Developers, Administrators, Testing Professionals and Aspiring Data Scientists
Slide 3
Unstructured Data is Exploding
Source: Twitter
Slide 4
IBM’s Definition – Big Data Characteristics
http://www-01.ibm.com/software/data/bigdata/
IBM’s Definition of Big Data
Slide 5
Annie’s Introduction
Hello There!!
My name is Annie.
I love quizzes and puzzles and I am here to make you guys think and answer my questions.
Slide 6
Annie’s Question
Map the following to corresponding data type:
» XML files, e-mail body
» Audio, Video, Images, Archived documents
» Data from Enterprise systems (ERP, CRM etc.)
Slide 7
Annie’s Answer
Ans.
XML files, e-mail body → Semi-structured data
Audio, Video, Images, Archived documents → Unstructured data
Data from Enterprise systems (ERP, CRM etc.) → Structured data
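The three categories in the answer can be illustrated with a small sketch. The sample records, field names and e-mail address below are invented for illustration and do not come from the slides; the point is only how much schema each kind of data carries.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns known in advance (e.g. an export from an ERP or CRM table).
# The column names and values are hypothetical.
structured = "customer_id,name,city\n101,Asha,Pune\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing tags, but the set of fields may vary per record.
semi_structured = (
    "<email><from>asha@example.com</from>"
    "<subject>Order status</subject></email>"
)
email = ET.fromstring(semi_structured)
fields = {child.tag: child.text for child in email}

# Unstructured: no schema at all; only generic operations such as
# counting words apply without further processing.
unstructured = "Hi team, the parcel arrived late and the box was damaged."
word_count = len(unstructured.split())

print(rows[0]["city"])
print(fields["subject"])
print(word_count)
```

The structured record can be queried by column name straight away, the semi-structured one by whatever tags happen to be present, and the free text needs additional processing before it yields any fields.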
Slide 8
Further Reading
More on Big Data
http://www.edureka.in/blog/the-hype-behind-big-data/
Why Hadoop?
http://www.edureka.in/blog/why-hadoop/
Opportunities in Hadoop
http://www.edureka.in/blog/jobs-in-hadoop/
Big Data
http://en.wikipedia.org/wiki/Big_Data
IBM’s definition – Big Data Characteristics
http://www-01.ibm.com/software/data/bigdata/
Slide 9
Common Big Data Customer Scenarios
• Web and e-tailing
» Recommendation Engines
» Ad Targeting
» Search Quality
» Abuse and Click Fraud Detection
• Telecommunications
» Customer Churn Prevention
» Network Performance Optimization
» Call Detail Record (CDR) Analysis
» Analysing Network to Predict Failure
http://wiki.apache.org/hadoop/PoweredBy
Slide 10
Common Big Data Customer Scenarios (Contd.)
• Government
» Fraud Detection and Cyber Security
» Welfare Schemes
» Justice
• Healthcare and Life Sciences
» Health Information Exchange
» Gene Sequencing
» Serialization
» Healthcare Service Quality Improvements
» Drug Safety
http://wiki.apache.org/hadoop/PoweredBy
Slide 11
Common Big Data Customer Scenarios (Contd.)
• Banks and Financial services
» Modeling True Risk
» Threat Analysis
» Fraud Detection
» Trade Surveillance
» Credit Scoring and Analysis
• Retail
» Point of Sales Transaction Analysis
» Customer Churn Analysis
» Sentiment Analysis
http://wiki.apache.org/hadoop/PoweredBy
Slide 12
Why DFS?
Read 1 TB Data
1 Machine: 4 I/O Channels, Each Channel – 100 MB/s
10 Machines: 4 I/O Channels per machine, Each Channel – 100 MB/s
Slide 13
Why DFS? (Contd.)
Read 1 TB Data
1 Machine: 4 I/O Channels, Each Channel – 100 MB/s: 43 Minutes
10 Machines: 4 I/O Channels per machine, Each Channel – 100 MB/s
Slide 14
Why DFS? (Contd.)
Read 1 TB Data
1 Machine: 4 I/O Channels, Each Channel – 100 MB/s: 43 Minutes
10 Machines: 4 I/O Channels per machine, Each Channel – 100 MB/s: 4.3 Minutes
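The read times on this slide can be reproduced with a short calculation. This is a sketch under stated assumptions: every channel reads in parallel at its full rate with no coordination overhead, and 1 TB is taken as 1024 × 1024 MB. The function and parameter names are illustrative, not from the slides.

```python
def read_time_minutes(data_tb, machines, channels_per_machine=4,
                      mb_per_s_per_channel=100):
    """Minutes to read data_tb terabytes when all channels read in parallel."""
    data_mb = data_tb * 1024 * 1024  # assumption: 1 TB = 1024 * 1024 MB
    throughput_mb_per_s = machines * channels_per_machine * mb_per_s_per_channel
    return data_mb / throughput_mb_per_s / 60


one_machine = read_time_minutes(1, machines=1)
ten_machines = read_time_minutes(1, machines=10)

print(f"1 machine:   {one_machine:.1f} minutes")
print(f"10 machines: {ten_machines:.1f} minutes")
```

Under these assumptions one machine needs a little under 44 minutes and ten machines need one tenth of that, which matches the slide's rounded-down figures of 43 and 4.3 minutes. A real cluster would fall short of this ideal linear speed-up because of network and scheduling overhead.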
Slide 15
Slide 16
Hadoop Cluster: A Typical Use Case
Active NameNode
RAM: 64 GB
Hard disk: 1 TB
Processor: Xeon with 8 cores
Ethernet: 3 x 10 GB/s
OS: 64-bit CentOS
Power: Redundant Power Supply
StandBy NameNode
RAM: 64 GB
Hard disk: 1 TB
Processor: Xeon with 8 cores
Ethernet: 3 x 10 GB/s
OS: 64-bit CentOS
Power: Redundant Power Supply
Secondary NameNode
RAM: 32 GB
Hard disk: 1 TB
Processor: Xeon with 4 cores
Ethernet: 3 x 10 GB/s
OS: 64-bit CentOS
Power: Redundant Power Supply
DataNode (two shown, identical)
RAM: 16 GB
Hard disk: 6 x 2 TB
Processor: Xeon with 2 cores
Ethernet: 3 x 10 GB/s
OS: 64-bit CentOS
Slide 17
Hidden Treasure
Case Study: Sears Holding Corporation
• Insight into data can provide Business Advantage.
• Some key early indicators can mean Fortunes to Business.
• More Precise Analysis with more data.
*Sears was using traditional systems such as Oracle Exadata, Teradata and SAS to store and process its customer activity and sales data.
http://www.informationweek.com/it-leadership/why-sears-is-going-all-in-on-hadoop/d/d-id/1107038?
Slide 18
Limitations of Existing Data Analytics Architecture
[Diagram, bottom to top: Instrumentation, Collection, Storage only Grid (Original Raw Data; Mostly Append), ETL Compute Grid, RDBMS (Aggregated Data), BI Reports + Interactive Apps. Storage and Processing sit in separate layers.]
• A meagre 10% of the ~2PB data is available for BI
• 90% of the ~2PB is archived
1. Can’t explore original high fidelity raw data
2. Moving data to compute doesn’t scale
3. Premature data death
Slide 19
Solution: A Combined Storage Compute Layer
[Diagram, bottom to top: Instrumentation, Collection, Hadoop: Storage + Compute Grid (Both Storage And Processing; Mostly Append), RDBMS (Aggregated Data), BI Reports + Interactive Apps]
• Entire ~2PB Data is available for processing
• No Data Archiving
1. Data Exploration & Advanced analytics
2. Scalable throughput for ETL & aggregation
3. Keep data alive forever
*Sears moved to a 300-node Hadoop cluster to keep 100% of its data available for processing, rather than the meagre 10% available with its existing non-Hadoop solutions.
Slide 20
Annie’s Question
Hadoop is a framework that allows for the distributed processing of:
» Small Data Sets
» Large Data Sets
Slide 21
Annie’s Answer
Ans. Large Data Sets.
Hadoop can also process small data sets. However, to experience its true power, one needs data in the terabyte range: this is where an RDBMS takes hours or fails, whereas Hadoop does the same job in a couple of minutes.
Slide 22
Hadoop Ecosystem (Hadoop 2.0)
• Apache Oozie (Workflow)
• Pig Latin (Data Analysis)
• Hive (DW System)
• Mahout (Machine Learning)
• MapReduce Framework
• HBase
• Other YARN Frameworks (MPI, GRAPH)
• YARN (Cluster Resource Management)
• HDFS (Hadoop Distributed File System)
• Flume (Unstructured or Semi-structured Data)
• Sqoop (Structured Data)
Slide 23
Hadoop Cluster: Facebook
Facebook
• We use Hadoop to store copies of internal log and dimension data sources and use it as a source for reporting/analytics and machine learning.
• Currently we have 2 major clusters:
» A 1100-machine cluster with 8800 cores and about 12 PB raw storage.
» A 300-machine cluster with 2400 cores and about 3 PB raw storage.
» Each (commodity) node has 8 cores and 12 TB of storage.
» We are heavy users of both streaming as well as the Java APIs. We have built a higher level data warehousing framework using these features called Hive (see http://Hadoop.apache.org/hive/). We have also developed a FUSE implementation over HDFS.
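The "streaming" API mentioned above lets the map and reduce steps be written as ordinary programs that exchange tab-separated key/value lines. The sketch below imitates that contract locally in Python for a word count. It is an illustration, not Facebook's code: the function names and sample log lines are hypothetical, and a plain sort stands in for the shuffle phase that the framework performs between map and reduce.

```python
from itertools import groupby


def mapper(lines):
    """Map step: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Reduce step: sum the counts of consecutive records sharing a key."""
    pairs = (record.split("\t") for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(int(count) for _, count in group)


# Local stand-in for the cluster: sorting plays the role of the shuffle,
# so that all records with the same key reach the reducer together.
log_lines = ["login page view", "page view", "login"]  # hypothetical input
counts = dict(reducer(sorted(mapper(log_lines))))
print(counts)
```

On a cluster, the mapper and reducer would typically be separate scripts reading standard input and writing standard output, with the framework distributing input splits across nodes and handling the sort between the two steps.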
Slide 24
YARN – Moving beyond MapReduce
BATCH (MapReduce)
INTERACTIVE (Tez)
ONLINE (HBase)
STREAMING (Storm, S4, …)
GRAPH (Giraph)
IN-MEMORY (Spark)
HPC MPI (OpenMPI)
OTHER (Search) (Weave..)
http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/YARN.html
Slide 25
Hadoop Cluster Modes
Hadoop can run in any of the following three modes:
Standalone (or Local) Mode
• No daemons, everything runs in a single JVM.
• Suitable for running MapReduce programs during development.
• Has no DFS.
Pseudo-Distributed Mode
• Hadoop daemons run on the local machine.
Fully-Distributed Mode
• Hadoop daemons run on a cluster of machines.
Slide 26
Big Data Learning Path
Developer/Testing
• Java / Python / Ruby
• Hadoop Eco-system
• NoSQL DB
• Spark
Administration
• Linux Administration
• Cluster Management
• Cluster Performance
• Virtualization
Data Analyst
• Statistics Skills
• Machine Learning
• Hadoop Essentials
• Expertise in R
Courses
• Big Data and Hadoop
• MapReduce Design Patterns
• Apache Spark & Scala
• Apache Cassandra
• Linux Administration
• Hadoop Administration
• Data Science
• Business Analytics Using R
• Advance Predictive Modelling in R
• Talend for Big Data
• Data Visualization Using Tableau
Slide 27
Learning Path to Certification
Course
• LIVE Online Class
• Class Recording in LMS
• 24/7 Post Class Support
• Module Wise Quiz and Assignment
• Project Work
• Verifiable Certificate
1. Assistance from Peers and Support team
2. Review for Certification
Slide 28
Further Reading
• Apache Hadoop and HDFS
http://www.edureka.in/blog/introduction-to-apache-hadoop-hdfs/
• Apache Hadoop HDFS Architecture
http://www.edureka.in/blog/apache-hadoop-hdfs-architecture/
Slide 29
Assignment
Referring to the documents in the LMS under Assignment, solve the problem below.
How many such DataNodes would you need to read 100 TB of data in 5 minutes in your Hadoop cluster?
Slide 30
Survey
Your feedback is important to us, be it a compliment, a suggestion or a complaint. It helps us to make the course better!
Please spare a few minutes to take the survey after the webinar.
Introduction to Big Data & Hadoop

Más contenido relacionado

La actualidad más candente

Big Data & Hadoop Tutorial
Big Data & Hadoop TutorialBig Data & Hadoop Tutorial
Big Data & Hadoop TutorialEdureka!
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and HadoopEdureka!
 
Hadoop for Data Warehousing professionals
Hadoop for Data Warehousing professionalsHadoop for Data Warehousing professionals
Hadoop for Data Warehousing professionalsEdureka!
 
Big Data Analytics for Non-Programmers
Big Data Analytics for Non-ProgrammersBig Data Analytics for Non-Programmers
Big Data Analytics for Non-ProgrammersEdureka!
 
Big Data and Hadoop Basics
Big Data and Hadoop BasicsBig Data and Hadoop Basics
Big Data and Hadoop BasicsSonal Tiwari
 
Hadoop for beginners free course ppt
Hadoop for beginners   free course pptHadoop for beginners   free course ppt
Hadoop for beginners free course pptNjain85
 
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...Edureka!
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Simplilearn
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideDanairat Thanabodithammachari
 
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...Edureka!
 
Hadoop Career Path and Interview Preparation
Hadoop Career Path and Interview PreparationHadoop Career Path and Interview Preparation
Hadoop Career Path and Interview PreparationEdureka!
 
Understanding Big Data And Hadoop
Understanding Big Data And HadoopUnderstanding Big Data And Hadoop
Understanding Big Data And HadoopEdureka!
 
Big data with Hadoop - Introduction
Big data with Hadoop - IntroductionBig data with Hadoop - Introduction
Big data with Hadoop - IntroductionTomy Rhymond
 
Intro to HDFS and MapReduce
Intro to HDFS and MapReduceIntro to HDFS and MapReduce
Intro to HDFS and MapReduceRyan Tabora
 
Learn Hadoop
Learn HadoopLearn Hadoop
Learn HadoopEdureka!
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Mahantesh Angadi
 

La actualidad más candente (20)

Big Data & Hadoop Tutorial
Big Data & Hadoop TutorialBig Data & Hadoop Tutorial
Big Data & Hadoop Tutorial
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
 
Hadoop for Data Warehousing professionals
Hadoop for Data Warehousing professionalsHadoop for Data Warehousing professionals
Hadoop for Data Warehousing professionals
 
Hadoop and Big Data
Hadoop and Big DataHadoop and Big Data
Hadoop and Big Data
 
Big Data Analytics for Non-Programmers
Big Data Analytics for Non-ProgrammersBig Data Analytics for Non-Programmers
Big Data Analytics for Non-Programmers
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 
Big Data and Hadoop Basics
Big Data and Hadoop BasicsBig Data and Hadoop Basics
Big Data and Hadoop Basics
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
 
Hadoop for beginners free course ppt
Hadoop for beginners   free course pptHadoop for beginners   free course ppt
Hadoop for beginners free course ppt
 
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guide
 
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
 
Hadoop Career Path and Interview Preparation
Hadoop Career Path and Interview PreparationHadoop Career Path and Interview Preparation
Hadoop Career Path and Interview Preparation
 
Hadoop Presentation
Hadoop PresentationHadoop Presentation
Hadoop Presentation
 
Understanding Big Data And Hadoop
Understanding Big Data And HadoopUnderstanding Big Data And Hadoop
Understanding Big Data And Hadoop
 
Big data with Hadoop - Introduction
Big data with Hadoop - IntroductionBig data with Hadoop - Introduction
Big data with Hadoop - Introduction
 
Intro to HDFS and MapReduce
Intro to HDFS and MapReduceIntro to HDFS and MapReduce
Intro to HDFS and MapReduce
 
Learn Hadoop
Learn HadoopLearn Hadoop
Learn Hadoop
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
 

Destacado

5 reasons why spark is in demand!
5 reasons why spark is in demand!5 reasons why spark is in demand!
5 reasons why spark is in demand!Edureka!
 
5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!Edureka!
 
Spark Will Replace Hadoop ! Know Why
Spark Will Replace Hadoop ! Know Why Spark Will Replace Hadoop ! Know Why
Spark Will Replace Hadoop ! Know Why Edureka!
 
Apache spark
Apache spark Apache spark
Apache spark Edureka!
 
Apache Spark beyond Hadoop MapReduce
Apache Spark beyond Hadoop MapReduceApache Spark beyond Hadoop MapReduce
Apache Spark beyond Hadoop MapReduceEdureka!
 
Spark for big data analytics
Spark for big data analyticsSpark for big data analytics
Spark for big data analyticsEdureka!
 
Big data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & ScalaBig data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & ScalaEdureka!
 
Fault Tolerance with Kafka
Fault Tolerance with KafkaFault Tolerance with Kafka
Fault Tolerance with KafkaEdureka!
 
Spark SQL | Apache Spark
Spark SQL | Apache SparkSpark SQL | Apache Spark
Spark SQL | Apache SparkEdureka!
 
Monkey runner & Monkey testing
Monkey runner & Monkey testingMonkey runner & Monkey testing
Monkey runner & Monkey testingSWAAM Tech
 
Introduction for skills seminar on Search and Data Mining, Master of European...
Introduction for skills seminar on Search and Data Mining, Master of European...Introduction for skills seminar on Search and Data Mining, Master of European...
Introduction for skills seminar on Search and Data Mining, Master of European...Gerben Zaagsma
 
Touch Screen Based Home Automation System
Touch Screen Based Home Automation SystemTouch Screen Based Home Automation System
Touch Screen Based Home Automation SystemEdgefxkits & Solutions
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big datakk1718
 
Performance of Spark vs MapReduce
Performance of Spark vs MapReducePerformance of Spark vs MapReduce
Performance of Spark vs MapReduceEdureka!
 
Spark For Faster Batch Processing
Spark For Faster Batch ProcessingSpark For Faster Batch Processing
Spark For Faster Batch ProcessingEdureka!
 

Destacado (20)

5 reasons why spark is in demand!
5 reasons why spark is in demand!5 reasons why spark is in demand!
5 reasons why spark is in demand!
 
5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!
 
Spark Will Replace Hadoop ! Know Why
Spark Will Replace Hadoop ! Know Why Spark Will Replace Hadoop ! Know Why
Spark Will Replace Hadoop ! Know Why
 
Apache spark
Apache spark Apache spark
Apache spark
 
Apache Spark beyond Hadoop MapReduce
Apache Spark beyond Hadoop MapReduceApache Spark beyond Hadoop MapReduce
Apache Spark beyond Hadoop MapReduce
 
Spark for big data analytics
Spark for big data analyticsSpark for big data analytics
Spark for big data analytics
 
Big data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & ScalaBig data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & Scala
 
Fault Tolerance with Kafka
Fault Tolerance with KafkaFault Tolerance with Kafka
Fault Tolerance with Kafka
 
Spark SQL | Apache Spark
Spark SQL | Apache SparkSpark SQL | Apache Spark
Spark SQL | Apache Spark
 
Monkey talk
Monkey talkMonkey talk
Monkey talk
 
Monkey runner & Monkey testing
Monkey runner & Monkey testingMonkey runner & Monkey testing
Monkey runner & Monkey testing
 
HMI
HMIHMI
HMI
 
Introduction for skills seminar on Search and Data Mining, Master of European...
Introduction for skills seminar on Search and Data Mining, Master of European...Introduction for skills seminar on Search and Data Mining, Master of European...
Introduction for skills seminar on Search and Data Mining, Master of European...
 
Human machine interface
Human machine interfaceHuman machine interface
Human machine interface
 
Touch Screen Based Home Automation System
Touch Screen Based Home Automation SystemTouch Screen Based Home Automation System
Touch Screen Based Home Automation System
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
Hadoop Technology
Hadoop TechnologyHadoop Technology
Hadoop Technology
 
Performance of Spark vs MapReduce
Performance of Spark vs MapReducePerformance of Spark vs MapReduce
Performance of Spark vs MapReduce
 
Spark For Faster Batch Processing
Spark For Faster Batch ProcessingSpark For Faster Batch Processing
Spark For Faster Batch Processing
 

Similar a Introduction to Big Data & Hadoop

Hadoop Webinar 28July15
Hadoop Webinar 28July15Hadoop Webinar 28July15
Hadoop Webinar 28July15Edureka!
 
Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?Edureka!
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceEdureka!
 
Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & HadoopEdureka!
 
Apache Storm
Apache StormApache Storm
Apache StormEdureka!
 
Hadoop : The Pile of Big Data
Hadoop : The Pile of Big DataHadoop : The Pile of Big Data
Hadoop : The Pile of Big DataEdureka!
 
How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...
How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...
How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...Precisely
 
Exploring the Wider World of Big Data
Exploring the Wider World of Big DataExploring the Wider World of Big Data
Exploring the Wider World of Big DataNetApp
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop DeveloperEdureka!
 
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | EdurekaHadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | EdurekaEdureka!
 
Talend For Big Data : Secret Key to Hadoop
Talend For Big Data  : Secret Key to HadoopTalend For Big Data  : Secret Key to Hadoop
Talend For Big Data : Secret Key to HadoopEdureka!
 
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Edureka!
 
5 Scenarios: When To Use & When Not to Use Hadoop
5 Scenarios: When To Use & When Not to Use Hadoop5 Scenarios: When To Use & When Not to Use Hadoop
5 Scenarios: When To Use & When Not to Use HadoopEdureka!
 
ETL using Big Data Talend
ETL using Big Data Talend  ETL using Big Data Talend
ETL using Big Data Talend Edureka!
 
Hadoop is Happening
Hadoop is HappeningHadoop is Happening
Hadoop is HappeningPrecisely
 
EMC Isilon Database Converged deck
EMC Isilon Database Converged deckEMC Isilon Database Converged deck
EMC Isilon Database Converged deckKeithETD_CTO
 
Ensuring compliance of patient data with big data
Ensuring compliance of patient data with big dataEnsuring compliance of patient data with big data
Ensuring compliance of patient data with big dataAyad Shammout
 

Similar a Introduction to Big Data & Hadoop (20)

Hadoop Webinar 28July15
Hadoop Webinar 28July15Hadoop Webinar 28July15
Hadoop Webinar 28July15
 
Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & Hadoop
 
Apache Storm
Apache StormApache Storm
Apache Storm
 
Hadoop : The Pile of Big Data
Hadoop : The Pile of Big DataHadoop : The Pile of Big Data
Hadoop : The Pile of Big Data
 
How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...
How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...
How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...
 
Big Data: hype or necessity?
Big Data: hype or necessity?Big Data: hype or necessity?
Big Data: hype or necessity?
 
TSE_Pres12.pptx
TSE_Pres12.pptxTSE_Pres12.pptx
TSE_Pres12.pptx
 
Exploring the Wider World of Big Data
Exploring the Wider World of Big DataExploring the Wider World of Big Data
Exploring the Wider World of Big Data
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop Developer
 
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | EdurekaHadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
 
Talend For Big Data : Secret Key to Hadoop
Talend For Big Data  : Secret Key to HadoopTalend For Big Data  : Secret Key to Hadoop
Talend For Big Data : Secret Key to Hadoop
 
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
 
5 Scenarios: When To Use & When Not to Use Hadoop
5 Scenarios: When To Use & When Not to Use Hadoop5 Scenarios: When To Use & When Not to Use Hadoop
5 Scenarios: When To Use & When Not to Use Hadoop
 
Big data analysis concepts and references
Big data analysis concepts and referencesBig data analysis concepts and references
Big data analysis concepts and references
 
ETL using Big Data Talend
ETL using Big Data Talend  ETL using Big Data Talend
ETL using Big Data Talend
 
Hadoop is Happening
Hadoop is HappeningHadoop is Happening
Hadoop is Happening
 
EMC Isilon Database Converged deck
EMC Isilon Database Converged deckEMC Isilon Database Converged deck
EMC Isilon Database Converged deck
 
Ensuring compliance of patient data with big data
Ensuring compliance of patient data with big dataEnsuring compliance of patient data with big data
Ensuring compliance of patient data with big data
 

Más de Edureka!

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaEdureka!
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaEdureka!
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Introduction to Big Data & Hadoop

  • 1. Introduction to Big Data and Hadoop
       www.edureka.co/big-data-and-hadoop
       CMC Contact: aparna.jaiswal@cmcltd.com
       Edureka Contact: corp@edureka.co
  • 2. Objectives
       At the end of this session, you will understand:
       » Big Data introduction
       » Use cases of Big Data in multiple industry verticals
       » Hadoop and its ecosystem
       » Hadoop architecture
       » Learning paths for developers, administrators, testing professionals and aspiring data scientists
  • 3. Unstructured Data Is Exploding
       Source: Twitter
  • 4. IBM’s Definition of Big Data: Big Data Characteristics
       http://www-01.ibm.com/software/data/bigdata/
  • 5. Annie’s Introduction
       Hello there! My name is Annie. I love quizzes and puzzles, and I am here to make you think and answer my questions.
  • 6. Annie’s Question
       Map each of the following to its corresponding data type:
       » XML files, e-mail body
       » Audio, video, images, archived documents
       » Data from enterprise systems (ERP, CRM, etc.)
  • 7. Annie’s Answer
       » XML files, e-mail body → Semi-structured data
       » Audio, video, images, archived documents → Unstructured data
       » Data from enterprise systems (ERP, CRM, etc.) → Structured data
  • 8. Further Reading
       » More on Big Data: http://www.edureka.in/blog/the-hype-behind-big-data/
       » Why Hadoop?: http://www.edureka.in/blog/why-hadoop/
       » Opportunities in Hadoop: http://www.edureka.in/blog/jobs-in-hadoop/
       » Big Data: http://en.wikipedia.org/wiki/Big_Data
       » IBM’s definition of Big Data characteristics: http://www-01.ibm.com/software/data/bigdata/
  • 9. Common Big Data Customer Scenarios
       Web and e-tailing
       » Recommendation engines
       » Ad targeting
       » Search quality
       » Abuse and click fraud detection
       Telecommunications
       » Customer churn prevention
       » Network performance optimization
       » Call Detail Record (CDR) analysis
       » Analysing the network to predict failure
       Source: http://wiki.apache.org/hadoop/PoweredBy
  • 10. Common Big Data Customer Scenarios (Contd.)
       Government
       » Fraud detection and cyber security
       » Welfare schemes
       » Justice
       Healthcare and Life Sciences
       » Health information exchange
       » Gene sequencing
       » Serialization
       » Healthcare service quality improvements
       » Drug safety
       Source: http://wiki.apache.org/hadoop/PoweredBy
  • 11. Common Big Data Customer Scenarios (Contd.)
       Banks and Financial Services
       » Modeling true risk
       » Threat analysis
       » Fraud detection
       » Trade surveillance
       » Credit scoring and analysis
       Retail
       » Point-of-sale transaction analysis
       » Customer churn analysis
       » Sentiment analysis
       Source: http://wiki.apache.org/hadoop/PoweredBy
  • 12. Why DFS?
       Task: read 1 TB of data.
       » 1 machine: 4 I/O channels, each channel 100 MB/s
       » 10 machines: 4 I/O channels per machine, each channel 100 MB/s
  • 13. Why DFS? (Contd.)
       Task: read 1 TB of data.
       » 1 machine: 4 I/O channels, each channel 100 MB/s. Time taken: 43 minutes
       » 10 machines: 4 I/O channels per machine, each channel 100 MB/s
  • 14. Why DFS? (Contd.)
       Task: read 1 TB of data.
       » 1 machine: 4 I/O channels, each channel 100 MB/s. Time taken: 43 minutes
       » 10 machines: 4 I/O channels per machine, each channel 100 MB/s. Time taken: 4.3 minutes
  • 15. (Image-only slide; no text was extracted.)
  • 16. Hadoop Cluster: A Typical Use Case
       Active NameNode
       » RAM: 64 GB; Hard disk: 1 TB
       » Processor: Xeon with 8 cores
       » Ethernet: 3 x 10 GB/s; OS: 64-bit CentOS
       » Power: redundant power supply
       StandBy NameNode
       » RAM: 64 GB; Hard disk: 1 TB
       » Processor: Xeon with 8 cores
       » Ethernet: 3 x 10 GB/s; OS: 64-bit CentOS
       » Power: redundant power supply
       Secondary NameNode
       » RAM: 32 GB; Hard disk: 1 TB
       » Processor: Xeon with 4 cores
       » Ethernet: 3 x 10 GB/s; OS: 64-bit CentOS
       » Power: redundant power supply
       DataNode (two shown, identical configuration)
       » RAM: 16 GB; Hard disk: 6 x 2 TB
       » Processor: Xeon with 2 cores
       » Ethernet: 3 x 10 GB/s; OS: 64-bit CentOS
  • 17. Hidden Treasure
       Case Study: Sears Holding Corporation
       » Insight into data can provide a business advantage.
       » Some key early indicators can mean fortunes to a business.
       » More data allows more precise analysis.
       * Sears was using traditional systems such as Oracle Exadata, Teradata and SAS to store and process customer activity and sales data.
       Source: http://www.informationweek.com/it-leadership/why-sears-is-going-all-in-on-hadoop/d/d-id/1107038?
  • 18. Limitations of Existing Data Analytics Architecture
       Data flow (diagram): Instrumentation → Collection → Storage-only grid (original raw data, mostly append) → ETL compute grid → RDBMS (aggregated data) → BI reports + interactive apps
       » Storage: 90% of the ~2 PB of data is archived.
       » Processing: a meagre 10% of the ~2 PB of data is available for BI.
       Limitations:
       1. Can’t explore the original high-fidelity raw data
       2. Moving data to compute doesn’t scale
       3. Premature data death
  • 19. Solution: A Combined Storage and Compute Layer
       Data flow (diagram): Instrumentation → Collection → Hadoop: storage + compute grid (mostly append) → RDBMS (aggregated data) → BI reports + interactive apps
       » Hadoop provides both storage and processing.
       » The entire ~2 PB of data is available for processing.
       » No data archiving.
       Benefits:
       1. Data exploration and advanced analytics
       2. Scalable throughput for ETL and aggregation
       3. Keep data alive forever
       * Sears moved to a 300-node Hadoop cluster to keep 100% of its data available for processing, rather than the meagre 10% available with its existing non-Hadoop solutions.
  • 20. Annie’s Question
       Hadoop is a framework that allows for the distributed processing of:
       » Small data sets
       » Large data sets
  • 21. Annie’s Answer
       Large data sets. Hadoop can also process small data sets, but to experience its true power you need data in the terabyte range. This is where an RDBMS takes hours and fails, whereas Hadoop does the same work in a couple of minutes.
  • 22. Hadoop Ecosystem (Hadoop 2.0)
       » HDFS (Hadoop Distributed File System)
       » YARN: cluster resource management
       » MapReduce framework
       » Other YARN frameworks (MPI, Graph)
       » HBase
       » Pig Latin: data analysis
       » Hive: DW system
       » Mahout: machine learning
       » Apache Oozie: workflow
       » Sqoop: structured data
       » Flume: unstructured or semi-structured data
  • 23. Hadoop Cluster: Facebook
       » We use Hadoop to store copies of internal log and dimension data sources and use it as a source for reporting/analytics and machine learning.
       » Currently we have 2 major clusters:
         » A 1100-machine cluster with 8800 cores and about 12 PB raw storage.
         » A 300-machine cluster with 2400 cores and about 3 PB raw storage.
         » Each (commodity) node has 8 cores and 12 TB of storage.
         » We are heavy users of both streaming as well as the Java APIs. We have built a higher-level data warehousing framework using these features called Hive (see http://Hadoop.apache.org/hive/). We have also developed a FUSE implementation over HDFS.
  • 24. YARN: Moving beyond MapReduce
       » BATCH (MapReduce)
       » INTERACTIVE (Tez)
       » ONLINE (HBase)
       » STREAMING (Storm, S4, …)
       » GRAPH (Giraph)
       » IN-MEMORY (Spark)
       » HPC MPI (OpenMPI)
       » OTHER (Search) (Weave..)
       Source: http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/YARN.html
  • 25. Hadoop Cluster Modes
       Hadoop can run in any of the following three modes:
       Standalone (or Local) Mode
       » No daemons; everything runs in a single JVM.
       » Suitable for running MapReduce programs during development.
       » Has no DFS.
       Pseudo-Distributed Mode
       » Hadoop daemons run on the local machine.
       Fully-Distributed Mode
       » Hadoop daemons run on a cluster of machines.
  • 26. Big Data Learning Path
       Developer/Testing
       » Java / Python / Ruby
       » Hadoop ecosystem
       » NoSQL DB
       » Spark
       Administration
       » Linux administration
       » Cluster management
       » Cluster performance
       » Virtualization
       Data Analyst
       » Statistics skills
       » Machine learning
       » Hadoop essentials
       » Expertise in R
       Courses: Big Data and Hadoop; MapReduce Design Patterns; Apache Spark & Scala; Apache Cassandra; Linux Administration; Hadoop Administration; Data Science; Business Analytics Using R; Advance Predictive Modelling in R; Talend for Big Data; Data Visualization Using Tableau
  • 27. Learning Path to Certification
       » Course
       » LIVE online class
       » Class recording in LMS
       » 24/7 post-class support
       » Module-wise quiz and assignment
       » Project work
       » Verifiable certificate
       1. Assistance from peers and support team
       2. Review for certification
  • 28. Further Reading
       » Apache Hadoop and HDFS: http://www.edureka.in/blog/introduction-to-apache-hadoop-hdfs/
       » Apache Hadoop HDFS Architecture: http://www.edureka.in/blog/apache-hadoop-hdfs-architecture/
  • 29. Assignment
       Referring to the documents in the LMS under Assignment, solve the following problem: how many such DataNodes would you need to read 100 TB of data in 5 minutes in your Hadoop cluster?
  • 30. Survey
       Your feedback is important to us, be it a compliment, a suggestion or a complaint. It helps us make the course better! Please spare a few minutes to take the survey after the webinar.
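
The read-time figures on the "Why DFS?" slides (12 to 14) can be reproduced with a few lines of arithmetic. The sketch below is an illustration, not part of the original deck. It assumes binary units (1 TB = 1024 x 1024 MB), which is what makes one machine come out at roughly 43 minutes, and it assumes reads are perfectly parallel across channels and machines. The function names are hypothetical. The second function applies the same reasoning to the slide 29 assignment, under the further assumption that each DataNode has the 4 x 100 MB/s I/O profile from slide 12; the LMS documents the assignment refers to may specify different hardware.

```python
import math

MB_PER_TB = 1024 * 1024  # assumption: binary units, to match the slide's ~43 minutes


def read_time_minutes(data_tb, machines, channels_per_machine=4, mb_per_s_per_channel=100):
    """Minutes needed to read data_tb terabytes with perfectly parallel I/O."""
    total_mb = data_tb * MB_PER_TB
    throughput_mb_per_s = machines * channels_per_machine * mb_per_s_per_channel
    return total_mb / throughput_mb_per_s / 60


def datanodes_needed(data_tb, minutes, channels_per_node=4, mb_per_s_per_channel=100):
    """Smallest number of nodes that can read data_tb terabytes within the time limit."""
    required_mb_per_s = data_tb * MB_PER_TB / (minutes * 60)
    per_node_mb_per_s = channels_per_node * mb_per_s_per_channel
    return math.ceil(required_mb_per_s / per_node_mb_per_s)


if __name__ == "__main__":
    print(f"1 machine:   {read_time_minutes(1, 1):.1f} minutes")
    print(f"10 machines: {read_time_minutes(1, 10):.1f} minutes")
    print(f"Nodes for 100 TB in 5 minutes: {datanodes_needed(100, 5)}")
```

Under these assumptions one machine needs about 43.7 minutes and ten machines about 4.4 minutes, which the slides show as 43 and 4.3. Reading 100 TB in 5 minutes would take 874 such nodes. Real clusters fall short of this ideal because of replication, network transfer and scheduling overhead, so the result is best read as a lower bound.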