Forrester predicts that CIOs who are late to the Hadoop game will finally make the platform a priority in 2015. Hadoop has become a must-know technology and has opened up better career, salary and job opportunities for many professionals.
2. www.edureka.co/big-data-and-hadoop Slide 2
Objectives
At the end of this module, you will be able to…
Understand When not to use Hadoop
» Real Time Analytics
» Not a Replacement
» Dataset Size
» Complexity
» Security
Understand When to use Hadoop
» Huge Unstructured Datasets
» Response Time is Not an Issue
» Future Planning
» Multiple Frameworks for Big Data
» Lifetime Data Availability
4. Slide 4
When Not To Use Hadoop
5. Slide 5
If you need Real Time Analytics, where results are expected quickly, Hadoop should not be used directly
Hadoop works on batch processing, so response time is high
[Figure: timeline from Day 1 to Day n. Input data is collected first and then processed using MR (MapReduce), so there is a time lag between input and results.]
Real Time Analytics
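The difference between batch and incremental processing can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Hadoop code: the event names and the `batch_count` and `RunningCount` helpers are hypothetical.

```python
from collections import Counter


def batch_count(events):
    """Batch style: the answer exists only after the whole input is read."""
    return Counter(events)


class RunningCount:
    """Incremental style: the answer is updated as each event arrives."""

    def __init__(self):
        self.counts = Counter()

    def update(self, event):
        self.counts[event] += 1
        return self.counts[event]  # current total, available immediately


# Hypothetical event stream used only for illustration.
events = ["login", "click", "click", "logout"]

batch_result = batch_count(events)  # one answer, at the end

running = RunningCount()
for event in events:
    running.update(event)  # an answer after every event
```

Both approaches reach the same totals; the batch version simply cannot report anything until all of the input has been collected, which is the time lag described above.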
6. Slide 6
Real Time Analytics – Accepted Way
[Figure: streaming data and storing]
7. Slide 7
[Figure: response times of 14 sec and 0.6 sec]
Real Time Analytics – Accepted Way
8. Slide 8
Hadoop is not a replacement for your existing data processing infrastructure
After processing data in Hadoop, you still need to send the output to relational database technologies for BI, decision support, reporting etc.
It’s not going to replace your database, but your database isn’t likely to replace Hadoop either
Different tools for different jobs
Not a Replacement for Existing Infrastructure
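As a rough sketch of handing job output to a relational database, the snippet below loads tab-separated key/value lines into SQLite. Assumptions: the output is in the tab-separated text form that Hadoop's default text output produces, SQLite stands in for whatever RDBMS is actually used, and the `word_counts` table and `load_job_output` helper are hypothetical names.

```python
import sqlite3


def load_job_output(lines, conn):
    """Insert tab-separated key/value lines into a table for reporting."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_counts "
        "(word TEXT PRIMARY KEY, total INTEGER)"
    )
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, value = line.split("\t", 1)
        rows.append((key, int(value)))
    conn.executemany("INSERT OR REPLACE INTO word_counts VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


# Hypothetical job output; a real job would write files such as part-00000.
sample_output = ["hadoop\t3\n", "spark\t1\n"]

conn = sqlite3.connect(":memory:")  # in-memory database for the example
loaded = load_job_output(sample_output, conn)

# A BI or reporting tool would now query the table with ordinary SQL.
top = conn.execute(
    "SELECT word, total FROM word_counts ORDER BY total DESC LIMIT 1"
).fetchone()
```

Once the results are in a table, existing reporting and decision-support tools can work with them unchanged, which is why Hadoop complements the database rather than replacing it.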
9. Slide 9
The Hadoop framework is not recommended for small structured datasets, as other tools on the market, such as MS Excel or an RDBMS, can do this work more easily and faster than Hadoop
For small-scale data analytics, Hadoop can be costlier than other tools
Merge all the small files into one
Multiple Smaller Datasets – Accepted Way
10. Slide 10
Multiple Smaller Datasets – Accepted Way
[Figure: same input and same output in both runs; the value 4225284 is shown for each]
Each file of x MB: Slow Execution – 10400 ms
All the above files merged into one file (9x MB): Fast Execution – 6140 ms
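Merging many small files into one before processing can be sketched as below. This is a minimal local-filesystem illustration; the `merge_files` helper, the file names and the choice of nine files are assumptions made for the example, and a real cluster would merge the files in or before loading them into HDFS.

```python
import shutil
import tempfile
from pathlib import Path


def merge_files(src_dir, merged_path, pattern="*.txt"):
    """Concatenate every file matching pattern in src_dir into merged_path."""
    sources = sorted(Path(src_dir).glob(pattern))
    with open(merged_path, "wb") as out:
        for src in sources:
            with open(src, "rb") as f:
                shutil.copyfileobj(f, out)  # stream the bytes, no full read
    return len(sources)


with tempfile.TemporaryDirectory() as tmp:
    small_dir = Path(tmp) / "small"
    small_dir.mkdir()

    # Nine hypothetical small input files.
    for i in range(9):
        (small_dir / f"part{i}.txt").write_text(f"record {i}\n")

    # Write the merged file outside small_dir so it is not picked up as input.
    merged = Path(tmp) / "merged.txt"
    merged_count = merge_files(small_dir, merged)
    merged_lines = merged.read_text().splitlines()
```

The merged file carries the same records as the small files, so a job run over it should produce the same output while having far fewer files to open and schedule.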
11. Slide 11
Unless you have a good understanding of the Hadoop framework, it is not advisable to use Hadoop in production
Learning Hadoop and its ecosystem tools, and deciding which technology suits your needs, is a further level of complexity
Novice Hadoopers
12. Slide 12
Many enterprises, especially within highly regulated industries dealing with sensitive data, aren’t able to move as quickly as they would like towards implementing Big Data projects and Hadoop
Example: health-care data used by insurance companies to calculate premiums
Where Security is the Primary Concern?
They don’t have to hesitate though, as many of the security and compliance challenges are being continuously worked on and can be overcome (for example, by using Apache Accumulo on top of Hadoop).
13. Slide 13
Where security is the primary concern – Accepted way
[Figure: Healthcare Data, Hadoop Analytic Integration]
14. Slide 14
When To Use Hadoop
15. Slide 15
You have different types of data: structured, semi-structured and unstructured
The data set is huge in size, i.e. several Terabytes or Petabytes
You are not in a hurry for answers
Data Size and Data Diversity
16. Slide 16
To implement Hadoop on your data, you should first understand the complexity of the data and the rate at which it is going to grow
This calls for cluster planning: you may begin by building a small or medium cluster in your organization, sized for the data available at present (in GBs or a few TBs), and scale up the cluster in future as your data grows
Future Planning
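A back-of-the-envelope sizing calculation makes this kind of planning concrete. The sketch below is only an estimate under stated assumptions: a replication factor of 3 (the HDFS default), while the 10 TB of usable disk per node, the 25% headroom, the 5 TB starting size and the 10% monthly growth are hypothetical figures chosen for illustration.

```python
import math


def projected_data_tb(current_tb, monthly_growth, months):
    """Data size after compound monthly growth."""
    return current_tb * (1 + monthly_growth) ** months


def nodes_needed(data_tb, replication=3, usable_tb_per_node=10.0, headroom=0.25):
    """Estimate the number of data nodes for a given logical data size.

    replication: copies HDFS keeps of each block (3 is the default).
    usable_tb_per_node: assumed usable disk per node (hypothetical).
    headroom: fraction of disk kept free for temporary data (hypothetical).
    """
    raw_tb = data_tb * replication
    required_tb = raw_tb / (1 - headroom)
    return math.ceil(required_tb / usable_tb_per_node)


# Hypothetical starting point: 5 TB today, growing 10% per month.
nodes_today = nodes_needed(5)
future_tb = projected_data_tb(5, 0.10, 24)
nodes_in_two_years = nodes_needed(future_tb)
```

Running the same calculation for a few growth scenarios gives a rough idea of when the small starting cluster will need to be extended.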
17. Slide 17
Hadoop can be integrated with multiple analytic tools to get the best out of it, such as machine learning, R, Python, Spark, MongoDB etc.
Multiple Frameworks for Big Data
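As one example of such integration, Hadoop Streaming lets a job use any program that reads lines from standard input and writes lines to standard output as its mapper or reducer, so the logic can be written in Python. The sketch below shows word-count logic in that style and simulates the sort step locally; the function names and sample text are hypothetical, and in a real job the mapper and reducer would be separate scripts reading `sys.stdin`.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; input must be sorted by key."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Simulate map -> sort -> reduce on one machine for testing."""
    return list(reducer(sorted(mapper(lines))))


# Hypothetical input used only to exercise the functions.
result = run_locally(["Hadoop and Spark", "hadoop and R"])
```

Testing the logic locally like this before submitting a job is a cheap way to catch mistakes, since the same functions behave identically on a small sample.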
18. Slide 18
When you want your data to stay live and available indefinitely, this can be achieved using Hadoop’s scalability
Lifetime Data Availability