Is Hadoop a necessity for Data Science

Edureka!

Slide 2 www.edureka.co/big-data-and-hadoop
AGENDA
Today we will take you through the following:
• What is Big Data & Hadoop?
• What is a Data Product?
• What is Data Science?
• Why Hadoop for Data Science?
• Is Hadoop a necessity for Data Science?

Slide 3
What is Big Data & Hadoop?

Slide 4
BIG DATA
Big data is a popular term used to describe the exponential growth and availability of data.
Big data can be structured, unstructured, or a combination of both.

Slide 5
BIG DATA
The 3 V’s (Volume, Variety, and Velocity) are the three defining properties, or dimensions, of Big Data.

Slide 6
HADOOP
Hadoop is a programming framework that supports the processing of large data sets in a distributed computing environment.
Hadoop was one of the first widely adopted open-source tools for handling Big Data and is still widely used.

Slide 7
A BRIEF HISTORY OF HADOOP

Slide 8
HADOOP: HDFS & MAPREDUCE
Built for large-scale storage and processing
• HDFS: Distributed file system
  Self-healing data store
• MAPREDUCE: Distributed computation framework that handles the complexities of distributed programming
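
The MapReduce model named on this slide can be sketched in a few lines of plain Python. This is only an in-memory illustration of the map, shuffle, and reduce steps, not Hadoop's actual API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions made for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would do."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map -> shuffle -> reduce over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical two-line input standing in for files stored in HDFS.
    sample = ["big data needs hadoop", "hadoop stores big data"]
    print(word_count(sample))
```

In a real cluster the mappers and reducers would run on many nodes and the framework would perform the shuffle over the network; here everything runs in one process so the data flow is easy to follow.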

Slide 9
KEY TO HADOOP’S POWER
• Computation co-located with data
  Data and computation systems co-designed and co-developed to work together
• Process data in parallel across thousands of “commodity” hardware nodes
  Self-healing; failure handled by software
• Designed for write-once, read-many access
  There are no random writes
  Optimized for minimum seek on hard drives

Slide 10
What is a Data product?
“A software system whose core functionality
depends on the application of statistical analysis
and machine learning to data.”

Slide 11
Example #1: People you may know

Slide 12
Example #2: Spell Correction
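
As a rough illustration of how a spell-correction data product can depend on data, the sketch below picks the most frequent known word within one edit of the input. It is a simplified, assumed approach (a single-edit, frequency-based corrector), not the method used by any particular product; the tiny `corpus` string and the names `edits1` and `correct` are hypothetical.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return every string that is one delete, swap, replace, or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Suggest the most frequent known word within one edit of `word`."""
    if word in counts:
        return word  # already a known word
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:
        return word  # nothing better to suggest
    return max(candidates, key=lambda w: counts[w])


# Hypothetical miniature corpus; a real system would count words in far more text.
corpus = "hadoop stores data hadoop processes data data science uses data"
COUNTS = Counter(corpus.split())

if __name__ == "__main__":
    for typo in ["hadop", "dat", "scince"]:
        print(typo, "->", correct(typo, COUNTS))
```

The word counts are the only "model" here, which is why a larger corpus tends to give better suggestions.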

Slide 13
What is Data Science?

Slide 14
DATA SCIENCE
#1: Extracting deep meaning from data
(data mining; finding “gems” in data)

Slide 15
Common Data Science tasks

Slide 16
DATA SCIENCE
#2: Building Data Products
(Delivering Gems on a regular basis)

Slide 17
Why HADOOP for DATA SCIENCE?
Reason #1:
Explore full datasets

Slide 18
#1: Exploration of Data sets

Slide 19
Why HADOOP for DATA SCIENCE?
Reason #2:
Mining of larger datasets

Slide 20
#2: Mining of larger data sets
More Data ---> Better Outcomes

Slide 21
Why HADOOP for DATA SCIENCE?
Reason #3:
Large-scale data preparation

Slide 22
#3: Large-Scale Data preparation
Data preparation is often estimated to account for around 80% of data science work

Slide 23
Why HADOOP for DATA SCIENCE?
Reason #4:
Accelerate data-driven innovation

Slide 24
Speed Barriers of traditional Data Architectures

Slide 25
“Schema on read” means faster time-to-innovation
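
A minimal sketch of the “schema on read” idea, assuming raw records are stored as JSON lines: the data is kept as written, and each analysis applies its own schema only when it reads. The sample records, the field names, and the helper `read_with_schema` are hypothetical and exist only for this illustration.

```python
import json

# Raw records stored as-is; no schema was enforced when they were written.
RAW_LINES = [
    '{"user": "a", "clicks": "3", "country": "IN"}',
    '{"user": "b", "clicks": "5"}',
    '{"user": "c", "clicks": "oops", "referrer": "mail"}',
]


def read_with_schema(lines, schema):
    """Apply `schema` (field name -> converter) while reading raw lines.

    Missing or unparseable fields become None instead of failing the load.
    """
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, convert in schema.items():
            try:
                row[field] = convert(record[field])
            except (KeyError, ValueError):
                row[field] = None
        yield row


# Two different questions read the same raw data through two different schemas.
clicks_view = list(read_with_schema(RAW_LINES, {"user": str, "clicks": int}))
geo_view = list(read_with_schema(RAW_LINES, {"user": str, "country": str}))

if __name__ == "__main__":
    print(clicks_view)
    print(geo_view)
```

Because nothing has to be remodeled or reloaded before a new question is asked, a new view is just a new schema passed at read time, which is the sense in which the slide links schema on read to faster time-to-innovation.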

Slide 26
Demo

Slide 27
Questions

Slide 28
SURVEY
Your feedback is vital for us, be it a compliment, a suggestion, or a complaint. It helps us make your experience better!
Please spare a few minutes to take the survey after the webinar.


