1. Big Data and Hadoop
Submitted To: Mr. B. Varghese
Submitted By: Chanchal Tripathi
Roll No.: 36
MCA-5
2. Index
• Big Data
• How Much Data?
• Types of Data
• Apache Hadoop
• Design Principles of Hadoop
• Hadoop vs. Other Systems
• Hadoop vs. Databases
• About Cloudera
• Other Elements of Big Data
3. Big Data Everywhere!
• Lots of data is being collected and warehoused:
  – Web data, e-commerce
  – Purchases at department/grocery stores
  – Bank/credit-card transactions
  – Social networks
4. How Much Data?
• Google processes 20 PB a day (2008)
• The Wayback Machine holds 3 PB, growing by 100 TB/month
• Facebook holds 2.5 PB of user data, growing by 15 TB/day
• eBay holds 6.5 PB of user data, growing by 50 TB/day
• CERN’s Large Hadron Collider (LHC) generates 15 PB a year
5. Types of Data
• Relational data (tables/transactions/legacy data)
• Text data (Web)
• Semi-structured data (XML)
• Graph data
  – Social networks, Semantic Web (RDF), …
6. Apache Hadoop
• An open-source software framework for the distributed processing of large datasets across large clusters of computers
• Hadoop is based on a simple programming model called MapReduce (see the sketch below)
• Hadoop is based on a simple data model: any data, in any format, will fit
• Hadoop is one of the tools designed to handle big data
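To make the MapReduce model concrete, here is a minimal word-count sketch in Java against the standard Hadoop MapReduce API (org.apache.hadoop.mapreduce); the class names WordCountMapper and WordCountReducer are illustrative, not part of Hadoop itself.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map step: emit (word, 1) for every word in an input line.
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reduce step: sum the counts emitted for each word.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        context.write(word, new IntWritable(sum));
    }
}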
7. Design Principles of Hadoop
• Automatic parallelization & distribution
  – Hidden from the end user
• Fault tolerance and automatic recovery
  – Nodes/tasks will fail and will recover automatically
• Clean and simple programming abstraction
  – Users only provide two functions, “map” and “reduce” (see the driver sketch below)
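To illustrate how little the user supplies beyond those two functions, here is a minimal job driver under the same word-count assumptions as above (WordCountDriver and the input/output argument order are illustrative); input splitting, scheduling, and failure recovery are left to the framework.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        // Wire up the user-supplied map and reduce classes; the framework
        // handles parallelization, distribution, and automatic recovery.
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}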
8. Hadoop vs. Other Systems
• Computing model
  – Distributed databases: notion of transactions; the transaction is the unit of work; ACID properties and concurrency control
  – Hadoop: notion of jobs; the job is the unit of work; no concurrency control
• Data model
  – Distributed databases: structured data with a known schema; read/write mode
  – Hadoop: any data will fit, in any format, (un)(semi)structured; read-only mode
• Cost model
  – Distributed databases: expensive servers
  – Hadoop: cheap commodity machines
• Fault tolerance
  – Distributed databases: failures are rare; recovery mechanisms
  – Hadoop: failures are common over thousands of machines; simple yet efficient fault tolerance
• Key characteristics
  – Distributed databases: efficiency, optimizations, fine-tuning
  – Hadoop: scalability, flexibility, fault tolerance
9. Why Is Hadoop Able to Compete?
• Hadoop
  – Flexibility in accepting all data formats (no schema)
  – Commodity, inexpensive hardware
  – Efficient and simple fault-tolerance mechanism
  – Scalability (petabytes of data, thousands of machines)
• Databases
  – Performance (tons of indexing, tuning, and data-organization techniques)
  – Features: provenance tracking, annotation management
10. About Cloudera
• Cloudera is the commercial Hadoop company
• Founded by leading experts on Hadoop from Facebook, Google, Oracle, and Yahoo
• Provides consulting and training services for Hadoop users
11. BIG DATA is not just HADOOP
• Manage & store huge volumes of any data: Hadoop File System, MapReduce
• Manage streaming data: Stream Computing
• Analyze unstructured data: Text Analytics Engine
• Structure and control data: Data Warehousing
• Integrate and govern all data sources: Integration, Data Quality, Security, Lifecycle Management, MDM
• Understand and navigate federated big data sources: Federated Discovery and Navigation
12. Conclusion
• Big data is one of the fastest-emerging areas of business-intelligence analysis, and Hadoop is a tool for handling big data very efficiently.
• Big data is making BIG changes in the business-analysis world.
13. References
• Tom White, Hadoop: The Definitive Guide
• “hadoop” on Google Scholar
• http://hadoop.apache.org/#Who+Uses+Hadoop%3F
• http://hadoop.apache.org/
• http://www.cloudera.com