3. What is Big Data
• Volume: large amounts of data at rest
• Velocity: milliseconds to seconds to respond
• Variety: data in many forms (structured, unstructured, multimedia, text, etc.)
• Veracity: data in doubt
4. • 30 billion pieces of content a month
• 1 petabyte of content every day
• 2 billion videos watched every day
• 3 billion people will be online
• Sharing 8 zettabytes of data
9. Background
• Underlying technology invented by Google
• Google BigTable & Google File System
• Doug Cutting created Nutch, and Hadoop was spun off at Yahoo
• Yahoo played a key role in developing Hadoop for enterprise applications
10. Hadoop
• Is a framework
• Built on commodity hardware
• Implements a computational paradigm called MapReduce
• Provides a distributed file system, HDFS, to store data
• Node failures are handled automatically
11. Data Becomes the Bottleneck
• Getting data to the processors is expensive
• Typical disk data transfer rate: 75 MB/sec
• Transferring 100 GB of data: approx. 22 minutes
• A new approach is needed
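The 22-minute figure follows directly from the slide's numbers, as a quick sanity check shows (decimal units assumed, i.e. 100 GB = 100,000 MB):

```python
# Verify the slide's transfer-time estimate:
# 100 GB read sequentially at 75 MB/sec.
size_mb = 100 * 1000          # 100 GB in MB (decimal units assumed)
rate_mb_per_sec = 75          # typical disk transfer rate from the slide
seconds = size_mb / rate_mb_per_sec
minutes = seconds / 60
print(f"{minutes:.1f} minutes")  # ≈ 22.2 minutes
```

This is also why Hadoop reads many disks in parallel: 100 machines each scanning 1 GB locally finish in seconds, not tens of minutes.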
12. Hadoop Solves
• Problems where you have a lot of data
• A mixture of complex and structured data
• Speeds up computation by distributing it
• Mantra: take the computation to the data, don't bring the data to the computation
14. Hadoop Architecture
• Master/slave philosophy
• Designed to run on a large number of machines
• Machines don't share memory or disk
• Rack them up and run Hadoop on each machine
15. Hadoop Architecture
• Data is divided and spread across servers
• Hadoop keeps track of where the data is
• Hadoop replicates data into multiple copies to avoid a single point of failure
• MapReduce is a programming model for processing large data sets in parallel
• Map the operation out to all servers
• Shuffle the results
• Reduce the results back into one result set
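The map → shuffle → reduce flow above can be sketched as a toy in-memory word count (plain Python for illustration, not the real Hadoop API, which is Java-based and distributed):

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (key, value) pair for every word in one input split.
    return [(word, 1) for word in document.split()]

def shuffle_phase(mapped_pairs):
    # Shuffle: group all emitted values by key, so each key's values
    # land together (in Hadoop, this moves data between nodes).
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values into one result.
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data is big", "data at rest"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(mapped))
print(counts)  # {'big': 2, 'data': 2, 'is': 1, 'at': 1, 'rest': 1}
```

In a real cluster the map calls run on the machines that already hold each input split, which is exactly the "take the computation to the data" mantra from slide 12.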
18. HDFS
• Distributed file system
• Highly fault tolerant
• An HDFS instance can span many servers
• Supports large datasets, from terabytes to petabytes
• Moving computation is cheaper than moving data
• Large block sizes (128 MB, for example)
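The large-block point can be made concrete with a bit of arithmetic (illustrative only, not the HDFS client; the 128 MB figure is the example size from the slide):

```python
import math

BLOCK_SIZE_MB = 128   # example HDFS block size from the slide

def block_count(file_size_mb):
    # A file is stored as ceil(size / block_size) blocks;
    # only the last block may be smaller than the block size.
    return math.ceil(file_size_mb / BLOCK_SIZE_MB)

print(block_count(1024))  # a 1 GB (1024 MB) file -> 8 full blocks
print(block_count(300))   # a 300 MB file -> 2 full blocks + one 44 MB block = 3
```

Large blocks keep the number of blocks per file small, so the metadata stays manageable and each map task gets a long sequential read instead of many small seeks.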