Introduction to Big Data Survival Guide!

Luan Cestari
February 28, 2014

RED HAT ENTERPRISE LINUX – FOUNDATION FOR THE OPEN HYBRID CLOUD
Please, let me ask...
● Who has already tried out a product or project related to Big Data?
● Who works with Big Data?

What we are going to see here
● Demystifying the term "Big Data", and beyond!
   ● What do people claim Big Data is?
   ● What is the relationship between Big Data and databases?
      ● Some facts about database history
      ● Why are there so many databases available?
● How to glue all this stuff together?
   ● Some well-known Hadoop ecosystem tools that cover a very wide range of Big Data issues

Why Big Data is important
● Many companies are already dealing with Big Data using open source tools
● There is demand for people to work with those tools as developers and analysts
● You can also work on integrating those systems, on improving an existing tool, or on building the next Big Data tool

Why Big Data is important
● When a company uses Big Data tools, its environment can grow very fast and become complex:
   ● Many different clusters (due to multi-tenancy, geographic distribution, or different versions)
   ● Different technologies for closely related purposes (also due to different team skills or use cases)
   ● Many, many software integrations, layers to separate the different concerns, and refactoring due to the fast pace

Cool... but what is Big Data after all?
● Just tons of information isn't enough; it also needs to have:
   ● Velocity
   ● Value
   ● Variety
   ● And Volume

More about Volume: how big can it be?
● What is the size of Facebook's daily batch job? 100 GB? 10,000 GB? 100,000 GB?
● Answer: 104,857,600 gigabytes of user logs
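
The answer is easier to picture in larger units. A minimal conversion sketch, assuming binary prefixes (1 PB = 1024 TB = 1024 × 1024 GB), shows that the figure works out to exactly 100 PB:

```python
# Convert the slide's figure from gigabytes to petabytes.
# Assumption: binary prefixes, i.e. 1 TB = 1024 GB and 1 PB = 1024 TB.
GB_PER_TB = 1024
TB_PER_PB = 1024


def gb_to_pb(gigabytes: int) -> float:
    """Return the size in (binary) petabytes."""
    return gigabytes / (GB_PER_TB * TB_PER_PB)


daily_logs_gb = 104_857_600  # figure quoted on the slide
print(gb_to_pb(daily_logs_gb))  # 100.0
```

With decimal prefixes (1 PB = 1,000,000 GB) the same figure is roughly 104.9 PB.
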

More about Variety: where does the data come from?
● Customer-generated content
● M2M (machine-to-machine)
● Sensors
● B2B
● B2C
● Social networks
● And other devices: mobile phones, set-top boxes, security cameras

More about Value
● Value is about processing the data within a reasonable period of time so that you can forecast something. Because of that, you will need data scientists who can:
   ● Analyze (find correlations using statistics, signal processing, machine learning, personas, etc.) using different kinds of tools (SQL, search engines, stream processing)

More about Value
● Value is about processing the data within a reasonable period of time so that you can forecast something. Because of that, you will need data scientists who can:
   ● Find correlations using statistical or predictive analytics, signal processing, machine learning, natural language processing, BI, visualization, etc., using different kinds of tools (SQL, search engines, stream processing)
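
As a minimal illustration of "finding correlations", the sketch below computes the Pearson correlation coefficient between two series in plain Python. The data is hypothetical. A real analysis would run over far more data, typically with a statistics library or a distributed engine.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series.

    Returns a value in [-1, 1]; raises ZeroDivisionError if either
    series is constant (zero variance).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: daily ad impressions vs. daily sales.
impressions = [100, 200, 300, 400, 500]
sales = [12, 24, 33, 48, 61]
print(round(pearson(impressions, sales), 3))
```

A value close to 1 suggests a strong linear relationship, but correlation alone does not establish causation.
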

More about Value
● So the value lies in the insights generated, which may help you build a better product, make better decisions, or gain a competitive advantage over other companies
● Open source also helps deliver this value cost-effectively, instead of buying tons of expensive tools

... and the Velocity
● This is a very interesting point, because different analyses may require different time frames:
   ● A traffic system may need a streaming system to analyze and predict current traffic and suggest better routes across the city
   ● The same traffic system may need to process several weeks of data to make a good prediction of the average traffic on a road, so that could be an offline batch job
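
The traffic example can be sketched in a few lines. The sketch assumes a hypothetical feed of (timestamp, speed) readings for one road. The streaming side keeps only a short sliding window to estimate current conditions, and the batch side computes an average over the full stored history offline. The class, function names, and window size are illustrative and do not come from any specific product.

```python
from collections import deque
from typing import Deque, Iterable, Tuple

Reading = Tuple[float, float]  # (timestamp in seconds, speed in km/h)


class SlidingWindowAverage:
    """Streaming view: average speed over readings newer than `window_s` seconds."""

    def __init__(self, window_s: float) -> None:
        if window_s <= 0:
            raise ValueError("window_s must be positive")
        self.window_s = window_s
        self.readings: Deque[Reading] = deque()
        self.total = 0.0

    def add(self, ts: float, speed: float) -> float:
        """Ingest one reading and return the current windowed average."""
        self.readings.append((ts, speed))
        self.total += speed
        # Evict readings that fell out of the window (older than ts - window_s).
        while self.readings and self.readings[0][0] <= ts - self.window_s:
            _, old_speed = self.readings.popleft()
            self.total -= old_speed
        return self.total / len(self.readings)


def batch_average(history: Iterable[Reading]) -> float:
    """Batch view: average speed over the whole stored history."""
    speeds = [speed for _, speed in history]
    return sum(speeds) / len(speeds)


# Hypothetical readings: traffic slows down over three minutes.
history = [(0, 60.0), (60, 50.0), (120, 20.0), (180, 10.0)]
window = SlidingWindowAverage(window_s=120)
for ts, speed in history:
    current = window.add(ts, speed)
print("current:", current, "historical:", batch_average(history))
```

The streaming view reacts to the recent slowdown (15 km/h), while the historical average (35 km/h) changes slowly. This is why the two analyses may need different systems.
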

... and the Velocity
● The main point is that there is no silver bullet for this: different storage systems may be required for the different services a system aims to provide

SQL History
● Hierarchical databases in the 1960s
● Then relational databases in the 1980s, which until a couple of years ago were the only solution used in most enterprises
● Big companies used to buy expensive, specialized data warehouse (DW) systems to analyze their data

... and now

... and now

Again, the reason for that
● For example, web analytics at Facebook:
   ● 240+ billion photos
   ● 1+ trillion connections
   ● 1+ billion users
   ● 22% of references on the Internet
● Harvard Business Review:
   ● A change from a DW to a Big Data system made a 96-hour job run in just 4 hours
   ● In 2012, 2.5 exabytes of data were created every day

We need to avoid the Golden Hammer/Silver Bullet anti-pattern

Hadoop ecosystem saves the day
● Open source projects that help you deal with Big Data
● No need for vertical scaling (big machines): you can use a cluster of commodity machines and achieve even better results
   ● Parallel processing
   ● Fault-tolerant jobs
   ● Redundant and distributed data (to survive disk failures and to avoid moving data around)
● A less complex programming model
● It has low-level native libraries for high performance
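
To illustrate the "less complex programming model", here is a single-process sketch of the MapReduce idea behind Hadoop. The developer writes only a map function and a reduce function, and the framework handles the grouping step (the shuffle). This is plain Python, not the Hadoop API. A real job would run many map and reduce tasks in parallel across the cluster, close to where the data is stored.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in an input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: sum all partial counts for one word."""
    return word, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle (group by key), and reduce in a single process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())


print(run_job(["big data", "big cluster", "data data"]))
# {'big': 2, 'data': 3, 'cluster': 1}
```

Because `map_fn` handles one line at a time and `reduce_fn` handles one key at a time, both can be distributed across machines without changing their code. That separation is what keeps the model simple.
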

Hadoop ecosystem saves the day
● But the Hadoop file system (HDFS) doesn't handle low-latency requests and small files well =(
● Well, there is no silver bullet; we need more tools
   ● This is why Hadoop is not alone: there are many different projects that integrate with it
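
Part of the small-files problem comes from the NameNode, which keeps metadata for every file and block in memory. A commonly cited rule of thumb (an approximation, not an exact figure) is about 150 bytes of NameNode heap per file, directory, or block object. Under that assumption, and assuming a 128 MB block size, the sketch below compares the heap needed to store the same 1 TB as many small files and as a few large ones:

```python
import math

BYTES_PER_OBJECT = 150        # rule-of-thumb estimate, not an exact value
BLOCK_SIZE = 128 * 1024 ** 2  # assumed HDFS block size (128 MB)


def namenode_bytes(num_files: int, file_size: int) -> int:
    """Rough NameNode heap for num_files files of file_size bytes each."""
    blocks_per_file = math.ceil(file_size / BLOCK_SIZE)
    objects = num_files * (1 + blocks_per_file)  # one file object plus its blocks
    return objects * BYTES_PER_OBJECT


one_tb = 1024 ** 4
small = namenode_bytes(num_files=one_tb // 1024 ** 2, file_size=1024 ** 2)  # 1 MB files
large = namenode_bytes(num_files=one_tb // 1024 ** 3, file_size=1024 ** 3)  # 1 GB files
print(small, large)
```

Under these assumptions, a million 1 MB files need around 300 MB of NameNode heap, while the same data in 1 GB files needs under 2 MB. This gap is one reason HDFS prefers a small number of large files.
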

Hadoop ecosystem saves the day
● But the Hadoop file system (HDFS) doesn't handle low-latency requests and small files well =(
● Well, there is no silver bullet; we need more tools
   ● This is why Hadoop is not alone: there are many different projects that integrate with it
   ● Several big companies offer Hadoop and other projects as a packaged product and help the community. I will talk a little more about the Hortonworks and Cloudera project sets, since they are very well known, and about how they integrate. Find more at http://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support

Hadoop ecosystem saves the day
● Cloudera: CDH

Apache Sqoop is a tool designed for efficiently
transferring bulk data between Apache Hadoop and
structured datastores such as relational databases.
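
Sqoop parallelizes an import by choosing a splitting column (by default the table's primary key), reading its minimum and maximum values, and giving each map task an evenly sized slice of that range. The sketch below shows that idea with SQLite. It is a conceptual model only, not Sqoop's implementation, and the `orders` table and `id` column are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def split_ranges(conn: sqlite3.Connection, table: str, column: str,
                 num_mappers: int) -> List[Tuple[int, int]]:
    """Return inclusive (low, high) ranges of an integer column, one per mapper.

    `table` and `column` are trusted identifiers here; never interpolate
    user input into SQL like this.
    """
    low, high = conn.execute(
        f"SELECT MIN({column}), MAX({column}) FROM {table}").fetchone()
    if low is None:  # empty table: nothing to import
        return []
    span = high - low + 1
    size = -(-span // num_mappers)  # ceiling division
    ranges = []
    start = low
    while start <= high:
        end = min(start + size - 1, high)
        ranges.append((start, end))
        start = end + 1
    return ranges


# Hypothetical table with ids 1..10.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(1, 11)])
for low, high in split_ranges(conn, "orders", "id", num_mappers=3):
    count = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE id BETWEEN ? AND ?",
        (low, high)).fetchone()[0]
    print(low, high, count)  # each range would be read by one map task
```

Even-sized ranges only balance the work when key values are spread evenly. With heavily skewed keys some slices end up much larger than others, which is why Sqoop lets you choose a different splitting column.
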

Hadoop ecosystem saves the day
● Cloudera:
   ● How to create this whole stack with minimum effort: Cloudera Manager

Hadoop ecosystem saves the day
● Hortonworks: HDP

Oozie is a workflow scheduler system to manage
Apache Hadoop jobs.
Oozie Workflow jobs are Directed Acyclical Graphs
(DAGs) of actions.
Oozie Coordinator jobs are recurrent Oozie Workflow
jobs triggered by time (frequency) and data
availability
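
Because a workflow is a DAG of actions, a scheduler may start an action only after all of its dependencies have finished. The sketch below shows that ordering with Python's standard `graphlib` module (Python 3.9+). The action names are hypothetical, and this is not Oozie's workflow definition format, which is XML-based.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical workflow: each action maps to the actions it depends on.
workflow: Dict[str, Set[str]] = {
    "import_orders": set(),
    "import_customers": set(),
    "join": {"import_orders", "import_customers"},
    "aggregate": {"join"},
    "export_report": {"aggregate"},
}


def run_order(dag: Dict[str, Set[str]]) -> List[List[str]]:
    """Group actions into stages; actions in the same stage can run in parallel."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)  # pretend every action in the stage succeeded
    return stages


for stage in run_order(workflow):
    print(stage)
```

The two imports have no dependencies, so they form the first stage and could run concurrently. The acyclic requirement is what guarantees that such an order exists.
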

Hadoop ecosystem saves the day
● Hortonworks:
   ● They use Ambari to manage the cluster, as Cloudera Manager does
   ● They also have Tez to speed up workloads

Hadoop ecosystem saves the day
● And more tools:
   ● You may use Apache Mesos or Hadoop 2 YARN to better manage and share resources among your services (for example, across tenants or in a cloud)
   ● Apache Bigtop, Fuse-DFS, Apache Crunch, Apache Whirr, Apache Hama, Apache Giraph, Open MPI, Cascading (and its extensions), Weave, and more

Apache Whirr is a set of libraries for running cloud
services.
The Apache Crunch Java library provides a
framework for writing, testing, and running
MapReduce pipelines. Its goal is to make pipelines
that are composed of many user-defined functions
simple to write, easy to test, and efficient to run.
Open MPI is an open source implementation of MPI, a standardized API typically used for parallel and/or distributed computing.
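
Sharing one cluster among tenants requires a fair way to divide several resources at once. Mesos's allocator is based on Dominant Resource Fairness (DRF), and YARN's Fair Scheduler can also use a DRF policy. The simplified sketch below repeatedly gives one more task to the tenant with the smallest dominant share (its largest fraction of any single resource), until no pending task fits. It illustrates the policy only and is not Mesos or YARN code. The tenants and numbers are hypothetical.

```python
from typing import Dict, List


def drf_allocate(capacity: List[float],
                 demands: Dict[str, List[float]]) -> Dict[str, int]:
    """Return how many tasks each tenant gets under a simplified DRF policy.

    capacity: total amount of each resource, e.g. [cpus, memory_gb].
    demands:  per-task resource vector for each tenant.
    """
    if any(all(d == 0 for d in vec) for vec in demands.values()):
        raise ValueError("every task must request some resource")
    used = [0.0] * len(capacity)
    consumed = {t: [0.0] * len(capacity) for t in demands}
    tasks = {t: 0 for t in demands}

    def dominant_share(tenant: str) -> float:
        # Largest fraction of any single resource held by this tenant.
        return max(c / cap for c, cap in zip(consumed[tenant], capacity))

    active = set(demands)
    while active:
        # Lowest dominant share goes first; ties broken by name for determinism.
        tenant = min(active, key=lambda t: (dominant_share(t), t))
        need = demands[tenant]
        if all(u + n <= cap for u, n, cap in zip(used, need, capacity)):
            for i, n in enumerate(need):
                used[i] += n
                consumed[tenant][i] += n
            tasks[tenant] += 1
        else:
            active.discard(tenant)  # usage only grows, so it will never fit again
    return tasks


# 9 CPUs and 18 GB; A's tasks are memory-heavy, B's are CPU-heavy.
print(drf_allocate([9, 18], {"A": [1, 4], "B": [3, 1]}))  # {'A': 3, 'B': 2}
```

Both tenants end with the same dominant share (A holds 12/18 of the memory, B holds 6/9 of the CPUs), even though they receive different numbers of tasks.
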

Hadoop ecosystem saves the day
● There are more tools for specific cases, such as low latency with the Spark ecosystem

Hadoop ecosystem saves the day
● But you can also use other tools for low latency, such as Twitter Storm, Yahoo S4, LinkedIn Samza (or Kafka), Amazon Kinesis, and Google MillWheel
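
Several of these systems are built around a partitioned, append-only log, which Kafka popularized. The toy model below shows the core idea. Messages with the same key go to the same partition, so their order is preserved, and each partition assigns sequential offsets that consumers use to track their position. The CRC32 hashing is an assumption for illustration and is not Kafka's actual partitioner. The class and keys are hypothetical.

```python
import zlib
from typing import List, Tuple

Message = Tuple[str, str]  # (key, value)


class PartitionedLog:
    """Toy append-only log split into partitions (Kafka-style abstraction)."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Message]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Same key -> same partition, so per-key ordering is preserved.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key: str, value: str) -> Tuple[int, int]:
        """Append a message and return its (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def read(self, partition: int, offset: int) -> List[Message]:
        """Return all messages in a partition from `offset` onward."""
        return self.partitions[partition][offset:]


log = PartitionedLog(num_partitions=3)
p1, o1 = log.append("sensor-7", "speed=60")
p2, o2 = log.append("sensor-7", "speed=55")
print(p1 == p2, o1, o2)
print(log.read(p1, 0))
```

Because consumers only remember an offset per partition, many independent readers (a stream processor, a batch loader) can consume the same data at their own pace.
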

The integration with other systems will be complex
● An overview:

A different approach: Lambda Architecture
● An idea from the Twitter team (e.g., Nathan Marz) about how to deal with Big Data systems
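
In the Lambda Architecture, a batch layer periodically recomputes views from the complete, immutable master dataset. A speed layer maintains incremental views over data that arrived since the last batch run, and queries merge both. The sketch below models this with page-view counts. The function names and data are hypothetical, and no specific framework is implied.

```python
from collections import Counter
from typing import Iterable, List


def batch_view(master_dataset: Iterable[str]) -> Counter:
    """Batch layer: recompute counts from scratch over all historical events."""
    return Counter(master_dataset)


class SpeedLayer:
    """Speed layer: incremental counts for events not yet in the batch view."""

    def __init__(self) -> None:
        self.recent: Counter = Counter()

    def on_event(self, page: str) -> None:
        self.recent[page] += 1

    def reset(self) -> None:
        # Called once a new batch view has absorbed the recent events.
        self.recent.clear()


def query(page: str, batch: Counter, speed: SpeedLayer) -> int:
    """Serving layer: merge the batch and real-time views at query time."""
    return batch[page] + speed.recent[page]


master: List[str] = ["/home", "/home", "/about"]
batch = batch_view(master)
speed = SpeedLayer()
for event in ["/home", "/pricing"]:
    master.append(event)   # new events are appended to the master dataset...
    speed.on_event(event)  # ...and counted immediately by the speed layer
print(query("/home", batch, speed))     # 3
print(query("/pricing", batch, speed))  # 1

# Later, the next batch run covers all events, so the speed layer is reset.
batch = batch_view(master)
speed.reset()
print(query("/home", batch, speed))     # 3
```

The batch layer can stay simple and correct because it always recomputes from raw data. The speed layer only has to be approximately right for the short period until the next batch run replaces its results.
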

Questions?

 
Lightblue project
Lightblue project Lightblue project
Lightblue project Luan Cestari
 
Latinoware 2013 - OpenStack RDO - A walkthrough by the Open Source Cloud Comp...
Latinoware 2013 - OpenStack RDO - A walkthrough by the Open Source Cloud Comp...Latinoware 2013 - OpenStack RDO - A walkthrough by the Open Source Cloud Comp...
Latinoware 2013 - OpenStack RDO - A walkthrough by the Open Source Cloud Comp...Luan Cestari
 

Más de Luan Cestari (8)

Tunning da jvm dos comandos às configurações
Tunning da jvm  dos comandos às configuraçõesTunning da jvm  dos comandos às configurações
Tunning da jvm dos comandos às configurações
 
Getting Started with SOA using SwitchYard
Getting Started with SOA using SwitchYardGetting Started with SOA using SwitchYard
Getting Started with SOA using SwitchYard
 
Tunning jvm em java 8
Tunning jvm em java 8Tunning jvm em java 8
Tunning jvm em java 8
 
Indo para as nuvens mais rápido e fácil com Docker
Indo para as nuvens mais rápido e fácil com DockerIndo para as nuvens mais rápido e fácil com Docker
Indo para as nuvens mais rápido e fácil com Docker
 
Lightblue project
Lightblue project Lightblue project
Lightblue project
 
Open stack
Open stackOpen stack
Open stack
 
Open stack
Open stackOpen stack
Open stack
 
Latinoware 2013 - OpenStack RDO - A walkthrough by the Open Source Cloud Comp...
Latinoware 2013 - OpenStack RDO - A walkthrough by the Open Source Cloud Comp...Latinoware 2013 - OpenStack RDO - A walkthrough by the Open Source Cloud Comp...
Latinoware 2013 - OpenStack RDO - A walkthrough by the Open Source Cloud Comp...
 

Último

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 

Big data

  • 1. Introduction to Big Data Survival Guide! Luan Cestari, February 28, 2014. RED HAT ENTERPRISE LINUX – FOUNDATION FOR THE OPEN HYBRID CLOUD
  • 2. Please, let me ask... Who has already tested a product or project related to Big Data? Who works with Big Data?
  • 3. What we are going to see here: demystifying the term "Big Data" and beyond! What people claim Big Data to be; the relationship between Big Data and databases (some facts about database history, and why there are so many databases available); how to glue all this stuff together (some well-known Hadoop ecosystem tools that cover a very wide range of Big Data issues).
  • 4. Why Big Data is important: many companies are already dealing with Big Data using open source tools. There is demand for people to work with those tools as developers and analysts. You can also work on integrating those systems, improving an existing tool, or building the next Big Data tool.
  • 5. Why Big Data is important: when a company adopts Big Data tools, its platform can grow very fast and become complex: many different clusters (due to multi-tenancy, geographic location, or different versions); different technologies for closely related purposes (also due to different team skills or use cases); many software integrations, layers to separate the different concerns, and refactoring due to the fast pace.
  • 6. Cool... but what is Big Data after all? Tons of information alone isn't enough; the data also needs to have Variety, Velocity, Value, and Volume.
  • 7. More about Volume: how big can it be? What is the size of Facebook's daily batch job: 100 GB, 10,000 GB, or 100,000 GB? Answer: 104,857,600 gigabytes of user logs.
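The Volume figure is easier to grasp in larger units. The sketch below assumes binary units (1 PB = 1024² GB). Under that reading the quoted number is exactly 100 PB. In decimal units (1 PB = 10⁶ GB) it would be about 104.9 PB.

```python
# Unit-conversion sketch for the Volume slide.
# Assumption: binary units, i.e. 1 PB = 1024 TB = 1024 * 1024 GB.
GB_PER_PB = 1024 ** 2


def gb_to_pb(gigabytes: int) -> float:
    """Convert a size in (binary) gigabytes to (binary) petabytes."""
    return gigabytes / GB_PER_PB


daily_log_gb = 104_857_600
print(f"{daily_log_gb:,} GB = {gb_to_pb(daily_log_gb):g} PB")  # 104,857,600 GB = 100 PB
```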
  • 8. More about Variety: where does the data come from? Customer-generated content, M2M, sensors, B2B, B2C, social networks, and other devices such as mobile phones, set-top boxes, and security cameras.
  • 9. More about Value: value comes from processing the data in a reasonable period of time so you can forecast something. That is why you need data scientists, who can do analysis (finding correlations using statistics, signal processing, machine learning, personas, etc.) with different kinds of tools (SQL, search engines, stream processing).
  • 10. More about Value: value comes from processing the data in a reasonable period of time so you can forecast something. That is why you need data scientists, who can find correlations using statistical or predictive analytics, signal processing, machine learning, natural language processing, BI, visualization, etc., with different kinds of tools (SQL, search engines, stream processing).
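This is a minimal illustration of "finding correlations using statistics": a plain-Python Pearson correlation coefficient over two hypothetical series (the temperature and order figures are invented for the example). It is only a sketch. A coefficient close to 1 shows that the two series move together; it does not show that one causes the other.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations, all up to the same 1/n factor (it cancels).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


# Hypothetical data: daily temperature (C) vs. ice-cream orders.
temperature = [18, 21, 24, 27, 30]
orders = [120, 135, 160, 180, 205]
print(round(pearson(temperature, orders), 3))
```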
  • 11. More about Value: the value lies in the insights generated, which may help you build a better product, make better decisions, or gain a competitive advantage over your competitors. Open source also adds value by making this possible in a cost-effective way, instead of buying tons of expensive tools.
  • 12. ... and the Velocity: this is a very interesting point, because different analyses may require different time frames. A traffic system may need a streaming system to analyze and predict the current traffic and suggest better routes across the city. The same traffic system may need to process several weeks of data to get a good prediction of the average traffic on a road, so that could be an offline batch job.
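The traffic example contrasts two time frames over the same data. The sketch below puts them side by side: a streaming view that keeps only a small sliding window of recent readings, and a batch view computed over the whole stored history. The class name, window size, and speed readings are hypothetical. A real deployment would run the first part on a stream processor and the second as an offline job.

```python
from collections import deque


class SlidingWindowAverage:
    """Streaming view: average over only the most recent `size` readings."""

    def __init__(self, size: int):
        self.window = deque(maxlen=size)  # the oldest reading is evicted automatically

    def update(self, speed_kmh: float) -> float:
        self.window.append(speed_kmh)
        return sum(self.window) / len(self.window)


def batch_average(history) -> float:
    """Batch view: average over the entire stored history (e.g. several weeks)."""
    return sum(history) / len(history)


# Hypothetical speeds (km/h) on one road, in arrival order: traffic is building up.
readings = [50, 48, 20, 15, 12]
live = SlidingWindowAverage(size=3)
for speed in readings:
    current = live.update(speed)

print(f"live average: {current:.1f} km/h, historical average: {batch_average(readings):.1f} km/h")
```

The live average reacts to the slowdown right away. The historical average changes slowly and suits long-term prediction. That is why the two needs often end up on different systems.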
  • 13. ... and the Velocity: the main point is that there isn't a silver bullet for this; different storage systems may be required for the different services a platform aims to provide.
  • 14. SQL History: hierarchical databases in the 1960s; then relational databases in the 1980s, which until a couple of years ago were the only solution used in most enterprises. Big companies used to buy expensive, specialized data warehouse (DW) systems to analyze their data.
  • 15. ... and now
  • 16. ... and now
  • 17. Again, the reason for that: for example, web analysis at Facebook, with over 1 billion users, over 240 billion photos, over 1 trillion connections, and 22% of references on the Internet. According to the Harvard Business Review, a move from a DW to a Big Data system made a 96-hour job run in just 4 hours, and in 2012 about 2.5 exabytes of data were created each day.
  • 18. We need to avoid the Golden Hammer/Silver Bullet anti-pattern.
  • 19. Hadoop ecosystem saves the day: open source projects that help you deal with Big Data. You don't need vertical scaling (big machines); you can use a cluster of commodity machines and achieve even better results, thanks to parallel processing, fault-tolerant jobs, and redundant, distributed data (to survive disk failures and to avoid moving data around). It has a less complex programming model, and it has low-level native libraries for high performance.
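The "less complex programming model" is MapReduce: you write a map function and a reduce function, and the framework handles distribution, shuffling, and retries. The sketch below simulates the three phases in a single Python process for the classic word count. It is not Hadoop's API, which is Java (`Mapper`/`Reducer` classes). It only shows the shape of the model that Hadoop parallelizes across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all values emitted for the same key (done by the framework)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    # Keys are sorted before reducing, as Hadoop sorts keys for each reducer.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


print(word_count(["big data is big", "data is everywhere"]))
# {'big': 2, 'data': 2, 'everywhere': 1, 'is': 2}
```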
  • 20. Hadoop ecosystem saves the day: but the Hadoop file system (HDFS) doesn't handle low-latency requests and small files well =( Well, there isn't a silver bullet; we need more tools, which is why Hadoop is not alone: many different projects integrate with it.
  • 21. Hadoop ecosystem saves the day: but the Hadoop file system (HDFS) doesn't handle low-latency requests and small files well =( Well, there isn't a silver bullet; we need more tools, which is why Hadoop is not alone: many different projects integrate with it. Several big companies offer Hadoop and related projects as a packaged product and contribute to the community. I will talk a little more about the Hortonworks and Cloudera project sets, since they are very well known, and about how they integrate. Find more at http://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support
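Small files hurt HDFS because the NameNode keeps every file and block object in memory. The arithmetic below uses the commonly cited rule of thumb of roughly 150 bytes of NameNode heap per object, which is an approximation and not an exact figure. It also assumes the Hadoop 2.x default block size of 128 MB (64 MB in Hadoop 1.x). The two layouts of the same 1 TB of data are hypothetical.

```python
BYTES_PER_NAMESPACE_OBJECT = 150   # rule of thumb per file/block object, not exact
BLOCK_SIZE = 128 * 1024 ** 2       # HDFS default block size in Hadoop 2.x


def namenode_objects(num_files: int, file_size: int) -> int:
    """File objects plus block objects for num_files non-empty files of file_size bytes."""
    blocks_per_file = max(1, -(-file_size // BLOCK_SIZE))  # ceiling division
    return num_files * (1 + blocks_per_file)


def namenode_heap_bytes(num_files: int, file_size: int) -> int:
    return namenode_objects(num_files, file_size) * BYTES_PER_NAMESPACE_OBJECT


one_tb = 1024 ** 4
small_file = 100 * 1024  # 100 KB

# The same 1 TB stored as one large file vs. as ~10.7 million 100 KB files.
big = namenode_heap_bytes(num_files=1, file_size=one_tb)
small = namenode_heap_bytes(num_files=one_tb // small_file, file_size=small_file)
print(f"one large file: {big / 1024 ** 2:.1f} MB of heap; small files: {small / 1024 ** 3:.1f} GB of heap")
```

Under these assumptions, the small-file layout needs thousands of times more NameNode memory for the same bytes. This is why small files are usually packed into larger container files before they are loaded into HDFS.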
  • 22. Hadoop ecosystem saves the day: Cloudera: CDH
  • 23. Hadoop ecosystem saves the day: Cloudera: how to create this whole stack with minimal effort: Cloudera Manager
  • 24. Hadoop ecosystem saves the day: Hortonworks: HDP
  • 25. Hadoop ecosystem saves the day: Hortonworks uses Ambari to manage the cluster, as Cloudera Manager does. They also have Tez to speed up workloads.
  • 26. Hadoop ecosystem saves the day: and more tools. You may use Apache Mesos or Hadoop 2 YARN to better manage and share your services (for example, across tenants or in a cloud). There are also Apache Bigtop, Fuse-DFS, Apache Crunch, Apache Whirr, Apache Hama, Apache Giraph, Open MPI, Cascading (and its extensions), Weave, and more.
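Sharing one cluster among tenants comes down to a fairness policy. YARN's Fair Scheduler and Mesos' allocator (Dominant Resource Fairness) both build on max-min fairness. The sketch below is a simplified single-resource version of that idea, not either project's implementation. The tenant names and memory figures are hypothetical.

```python
from __future__ import annotations


def max_min_fair_share(capacity: float, demands: dict[str, float]) -> dict[str, float]:
    """Split `capacity` among tenants with max-min fairness (water-filling):
    repeatedly give every unsatisfied tenant an equal share of what is left,
    capping each tenant at its remaining demand."""
    allocation = {tenant: 0.0 for tenant in demands}
    remaining = dict(demands)           # demand still unmet, per tenant
    free = float(capacity)
    while remaining and free > 1e-12:
        share = free / len(remaining)   # equal split of the leftover capacity
        for tenant, need in list(remaining.items()):
            grant = min(share, need)
            allocation[tenant] += grant
            free -= grant
            if need - grant <= 1e-12:
                del remaining[tenant]   # tenant fully satisfied
            else:
                remaining[tenant] = need - grant
    return allocation


# Hypothetical cluster with 100 GB of memory shared by three tenants.
print(max_min_fair_share(100, {"etl": 10, "adhoc": 40, "ml": 80}))
```

Here `etl` and `adhoc` get their full demand (10 and 40). The `ml` tenant gets the remaining 50, and no tenant could get more without taking capacity from one that already has less.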
  • 27. Hadoop ecosystem saves the day: there are more tools for specific cases, such as low latency with the Spark ecosystem.
  • 28. Hadoop ecosystem saves the day: but you can also use other tools for low latency, such as Twitter Storm, Yahoo S4, LinkedIn Samza (or Kafka), Amazon Kinesis, and Google MillWheel.
  • 29. Integration with other systems will be complex. An overview:
  • 30. A different approach: the Lambda Architecture, an idea from people on the Twitter team (such as Nathan Marz) about how to build Big Data systems.
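The Lambda Architecture has three layers. A batch layer recomputes views from an immutable master dataset. A speed layer maintains a small incremental view of the events the batch run has not seen yet. A serving layer answers queries by merging the two views. The sketch below shows that flow with page-view counts. The event names and function names are hypothetical, and in practice each layer is a separate system (for example Hadoop for batch and Storm for speed).

```python
from collections import Counter


def batch_layer(master_dataset):
    """Batch layer: recompute the view from scratch over the full, immutable history."""
    return Counter(master_dataset)


def speed_layer(realtime_view, event):
    """Speed layer: update a small view incrementally as each new event arrives."""
    realtime_view[event] += 1


def query(page, batch_view, realtime_view):
    """Serving layer: answer by merging the batch view with the real-time view."""
    return batch_view[page] + realtime_view[page]


# Hypothetical page-view events already processed by the last batch run.
master_dataset = ["home", "about", "home", "pricing"]
batch_view = batch_layer(master_dataset)

# Events that arrived after that batch run.
realtime_view = Counter()
recent_events = ["home", "pricing", "pricing"]
for event in recent_events:
    speed_layer(realtime_view, event)

print(query("home", batch_view, realtime_view))     # 3
print(query("pricing", batch_view, realtime_view))  # 3

# Once the next batch run absorbs the recent events, the real-time view is discarded.
master_dataset += recent_events
batch_view = batch_layer(master_dataset)
realtime_view = Counter()
print(query("pricing", batch_view, realtime_view))  # 3
```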
  • 31. Questions? 31 RED HAT ENTERPRISE LINUX – FOUNDATION FOR THE OPEN HYBRID CLOUD
  • Notes for slide 22 (Cloudera CDH): Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
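Sqoop imports a table in parallel by splitting it on a column (`--split-by`, which defaults to the primary key) into one range per mapper (`--num-mappers`/`-m`, 4 by default). Each mapper then reads only its own range. The sketch below mimics that idea. It does not reproduce Sqoop's exact boundary arithmetic. The table `orders` and column `id` are hypothetical.

```python
from __future__ import annotations


def split_ranges(min_value: int, max_value: int, num_mappers: int) -> list[tuple[int, int]]:
    """Divide the inclusive key range [min_value, max_value] into at most
    num_mappers contiguous, non-overlapping ranges of near-equal size."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be >= 1")
    if max_value < min_value:
        return []
    span = max_value - min_value + 1
    size, extra = divmod(span, num_mappers)
    ranges = []
    lo = min_value
    for i in range(num_mappers):
        # The first `extra` ranges get one additional key each.
        hi = lo + size + (1 if i < extra else 0) - 1
        if hi >= lo:  # skip empty ranges when there are fewer keys than mappers
            ranges.append((lo, hi))
        lo = hi + 1
    return ranges


# Hypothetical table `orders` with integer key `id` from 1 to 1000, read by 4 mappers.
for lo, hi in split_ranges(1, 1000, 4):
    print(f"SELECT * FROM orders WHERE id >= {lo} AND id <= {hi}")
```

Even splits assume the keys are spread evenly. If the split column is skewed, some mappers get much more work than others, so the choice of split column matters.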
  • Notes for slide 24 (Hortonworks HDP): Oozie is a workflow scheduler system to manage Apache Hadoop jobs. Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions. Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.
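Because an Oozie workflow is a DAG of actions, a scheduler can run an action as soon as all of its dependencies have finished, and it can run independent actions in parallel. Oozie itself defines workflows in XML. This Python sketch (3.9+, using the standard `graphlib` module) only illustrates the scheduling order for a hypothetical four-action workflow.

```python
from graphlib import CycleError, TopologicalSorter


def execution_stages(workflow):
    """Group actions into stages; every action in a stage has all its dependencies done.

    `workflow` maps each action to the set of actions it depends on."""
    sorter = TopologicalSorter(workflow)
    sorter.prepare()  # raises CycleError if the workflow is not a DAG
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # actions that can run in parallel now
        stages.append(ready)
        sorter.done(*ready)
    return stages


# Hypothetical workflow: two imports, a join that needs both, then a report.
workflow = {
    "import-orders": set(),
    "import-customers": set(),
    "join": {"import-orders", "import-customers"},
    "report": {"join"},
}

for number, stage in enumerate(execution_stages(workflow), start=1):
    print(f"stage {number}: {stage}")
```

Acyclicity is what makes this work. A cycle (A waits for B, B waits for A) would have no valid order, and `prepare()` rejects it.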
  • Notes for slide 26 (more tools): Apache Whirr is a set of libraries for running cloud services. The Apache Crunch Java library provides a framework for writing, testing, and running MapReduce pipelines; its goal is to make pipelines that are composed of many user-defined functions simple to write, easy to test, and efficient to run. Open MPI is an open source implementation of MPI, a standardized API typically used for parallel and/or distributed computing.