What is Hadoop?
MK99 – Big Data 1
Big data & cross-platform analytics
MOOC lectures Pr. Clement Levallois
MK99 – Big Data 2
Focus on “Hadoop”
• Frequently mentioned in relation to big data
• Definitions are often vague and the talk around it inflated
• This short video will clarify it.
MK99 – Big Data 3
• Note on the terminology:
– “computers” are called “servers” when they are used only
for computing / processing / storing data
– They usually have no screen, mouse or keyboard because
none is needed.
– But they are basically computers!
MK99 – Big Data 4
“Hadoop”
• Created around 2005 by Doug Cutting and Mike Cafarella, with much of its early development
done at Yahoo!. Named after the toy elephant of Cutting’s son.
• It is open source and now developed under the Apache Software Foundation, a major
open source community. So you will sometimes see it called “Apache Hadoop”.
• In simple words:
– Hadoop is free, open source software.
– It connects several servers so that a single task can be split up and run in parallel on
them.
– So, with Hadoop and 5 servers, a data crunching task can finish up to about 5 times sooner
than on a single server (a bit less in practice, because coordinating the servers takes time).
– That’s it!
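
The idea of splitting one task across 5 servers can be sketched in a few lines of Python. This is only an illustration, not Hadoop itself: threads on one machine stand in for the servers, and the names `split_into_chunks`, `crunch` and `run_on_servers` are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous pieces of near-equal size."""
    size, remainder = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `remainder` chunks each take one extra item.
        end = start + size + (1 if i < remainder else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def crunch(chunk):
    """The work one 'server' does on its share of the data (here: a sum of squares)."""
    return sum(x * x for x in chunk)


def run_on_servers(items, n_servers=5):
    """Give each stand-in server one chunk, run them side by side, combine the results."""
    chunks = split_into_chunks(items, n_servers)
    with ThreadPoolExecutor(max_workers=n_servers) as pool:
        partial_results = list(pool.map(crunch, chunks))
    return sum(partial_results)


data = list(range(1, 101))
total = run_on_servers(data, n_servers=5)
print(total)
```

Each worker only sees one fifth of the data, and the final answer is the same as if a single machine had processed everything.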
MK99 – Big Data 5
Why are Hadoop, cloud computing and big data
often discussed together?
– Imagine that you are Walmart and want to compute something from your CRM data: say, which
clients are the most profitable for each store, based on their purchase history.
– You will need many servers to store the data, and many servers to do the computations.
– Instead of purchasing a farm of servers for this (expensive! time consuming!), you can pay
a cloud computing service (such as Amazon EC2) to rent servers just for this task,
– and install Hadoop on these servers to divide the task among them and run it in
parallel, speeding up the computation.
– You can then get the results in minutes or hours, instead of days.
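
To make the Walmart example concrete, here is a minimal sketch of the computation itself on a handful of records. The store names, client names and profit figures are invented for illustration, and the function `most_profitable_client_per_store` is hypothetical. On a real cluster the purchase history would be far too large for one machine, which is where the rented servers and Hadoop come in.

```python
from collections import defaultdict

# Hypothetical purchase records: (store, client, profit made on the purchase)
purchases = [
    ("store_A", "alice", 12.0),
    ("store_A", "bob", 30.0),
    ("store_A", "alice", 25.0),
    ("store_B", "carol", 8.0),
    ("store_B", "bob", 5.0),
    ("store_B", "carol", 4.0),
]


def most_profitable_client_per_store(records):
    """Return {store: (client, total_profit)} for the top client of each store."""
    # Step 1: add up the profit for every (store, client) pair.
    profit = defaultdict(float)
    for store, client, amount in records:
        profit[(store, client)] += amount

    # Step 2: keep, for each store, the client with the highest total.
    best = {}
    for (store, client), total in profit.items():
        if store not in best or total > best[store][1]:
            best[store] = (client, total)
    return best


result = most_profitable_client_per_store(purchases)
print(result)
```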
MK99 – Big Data 6
And map/reduce?
– “Map/reduce” (or “MapReduce”) is another expression often heard in relation to cloud
computing and Hadoop.
– It is a programming principle described by Google engineers in a paper published in 2004.
Google’s own software was not released, but the principle was reimplemented in open source
software such as Hadoop.
– It solves this problem: when data is spread over 500 different servers, how do I search
all of them? Checking the servers one by one (sequential search) would take a very long
time. MapReduce dispatches the search to all servers at once (the “map” step) and then
combines their answers (the “reduce” step), so it can be up to about 500 times quicker
than a sequential search.
– Any software can use this programming principle. MapReduce is at the heart of Hadoop,
which is one of the most popular pieces of software using it.
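
The search described above can be sketched as a toy map/reduce in Python. This is a simplified illustration under stated assumptions, not Hadoop’s API: three in-memory lists stand in for the servers, threads stand in for the parallel dispatch, and the names `map_search`, `reduce_matches` and `search_all` are made up. A real MapReduce system also groups intermediate results by key between the two steps, which is left out here.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical data spread over three "servers" (the slide's example uses 500)
servers = {
    "server_1": ["apple pie", "banana split"],
    "server_2": ["cherry tart", "apple crumble"],
    "server_3": ["lemon cake"],
}


def map_search(item):
    """Map step: one server scans only its own records for the query."""
    name, records, query = item
    return [(name, record) for record in records if query in record]


def reduce_matches(accumulated, matches):
    """Reduce step: merge one server's partial result into the combined list."""
    return accumulated + matches


def search_all(servers, query):
    """Dispatch the search to every server at once, then combine the answers."""
    work = [(name, records, query) for name, records in servers.items()]
    with ThreadPoolExecutor(max_workers=len(work)) as pool:
        partial_results = list(pool.map(map_search, work))
    return reduce(reduce_matches, partial_results, [])


hits = search_all(servers, "apple")
print(hits)
```

The time taken is roughly that of the slowest single server rather than the sum over all servers, which is where the speed-up over a sequential search comes from.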
MK99 – Big Data 7
What is the business relevance
of Hadoop?
• Hadoop made it possible to process large amounts of data quickly, using free
software.
• It enables business models where intensive data crunching is necessary to create
value.
• Examples:
– Amazon computing book recommendations for you,
– Walmart offering personalized coupons,
– The New York Times showing personalized display ads,
– Waze (driving app) showing the state of traffic on your road in real time,
– your electric utility computing how much electricity should be generated at peak
hours.
