A Hadoop Primer

sogrady

A simple introduction to Hadoop, given as a talk to the Maine Java Users' Group on February 15, 2011.

1. A Hadoop Primer Feb 2011 10.20.2005

2. http://redmonk.com/public/hadoop.pdf

3. The Background

4. October, 2003

5. December, 2004

6. Map::Reduce

7. Job::Map Reduce::Output

8. Counting Shakespeare
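
The map and reduce steps named on slides 6 to 8 can be illustrated with the classic word-count job. The following is a minimal single-process sketch in Python, not Hadoop code and not the code from the talk; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample line are assumptions made for illustration. On a real cluster the framework performs the shuffle itself and runs many map and reduce tasks in parallel.

```python
import re
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of text."""
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def shuffle(pairs):
    """Shuffle: group emitted values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: collapse the list of counts for one word into a total."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


# Illustrative input only; any text read line by line would do.
sample = ["To be, or not to be, that is the question"]
counts = word_count(sample)
```

Because each line is mapped independently and each word is reduced independently, both phases can be spread across machines without changing the result.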

9. The Birth of Hadoop

10.

11.

12. Project Architecture. Source: Running Hadoop On Ubuntu Linux, Michael G. Noll, 8.8.07

13. Project Traction

14. Employment Potential

15. Hadoop Users

16. Why Hadoop?

17. More Machines = More Faster

18. The reason everyone knows

19. BIG DATA

20. “The big issue is not that everyone will suddenly operate at petabyte scale; a lot of folks do not have that much data. The more important topics are the specifics of the storage and processing infrastructure and what approaches best suit each problem.” - Bradford Cross, Flightcaster/Woven

21. The reason not everyone knows

22. Unstructured Data

23. What Hadoop Is

24. “build Amazon's product search indices”
    “build the recommender system for behavioral targeting”
    “ETL style processing and statistics generation”
    “information extraction & search”
    “searching and analysis of millions of rental bookings”
    “we use Hadoop to summarize of user's tracking data”
    “we use Hadoop to store ad serving logs”
    “the freedom to query the data in an ad-hoc manner”
    “generating web graphs on 100 nodes”
    “we use Hadoop for batch-processing large RDF datasets”
    “facial similarity and recognition across large datasets”
    “We are using Hadoop and Nutch to crawl Blog posts”
    “Used for ETL & data analysis on terascale datasets”
    Source: http://wiki.apache.org/hadoop/PoweredBy

25. What Hadoop Isn't

26. A relational database killer No Yes

27. Beyond Hadoop

28. The Hadoop Ecosystem

29. What We Use Hadoop For

30. Crawling Largeish Unstructured Datasets

31. Like 1.3M StackOverflow Questions

32. Or 1.7M HackerNews Entries

33. Or Years of Apache Log Files
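
A job over web server logs has the same map and reduce shape. As a hedged sketch (plain Python rather than Hadoop, with hypothetical sample lines and function names, and assuming the logs use the Apache Common Log Format), counting responses per HTTP status code might look like this:

```python
import re
from collections import Counter

# Assumes Common Log Format: ... "METHOD /path PROTOCOL" STATUS BYTES
STATUS_PATTERN = re.compile(r'"[A-Z]+ \S+ [^"]*" (\d{3}) ')


def map_log_line(line):
    """Map: emit (status, 1) for each parseable line; skip anything else."""
    match = STATUS_PATTERN.search(line)
    if match:
        yield match.group(1), 1


def reduce_counts(pairs):
    """Reduce: sum the emitted values for each status code."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Made-up log lines for illustration; the last one is deliberately malformed.
sample_log = [
    '127.0.0.1 - - [15/Feb/2011:10:00:00 -0500] "GET /index.html HTTP/1.1" 200 2326',
    '127.0.0.1 - - [15/Feb/2011:10:00:01 -0500] "GET /missing HTTP/1.1" 404 209',
    '127.0.0.1 - - [15/Feb/2011:10:00:02 -0500] "GET /index.html HTTP/1.1" 200 2326',
    'not a log line',
]
status_totals = reduce_counts(
    pair for line in sample_log for pair in map_log_line(line)
)
```

Dropping unparseable lines in the map step is one simple way to cope with the messiness of years of accumulated logs; a real job might count them separately instead.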

34. How to Get Started

35. We use Cloudera

36. Mostly because it's easy