An Introduction to Apache Hadoop MapReduce: what is it, and how does
it work? What is the MapReduce cycle, and how are jobs managed?
Why should it be used, and who are its big users and providers?
2. MapReduce – What is it?
● Processing engine of Hadoop
● Developers create Map and Reduce jobs
● Used for big data batch processing
● Parallel processing of huge data volumes
● Fault tolerant
● Scalable
3. MapReduce – Why use it?
● Your data is in the Terabyte/Petabyte range
● You have huge I/O volumes
● Hadoop framework takes care of
– Job and task management
– Failures
– Storage
– Replication
● You just write Map and Reduce jobs
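To make "you just write Map and Reduce jobs" concrete, here is a minimal sketch of the two functions a developer supplies for word counting. This is plain Python for illustration, not real Hadoop code; in practice these would be a Java Mapper/Reducer class pair, or scripts run under Hadoop Streaming, and the function names here are made up.

```python
def word_count_map(line):
    # Map step: one input line in, a list of (key, value) pairs out.
    # Every word is emitted with a count of 1.
    return [(word.lower(), 1) for word in line.split()]

def word_count_reduce(word, counts):
    # Reduce step: all values collected for one key in,
    # a single (key, total) pair out.
    return (word, sum(counts))
```

Everything else in the bullet list above (splitting, shuffling, scheduling, retries, storage) is the framework's job, not the developer's.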
4. MapReduce – How does it work?
Take word counting as an example, something that Google does
all the time.
5. MapReduce – How does it work?
● Input data is split into shards
● Each shard is mapped to (key, value) pairs, e.g. (Bear, 1)
● Mapped data is shuffled/sorted by key, e.g. Bear
● Sorted data is reduced per key, e.g. (Bear, 2)
● Final output is stored on HDFS
● An extra map (combiner) step may run before the shuffle
● The JobTracker coordinates all tasks in a job
● TaskTrackers run the individual map and reduce tasks
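The cycle above can be sketched end to end in plain Python. This is a simulation of the phases for illustration only, not Hadoop itself; the shard count and function name are made up, and a real cluster runs the map and reduce phases in parallel across machines.

```python
from collections import defaultdict

def run_word_count(text, num_shards=2):
    lines = text.splitlines()

    # 1. Split: divide the input lines into shards.
    shards = [lines[i::num_shards] for i in range(num_shards)]

    # 2. Map: each shard independently emits (word, 1) pairs.
    mapped = []
    for shard in shards:
        for line in shard:
            mapped.extend((word.lower(), 1) for word in line.split())

    # 3. Shuffle/sort: sort the pairs and group values by key.
    groups = defaultdict(list)
    for word, count in sorted(mapped):
        groups[word].append(count)

    # 4. Reduce: sum the values for each key.
    # (5. In Hadoop, this output would then be written to HDFS.)
    return {word: sum(counts) for word, counts in groups.items()}
```

For example, `run_word_count("Bear Car River\nCar Bear Bear")` yields `{"bear": 3, "car": 2, "river": 1}`.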
6. MapReduce – Some examples
A visual example, using colours, to show the cycle:
Split -> Map -> Shuffle -> Reduce
7. MapReduce – Some examples
A visual example of MapReduce with the JobTracker and TaskTrackers
attached to the individual map and reduce tasks.
9. Contact Us
● Feel free to contact us at
– www.semtech-solutions.co.nz
– info@semtech-solutions.co.nz
● We offer IT project consultancy
● We are happy to hear about your problems
● You pay only for the hours you need to solve your problems