Performance of Spark vs MapReduce


Edureka!


www.edureka.co/apache-spark-scala-training
What will you learn today?
• Beyond Hadoop MapReduce
• How is Spark better than MapReduce?
• Benchmark: Spark vs MapReduce
• Hands-on: Analyzing data with Spark

Word Count Problem - MapReduce
MapReduce Code for a Simple Word Count Problem
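The slide's MapReduce code is not reproduced here. As a hedged illustration of what a MapReduce word count does, the sketch below simulates its three phases (map, shuffle/sort, reduce) in plain Python on a single machine. The function names are hypothetical, and this is not Hadoop's Java `Mapper`/`Reducer` API; in Hadoop each phase runs distributed across the cluster, and the intermediate (word, 1) pairs are written to disk and shuffled over the network between the map and reduce phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper (hypothetical name): emit (word, 1) for every word in every line."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all values emitted for the same key, ordered by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog", "the fox"]
    print(word_count(text))
    # {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

Even in this toy form, the job is split into separate phases with an explicit grouping step in between; a real Hadoop job additionally needs driver, mapper and reducer classes plus job configuration.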

Apache Spark
Apache Spark is a general-purpose data processing engine with in-memory computing.
Spark provides APIs for Scala, Java, Python and R, which has helped make it widely adopted for data processing.

How does Spark fit into the Hadoop ecosystem?
Spark is intended to enhance, not replace, the Hadoop stack.
Spark is designed to read and write data in HDFS as well as in other storage systems such as
Amazon S3 and NoSQL databases, and in file formats such as CSV.

Word Count Problem - Spark
Spark Scala Code for Word Count Problem
Spark Python Code for Word Count Problem
Clearly, processing data with Spark is much easier than with MapReduce, and Spark gives you the flexibility to choose your favorite language: Scala, Java, Python, etc.
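For reference, the PySpark word count is commonly written as one chain: `sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`. Running that requires a Spark installation, so the sketch below instead mimics the shape of the chain with a tiny, hypothetical `MiniRDD` class in plain Python. It is an assumption-laden stand-in, not Spark: it is eager rather than lazy, single-process, and has no partitions. It only shows how the three transformations compose.

```python
class MiniRDD:
    """Hypothetical single-process stand-in that borrows a few RDD method names.

    Unlike a real Spark RDD, every call is computed immediately (no laziness)
    and the data is a plain Python list (no partitions, no cluster).
    """

    def __init__(self, items):
        self._items = list(items)

    def flatMap(self, f):
        # Apply f to each element and flatten the resulting iterables.
        return MiniRDD(out for item in self._items for out in f(item))

    def map(self, f):
        # Apply f to each element, one output per input.
        return MiniRDD(f(item) for item in self._items)

    def reduceByKey(self, f):
        # Combine the values of (key, value) pairs that share a key.
        acc = {}
        for key, value in self._items:
            acc[key] = f(acc[key], value) if key in acc else value
        return MiniRDD(acc.items())

    def collect(self):
        return list(self._items)


if __name__ == "__main__":
    lines = MiniRDD(["to be or not to be", "to see"])
    counts = (
        lines.flatMap(lambda line: line.split())
        .map(lambda word: (word, 1))
        .reduceByKey(lambda a, b: a + b)
    )
    print(sorted(counts.collect()))
    # [('be', 2), ('not', 1), ('or', 1), ('see', 1), ('to', 3)]
```

The whole job fits in three chained calls, which is the contrast the slide draws with the multi-phase MapReduce program.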

Why Spark for Big Data Analytics?
What makes Spark suitable for Big Data analytics?

Why Spark for Big Data Analytics?
The following features make Spark the best fit for Big Data analytics:
• Spark simplifies data analysis
• Spark provides built-in libraries for advanced analytics
• Spark supports more than one language
• Spark delivers results faster
• Spark works with distributions from different Hadoop vendors
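One reason for "faster results" is how multi-step (for example iterative) jobs handle intermediate data. In MapReduce such a job is typically a chain of jobs, each writing its output to storage before the next one reads it back, whereas Spark can keep the working dataset in memory between steps (for example with `cache()` or `persist()`). The toy sketch below contrasts the two access patterns in plain Python by counting intermediate disk round trips. The function names are hypothetical, and it models only I/O counts, not actual Hadoop or Spark performance.

```python
import json
import os
import tempfile
from typing import Callable, List, Tuple

Step = Callable[[List[float]], List[float]]


def run_disk_chained(data: List[float], step: Step, iterations: int) -> Tuple[List[float], int]:
    """MapReduce-like chain (toy model): each step's output is written to disk,
    then read back as the next step's input. Returns (result, intermediate I/O ops)."""
    io_ops = 0
    current = list(data)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "intermediate.json")
        for _ in range(iterations):
            with open(path, "w") as fh:  # materialize this step's output
                json.dump(step(current), fh)
            io_ops += 1
            with open(path) as fh:  # the next step reads it back from disk
                current = json.load(fh)
            io_ops += 1
    return current, io_ops


def run_in_memory(data: List[float], step: Step, iterations: int) -> Tuple[List[float], int]:
    """Spark-like with a cached dataset (toy model): the working set stays in memory."""
    current = list(data)
    for _ in range(iterations):
        current = step(current)
    return current, 0


if __name__ == "__main__":
    halve = lambda xs: [x / 2 for x in xs]
    print(run_disk_chained([8.0, 4.0], halve, 3))  # ([1.0, 0.5], 6)
    print(run_in_memory([8.0, 4.0], halve, 3))  # ([1.0, 0.5], 0)
```

Both styles compute the same result; the difference is that the disk-chained version pays two I/O operations per step, which grows with the number of iterations. Reading the initial input and writing the final output are left out because both styles must do them.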

Benchmark: Spark Is Blazingly Fast

Isn’t Spark In-Memory Only?
But I have heard that Spark is only good for in-memory processing?

Spark: Best of Both Worlds
It’s a common misconception that Spark is only for in-memory processing. From its inception,
Spark was designed to be a general execution engine that works both in memory and on disk.
Almost all Spark operators perform external operations, spilling data to disk, when the data does
not fit in memory.
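To make "external operations" concrete: when an aggregation's in-memory buffer fills up, an engine can sort the buffer, spill it to disk as a run, and merge the sorted runs at the end. The sketch below applies that general spill-and-merge technique to word counting in plain Python. The buffer limit `max_keys_in_memory` and all function names are hypothetical, and this is an illustration of the idea, not Spark's actual implementation.

```python
import heapq
import itertools
import os
import tempfile
from typing import Dict, Iterable, Iterator, List, Tuple


def _spill(buffer: Dict[str, int], directory: str, index: int) -> str:
    """Write the buffer to disk as 'word<TAB>count' lines sorted by word.
    Assumes words contain no tabs or newlines (true for str.split() output)."""
    path = os.path.join(directory, f"run-{index}.tsv")
    with open(path, "w", encoding="utf-8") as fh:
        for word in sorted(buffer):
            fh.write(f"{word}\t{buffer[word]}\n")
    return path


def _read_run(path: str) -> Iterator[Tuple[str, int]]:
    """Stream one sorted run back from disk."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            word, count = line.rstrip("\n").split("\t")
            yield word, int(count)


def external_word_count(words: Iterable[str], max_keys_in_memory: int = 2) -> List[Tuple[str, int]]:
    """Count words while holding at most `max_keys_in_memory` distinct keys in RAM."""
    with tempfile.TemporaryDirectory() as tmp:
        runs: List[str] = []
        buffer: Dict[str, int] = {}
        for word in words:
            if word not in buffer and len(buffer) >= max_keys_in_memory:
                runs.append(_spill(buffer, tmp, len(runs)))  # buffer "full": spill it
                buffer = {}
            buffer[word] = buffer.get(word, 0) + 1
        if buffer:
            runs.append(_spill(buffer, tmp, len(runs)))
        # Every run is sorted by word, so a k-way merge brings equal words together.
        merged = heapq.merge(*(_read_run(p) for p in runs))
        return [
            (word, sum(count for _, count in group))
            for word, group in itertools.groupby(merged, key=lambda pair: pair[0])
        ]


if __name__ == "__main__":
    print(external_word_count("b a c a b d a".split(), max_keys_in_memory=2))
    # [('a', 3), ('b', 2), ('c', 1), ('d', 1)]
```

Memory use is bounded by the buffer size rather than by the number of distinct words; the price is extra disk I/O for the spilled runs, which is the trade-off the slide describes.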

Spark Libraries
• Spark SQL: Spark’s module for working with structured data
• MLlib: Spark’s machine learning library
• GraphX: Spark’s API for graph computation
• Spark Streaming: Spark’s API for processing streaming data

Spark in one Snapshot

Spark Use Cases
Companies use Spark to solve a variety of problems, e.g. recommendation systems, business
intelligence and fraud detection.

Who is using Spark?
A list of companies using Spark can be found here: https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark

Spark is here to stay
Spark is not one of those "here today, gone tomorrow" technologies. It is here to stay for the foreseeable
future, and it is well worth getting your teeth into it to get value out of your data.

Hands-on
Analyzing data with Spark

References
IBM backs Apache Spark for Big Data Analytics:
http://www.forbes.com/sites/paulmiller/2015/06/15/ibm-backs-apache-spark-for-big-data-analytics/
Why Cloudera is saying 'Goodbye, MapReduce' and 'Hello, Spark':
http://fortune.com/2015/09/09/cloudera-spark-mapreduce/
5 reasons to turn to Spark for Big Data Analytics:
http://www.infoworld.com/article/2897287/big-data/5-reasons-to-turn-to-spark-for-big-data-analytics.html

Spark officially sets a new record in large-scale sorting:
https://databricks.com/blog/2014/11/05/spark-officially-sets-a-new-record-in-large-scale-sorting.html
How eBay uses Spark to ignite Data Analytics:
http://www.ebaytechblog.com/2014/05/28/using-spark-to-ignite-data-analytics/
Spark is fast on disk too:
https://gigaom.com/2014/10/10/databricks-demolishes-big-data-benchmark-to-prove-spark-is-fast-on-disk-too/

Survey
Your feedback is vital to us, be it a compliment, a suggestion or a complaint. It helps us make your
experience better!
Please spare a few minutes to take the survey after the webinar.

Thank You …
Questions/Queries/Feedback
Recording and presentation will be made available to you within 24 hours
