www.edureka.co/r-for-analytics
www.edureka.co/apache-spark-scala-training
Apache Spark: Beyond Hadoop MapReduce
Presenter: Vishal
What will you learn today?
• Strengths of MapReduce
• Limitations of MapReduce
• How MapReduce limitations can be overcome
• How Spark fits the bill
• Other exciting features in Spark
Strengths of MapReduce
• Simple: independent of any one programming language; jobs can be written in Java, C++ or Python, for example.
• Scalable: it can process petabytes of data stored in HDFS on one cluster.
• Fault tolerant: MapReduce takes care of failures using the replicated copies of the data.
• Minimal data motion: processing moves to the data, which minimizes disk I/O.
Limitations of MapReduce
• Real-time processing
• Complex algorithms
• Re-reading and parsing data
• Minimal data motion
• Graph processing
• Iterative tasks
• Random access
Feature Comparison with Spark

Hadoop MapReduce        Spark
Fast                    Up to 100x faster than MapReduce
Batch processing        Batch and real-time processing
Stores data on disk     Stores data in memory
Written in Java         Written in Scala

Source: Databricks
What are the MR limitations, and how does Spark overcome them?
Overcoming MR limitations
• Real time: by cutting down on the number of reads and writes to disk
Spark Cuts Down Read/Write I/O To Disk
Spark tries to keep data in the memory of its distributed workers, which allows significantly faster, lower-latency computations, whereas MapReduce keeps moving intermediate data in and out of disk.
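The effect of keeping data in memory can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark or MapReduce code: `DiskDataset`, `iterate_from_disk` and `iterate_in_memory` are hypothetical names invented for this illustration, and a "disk read" is simply a counter.

```python
class DiskDataset:
    """Toy stand-in for data stored on disk; counts how often it is read."""

    def __init__(self, records):
        self._records = list(records)
        self.reads = 0

    def load(self):
        # Each call models one full scan of the input from disk.
        self.reads += 1
        return list(self._records)


def iterate_from_disk(dataset, passes):
    """MapReduce-style loop: every pass re-reads the input from disk."""
    total = 0
    for _ in range(passes):
        total += sum(dataset.load())
    return total


def iterate_in_memory(dataset, passes):
    """Spark-style loop: read once, keep the working set in memory, reuse it."""
    cached = dataset.load()
    total = 0
    for _ in range(passes):
        total += sum(cached)
    return total
```

Both functions return the same result, but for an iterative job with N passes the first one scans the input N times while the second scans it once. That difference in disk I/O is the point the slide makes.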
Overcoming MR limitations
• Graph processing and complex algorithms: libraries for machine learning and streaming
Libraries For ML, Graph Programming …
• Machine learning library (MLlib)
• Graph programming (GraphX)
• Spark interface for RDBMS lovers (Spark SQL)
• Utility for continuous ingestion of data (Spark Streaming)
Overcoming MR limitations
• Random access: cyclic data flows
Cyclic Data Flows
• All jobs in Spark comprise a series of operators and run on a set of data.
• All the operators in a job are used to construct a DAG (Directed Acyclic Graph).
• The DAG is optimized by rearranging and combining operators where possible.
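The idea of recording operators first and combining them before execution can be sketched as follows. This is a toy model in plain Python, not Spark's scheduler: the `Plan` class and its methods are hypothetical, the "DAG" is just a linear chain, and the only optimization shown is fusing adjacent map operators into one.

```python
class Plan:
    """Hypothetical toy plan: a linear chain of operators (a very simple DAG)."""

    def __init__(self, source, ops=()):
        self.source = source
        self.ops = tuple(ops)  # each op is ("map", fn) or ("filter", predicate)

    def map(self, fn):
        # Only records the operator; no data is processed yet.
        return Plan(self.source, self.ops + (("map", fn),))

    def filter(self, predicate):
        return Plan(self.source, self.ops + (("filter", predicate),))

    def optimized(self):
        """Combine adjacent map operators into a single composed map."""
        fused = []
        for kind, fn in self.ops:
            if kind == "map" and fused and fused[-1][0] == "map":
                prev = fused.pop()[1]
                # Default arguments bind the current functions to this lambda.
                fused.append(("map", lambda x, f=prev, g=fn: g(f(x))))
            else:
                fused.append((kind, fn))
        return Plan(self.source, fused)

    def collect(self):
        """Action: the recorded operators run only when a result is requested."""
        out = []
        for x in self.source:
            keep = True
            for kind, fn in self.ops:
                if kind == "map":
                    x = fn(x)
                elif not fn(x):
                    keep = False
                    break
            if keep:
                out.append(x)
        return out
```

A usage example: `Plan(range(6)).map(lambda x: x + 1).map(lambda x: x * 2).filter(lambda x: x % 4 == 0)` records three operators; calling `.optimized()` on it yields a plan with two operators that produces the same result from `.collect()`.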
Spark's features make its architecture better than MR's
Other Spark Features In Demand
Spark Features/Modules In Demand
Source: Typesafe
New Features In 2015
DataFrames
• Similar API to data frames in R and Pandas
• Automatically optimised via Spark SQL
• Released in Spark 1.3
SparkR
• Released in Spark 1.4
• Exposes DataFrames, RDDs and the ML library in R
Machine Learning Pipelines
• High-level API
• Featurization
• Evaluation
• Model tuning
External Data Sources
• Platform API to plug data sources into Spark
• Pushes logic into sources
Source: Databricks
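What "pushes logic into sources" means can be shown with a toy sketch of filter pushdown. This is plain Python, not the Spark data sources API: `ToySource`, `query_without_pushdown` and `query_with_pushdown` are hypothetical names, and the number of rows handed to the engine stands in for transfer cost.

```python
class ToySource:
    """Hypothetical data source; counts the rows it hands over to the engine."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.rows_transferred = 0

    def scan(self, predicate=None):
        # When a predicate is supplied, the source filters before transfer.
        for row in self._rows:
            if predicate is None or predicate(row):
                self.rows_transferred += 1
                yield row


def query_without_pushdown(source, predicate):
    """The engine pulls every row, then filters on its own side."""
    return [row for row in source.scan() if predicate(row)]


def query_with_pushdown(source, predicate):
    """The filter is handed to the source, so only matching rows are moved."""
    return list(source.scan(predicate))
```

Both queries return the same rows. The difference is in `rows_transferred`: with pushdown the source only hands over the rows that match the predicate.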
Get Certified in Spark from Edureka
Edureka's Spark and Scala course:
• Learn large-scale data processing by mastering the concepts of Scala, RDD, Traits, OOPS and Spark SQL
• Online Live Courses: 24 hours
• Assignments: 32 hours
• Project: 20 hours
• Lifetime Access + 24 X 7 Support
Go to www.edureka.co/apache-spark-scala-training
Batch starts from 10th October (Weekend Batch)
Thank You
Questions/Queries/Feedback/Survey
Recording and presentation will be made available to you within 24 hours
