SlideShare una empresa de Scribd logo
1 de 14
Descargar para leer sin conexión
Introducing Spark Core
Friday, January 22, 16
Agenda
• Assumptions
• Why Spark?
• What you need to know to begin?
Friday, January 22, 16
Assumptions
• You want to learn Apache Spark, but need to know where
to begin
• You need to know the fundamentals of Spark in order to
progress in your learning of Spark
• You need to evaluate if Spark could be an appropriate fit
for your use cases or career growth
One or more of the following
Friday, January 22, 16
In a nutshell, why spark?
• Engine for efficient large-scale processing. It’s faster than
Hadoop MapReduce
• Spark can complement your existing Hadoop investments
such as HDFS and Hive
• Rich ecosystem including support for SQL, Machine
Learning, Steaming and multiple language APIs such as
Scala, Python and Java
Friday, January 22, 16
Introduction
• Ok, so where should I start?
Friday, January 22, 16
Spark Essentials
• Resilient Distributed Datasets (RDD)
• Transformers
• Actions
• Spark Driver Programs and SparkContext
To begin, you need to know:
Friday, January 22, 16
Resilient Distributed Datasets (RDDs)
• RDDs are Spark’s primary abstraction for data
interaction (lazy, in memory)
• RDDs are an immutable, distributed collection of
elements separated into partitions
• There are multiple types of RDDs
• RDDs can be created from an external data sets such as
Hadoop InputFormats, text files on a variety of file
systems or existing RDDs via a Spark Transformations
Friday, January 22, 16
Transformations
• RDD functions which return pointers to new RDDs
(remember: lazy)
• map, flatMap, filter, etc.
Friday, January 22, 16
Actions
• RDD functions which return values to the driver
• reduce, collect, count, etc.
Friday, January 22, 16
Spark RDDs, Transformations, Actions Diagram
Load from External Source
Example: textFile
Transformations Actions
RDDs
Output Value(s)
Example: count, collect
5, ['a','b', 'c']
Friday, January 22, 16
Spark Driver Programs and Context
• Spark driver is a program that declares transformations and
actions on RDDs of data
• A driver submits the serialized RDD graph to the master
where the master creates tasks. These tasks are delegated to
the workers for execution.
• Workers are where the tasks are actually executed.
Friday, January 22, 16
Driver Program and SparkContext
Image borrowed from http://spark.apache.org/docs/latest/cluster-overview.html
Friday, January 22, 16
References
• For course information and discount coupons, visit http://
www.supergloo.com/
• Learning Spark Book Summary http://www.amazon.com/
Learning-Spark-Summary-Lightning-Fast-Deconstructed-
ebook/dp/B019HS7USA/
Friday, January 22, 16
Next Steps
Friday, January 22, 16

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
 
Introduction to apache spark
Introduction to apache spark Introduction to apache spark
Introduction to apache spark
 
Spark Streaming and MLlib - Hyderabad Spark Group
Spark Streaming and MLlib - Hyderabad Spark GroupSpark Streaming and MLlib - Hyderabad Spark Group
Spark Streaming and MLlib - Hyderabad Spark Group
 
Apache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource ManagerApache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource Manager
 
Building end to end streaming application on Spark
Building end to end streaming application on SparkBuilding end to end streaming application on Spark
Building end to end streaming application on Spark
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark Fundamentals
 
Lambda architecture: from zero to One
Lambda architecture: from zero to OneLambda architecture: from zero to One
Lambda architecture: from zero to One
 
Spark - The Ultimate Scala Collections by Martin Odersky
Spark - The Ultimate Scala Collections by Martin OderskySpark - The Ultimate Scala Collections by Martin Odersky
Spark - The Ultimate Scala Collections by Martin Odersky
 
Spark meetup TCHUG
Spark meetup TCHUGSpark meetup TCHUG
Spark meetup TCHUG
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Hadoop and Spark
Hadoop and SparkHadoop and Spark
Hadoop and Spark
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Bigdata and Hadoop with Docker
Bigdata and Hadoop with DockerBigdata and Hadoop with Docker
Bigdata and Hadoop with Docker
 
Big Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and ZeppelinBig Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and Zeppelin
 
Apache Spark on Kubernetes
Apache Spark on KubernetesApache Spark on Kubernetes
Apache Spark on Kubernetes
 
Kafka website activity architecture
Kafka website activity architectureKafka website activity architecture
Kafka website activity architecture
 
Productionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan ChanProductionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan Chan
 
Building a REST API with Cassandra on Datastax Astra Using Python and Node
Building a REST API with Cassandra on Datastax Astra Using Python and NodeBuilding a REST API with Cassandra on Datastax Astra Using Python and Node
Building a REST API with Cassandra on Datastax Astra Using Python and Node
 
Hands on with Apache Spark
Hands on with Apache SparkHands on with Apache Spark
Hands on with Apache Spark
 

Similar a Spark Core

Similar a Spark Core (20)

Apache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability Meetup
Apache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability MeetupApache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability Meetup
Apache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability Meetup
 
Dec6 meetup spark presentation
Dec6 meetup spark presentationDec6 meetup spark presentation
Dec6 meetup spark presentation
 
Big_data_analytics_NoSql_Module-4_Session
Big_data_analytics_NoSql_Module-4_SessionBig_data_analytics_NoSql_Module-4_Session
Big_data_analytics_NoSql_Module-4_Session
 
Data processing with spark in r & python
Data processing with spark in r & pythonData processing with spark in r & python
Data processing with spark in r & python
 
Unit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptxUnit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptx
 
Spark Worshop
Spark WorshopSpark Worshop
Spark Worshop
 
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
Apache Spark for Beginners
Apache Spark for BeginnersApache Spark for Beginners
Apache Spark for Beginners
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
 
Big Data training
Big Data trainingBig Data training
Big Data training
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to spark
 
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
 
Apache Spark on HDinsight Training
Apache Spark on HDinsight TrainingApache Spark on HDinsight Training
Apache Spark on HDinsight Training
 
Spark from the Surface
Spark from the SurfaceSpark from the Surface
Spark from the Surface
 
Spark Advanced Analytics NJ Data Science Meetup - Princeton University
Spark Advanced Analytics NJ Data Science Meetup - Princeton UniversitySpark Advanced Analytics NJ Data Science Meetup - Princeton University
Spark Advanced Analytics NJ Data Science Meetup - Princeton University
 
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop : Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to Spark
 
Spark Summit EU talk by Heiko Korndorf
Spark Summit EU talk by Heiko KorndorfSpark Summit EU talk by Heiko Korndorf
Spark Summit EU talk by Heiko Korndorf
 

Último

Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
shivangimorya083
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
shivangimorya083
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
shambhavirathore45
 

Último (20)

CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 

Spark Core

  • 2. Agenda • Assumptions • Why Spark? • What you need to know to begin? Friday, January 22, 16
  • 3. Assumptions • You want to learn Apache Spark, but need to know where to begin • You need to know the fundamentals of Spark in order to progress in your learning of Spark • You need to evaluate if Spark could be an appropriate fit for your use cases or career growth One or more of the following Friday, January 22, 16
  • 4. In a nutshell, why spark? • Engine for efficient large-scale processing. It’s faster than Hadoop MapReduce • Spark can complement your existing Hadoop investments such as HDFS and Hive • Rich ecosystem including support for SQL, Machine Learning, Steaming and multiple language APIs such as Scala, Python and Java Friday, January 22, 16
  • 5. Introduction • Ok, so where should I start? Friday, January 22, 16
  • 6. Spark Essentials • Resilient Distributed Datasets (RDD) • Transformers • Actions • Spark Driver Programs and SparkContext To begin, you need to know: Friday, January 22, 16
  • 7. Resilient Distributed Datasets (RDDs) • RDDs are Spark’s primary abstraction for data interaction (lazy, in memory) • RDDs are an immutable, distributed collection of elements separated into partitions • There are multiple types of RDDs • RDDs can be created from an external data sets such as Hadoop InputFormats, text files on a variety of file systems or existing RDDs via a Spark Transformations Friday, January 22, 16
  • 8. Transformations • RDD functions which return pointers to new RDDs (remember: lazy) • map, flatMap, filter, etc. Friday, January 22, 16
  • 9. Actions • RDD functions which return values to the driver • reduce, collect, count, etc. Friday, January 22, 16
  • 10. Spark RDDs, Transformations, Actions Diagram Load from External Source Example: textFile Transformations Actions RDDs Output Value(s) Example: count, collect 5, ['a','b', 'c'] Friday, January 22, 16
  • 11. Spark Driver Programs and Context • Spark driver is a program that declares transformations and actions on RDDs of data • A driver submits the serialized RDD graph to the master where the master creates tasks. These tasks are delegated to the workers for execution. • Workers are where the tasks are actually executed. Friday, January 22, 16
  • 12. Driver Program and SparkContext Image borrowed from http://spark.apache.org/docs/latest/cluster-overview.html Friday, January 22, 16
  • 13. References • For course information and discount coupons, visit http:// www.supergloo.com/ • Learning Spark Book Summary http://www.amazon.com/ Learning-Spark-Summary-Lightning-Fast-Deconstructed- ebook/dp/B019HS7USA/ Friday, January 22, 16