Spark Introduction
Ben Liu – arkilis@gmail.com 2015
OUTLINE
• What is Spark
• Why Spark
• Structure of Spark
• Run Spark
WHAT IS SPARK
Spark Definition
• Big Data Analytics Engine
• Cluster Computing Framework
WHAT IS SPARK
Spark History
• 2009: Inspired by a Microsoft paper and created at the UC Berkeley AMP Lab
• 2011: Open sourced
• 2013: Donated to the Apache Software Foundation
• Now: Current stable version is 1.5.2
WHY SPARK
Spark Benefits
• Speed
– It is claimed to be up to 100x faster than Hadoop. The
reason is that instead of keeping all the data on
hard disk, Spark temporarily stores it in RAM.
– (Ben: in practice, however, the speedup is not that large. It falls short of 100x, at least for some applications.)
• Easy to use
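
The in-memory point above can be sketched with PySpark's RDD API. This is a minimal illustration, not a benchmark: the helper name `count_levels` and the "ERROR"/"WARN" markers are assumptions made up for the example. `cache()` asks Spark to keep the dataset in memory once the first action has computed it, so a second pass does not need to re-read it from disk.

```python
def count_levels(lines):
    """Count ERROR and WARN lines (hypothetical helper for illustration).

    `lines` is expected to behave like a pyspark RDD of strings.
    """
    # Mark the RDD for in-memory storage; nothing is computed yet.
    cached = lines.cache()
    # First action: reads the data and fills the cache.
    errors = cached.filter(lambda line: "ERROR" in line).count()
    # Second action: can reuse the in-memory copy instead of re-reading.
    warnings = cached.filter(lambda line: "WARN" in line).count()
    return errors, warnings
```

In the `pyspark` shell, where `sc` is already defined, this could be called as `count_levels(sc.textFile("app.log"))`; the file name is a placeholder.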
WHY SPARK
Comparison of three main distributed computing frameworks

                     Hadoop                  Spark                   Storm
Source               Google                  UC Berkeley AMP Lab     Twitter
Open source date     2007                    2011                    2011
Supported languages  Java, and many others   Scala, Java, Python     Java, Clojure
Latency              High                    Seconds                 Real-time
Scenario             • Low real-time         • Medium-size data      • Small data chunks
                     • Large volume of         blocks                • High real-time
                       big data              • More real-time
                     • One batch
Used by              Facebook, Google        Google, Taobao          Twitter, Sina Weibo

Source: compiled from Wikipedia and Google searches over time.
SPARK STRUCTURE
API
• Spark SQL: allows users to interact with data through SQL-like queries.
• Spark Streaming: allows users to process data in real time and in batches.
• MLlib: allows users to process data using machine learning.
• GraphX: allows users to build, transform, and reason about graph data.
https://databricks.com/spark/about
RUN SPARK
Spark Start Mode
• Local (Mainly for testing)
• Standalone
• Mesos (popular)
• YARN (popular)
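
Each start mode above is selected through the `--master` option of `spark-submit`. The sketch below maps the four modes to the master URL formats used by Spark 1.x and builds the matching command line. The helper name `build_submit_command` is hypothetical, and the host names and ports are placeholders, not values from the slides.

```python
# Master URL format for each start mode. Hosts and ports are placeholders.
MASTER_URLS = {
    "local": "local[*]",                      # one worker thread per core
    "standalone": "spark://master-host:7077",
    "mesos": "mesos://mesos-host:5050",
    # Spark 1.x spelling; newer releases use "yarn" plus --deploy-mode.
    "yarn": "yarn-client",
}


def build_submit_command(mode, script, *args):
    """Return the spark-submit argument list for a start mode (hypothetical helper)."""
    if mode not in MASTER_URLS:
        raise ValueError("unknown start mode: %s" % mode)
    return ["spark-submit", "--master", MASTER_URLS[mode], script] + [str(a) for a in args]


print(" ".join(build_submit_command("local", "e0.py", 40)))
# prints: spark-submit --master local[*] e0.py 40
```

Plain `local` runs with a single worker thread, while `local[N]` uses N threads. In YARN mode the cluster location is read from the Hadoop configuration rather than from the URL.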
SPARK EXAMPLE
Run
Running cmd:
./spark-submit --master local ../ben_python/e0.py 40
Super Simple example (local mode):
https://github.com/arkilis/spark_python
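
The contents of `e0.py` are not reproduced in the slides, so the script below is only a stand-in showing the general shape of a PySpark job that takes one numeric argument. It assumes the argument (40 in the command above) is a partition count, and it estimates pi by Monte Carlo sampling, in the style of the examples bundled with Spark. The names `is_inside`, `sample`, `estimate_pi`, and `pi_sketch.py` are hypothetical.

```python
import sys
from operator import add
from random import random


def is_inside(x, y):
    """Return 1 if (x, y) lies inside the unit circle, else 0."""
    return 1 if x * x + y * y <= 1.0 else 0


def sample(_):
    """Draw one random point in the square [-1, 1] x [-1, 1] and test it."""
    x = random() * 2 - 1
    y = random() * 2 - 1
    return is_inside(x, y)


def estimate_pi(hits, total):
    """Circle-to-square area ratio is pi/4, so pi is about 4 * hits / total."""
    return 4.0 * hits / total


def main(partitions):
    # Imported here so the helpers above work without Spark installed.
    from pyspark import SparkContext

    # The master is not set here; it comes from spark-submit --master.
    sc = SparkContext(appName="PiSketch")
    try:
        total = 100000 * partitions
        hits = sc.parallelize(range(total), partitions).map(sample).reduce(add)
        print("Pi is roughly %f" % estimate_pi(hits, total))
    finally:
        sc.stop()


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.stderr.write("usage: spark-submit pi_sketch.py <partitions>\n")
    else:
        main(int(sys.argv[1]))
```

Saved as `pi_sketch.py`, it could be run the same way as the command above: `./spark-submit --master local pi_sketch.py 40`.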
Results Review
