APACHE-SPARK	

LARGE-SCALE DATA PROCESSING ENGINE
Bartosz Bogacki <bbogacki@bidlab.pl>
CTO, CODER, ROCK CLIMBER
• current: 	

• Chief Technology Officer at Bidlab
• previous:	

• IT Director at InternetowyKantor.pl SA
• Software Architect / Project Manager at Wolters Kluwer Polska
• find out more (if you care):	

• linkedin.com/in/bartoszbogacki
Did I mention that…
WE PROCESS MORE THAN 200GB OF LOGS DAILY?
WHY?
• To discover inventory and potential	

• To optimize traffic	

• To optimize campaigns	

• To learn about trends	

• To calculate conversions
APACHE SPARK !
HISTORY
• 2013-06-19 Project enters Apache incubation	

• 2014-02-19 Project established as an Apache Top-Level Project

• 2014-05-30 Spark 1.0.0 released
• "Apache Spark is a (lightning-) fast and general-purpose cluster computing system"

• Engine compatible with Apache Hadoop

• Up to 100x faster than Hadoop MapReduce for in-memory workloads (the project's own claim)

• Less code to write, more flexible

• Active community (117 developers contributed to release 1.0.0)
KEY CONCEPTS
• Runs on Spark standalone / YARN / Mesos cluster managers

• HDFS / S3 support built-in

• RDD - Resilient Distributed Dataset

• Transformations & Actions

• Written in Scala, API for Java / Scala / Python
ECOSYSTEM
• Spark Streaming	

• Shark	

• MLlib (machine learning)	

• GraphX	

• Spark SQL
RDD
• Collections of objects	

• Stored in memory (or disk)	

• Spread across the cluster	

• Auto-rebuild on failure
TRANSFORMATIONS
• map / flatMap	

• filter	

• union / intersection / join / cogroup	

• distinct	

• many more…
ACTIONS
• reduce (reduceByKey, despite its name, is a transformation: it returns a new RDD)

• foreach	

• count / countByKey	

• first / take / takeOrdered	

• collect / saveAsTextFile / saveAsObjectFile
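The split between transformations and actions matters because transformations are lazy: they only describe work, and nothing runs until an action asks for a result. The sketch below illustrates that behaviour with plain `java.util.stream`, which is lazy in a similar way. It is a local stand-in under stated assumptions, not Spark code: the class `LazyPipelineDemo` and its methods are hypothetical names, and a `Stream` plays the role of an RDD.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class: a java.util.stream.Stream stands in for an RDD.
final class LazyPipelineDemo {

    // Counts how many times the mapping function has actually been executed.
    static final AtomicInteger CALLS = new AtomicInteger();

    // "Transformation": describes the work, but computes nothing yet.
    static Stream<Integer> squares(List<Integer> input) {
        return input.stream().map(n -> {
            CALLS.incrementAndGet();
            return n * n;
        });
    }

    // "Action": forces the pipeline to run and hands the result to the caller.
    static List<Integer> collect(Stream<Integer> pipeline) {
        return pipeline.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream<Integer> pipeline = squares(Arrays.asList(3, 4, 6, 7, 8));
        System.out.println("calls before action: " + CALLS.get()); // 0
        System.out.println(collect(pipeline));                     // [9, 16, 36, 49, 64]
        System.out.println("calls after action: " + CALLS.get());  // 5
    }
}
```

In Spark the same idea means that a chain of map / filter calls costs nothing until count, collect or a save operation is invoked.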
EXAMPLES
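val s1 = sc.parallelize(Array(1, 2, 3, 4, 5))
val s2 = sc.parallelize(Array(3, 4, 6, 7, 8))
val s3 = sc.parallelize(Array(1, 2, 2, 3, 3, 3))

// Transformations are lazy; call collect() on the result to see the values shown below.
s2.map(num => num * num)
// => 9, 16, 36, 49, 64
s1.reduce((a, b) => a + b)
// => 15
s1 union s2
// => 1, 2, 3, 4, 5, 3, 4, 6, 7, 8
s1 subtract s2
// => 1, 5, 2 (order is not guaranteed)
s1 intersection s2
// => 4, 3 (order is not guaranteed)
s3.distinct
// => 1, 2, 3 (order is not guaranteed)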
EXAMPLES
val set1 = sc.parallelize(Array((1, "bartek"), (2, "jacek"), (3, "tomek")))
val set2 = sc.parallelize(Array((2, "nowak"), (4, "kowalski"), (5, "iksiński")))

set1 join set2
// => (2,(jacek,nowak))
set1 leftOuterJoin set2
// => (1,(bartek,None)), (2,(jacek,Some(nowak))), (3,(tomek,None))
set1 rightOuterJoin set2
// => (4,(None,kowalski)), (5,(None,iksiński)), (2,(Some(jacek),nowak))
EXAMPLES
set1.cogroup(set2).sortByKey()
// => (1,(ArrayBuffer(bartek),ArrayBuffer())), (2,(ArrayBuffer(jacek),ArrayBuffer(nowak))),
//    (3,(ArrayBuffer(tomek),ArrayBuffer())), (4,(ArrayBuffer(),ArrayBuffer(kowalski))),
//    (5,(ArrayBuffer(),ArrayBuffer(iksiński)))

set2.map(t => (t._1, t._2.length))
// => (2,5), (4,8), (5,8)

val set3 = sc.parallelize(Array(("onet.pl", 1L), ("onet.pl", 1L), ("wp.pl", 1L)))

set3.reduceByKey((n1, n2) => n1 + n2)
// => (onet.pl,2), (wp.pl,1)
HANDS ON
RUNNING EC2 SPARK CLUSTER
./spark-ec2 -k spark-key -i spark-key.pem \
  -s 5 \
  -t m3.2xlarge \
  --region=eu-west-1 \
  launch cluster-name
SPARK CONSOLE
LINKING WITH SPARK
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.0.0</version>
</dependency>
If you want to use HDFS

groupId = org.apache.hadoop
artifactId = hadoop-client
version = <your-hdfs-version>
If you want to use Spark Streaming

groupId = org.apache.spark
artifactId = spark-streaming_2.10
version = 1.0.0
INITIALIZING
• SparkConf conf = new SparkConf().setAppName("TEST").setMaster("local");

• JavaSparkContext sc = new JavaSparkContext(conf);
CREATING RDD
• List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);	

• JavaRDD<Integer> distData = sc.parallelize(data);
CREATING RDD
• JavaRDD<String> logLines = sc.textFile("data.txt");
CREATING RDD
• JavaRDD<String> logLines = sc.textFile("hdfs://<HOST>:<PORT>/daily/data-20-00.txt");

• JavaRDD<String> logLines = sc.textFile("s3n://my-bucket/daily/data-*.txt");
TRANSFORM
JavaRDD<Log> logs =
    logLines.map(new Function<String, Log>() {
        public Log call(String s) {
            return LogParser.parse(s);
        }
    }).filter(new Function<Log, Boolean>() {
        // filter takes a Function<Log, Boolean>, so call() must return Boolean
        public Boolean call(Log log) {
            return log.getLevel() == 1;
        }
    });
ACTION :)
logs.count();
TRANSFORM-ACTION
List<Tuple2<String, Integer>> result =
    sc.textFile("/data/notifies-20-00.txt")
        .mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String line) throws Exception {
                NotifyRequest nr = LogParser.parseNotifyRequest(line);
                return new Tuple2<String, Integer>(nr.getFlightId(), 1);
            }
        })
        .reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer v1, Integer v2) throws Exception {
                return v1 + v2;
            }
        })
        .sortByKey()
        .collect();
FUNCTIONS, PAIRFUNCTIONS, ETC.
BROADCAST VARIABLES
• "allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks"
Broadcast<int[]> broadcastVar = sc.broadcast(new int[] {1, 2, 3});

broadcastVar.value();
// returns [1, 2, 3]
ACCUMULATORS
• variables that are only “added” to through an associative operation (add())

• only the driver program can read the accumulator’s value
Accumulator<Integer> accum = sc.accumulator(0);

sc.parallelize(Arrays.asList(1, 2, 3, 4)).foreach(x -> accum.add(x));

accum.value();
// returns 10
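The add-only contract is what makes accumulators safe to update from many tasks at once: because addition is associative and commutative, the order in which workers contribute does not change the total. A minimal local sketch of that idea, assuming a parallel stream as a stand-in for cluster tasks and `java.util.concurrent.atomic.LongAdder` as a stand-in for the accumulator (the class `AccumulatorSketch` is a hypothetical name, and none of this is Spark API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical local analogue of an accumulator; not Spark code.
final class AccumulatorSketch {

    // "Tasks" may only add to the accumulator; the caller (the "driver")
    // reads the total once all tasks have finished.
    static long sumWithAccumulator(List<Integer> values) {
        LongAdder accum = new LongAdder();
        // Elements are processed in parallel and in no particular order.
        values.parallelStream().forEach(x -> accum.add(x));
        return accum.sum();
    }

    public static void main(String[] args) {
        System.out.println(sumWithAccumulator(Arrays.asList(1, 2, 3, 4))); // 10
    }
}
```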
SERIALIZATION
• All objects referenced by the functions you pass to Spark (and the data they carry) have to be serializable

• Otherwise:
org.apache.spark.SparkException: Job aborted: Task not serializable: java.io.NotSerializableException
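One way to catch this class of error before submitting a job is to try serializing the suspect object locally with plain Java serialization. The helper below is a sketch under that assumption; `SerializationCheck`, `GoodRecord` and `BadParser` are hypothetical names used only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper for checking objects locally; not part of Spark.
final class SerializationCheck {

    // Hypothetical record type that can be shipped to workers.
    static final class GoodRecord implements Serializable {
        private static final long serialVersionUID = 1L;
        final String flightId;
        GoodRecord(String flightId) { this.flightId = flightId; }
    }

    // Hypothetical helper type that does NOT implement Serializable.
    static final class BadParser {
        final String pattern = ".*";
    }

    // Returns true if Java serialization accepts the whole object graph.
    static boolean isJavaSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isJavaSerializable(new GoodRecord("f-1"))); // true
        System.out.println(isJavaSerializable(new BadParser()));       // false
    }
}
```

A common source of the exception is an anonymous inner class that silently captures its enclosing, non-serializable object.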
USE KRYO SERIALIZER
import com.esotericsoftware.kryo.Kryo;
import org.apache.spark.serializer.KryoRegistrator;

// Registers application classes with Kryo so they are serialized compactly.
public class MyRegistrator implements KryoRegistrator {
    @Override
    public void registerClasses(Kryo kryo) {
        kryo.register(BidRequest.class);
        kryo.register(NotifyRequest.class);
        kryo.register(Event.class);
    }
}
sparkConfig.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
sparkConfig.set("spark.kryo.registrator", "pl.instream.dsp.offline.MyRegistrator");
sparkConfig.set("spark.kryoserializer.buffer.mb", "10");
CACHE !
JavaPairRDD<String, Integer> cachedSet =
    sc.textFile("/data/notifies-20-00.txt")
        .mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String line) throws Exception {
                NotifyRequest nr = LogParser.parseNotifyRequest(line);
                return new Tuple2<String, Integer>(nr.getFlightId(), 1);
            }
        }).cache();
RDD PERSISTENCE
• MEMORY_ONLY	

• MEMORY_AND_DISK	

• MEMORY_ONLY_SER	

• MEMORY_AND_DISK_SER	

• DISK_ONLY	

• MEMORY_ONLY_2, MEMORY_AND_DISK_2, …	

• OFF_HEAP (Tachyon, experimental)
PARTITIONS
• RDD is partitioned	

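• You may (and probably should) control the number and size of partitions, e.g. with the coalesce() method to reduce their count or repartition() to increase it

• By default 1 input file = 1 partition (roughly: one partition per input split, so a large HDFS file yields several)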
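To make the effect of coalesce() concrete, here is a simplified local model: a dataset is a list of partitions, and coalescing merges whole partitions into fewer groups without redistributing individual records. This is only a sketch. Spark's real implementation also takes data locality into account, and the class `CoalesceSketch` and its round-robin grouping are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of coalescing partitions; not Spark code.
final class CoalesceSketch {

    // Merges existing partitions into at most `target` partitions.
    // Whole partitions are assigned round-robin; records are never split up.
    static <T> List<List<T>> coalesce(List<List<T>> partitions, int target) {
        if (target < 1) {
            throw new IllegalArgumentException("target must be >= 1");
        }
        int n = Math.min(target, partitions.size());
        List<List<T>> merged = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            merged.add(new ArrayList<>());
        }
        for (int i = 0; i < partitions.size(); i++) {
            merged.get(i % n).addAll(partitions.get(i));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = Arrays.asList(
            Arrays.asList(1, 2), Arrays.asList(3), Arrays.asList(4, 5), Arrays.asList(6));
        System.out.println(coalesce(parts, 2)); // [[1, 2, 4, 5], [3, 6]]
    }
}
```

Fewer, larger partitions reduce scheduling overhead for many small input files; too few make each task's working set larger.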
PARTITIONS
• If your partitions are too big, you’ll face:
[GC 5208198K(5208832K), 0,2403780 secs]
[Full GC 5208831K->5208212K(5208832K), 9,8765730 secs]
[Full GC 5208829K->5208238K(5208832K), 9,7567820 secs]
[Full GC 5208829K->5208295K(5208832K), 9,7629460 secs]
[GC 5208301K(5208832K), 0,2403480 secs]
[Full GC 5208831K->5208344K(5208832K), 9,7497710 secs]
[Full GC 5208829K->5208366K(5208832K), 9,7542880 secs]
[Full GC 5208831K->5208415K(5208832K), 9,7574860 secs]
WARN storage.BlockManagerMasterActor: Removing BlockManager BlockManagerId(0, ip-xx-xx-xxx-xxx.eu-west-1.compute.internal, 60048, 0) with no recent heart beats: 64828ms exceeds 45000ms
RESULTS
• result.saveAsTextFile("hdfs://<HOST>:<PORT>/out.txt")

• result.saveAsObjectFile("/result/out.obj")

• collect()
PROCESS RESULTS PARTITION BY PARTITION
for (Partition partition : result.rdd().partitions()) {
    // Fetch one partition at a time to the driver instead of collecting everything at once
    List<String>[] subresult =
        result.collectPartitions(new int[] { partition.index() });

    for (String line : subresult[0]) {
        System.out.println(line);
    }
}
SPARK STREAMING
"SPARK STREAMING IS AN EXTENSION OF THE CORE SPARK API THAT ENABLES HIGH-THROUGHPUT, FAULT-TOLERANT STREAM PROCESSING OF LIVE DATA STREAMS."
HOW DOES IT WORK?
DSTREAMS
• continuous stream of data, either the input data
stream received from source, or the processed
data stream generated by transforming the input
stream	

• represented by a continuous sequence of RDDs
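The "sequence of RDDs" idea can be pictured as micro-batching: incoming records are cut into consecutive time slices of one batch interval, and each slice is then processed like an ordinary small dataset. The sketch below models only that slicing step in plain Java. `MicroBatchSketch` and `Event` are hypothetical names, timestamps are assumed to be non-negative milliseconds, and unlike Spark this model emits no batch for an interval without data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical local model of micro-batching; not Spark Streaming code.
final class MicroBatchSketch {

    // Hypothetical event: arrival time in milliseconds plus a payload line.
    static final class Event {
        final long timeMs;
        final String line;
        Event(long timeMs, String line) {
            this.timeMs = timeMs;
            this.line = line;
        }
    }

    // Cuts the stream into consecutive batches of batchMs milliseconds.
    // Key = start time of the batch, value = records that arrived in it.
    static TreeMap<Long, List<String>> toBatches(List<Event> events, long batchMs) {
        TreeMap<Long, List<String>> batches = new TreeMap<>();
        for (Event e : events) {
            long batchStart = (e.timeMs / batchMs) * batchMs;
            batches.computeIfAbsent(batchStart, k -> new ArrayList<>()).add(e.line);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            new Event(0, "a"), new Event(400, "b"),
            new Event(1000, "c"), new Event(2500, "d"));
        System.out.println(toBatches(events, 1000)); // {0=[a, b], 1000=[c], 2000=[d]}
    }
}
```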
INITIALIZING
• SparkConf conf = new SparkConf().setAppName("Real-Time Analytics").setMaster("local[2]"); // at least 2 local threads: one for the receiver, one for processing

• JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(TIME_IN_MILLIS));
CREATING DSTREAM
• JavaReceiverInputDStream<String> logLines = jssc.socketTextStream(sourceAddr, sourcePort, StorageLevels.MEMORY_AND_DISK_SER);
DATA SOURCES
• plain TCP sockets

• Apache Kafka	

• Apache Flume	

• ZeroMQ
TRANSFORMATIONS
• map, flatMap, filter, union, join, etc.	

• transform	

• updateStateByKey
WINDOW OPERATIONS
• window	

• countByWindow / countByValueAndWindow	

• reduceByWindow / reduceByKeyAndWindow
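Window operations combine the last few batches every time the window slides. The sketch below models a reduceByWindow-style sum over per-batch counts. It is a plain-Java illustration under stated assumptions: `WindowSketch` is a hypothetical name, and the window length and slide interval are given as numbers of batches, whereas Spark Streaming expresses both as durations that are multiples of the batch interval.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical local model of a sliding-window reduce; not Spark Streaming code.
final class WindowSketch {

    // Every `slide` batches, sums the counts of the most recent `windowLen` batches.
    // Windows near the start of the stream cover fewer batches.
    static List<Integer> sumByWindow(List<Integer> batchCounts, int windowLen, int slide) {
        if (windowLen < 1 || slide < 1) {
            throw new IllegalArgumentException("windowLen and slide must be >= 1");
        }
        List<Integer> out = new ArrayList<>();
        for (int end = slide; end <= batchCounts.size(); end += slide) {
            int start = Math.max(0, end - windowLen);
            int sum = 0;
            for (int i = start; i < end; i++) {
                sum += batchCounts.get(i);
            }
            out.add(sum);
        }
        return out;
    }

    public static void main(String[] args) {
        // Six batches, window of 3 batches, sliding by 2 batches.
        System.out.println(sumByWindow(Arrays.asList(1, 2, 3, 4, 5, 6), 3, 2)); // [3, 9, 15]
    }
}
```

When the window is longer than the slide, consecutive windows overlap, so the same batch contributes to more than one result.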
OUTPUT OPERATIONS
• print	

• foreachRDD	

• saveAsObjectFiles	

• saveAsTextFiles	

THINGS TO REMEMBER
USE SPARK-SHELL TO LEARN
PROVIDE ENOUGH RAM TO WORKERS
PROVIDE ENOUGH RAM TO EXECUTOR
SET FRAME SIZE / BUFFERS ACCORDINGLY
USE KRYO SERIALIZER
SPLIT DATA INTO AN APPROPRIATE NUMBER OF PARTITIONS
PACKAGE YOUR APPLICATION IN AN UBER-JAR
DESIGN YOUR DATA FLOW
AND…
BUILD A FRAMEWORK TO PROCESS DATA EFFICIENTLY
IT’S EASIER WITH SCALA!
	 // word count example	
	 inputLine.flatMap(line => line.split(" "))	
	 	 .map(word => (word, 1))	
	 	 .reduceByKey(_ + _);
HOW DO WE USE SPARK?
THANKS!
we’re hiring !	

mail me: bbogacki@bidlab.pl

Más contenido relacionado

La actualidad más candente

Cassandra Materialized Views
Cassandra Materialized ViewsCassandra Materialized Views
Cassandra Materialized ViewsCarl Yeksigian
 
Neo4j GraphTour: Utilizing Powerful Extensions for Analytics and Operations
Neo4j GraphTour: Utilizing Powerful Extensions for Analytics and OperationsNeo4j GraphTour: Utilizing Powerful Extensions for Analytics and Operations
Neo4j GraphTour: Utilizing Powerful Extensions for Analytics and OperationsMark Needham
 
Advanced akka features
Advanced akka featuresAdvanced akka features
Advanced akka featuresGrzegorz Duda
 
Real data models of silicon valley
Real data models of silicon valleyReal data models of silicon valley
Real data models of silicon valleyPatrick McFadin
 
Profiling Oracle with GDB
Profiling Oracle with GDBProfiling Oracle with GDB
Profiling Oracle with GDBEnkitec
 
Appengine Java Night #2b
Appengine Java Night #2bAppengine Java Night #2b
Appengine Java Night #2bShinichi Ogawa
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingDataStax Academy
 
Appengine Java Night #2a
Appengine Java Night #2aAppengine Java Night #2a
Appengine Java Night #2aShinichi Ogawa
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterprisePatrick McFadin
 
GDG DevFest 2015 - Reactive approach for slowpokes
GDG DevFest 2015 - Reactive approach for slowpokesGDG DevFest 2015 - Reactive approach for slowpokes
GDG DevFest 2015 - Reactive approach for slowpokesSergey Tarasevich
 
Cassandra EU - Data model on fire
Cassandra EU - Data model on fireCassandra EU - Data model on fire
Cassandra EU - Data model on firePatrick McFadin
 
Openstack grizzley puppet_talk
Openstack grizzley puppet_talkOpenstack grizzley puppet_talk
Openstack grizzley puppet_talkbodepd
 
Nike Tech Talk: Double Down on Apache Cassandra and Spark
Nike Tech Talk:  Double Down on Apache Cassandra and SparkNike Tech Talk:  Double Down on Apache Cassandra and Spark
Nike Tech Talk: Double Down on Apache Cassandra and SparkPatrick McFadin
 
Developing your own OpenStack Swift middleware
Developing your own OpenStack Swift middlewareDeveloping your own OpenStack Swift middleware
Developing your own OpenStack Swift middlewareChristian Schwede
 
An Introduction to time series with Team Apache
An Introduction to time series with Team ApacheAn Introduction to time series with Team Apache
An Introduction to time series with Team ApachePatrick McFadin
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionPatrick McFadin
 

La actualidad más candente (20)

Python database interfaces
Python database  interfacesPython database  interfaces
Python database interfaces
 
Cassandra Materialized Views
Cassandra Materialized ViewsCassandra Materialized Views
Cassandra Materialized Views
 
Neo4j GraphTour: Utilizing Powerful Extensions for Analytics and Operations
Neo4j GraphTour: Utilizing Powerful Extensions for Analytics and OperationsNeo4j GraphTour: Utilizing Powerful Extensions for Analytics and Operations
Neo4j GraphTour: Utilizing Powerful Extensions for Analytics and Operations
 
Cassandra 3.0
Cassandra 3.0Cassandra 3.0
Cassandra 3.0
 
Advanced akka features
Advanced akka featuresAdvanced akka features
Advanced akka features
 
Real data models of silicon valley
Real data models of silicon valleyReal data models of silicon valley
Real data models of silicon valley
 
Profiling Oracle with GDB
Profiling Oracle with GDBProfiling Oracle with GDB
Profiling Oracle with GDB
 
Appengine Java Night #2b
Appengine Java Night #2bAppengine Java Night #2b
Appengine Java Night #2b
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Appengine Java Night #2a
Appengine Java Night #2aAppengine Java Night #2a
Appengine Java Night #2a
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
 
GDG DevFest 2015 - Reactive approach for slowpokes
GDG DevFest 2015 - Reactive approach for slowpokesGDG DevFest 2015 - Reactive approach for slowpokes
GDG DevFest 2015 - Reactive approach for slowpokes
 
Cassandra EU - Data model on fire
Cassandra EU - Data model on fireCassandra EU - Data model on fire
Cassandra EU - Data model on fire
 
Openstack grizzley puppet_talk
Openstack grizzley puppet_talkOpenstack grizzley puppet_talk
Openstack grizzley puppet_talk
 
Nike Tech Talk: Double Down on Apache Cassandra and Spark
Nike Tech Talk:  Double Down on Apache Cassandra and SparkNike Tech Talk:  Double Down on Apache Cassandra and Spark
Nike Tech Talk: Double Down on Apache Cassandra and Spark
 
Developing your own OpenStack Swift middleware
Developing your own OpenStack Swift middlewareDeveloping your own OpenStack Swift middleware
Developing your own OpenStack Swift middleware
 
An Introduction to time series with Team Apache
An Introduction to time series with Team ApacheAn Introduction to time series with Team Apache
An Introduction to time series with Team Apache
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long version
 

Destacado

B3 - Business intelligence apps on aws
B3 - Business intelligence apps on awsB3 - Business intelligence apps on aws
B3 - Business intelligence apps on awsAmazon Web Services
 
Large-Scale Distributed Systems in Display Advertising
Large-Scale Distributed Systems in Display AdvertisingLarge-Scale Distributed Systems in Display Advertising
Large-Scale Distributed Systems in Display Advertisingbbogacki
 
Scaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and MesosScaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and MesosDiscover Pinterest
 
Lessons learned from building Demand Side Platform
Lessons learned from building Demand Side PlatformLessons learned from building Demand Side Platform
Lessons learned from building Demand Side Platformbbogacki
 
Cubes – pluggable model explained
Cubes – pluggable model explainedCubes – pluggable model explained
Cubes – pluggable model explainedStefan Urbanek
 
Bubbles – Virtual Data Objects
Bubbles – Virtual Data ObjectsBubbles – Virtual Data Objects
Bubbles – Virtual Data ObjectsStefan Urbanek
 
Designing the perfect display monetization dashboard (public)
Designing the perfect display monetization dashboard (public)Designing the perfect display monetization dashboard (public)
Designing the perfect display monetization dashboard (public)Ian Thomas
 
Large Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkLarge Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkCloudera, Inc.
 
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...Amazon Web Services
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekVenkata Naga Ravi
 
Introduction to Data Science and Large-scale Machine Learning
Introduction to Data Science and Large-scale Machine LearningIntroduction to Data Science and Large-scale Machine Learning
Introduction to Data Science and Large-scale Machine LearningNik Spirin
 
Apache Spark & Hadoop : Train-the-trainer
Apache Spark & Hadoop : Train-the-trainerApache Spark & Hadoop : Train-the-trainer
Apache Spark & Hadoop : Train-the-trainerIMC Institute
 
Spark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesSpark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesDatabricks
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introductionsudhakara st
 

Destacado (15)

B3 - Business intelligence apps on aws
B3 - Business intelligence apps on awsB3 - Business intelligence apps on aws
B3 - Business intelligence apps on aws
 
Large-Scale Distributed Systems in Display Advertising
Large-Scale Distributed Systems in Display AdvertisingLarge-Scale Distributed Systems in Display Advertising
Large-Scale Distributed Systems in Display Advertising
 
Scaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and MesosScaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and Mesos
 
Lessons learned from building Demand Side Platform
Lessons learned from building Demand Side PlatformLessons learned from building Demand Side Platform
Lessons learned from building Demand Side Platform
 
Cubes – pluggable model explained
Cubes – pluggable model explainedCubes – pluggable model explained
Cubes – pluggable model explained
 
Bubbles – Virtual Data Objects
Bubbles – Virtual Data ObjectsBubbles – Virtual Data Objects
Bubbles – Virtual Data Objects
 
Designing the perfect display monetization dashboard (public)
Designing the perfect display monetization dashboard (public)Designing the perfect display monetization dashboard (public)
Designing the perfect display monetization dashboard (public)
 
Large Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkLarge Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache Spark
 
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
 
Introduction to Data Science and Large-scale Machine Learning
Introduction to Data Science and Large-scale Machine LearningIntroduction to Data Science and Large-scale Machine Learning
Introduction to Data Science and Large-scale Machine Learning
 
Apache Spark & Hadoop : Train-the-trainer
Apache Spark & Hadoop : Train-the-trainerApache Spark & Hadoop : Train-the-trainer
Apache Spark & Hadoop : Train-the-trainer
 
Spark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesSpark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student Slides
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 

Similar a Introduction to Apache Spark / PUT 06.2014

Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek PROIDEA
 
Docker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackDocker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackJakub Hajek
 
Scala introduction
Scala introductionScala introduction
Scala introductionvito jeng
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming JobsDatabricks
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomyDongmin Yu
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupRafal Kwasny
 
Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017
Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017
Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017Big Data Spain
 
Intro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data MeetupIntro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data MeetupGwen (Chen) Shapira
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...Andrew Lamb
 
CouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 HourCouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 HourPeter Friese
 
Spark Sql for Training
Spark Sql for TrainingSpark Sql for Training
Spark Sql for TrainingBryan Yang
 
OpenStack API's and WSGI
OpenStack API's and WSGIOpenStack API's and WSGI
OpenStack API's and WSGIMike Pittaro
 
Gotcha! Ruby things that will come back to bite you.
Gotcha! Ruby things that will come back to bite you.Gotcha! Ruby things that will come back to bite you.
Gotcha! Ruby things that will come back to bite you.David Tollmyr
 
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak   CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak PROIDEA
 

Similar a Introduction to Apache Spark / PUT 06.2014 (20)

Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek
 
Docker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackDocker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic Stack
 
Scala introduction
Scala introductionScala introduction
Scala introduction
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs
 
Czzawk
CzzawkCzzawk
Czzawk
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017
Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017
Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017
 
Intro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data MeetupIntro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data Meetup
 
Pdxpugday2010 pg90
Pdxpugday2010 pg90Pdxpugday2010 pg90
Pdxpugday2010 pg90
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted Malaska
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
 
Apache Cassandra and Go
Apache Cassandra and GoApache Cassandra and Go
Apache Cassandra and Go
 
Master tuning
Master   tuningMaster   tuning
Master tuning
 
CouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 HourCouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 Hour
 
Spark Sql for Training
Spark Sql for TrainingSpark Sql for Training
Spark Sql for Training
 
OpenStack API's and WSGI
OpenStack API's and WSGIOpenStack API's and WSGI
OpenStack API's and WSGI
 
Scala active record
Scala active recordScala active record
Scala active record
 
Gotcha! Ruby things that will come back to bite you.
Gotcha! Ruby things that will come back to bite you.Gotcha! Ruby things that will come back to bite you.
Gotcha! Ruby things that will come back to bite you.
 
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak   CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
 

Último

%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburgmasabamasaba
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in sowetomasabamasaba
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
Introduction to Apache Spark / PUT 06.2014