2017.06
A Spark Tutorial for Beginners
Popcorny (Chen-en Lu)
Who am I
• Chen-en Lu (popcorny)
• Director of Engineering @TenMax
• Background
– Institute of Computer Science, National Chiao Tung University
– Champion of the 4th Trend Micro Programming Contest
– MediaTek (2005-2010)
– SmartQ (2011-2014)
– cacaFly/TenMax (2014-present)
• FB: https://fb.me/popcornylu
Target Audience
• Basic Java programming ability
• Ideally, some familiarity with Java 8 Streams or the basic functional-programming concepts from other languages (map, flatMap, filter, reduce, ...)
• You haven't written any Spark yet, or you've read a Spark book but haven't tried it hands-on
Outline
• Understand Spark basics
• Introduce Spark DataFrame/SQL
• Write a Spark application
• Understand Spark basics
• Introduce Spark DataFrame/SQL
• Write a Spark application
Introduction to Spark
• Spark is a distributed computation engine
• A MapReduce framework
• Built on RDDs (Resilient Distributed Datasets)
What Is Spark Good For?
• Good for
– Batch processing of large data volumes
– Stream processing
– ETL and data analysis at any data volume
• Not good for
– Workloads that an RDBMS can already handle
Big Data Architecture
(Layered stack, bottom to top: Distributed File System → Resource Manager → Computation Framework → Application Framework → Application)
Hadoop Architecture
(The same stack as realized by Hadoop: HDFS → YARN → Hadoop MapReduce V2 → Pig / Hive → Hadoop Application)
Spark Architecture
(Spark's version of the stack: DFS → YARN → Spark → Spark DataFrame/SQL, Stream, MLlib, GraphX → Spark Application)
Spark Application
(Diagram: spark-submit ships application.jar to the cluster. The Driver, which holds the Spark context, coordinates several Executors; each Driver and Executor is a JVM process running on a node in the cluster.)
Spark RDD
• Resilient Distributed Dataset
• Think of it as a Java Stream, only distributed
• Characteristics
– Lazy evaluation: nothing is actually computed until an action is triggered; before that, Spark only builds up the lineage
– Partitioned: the data is split into many partitions that can be processed in parallel
– Cacheable: computed data can be cached in the executors
– Reusable: an RDD can be reused, whereas a Java Stream can only be consumed once
Spark RDD
• Data processing boils down to input, transformation, and output
• Also known as ETL (Extract, Transform, Load)
• In Spark
– The input is an RDD created from the Spark context
– The RDD goes through a series of transformations
– Finally, an action kicks off the whole pipeline and produces the output wherever that action points
Input
• Input RDDs are always obtained from the Spark context
• sc.parallelize(list): ship a local list to the Spark cluster
• sc.textFile(path): read a text file from path
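A minimal sketch of these two inputs in Java (assuming a local master; the file path is hypothetical):

  import java.util.Arrays;
  import org.apache.spark.SparkConf;
  import org.apache.spark.api.java.JavaRDD;
  import org.apache.spark.api.java.JavaSparkContext;

  public class InputExample {
      public static void main(String[] args) {
          SparkConf conf = new SparkConf().setAppName("input-example").setMaster("local[*]");
          try (JavaSparkContext sc = new JavaSparkContext(conf)) {
              // Ship a local list to the cluster as an RDD
              JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4));
              // Read a text file; each element of the RDD is one line
              JavaRDD<String> lines = sc.textFile("input.txt"); // hypothetical path
          }
      }
  }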
Simple Operations
• map(func): one-to-one transformation
T → U
• flatMap(func): one-to-many transformation
T → 0..* U
• mapPartitions(func): many-to-many transformation
0..* T → 0..* U
• filter(func): a filter
T → 0..1 T
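A short sketch of these operations, assuming sc is a JavaSparkContext as in the input example:

  JavaRDD<String> lines = sc.parallelize(Arrays.asList("hello world", "hi spark"));

  // map: one-to-one (each line to its length)
  JavaRDD<Integer> lengths = lines.map(String::length);

  // flatMap: one-to-many (each line to its words)
  JavaRDD<String> words = lines.flatMap(line -> Arrays.asList(line.split(" ")).iterator());

  // filter: keep only the elements matching the predicate
  JavaRDD<String> longWords = words.filter(w -> w.length() > 2);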
Shuffle Operations (Single Source)
• groupByKey([numTasks]): collect values with the same key into one list
(K, V) → (K, Iterable<V>)
• reduceByKey(func, [numTasks]): reduce values with the same key
(K, V) → (K, V),
reducer: (V, V) → V
• aggregateByKey(zeroValue, seqOp, combOp, [numTasks]): reduce values with the same key, but through an accumulator
(K, V) → (K, U),
seqOp: (U, V) → U,
combOp: (U, U) → U
• sortByKey([ascending], [numTasks]): sort by key
(K, V) → (K, V)
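For instance, a minimal reduceByKey sketch (assuming sc as before; requires scala.Tuple2 and org.apache.spark.api.java.JavaPairRDD):

  JavaPairRDD<String, Integer> pairs = sc.parallelizePairs(Arrays.asList(
          new Tuple2<>("a", 1), new Tuple2<>("b", 2), new Tuple2<>("a", 3)));

  // Values sharing a key are reduced pairwise: ("a", 4), ("b", 2)
  JavaPairRDD<String, Integer> sums = pairs.reduceByKey(Integer::sum);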
Shuffle Operations (Two Sources)
• cartesian(otherDataset): produce all n x m pairings across the two sides. For example, 4 suits x 13 ranks pair up into a full deck of cards.
T, U → (T, U)
• join(otherDataset, [numTasks]): join records that share a key; supports inner join and left/right/full outer join
(K, V), (K, W) → (K, (V, W))
• cogroup(otherDataset, [numTasks]): like groupByKey, but the two-source version
(K, V), (K, W) → (K, (Iterable<V>, Iterable<W>))
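A minimal join sketch (assuming sc as before; the data is made up for illustration):

  JavaPairRDD<String, Integer> ages = sc.parallelizePairs(Arrays.asList(
          new Tuple2<>("alice", 30), new Tuple2<>("bob", 25)));
  JavaPairRDD<String, String> cities = sc.parallelizePairs(Arrays.asList(
          new Tuple2<>("alice", "Taipei")));

  // Inner join on the key: ("alice", (30, "Taipei"))
  JavaPairRDD<String, Tuple2<Integer, String>> joined = ages.join(cities);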
Repartition Operations
• repartition(numPartitions): always shuffles
• coalesce(numPartitions): avoids a shuffle; can only reduce the number of partitions
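For instance (a sketch, assuming the words RDD from above):

  // Spread across 200 partitions, with a full shuffle
  JavaRDD<String> spread = words.repartition(200);

  // Shrink to one partition without shuffling, e.g. before writing a single output file
  JavaRDD<String> merged = spread.coalesce(1);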
Actions
• Write files
– saveAsTextFile(path)
• Return results to the driver
– first(): take the first element
– take(n): take the first n elements
– collect(): fetch all results
– count(): count how many results there are
– reduce(func): gather the data through a reducer
• Run directly inside the executors
– foreach(func): call back on each item inside the executors
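A few actions in Java (a sketch; the output path is hypothetical, and collect should only be used on small results):

  long n = words.count();              // triggers the pipeline and counts the elements
  List<String> all = words.collect();  // pulls every element back to the driver (java.util.List)
  words.saveAsTextFile("out");         // writes one file per partition under out/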
Word Count
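The slide shows the word-count code as an image; a minimal Java version along these lines (a sketch, with hypothetical input and output paths):

  JavaRDD<String> lines = sc.textFile("input.txt");

  JavaPairRDD<String, Integer> counts = lines
          .flatMap(line -> Arrays.asList(line.split(" ")).iterator())
          .mapToPair(word -> new Tuple2<>(word, 1))
          .reduceByKey(Integer::sum);

  counts.saveAsTextFile("counts");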
RDD Graph
(Diagram: the lineage graph of the word-count RDDs; a second view marks where each TASK, STAGE, and JOB sits in the graph.)
Shuffle
• The step where data is exchanged between partitions
• The data must first be in key-value form
• Records are grouped by their keys
• Records with the same key always land in the same partition
• This is exactly what MapReduce does
Shuffle
(Diagram. Source: MapReduce Shuffle原理 与 Spark Shuffle原理)
Job, Stage, Task
• An Application is created by spark-submit
• A Job is created by an action operation
• A Stage is created by a shuffle operation; different stages can have different numbers of tasks
• The number of Tasks is determined by the shuffle operation's task count or by the number of input partitions; a task is the smallest indivisible unit of parallel work
• Cardinality: one Cluster runs many Applications, one Application has many Jobs, one Job has many Stages, and one Stage has many Tasks
Operations
(Overview diagram, regrouped:)
• Inputs from the distributed file system: sc.textFile, sc.xxxFile; from the driver program: sc.parallelize
• Narrow transformations: map, flatMap, mapPartitions, filter
• Shuffle transformations: groupByKey, reduceByKey, aggregateByKey, repartition, cartesian, join, cogroup
• Actions returning data to the driver program: collect, first, take, count, reduce
• Actions writing to the distributed file system: saveAsTextFile, saveAsXxxFile
• Actions running inside the executors: foreach, foreachPartition
• Understand Spark basics
• Learn the common Spark DataFrame/SQL operations
• Write a Spark application
Spark DataFrame & Dataset
• DataFrame
– Like a table in an RDBMS
– Has a schema, which can be nested
– Is a Dataset<Row>
– Each row consists of many columns
• Dataset
– Dataset<T>
– A typed dataset
Reader and Writer
• Input/output sources
– RDD
– File
• Supported formats
– CSV
– JSON
– Parquet (recommended)
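A sketch of the reader/writer API (assuming a SparkSession named spark; the paths are hypothetical):

  import org.apache.spark.sql.Dataset;
  import org.apache.spark.sql.Row;
  import org.apache.spark.sql.SparkSession;

  SparkSession spark = SparkSession.builder().appName("io-example").getOrCreate();

  Dataset<Row> df = spark.read()
          .option("header", "true")   // treat the first CSV line as the header
          .csv("population.csv");

  df.write().parquet("population.parquet"); // Parquet is the recommended format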
DataFrame Operations
• select(column…)
• distinct()
• join(right, column)
• where(column)
• groupBy(columns…)
• agg(column...)
• orderBy(column…)
DataFrame Functions
• import static org.apache.spark.sql.functions.*
• Normal Functions
– col(name)
• Aggregation Functions
– min(column)
– max(column)
– count(column)
– sum(column)
– avg(column)
DataFrame Operations
• Data: a population table with columns year, region, and people_total (shown as a table on the slide)
DataFrame Operations
• SQL
SELECT year, region, SUM(people_total) AS people_total
FROM population GROUP BY year, region ORDER BY people_total DESC
• Spark DataFrame (the equivalent code, shown on the slide)
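A hedged Java equivalent of the SQL above (assuming population is a Dataset<Row> and org.apache.spark.sql.functions is statically imported):

  Dataset<Row> result = population
          .groupBy(col("year"), col("region"))
          .agg(sum("people_total").as("people_total"))
          .orderBy(col("people_total").desc());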
DataFrame Schema
• Defining a schema
– JavaBean with an Encoder
– Specified programmatically
– From the metastore (Hive)
– Inferred from the file contents
• Inspecting a schema
– df.printSchema()
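A sketch of the programmatic option (assuming spark is a SparkSession; the path is hypothetical):

  import org.apache.spark.sql.types.DataTypes;
  import org.apache.spark.sql.types.StructType;

  StructType schema = new StructType()
          .add("year", DataTypes.IntegerType)
          .add("region", DataTypes.StringType)
          .add("people_total", DataTypes.LongType);

  Dataset<Row> df = spark.read().schema(schema).csv("population.csv");
  df.printSchema();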
Spark SQL
• Query DataFrames with SQL syntax
• SQL is a declarative language, so a built-in optimization engine turns it into physical DataFrame operations
• The output is yet another DataFrame
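A minimal sketch (assuming spark is a SparkSession and population is the Dataset<Row> from earlier):

  population.createOrReplaceTempView("population");

  Dataset<Row> result = spark.sql(
          "SELECT year, region, SUM(people_total) AS people_total " +
          "FROM population GROUP BY year, region ORDER BY people_total DESC");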
1. Understand Spark basics
2. Learn the common Spark DataFrame/SQL operations
3. Write a Spark application
Spark Application
• Packaged as a single application jar
• Run via spark-submit
• Submitting requires specifying a master
• The master represents a resource manager, a.k.a. a cluster manager; after submission, the application acquires the resources it needs from that resource manager
• The Spark application interacts with those resources through the Spark context
Uber jar
• The application jar has to be shipped to every executor, so how do the libraries it uses get shipped as well?
• Unpack every dependency jar and repackage the classes directly into the application jar; this is called an uber jar
• Also known as a fat jar or shadow jar
Spark Template Project
• https://github.com/popcornylu/spark-wordcount
• Commands
– Application jar:
  ./gradlew jar
  spark-submit --master local[*] build/libs/spark-wordcount.jar
– Application uber jar:
  ./gradlew shadowJar
  spark-submit --master local[*] build/libs/spark-wordcount-all.jar
Resource Manager
• Local
• Standalone cluster
• YARN cluster
• Mesos cluster
Spark Web UI
• By default, a running Spark application serves a Web UI (ports 4040, 4041, ...)
• Use it to watch the progress of Jobs, Stages, and Tasks
• A great debugging tool
History Server
• The Web UI only shows Spark applications that are currently running
• The history server, however, lets you view the records of applications that have already finished
Configurations
• conf/log4j.properties: log configuration. You can change the default log level from INFO to WARN
• conf/core-site.xml: file system configuration. Set this up if you use a DFS
• conf/spark-defaults.conf: default application configuration, e.g. the default master, or enabling history logging by default
• conf/spark-env.sh: default environment variables, mainly for the various daemons
Recap
• Spark is a distributed computation engine
• Built on RDDs, with inputs, transformations, and actions
• Executing an action produces a Job; a Job may contain many Stages, and each Stage has its own number of Tasks
• How shuffle works
• Spark DataFrame and Spark SQL
• How to write a Spark application