Memory in Apache Spark - Memory Design Techniques to Keep Applications from Crashing

3. About the speaker
• 2011/04
• 2015/09
• Druid (KDP, 2015)
• RDB / NoSQL (2016; HBase)
• ESP8266 Wi-Fi IoT (KDP, 2016)
• (WebDB Forum 2014)
• Spark Streaming (Spark Meetup December 2015)
• Kafka AWS Kinesis (Apache Kafka Meetup Japan #1; 2016)
• (FutureOfData; 2016)
• Queryable State for Kafka Streams (Apache Kafka Meetup Japan #2; 2016)
8. Apache Spark: typical failure messages
"Lost executor X on xxxx: remote Akka client disassociated"
"Container marked as failed: container_xxxx on host: xxxx. Exit status: 1"
"Container killed by YARN for exceeding memory limits"
"shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[Remote]"
13. $ spark-submit \
      --MEMORY_OPTIONS1 \
      --MEMORY_OPTIONS2 \
      --MEMORY_OPTIONS3 \
      --conf ADDITIONAL_OPTIONS1 \
      --conf ADDITIONAL_OPTIONS2 \
      --class jp.co.recruit.app.Main \
      spark-project-1.0-SNAPSHOT.jar
14. Apache Spark: Heap
On-heap:
--executor-memory XXG or
--conf spark.executor.memory=XXG
Off-heap (alongside local disk):
--conf spark.memory.offHeap.size=XXX
15. Apache Spark: Executor
(Diagram: each Executor owns its own on-heap and off-heap areas; they share the node's disk with the OS and other apps.)
16. Apache Spark: Container
(Diagram: on Mesos/YARN, each Executor with its on-heap and off-heap areas runs inside a container that also reserves memory overhead; the OS and other apps use the rest of the node.)
17. Apache Spark: Overhead
On-heap:
--executor-memory XXG or
--conf spark.executor.memory=XXG
Overhead (off-heap, outside the executor heap):
--conf spark.mesos.executor.memoryOverhead
--conf spark.yarn.executor.memoryOverhead
= max(XXG * 0.10, 384MB)
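Putting the two numbers together, the container requested from the cluster manager is the executor heap plus this overhead. A minimal sketch of the default calculation (the 10% factor and 384MB floor are the documented defaults in Spark 1.6-2.x; the function names are hypothetical):

```python
# Sketch of how an executor container is sized (Spark 1.6-2.x defaults):
# max(10% of the executor heap, 384MB) is added on top of --executor-memory.

def overhead_for(executor_memory_mb: int) -> int:
    """Default memoryOverhead: max(heap * 0.10, 384MB)."""
    return max(int(executor_memory_mb * 0.10), 384)

def container_size_mb(executor_memory_mb: int) -> int:
    """Total memory the container requests from YARN/Mesos."""
    return executor_memory_mb + overhead_for(executor_memory_mb)

# An 8GB executor actually asks for 8192MB + 819MB:
print(container_size_mb(8192))   # 9011
# A small 1GB executor hits the 384MB floor instead:
print(container_size_mb(1024))   # 1408
```

This is why an executor configured at exactly the container limit gets killed by YARN: the overhead is added on top of the heap, not carved out of it.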
23. Apache Spark: Project Tungsten
Project Tungsten manages memory explicitly across on-heap, off-heap, and disk.
25. Apache Spark: User Memory
(On-heap = 300MB reserved + Memory Fraction + User Memory; off-heap and disk are separate.)
--conf spark.memory.fraction=0.6
• Memory Fraction: the share of (heap - 300MB) that Spark itself manages (default 0.6)
• User Memory: the remainder, left for user data structures and internal metadata
26. Apache Spark: Execution / Storage
--conf spark.memory.storageFraction=0.5
Memory Fraction is split into a Storage Fraction and an Execution Fraction (default 50:50); User Memory, the 300MB reserve, off-heap, and disk sit outside it.
27. Apache Spark: Execution / Storage
• Storage Fraction: cached RDD blocks, Broadcast variables, Accumulators
• Execution Fraction: buffers for Shuffle, Join, Sort, and Aggregate
28. Apache Spark: Unified Memory
Since Spark 1.6, the Storage and Execution fractions form a unified region: either side can borrow unused memory from the other.
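The on-heap split described on slides 25-28 can be reproduced numerically. A sketch under the documented defaults (300MB reserved, spark.memory.fraction=0.6, spark.memory.storageFraction=0.5; the function name is illustrative):

```python
# Sketch of Spark's unified on-heap memory split (Spark 1.6+ defaults).
RESERVED_MB = 300  # fixed reserved memory, not configurable in practice

def memory_regions(heap_mb, fraction=0.6, storage_fraction=0.5):
    """Return (user, storage, execution) memory in MB for a given heap."""
    usable = heap_mb - RESERVED_MB
    unified = usable * fraction           # execution + storage region
    user = usable - unified               # user data structures, metadata
    storage = unified * storage_fraction  # cached blocks (can be borrowed)
    execution = unified - storage         # shuffle/join/sort/aggregate buffers
    return user, storage, execution

# For an 8GB (8192MB) executor heap:
user, storage, execution = memory_regions(8192)
print(round(user), round(storage), round(execution))  # 3157 2368 2368
```

Note that the storage/execution boundary is soft: the 2368MB each is only the default split point, not a hard limit.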
39. JVM: Garbage Collection
-XX:+UseConcMarkSweepGC
// use the CMS collector for the old generation
-XX:+UseParNewGC
// collect the young generation in parallel
-XX:+CMSParallelRemarkEnabled
// parallelize the CMS remark phase
-XX:+DisableExplicitGC
// ignore explicit System.gc() calls
40. JVM: Garbage Collection
-XX:+HeapDumpOnOutOfMemoryError
// dump the heap when an OutOfMemoryError occurs
-XX:+PrintGCDetails
// log detailed GC information
-XX:+PrintGCDateStamps
// timestamp each GC log entry
-XX:+UseGCLogFileRotation
// rotate the GC log files
41. JVM
$ spark-submit \
    --executor-memory 8GB \
    --num-executors 20 \
    --executor-cores 2 \
    --conf "spark.executor.extraJavaOptions=..." \
    --conf spark.memory.offHeap.enabled=true \
    --conf spark.memory.offHeap.size=1073741824 \
    --class jp.co.recruit.app.Main \
    spark-project-1.0-SNAPSHOT.jar
42. How we can help ourselves not to stop our applications
46. Estimating RDD size (1)
• SizeEstimator
$ spark-shell
> import org.apache.spark.util.SizeEstimator
> SizeEstimator.estimate("1234")
res0: Long = 48
> val rdd = sc.makeRDD(
(1 to 100000).map(e => e.toString).toSeq)
> SizeEstimator.estimate(rdd)
res2: Long = 7246792
47. Estimating RDD size (2)
• the Storage panel in the Web UI
> SizeEstimator.estimate(rdd)
res2: Long = 7246792
> rdd.persist(StorageLevel.MEMORY_ONLY)
48. Persisting RDDs
> import org.apache.spark.storage.StorageLevel
> val orders = sc.textFile("lineorder.csv")
orders: org.apache.spark.rdd.RDD[String] = ...
> val result = orders.map(...)
result: org.apache.spark.rdd.RDD[String] = ...
> orders.persist(StorageLevel.MEMORY_ONLY)
> result.persist(StorageLevel.MEMORY_AND_DISK)
51. 16/12/09 14:34:06 WARN MemoryStore: Not enough
space to cache rdd_1_39 in memory! (computed
44.4 MB so far)
16/12/09 14:34:06 WARN BlockManager: Block
rdd_1_39 could not be removed as it was not
found on disk or in memory
16/12/09 14:34:06 WARN BlockManager: Putting
block rdd_1_39 failed
59. Memory design: Executor
• [A] Storage Fraction = total size of the RDDs to cache
• [B] Execution Fraction = roughly the same as A
• [C] On-heap = (A + B) / 0.6 + 300MB // 0.6 = spark.memory.fraction; the rest becomes User Memory
• [D] Off-heap = size of data held off-heap
• [E] Overhead = max(C * 0.1, 384MB) // container overhead
• [F] number of Containers (Executors) per node
• [G] memory reserved for the OS and other apps
• [H] physical memory of the node
(C + D + E) * F + G < H
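The checklist above can be turned into a quick feasibility check. A sketch using the same letters A-H (the function name and the example workload numbers are illustrative, not from the talk):

```python
# Feasibility check for the sizing inequality (C + D + E) * F + G < H.
# All sizes in MB.

def node_fits(a_storage, b_execution, d_offheap, f_executors,
              g_os_mb, h_node_mb, fraction=0.6, reserved_mb=300):
    # [C] on-heap needed so that A + B fits in the unified 60% region,
    #     plus the 300MB reserve (the other 40% becomes User Memory)
    c_onheap = (a_storage + b_execution) / fraction + reserved_mb
    # [E] container overhead: max(10% of heap, 384MB)
    e_overhead = max(c_onheap * 0.1, 384)
    needed = (c_onheap + d_offheap + e_overhead) * f_executors + g_os_mb
    return needed < h_node_mb

# Example: cache 2GB, execute 2GB, 1GB off-heap, 4 executors per node,
# 4GB reserved for the OS, on a 64GB node:
print(node_fits(2048, 2048, 1024, 4, 4096, 65536))   # True
# The same workload with 16 executors per node no longer fits:
print(node_fits(2048, 2048, 1024, 16, 4096, 65536))  # False
```

Working the inequality backwards like this, before submitting, is the point of the slide: the container killer strikes exactly when the left-hand side silently grows past H.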
60. Memory design: what about the Driver?
The Driver has its own memory and overhead settings:
--conf spark.mesos.driver.memoryOverhead
--conf spark.yarn.driver.memoryOverhead
--driver-memory or
--conf spark.driver.memory
--conf spark.driver.maxResultSize=1G
Actions (collect, reduce, take, ...) return their results to the Driver, so size the Driver heap and maxResultSize for them as well.
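Since every action ships its result back to the Driver, it is worth estimating the payload against spark.driver.maxResultSize before running the job. A rough back-of-the-envelope sketch (the per-row size is an assumption you would measure with SizeEstimator, as on slide 46; the function name is hypothetical):

```python
# Rough pre-check: will a collect() fit under spark.driver.maxResultSize?

def result_fits(num_rows, bytes_per_row, max_result_size_bytes=1 << 30):
    """Compare an estimated collect() payload against the 1G default limit."""
    return num_rows * bytes_per_row < max_result_size_bytes

# 10 million rows at 48 bytes each (the SizeEstimator figure for a short
# string on slide 46) is ~480MB and fits under the default 1G:
print(result_fits(10_000_000, 48))   # True
# 30 million rows at the same width would be aborted by the limit:
print(result_fits(30_000_000, 48))   # False
```

When the check fails, either raise spark.driver.maxResultSize and --driver-memory together, or avoid collecting the full dataset at all (e.g. write it out with saveAsTextFile instead).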