
Spark day 2017 - Spark on Kubernetes


  1. Speaker: Senior Software Engineer at SK Telecom. Commercial products: Big Data Discovery Solution (~’17), Hadoop DW (~’15), PaaS (CloudFoundry) (~’13), IaaS (OpenStack) (~’13). Mail to: jerryjung@apache.org
  2. Agenda: Kubernetes; Spark deployment using Kubernetes; Spark on Kubernetes; Demo
  3. Kubernetes: an open-source automation framework for deploying, managing, and scaling applications.
  4. Kubernetes provides a common API and a self-healing framework that automatically handles machine failures and simplifies application deployment, logging, and monitoring.
  5. https://github.com/kubernetes/kubernetes/blob/master/docs/design/architecture.md
  6. https://thenewstack.io/kubernetes-an-overview/
  7. https://thenewstack.io/kubernetes-an-overview/
  8. https://thenewstack.io/kubernetes-an-overview/
  9. Kubernetes concepts: Clusters - a set of compute, storage, and network resources; Pods - a colocated group of application containers that share volumes and a networking stack; Replication Controllers - ensure that a specific number of pods is running, manage pods and status updates; Services - cluster-wide service discovery.
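     A minimal sketch of how these objects look from the command line (the rc name below is hypothetical, and kubectl flags vary slightly by version):
       kubectl get nodes                      # the cluster: compute/storage/network resources
       kubectl get pods -o wide               # pods, each with its own IP and shared volumes
       kubectl get rc                         # replication controllers: desired vs. current pod count
       kubectl scale rc my-rc --replicas=3    # hypothetical rc name; keep exactly 3 pods running
       kubectl get svc                        # services: stable, cluster-wide endpoints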
  10. Diagram: pod networking - Node #1 (192.168.0.2) runs Pod #1 (10.0.0.2) and Pod #2 (10.0.0.3); Node #5 (192.168.0.6) runs Pod #8 (10.0.0.9); each pod has its own volumes, network namespace, and IP, and exposes port 8080.
  11. Why Spark: support for event stream processing; fast data queries in real time; improved programmer productivity; fast batch processing of large data sets.
  12. Spark cluster terminology - Driver: the process that contains the SparkContext; Executor: a process that executes one or more Spark tasks; Master: the process that manages applications across the cluster; Worker: a process that manages executors on a particular node. http://spark.apache.org/docs/latest/cluster-overview.html
  13. Diagram: the Driver Program (holding the SparkContext) talks to a Cluster Manager, which launches Executors on Worker Nodes.
  14. Cluster mode vs. client mode: http://freecontent.manning.com/running-spark-an-overview-of-sparks-runtime-architecture/
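     A minimal sketch of the difference on the command line (class name, master URL, and jar are placeholders): in client mode the driver runs in the spark-submit process on the submitting machine; in cluster mode the driver is launched inside the cluster.
       # client mode: driver stays on the machine that runs spark-submit
       bin/spark-submit --master yarn --deploy-mode client --class com.example.MyApp my-app.jar
       # cluster mode: driver runs on a node managed by the cluster manager
       bin/spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp my-app.jar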
  15. DaemonSet: https://www.slideshare.net/grahaindia/new-features-of-kubernetes-v120-beta
  16. StatefulSets - pods get stable names of the form $(statefulset name)-$(ordinal). http://blog.kubernetes.io/2017/01/running-mongodb-on-kubernetes-with-statefulsets.html
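     For example (the name and label are illustrative), a StatefulSet named mongo with three replicas produces stably named pods:
       kubectl get pods -l app=mongo
       # -> mongo-0, mongo-1, mongo-2, i.e. $(statefulset name)-$(ordinal)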
  17. kube-yarn: https://github.com/Comcast/kube-yarn
  18. Diagram: Hadoop on Kubernetes - the HDFS NameNode and DataNodes plus the YARN ResourceManager and NodeManagers run as pods across Nodes #1..#n; a Zeppelin pod issues spark-submit.
  19. (image-only slide)
  20. Diagram: the same Hadoop-on-Kubernetes layout, but the Zeppelin pod has failed (marked X) while spark-submit is in flight.
  21. (image-only slide)
  22. (image-only slide)
  23. (image-only slide)
  24. https://github.com/kubernetes/kubernetes/issues/34377 and https://issues.apache.org/jira/browse/SPARK-18278
  25. https://spark-summit.org/2017/events/apache-spark-on-kubernetes/
  26. https://spark-summit.org/2017/events/apache-spark-on-kubernetes/
  27. SPARK-18278 - Spark on Kubernetes Design Proposal.pdf
  28. (image-only slide)
  29. Diagram: external shuffle service - an external shuffle plugin runs on each Node Manager (#1..N) and keeps the RDD intermediate shuffle files outside the executors, so executors can be added and removed freely. Target workloads: long-running ETL jobs, interactive applications or servers, and any application with large shuffles.
  30. Enabling the external shuffle service on YARN: (1) add the shuffle plugin jar to each NodeManager's classpath, (2) register the plugin in yarn-site.xml:
        <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle,spark_shuffle</value>
        </property>
        <property>
          <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
          <value>org.apache.spark.network.yarn.YarnShuffleService</value>
        </property>
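     A sketch of step (1) under assumed paths (the jar location and the NodeManager library directory depend on the Spark build and Hadoop layout):
       # copy the Spark YARN shuffle jar onto every NodeManager's classpath, then restart the NodeManagers
       cp $SPARK_HOME/yarn/spark-2.1.0-yarn-shuffle.jar /usr/lib/hadoop-yarn/lib/
       # restart, e.g. yarn-daemon.sh stop nodemanager && yarn-daemon.sh start nodemanager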
  31. (3) Edit spark-defaults.conf to enable dynamic allocation against the shuffle service (cf. Mesos coarse-grained mode); see http://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation:
        spark.dynamicAllocation.enabled true
        spark.shuffle.service.enabled true
        spark.dynamicAllocation.minExecutors 50
        spark.dynamicAllocation.maxExecutors 100
        spark.dynamicAllocation.initialExecutors 50
        spark.dynamicAllocation.schedulerBacklogTimeout 5s
        spark.dynamicAllocation.executorIdleTimeout 60s
  32. https://spark-summit.org/2017/events/apache-spark-on-kubernetes/ (items 1, 2, 3, walked through on the following slides)
  33. (1) https://www.slideshare.net/cfregly/spark-on-kubernetes-advanced-spark-and-tensorflow-meetup-jan-19-2017-anirudh-ramanthan-from-google-kubernetes-team
  34. Basic submission of SparkPi to Kubernetes:
        bin/spark-submit \
          --deploy-mode cluster \
          --class org.apache.spark.examples.SparkPi \
          --master k8s://https://{k8s address} \
          --kubernetes-namespace default \
          --conf spark.executor.instances=5 \
          --conf spark.app.name=spark-pi \
          --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.2.0 \
          --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.2.0 \
          --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.2.0 \
          local:///opt/spark/examples/jars/spark-examples_2.11-2.1.0-k8s-0.2.0-SNAPSHOT.jar
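     After submission the driver and executors show up as pods; a hedged sketch of watching them (the generated driver pod name is a placeholder):
       kubectl get pods --namespace default -w                            # watch driver and executor pods start
       kubectl logs -f spark-pi-<driver-pod-suffix> --namespace default   # stream the driver output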
  35. (2) https://www.slideshare.net/cfregly/spark-on-kubernetes-advanced-spark-and-tensorflow-meetup-jan-19-2017-anirudh-ramanthan-from-google-kubernetes-team
  36. Shuffle service deployed as a DaemonSet (one shuffle pod per node):
        apiVersion: extensions/v1beta1
        kind: DaemonSet
        metadata:
          labels:
            app: spark-shuffle-service
            spark-version: 2.1.0
          name: shuffle
        spec:
          template:
            metadata:
              labels:
                app: spark-shuffle-service
                spark-version: 2.1.0
            spec:
              volumes:
                - name: temp-volume
                  hostPath:
                    path: '/var/tmp'  # change this path according to your cluster configuration
              containers:
                - name: shuffle
                  image: kubespark/spark-shuffle:v2.1.0-kubernetes-0.2.0
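     A usage sketch, assuming the manifest above is saved locally as shuffle-daemonset.yaml (the file name is an assumption):
       kubectl create -f shuffle-daemonset.yaml --namespace default
       kubectl get pods -l app=spark-shuffle-service --namespace default   # expect one shuffle pod per node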
  37. Submission with dynamic allocation against the shuffle service DaemonSet:
        bin/spark-submit \
          --deploy-mode cluster \
          --class org.apache.spark.examples.GroupByTest \
          --master k8s://https://{k8s address} \
          --kubernetes-namespace default \
          --conf spark.app.name=group-by-test \
          --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.2.0 \
          --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.2.0 \
          --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.2.0 \
          --conf spark.dynamicAllocation.enabled=true \
          --conf spark.shuffle.service.enabled=true \
          --conf spark.kubernetes.shuffle.namespace=default \
          --conf spark.kubernetes.shuffle.labels="app=spark-shuffle-service,spark-version=2.1.0" \
          local:///opt/spark/examples/jars/spark-examples_2.11-2.1.0-k8s-0.2.0-SNAPSHOT.jar 10 40000 2
  38. (3) https://spark-summit.org/2017/events/apache-spark-on-kubernetes/
  39. Resource staging server - a Deployment (pod template truncated on the slide) plus a NodePort Service:
        ---
        apiVersion: extensions/v1beta1
        kind: Deployment
        metadata:
          name: spark-resource-staging-server
        spec:
          replicas: 1
        ---
        apiVersion: v1
        kind: Service
        metadata:
          name: spark-resource-staging-service
        spec:
          type: NodePort
          selector:
            resource-staging-server-instance: default
          ports:
            - protocol: TCP
              port: 10000
              targetPort: 10000
              nodePort: 31000
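     A usage sketch, assuming the manifests above are saved as resource-staging-server.yaml (the file name is an assumption):
       kubectl create -f resource-staging-server.yaml --namespace default
       kubectl get svc spark-resource-staging-service --namespace default   # NodePort 31000 is reachable on every node IP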
  40. Submission that ships a local jar through the resource staging server:
        bin/spark-submit \
          --deploy-mode cluster \
          --class org.apache.spark.examples.SparkPi \
          --master k8s://{k8s address} \
          --kubernetes-namespace default \
          --conf spark.executor.instances=5 \
          --conf spark.app.name=spark-pi \
          --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.2.0 \
          --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.2.0 \
          --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.2.0 \
          --conf spark.kubernetes.resourceStagingServer.uri=http://{node ip}:31000 \
          examples/jars/spark-examples_2.11-2.1.0-k8s-0.2.0-SNAPSHOT.jar
  41. (image-only slide)
  42. (image-only slide)
  43. https://spark-summit.org/2017/events/hdfs-on-kubernetes-lessons-learned/
  44. https://spark-summit.org/2017/events/hdfs-on-kubernetes-lessons-learned/
  45. https://spark-summit.org/2017/events/hdfs-on-kubernetes-lessons-learned/
  46. https://spark-summit.org/2017/events/hdfs-on-kubernetes-lessons-learned/ and https://github.com/apache-spark-on-k8s/kubernetes-HDFS
  47. (image-only slide)
  48. (image-only slide)
  49. (image-only slide)
