
Deep Dive with Cloud Monitoring with Amazon EKS and Prometheus


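Once workloads move to the cloud, monitoring becomes a critical concern. Prometheus, a CNCF graduated project, is clearly mature and popular, which has made it the first-choice open-source monitoring component. This talk explains how to install Prometheus on Amazon EKS and how to use it to monitor AWS components, including Amazon EKS itself and applications running on On-Demand EC2 instances.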



  1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. SUMMIT
     Deep Dive in Cloud Monitoring with Amazon EKS and Prometheus
     Pahud Hsieh, Specialist SA, Serverless, Amazon Web Services
     Kakashi Liu, Infra Lead, UmboCV
  2. Amazon EKS in the Past Year
     ● Started in us-east-1 and us-west-2
     ● Released VPC CNI 1.0
     ● HIPAA support
     ● Released AMI build scripts on GitHub
     ● Released VPC CNI 1.1
     ● Enabled GPU support
     ● Support for API aggregation
     ● Support for HPA
     ● Support for eu-west-1
     ● CLI support for writing the kubeconfig
     ● Support for admission controllers
  3. Amazon EKS in the Past Year (continued)
     ● Released VPC CNI 1.2
     ● Allowed additional VPC CIDR ranges
     ● Support for us-east-2
     ● Official support for ALB Ingress
     ● Container Marketplace
     ● Cloud Map integration
     ● Support for AWS App Mesh
     ● Support for eu-central-1, ap-southeast-1, ap-southeast-2, ap-northeast-1
     ● Support for ap-northeast-2
     ● Added the SLA
  4. Immediately after that
     ● Achieved ISO and PCI compliance
     ● Support for ap-south-1, eu-west-2, eu-west-3
     ● Released VPC CNI 1.3
     ● Added a new quickstart
     ● Allowed private API endpoints
     ● Launched an App Mesh controller at GA
     ● Public preview of Windows nodes
     ● Launched Deep Learning containers
     ● Added 1.2 with a new cluster update API
     ● Released CSI drivers for FSx and EFS
     ● Control plane logs
     ● Public preview of A1 instances
     ● Released a machine learning benchmark tool
  5. CloudWatch Container Insights (preview)
  6. Dimensions for Kubernetes
     • Clusters
     • Nodes
     • Services
     • Namespaces
     • Pods
  7. Pod metrics
     • pod_cpu_reserved_capacity
     • pod_cpu_utilization
     • pod_cpu_utilization_over_pod_limit
     • pod_memory_reserved_capacity
     • pod_memory_utilization
     • pod_memory_utilization_over_pod_limit
     • pod_network_rx_bytes
     • pod_network_tx_bytes
  8. Other metrics
     • cluster_failed_node_count
     • cluster_node_count
     • namespace_number_of_running_pods
     • node_cpu_limit
     • node_cpu_reserved_capacity
     • node_cpu_usage_total
     • node_cpu_utilization
     • node_filesystem_utilization
     • node_memory_limit
     • node_memory_reserved_capacity
     • node_memory_utilization
     • node_memory_working_set
     • node_network_total_bytes
     • node_number_of_running_containers
     • node_number_of_running_pods
     • service_number_of_running_pods
     Reference: https://amzn.to/2HFtHDt
  9. Threshold and alarm actions
  10. Amazon EKS and Prometheus
      Why Prometheus?
      • Community
      • Number of integrations
      • Ease of use
      Why not Prometheus?
      • You manage it yourself
      • Complexity in large setups
      Possibility: a hybrid approach
      • Use Prometheus to collect metrics exposed on /metrics endpoints
      • Send a subset of critical metrics to Amazon CloudWatch or a third-party solution
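The hybrid approach above can be sketched in a few lines, under stated assumptions: Prometheus keeps every scraped sample, and only an allow-listed subset is reshaped into the MetricName/Dimensions/Value structure that CloudWatch's PutMetricData accepts. The Sample type, the function name and the allowlist are hypothetical; this does not call AWS.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    """One scraped Prometheus sample (hypothetical type for this sketch)."""
    name: str
    labels: dict[str, str]
    value: float


def to_cloudwatch_metric_data(samples: list[Sample], allowlist: set[str]) -> list[dict]:
    """Keep only allow-listed metrics and shape them like CloudWatch MetricData entries.

    Each Prometheus label becomes a CloudWatch dimension. CloudWatch limits the
    number of dimensions per metric, so high-cardinality labels should be dropped
    before this step.
    """
    metric_data = []
    for s in samples:
        if s.name not in allowlist:
            continue  # non-critical metrics stay in Prometheus only
        dimensions = [{"Name": k, "Value": v} for k, v in sorted(s.labels.items())]
        metric_data.append({"MetricName": s.name, "Dimensions": dimensions, "Value": s.value})
    return metric_data
```

The resulting list could be passed as `MetricData` to boto3's CloudWatch `put_metric_data(Namespace=..., MetricData=...)`. For counters you would usually forward a rate rather than the raw cumulative value.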
  12. Hello! I am Kakashi
      • Infra Lead @ Umbo CV
      • Co-organizer @ Golang Taipei Gathering
  14. Traditional Solutions; Umbo Light
  15. Agenda
      • Why monitoring
      • Umbo CV monitoring pipeline
      • Prometheus: why and what
      • Prometheus with EKS
      • Use cases
  16. Why monitoring
  17. Why monitoring
      • Alerting
      • Long-term trends
  18. Umbo CV monitoring pipeline
  19. Monitoring types
      • Infrastructure
      • Application
  20. Application monitoring
      Diagram: exporters running alongside containers on EC2 instances expose metrics on /metrics endpoints; a metrics store collects (scrapes) them and raises alerts. Flow: Expose metrics → Collect → Alert.
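A minimal sketch of the "Expose" step, using only Python's standard library: a hypothetical exporter keeps counters in memory and serves them on /metrics in the Prometheus text format, ready to be collected. The names inc, render and MetricsHandler are illustrative. Production exporters normally use an official client library, which also emits `# HELP`/`# TYPE` lines and escapes label values (omitted here).

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# (metric name, sorted label pairs) -> counter value; shared by all requests.
COUNTERS: dict[tuple[str, tuple[tuple[str, str], ...]], float] = {}
LOCK = threading.Lock()


def inc(name: str, labels: dict[str, str], amount: float = 1.0) -> None:
    """Increment a counter; label order does not matter."""
    key = (name, tuple(sorted(labels.items())))
    with LOCK:
        COUNTERS[key] = COUNTERS.get(key, 0.0) + amount


def render() -> str:
    """Render all counters as `name{label="value",...} value` lines."""
    lines = []
    with LOCK:
        for (name, labels), value in sorted(COUNTERS.items()):
            if labels:
                label_str = ",".join(f'{k}="{v}"' for k, v in labels)
                lines.append(f"{name}{{{label_str}}} {value}")
            else:
                lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current counters on GET /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To run it, start `HTTPServer(("", 8000), MetricsHandler).serve_forever()` (port 8000 is an arbitrary choice) and point a scrape job at port 8000.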
  21. Prometheus: Why and What
      ● A graduated project of the CNCF.
      ● Handles multi-dimensional metrics.
      ● Performance: can ingest millions of samples per second.
      ● Powerful query language: PromQL.
      ● Built-in alerting and service discovery.
  22. Prometheus metrics
      Each EC2 instance exposes /metrics; a user request increments a counter such as:
        http_requests_total{code="200", path="/api/user"} 10
      The parts are metric_name{labels} value. Label values are always quoted strings, so the status code is written code="200".
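To show how the metric_name{labels} value structure breaks down, here is a small parser for a single exposition line, a sketch assuming simple input: optional timestamps are ignored, a `}` inside a label value is not supported, and escaped characters are kept verbatim. It rejects unquoted label values such as code=200, which Prometheus also refuses to parse.

```python
import re

LABEL = r'([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*"((?:[^"\\]|\\.)*)"'
LABEL_RE = re.compile(LABEL)
# Whole label block: zero or more label pairs separated by optional commas.
LABELS_FULL_RE = re.compile(r'\s*(?:' + LABEL + r'\s*,?\s*)*')
# metric_name{labels} value [timestamp]; the timestamp is ignored.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)')


def parse_sample(line: str) -> tuple[str, dict[str, str], float]:
    """Split one exposition line into (metric name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, raw_labels, value = match.groups()
    if raw_labels and not LABELS_FULL_RE.fullmatch(raw_labels):
        raise ValueError(f"malformed labels: {raw_labels!r}")
    labels = dict(LABEL_RE.findall(raw_labels or ""))
    return name, labels, float(value)
```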
  23. PromQL examples
      Total requests per second:
        sum(rate(http_requests_total[5m]))
      Total 5xx requests per second:
        sum(rate(http_requests_total{code=~"5.*"}[5m]))
      Current error ratio across all instances (a fraction between 0 and 1; multiply by 100 for a percentage):
        sum(rate(http_requests_total{code=~"5.*"}[5m])) / sum(rate(http_requests_total[5m]))
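To make the arithmetic behind these queries concrete, here is a simplified model in Python. simple_rate is a hypothetical helper: it computes a counter's per-second increase over a window and treats a drop in value as a counter reset, as rate() does. Unlike PromQL's rate(), it does not extrapolate to the edges of the window. error_ratio mirrors the third query: the 5xx rate divided by the total rate.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase across (timestamp_seconds, counter_value) samples."""
    if len(samples) < 2:
        return 0.0  # PromQL would return no result here
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A lower value means the counter restarted from zero.
        increase += cur - prev if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])


def error_ratio(series_by_code: dict[str, list[tuple[float, float]]]) -> float:
    """5xx rate / total rate, summed over all status-code series."""
    total = sum(simple_rate(s) for s in series_by_code.values())
    # PromQL regexes are fully anchored, so code=~"5.*" means "starts with 5".
    errors = sum(simple_rate(s) for code, s in series_by_code.items() if code.startswith("5"))
    return errors / total if total else 0.0  # PromQL would yield NaN on 0/0
```

For example, 9 successful and 1 failing request per second gives an error ratio of 0.1, i.e. 10%.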
  24. Alerting rule
      alert: Percentage_Of_Errors_Is_High
      expr: sum(rate(http_requests_total{code=~"5.*"}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
      for: 5m
      labels:
        severity: critical
      The expression yields a fraction between 0 and 1, so a 5% error threshold is written as 0.05; with "> 5" the alert could never fire.
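The `for: 5m` clause means the condition must hold on every evaluation for five minutes before the alert fires; until then Prometheus reports it as pending. The sketch below replays hypothetical (timestamp, error ratio) evaluations through that logic. Because the ratio is a fraction between 0 and 1, the default threshold is 0.05 (5%); a threshold of 5 could never be exceeded.

```python
def alert_states(evaluations: list[tuple[float, float]],
                 threshold: float = 0.05,
                 for_seconds: float = 300) -> list[str]:
    """State after each evaluation: "inactive", "pending" or "firing"."""
    states = []
    active_since = None  # when the condition most recently became true
    for ts, value in evaluations:
        if value > threshold:
            if active_since is None:
                active_since = ts
            # Fire only once the condition has held for the whole `for` duration.
            states.append("firing" if ts - active_since >= for_seconds else "pending")
        else:
            active_since = None  # any false evaluation resets the clock
            states.append("inactive")
    return states
```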
  25. Prometheus with EKS
  26. Prometheus ❤ EKS
      ● The monitoring system is critical.
      ● Running Prometheus on Kubernetes makes high availability (HA) easy to achieve.
      ● The Prometheus Operator makes it even easier:
        ○ Automated management and upgrades of Prometheus.
        ○ Native Kubernetes configuration.
  27. Install Prometheus on EKS with Helm
      1. Install the Prometheus Operator chart:
         $ helm install --name prom --namespace monitoring stable/prometheus-operator
      2. Verify:
         $ kubectl --namespace monitoring get pods
         NAME                                                  READY   STATUS    RESTARTS   AGE
         alertmanager-prom-op-alertmanager-0                   2/2     Running   0          1m
         prometheus-prom-op-prometheus-0                       3/3     Running   1          1m
         prom-op-grafana-5c59ddfb9d-zqfqt                      2/2     Running   0          2m
         prom-op-kube-state-metrics-76786cc9b4-8q4bj           1/1     Running   0          2m
         prom-op-prometheus-node-exporter-6jclc                1/1     Running   0          2m
         prom-op-prometheus-node-exporter-bxr49                1/1     Running   0          2m
         prom-op-prometheus-operato-operator-6cbf5d5cfd-z6fz4  1/1     Running   0          2m
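The verification step amounts to checking that every pod's READY column reads n/n and its STATUS is Running. A minimal sketch of that check, assuming the default table layout printed by `kubectl get pods` (unready_pods is a hypothetical helper):

```python
def unready_pods(kubectl_output: str) -> list[str]:
    """Names of pods that are not fully ready or not in the Running state.

    Expects the default table printed by `kubectl get pods`
    (columns NAME, READY, STATUS, RESTARTS, AGE).
    """
    bad = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        name, ready, status, *_ = line.split()
        ready_count, total = ready.split("/")
        if ready_count != total or status != "Running":
            bad.append(name)
    return bad
```

The input could come from `subprocess.run(["kubectl", "--namespace", "monitoring", "get", "pods"], capture_output=True, text=True, check=True).stdout`. kubectl's own `kubectl wait --for=condition=Ready pods --all --namespace monitoring --timeout=120s` is a built-in alternative that blocks until the pods are ready.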
  28. Prometheus Operator CRDs
      ● Prometheus & Alertmanager
        ○ Define Prometheus and Alertmanager deployments.
      ● ServiceMonitor
        ○ Specifies how metrics of Kubernetes Services should be scraped.
      ● PrometheusRule
        ○ Contains Prometheus alerting and recording rules that a Prometheus instance can load.
  29. EKS cluster monitoring
  30. EKS application monitoring through ServiceMonitor
      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        name: api-servicemonitor
      spec:
        selector:
          matchLabels:
            app: api-server
        endpoints:
        - port: metrics   # assumed name of the Service port that serves /metrics
      Diagram: a Service labeled app: api-server is selected and scraped; a Service labeled app: api-server2 is not.
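The ServiceMonitor's `matchLabels` selector picks up a Service only if every key/value pair in the selector appears among the Service's labels; extra labels on the Service are ignored, and an empty selector matches everything. A minimal sketch of that rule, using the two Services from the diagram (the dictionary keys "api" and "api2" are hypothetical names):

```python
def match_labels(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """True if every selector pair is present, with the same value, in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


services = {
    "api": {"app": "api-server"},    # selected by app: api-server
    "api2": {"app": "api-server2"},  # different value, not selected
}
selected = [name for name, labels in services.items()
            if match_labels({"app": "api-server"}, labels)]
```

Here `selected` ends up as `["api"]`: values must match exactly, so api-server2 is not a prefix match for api-server.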
  31. Alerting by PrometheusRule
      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      metadata:
        name: api-rules   # hypothetical name; metadata.name is required
      spec:
        groups:
        - name: api.rules
          rules:
          - alert: Percentage_Of_Errors_Is_High
            expr: sum(rate(http_requests_total{code=~"5.*"}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
            for: 5m
            labels:
              severity: critical
  32. Dashboard for the EKS cluster
  34. Monitoring the camera detection pipeline
      Media Server → CV Detection → API Server
  35. Monitoring the camera detection pipeline
      Media Server → CV Detection → API Server
      Metrics per stage: # of frames, # of CV requests, # of events
  36. Service discovery
      Media Server, CV Detection, and API Server are scraped through EC2 service discovery.
  37. Service discovery: scraping Media Server, CV Detection, and API Server
      global:
        scrape_interval: 1s
        evaluation_interval: 1s
      scrape_configs:
        - job_name: 'node'
          ec2_sd_configs:
            - region: <AWS_REGION>
              # Static keys are optional; without them the default AWS
              # credential chain (e.g. an instance IAM role) is used.
              access_key: <ACCESS_KEY_HERE>
              secret_key: <SECRET_KEY_HERE>
              port: 9273
  38. Application metrics (Media Server → CV Detection → API Server)
      # of frames:
        ms_frames_total{env="production", service="ms", cameraId="ID-123456"} 1000
      # of CV requests:
        cvreqest_total{env="production", service="cv", cameraId="ID-123456"} 300
      # of events:
        event_total{env="production", service="cv", cameraId="ID-123456"} 5
  39. Dashboard
  40. Alerting
      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      metadata:
        name: camera-rules   # hypothetical name; metadata.name is required
      spec:
        groups:
        - name: camera.rules
          rules:
          - alert: FpsLow
            annotations:
              message: "{{ $labels.cameraId }} fps is lower than 2fps"
            expr: sum by (cameraId) (rate(ms_frames_total{env="production", cameraId=~".+"}[10m])) < 2
            for: 30m
            labels:
              severity: critical
      Label names are case-sensitive, so the message reads $labels.cameraId; regex matchers use =~; and "sum by (cameraId)" keeps one series per camera so the message can name it.
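The alert message names a specific camera, so the frame rate has to be computed per cameraId rather than summed across all cameras. A simplified sketch of that per-camera check, assuming two readings of each camera's frame counter taken one window apart; the 30-minute `for` duration is not modeled here, and low_fps_cameras is a hypothetical helper.

```python
def low_fps_cameras(frame_counts: dict[str, tuple[float, float]],
                    window_seconds: float, min_fps: float = 2.0) -> list[str]:
    """Camera IDs whose average FPS over the window is below min_fps.

    frame_counts maps cameraId -> (frames_at_window_start, frames_at_window_end).
    """
    low = []
    for camera_id, (start, end) in sorted(frame_counts.items()):
        # A counter that went down was reset (e.g. a restart); count from zero.
        increase = end - start if end >= start else end
        if increase / window_seconds < min_fps:
            low.append(camera_id)
    return low
```

Over a 10-minute (600 s) window, a camera that produced 1200 frames averages exactly 2 fps and is not flagged, since the rule uses a strict `< 2`.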
  41. Thank you!
