Dynamic talks Silicon Valley: Microservices made easier on Google Cloud. In this talk, we'll cover the various technologies available on Google Cloud that help you find time for other things. We will cover Cloud Service Mesh and the built-in technologies within GCP such as Cloud Trace, Debugger, Logging, Monitoring, and Service Topology that make your life easier. Finally, we will cover capabilities of Istio that help you prepare for and deal with failures, latency, "CrashLoopBackOff" and other things that go bump in the night.
About Salmaan Rashid:
Solutions Architect at Google covering a wide variety of topics ranging from Kubernetes, GCP's serverless suite, networking, security, system integration and client library usability. He spent the last 11 years at Google working in various technical roles and the last 5 years in Google Cloud. An editor of the Google Cloud blog over at Medium.com, where you can find him writing about things he's wanted to learn or explore. He's a proud owner of an original (and still continuously working) Raspberry Pi and a cat, Gigi.
8. ● Cloud Logging
○ Structured (jsonPayload, protoPayload)
○ Unstructured (textPayload)
● Container Logs
○ just write to stdout/stderr 😊
○ Write via the Logging API 😞* (sketch below)
○ Logs grouped by resource type, source
○ gke_cluster, pod, container
● Request->Log correlation
○ "parent->child"
● Logs to Metrics
○ User-defined, alertable metrics derived from logs
log.Printf("Found ENV lookup backend ip: %v port: %vn",
backendHost, backendPort)
Logging
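The plain log.Printf above ends up in Cloud Logging as an unstructured textPayload. A minimal sketch of the API route instead, using the Go client, where the map payload becomes a filterable jsonPayload (the project ID, log name and backend values are placeholders for this sketch):

package main

import (
    "context"
    "log"

    "cloud.google.com/go/logging"
)

func main() {
    ctx := context.Background()

    // Placeholder project ID and log name.
    client, err := logging.NewClient(ctx, "my-project")
    if err != nil {
        log.Fatalf("logging.NewClient: %v", err)
    }
    defer client.Close() // flushes buffered entries on exit

    backendHost, backendPort := "10.0.0.2", "8080" // stand-ins for the ENV lookup

    // A map (or any JSON-marshalable struct) is written as a structured
    // jsonPayload, so each field is individually filterable in Cloud Logging.
    client.Logger("backend-lookup").Log(logging.Entry{
        Severity: logging.Info,
        Payload: map[string]interface{}{
            "message":     "Found ENV lookup backend",
            "backendHost": backendHost,
            "backendPort": backendPort,
        },
    })
}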
9. ● What can you monitor?
● Application Monitoring
○ Your app metrics, request metrics
● System Monitoring:
○ GKE (cluster, node), Load Balancer, GCE (VM), GAE
● Built-in metrics by type, e.g. Cloud Run requests
○ "type": "run.googleapis.com/request_count"
○ Metric counts each request
○ How do you break requests down by response_code? Use the metric's Labels to filter (see the sketch after the descriptor below)
● Labels
○ Filter a subset (e.g. "response_code=500, for route=66")
Monitoring
{
"name": "projects//metricDescriptors/run.googleapis.com/request_count",
"labels": [
{
"key": "response_code",
"description": "Response code of a request."
},
{
"key": "response_code_class",
"description": "Response code class of a request."
},
{
"key": "route",
"description": "Route name that forwards a request."
}
],
"metricKind": "DELTA",
"valueType": "INT64",
"unit": "1",
"description": "Number of requests reaching the revision.",
"displayName": "Request Count",
"type": "run.googleapis.com/request_count",
}
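To break request_count down by its labels programmatically, a rough sketch using the v3 Go Monitoring client (the project ID and one-hour window are placeholders; the filter just combines the metric type with one of the label keys from the descriptor above):

package main

import (
    "context"
    "fmt"
    "log"
    "time"

    monitoring "cloud.google.com/go/monitoring/apiv3"
    "github.com/golang/protobuf/ptypes/timestamp"
    "google.golang.org/api/iterator"
    monitoringpb "google.golang.org/genproto/googleapis/monitoring/v3"
)

func main() {
    ctx := context.Background()
    projectID := "my-project" // placeholder

    client, err := monitoring.NewMetricClient(ctx)
    if err != nil {
        log.Fatalf("monitoring.NewMetricClient: %v", err)
    }
    defer client.Close()

    now := time.Now()
    it := client.ListTimeSeries(ctx, &monitoringpb.ListTimeSeriesRequest{
        Name: "projects/" + projectID,
        // Combine the metric type with one of its label keys, e.g. only 5xx responses.
        Filter: `metric.type="run.googleapis.com/request_count" AND metric.labels.response_code_class="5xx"`,
        Interval: &monitoringpb.TimeInterval{
            StartTime: &timestamp.Timestamp{Seconds: now.Add(-1 * time.Hour).Unix()},
            EndTime:   &timestamp.Timestamp{Seconds: now.Unix()},
        },
    })
    for {
        ts, err := it.Next()
        if err == iterator.Done {
            break
        }
        if err != nil {
            log.Fatalf("ListTimeSeries: %v", err)
        }
        fmt.Println(ts.GetMetric().GetLabels(), "points:", len(ts.GetPoints()))
    }
}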
10. ● What do you want to monitor?
● Service Level (Objective | Indicator | Agreement)
○ SLI: measure metrics for user happiness :)
○ SLO: SLI + target goal over window
○ Tighter SLO → more $ to operate
○ SLA: lawyer stuff
○ SRE Fundamentals
● Set up a Dashboard
● Set up Alerts based on Dashboards/SL*
○ PagerDuty, Email, Phone, Slack, etc.
● Incident Dashboard to ACK/Resolve/Track
● UptimeChecks:
○ Send HTTP requests to your external IP
○ Check latency, response_code from datacenters around the world!
Monitoring + Alerts
● Creating Dashboard with Istio+Stackdriver
Create a monitoring dashboard
1. Head over to Stackdriver Monitoring and create a Stackdriver Workspace.
2. Navigate to Dashboards > Create Dashboard in the left sidebar.
3. In the new Dashboard, click Add Chart and add the following metric:
● Metric: Server Response Latencies (istio.io/service/server/response_latencies)
● Group By: destination_workload_name
● Aligner: 50th percentile
● Reducer: mean
● Alignment Period: 1 minute
● Type: Line
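The same chart can be reproduced against the Monitoring API. A sketch of just the request, reusing the client setup from the request_count example above; the group-by label path is an assumption and may need adjusting depending on how the Istio metrics land in your project:

// Extra import on top of the sketch above:
//   "github.com/golang/protobuf/ptypes/duration"

// latencyChartRequest mirrors the chart configured above: p50 server latency
// per destination workload, aligned over one-minute windows.
func latencyChartRequest(projectID string, interval *monitoringpb.TimeInterval) *monitoringpb.ListTimeSeriesRequest {
    return &monitoringpb.ListTimeSeriesRequest{
        Name:     "projects/" + projectID,
        Filter:   `metric.type="istio.io/service/server/response_latencies"`,
        Interval: interval,
        Aggregation: &monitoringpb.Aggregation{
            AlignmentPeriod:    &duration.Duration{Seconds: 60},              // Alignment Period: 1 minute
            PerSeriesAligner:   monitoringpb.Aggregation_ALIGN_PERCENTILE_50, // Aligner: 50th percentile
            CrossSeriesReducer: monitoringpb.Aggregation_REDUCE_MEAN,         // Reducer: mean
            // Assumed label path; verify against the metric's descriptor.
            GroupByFields: []string{"metric.labels.destination_workload_name"}, // Group By
        },
    }
}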
11. ● Trace an HTTP/gRPC request end-to-end*
○ User → yourService
○ yourService → yourOtherService
○ yourService → GCP APIs
● Trace _WITHIN_ a GCP request:
○ What went on within the GCP API request
○ What query did my Spanner system invoke and how long did it take?
● Make it generic!
○ OpenCensus: run it anywhere, add your own tracers (sample helloworld in the reference section!) — see the sketch below
Tracing
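A minimal OpenCensus sketch in Go with the Stackdriver exporter (project ID and span names are placeholders); the parent and child spans show up stitched together as a single end-to-end trace:

package main

import (
    "context"
    "log"
    "time"

    "contrib.go.opencensus.io/exporter/stackdriver"
    "go.opencensus.io/trace"
)

func main() {
    // Export OpenCensus spans to Stackdriver Trace.
    exporter, err := stackdriver.NewExporter(stackdriver.Options{ProjectID: "my-project"})
    if err != nil {
        log.Fatalf("stackdriver.NewExporter: %v", err)
    }
    trace.RegisterExporter(exporter)
    // Sample everything for the demo; use a probability sampler in production.
    trace.ApplyConfig(trace.Config{DefaultSampler: trace.AlwaysSample()})

    ctx, span := trace.StartSpan(context.Background(), "frontend.handleRequest")
    callBackend(ctx)
    span.End()

    exporter.Flush() // make sure spans are sent before the demo exits
}

// callBackend creates a child span; the parent->child relationship is what
// makes the request traceable across services.
func callBackend(ctx context.Context) {
    _, span := trace.StartSpan(ctx, "frontend.callBackend")
    defer span.End()
    time.Sleep(50 * time.Millisecond) // stand-in for the downstream call
}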
12. ● Need to use the Logging API to tie traces and logs together :(
● Trick is to embed the parent trace ID in the entry's "trace" field.
// Get the trace and span IDs from the current OpenCensus span context.
ctx := span.SpanContext()
tr := ctx.TraceID.String()

lg := client.Logger("spannerlab")
// Fully qualified trace name lets Logging group this entry with the trace.
trace := fmt.Sprintf("projects/%s/traces/%s", projectId, tr)
lg.Log(logging.Entry{
    Severity: severity,
    Payload:  fmt.Sprintf(format, v...),
    Trace:    trace,
    SpanID:   ctx.SpanID.String(),
})
Tracing+Logging
13. ● Live Heap, CPU, Thread info
● Collects metrics and emits to GCP
● Memory issues, CPU, etc
● Stackdriver CPU statistics and Profiler: identify over/under-provisioned systems.
● Profile and iterate code; use traffic splitting to A/B test!
Profiling
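Enabling the profiling agent is a one-liner at startup; a minimal sketch with a placeholder service name and version:

package main

import (
    "log"
    "net/http"

    "cloud.google.com/go/profiler"
)

func main() {
    // Start the agent once at startup; it samples CPU and heap in the
    // background and uploads profiles to Cloud Profiler.
    if err := profiler.Start(profiler.Config{
        Service:        "hello-service", // placeholder
        ServiceVersion: "1.0.0",         // compare versions to spot regressions
    }); err != nil {
        log.Fatalf("profiler.Start: %v", err)
    }

    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("ok"))
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}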
14. ● Live Debug of your running app
● Does NOT _stop_ your application at a breakpoint (just not how it works!)
● Observe parameters at any breakpoint given a reference to the source code (on GitHub, Cloud Source Repositories, Bitbucket).
● Insert logpoints (extra log statements) into the running app without redeploying.
● Need to start the application as instrumented; do not enable by default! (only canary/test with a small % of traffic)
● Java, Python :) .... golang :(
Debug
18. Service to Service Communication: how do you manage all this?
[Diagram: a Service (Caller) invoking a Service (Provider), annotated with the questions both sides raise: Which instance? Which version (1.0 or 2.0)? Wait for the response? Retry on failure? Who's calling? Authorized? Quota exhausted? Secure? Are my services healthy? All without changing the service implementation!]
19. Service Management
[Diagram: an outbound proxy next to the caller and an inbound proxy next to the provider sit on the in/out paths between the two services, handling lookup, routing, timeouts, circuit breaking, policy enforcement, TLS termination and throttling, all driven by a central Management & Configuration plane.]
Service proxies intercept outbound and inbound service calls transparently to the service implementation.
The outbound proxy manages routing and error handling strategies, such as retries and circuit breakers.
The inbound proxy validates the service call based on credentials, available quota, etc.
Both proxies are configured centrally from the Management & Configuration plane.
*depending on their existing application, can either shoot through this or spend more time*
But what are microservices exactly?
We can think of them as isolated, autonomous services that work together. Typically, communication between these services happens via network calls, so that we avoid the tight coupling that led us to adopt this architecture in the first place.
A more concrete rule of thumb would be one codebase per service, thoroughly discussed in the 12 Factor app methodology. This gives companies two standout benefits: a.) the ability to release features rapidly and independent of the rest of the codebase and b.) the opportunity to organize people and teams according to business boundaries.
Generally, when trying to decide what belongs together in a service, we want to follow the Single Responsibility Principle: a service should not have more than one reason to change, giving us a clean system design with independently deployable services.
The mixed technology landscape of a microservices implementation makes handling resilience issues in application code unsustainable: writing those solutions for each and every programming language in your stack is time consuming and hard to maintain.
As many people talk about services and microservices, service-to-service communication seems simple enough: one service calls another service that provides a useful function.
However, there are quite a few things to think about:
What if the response doesn't come right away? How long should the caller wait before giving up?
Should the caller retry the operation after a request times out? Retries are useful but can also burden a system that’s already overloaded.
Likely there’s more than one instance of the service, e.g. to provide resilience. Which one should you call?
Worse yet, there are likely different versions of the service: someone may be soft-launching a new version or maintaining backwards compatibility. Which version should you be calling? This could change at any time, e.g. when the soft launch transitions into a full launch.
The service provider will also have quite a few questions:
It may need to know which service is calling.
It’ll want to check whether the caller is authorized to call the service.
Even when the caller is authorized, it may have exhausted the number of calls it’s allowed to make in a specific time period.
After all these checks pass, communication between the services should be secured.
And last but not least, we’d like to know what’s going on with our services: are they healthy, are there a lot of errors, e.g. because the service provider has issues or because the caller makes invalid requests?
All these things need to be configured and managed for a large set of services. That needs to be done centrally - otherwise we have a giant mess.
And, in most cases you can’t modify the services to do so because you may not have the source. Even if you do, you would not want to make a code change and redeploy just to change the operational setup.
The answer lies in adding a service management layer that’s connected to, but independent of the services:
Service proxies intercept outbound and inbound service calls transparently to the service implementation.
The outbound proxy manages routing and error handling strategies, such as retries and circuit breakers, based on information from the management center.
The inbound proxy validates the service call based on credentials, available quota, etc., which can be centrally configured.