As businesses increasingly rely on Kubernetes, the need to scale services based on business demand becomes more important. While traditional methods like scaling on CPU and memory remain important, expressing business metrics in terms of CPU and memory isn’t always straightforward. In this light, autoscaling based on custom metrics in Kubernetes is immensely helpful.
With support for custom metrics, services can be scaled dynamically based on the request count or error count of a particular service. This helps services respond smoothly to sudden bursts and traffic variations, ensuring business continuity while allowing resources to be allocated optimally among different services.
With its new release, the WSO2 Microgateway supports scaling based on custom metrics, enabling enterprises to scale the runtimes based on request count, error rate, requests in the pipeline, and more.
This slide deck will cover:
- The importance of selecting business-related metrics
- Custom metric support in WSO2 Microgateway
- A demo on auto-scaling WSO2 Microgateway based on request count
On-demand webinar: https://wso2.com/library/webinars/adaptive-scaling-of-microgateways-on-kubernetes/
5. What is Adaptive Scaling?
● The ability to scale dynamically as the traffic varies
● As the demand rises, more instances are provisioned
● Made possible by
○ The availability of abundant computing power
○ On-demand provisioning
● Before IaaS was available
○ Instances had to be provisioned beforehand
○ Resources were sitting idle even when there was no traffic
6. What is Adaptive Scaling?
● On-demand provisioning through IaaS
○ Allowed provisioning infrastructure without long delays
○ Infrastructure still had to be allocated manually
● Autoscaling took this a step further
○ Instances were provisioned automatically
○ Could respond to sudden bursts
○ The cluster would automatically scale up and down according to the traffic
● Many IaaS providers offer autoscaling
○ AWS Auto Scaling groups
○ GCP, Azure, and many others have similar capabilities
● Regardless of the provider, autoscaling offers the same general advantages
7. Benefits of Adaptive Scaling
● Autoscaling provisions infrastructure dynamically
○ Shields the application from sudden spikes without losing traffic
○ Absorbs growing traffic without provisioning resources upfront
8. Benefits of Adaptive Scaling
● Makes services more responsive
○ More and more services are embracing microservices
○ Interactions between services are complex
○ Ensuring the responsiveness of dependent services is critical
9. Benefits of Adaptive Scaling
● Allows using resources optimally
○ Allows many services to share a single resource pool
○ Scales only the services that need it
11. Scaling Options in Kubernetes
● Cluster autoscaling
⦿ Adds more nodes to a Kubernetes cluster
⦿ Usually done through an extension to the underlying IaaS provider
● Pod autoscaling
⦿ Scales pods within a fixed resource pool
⦿ Two variations
● Vertical Pod Autoscaling
⦿ Increases the computing power of a single pod (a minimal VPA manifest is sketched below)
● Horizontal Pod Autoscaling
⦿ Increases the number of pods
● The new feature works with Horizontal Pod Autoscaling
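For contrast with the HPA examples that follow, a minimal Vertical Pod Autoscaler manifest could look like the sketch below. Note that VPA is a separate add-on from the kubernetes/autoscaler project rather than part of core Kubernetes, and the Deployment name my-app is a hypothetical placeholder.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app            # hypothetical deployment to scale vertically
  updatePolicy:
    updateMode: "Auto"      # let the VPA apply its recommendations automatically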
12. How Horizontal Pod Autoscaling Works
● Handled by the Horizontal Pod Autoscaler
⦿ Pulls metrics from different APIs
⦿ Pulling happens at a predefined frequency
⦿ The Pod Autoscaler adjusts the pods accordingly
● A target value can be specified
● The autoscaler ensures the metric is kept at the target level (the calculation is shown below)
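At each interval the autoscaler compares the observed metric with the target and sets the replica count from the ratio; the Kubernetes documentation describes the calculation as roughly:

desiredReplicas = ceil( currentReplicas × currentMetricValue / desiredMetricValue )

For example, with a CPU utilization target of 50% and a current average of 100%, a deployment running 2 replicas would be scaled to 4.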
13. How Horizontal Pod Autoscaling Works
● The HPA resource specifies the metric to scale on
● A target value can be specified
● The autoscaler ensures the metric is maintained at the target level
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
14. Providing Metrics for Autoscaling
● Autoscaling in K8s is based on metrics (examples of each type are sketched below)
⦿ Resource metrics
⦾ CPU and memory
⦾ Pulled through the metrics-server
⦿ Custom metrics
⦾ Can be defined per Pod or for other objects
⦾ Request rate, error rate, etc.
⦿ External metrics
⦾ Metrics from a source outside the cluster
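To make the three metric types concrete, the metrics section of an autoscaling/v2beta2 HPA can mix all of them. This is only a sketch: the metric names http_requests_per_second and queue_messages_ready are illustrative assumptions and would have to be exposed by a metrics adapter in a real cluster.

metrics:
- type: Resource                      # resource metric from the metrics-server
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 50
- type: Pods                          # custom metric, averaged across the pods
  pods:
    metric:
      name: http_requests_per_second  # assumed to be served by a custom metrics adapter
    target:
      type: AverageValue
      averageValue: "100"
- type: External                      # metric originating outside the cluster
  external:
    metric:
      name: queue_messages_ready      # hypothetical external metric
    target:
      type: AverageValue
      averageValue: "30"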
16. What is Supported in This Release
● With this release the microgateway publishes a set of metrics
⦿ Metrics like error count, total requests, and response delays are gathered
⦿ Metrics are periodically pulled by the Prometheus server
⦿ HPA consumes these through the Prometheus Adapter (an example adapter rule is sketched below)
● Supports scaling the microgateway with custom metrics
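As a sketch of the adapter piece, a Prometheus Adapter rule can expose one of the gateway series to the Kubernetes custom metrics API as a per-second rate. The series name and the kubernetes_namespace/kubernetes_pod_name labels are assumptions about how the gateway pods are scraped and may differ in a real deployment.

rules:
- seriesQuery: 'http_requests_total_value{kubernetes_namespace!="",kubernetes_pod_name!=""}'
  resources:
    overrides:
      kubernetes_namespace: {resource: "namespace"}
      kubernetes_pod_name: {resource: "pod"}
  name:
    matches: "(.*)_total_value"
    as: "${1}_per_second"              # exposed as http_requests_per_second
  metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[2m])'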
17. Scaling at Different Levels
● Two types of metrics are published by the microgateway
⦿ Metrics about the microgateway itself
⦿ Metrics about the backend services
● Gateway-level metrics can be used to scale the microgateway, and service-level metrics to scale the backend
● Going through the different deployment models helps understand this better
● Gateway metrics
⦿ Per_Request_Duration_mean
⦿ Request_Duration_Total_mean
⦿ http_inprogress_requests_value
⦿ Http_requests_total_value
⦿ http_response_time_seconds
● Service metrics
⦿ ballerina_http_Caller_1XX_requests_total_value
⦿ ballerina_http_Caller_inprogress_requests_value
⦿ ballerina_http_Caller_requests_total_value
⦿ ballerina_http_Caller_response_time_seconds
⦿ ballerina_http_Caller_response_time_seconds_max
18. Shared Gateway Mode - Scaling the Microgateway
● Shared gateway
⦿ The gateway is the first hop after the Ingress gateway
⦿ A single gateway serves all microservices (an example HPA for this mode is sketched below)
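In shared gateway mode the HPA targets the microgateway Deployment itself, using one of the gateway-level metrics listed earlier. A minimal sketch, assuming a hypothetical Deployment named microgateway and that http_inprogress_requests_value is exposed per pod through the Prometheus Adapter:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: microgateway-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: microgateway               # hypothetical shared gateway deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_inprogress_requests_value
      target:
        type: AverageValue
        averageValue: "20"           # add a replica when pods average more than 20 in-flight requests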
19. Sidecar Mode - Scaling the Microservice
● Sidecar
⦿ The gateway sits in the same pod as the microservice
● Metrics published through the gateway can be used to scale the pod (an example HPA for this mode is sketched below)
● The underlying service doesn’t need to be instrumented
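In sidecar mode the HPA instead targets the microservice’s own Deployment; the gateway container in each pod supplies the metric, so the service code stays uninstrumented. A sketch, assuming a hypothetical Deployment named orders-service and that the per-pod service metric is exposed through the adapter:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: orders-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: orders-service             # hypothetical microservice deployment with the gateway sidecar
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: ballerina_http_Caller_inprogress_requests_value
      target:
        type: AverageValue
        averageValue: "10"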