Kubernetes auto-scaling on relative resource usage

Posted Aug 2, 2020 Updated Jan 20, 2024

By Dorian Beganovic

I am working on auto-scaling in Kubernetes and was seeing something weird in my system. Only now did I realise that auto-scaling based on relative metrics (like CPU usage) really has to be calculated against requests instead of limits. The requested resources are guaranteed, but the headroom between the requests and the limits is not: whether a container actually gets it depends on how Kubernetes has scheduled pods onto the node, whether the node is overcommitted, and so on (described well in this article).

So if the CPU usage for the auto-scaler is calculated as:

sum(cpu_usage_per_container/cpu_limits_per_container)

then, if the nodes don’t have enough resources to satisfy all the limits, scaling up the number of instances really only scales up the sum of the limits. The application doesn’t necessarily get a boost, because of potential overcommits and the scheduling/prioritisation complexities they bring.
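To make the difference concrete, here is a minimal sketch with made-up numbers. The usage figures, the 200m requests, the 400m limits, and the 70% threshold are all assumptions for illustration, and the helper `relative_usage` is hypothetical. It takes the mean of the per-container ratios rather than the sum, only so the result reads as a percentage.

```python
# Hypothetical numbers for illustration only (millicores).
usage = [180, 190, 170]      # measured CPU usage per container
requests = [200, 200, 200]   # guaranteed share per container
limits = [400, 400, 400]     # upper bound, not guaranteed on a busy node


def relative_usage(usage_m, baseline_m):
    """Mean of usage / baseline across containers."""
    ratios = [u / b for u, b in zip(usage_m, baseline_m)]
    return sum(ratios) / len(ratios)


target = 0.70  # assumed scale-up threshold

vs_requests = relative_usage(usage, requests)
vs_limits = relative_usage(usage, limits)

print(f"relative to requests: {vs_requests:.0%}, scale up: {vs_requests > target}")
print(f"relative to limits:   {vs_limits:.0%}, scale up: {vs_limits > target}")
```

With these numbers the containers are using nearly all of their guaranteed CPU (roughly 90% of requests), yet the limit-based metric sits at about half that and never crosses the threshold.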

In the end I changed the definition of the deployment I want to scale so that limits = requests, but the most important bit is to calculate CPU usage against the requested resources and not the limits.

  
containers:
  - name: webapp
    image: ...
    imagePullPolicy: ...
    resources:
      # ensure limits are the same as requests!
      # pods are scheduled by their requests; capacity up to the limit is not guaranteed
      limits:
        memory: "200Mi"
        cpu: "200m"
      requests:
        memory: "200Mi"
        cpu: "200m"
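To see what the choice of baseline does to the scaling decision itself, here is a minimal sketch. It assumes the autoscaler follows the replica formula documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current * currentUtilization / targetUtilization), and it ignores the tolerance band and stabilisation behaviour. The replica count, the 70% target, and the usage figures are hypothetical: the same load reads as 90% when measured against requests and as 45% when measured against limits twice as large.

```python
import math


def desired_replicas(current_replicas, current_pct, target_pct):
    """ceil(current * currentUtilization / targetUtilization), tolerance ignored."""
    return math.ceil(current_replicas * current_pct / target_pct)


# Hypothetical: 3 replicas, 70% target, same load measured against two baselines.
by_requests = desired_replicas(3, current_pct=90, target_pct=70)
by_limits = desired_replicas(3, current_pct=45, target_pct=70)

print(by_requests, by_limits)
```

Under these assumptions the request-based metric asks for a fourth replica, while the limit-based metric would shrink the deployment to two even though every container is close to its guaranteed share. Setting limits = requests removes that gap, since both baselines then give the same percentage.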

load-balancers

kubernetes

This post is licensed under CC BY 4.0 by the author.
