
Automatic Scaling

As part of the nais platform, the Application resource supports two types of scaling:

  • CPU based scaling
  • Scaling based on Kafka Consumer Lag

CPU based scaling

If you specify a minimum and maximum number of replicas in your Application resource, the application scales up by default when its CPU usage exceeds 50% of the CPU it has requested. To change the threshold, set replicas.scalingStrategy.cpu.thresholdPercentage to a different value.
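
As a minimal sketch, the fragment below raises the CPU threshold. The replica counts and the value 75 are illustrative assumptions, not recommendations; the fragment is assumed to sit under spec in your Application resource.

```yaml
replicas:
  min: 2   # illustrative value
  max: 4   # illustrative value
  scalingStrategy:
    cpu:
      # Scale up when CPU usage exceeds 75% of the requested CPU
      # (75 is an example value; the default is 50).
      thresholdPercentage: 75
```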

Scaling based on Kafka Consumer Lag

If you want to use Kafka Consumer Lag as a scaling metric, you have to specify the following fields in your Application resource:

replicas:
  min: <minimum-number-of-replicas>
  max: <maximum-number-of-replicas>
  scalingStrategy:
    kafka:
      topic: <topic-name>
      consumerGroup: <consumer-group-name>
      threshold: <threshold>

The threshold is the maximum offset lag allowed before the application scales up. Keep in mind that for Kafka, the maximum number of replicas is limited by the number of partitions in the topic. If you have a topic with 10 partitions, you can only scale up to 10 replicas.

Combining scaling strategies

If you define both a CPU threshold and a Kafka Consumer Lag threshold, the application will scale up if either threshold is exceeded.
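
A combined configuration might look like the sketch below. The topic name, consumer group name, and all numeric values are hypothetical placeholders; substitute your own.

```yaml
replicas:
  min: 2    # illustrative value
  max: 10   # illustrative value; see the partition limit described above
  scalingStrategy:
    cpu:
      thresholdPercentage: 75              # example value
    kafka:
      topic: my-topic                      # hypothetical topic name
      consumerGroup: my-consumer-group     # hypothetical consumer group name
      threshold: 1000                      # example offset lag threshold
```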

Custom scaling

Warning

To use custom scaling policies and rules, disable the default NAIS HPA by setting the .spec.replicas.disableAutoScaling field to true.
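
In the Application resource, that setting could look like the fragment below. This is a sketch showing only the relevant field; the rest of the spec is omitted.

```yaml
spec:
  replicas:
    # Turn off the default NAIS HPA so that a custom
    # HorizontalPodAutoscaler can manage the replica count.
    disableAutoScaling: true
```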

Scaling based on custom metrics

A custom metric is based on either a direct value or a rate over time. To make a custom metric available for scaling, label it with either hpa="value" or hpa="rate".

Example metric output:

# HELP active_sessions number of active sessions
# TYPE active_sessions gauge
active_sessions{hpa="value"} 100
# HELP documents_received how many documents have we received
# TYPE documents_received counter
documents_received{hpa="rate"} 69

Once the metric is labelled correctly, it can be used in a HorizontalPodAutoscaler Kubernetes object. Refer to the Kubernetes documentation for details.

In the example below, the number of replicas will be increased once the average of active_sessions across all currently running pods exceeds 150.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-example
  namespace: team-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deployment-name
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: active_sessions
      target:
        type: AverageValue
        averageValue: "150"

Scaling based on external metrics

External metrics are provided by the platform for services external to the application, e.g. Kafka lag. If you want your application to scale based on external metrics, replace the metrics section of the previous example with the one below.

This example will scale up your application if the maximum lag of your consumer group exceeds 120 seconds.

  metrics:
  - type: External
    external:
      metric:
        name: kafka_consumergroup_group_max_lag_seconds
        selector:
          matchLabels:
            topic: your-topic
            group: your-consumer-group
      target:
        type: AverageValue
        averageValue: "120"
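
For reference, the complete manifest with the external metric substituted in might look like the following sketch. It reuses the placeholder names from the examples above (hpa-example, team-namespace, deployment-name, your-topic, your-consumer-group), all of which you would replace with your own values.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-example
  namespace: team-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deployment-name
  minReplicas: 2
  maxReplicas: 10
  metrics:
  # Scale on the consumer group's maximum lag, measured in seconds.
  - type: External
    external:
      metric:
        name: kafka_consumergroup_group_max_lag_seconds
        selector:
          matchLabels:
            topic: your-topic
            group: your-consumer-group
      target:
        type: AverageValue
        averageValue: "120"
```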

Available metrics

Use this command to see a list of available external metrics:

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .