If you have problems getting your pods running you should check out the official documentation from Kubernetes:
PS: The applications referred to in the context above are not NAIS applications.
To debug a NAIS application resource, run kubectl describe application $app_name.
Debugging Memory Leaks¶
If you experience memory leaks in Java processes you can get heap dumps either automatically on OOM or on-demand.
Automatically on OOM¶
Set JAVA_OPTS to -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp
The /tmp volume is maintained through restarts, so if your app is restarting because of OOM, the heap dumps can be retrieved from there.
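After an OOM restart, the dumps can be located before copying them out. The following is a minimal sketch, not an official NAIS command: the helper name list_heap_dumps and the pod name are hypothetical, and it assumes the container image provides sh and ls. When HeapDumpPath points at a directory, the JVM names the file java_pid<pid>.hprof.

```shell
# Hypothetical helper: list heap dumps written to /tmp inside a pod.
# Assumes the container image ships `sh` and `ls`; the glob is expanded
# by the shell inside the container, not on your machine.
list_heap_dumps() {
  kubectl exec "$1" -- sh -c 'ls -lh /tmp/*.hprof'
}

# Example (pod name is a placeholder):
#   list_heap_dumps my-pod-name
```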
Manually on-demand¶
You can use jmap to create a heap dump of a running Java process.
Find a pod and exec jmap in it (assuming PID 1 is the Java process):
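A sketch of what this step could look like. The helper name heap_dump, the pod name, and the output path /tmp/heap.hprof are illustrative assumptions; it also assumes the image ships a JDK, since jmap is not included in JRE-only images.

```shell
# Hypothetical helper: write a binary heap dump of PID 1 to /tmp in the pod.
# $1 = pod name. The output file name heap.hprof is an arbitrary choice.
heap_dump() {
  kubectl exec "$1" -- jmap -dump:format=b,file=/tmp/heap.hprof 1
}

# Example (pod name is a placeholder):
#   kubectl get pods          # find a pod
#   heap_dump my-pod-name
```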
Getting the heap dump¶
You can use kubectl cp to get the files from the pod to your local computer:
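A minimal sketch, assuming the dump was written to /tmp/heap.hprof (a hypothetical file name) and that the container image includes tar, which kubectl cp relies on. The helper name fetch_heap_dump is illustrative.

```shell
# Hypothetical helper: copy a heap dump from a pod to the local machine.
# $1 = pod name, $2 = local destination file.
# Assumes `tar` exists in the container image (kubectl cp depends on it).
fetch_heap_dump() {
  kubectl cp "$1:/tmp/heap.hprof" "$2"
}

# Example (names are placeholders):
#   fetch_heap_dump my-pod-name ./heap.hprof
```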
You can inspect the heap dumps with tools like JProfiler, VisualVM or IntelliJ.
kubectl debug¶
kubectl: version
This feature requires kubectl version 1.28 or later.
Nais requires the flag --profile=restricted when using kubectl debug, and the flag is only supported in kubectl 1.28+.
At this time, this is the only way to run the ephemeral containers as non-root and without any capabilities.
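To check whether a local kubectl meets the version requirement, something along these lines may help. It is a sketch: both helper names are hypothetical, and it assumes the JSON output of kubectl version --client contains a gitVersion field of the form v1.28.3.

```shell
# Hypothetical helper: print the kubectl client version, e.g. "1.28.3".
# Assumes the JSON output has a "gitVersion": "v<major>.<minor>.<patch>" field.
kubectl_client_version() {
  kubectl version --client -o json |
    sed -n 's/.*"gitVersion": *"v\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed if version $1 (e.g. "1.29.3") is at least
# the major.minor given in $2 (e.g. "1.28").
version_at_least() {
  have_major=${1%%.*}; rest=${1#*.}; have_minor=${rest%%.*}
  want_major=${2%%.*}; want_minor=${2#*.}
  [ "$have_major" -gt "$want_major" ] ||
    { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

# Example:
#   version_at_least "$(kubectl_client_version)" 1.28 || echo "kubectl is too old"
```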
An ephemeral container is a temporary container that can be added to an existing pod to perform a specific task, such as
debugging, monitoring, or troubleshooting. Unlike regular containers, ephemeral containers do not persist beyond their
immediate use, and they are not part of the pod’s desired state. They can therefore be used for
debugging and monitoring while a pod is live in a cluster, without changing the pod spec or restarting the pod. Ephemeral
containers are useful when kubectl exec is insufficient because a container
has crashed or a container image doesn't include debugging utilities.
To run an ephemeral container in a pod, use the kubectl debug command.
kubectl: --image
The specified --image cannot have more capabilities than the pod it is attached to, and it must be able to run as non-root.
The following example starts a shell in a new ephemeral container named debugger-id in the my-pod-name pod:
kubectl debug -it my-pod-name --image="cgr.dev/chainguard/busybox:latest" --profile=restricted -- sh
Once the ephemeral container is created, you will be presented with a shell prompt. From there you can run diagnostic commands and inspect the container’s environment, or modify the container’s configuration to debug the issue.
kubectl debug -it my-pod-name --image="cgr.dev/chainguard/busybox:latest" --profile=restricted -- sh
Defaulting debug container name to debugger-lrmqq.
If you don't see a command prompt, try pressing enter.
~ $
Attach kubectl¶
kubectl attach my-pod-name -c debugger-dx4dr -i -t
The -i flag causes kubectl debug to attach to the container by default.
You can prevent this by specifying --attach=false.
If your session becomes disconnected, you can reattach using kubectl attach.
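Reattaching requires the generated container name (for example debugger-lrmqq). If you no longer have it, it can be read back from the pod spec. A minimal sketch, with a hypothetical helper name and a placeholder pod name:

```shell
# Hypothetical helper: print the names of all ephemeral containers in a pod.
# $1 = pod name. Ephemeral containers are listed under .spec.ephemeralContainers.
ephemeral_containers() {
  kubectl get pod "$1" -o jsonpath='{.spec.ephemeralContainers[*].name}'
}

# Example (pod name is a placeholder):
#   ephemeral_containers my-pod-name
#   kubectl attach my-pod-name -c <name-from-above> -i -t
```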
Don't forget to clean up after yourself:
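Kubernetes does not allow an ephemeral container to be changed or removed once it has been added to a pod, so one way to clean up is to delete the pod and let its controller create a fresh one. The sketch below assumes the pod is managed by a Deployment (or similar controller) that will replace it; the helper name is hypothetical.

```shell
# Hypothetical helper: remove a pod that has debug containers attached.
# $1 = pod name. Assumes a controller (e.g. a Deployment) recreates the pod;
# a standalone pod would simply be gone after this.
cleanup_debug_pod() {
  kubectl delete pod "$1"
}

# Example (pod name is a placeholder):
#   cleanup_debug_pod my-pod-name
```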
You can read more about the kubectl debug command.
Exec, Logs and Describe kubectl¶
With ephemeral containers, you can temporarily add another container to a running pod without having to create a new pod.
kubectl debug can be used as an additional debug tool for running pods alongside kubectl describe and, to some extent, kubectl logs and kubectl exec.
Unlike exec and logs, debug can also be used when your pod is in CrashLoopBackOff or does not start for some other reason.
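Before reaching for kubectl debug, a first pass with describe and logs often narrows things down. A minimal sketch, with a hypothetical helper name and a placeholder pod name:

```shell
# Hypothetical helper: first-pass inspection of a misbehaving pod.
# $1 = pod name.
inspect_pod() {
  # Events, restart counts and the last container state.
  kubectl describe pod "$1"
  # Logs from the previous container instance, useful after a crash.
  kubectl logs "$1" --previous
}

# Example (pod name is a placeholder):
#   inspect_pod my-pod-name
```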
Security hardened images¶
Support for restricted ephemeral debug containers also lets your team take advantage of security-hardened images such as distroless.
This is a great way to reduce the attack surface of your application and improve security.
FAQ¶
I get an HTTP 503 Service Unavailable error when visiting the ingress for my application, why?¶
Answer
This indicates that your application is not ready to serve traffic. This is usually due to one of the following:
- The application is not deployed to the cluster
- The application is not up and running. This can be caused by a problem with the application itself, for example:
    - The application doesn't respond to any configured health checks
    - The application only has a single pod or replica, and that pod is not running
    - The application is configured incorrectly (e.g. has missing required dependencies, has the wrong image, etc.)
    - The application attempts to write files to the filesystem, which is mostly read-only by default
See also troubleshooting for deployments.
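To check the causes listed above from the command line, something like the following may be a reasonable starting point. It is a sketch under assumptions: the helper name is hypothetical, and it assumes the application's pods carry an app=<application name> label.

```shell
# Hypothetical helper: check whether an application is deployed and running.
# $1 = application name.
check_app() {
  # Is the application resource present, and what do its events say?
  kubectl describe application "$1"
  # Are its pods running and ready? Assumes pods are labelled app=<name>.
  kubectl get pods -l "app=$1"
}

# Example (application name is a placeholder):
#   check_app my-app
```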
My application gets an HTTP 504 Gateway Timeout error when attempting to communicate with another application, why?¶
Answer
If you're using service discovery, ensure that the access policies for both applications are correctly set up.
Otherwise, ensure that the other application is running and responding to requests in a timely manner (see also ingress customization for timeout configuration).
Created: 2022-01-10