Troubleshooting user queries
Most of an OpenShift admin's time goes to answering user queries and addressing alerts. Users usually ping us in Slack channels and report issues along with the errors they see, and alerts also reach us through a Slack channel. Whoever is on shift is responsible for addressing them.
This topic has several categories; I will cover just a few:
- Pod issues
- Logging
Pod issues:
These are the common pod issues and what we do to address them:
1. Pod fails to start:
Check logs: Run oc logs <pod_name> to identify errors or exceptions.
Resource constraints: Ensure the pod's resource requests and limits are appropriate.
Image availability: Check that the image name provided matches the one in the registry.
Security context: Check the SCC (security context constraint) that applies to the pod.
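The checks above can be sketched as one small shell helper. This is only a sketch: the function name pod_start_triage and its two arguments (pod name and namespace) are made up for illustration, and it assumes you are already logged in with oc and can read the pod.

```shell
# Hypothetical helper: first-look data for a pod that will not start.
# Usage: pod_start_triage <pod_name> <namespace>
pod_start_triage() {
  local pod="$1" ns="$2"

  # Status, conditions and recent events for the pod.
  oc describe pod "$pod" -n "$ns"

  # Container logs, if the container got far enough to write any.
  oc logs "$pod" -n "$ns"

  # Resource requests and limits configured on each container.
  oc get pod "$pod" -n "$ns" -o jsonpath='{.spec.containers[*].resources}{"\n"}'

  # Image references, to compare with what exists in the registry.
  oc get pod "$pod" -n "$ns" -o jsonpath='{.spec.containers[*].image}{"\n"}'

  # SCC recorded on the pod in the openshift.io/scc annotation.
  oc get pod "$pod" -n "$ns" -o jsonpath='{.metadata.annotations.openshift\.io/scc}{"\n"}'
}
```

For example, pod_start_triage my-app-1-abcde my-project (both names are placeholders) prints the five outputs one after another, so they can be pasted back into the Slack thread.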
2. Pod in CrashLoopBackOff:
Crash loops: Investigate whether the application inside the pod is crashing and restarting repeatedly. Run oc logs <pod_name> to check the logs for crash reports; add --previous to see the output of the container instance that crashed.
Resource constraints: Insufficient resources can cause the container to be terminated and restarted repeatedly. OpenShift restarts the failed container with an increasing back-off delay. If it still cannot get the resources it needs (for example, memory) after a few restarts, the pod stays in CrashLoopBackOff.
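A minimal sketch of the crash-loop checks follows. The helper name crashloop_triage and its arguments are hypothetical; the oc subcommands and pod status fields are standard, but adjust them to your setup.

```shell
# Hypothetical helper: look at why a container keeps restarting.
# Usage: crashloop_triage <pod_name> <namespace>
crashloop_triage() {
  local pod="$1" ns="$2"

  # Logs of the previous container instance, i.e. the one that crashed.
  oc logs "$pod" -n "$ns" --previous

  # How many times each container has been restarted.
  oc get pod "$pod" -n "$ns" -o jsonpath='{.status.containerStatuses[*].restartCount}{"\n"}'

  # Exit code of the last termination. 137 means the process received SIGKILL,
  # which is what an out-of-memory kill looks like.
  oc get pod "$pod" -n "$ns" -o jsonpath='{.status.containerStatuses[*].lastState.terminated.exitCode}{"\n"}'
}
```

If the pod has more than one container, oc logs also needs -c <container_name>; the sketch leaves that out to stay short.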
3. Networking issues:
Service connectivity: Check for NetworkPolicy objects, which allow or deny traffic between pods, between pods and services, and from external sources.
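As a rough sketch of the connectivity check: the helper name network_triage and its arguments (service name and namespace) are made up for illustration.

```shell
# Hypothetical helper: first checks when a service is not reachable.
# Usage: network_triage <service_name> <namespace>
network_triage() {
  local svc="$1" ns="$2"

  # NetworkPolicy objects in the namespace that may allow or block the traffic.
  oc get networkpolicy -n "$ns"

  # The service itself: type, cluster IP, ports and selector.
  oc get svc "$svc" -n "$ns" -o wide

  # Pod IPs behind the service. An empty list usually means the selector
  # matches no ready pods.
  oc get endpoints "$svc" -n "$ns"
}
```

When a policy looks suspicious, oc describe networkpolicy <policy_name> -n <namespace> shows its pod selector and the allowed ingress and egress rules.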
4. Volume mounting problems:
Mount failures: A volume that fails to mount causes the pod to fail.
Permissions: Check file and directory permissions if the pod has issues writing to mounted volumes.
Infra changes: At times, changes made by the infra team on the file system, or an OS restart that brings the file system back with the wrong access mode, can also cause the issues above.
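The volume checks can be sketched roughly as below. The helper name volume_triage and its arguments (pod, namespace, mount path inside the container) are hypothetical, and the two oc exec calls only work while a container in the pod is running.

```shell
# Hypothetical helper: check claims, mount events and permissions for a pod.
# Usage: volume_triage <pod_name> <namespace> <mount_path>
volume_triage() {
  local pod="$1" ns="$2" mount_path="$3"

  # Claim status (Bound or Pending) and access modes in the namespace.
  oc get pvc -n "$ns"

  # Events for this pod; mount failures are reported here.
  oc get events -n "$ns" --field-selector "involvedObject.name=$pod"

  # UID and groups the container process runs as.
  oc exec "$pod" -n "$ns" -- id

  # Owner and mode of the mounted directory, to compare with the UID above.
  oc exec "$pod" -n "$ns" -- ls -ld "$mount_path"
}
```

For example, volume_triage my-app-1-abcde my-project /data, where all three values are placeholders.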
5. OOMKilled (out of memory):
Check how much memory the pod is using against what is available to it.
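One way to confirm an OOM kill and compare usage with the limit is sketched below. The function names was_oom_killed and oom_triage are made up for illustration, and oc adm top needs cluster metrics to be available.

```shell
# Hypothetical helper: succeeds if any container's last termination was an OOM kill.
# Usage: was_oom_killed <pod_name> <namespace>
was_oom_killed() {
  local pod="$1" ns="$2" reasons
  # Space-separated termination reasons, one per container that has one.
  reasons=$(oc get pod "$pod" -n "$ns" -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}')
  case " $reasons " in
    *" OOMKilled "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: report the OOM state, then the memory limit and current usage.
# Usage: oom_triage <pod_name> <namespace>
oom_triage() {
  local pod="$1" ns="$2"
  if was_oom_killed "$pod" "$ns"; then
    echo "$pod: last termination reason was OOMKilled"
  else
    echo "$pod: no OOMKilled termination recorded"
  fi

  # Memory limit configured on each container.
  oc get pod "$pod" -n "$ns" -o jsonpath='{.spec.containers[*].resources.limits.memory}{"\n"}'

  # Current CPU and memory usage per container.
  oc adm top pod "$pod" -n "$ns" --containers
}
```

If usage sits close to the limit, the usual next step is to discuss a higher memory limit with the application team.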
6. Image pull errors:
Registry authentication: Pulls fail if the authentication for a private registry is incorrect.
Image availability: Verify the image repository, tag, and digest. The image might have been removed or be unavailable.
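A short sketch of the image pull checks follows. The helper name image_pull_triage and its arguments are hypothetical.

```shell
# Hypothetical helper: gather what is needed to debug a failing image pull.
# Usage: image_pull_triage <pod_name> <namespace>
image_pull_triage() {
  local pod="$1" ns="$2"

  # Exact image references (repository, tag or digest) the pod is trying to pull.
  oc get pod "$pod" -n "$ns" -o jsonpath='{.spec.containers[*].image}{"\n"}'

  # Events for this pod; pull failures and the registry's error message show up here.
  oc get events -n "$ns" --field-selector "involvedObject.name=$pod"

  # Pull secrets attached to the pod, to check against the private registry.
  oc get pod "$pod" -n "$ns" -o jsonpath='{.spec.imagePullSecrets[*].name}{"\n"}'
}
```

If the pull secret exists but is not linked to the service account, oc secrets link <service_account> <secret_name> --for=pull attaches it; the account and secret names depend on your project.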
7. Node issues:
Node failures can also affect pod status. Check with the infra team on the health of the nodes.
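Before going to the infra team, it helps to know which node the pod is on and what state it reports. A minimal sketch, assuming your account is allowed to view nodes; the helper name node_triage and its arguments are made up for illustration.

```shell
# Hypothetical helper: find the node a pod runs on and show its health.
# Usage: node_triage <pod_name> <namespace>
node_triage() {
  local pod="$1" ns="$2" node

  # Name of the node the pod is scheduled on (empty if it is not scheduled).
  node=$(oc get pod "$pod" -n "$ns" -o jsonpath='{.spec.nodeName}')
  if [ -z "$node" ]; then
    echo "$pod is not scheduled on any node yet"
    return 1
  fi

  # Ready / NotReady status of that node.
  oc get node "$node"

  # Conditions (memory, disk and PID pressure), taints and recent node events.
  oc describe node "$node"

  # Current CPU and memory usage of the node (needs cluster metrics).
  oc adm top node "$node"
}
```

The output of oc describe node is usually what the infra team asks for first.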
Logging:
OpenShift provides extensive logging capabilities for monitoring and troubleshooting. Plain Kubernetes gives us per-container logs but no native option for cluster-level log aggregation, so there we need to go for external tools.