Red Hat OpenShift Admin Day-to-Day Activities: Part III

Troubleshooting user queries

An OpenShift admin spends most of the time handling user queries and addressing alerts. Users often ping us in Slack channels to report issues along with the errors they see, and alerts also reach us through Slack. Whoever is on shift is responsible for addressing them.

There are several categories under this topic; I will cover just a few:

  • POD Issues
  • Logging

Pod issues:

Let's start with pod issues and what we do to address them:

1. Pod Fails to Start:

Check Logs: Run oc logs <pod_name> to identify errors or exceptions.

Resource Constraints: Ensure the pod's resource requests and limits are appropriate.

Image Availability: Check that the image name provided matches the one in the registry.

Security Context: Check the SCC (security context constraint) that applies to the pod.
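The checks above can be gathered in one pass. Below is a minimal sketch, assuming you are logged in with oc and can view the project. The function name pod_first_look and its two arguments are hypothetical, made up for illustration; the oc commands inside it are the standard ones.

```shell
# Hypothetical helper (not a standard tool): first-look diagnostics
# for a pod that will not start.
# Usage: pod_first_look <pod_name> <namespace>
pod_first_look() {
  # The Events section at the end of the output usually names the failure
  # (scheduling, image pull, SCC admission, mount errors).
  oc describe pod "$1" -n "$2"

  # Container logs, if the container got far enough to produce any.
  oc logs "$1" -n "$2"

  # Resource requests and limits configured on each container.
  oc get pod "$1" -n "$2" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'

  # SCC that admitted the pod, recorded in the openshift.io/scc annotation.
  oc get pod "$1" -n "$2" \
    -o jsonpath='{.metadata.annotations.openshift\.io/scc}{"\n"}'
}
```

For example, pod_first_look my-app-1-abcde my-project, where both names are placeholders. If the pod never got scheduled, oc logs returns nothing useful and the events from oc describe are the place to look.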

2. Pod CrashLoopBackOff:

Crash Loops: Investigate whether the application inside the pod is crashing and restarting repeatedly. Run oc logs <pod_name> and check the logs for crash reports.

Resource Constraints: Insufficient resources can cause the container to be terminated and restarted repeatedly. For example, a container that exceeds its memory limit is killed, and OpenShift restarts it to see whether it can run within the available resources. If it keeps failing after a few restarts, OpenShift waits longer between attempts and the pod is reported as CrashLoopBackOff.
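When a pod is crash looping, the logs of the current container are often empty because it has only just restarted. A small sketch of what we could run instead is below; the function name crashloop_check is hypothetical, and it assumes the pod has restarted at least once.

```shell
# Hypothetical helper (not a standard tool): inspect a crash-looping pod.
# Usage: crashloop_check <pod_name> <namespace>
crashloop_check() {
  # Logs of the previous (crashed) container instance, not the fresh restart.
  oc logs "$1" -n "$2" --previous

  # Per container: name, restart count, last exit code, last termination reason.
  oc get pod "$1" -n "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.exitCode}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}
```

A termination reason of OOMKilled (exit code 137) points to the memory limit, while a reason of Error with an application exit code points back to the application logs.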

3. Networking Issues:

Service Connectivity: Check for NetworkPolicy objects, which allow or deny network connectivity between pods, between pods and services, and for external access.
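As a rough sketch of how that check could look on the command line (the function name network_check is hypothetical, and it assumes the problem is with a service in a known project):

```shell
# Hypothetical helper (not a standard tool): basic service connectivity checks.
# Usage: network_check <service_name> <namespace>
network_check() {
  # NetworkPolicy objects in the project. A pod selected by a policy only
  # gets the traffic that the policies allow for the direction they cover.
  oc get networkpolicy -n "$2"

  # Selector and ports of the service.
  oc describe service "$1" -n "$2"

  # Pod IPs behind the service. An empty list means the selector
  # matches no ready pods.
  oc get endpoints "$1" -n "$2"
}
```

If the endpoints list is empty, the issue is usually the service selector or pod readiness rather than a NetworkPolicy.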

4. Volume Mounting Problems:

Mount failures will result in pod failures.

Permissions: Check file and directory permissions if the pod has issues writing to mounted volumes.

Infra changes: At times, changes made by the infra team on the file system (FS), or an OS restart that brings the FS back with an improper access mode, can also result in the issues above.
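A minimal sketch of the volume checks follows. The function name volume_check and its third argument (the mount path inside the container) are hypothetical, and the last two commands assume the pod is running and its image ships the id and ls binaries.

```shell
# Hypothetical helper (not a standard tool): volume and permission checks.
# Usage: volume_check <pod_name> <namespace> <mount_path>
volume_check() {
  # PVCs in the project. A claim that is not Bound cannot be mounted.
  oc get pvc -n "$2"

  # Events for the pod, where mount and attach failures are reported.
  oc get events -n "$2" --field-selector involvedObject.name="$1"

  # User and group IDs the container process runs as.
  oc exec "$1" -n "$2" -- id

  # Owner, group and mode of the mounted directory.
  oc exec "$1" -n "$2" -- ls -ld "$3"
}
```

Comparing the IDs from id with the owner and mode from ls -ld shows quickly whether the container user is allowed to write to the mount.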

5. OOMKilled (Out of Memory):

Check how much of its resources the pod is using and how much is available.
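One way to compare usage against limits is sketched below. The function name oom_check is hypothetical, and the oc adm top commands assume cluster metrics are available.

```shell
# Hypothetical helper (not a standard tool): memory usage versus limits.
# Usage: oom_check <pod_name> <namespace>
oom_check() {
  # Memory limit configured on each container.
  oc get pod "$1" -n "$2" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.limits.memory}{"\n"}{end}'

  # Current CPU and memory usage per container.
  oc adm top pod "$1" -n "$2" --containers

  # CPU and memory usage per node, to see what is still available.
  oc adm top node
}
```

If usage sits close to the limit shortly before each restart, raising the memory limit or fixing a leak in the application are the usual next steps.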

6. Image Pull Errors:

Registry Authentication: This can happen when the credentials for a private registry are incorrect.

Image Availability: Verify the image repository, tag, and digest. Images might be removed or unavailable.
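Both causes can be narrowed down with a few commands. This is only a sketch; the function name image_pull_check is hypothetical.

```shell
# Hypothetical helper (not a standard tool): image pull diagnostics.
# Usage: image_pull_check <pod_name> <namespace>
image_pull_check() {
  # Image references (repository, tag or digest) the pod is trying to pull.
  oc get pod "$1" -n "$2" \
    -o jsonpath='{range .spec.containers[*]}{.image}{"\n"}{end}'

  # Names of the pull secrets attached to the pod.
  oc get pod "$1" -n "$2" \
    -o jsonpath='{.spec.imagePullSecrets[*].name}{"\n"}'

  # Events for the pod, where the pull error from the registry is reported.
  oc get events -n "$2" --field-selector involvedObject.name="$1"
}
```

The event message typically tells the two cases apart: an authorization error points to the pull secret, while a not-found or unknown-manifest error points to the image name, tag, or digest.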

7. Node Issues:

Node failures can also affect pod status. Check with the infra team on the health of the nodes.
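Before reaching out to the infra team, we can take a quick look ourselves. A minimal sketch, assuming the account has cluster-level read access; the function name node_check is hypothetical.

```shell
# Hypothetical helper (not a standard tool): quick node health check.
# Usage: node_check <node_name>
node_check() {
  # Status of all nodes (Ready, NotReady, SchedulingDisabled).
  oc get nodes

  # Conditions such as MemoryPressure and DiskPressure, plus recent events.
  oc describe node "$1"

  # Pods scheduled on this node, across all projects.
  oc get pods --all-namespaces -o wide --field-selector spec.nodeName="$1"
}
```

The node a pod runs on is shown in the NODE column of oc get pod <pod_name> -o wide.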

Logging:

OpenShift provides extensive logging capabilities for monitoring and troubleshooting. In Kubernetes, these are not available as native options; instead, we need to rely on external tools.