Red Hat OpenShift Admin Day-to-Day Activities: Part II [Banking Domain]

The article 'OpenShift admin day-to-day activities: Part I' ( https://www.linkedin.com/posts/activity-7121099149671895040-BJwQ?utm_source=share&utm_medium=member_desktop) covered the activities of a student working in the UK. This Part II comes from a student/professional working in the banking domain as an L3 admin, with 8 years of overall experience, of which 3.5 years are relevant experience.

At a high level, he receives the following requests through Jira as user stories:

  1. Deployment failures
  2. Certificate management
  3. Troubleshooting user queries related to k8s objects
  4. Ingress traffic issues
  5. Egress connectivity issues
  6. Managing multi-cluster upgrades
  7. Managing clusters running in 2 datacenters
  8. Supporting private/public cloud environments
  9. Service mesh upgrades
  10. Logging-related queries
  11. Monitoring enablement and queries
  12. Production outage troubleshooting
  13. Incident/change implementation, Jira tasks, and image-related vulnerability fixes

Considering the experience level, the number of issues and tasks assigned to the individual is high. I have detailed only a couple of tasks in this article. To avoid lengthy pages, I will split the remaining tasks and cover them in upcoming articles.

Deployment Failures :

As with Kubernetes or any other middleware technology, OpenShift deployment failures can occur for various reasons, ranging from issues in your application code to problems with configurations, resources, or the OpenShift platform itself.

– Developers and the engineering team often end up pointing at each other over the root cause

– A few common checks are outlined below:

1. Check Application Logs:

–> oc logs <pod-name>

–> oc get events

–> oc describe pod <pod-name>
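The three commands above can be bundled into one helper so that every Jira ticket starts with the same evidence. This is only a minimal sketch: the function name collect_pod_diagnostics and the default namespace "default" are assumptions for illustration, not part of any OpenShift tooling.

```shell
# collect_pod_diagnostics POD [NAMESPACE]
# Hypothetical helper: prints logs, previous-container logs,
# the pod description and recent events for one pod.
collect_pod_diagnostics() {
    pod="$1"
    ns="${2:-default}"   # assumption: fall back to the "default" namespace

    echo "== logs: $pod =="
    oc logs "$pod" -n "$ns" --all-containers=true

    echo "== previous logs: $pod =="
    # --previous only returns data if a container has restarted,
    # so a failure here is not treated as an error
    oc logs "$pod" -n "$ns" --previous || true

    echo "== describe: $pod =="
    oc describe pod "$pod" -n "$ns"

    echo "== events =="
    oc get events -n "$ns" --sort-by=.lastTimestamp
}
```

Usage would look like: collect_pod_diagnostics <pod-name> <namespace> > ticket-evidence.txt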

2. Resource Constraints:

–> Ensure pods have sufficient CPU and memory allocated (requests and limits).
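One way to verify the allocation is to print what each container actually requests and whether it was recently killed for exceeding its limit. The sketch below assumes a hypothetical helper name, show_pod_resources, and a default namespace of "default"; the jsonpath fields are standard pod spec and status fields.

```shell
# show_pod_resources POD [NAMESPACE]
# Hypothetical helper: shows requests/limits and the last termination reason.
show_pod_resources() {
    pod="$1"
    ns="${2:-default}"   # assumption: fall back to the "default" namespace

    # Requests and limits declared for each container
    oc get pod "$pod" -n "$ns" \
        -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'

    # Last termination reason per container, for example OOMKilled
    # when a container exceeded its memory limit
    oc get pod "$pod" -n "$ns" \
        -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}
```

A pod stuck in Pending with a FailedScheduling event usually points at requests that no node can satisfy, while OOMKilled points at a memory limit that is too low for the workload.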

3. Image Pull Issues:

–> Check for network issues between the cluster and the registry

–> Verify that the container images specified in your deployment configuration exist

–> Check image names, repositories, and authentication requirements
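To spot image pull problems quickly across a namespace, the output of oc get pods can be filtered on the STATUS column. The following is a small sketch; the function name filter_image_pull_failures is hypothetical, and it assumes the default column layout (NAME, READY, STATUS, ...).

```shell
# filter_image_pull_failures
# Hypothetical helper: reads `oc get pods` output on stdin and prints the
# names of pods whose STATUS ends in ErrImagePull or ImagePullBackOff
# (this also catches init-container states such as Init:ImagePullBackOff).
filter_image_pull_failures() {
    awk 'NR > 1 && $3 ~ /(ErrImagePull|ImagePullBackOff)$/ { print $1 }'
}
```

Usage would look like: oc get pods -n <namespace> | filter_image_pull_failures
For each pod printed, oc describe pod <pod-name> shows the exact pull error in its events (wrong image name or tag, missing pull secret, or registry unreachable).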

4. Network Policies:

–> oc get networkpolicy , oc get networkpolicy <policy-name> -o yaml , oc describe networkpolicy <policy-name>

–> Network policies allow or deny traffic to and from pods

–> Check whether a policy restricts communication between pods and services

–> Check whether the pods are allowed to communicate with other endpoints

5. Environment Variables:

–> Check configuration files and secret references

–> Incorrect references may lead to failures

6. Service Endpoints and Ports:

–> oc describe service <service-name>

–> Check for endpoints (pod IP addresses)

–> Check whether the required ports are exposed
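The endpoint and port checks can be done side by side: the Service shows the selector and ports it declares, and the Endpoints object shows which pod IPs currently back it. This is a sketch only; show_service_endpoints is a hypothetical helper name and the default namespace is an assumption.

```shell
# show_service_endpoints SERVICE [NAMESPACE]
# Hypothetical helper: prints the service selector, its ports,
# and the pod IPs currently registered as endpoints.
show_service_endpoints() {
    svc="$1"
    ns="${2:-default}"   # assumption: fall back to the "default" namespace

    # Selector and ports declared on the Service
    oc get service "$svc" -n "$ns" \
        -o jsonpath='{.spec.selector}{"\n"}{.spec.ports[*].port}{"\n"}'

    # Pod IPs behind the Service; an empty line means no ready pod
    # matches the selector
    oc get endpoints "$svc" -n "$ns" \
        -o jsonpath='{.subsets[*].addresses[*].ip}{"\n"}'
}
```

If the endpoint list is empty, the usual suspects are a selector that does not match the pod labels or pods that are failing their readiness probe.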

7. Volume Mounts and Persistent Storage:

–> oc describe deployment <deployment-name>

–> Check the volume details and the volume mounts

8. OpenShift Cluster Health:

–> oc get nodes

–> oc describe node <node-name>

–> oc adm node-logs -u kubelet <node-name>

–> Check the kubelet and CRI-O service status

–> Check the built-in OpenShift dashboard
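For a quick cluster health pass, the oc get nodes output can be filtered so that only nodes needing attention are shown. The sketch below uses a hypothetical helper name, list_unhealthy_nodes, and assumes the default column layout (NAME, STATUS, ROLES, ...).

```shell
# list_unhealthy_nodes
# Hypothetical helper: reads `oc get nodes` output on stdin and prints every
# node whose STATUS is not exactly "Ready". This reports NotReady nodes and
# also cordoned nodes (Ready,SchedulingDisabled).
list_unhealthy_nodes() {
    awk 'NR > 1 && $2 != "Ready" { print $1 " " $2 }'
}
```

Usage would look like: oc get nodes | list_unhealthy_nodes
Any node listed can then be inspected with oc describe node <node-name> and oc adm node-logs -u kubelet <node-name>.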

Certificate Management :

– OpenShift uses multiple certificates to ensure secure communication between components

– Certificates for the API server, etcd, router, registry, metrics, console, and kubelet have to be managed

– Renewing and rotating all these certificates is one of the key tasks

– It is better to automate the renewal, for example through a custom script
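As a starting point for such a script, the expiry check itself can be done with openssl against a certificate extracted from a secret. This is a minimal sketch under stated assumptions: the function name cert_expires_within is hypothetical, the certificate is assumed to be already saved as a PEM file, and the actual renewal step is left out because it depends on how each certificate is issued.

```shell
# cert_expires_within PEM_FILE DAYS
# Hypothetical helper: returns 0 (true) if the certificate in PEM_FILE
# expires within DAYS days, 1 otherwise.
# Note: an unreadable file also makes openssl fail, so it is reported
# as "expiring" and should be investigated.
cert_expires_within() {
    seconds=$(( $2 * 86400 ))
    # openssl x509 -checkend exits 0 when the certificate is still
    # valid after the given number of seconds
    if openssl x509 -checkend "$seconds" -noout -in "$1" >/dev/null; then
        return 1
    fi
    return 0
}

# Example usage (placeholders, not real names):
#   oc get secret <secret-name> -n <namespace> \
#       -o jsonpath='{.data.tls\.crt}' | base64 -d > /tmp/cert.pem
#   if cert_expires_within /tmp/cert.pem 30; then
#       echo "certificate expires within 30 days"
#   fi
```

Run from a scheduled job, a check like this can raise a Jira ticket or alert well before the expiry date instead of after an outage.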