Sunday, March 2, 2025

Kubernetes Troubleshooting with Scripts

 Automating Daily Kubernetes Troubleshooting with Nested Scripts

Kubernetes troubleshooting can be time-consuming, especially when the same issues keep recurring. Instead of manually running multiple kubectl commands every day, you can automate the process using nested scripts: one master script that calls a set of smaller, single-purpose check scripts. This post walks through setting up that workflow.


Why Automate Kubernetes Troubleshooting?

  • Saves Time: Automating frequent checks helps reduce repetitive tasks.
  • Ensures Consistency: Standardized scripts ensure that every troubleshooting step is performed correctly.
  • Reduces Human Error: Automating log collection and resource monitoring minimizes missed issues.
  • Faster Issue Resolution: Automated scripts provide instant insights into cluster health.

Setting Up the Automation

1. Create a Master Script (troubleshoot.sh)

This script serves as the entry point and executes all necessary checks.

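#!/bin/bash

echo "Starting Kubernetes Troubleshooting..."

# Load environment variables if needed (cron does not read ~/.bashrc on its own)
source ~/.bashrc

# Switch to the script's own directory so the relative paths below
# resolve even when the script is started by cron or from another directory
cd "$(dirname "$0")" || exit 1

# Run nested scripts
./check_pods.sh
./check_logs.sh
./check_resources.sh

echo "Troubleshooting completed!"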
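If one of the nested scripts fails, the master script above still prints "Troubleshooting completed!" without saying which check went wrong. A minimal sketch of one way to track that, assuming the check scripts signal problems through their exit status; run_checks is a hypothetical helper name, not part of the original scripts:

```shell
#!/bin/bash

# Hypothetical helper: run each check script found in a directory,
# keep going after a failure, and return the number of failed checks.
run_checks() {
  local dir="$1"; shift
  local failed=0
  local script
  for script in "$@"; do
    echo "--- Running $script ---"
    if ! "$dir/$script"; then
      echo "WARNING: $script exited with a non-zero status" >&2
      failed=$((failed + 1))
    fi
  done
  return "$failed"
}

# Possible call from troubleshoot.sh:
#   run_checks "$(dirname "$0")" check_pods.sh check_logs.sh check_resources.sh
```

Returning the failure count as the exit status lets cron mail or a wrapper script react to a non-zero result.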

2. Checking Pods (check_pods.sh)

This script lists pods in error states and fetches relevant logs.

#!/bin/bash

echo "Checking for pods in error state..."
kubectl get pods --all-namespaces | grep -E 'CrashLoopBackOff|Error|Evicted'

echo "Fetching details for problematic pods..."
# Emit one "namespace name" pair per line so each pod is looked up in its own namespace
kubectl get pods --all-namespaces --field-selector=status.phase!=Running \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' |
while read -r ns pod; do
  echo "=== Logs for $pod in namespace $ns ==="
  kubectl logs -n "$ns" "$pod" --tail=50
done
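One caveat with the phase filter above: a pod stuck in CrashLoopBackOff usually still reports the Running phase, so the loop may skip it. A minimal sketch of an alternative that filters on the STATUS column instead, assuming the default column layout of kubectl get pods --all-namespaces --no-headers (namespace, name, ready, status, ...); filter_unhealthy is a hypothetical helper name:

```shell
#!/bin/bash

# Hypothetical helper: read `kubectl get pods --all-namespaces --no-headers`
# output on stdin and print "namespace/name" for every pod whose STATUS
# column is neither Running nor Completed.
filter_unhealthy() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 }'
}

# Possible usage:
#   kubectl get pods --all-namespaces --no-headers | filter_unhealthy
```

Because the helper only reads text from stdin, it can be tried against saved kubectl output without touching a cluster.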

3. Checking Logs (check_logs.sh)

This script gathers logs for failing pods in a specific namespace.

#!/bin/bash

NAMESPACE="default"  # Change this to your target namespace

echo "Fetching logs for failing pods in namespace $NAMESPACE..."
# Quote variables so unexpected characters in names cannot split the arguments
for pod in $(kubectl get pods -n "$NAMESPACE" --field-selector=status.phase!=Running -o jsonpath='{.items[*].metadata.name}'); do
  echo "Logs for pod: $pod"
  kubectl logs -n "$NAMESPACE" "$pod" --tail=100
  echo "-----------------------------------"
done
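For a container that has crashed and restarted, the current logs often show only the fresh start, while the reason for the crash sits in the previous container instance. A minimal sketch that tries kubectl logs --previous first and falls back to the current logs; fetch_logs is a hypothetical helper name, and the default of 100 lines simply mirrors the script above:

```shell
#!/bin/bash

# Hypothetical helper: print logs of the previously terminated container
# instance if one exists, otherwise the logs of the current instance.
fetch_logs() {
  local ns="$1" pod="$2" lines="${3:-100}"
  if ! kubectl logs -n "$ns" "$pod" --previous --tail="$lines" 2>/dev/null; then
    kubectl logs -n "$ns" "$pod" --tail="$lines"
  fi
}

# Possible usage inside the loop:
#   fetch_logs "$NAMESPACE" "$pod" 100
```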

4. Checking Resource Usage (check_resources.sh)

This script reports CPU and memory usage across nodes and pods. Note that kubectl top only works when the Metrics Server is installed in the cluster.

#!/bin/bash

echo "Checking resource usage..."
kubectl top nodes
kubectl top pods --all-namespaces
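Raw kubectl top output still has to be read by a person. A minimal sketch that flags only nodes above a usage threshold, assuming the usual kubectl top nodes --no-headers column order (name, CPU cores, CPU%, memory bytes, memory%); flag_busy_nodes and the default threshold of 80 are illustrative choices, not part of the original scripts:

```shell
#!/bin/bash

# Hypothetical helper: read `kubectl top nodes --no-headers` output on stdin
# and print nodes whose CPU% or memory% exceeds the given threshold.
flag_busy_nodes() {
  local threshold="${1:-80}"
  awk -v t="$threshold" '
    {
      cpu = $3; mem = $5
      sub(/%/, "", cpu); sub(/%/, "", mem)   # strip the percent signs
      if (cpu + 0 > t || mem + 0 > t)
        printf "%s cpu=%s%% mem=%s%%\n", $1, cpu, mem
    }'
}

# Possible usage:
#   kubectl top nodes --no-headers | flag_busy_nodes 80
```

Non-numeric values such as `<unknown>` evaluate to 0 in the comparison, so nodes without metrics are not flagged by this sketch.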

Making Scripts Executable

Before running the scripts, grant execution permission:

chmod +x troubleshoot.sh check_pods.sh check_logs.sh check_resources.sh

Automating with Cron Jobs

To schedule the troubleshooting script to run daily, add a cron job:

crontab -e

Add the following line to execute the script every day at 8 AM:

0 8 * * * /path/to/troubleshoot.sh >> /var/log/k8s_troubleshoot.log 2>&1
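Cron starts jobs with a minimal environment, so a script that works in an interactive shell may fail there because kubectl is not on the PATH. A minimal sketch of a preflight check that could sit at the top of troubleshoot.sh; preflight is a hypothetical helper name:

```shell
#!/bin/bash

# Hypothetical helper: verify that every required command can be found,
# and report the PATH in use when one is missing.
preflight() {
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "ERROR: '$cmd' not found in PATH ($PATH)" >&2
      return 1
    fi
  done
}

# Possible usage at the top of troubleshoot.sh:
#   preflight kubectl grep || exit 1
```

With the cron line above, the error message ends up in the same log file as the rest of the output, which makes a misconfigured PATH easy to spot.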

Conclusion

By leveraging nested scripts for Kubernetes troubleshooting, you can:

  • Reduce the manual effort required for daily checks.
  • Ensure consistent monitoring of cluster health.
  • Detect and resolve issues faster.

This approach not only enhances efficiency but also improves overall reliability in managing Kubernetes clusters. 🚀 Happy Automating!
