Incident Response in Kubernetes Environments

Incident response in Kubernetes must account for ephemeral containers and constantly changing cluster state. Traditional procedures assume hosts that persist long enough to be examined; they fail when a container, along with its writable filesystem and process state, can be rescheduled or deleted within minutes. Response procedures therefore need to prioritize rapid evidence collection and automated containment that keep pace with the scale and speed of a cluster.

Initial response actions focus on containing the threat while preserving evidence. A NetworkPolicy can cut a compromised pod off from the network without deleting it, provided the cluster's CNI plugin enforces network policies. A validating admission webhook can reject new workloads that use a compromised image. Both can be triggered automatically, which matters because a controller can start replacement pods faster than a responder can intervene by hand.
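
The admission check described above comes down to one decision per container image: is this image on the incident blocklist? The sketch below shows only that decision, in shell, under stated assumptions: the blocklist is a plain-text file with one image reference or digest per line, and the function name is_blocked_image is hypothetical. A real admission webhook would apply the same test to every image in the incoming request and reject the request on a match.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) if an image reference is on the blocklist.
# Assumed blocklist format: one entry per line, either a full image reference
# (registry/repo:tag) or a bare digest (sha256:...). Blank lines and lines
# starting with '#' are ignored.
is_blocked_image() {
  local image="$1" blocklist="$2" entry
  while IFS= read -r entry || [ -n "$entry" ]; do
    # Skip blank lines and comments
    case "$entry" in
      ''|'#'*) continue ;;
    esac
    # Match the full reference exactly, or a reference pinned to a listed digest
    case "$image" in
      "$entry"|*"@$entry") return 0 ;;
    esac
  done < "$blocklist"
  return 1
}
```

For example, is_blocked_image "registry.example.com/shop/api:1.4.2" blocklist.txt succeeds only if that exact reference is listed. Matching on digests is the more robust choice, since a tag can be re-pushed to point at different content.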

Evidence must be collected before the container terminates. Response runbooks should include ready-to-run commands for capturing container state, logs, and open network connections. An ephemeral debug container that carries forensic tools can join a compromised pod's Linux namespaces for live investigation, and volume snapshots preserve persistent data for offline analysis.
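
As a sketch of what such runbook commands might look like, the helpers below save the pod's API-side state and logs and print a VolumeSnapshot manifest for a PersistentVolumeClaim. The function names, the output file layout, and the snapshot naming scheme are illustrative assumptions. The snapshot part also assumes the CSI snapshot CRDs and a VolumeSnapshotClass are installed in the cluster.

```shell
#!/bin/bash
# Hypothetical runbook helpers; names and output layout are illustrative.

# Save the pod's spec/status, logs from the current and the previously
# terminated container instances, and recent events in the namespace.
capture_pod_evidence() {
  local ns="$1" pod="$2" outdir="$3"
  mkdir -p "$outdir"
  kubectl get pod "$pod" -n "$ns" -o yaml > "$outdir/pod.yaml"
  kubectl logs "$pod" -n "$ns" --all-containers --timestamps > "$outdir/logs.txt"
  # --previous fails when no container has restarted; that is not an error here.
  kubectl logs "$pod" -n "$ns" --all-containers --timestamps --previous \
    > "$outdir/logs-previous.txt" 2>/dev/null || true
  kubectl get events -n "$ns" --sort-by=.lastTimestamp > "$outdir/events.txt"
}

# Print a VolumeSnapshot manifest for one PVC.
# $3 (incident id) must be lowercase, because it becomes part of an object name.
# $4 is the name of an existing VolumeSnapshotClass (assumed to be installed).
render_volume_snapshot() {
  local ns="$1" pvc="$2" incident_id="$3" snapshot_class="$4"
  cat <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: incident-${incident_id}-${pvc}
  namespace: ${ns}
  labels:
    incident.id: "${incident_id}"
spec:
  volumeSnapshotClassName: ${snapshot_class}
  source:
    persistentVolumeClaimName: ${pvc}
EOF
}
```

A responder could run capture_pod_evidence "$NAMESPACE" "$POD" "/forensics/$INCIDENT_ID" first, then pipe the output of render_volume_snapshot into kubectl apply -f - for each claim the pod mounts. Printing the manifest instead of applying it directly leaves room to review it or store it with the incident record.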

# Incident response automation workflow
apiVersion: v1
kind: ConfigMap
metadata:
  name: incident-response-playbook
  namespace: security-system
data:
  isolate-pod.sh: |
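    #!/bin/bash
    # Usage: isolate-pod.sh <namespace> <pod> <incident-id>
    # The incident id is used in object names, so it must be a lowercase DNS label.
    set -uo pipefail
    NAMESPACE=${1:?namespace required}
    POD=${2:?pod name required}
    INCIDENT_ID=${3:?incident id required}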
    
    # Tag only the affected pod. Selecting on the pod's existing labels would also
    # isolate every healthy replica that shares them.
    kubectl label pod "${POD}" -n "${NAMESPACE}" "incident.quarantine=${INCIDENT_ID}" --overwrite

    # Create isolation NetworkPolicy
    cat <<EOF | kubectl apply -f -
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: incident-isolation-${INCIDENT_ID}
      namespace: ${NAMESPACE}
      labels:
        incident.id: "${INCIDENT_ID}"
        incident.type: isolation
    spec:
      podSelector:
        matchLabels:
          incident.quarantine: "${INCIDENT_ID}"
      policyTypes:
      - Ingress
      - Egress
      egress:
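      # Allow only DNS so name resolution keeps working; all other egress is dropped.
      # kubernetes.io/metadata.name is set automatically on every namespace.
      - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
        ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53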
    EOF
    
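    # Capture forensic data (ps and netstat must exist in the container image)
    mkdir -p "/forensics/${INCIDENT_ID}"
    kubectl exec -n "${NAMESPACE}" "${POD}" -- ps auxf > "/forensics/${INCIDENT_ID}/processes.txt"
    kubectl exec -n "${NAMESPACE}" "${POD}" -- netstat -tulpn > "/forensics/${INCIDENT_ID}/network.txt"
    # Run the glob inside the container; unquoted, the local shell would expand /proc/*/exe
    kubectl exec -n "${NAMESPACE}" "${POD}" -- sh -c 'ls -la /proc/*/exe' > "/forensics/${INCIDENT_ID}/executables.txt"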
    
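    # Attach an ephemeral debug container to the compromised pod for detailed
    # investigation. A separate pod cannot see the target's processes, because
    # shareProcessNamespace only applies to containers within the same pod.
    # Replace the image with a pinned one that carries your forensic tools.
    TARGET=$(kubectl get pod "${POD}" -n "${NAMESPACE}" -o jsonpath='{.spec.containers[0].name}')
    # --profile=general grants SYS_PTRACE and needs a kubectl release with debug
    # profiles; SYS_ADMIN is dropped as it is not needed to inspect processes.
    kubectl debug "${POD}" \
      --namespace="${NAMESPACE}" \
      --image=cilium/alpine-curl:latest \
      --container="forensics-${INCIDENT_ID}" \
      --target="${TARGET}" \
      --profile=general \
      -- sh -c "while true; do sleep 3600; done"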

---
# Automated response webhook
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: incident-response-webhook
webhooks:
- name: block.compromised.images
  clientConfig:
    service:
      name: incident-response-webhook
      namespace: security-system
      path: "/block"
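    # Placeholder: substitute the base64-encoded CA certificate before applying
    caBundle: ${CA_BUNDLE}
  # Exempt system namespaces and the webhook's own namespace. With
  # failurePolicy: Fail, an unavailable webhook would otherwise block the
  # pods needed to bring it back.
  namespaceSelector:
    matchExpressions:
    - key: kubernetes.io/metadata.name
      operator: NotIn
      values: ["kube-system", "security-system"]
  rules:
  - operations: ["CREATE", "UPDATE"]
    apiGroups: ["apps", ""]
    apiVersions: ["v1"]
    resources: ["deployments", "pods", "replicasets"]
  admissionReviewVersions: ["v1", "v1beta1"]
  sideEffects: None
  timeoutSeconds: 5
  failurePolicy: Fail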
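
The caBundle field above holds a ${CA_BUNDLE} placeholder, which the API server does not expand; it has to be replaced with the base64-encoded PEM certificate of the CA that signed the webhook's serving certificate. A minimal sketch of that substitution follows. The function name and the file names in the usage note are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: fill the ${CA_BUNDLE} placeholder in a manifest template.
# $1 = PEM-encoded CA certificate that signed the webhook's serving certificate
# $2 = manifest template containing the literal text ${CA_BUNDLE}
render_webhook_manifest() {
  local ca_file="$1" template="$2" ca_bundle
  # caBundle expects the base64 encoding of the PEM file on a single line.
  ca_bundle=$(base64 < "$ca_file" | tr -d '\n')
  # Base64 output contains only [A-Za-z0-9+/=], so '|' is a safe sed delimiter.
  sed "s|\${CA_BUNDLE}|${ca_bundle}|g" "$template"
}
```

With the manifest saved as a template, render_webhook_manifest ca.crt webhook.yaml.tpl | kubectl apply -f - would apply the completed configuration. Tools such as cert-manager can inject the CA bundle automatically, which avoids this manual step.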