Someone deleted a ConfigMap in production. The app crashed. Everyone's panicking. "Who did this?"

Silence. Nobody knows. There are no logs.

This is why you enable audit logging before you need it.

What Audit Logs Capture

Kubernetes auditing records all requests to the API server. Who did what, when, and how.

From the docs: "Kubernetes auditing provides a security-relevant, chronological set of records documenting the sequence of actions in a cluster."

That means:

  • User authentication attempts
  • Resource creation/modification/deletion
  • RBAC permission checks
  • Exec into pods
  • Secret access
  • Everything that touches the API
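
Every one of those ends up as a structured JSON event. Here's roughly what the ConfigMap deletion from the intro would look like at Request level (the names, IP, and timestamp here are made up):

{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Request",
  "stage": "ResponseComplete",
  "verb": "delete",
  "user": {"username": "alice@example.com", "groups": ["developers"]},
  "sourceIPs": ["10.20.4.17"],
  "objectRef": {"resource": "configmaps", "namespace": "production", "name": "app-config"},
  "responseStatus": {"code": 200},
  "requestReceivedTimestamp": "2024-03-14T02:41:07.000000Z"
}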

Setting It Up (Self-Managed Clusters)

You need two things: an audit policy and API server flags.

The audit policy defines what to log:

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Log pod exec and attach at RequestResponse level
- level: RequestResponse
  resources:
  - group: ""
    resources: ["pods/exec", "pods/attach"]

# Log secret access at Metadata level (don't log contents!)
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets"]

# Log requests to core, apps, and batch resources, including request bodies
- level: Request
  resources:
  - group: ""
  - group: "apps"
  - group: "batch"

# Catch-all: log anything not matched above at the Metadata level
- level: Metadata

Audit levels:

  • None: Don't log
  • Metadata: Log request metadata (user, timestamp, resource) but not request/response body
  • Request: Log metadata + request body
  • RequestResponse: Log everything

Important: Never log Secrets at Request or RequestResponse level. On creates and updates the request body carries the secret data, so you'd end up with plaintext secrets sitting in your audit logs. Metadata level is enough to know who touched them and when.

API server flags:

--audit-policy-file=/etc/kubernetes/audit-policy.yaml
--audit-log-path=/var/log/kubernetes/audit.log
--audit-log-maxage=30
--audit-log-maxbackup=10
--audit-log-maxsize=100
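
For reference, maxage is in days, maxbackup is a file count, and maxsize is in megabytes. One gotcha on kubeadm-style clusters: the API server runs as a static pod, so the policy file and the log directory also need to be mounted into it, or it won't be able to read or write them. A sketch of the relevant pieces of /etc/kubernetes/manifests/kube-apiserver.yaml, assuming the paths above:

spec:
  containers:
  - name: kube-apiserver
    # ...existing command and flags, plus the --audit-* flags above...
    volumeMounts:
    - name: audit-policy
      mountPath: /etc/kubernetes/audit-policy.yaml
      readOnly: true
    - name: audit-logs
      mountPath: /var/log/kubernetes
  volumes:
  - name: audit-policy
    hostPath:
      path: /etc/kubernetes/audit-policy.yaml
      type: File
  - name: audit-logs
    hostPath:
      path: /var/log/kubernetes
      type: DirectoryOrCreate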

Managed Clusters

Good news: on EKS, GKE, and AKS the control plane is managed for you, so there are no API server flags to set. Audit logging is built in; you just have to make sure it's turned on and going somewhere you can actually query.

GKE: Audit logs go to Cloud Logging automatically. GKE audit logging docs explain how to access them.
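
To pull them from the CLI, a Cloud Logging filter along these lines is a reasonable starting point (the methodName format is Google's mapping of the Kubernetes API call, so treat the exact filter as a sketch):

gcloud logging read \
  'resource.type="k8s_cluster" AND protoPayload.methodName:"io.k8s.core.v1.configmaps.delete"' \
  --limit 10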

EKS: Enable control plane logging for the "audit" log type. Logs go to CloudWatch.
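
From the CLI that's a single call; the cluster name and region are placeholders:

aws eks update-cluster-config \
  --name my-cluster \
  --region us-east-1 \
  --logging '{"clusterLogging":[{"types":["audit"],"enabled":true}]}'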

AKS: Enable diagnostic settings to send audit logs to Log Analytics.
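
With the Azure CLI it looks roughly like this; the resource and workspace IDs are placeholders, and kube-audit is the diagnostic category AKS uses for API server audit events:

az monitor diagnostic-settings create \
  --name aks-audit \
  --resource "$AKS_CLUSTER_RESOURCE_ID" \
  --workspace "$LOG_ANALYTICS_WORKSPACE_ID" \
  --logs '[{"category": "kube-audit", "enabled": true}]'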

What to Look For

Datadog's guide on key audit logs highlights important events:

Privileged pod creation:

{
  "verb": "create",
  "objectRef": {"resource": "pods"},
  "requestObject": {
    "spec": {
      "containers": [{
        "securityContext": {"privileged": true}
      }]
    }
  }
}

Exec into pods:

{
  "verb": "create",
  "objectRef": {"resource": "pods", "subresource": "exec"},
  "user": {"username": "jane@example.com"}
}

Secret access:

{
  "verb": "get",
  "objectRef": {"resource": "secrets", "name": "database-credentials"},
  "user": {"username": "unknown-service-account"}
}
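
If you're on the file-based setup from earlier, you don't need a full pipeline to start asking these questions; jq against the log file gets you surprisingly far (the path matches the --audit-log-path flag above):

# Who has exec'd into pods, and when?
jq -r 'select(.objectRef.resource == "pods" and .objectRef.subresource == "exec")
  | "\(.requestReceivedTimestamp)  \(.user.username)  \(.objectRef.namespace)/\(.objectRef.name)"' \
  /var/log/kubernetes/audit.log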

Automated Detection

Don't just collect logs—alert on suspicious patterns:

  • Service accounts accessing secrets they don't normally access
  • Exec into production pods outside maintenance windows
  • Creation of privileged pods
  • ClusterRoleBinding changes
  • Failed authentication attempts

Tools like Falco can monitor audit logs and alert in real-time:

- rule: K8s Secret Access
  desc: Detect any access to cluster secrets
  condition: >
    kevt and secret and
    kactivity in (get, list)
  output: >
    Secret accessed (user=%ka.user.name secret=%ka.target.name)
  priority: WARNING

Storage and Retention

Audit logs grow fast. Plan for:

  • Volume: Busy clusters generate gigabytes daily
  • Retention: Compliance often requires 90+ days
  • Search: You need to actually query these logs

Send them to a log aggregation system (Elasticsearch, Loki, CloudWatch, etc.) with proper retention policies and alerting.
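
As one sketch: on a self-managed cluster, a Fluent Bit pod pinned to the control plane node (with the host log path mounted) can tail the audit file and forward it. The Elasticsearch host below is an assumption; point it at whatever backend you actually run:

[INPUT]
    Name    tail
    Path    /var/log/kubernetes/audit.log
    Tag     k8s-audit

[OUTPUT]
    Name    es
    Match   k8s-audit
    Host    elasticsearch.logging.svc.cluster.local
    Port    9200
    Index   k8s-audit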

The "Before You Need It" Part

Here's the thing: you won't realize you need audit logs until something goes wrong. By then it's too late.

Set it up now:

  1. Enable audit logging
  2. Ship logs to a searchable store
  3. Set up basic alerts for high-risk events
  4. Test by querying who did what yesterday
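
Step 4 is the part people skip. It can be a one-liner; here's the answer to the question from the top of this post, assuming the file-based setup (point it at your log backend instead if you're on a managed cluster):

jq -r 'select(.verb == "delete" and .objectRef.resource == "configmaps")
  | "\(.requestReceivedTimestamp)  \(.user.username) deleted \(.objectRef.namespace)/\(.objectRef.name)"' \
  /var/log/kubernetes/audit.log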

When someone asks "who deleted production?" you'll have the answer.

Sources: