Someone deleted a ConfigMap in production. The app crashed. Everyone's panicking. "Who did this?"
Silence. Nobody knows. There are no logs.
This is why you enable audit logging before you need it.
What Audit Logs Capture
Kubernetes auditing records all requests to the API server. Who did what, when, and how.
From the docs: "Kubernetes auditing provides a security-relevant, chronological set of records documenting the sequence of actions in a cluster."
That means:
- User authentication attempts
- Resource creation/modification/deletion
- RBAC permission checks
- Exec into pods
- Secret access
- Everything that touches the API
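Each request becomes a structured JSON event. Here's a trimmed, hypothetical example of what the ConfigMap deletion from the intro would look like; the field names come from the audit.k8s.io/v1 Event schema, but every value is made up:

{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Request",
  "stage": "ResponseComplete",
  "verb": "delete",
  "user": {"username": "dev-user@example.com", "groups": ["developers"]},
  "sourceIPs": ["10.0.4.17"],
  "objectRef": {"resource": "configmaps", "namespace": "production", "name": "app-config"},
  "responseStatus": {"code": 200},
  "requestReceivedTimestamp": "2024-03-14T09:21:07.521000Z"
}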
Setting It Up (Self-Managed Clusters)
You need two things: an audit policy and API server flags.
The audit policy defines what to log:
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Log pod exec and attach at RequestResponse level
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["pods/exec", "pods/attach"]
  # Log secret access at Metadata level (don't log contents!)
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]
  # Log other requests to the core, apps, and batch APIs at Request level
  - level: Request
    resources:
      - group: ""
      - group: "apps"
      - group: "batch"
Audit levels:
- None: Don't log
- Metadata: Log request metadata (user, timestamp, resource) but not request/response body
- Request: Log metadata + request body
- RequestResponse: Log everything
Important: Never log Secrets at RequestResponse level. You'll have plaintext secrets in your audit logs. Metadata level is enough to know who accessed them.
API server flags:
--audit-policy-file=/etc/kubernetes/audit-policy.yaml
--audit-log-path=/var/log/kubernetes/audit.log
--audit-log-maxage=30
--audit-log-maxbackup=10
--audit-log-maxsize=100
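On a kubeadm cluster, those flags go into the kube-apiserver static pod manifest, and the policy file and log directory also have to be mounted into the container. A rough sketch, assuming the standard kubeadm paths (adjust names and paths for your setup):

# /etc/kubernetes/manifests/kube-apiserver.yaml (excerpt)
spec:
  containers:
    - command:
        - kube-apiserver
        - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
        - --audit-log-path=/var/log/kubernetes/audit.log
        # ...existing flags stay as they are...
      volumeMounts:
        - name: audit-policy
          mountPath: /etc/kubernetes/audit-policy.yaml
          readOnly: true
        - name: audit-logs
          mountPath: /var/log/kubernetes
  volumes:
    - name: audit-policy
      hostPath:
        path: /etc/kubernetes/audit-policy.yaml
        type: File
    - name: audit-logs
      hostPath:
        path: /var/log/kubernetes
        type: DirectoryOrCreate

The kubelet restarts the API server when the manifest changes, so expect a brief blip while it comes back up.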
Managed Clusters
Good news: EKS, GKE, and AKS support audit logging without any of the setup above. You just need to make sure it's turned on.
GKE: Audit logs go to Cloud Logging automatically. GKE audit logging docs explain how to access them.
EKS: Enable control plane logging for the "audit" log type. Logs go to CloudWatch.
AKS: Enable diagnostic settings to send audit logs to Log Analytics.
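On EKS, for example, enabling it is a single CLI call (the cluster name and region are placeholders):

# Turn on the "audit" control plane log type; entries land in CloudWatch Logs
aws eks update-cluster-config \
  --region us-east-1 \
  --name my-cluster \
  --logging '{"clusterLogging":[{"types":["audit"],"enabled":true}]}'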
What to Look For
Datadog's guide on key audit logs highlights important events:
Privileged pod creation:
{
  "verb": "create",
  "objectRef": {"resource": "pods"},
  "requestObject": {
    "spec": {
      "containers": [{
        "securityContext": {"privileged": true}
      }]
    }
  }
}
Exec into pods:
{
  "verb": "create",
  "objectRef": {"resource": "pods", "subresource": "exec"},
  "user": {"username": "jane@example.com"}
}
Secret access:
{
  "verb": "get",
  "objectRef": {"resource": "secrets", "name": "database-credentials"},
  "user": {"username": "unknown-service-account"}
}
Automated Detection
Don't just collect logs—alert on suspicious patterns:
- Service accounts accessing secrets they don't normally access
- Exec into production pods outside maintenance windows
- Creation of privileged pods
- ClusterRoleBinding changes
- Failed authentication attempts
Tools like Falco can monitor audit logs and alert in real time:
- rule: K8s Secret Access
  desc: Detect any access to cluster secrets
  condition: >
    kevt and secret and
    ka.verb in (get, list)
  output: >
    Secret accessed (user=%ka.user.name secret=%ka.target.name)
  priority: WARNING
Storage and Retention
Audit logs grow fast. Plan for:
- Volume: Busy clusters generate gigabytes daily
- Retention: Compliance often requires 90+ days
- Search: You need to actually query these logs
Send them to a log aggregation system (Elasticsearch, Loki, CloudWatch, etc.) with proper retention policies and alerting.
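If you're using the file backend from the flags above, a small log shipper gets the events off the node. A minimal sketch using Fluent Bit to tail the audit log into Elasticsearch; the host, index, and paths are assumptions about your environment:

# Ship audit events (one JSON object per line) to Elasticsearch
[INPUT]
    Name    tail
    Path    /var/log/kubernetes/audit.log
    Parser  json
    Tag     k8s-audit

[OUTPUT]
    Name    es
    Match   k8s-audit
    Host    elasticsearch.logging.svc
    Port    9200
    Index   k8s-audit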
The "Before You Need It" Part
Here's the thing: you won't realize you need audit logs until something goes wrong. By then it's too late.
Set it up now:
- Enable audit logging
- Ship logs to a searchable store
- Set up basic alerts for high-risk events
- Test by querying who did what yesterday
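For that last step, even a quick jq query against the file backend works; the namespace and log path here are carried over from the earlier examples:

# Who deleted ConfigMaps in the production namespace, and when?
jq 'select(.verb == "delete" and .objectRef.resource == "configmaps"
           and .objectRef.namespace == "production")
    | {user: .user.username, configmap: .objectRef.name, time: .requestReceivedTimestamp}' \
  /var/log/kubernetes/audit.log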
When someone asks "who deleted production?" you'll have the answer.