Are You Prepared for k8s Incident Response?

Posted by Matt Hamilton on February 21, 2020

[Illustration: Often how it feels to keep a Kubernetes cluster afloat]

Picture this: The security team sends you a responsible disclosure report that links to a Pastebin post containing your production secrets. You rotate private encryption keys, database passwords, and other secrets. On review, the security team finds no evidence of external intrusion and requests the Kubernetes cluster logs to determine whether it was an insider attack. Legal says that your organization complies with PCI, so you should have logs. You need to hand the Kubernetes API logs off to the security team so they can do a review.

Could you furnish the logs to reveal the attacker? Better yet, could you demonstrate that YOU weren’t the attacker?

In this post, I'll be reviewing how to collect Kubernetes API logs and how to get alerts in Slack for sensitive cluster events.

Collecting Kubernetes API Audit Logs

Most organizations collect pod logs for application debugging and telemetry. However, many do not collect Kubernetes node logs, and even fewer collect Kubernetes audit logs. When I ask cluster administrators about audit policy objects, I can see the gears turning as they consider what they're missing by not having these policies established.

When I explain what these policies provide and why they're necessary, the response is usually the familiar one: "it's on the roadmap." In my last two years of performing Kubernetes cluster security reviews for the Fortune 100, only organizations with strict PCI compliance requirements had sufficient audit logs.

Kubernetes allows administrators to provide an audit.k8s.io policy object, which defines what API actions are logged and their verbosity level. For the purposes of incident response, a minimum of Metadata-level audit logs should be retained for all cluster events.

While this will not retain full request or response bodies, it will record what API actions were performed and by whom. Ideally, full request logs should be retained to provide detailed introspection into past API actions performed by users.
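For illustration, a Metadata-everywhere policy might look like the sketch below. The rules here are deliberately minimal and the file name is arbitrary; the example policy in the Kubernetes documentation is a better starting point for production:

```yaml
# audit-policy.yaml -- a minimal sketch, not a production-ready policy.
apiVersion: audit.k8s.io/v1
kind: Policy
# Skip the noisy RequestReceived stage.
omitStages:
  - "RequestReceived"
rules:
  # Never log request/response bodies for Secrets and ConfigMaps,
  # even if a broader rule below is later raised above Metadata.
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  # Baseline for everything else: who did what, to which object, and when.
  # Raise this to Request or RequestResponse if you can afford the volume.
  - level: Metadata
```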

Audit logs produced by Kubernetes are JSON events. A representative Metadata-level event looks roughly like the following (the values are made up for illustration, and the exact fields vary by Kubernetes version):
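```json
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "auditID": "0a4376d5-307a-4e28-a732-0c1b04f745a1",
  "stage": "ResponseComplete",
  "requestURI": "/api/v1/namespaces/prod/secrets/db-credentials",
  "verb": "get",
  "user": {
    "username": "jane@example.com",
    "groups": ["system:authenticated", "developers"]
  },
  "sourceIPs": ["10.0.12.34"],
  "objectRef": {
    "resource": "secrets",
    "namespace": "prod",
    "name": "db-credentials",
    "apiVersion": "v1"
  },
  "responseStatus": { "code": 200 },
  "requestReceivedTimestamp": "2020-02-21T17:04:05.123456Z",
  "stageTimestamp": "2020-02-21T17:04:05.234567Z"
}
```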

To record audit logs, pass the --audit-policy-file flag to the kube-apiserver, pointing it at an audit.k8s.io policy file. See Kubernetes Auditing for examples to draw the rest of the owl. Logs should be shipped to centralized log storage so that they are available for review by the security team.
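To make the flag part concrete: on a kubeadm-style cluster, this usually means editing the kube-apiserver static pod manifest. The paths below are illustrative and the rotation values are just reasonable starting points:

```yaml
# /etc/kubernetes/manifests/kube-apiserver.yaml (excerpt, kubeadm-style cluster)
spec:
  containers:
    - command:
        - kube-apiserver
        - --audit-policy-file=/etc/kubernetes/audit/audit-policy.yaml
        - --audit-log-path=/var/log/kubernetes/audit/audit.log
        - --audit-log-maxage=30      # days to retain rotated log files
        - --audit-log-maxbackup=10   # number of rotated files to keep
        - --audit-log-maxsize=100    # megabytes per file before rotation
        # ...existing flags...
```

The policy file and log directory also need to be mounted into the apiserver pod (hostPath volumes on a kubeadm cluster). On managed platforms such as GKE, EKS, or AKS you generally can't set these flags yourself; audit logging there is enabled through the provider's own controls.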

Generating Actionable Alerts

Now that you have audit logs, are you ready to get fancy? Ready to use AI and ML for anomaly detection all while smoking a cigar and having a whiskey? Just kidding – hopefully you can give your head a good shake to roll your eyes out from the back of your head.

Once we’ve begun recording audit logs from Kubernetes, the next step is to get actionable alerts in Slack when security-related events occur in the cluster. For this, we like Sysdig’s CNCF-incubated, open-source Falco project.
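Falco can consume the same Kubernetes audit events (at the time of writing it exposes an embedded webhook endpoint that the API server's audit webhook backend can point at) and match them against rules. As a hypothetical sketch, not one of Falco's stock rules, a rule flagging secret reads by unexpected users might look like this; the trusted_secret_readers list is a placeholder you'd fill in for your cluster:

```yaml
# Hypothetical Falco rule against Kubernetes audit events (k8s_audit source).
- list: trusted_secret_readers
  items: ["system:kube-controller-manager", "system:kube-scheduler"]

- rule: Secret Accessed by Untrusted User
  desc: A Kubernetes Secret was read by a user not on the trusted list
  condition: >
    ka.verb in (get, list) and ka.target.resource=secrets
    and not ka.user.name in (trusted_secret_readers)
  output: >
    Secret accessed (user=%ka.user.name verb=%ka.verb
    secret=%ka.target.name ns=%ka.target.namespace)
  priority: WARNING
  source: k8s_audit
```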

By default, Falco will produce a lot of noise. Review the alerts in a testing environment and trim the fat from the Falco configuration so that only actionable alerts are generated. The whole point of using Falco is so that administrators can be alerted of infrequent or suspicious behaviors in the cluster – high-quality alerts are critical here. 
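Falco itself only emits alerts; getting them into Slack is typically done either with the falcosidekick project or with a program_output hook in falco.yaml along these lines (the webhook URL is a placeholder, and belongs in a secret rather than in the config file):

```yaml
# falco.yaml excerpt -- pipe each alert as JSON to a Slack incoming webhook.
json_output: true
program_output:
  enabled: true
  keep_alive: false
  program: "jq '{text: .output}' | curl -s -d @- -X POST https://hooks.slack.com/services/XXXXXXXX"
```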

With this, you should now have centralized logging which enables a historical review of Kubernetes API events for incident response. Additionally, the team should be getting messages in Slack when sensitive operations occur in the cluster, enabling quick response to anomalous events.

It’s great that Kubernetes and other open-source projects provide all the necessary tooling to make this possible; however, building incident-response and alerting capabilities is often considered “low priority” by organizations who are struggling just to keep the ship afloat.

If this is something that’s on your roadmap but you lack the resources necessary to implement it, learn more about Soluble and how we can help. 

Topics: Kubernetes, Incident Response, Falco, Auditing

Written by Matt Hamilton

Matt Hamilton (OSCP) is a principal security researcher at Soluble, where he focuses on Kubernetes security research. He was formerly with Bishop Fox, where he worked on black-box penetration testing, application assessments, source code review, and mobile application review for clients that included large global organizations and high-tech start-ups. Matt is responsible for more than a dozen CVEs. He was a founding member of OpenToAll, an online team for security competitions whose purpose is to mentor newcomers to the security community. He is a responsible disclosure advocate, and loves the Go programming language.