In early June, I identified a trivial-to-exploit denial-of-service vulnerability in Keycloak that let an unauthenticated attacker take a Keycloak server offline... with only a cell phone 📱.
Here on the research team at Soluble, we're always looking at the security of open-source projects that run in Kubernetes. We pay particular attention to CNCF projects, as the fruits of our research help to secure applications running in Kubernetes environments all over the world.
TL;DR: the Keycloak webserver held connections open too long, and the default database connection pool was far too small. Together, these allowed an unauthenticated attacker to take down the site.
This issue was reported to Red Hat by the Soluble research team, and a fix was implemented in Red Hat SSO on August 18, Keycloak v11.0.1 on August 19, and OpenShift Application Runtimes 1.0 on September 2.
The technical details are available below, including a very simple shell script PoC that you can use to test and validate your own SSO installations.
This is a classic HTTP DoS: send a POST with a Content-Length value larger than the actual body size. Depending on the webserver and reverse proxy configuration, one potential outcome is that the server holds the connection open, often for up to two minutes, waiting for the client to send the rest of the data. An attacker can abuse this by never sending that data, draining the server's connection pool while using minimal bandwidth.
Many webservers have a hard-coded bottleneck somewhere, and you have to poke at things to find the breaking point. I saw this issue all the time as a consulting penetration tester, and I find it amusing that it still plagues the internet as much as it does. Gigabit connections are becoming commonplace, yet many servers can be taken down not with a multi-TB SYN flood, but with a curl script and a cellular connection.
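To make the mechanism concrete, here is a minimal sketch that runs entirely on localhost. It is not the PoC for CVE-2020-10758 and does not talk to Keycloak. The toy server, the `/login` path, the form body, and the 0.3-second timeout are all hypothetical stand-ins chosen for illustration. The client declares a 1000-byte body, sends 10 bytes, and then goes silent, so the server sits on the connection until its read timeout fires.

```python
import socket
import threading
import time


def build_incomplete_post(host: str, path: str, body: bytes, declared_length: int) -> bytes:
    """Return a raw HTTP/1.1 POST whose Content-Length exceeds the body actually sent."""
    if declared_length <= len(body):
        raise ValueError("declared_length must exceed the real body size")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {declared_length}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def _toy_server(listener: socket.socket, read_timeout: float, result: dict) -> None:
    """Hypothetical naive server: accept one connection and wait for the full declared body."""
    conn, _addr = listener.accept()
    with conn:
        conn.settimeout(read_timeout)
        started = time.monotonic()
        data, body, declared = b"", b"", 0
        try:
            # Read until the end of the header block.
            while b"\r\n\r\n" not in data:
                chunk = conn.recv(4096)
                if not chunk:
                    break
                data += chunk
            head, _sep, body = data.partition(b"\r\n\r\n")
            for line in head.split(b"\r\n")[1:]:
                name, _colon, value = line.partition(b":")
                if name.strip().lower() == b"content-length":
                    declared = int(value.strip())
            # Keep waiting for the bytes the client promised.
            while len(body) < declared:
                chunk = conn.recv(4096)
                if not chunk:
                    break
                body += chunk
            result["timed_out"] = False
        except socket.timeout:
            result["timed_out"] = True
        result["held_for"] = time.monotonic() - started
        result["missing_bytes"] = declared - len(body)


def demo(read_timeout: float = 0.3) -> dict:
    """Run the toy server and one stalled client on 127.0.0.1; return what the server saw."""
    result: dict = {}
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=_toy_server, args=(listener, read_timeout, result))
    server.start()
    request = build_incomplete_post("127.0.0.1", "/login", b"username=a", declared_length=1000)
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(request)
        server.join()  # stay connected and silent until the server gives up
    listener.close()
    return result


if __name__ == "__main__":
    print(demo())
```

The client spends a few hundred bytes, while the server spends a connection slot for the full timeout. With a two-minute timeout instead of 0.3 seconds, that asymmetry is the whole attack.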
Below is a PoC script that demonstrates CVE-2020-10758:
Note that if you have any reverse proxies in front of Keycloak (such as Nginx, Cloudflare, or a GCP load balancer), their mere presence may mitigate this issue. I encourage you to test your own installation with the script above.
Keycloak v11.0.1 lowered the default HTTP and HTTPS listener timeouts from 120 to 30 seconds. It also raised the default connection pool size from 20 to 100 connections.
My 2c: most organizations should look into increasing the database connection pool size further. A thirty-second HTTP timeout is reasonable, but it could be lowered if your organization is sensitive to this class of attack. If Keycloak provides auth services to your organization's critical infrastructure, you should take steps to ensure that your configuration robustly handles the script above.
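A rough back-of-the-envelope calculation shows what those two changes buy. It rests on a simplifying assumption that is mine, not Red Hat's: each stalled request pins one pool slot for the full listener timeout. Under that model, the sustained rate of stalled requests needed to keep the pool full is simply pool size divided by timeout.

```python
def stalled_requests_per_second(pool_size: int, hold_seconds: float) -> float:
    """Sustained rate of stalled requests needed to keep every pool slot occupied.

    Assumes (simplification) that one stalled request holds one slot for hold_seconds.
    """
    return pool_size / hold_seconds


# Defaults before and after Keycloak v11.0.1, as described above.
old_rate = stalled_requests_per_second(pool_size=20, hold_seconds=120)
new_rate = stalled_requests_per_second(pool_size=100, hold_seconds=30)

print(f"old defaults: {old_rate:.2f} stalled requests/s")
print(f"new defaults: {new_rate:.2f} stalled requests/s")
print(f"attacker effort multiplier: {new_rate / old_rate:.0f}x")
```

Under this model the old defaults could be saturated with roughly one stalled request every six seconds, and the new defaults raise the required rate by a factor of twenty. That is a real improvement, but still a small number of requests per second.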
6/02/2020 - Vulnerability identified by Matt Hamilton @ Soluble
6/02/2020 - Vulnerability reported on Red Hat's Keycloak issue tracker
8/18/2020 - Vulnerability patched in Red Hat Enterprise SSO
8/19/2020 - Vulnerability patched in Keycloak v11.0.1
9/02/2020 - Vulnerability patched in Red Hat OpenShift Application Runtimes 1.0 (Thorntail)
9/03/2020 - Public disclosure
🍻 Cheers to the Keycloak team at Red Hat