
Defeating Kubernetes Privilege Escalation: A Cloud Threat Detection & Incident Response Case Study



Attackers are constantly searching for new ways to target cloud environments and escalate initial access into full administrative privileges.


In recent months, Gem's cloud security research team has observed a rise in attempts to escalate from access to a Kubernetes cluster into access to the cloud control plane, creating significant risks for many organizations. Because containers often lack the visibility and prevention controls commonly applied to traditional compute resources, this expansion of the cloud attack surface can be especially challenging to defend.


A recent attack defeated by the Gem team highlights the importance of heuristic detections that span the whole cloud environment, together with immediate response capabilities, in combating these rising threats. To protect the victim organization’s identity, certain details of the attack have been modified; however, every stage of the presented case study was performed by real attackers and responders.


Threat Detection

The investigation was triggered by two specific attacker TTPs detected in a sensitive production AWS environment: 

  1. An EC2 instance IAM role used from an irregular source IP address in AWS.

  2. Apparent reconnaissance actions performed by the same role, querying multiple EC2 instances and Security Groups.


While neither of these actions would necessarily have been classified as highly suspicious on its own, heuristic analysis based on environmental inventory baselines identified the role as one belonging to an EC2 instance running an EKS pod. This fact was key to detecting the attack early: actions that may be common or legitimate when performed by other roles in the environment were correctly identified as suspicious behavior for this specific IAM role.
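To make the two signals concrete, here is a minimal sketch of how they might be correlated over CloudTrail records. It is not Gem's detection logic. The field names (`userIdentity`, `sourceIPAddress`, `eventName`) follow the CloudTrail record format, while `EXPECTED_NETWORKS`, `RECON_PREFIXES`, `RECON_THRESHOLD`, and the function names are hypothetical. The sketch assumes that sessions for EC2 instance roles carry the instance ID as their session name, and that the baseline lists the addresses from which the instances normally reach AWS APIs (for example, NAT gateway egress addresses).

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network

# Hypothetical baseline: networks from which instance-role credentials are
# expected to call AWS APIs (e.g. the NAT gateway egress range).
EXPECTED_NETWORKS = [ip_network("198.51.100.0/24")]
# Hypothetical definition of "reconnaissance": read-only enumeration calls.
RECON_PREFIXES = ("Describe", "List", "Get")
# Hypothetical threshold: distinct recon calls needed to raise a finding.
RECON_THRESHOLD = 2


def is_instance_role_session(event):
    """True if the caller looks like an EC2 instance role session."""
    identity = event.get("userIdentity", {})
    session_name = identity.get("arn", "").rsplit("/", 1)[-1]
    return identity.get("type") == "AssumedRole" and session_name.startswith("i-")


def from_expected_network(source_ip):
    """True if the source address falls inside the baseline networks."""
    try:
        addr = ip_address(source_ip)
    except ValueError:
        # sourceIPAddress can be an AWS service name rather than an IP;
        # this sketch assumes such calls are expected.
        return True
    return any(addr in net for net in EXPECTED_NETWORKS)


def suspicious_roles(events):
    """Return instance-role sessions used off-baseline for enumeration."""
    external_ips = defaultdict(set)
    recon_calls = defaultdict(set)
    for event in events:
        if not is_instance_role_session(event):
            continue
        source_ip = event.get("sourceIPAddress", "")
        if from_expected_network(source_ip):
            continue
        arn = event["userIdentity"]["arn"]
        external_ips[arn].add(source_ip)
        name = event.get("eventName", "")
        if name.startswith(RECON_PREFIXES):
            recon_calls[arn].add(name)
    return {
        arn: {"source_ips": sorted(ips), "recon_calls": sorted(recon_calls[arn])}
        for arn, ips in external_ips.items()
        if len(recon_calls[arn]) >= RECON_THRESHOLD
    }
```

Either signal alone would be noisy. Requiring both for the same role session mirrors the reasoning above, where context about what the role normally does is what makes the activity stand out.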


With this initial activity identified, defenders ran a quick triage and determined there was no legitimate reason for this role to be performing the detected actions. As the team moved into a full investigation, the key questions became whether and how this role could have been compromised. Without a quick and decisive answer, the team faced a familiar dilemma in incident response: do we respond and contain the suspicious activity before knowing all the facts? Responding quickly has the obvious benefit of potentially preventing further damage. However, doing so without context often leads to ineffective containment measures, which only let attackers know they’re being investigated and prompt them to speed up their attack. This is where rapid, contextualized investigation became crucial.


Investigation and Response

Pulling the thread of the initial suspicious events, the team used context from CloudTrail logs, VPC Flow logs, and forensic artifacts from the EC2 instance to quickly put the pieces of the puzzle together. Due to a default configuration left in place, EKS pods were allowed to connect to the Instance Metadata Service (IMDS) on their host EC2 instance. Local OS logs showed the IMDS being accessed from the instance shortly before the reconnaissance activity began. That access allowed the attacker to obtain the EC2 instance role's credentials, escalating access to the machine into control of the IAM role.
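As a rough illustration of how this kind of exposure can be audited, the sketch below inspects the `MetadataOptions` block that the EC2 DescribeInstances API returns for each instance. The function name and the fallback values for missing keys are assumptions, and the source does not say which settings were in place in this incident. The heuristic relies on the commonly documented hardening of requiring IMDSv2 tokens with a response hop limit of 1, which keeps pods in their own network namespace from obtaining a token. Pods running with host networking are not covered by this check.

```python
def imds_reachable_from_pods(instance):
    """Coarse check on one instance dict from EC2 DescribeInstances.

    Returns True when the metadata options would let a pod in its own
    network namespace reach IMDS, and False when IMDS is disabled or is
    restricted to IMDSv2 with a hop limit of 1.
    """
    options = instance.get("MetadataOptions", {})

    # IMDS switched off entirely: nothing to reach.
    if options.get("HttpEndpoint", "enabled") == "disabled":
        return False

    # Assumed fallbacks when a key is missing: tokens optional, hop limit 1.
    tokens_required = options.get("HttpTokens", "optional") == "required"
    hop_limit = options.get("HttpPutResponseHopLimit", 1)

    # IMDSv1 (tokens optional) answers plain GET requests regardless of the
    # hop limit; IMDSv2 with a hop limit above 1 lets the token response
    # survive the extra network hop into the pod.
    return not (tokens_required and hop_limit <= 1)
```

A result of `True` means only that the settings permit access; whether a pod actually has a network path to IMDS also depends on network policies and host firewall rules.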


Forensic and log evidence further revealed that the EC2 instance was running an open-source application affected by a recently published remote code execution vulnerability (CVE). The vulnerability was successfully exploited, presumably as the result of a scan, just hours before the EC2 instance IAM role was compromised. The DevOps team had deliberately run this open-source application behind a Security Group that disallowed any access from the internet. However, this changed on the day of the attack. An internet-facing service was deployed to the same subnet as the vulnerable application, which led to the modification of the shared Security Group that controlled access to both machines.
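A Security Group change like this one can be surfaced by scanning for ingress rules open to the whole internet. The following is a minimal sketch over the dictionary shape returned by the EC2 DescribeSecurityGroups API (`IpPermissions`, `IpRanges`, `Ipv6Ranges`). The function name and output format are hypothetical, and the exact rule added in this incident is not known.

```python
ANY_IPV4 = "0.0.0.0/0"
ANY_IPV6 = "::/0"


def internet_open_rules(security_group):
    """List ingress rules in one security group that allow any source."""
    findings = []
    for perm in security_group.get("IpPermissions", []):
        cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
        cidrs += [r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])]
        open_cidrs = [c for c in cidrs if c in (ANY_IPV4, ANY_IPV6)]
        if open_cidrs:
            findings.append({
                "group_id": security_group.get("GroupId"),
                # IpProtocol "-1" means all protocols; ports are then absent.
                "protocol": perm.get("IpProtocol"),
                "from_port": perm.get("FromPort"),
                "to_port": perm.get("ToPort"),
                "cidrs": open_cidrs,
            })
    return findings
```

Running a check like this whenever a Security Group is modified, and cross-referencing the findings against every instance attached to the group, would show when a rule intended for one service also exposes another.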



Figure 1: High-level attack flow generated by Gem's automated triage and investigation timelining


While this misconfiguration gave attackers access to the organization's cloud infrastructure, its rapid detection had a silver lining: the entire attack was clearly understood to have taken place within only a few hours. Attackers successfully scanned and exploited the vulnerable EKS application, quickly escalated privileges to the EC2 instance IAM role, then began performing reconnaissance. The environment was compromised quickly, but equally rapid detection and contextualized triage enabled the immediate eradication of malicious access before any sensitive data was accessed.


Most importantly, any harm to production or critical assets was prevented in time by rotating the compromised credential, cleaning the EC2 instance, and removing the vulnerable application.
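The source does not describe how the credential was rotated. One commonly documented way to invalidate temporary role credentials that have already been stolen is to attach an inline policy to the role that denies every action for sessions issued before a cutoff time, using the `aws:TokenIssueTime` condition key. The sketch below only builds that policy document; the function name is hypothetical, and attaching the policy to the role (for example with IAM's PutRolePolicy) is left out.

```python
import json
from datetime import datetime, timezone


def revoke_older_sessions_policy(cutoff):
    """Build a deny-all policy for role sessions issued before `cutoff`."""
    if cutoff.tzinfo is None:
        raise ValueError("cutoff must be timezone-aware")
    # IAM date conditions take ISO 8601 timestamps; normalize to UTC.
    stamp = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {"DateLessThan": {"aws:TokenIssueTime": stamp}},
            }
        ],
    }


if __name__ == "__main__":
    # Revoke everything issued up to now; sessions created afterwards are
    # unaffected by the deny statement.
    print(json.dumps(revoke_older_sessions_policy(datetime.now(timezone.utc)), indent=2))
```

Because the deny applies only to sessions issued before the cutoff, a cleaned or replaced instance can obtain fresh credentials for the same role, while credentials the attacker has already copied stop working.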


Conclusion

This case study serves to highlight the importance of rapid, heuristic, accurate, and contextualized detection and response in the cloud. In addition to the obvious takeaways of implementing effective vulnerability management and segregating production applications, we must accept the fact that mistakes and misconfigurations can still happen. The risk is especially high from attacks leveraging newly released CVEs or sparsely monitored applications, such as this attack targeting Kubernetes. The ability to centrally detect and rapidly triage complex anomalous events in the cloud is therefore not only a “nice to have”, but a vital requirement of a successful cloud security strategy. 


Learn more

Unlike shift-left tools focused on static vulnerabilities and compliance, Gem's agentless platform continuously analyzes real-time telemetry to help SecOps teams stop active attacks.


The platform pulls all your cloud logs (AWS, Azure, GCP, Okta, etc.) into its data lake and continuously correlates thousands of events — from across the control, identity, compute, data, and network planes — to rapidly detect threats and automate triage, investigations, containment, and forensics.



