GKE is a Google Cloud service that offers managed Kubernetes clusters. The cluster nodes run on Google Cloud VM instances, and the control plane and network are fully managed by GKE.
GKE offers a sandboxing feature (https://cloud.google.com/kubernetes-engine/docs/concepts/sandbox-pods) based on gVisor (https://gvisor.dev/docs/) that protects the host kernel from untrusted code. This sandboxing offers very good isolation and allows SaaS businesses to execute unknown code submitted by their users.
I tried to use this feature to run isolated workloads and found that the isolation was not entirely effective: access to the metadata API was possible under certain conditions.
By default, all pods in a Kubernetes cluster can communicate with each other. GKE recommends using Network Policy to restrict network traffic between pods (https://cloud.google.com/kubernetes-engine/docs/how-to/hardening-your-cluster#restrict_with_network_policy).
When running untrusted code, it is a good practice to isolate your clients from each other and from your own services.
With this feature, it is easy to define a policy, attach it to a group of pods, and restrict network access for these pods.
The Google Cloud team documents how to harden workload isolation using GKE Sandbox (https://cloud.google.com/kubernetes-engine/docs/how-to/sandbox-pods#sandboxed-application), and gives some hints on how to configure and test access to the metadata.
To validate that the filtering is properly enabled, you can launch a new pod and run the following command:
curl -s "http://metadata.google.internal/computeMetadata/v1/instance/attributes/kube-env" -H "Metadata-Flavor: Google"
This command fails, as described in the documentation, because a filter denies access to the metadata API.
By default the instance metadata API server is not supposed to be accessible from any sandboxed pod.
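The same check can be scripted so that it runs as part of a test suite. The following is a minimal sketch, not an official GKE tool: the function name `metadata_reachable` and its injectable `opener` parameter are assumptions made for illustration, while the URL and the `Metadata-Flavor` header come from the curl command above.

```python
import urllib.error
import urllib.request

# URL taken from the curl command above.
KUBE_ENV_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/attributes/kube-env"
)


def metadata_reachable(url=KUBE_ENV_URL, timeout=3.0,
                       opener=urllib.request.urlopen):
    """Return True if the metadata endpoint answers with HTTP 200.

    `opener` is a hypothetical hook added for testing; it must accept
    (request, timeout=...) like urllib.request.urlopen does.
    """
    request = urllib.request.Request(
        url, headers={"Metadata-Flavor": "Google"}
    )
    try:
        with opener(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout or HTTP error status:
        # all are treated as "filtered" by this check.
        return False
```

Calling `metadata_reachable()` from inside a sandboxed pod should return `False` when the filtering is in place, and `True` would indicate that the metadata API is exposed.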
When testing the network isolation for untrusted pods, I tried to configure the network policy on the cluster and applied some network filtering rules for the pods that I wanted to isolate.
After more testing, I found that I was able to query the metadata API. It appears that the network filtering applied by the GKE team to gVisor sandboxed pods was entirely disabled when network policy was activated.
Since this sandboxing feature is meant to run untrusted code, this would give an attacker access to sensitive information about the node, project and Kubernetes cluster.
The bug was reported to the VRP team and quickly fixed. I was able to mitigate it by manually filtering the 169.254.169.254 IP in the network policy applied to these pods.
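As an illustration of that mitigation, here is a sketch that builds an egress NetworkPolicy excluding the metadata IP and prints it as JSON. The policy name `deny-metadata`, the `default` namespace and the helper `metadata_block_policy` are hypothetical, and the `app: fedora` label is only an example selector. This is not necessarily the exact policy I applied, and a real deployment would likely restrict egress much further.

```python
import json

METADATA_IP = "169.254.169.254"


def metadata_block_policy(name, namespace, pod_labels):
    """Build a NetworkPolicy allowing all egress except the metadata IP."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods the policy applies to.
            "podSelector": {"matchLabels": dict(pod_labels)},
            "policyTypes": ["Egress"],
            "egress": [
                {
                    "to": [
                        {
                            "ipBlock": {
                                "cidr": "0.0.0.0/0",
                                # Carve the metadata server out of the
                                # allowed range.
                                "except": [METADATA_IP + "/32"],
                            }
                        }
                    ]
                }
            ],
        },
    }


# Hypothetical name, namespace and labels, chosen for illustration.
policy = metadata_block_policy("deny-metadata", "default", {"app": "fedora"})
print(json.dumps(policy, indent=2))
```

Since kubectl accepts JSON manifests, the output can be piped to `kubectl apply -f -`.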
To reproduce the issue, you can follow the steps here: https://cloud.google.com/kubernetes-engine/docs/how-to/sandbox-pods
gcloud container clusters create cluster-name --enable-network-policy
gcloud container node-pools create gvisor \
  --cluster=cluster-name \
  --node-version=1.16.13-gke.401 \
  --machine-type=e2-standard-2 \
  --image-type=cos_containerd \
  --sandbox type=gvisor \
  --zone europe-west1-c
# sandbox-metadata-test.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fedora
  labels:
    app: fedora
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fedora
  template:
    metadata:
      labels:
        app: fedora
    spec:
      runtimeClassName: gvisor
      containers:
      - name: fedora
        image: fedora
        command: ["/bin/sleep","10000"]
kubectl exec -it pod-name -- /bin/sh
curl "http://169.254.169.254/computeMetadata/v1/instance/attributes/kube-env" -H "Metadata-Flavor: Google"
...
ALLOCATE_NODE_CIDRS: "true"
API_SERVER_TEST_LOG_LEVEL: --v=3
...
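To get a quick overview of what such a response exposes, the `KEY: value` lines can be collected into a dictionary. This is a rough sketch: `parse_kube_env` is a hypothetical helper, and it assumes one entry per line, as in the excerpt above, so it would not handle multi-line values correctly.

```python
def parse_kube_env(text):
    """Parse 'KEY: value' lines into a dict, skipping anything else."""
    entries = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if not sep or not key.strip():
            # Not a 'KEY: value' line (for example an elided '...').
            continue
        # Drop surrounding whitespace and optional double quotes.
        entries[key.strip()] = value.strip().strip('"')
    return entries


# Sample built from the two lines visible in the excerpt above.
sample = 'ALLOCATE_NODE_CIDRS: "true"\nAPI_SERVER_TEST_LOG_LEVEL: --v=3\n'
for key in sorted(parse_kube_env(sample)):
    print(key)
```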
With this metadata exposure bug, an attacker may gain access to sensitive information about the node, project and Kubernetes cluster.
Depending on the configuration, this could lead to:
Even when the isolation is working properly, you have many ways to protect yourself against this kind of metadata exposure.
A few recommendations for running untrusted code in GKE:
dmesg if you are running inside the sandbox