Hardening AWS attestation (GetWebIdentityToken + EKS Pod Identity)
Problem
The shipped AWS attestor (ATTESTOR=aws-sts) derives the agent id from the STS
RoleSessionName, which the workload sets itself (the operator passes
AWS_ROLE_SESSION_NAME=<agentId>). AWS attests role possession, not the agent
id — a compromised agent could assume the shared role under a different session
name and impersonate another agent. We want the agent id to be cluster-attested
(unforgeable by the workload) while staying readable by the platform.
Spike result (decisive)
We probed AWS’s new outbound web identity federation API,
sts:GetWebIdentityToken, from a pod running under EKS Pod Identity, calling
it with no caller tags. The returned (AWS-signed) JWT payload:

    {
      "aud": "spawnly-spike",
      "sub": "arn:aws:iam::ACCT:role/spawnly-podid-spike",
      "iss": "https://<uuid>.tokens.sts.global.api.aws",
      "https://sts.amazonaws.com/": {
        "principal_id": "arn:aws:iam::ACCT:role/spawnly-podid-spike",
        "principal_tags": {
          "kubernetes-pod-name": "podid-getwebid-spike-scwqs",
          "kubernetes-pod-uid": "f7a73daa-…",
          "kubernetes-namespace": "default",
          "kubernetes-service-account": "podid-spike",
          "eks-cluster-name": "spawnly",
          "eks-cluster-arn": "arn:aws:eks:us-east-1:ACCT:cluster/spawnly"
        }
      }
    }

GetWebIdentityToken propagates the EKS Pod Identity session tags into the JWT
as principal_tags, including kubernetes-pod-name, which we did not pass.
This is the ideal:
- Attested: EKS sets kubernetes-pod-name/uid; the workload can’t forge them. Caller-supplied --tags land in a separate request_tags field (confirmed in Phase 1), so the verifier reading principal_tags cannot be spoofed.
- Readable: it’s a normal JWT claim, validated against the account STS issuer’s public JWKS (<iss>/.well-known/jwks.json, RS256).
- STS-native: an STS-issued token; the agent id is a tag on the STS principal (exactly the original goal).
- Cheap: one shared ServiceAccount + one Pod Identity association covers every agent; each pod’s token still carries its own attested pod name. No per-agent IAM roles, no operator IAM-mutation power, no cross-check.
Why this beats the alternatives
| Approach | Attested | Platform-readable | Per-agent IAM churn |
|---|---|---|---|
| aws-sts today (RoleSessionName) | ❌ self-asserted | ✅ (ARN) | none |
| Design A (per-agent IAM role) | ✅ | ✅ (ARN) | high (role/SA per agent + operator IAM power) |
| Design B (cluster-signed SA token claim) | ✅ | ✅ (token) | none, but not STS-native |
| GetWebIdentityToken + Pod Identity | ✅ (EKS-set tag) | ✅ (JWT claim) | none |
Design
ATTESTOR=aws-stsweb (new). Requires the account-level outbound web identity
federation feature and EKS Pod Identity.
- Operator runs each agent pod as a shared ServiceAccount (e.g. spawnly-agent) that has a Pod Identity association to an IAM role. Pods are named deterministically <agentId>-pod (already the case).
- EKS Pod Identity injects credentials and stamps the session with kubernetes-pod-name=<agentId>-pod (+ uid/ns/sa/cluster).
- Sidecar calls sts:GetWebIdentityToken(audience="spawnly", signingAlgorithm=RS256), with no --tags, and presents the returned JWT as client_assertion (jwt-bearer).
- Verifier (registry self-registration + IdentityServer token minting) validates the JWT against the STS issuer JWKS, checks aud=="spawnly", then derives agentId = principal_tags["kubernetes-pod-name"] minus the -pod suffix. For defense in depth it also asserts that the principal_tags kubernetes-namespace/kubernetes-service-account/eks-cluster-arn match the expected values.
Security-critical: the verifier MUST read ["https://sts.amazonaws.com/"].principal_tags
(EKS-set, attested), never request_tags (caller-set, self-asserted). The
AgentId-consistency invariant holds automatically: registry and IS extract the
same kubernetes-pod-name, and it equals the orchestrator’s pre-registered
aw.Name.
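
As an illustration of the rule above, here is a minimal Go sketch that derives the agent id from an already signature-verified JWT payload. It is not the shipped verifier: the function name agentIDFromPayload is hypothetical, signature and aud validation are assumed to have happened earlier, and the only payload shape it relies on is the one observed in the spike.

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// stsClaimKey is the namespaced claim seen in the spike payload.
const stsClaimKey = "https://sts.amazonaws.com/"

// agentIDFromPayload (hypothetical name) reads kubernetes-pod-name from
// principal_tags only. request_tags is deliberately not declared in the
// struct below, so nothing the caller supplies there can reach the result.
// The payload must already have passed signature and aud checks.
func agentIDFromPayload(payload []byte) (string, error) {
	var top map[string]json.RawMessage
	if err := json.Unmarshal(payload, &top); err != nil {
		return "", err
	}
	raw, ok := top[stsClaimKey]
	if !ok {
		return "", errors.New("payload has no STS claim object")
	}
	var sts struct {
		PrincipalTags map[string]string `json:"principal_tags"`
	}
	if err := json.Unmarshal(raw, &sts); err != nil {
		return "", err
	}
	podName := sts.PrincipalTags["kubernetes-pod-name"]
	agentID := strings.TrimSuffix(podName, "-pod")
	// Reject a missing tag, a name without the suffix, and a bare "-pod".
	if agentID == podName || agentID == "" {
		return "", errors.New("kubernetes-pod-name is missing or lacks the -pod suffix")
	}
	return agentID, nil
}
```

Because the struct has no request_tags field, a payload that carries kubernetes-pod-name only in request_tags is rejected rather than silently accepted.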
Consistency invariant (keep Go ↔ C# in lock step). The Go
registrant.identityFromTags and the C# StsWebCredentialVerifier must derive
byte-identical AgentID/Subject/Issuer from the same principal_tags. Two
subtleties to preserve when editing either side:
- The Subject is path-style (<eks-cluster-arn>/agent/<agentId>) so downstream act-chain handling recovers the agentId via the last path segment.
- A missing or empty eks-cluster-arn falls back to the literal "eks" on both sides. (There is no test-project gap on the Go side; the C# side has no unit-test project today. If one is added, lock this with an empty-vs-missing case.)
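
The two subtleties can be pinned down in a few lines. The following Go sketch is an assumption-laden illustration, not the real registrant.identityFromTags: the Identity type and the names identityFor and agentIDFromSubject are hypothetical, and the agent id is taken as an input rather than re-derived.

```go
package main

import "strings"

// Identity is a hypothetical stand-in for the triple both verifiers emit.
type Identity struct {
	AgentID string
	Subject string
	Issuer  string
}

// identityFor builds the path-style Subject described above. A map lookup
// returns "" for both a missing key and an empty value, so the two cases
// share the "eks" fallback.
func identityFor(agentID string, principalTags map[string]string) Identity {
	cluster := principalTags["eks-cluster-arn"]
	if cluster == "" {
		cluster = "eks"
	}
	return Identity{
		AgentID: agentID,
		Subject: cluster + "/agent/" + agentID,
		Issuer:  "aws-stsweb",
	}
}

// agentIDFromSubject mirrors what downstream act-chain handling is
// described as doing: take the last path segment of the Subject.
func agentIDFromSubject(subject string) string {
	return subject[strings.LastIndex(subject, "/")+1:]
}
```

The cluster ARN itself contains a slash (cluster/spawnly), which is why recovery uses the last segment, not the second one. A C# counterpart would need the same empty-equals-missing behaviour to stay byte-identical.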
Threat model / parity with SPIRE. EKS (the control plane + Pod Identity
agent) attests the pod identity, the same trust root SPIRE uses (kubelet/node).
A container can only obtain its own pod’s session, so it can only ever present
its own attested kubernetes-pod-name. Residual: short-TTL token replay before
expiry, mitigated by TTL + aud binding — not worse than SPIRE.
Implementation plan
Phase 0 — Prerequisites
- Confirm aws-sdk-go-v2/service/sts exposes GetWebIdentityToken; go get -u the sts module if needed. (If the Go SDK lags, fall back to a SigV4-signed HTTP call, but verify the SDK first.)
- Account: aws iam enable-outbound-web-identity-federation (one-time; idempotent). Capture the issuer via aws iam get-outbound-web-identity-federation-info (IssuerIdentifier). This becomes STSWEB_ISSUER.
Phase 1 — Sidecar credential source (Go)
- internal/attestor/stsweb.go: StsWebSource{ audience string } whose Fetch calls GetWebIdentityToken(Audience=[audience], SigningAlgorithm=RS256, DurationSeconds=3600, no Tags) and returns Credential{Value: *out.WebIdentityToken, AssertionType: JWTBearerAssertionType}. Creds come from Pod Identity via the default credential chain.
- Wire case "aws-stsweb" in cmd/agent-sidecar/main.go, reading STSWEB_AUDIENCE (default spawnly).
- Unit-test the credential shape with a faked STS client.
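
A rough sketch of the credential source and the faked client it can be tested against follows. To stay self-contained it does not import the AWS SDK: tokenRequest and tokenIssuer are hypothetical stand-ins for the SDK input type and client, fetchNow is an illustrative helper, and the value given to JWTBearerAssertionType is assumed to be the RFC 7523 URN.

```go
package main

import (
	"context"
	"errors"
)

// JWTBearerAssertionType is assumed here to be the RFC 7523 URN.
const JWTBearerAssertionType = "urn:ietf:params:oauth:client-assertion-type:jwt-bearer"

// tokenRequest is a hypothetical stand-in for the SDK input struct.
// It has no Tags field on purpose: the sidecar never sends caller tags.
type tokenRequest struct {
	Audience         []string
	SigningAlgorithm string
	DurationSeconds  int32
}

// tokenIssuer abstracts the one STS call so tests can substitute a fake.
type tokenIssuer interface {
	GetWebIdentityToken(ctx context.Context, req tokenRequest) (string, error)
}

// Credential is what the sidecar presents as its client assertion.
type Credential struct {
	Value         string
	AssertionType string
}

// StsWebSource fetches an STS-signed JWT for a fixed audience.
type StsWebSource struct {
	audience string
	client   tokenIssuer
}

// Fetch requests an RS256 token valid for one hour and wraps it.
func (s StsWebSource) Fetch(ctx context.Context) (Credential, error) {
	tok, err := s.client.GetWebIdentityToken(ctx, tokenRequest{
		Audience:         []string{s.audience},
		SigningAlgorithm: "RS256",
		DurationSeconds:  3600,
	})
	if err != nil {
		return Credential{}, err
	}
	if tok == "" {
		return Credential{}, errors.New("STS returned an empty web identity token")
	}
	return Credential{Value: tok, AssertionType: JWTBearerAssertionType}, nil
}

// fakeIssuer records the request it received and returns a canned token.
type fakeIssuer struct {
	got   tokenRequest
	token string
}

func (f *fakeIssuer) GetWebIdentityToken(_ context.Context, req tokenRequest) (string, error) {
	f.got = req
	return f.token, nil
}

// fetchNow is a small convenience for callers without their own context.
func fetchNow(s StsWebSource) (Credential, error) {
	return s.Fetch(context.Background())
}
```

In a real build the interface would be satisfied by a thin adapter over the SDK client, which keeps the unit test independent of the SDK version.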
Phase 2 — Registry verifier (Go)
- internal/registrant/stsweb.go: StsWebVerifier that validates the bearer JWT against the STS issuer JWKS (reuse the jwx JWKS cache as in oidc.go), checks aud, extracts ["https://sts.amazonaws.com/"].principal_tags.kubernetes-pod-name, strips -pod → AgentID; asserts ns/sa/cluster claims; Issuer="aws-stsweb".
- Config: STSWEB_ISSUER, STSWEB_AUDIENCE, expected STSWEB_NAMESPACE/STSWEB_SERVICE_ACCOUNT/STSWEB_CLUSTER_ARN.
- cmd/registry/main.go: add case "aws-stsweb" (verifier) and the attestorDefault mapping (aws-stsweb→aws-stsweb).
- Confirm validAgentID still accepts the derived id.
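
The defense-in-depth assertions and the JWKS location could look roughly like the sketch below. The expectedTags type, its check method, and jwksURL are hypothetical names; the sketch assumes the expected values were loaded from the STSWEB_* settings and that the tags come from principal_tags of an already verified token. Failing closed on an unset expectation is a choice made for this sketch, not something the plan specifies.

```go
package main

import (
	"fmt"
	"strings"
)

// expectedTags holds the values the verifier is configured to require
// (in the plan: STSWEB_NAMESPACE, STSWEB_SERVICE_ACCOUNT, STSWEB_CLUSTER_ARN).
type expectedTags struct {
	Namespace      string
	ServiceAccount string
	ClusterARN     string
}

// check compares principal_tags against the configured expectations.
// An unset expectation is an error, so a missing env var cannot turn
// the check into a no-op.
func (e expectedTags) check(principalTags map[string]string) error {
	want := []struct{ key, val string }{
		{"kubernetes-namespace", e.Namespace},
		{"kubernetes-service-account", e.ServiceAccount},
		{"eks-cluster-arn", e.ClusterARN},
	}
	for _, w := range want {
		if w.val == "" {
			return fmt.Errorf("no expected value configured for %s", w.key)
		}
		if got := principalTags[w.key]; got != w.val {
			return fmt.Errorf("principal tag %s = %q, want %q", w.key, got, w.val)
		}
	}
	return nil
}

// jwksURL derives the key-set location from the issuer, following the
// <iss>/.well-known/jwks.json pattern, tolerating a trailing slash.
func jwksURL(issuer string) string {
	return strings.TrimRight(issuer, "/") + "/.well-known/jwks.json"
}
```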
Phase 3 — IdentityServer verifier (C#)
- identityserver/StsWebCredentialVerifier.cs : IAgentCredentialVerifier: validate the JWT against the STS issuer JWKS (pattern of SpireSvidValidator), check aud, extract principal_tags.kubernetes-pod-name → AgentId (strip -pod), assert ns/sa/cluster; Issuer="aws-stsweb".
- Program.cs: case "aws-stsweb" selecting it, reading STSWEB_* env.
- AgentClientSecretValidator already accepts jwt-bearer; no change.
Phase 4 — Operator injector + selector (Go)
- internal/operator/identity.go: StsWebInjector{ ServiceAccount, Region, Audience }: sets serviceAccountName and stamps ATTESTOR=aws-stsweb, AWS_REGION, STSWEB_AUDIENCE on the sidecar. No IRSA annotation, no AWS_ROLE_SESSION_NAME (Pod Identity owns the session; the EKS webhook injects the AWS creds env automatically).
- cmd/operator/main.go: case "aws-stsweb" building it.
Phase 5 — Infra + scripts
- Terraform (deploy/aws/terraform/):
  - aws_eks_addon "eks-pod-identity-agent".
  - Agent IAM role: trust pods.eks.amazonaws.com with sts:AssumeRole + sts:TagSession; inline policy sts:GetWebIdentityToken. (Replaces the IRSA web-identity trust on the agent role.)
  - aws_eks_pod_identity_association (cluster, default, spawnly-agent, role).
  - Enable outbound web identity federation: use a native resource if the provider supports it; otherwise a null_resource local-exec calling aws iam enable-outbound-web-identity-federation, with the issuer read back via an external data source. Expose output "stsweb_issuer".
  - Keep enable_cluster_creator_admin_permissions; also add an access_entries block mapping the SSO admin role (and/or spawnly-terraform) so the SSO access-entry mismatch can’t recur.
- deploy.sh: set ATTESTOR=aws-stsweb on operator/registry/identity-server; inject STSWEB_ISSUER (from the TF output or get-outbound-web-identity-federation-info), STSWEB_AUDIENCE=spawnly, and the expected ns/sa/cluster-arn; create the plain spawnly-agent SA (no IRSA annotation). Drop the IRSA serviceaccount.yaml role-arn step.
- up.sh: after terraform apply, ensure outbound federation is enabled and read the issuer; add an access-entry self-heal (create/associate admin for the running caller ARN, handling the SSO assumed-role→role-ARN conversion) so kubectl always works post-apply regardless of SSO; pass STSWEB_ISSUER to deploy.sh.
- down.sh: unchanged for teardown (the addon, association, and role are now Terraform-managed and destroyed with the cluster). Note (do not auto-run) that aws iam disable-outbound-web-identity-federation is the optional account-level revert, left enabled by default since it’s a harmless account capability.
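
The assumed-role→role-ARN conversion mentioned for up.sh is easy to get subtly wrong, so here is a sketch of the string transformation. It is written in Go for consistency with the rest of the codebase, although the script itself is shell; the function name roleARNFromCaller is hypothetical. One limitation is inherent: an assumed-role ARN does not carry the IAM role path, so any path (SSO-provisioned roles have one) cannot be restored from the caller ARN alone.

```go
package main

import (
	"errors"
	"strings"
)

// roleARNFromCaller converts an STS assumed-role caller ARN such as
//   arn:aws:sts::111122223333:assumed-role/RoleName/session
// into the path-less IAM role ARN
//   arn:aws:iam::111122223333:role/RoleName
// IAM ARNs (users or roles) are returned unchanged.
func roleARNFromCaller(callerARN string) (string, error) {
	parts := strings.SplitN(callerARN, ":", 6)
	if len(parts) != 6 || parts[0] != "arn" {
		return "", errors.New("not an ARN")
	}
	partition, service, account, resource := parts[1], parts[2], parts[4], parts[5]
	if service == "iam" {
		return callerARN, nil
	}
	if service != "sts" || !strings.HasPrefix(resource, "assumed-role/") {
		return "", errors.New("not an assumed-role ARN")
	}
	// Expected resource shape: assumed-role/<role-name>/<session-name>.
	segs := strings.Split(resource, "/")
	if len(segs) != 3 || segs[1] == "" {
		return "", errors.New("malformed assumed-role resource")
	}
	return "arn:" + partition + ":iam::" + account + ":role/" + segs[1], nil
}
```

Whether the path-less form is the right principal for a given access entry should be confirmed against the cluster before relying on it.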
Phase 6 — Verify on a cluster
- Extend smoke-test.sh to assert the agent registered with issuer=aws-stsweb and that the agent id came from the attested pod name.
- Spoof test: spawn an agent whose sidecar also passes a bogus --tags kubernetes-pod-name=someone-else; confirm it lands in request_tags and is ignored, with the verifier still deriving the real id from principal_tags.
Phase 7 — Docs + deprecate
- Update attestation.md/attestation-aws.md for the aws-stsweb path.
- Mark aws-sts (GetCallerIdentity / RoleSessionName) legacy: readable but self-asserted; keep behind the selector for non-Pod-Identity environments.
Status
Implemented and verified end-to-end on EKS (issuer=aws-stsweb,
token_issued, work_ok, no SPIRE), including a spoof test proving a forged
request_tags.kubernetes-pod-name is ignored. Resolved during implementation:
Go SDK v1.43.3 already has GetWebIdentityToken; outbound federation is enabled
via up.sh (not Terraform); the Go verifier needs
jws.WithInferAlgorithmFromKey(true) because the AWS STS JWKS RSA key omits alg.
ECR now lives in its own Terraform root (deploy/aws/ecr) so images persist
across down.sh/up.sh — down.sh destroys only the cluster root. (Done; was a
backlog item.)