Architecture Score
8.5 / 10

Phantom Architecture

Verdict: Architecturally Sound

The trust model is correct: webhook = UX convenience, cryptographic attestation = security boundary. The explicit acknowledgment that cluster-admin, system:masters, and the cloud provider can all bypass the webhook — and the design that makes this bypass irrelevant — is the strongest architectural decision.

How It Works

Phantom is a Kubernetes operator that injects a sidecar via mutating webhook. The sidecar fetches secrets from an EU-hosted OpenBao/Vault instance directly into process memory — secrets never touch etcd, never enter Kubernetes Secrets, and the cloud provider never holds the keys.

Key Strengths

  • Secrets never touch etcd. Eliminates an entire class of attacks (etcd dump, backup exfiltration, KMS compulsion). The correct approach for managed Kubernetes where you have zero control over the control plane.
  • Three-tier caching is well-designed. Hot cache → sealed local cache → grace period progression is operationally sound. Sealed cache key derivation from SA token + cluster HMAC is reasonable.
  • Circuit breaker on the webhook is the right pattern for fail-closed security products. The override escape hatch (namespace label) is correctly positioned as an auditable last resort.
  • Canary injection via namespace labels is operationally mature thinking for a product that modifies every pod in the cluster.

Known Concerns

  • gVisor as “optional lightweight sandbox” is undersold. Without it, a root-level attacker on the node can read process memory via /proc/[pid]/mem. The “optionally” qualifier weakens the story.
  • The sidecar is a single point of failure per pod. If phantom-proxy crashes and the sealed cache is expired, the application loses access to all secrets. Consider a direct (attested) fallback path to OpenBao.
  • Env var patching requires applications to use environment variables or a specific socket protocol. Applications that read secrets from files need a different mechanism — solvable but not addressed.

Technology Choices: Correct

Go for the webhook/operator/sidecar is the standard choice with first-class Kubernetes client libraries. OpenBao as the external secrets source is the right call. AMD SEV-SNP / Intel TDX for attestation is the correct hardware trust anchor.

Trust Model

“Webhook = UX, Crypto = Security” — Correct and Well-Reasoned

The document’s analysis of who can bypass the webhook and why that doesn’t matter (secrets aren’t in Kubernetes, attestation gates key release) is technically sound.

One Gap

The trust model assumes OpenBao is outside the cloud provider’s jurisdiction. If a customer misconfigures OpenBao to run inside the US cloud, the entire model collapses. The architecture should enforce or verify OpenBao’s location as part of the attestation flow.

Security Flows

1. Initial Bootstrap — Trust Establishment

Before any secrets flow, the cluster must be registered with EU OpenBao. This is a one-time setup per cluster.

sequenceDiagram
    participant Admin as Platform Admin
    participant OB as EU OpenBao
    participant K8s as K8s Cluster
    participant PH as Phantom Operator

    rect rgb(30, 40, 55)
    Note over Admin,OB: One-time setup (EU side)
    Admin->>OB: Enable Kubernetes auth method
    Admin->>OB: Register cluster (API server URL + CA cert)
    Admin->>OB: Create policies (namespace → secret paths)
    Admin->>OB: Generate bootstrap token (time-limited)
    end

    rect rgb(30, 45, 40)
    Note over Admin,PH: One-time setup (cluster side)
    Admin->>K8s: helm install phantom --set openbao.addr=... --set openbao.token=...
    PH->>PH: Create ServiceAccount, MutatingWebhookConfig, CRDs
    PH->>OB: Authenticate with bootstrap token (mTLS)
    OB->>K8s: Validate cluster identity via TokenReview API
    OB-->>PH: Confirm trust. Issue renewable accessor token
    PH->>PH: Discard bootstrap token. Use accessor token for renewals
    end

    Note over K8s,OB: Trust established. Bootstrap token is now useless.
      

Bootstrap Token Lifecycle

The bootstrap token is short-lived (e.g., 10 minutes) and used only once to register the cluster with OpenBao. After initial authentication, the operator uses Kubernetes ServiceAccount tokens for ongoing auth. If the bootstrap token is intercepted, it expires before it can be reused. OpenBao’s Kubernetes auth backend validates tokens via the cluster’s TokenReview API — a stolen token from a different cluster is rejected.

2. Pod Startup — Secret Injection

What happens every time a labeled pod is created.

sequenceDiagram
    participant Dev as Developer
    participant API as K8s API Server
    participant WH as Phantom Webhook
    participant Pod as App Container
    participant SC as Phantom Sidecar
    participant OB as EU OpenBao

    Dev->>API: kubectl apply (Deployment)
    API->>WH: AdmissionReview (pod spec)
    WH->>WH: Check namespace labels, compatibility
    WH-->>API: Mutated spec (sidecar + init container injected)
    API->>Pod: Schedule pod

    rect rgb(30, 45, 40)
    Note over Pod,OB: Secret injection (happens before app starts)
    SC->>SC: Init container copies wrapper binary to shared volume
    SC->>OB: Auth with pod ServiceAccount token (mTLS)
    OB->>OB: Validate SA token via TokenReview
    OB->>OB: Check policies (namespace + SA → allowed secret paths)
    OB-->>SC: Return secrets (encrypted in transit)
    SC->>SC: Store in hot cache (in-memory) + sealed cache (tmpfs)
    SC->>Pod: Inject as env vars / tmpfs files / Unix socket
    Pod->>Pod: App starts with secrets in process memory
    end

    Note over Pod: Secrets never in etcd. Never on disk. Never in K8s API.

    loop Every 4 minutes
    SC->>OB: Renew lease + check for rotation
    OB-->>SC: Updated secrets (if rotated)
    SC->>Pod: Hot-reload updated secrets
    end
      

3. CLOUD Act Subpoena

What happens when a US legal order compels the cloud provider to hand over data.

sequenceDiagram
    participant USG as US Government
    participant CP as Cloud Provider
    participant K8s as K8s Cluster
    participant OB as EU OpenBao

    USG->>CP: CLOUD Act subpoena: produce all customer data

    rect rgb(55, 30, 30)
    Note over CP,K8s: Provider complies (they must)
    CP->>K8s: Dump etcd
    K8s-->>CP: etcd contents (pods, deployments, configmaps...)
    Note over CP: No secrets found in etcd
    CP->>K8s: Snapshot VM memory
    K8s-->>CP: Memory dump (encrypted if TEE, otherwise readable)
    CP->>K8s: Copy persistent volumes
    K8s-->>CP: Volume data (no secret material)
    end

    CP-->>USG: Deliver: etcd dump + memory + volumes

    rect rgb(30, 40, 55)
    Note over USG,OB: Cannot reach EU OpenBao
    USG->>OB: Request secrets/keys?
    OB-->>USG: EU jurisdiction. Requires EU court order.
    Note over OB: US legal process has no authority here
    end

    Note over USG: Without keys from OpenBao, extracted data is incomplete.
    Note over USG: Memory contents (if no TEE) contain only short-lived tokens that have expired.
      

4. eBPF Memory Access Detection

How the eBPF DaemonSet detects attempts to read protected process memory.

sequenceDiagram
    participant Att as Attacker (node access)
    participant Kernel as Linux Kernel
    participant eBPF as eBPF DaemonSet
    participant SC as Phantom Sidecar
    participant Alert as Alert Pipeline

    Att->>Kernel: ptrace(PTRACE_ATTACH, pid)
    Kernel->>eBPF: sys_ptrace hook fires
    eBPF->>eBPF: Check target PID against protected pod list
    eBPF-->>Alert: ALERT: ptrace on protected pod (pid, namespace, caller)

    Att->>Kernel: open("/proc/{pid}/mem")
    Kernel->>eBPF: sys_openat hook fires
    eBPF->>eBPF: Path matches /proc/*/mem for protected PID
    eBPF-->>Alert: ALERT: /proc/mem read attempt

    Att->>Kernel: process_vm_readv(pid, ...)
    Kernel->>eBPF: sys_process_vm_readv hook fires
    eBPF-->>Alert: ALERT: cross-process memory read

    Note over Alert: Alerts → SaaS dashboard + SIEM + PagerDuty
    Note over SC: Meanwhile: secrets are short-lived tokens (15-min TTL)
      

5. Bootstrap Token — Where Does the First Secret Come From?

The bootstrap token is the one secret that cannot come from OpenBao (because you need it to connect to OpenBao). It must be communicated out-of-band:

  1. Admin generates a time-limited token with the OpenBao CLI: bao token create -ttl=10m -use-limit=1 -policy=phantom-bootstrap
  2. Token is passed directly to Helm: helm install phantom --set openbao.bootstrapToken=hvs.xxx
  3. Phantom operator uses it once to register via Kubernetes auth method
  4. Token expires (10 min) and is never persisted in K8s

The token is short-lived by design

It's a Helm value passed as an environment variable to the operator pod, used in-memory, then discarded. One caveat: Helm 3 stores release values in a Kubernetes Secret (gzipped and base64-encoded), so an etcd dump taken during the 10-minute window could in principle recover the token. The effective protections are the short TTL and the single-use limit: after first use, OpenBao invalidates it.

6. Key Transfer Flows

6a. Initial Provisioning

sequenceDiagram
    participant Pod as New Pod
    participant SC as Phantom Sidecar
    participant Cache as Sealed Cache (tmpfs)
    participant OB as EU OpenBao

    Pod->>SC: Container starts
    SC->>SC: Check hot cache (empty - first run)
    SC->>SC: Check sealed cache (empty - first run)
    SC->>OB: Auth with SA token + request secrets (mTLS)
    OB->>OB: Validate SA token via TokenReview
    OB->>OB: Check policy: namespace/SA → allowed paths
    OB-->>SC: Secrets + lease ID + TTL
    SC->>SC: Store in hot cache (in-memory, 5 min TTL)
    SC->>Cache: Encrypt with HKDF(cluster_key, pod_uid) → sealed cache
    SC->>Pod: Inject secrets (env vars / tmpfs / socket)
    Note over Pod: App starts. Secrets in process memory only.
      

6b. Secret Rotation

sequenceDiagram
    participant Admin as Admin / CI
    participant OB as EU OpenBao
    participant SC as Phantom Sidecar
    participant Pod as App Process

    Admin->>OB: Rotate secret (new version)
    Note over SC: Renewal loop runs every 4 min
    SC->>OB: Renew lease + check version
    OB-->>SC: New secret value + new lease
    SC->>SC: Update hot cache
    SC->>SC: Update sealed cache (re-encrypt)
    SC->>Pod: Signal secret change (SIGHUP or socket notification)
    Pod->>Pod: Reload config with new secret
    Note over Pod: Zero downtime. Old secret zeroed from memory.
      

6c. Node Restart / Pod Reschedule

sequenceDiagram
    participant K8s as K8s Scheduler
    participant Pod as Rescheduled Pod
    participant SC as Phantom Sidecar
    participant Cache as Sealed Cache (tmpfs)
    participant OB as EU OpenBao

    K8s->>Pod: Schedule pod on new node
    SC->>SC: Check hot cache (empty - new pod)
    SC->>Cache: Check sealed cache (empty - new pod, new tmpfs)
    SC->>OB: Auth with SA token (mTLS)
    alt OpenBao reachable
        OB-->>SC: Fresh secrets + new lease
        SC->>Pod: Inject secrets. App starts normally.
    else OpenBao unreachable (outage)
        SC->>SC: No cache, no OpenBao
        SC->>Pod: Block startup. Clear error: "Cannot reach OpenBao"
        Note over Pod: Pod stays in Init. No silent failure.
        Note over Pod: Existing pods on other nodes still serve from cache.
    end
      

Existing pods survive restarts

If a pod is restarted on the same node (container crash, OOM), the sealed cache on tmpfs may still exist (same pod UID). The sidecar decrypts the sealed cache and serves secrets immediately, then refreshes from OpenBao in the background. Only cross-node rescheduling requires a fresh fetch.

7. Break Glass — Webhook Disabled or Bypassed

What happens when the MutatingWebhookConfiguration is deleted, modified, or bypassed.

sequenceDiagram
    participant Att as Attacker / Admin
    participant API as K8s API Server
    participant eBPF as eBPF DaemonSet
    participant Op as Phantom Operator
    participant Alert as Alert Pipeline
    participant Pods as Existing Pods

    Att->>API: Delete MutatingWebhookConfiguration
    API->>API: Webhook removed

    par Detection (immediate)
        eBPF->>eBPF: Watch: webhook config changed
        eBPF-->>Alert: CRITICAL: Webhook deleted (who, when, kubectl context)
        Op->>Op: Reconciliation loop detects missing webhook
        Op->>API: Re-create MutatingWebhookConfiguration
        Note over Op: Webhook restored within seconds
    end

    Note over Pods: Existing pods UNAFFECTED (secrets already in memory)
    Note over API: New pods created during gap: deployed WITHOUT sidecar
    Note over API: After restore: new pods get sidecar again

    rect rgb(55, 30, 30)
    Note over Att,API: What the attacker gains
    Note right of Att: ❌ Cannot extract secrets from running pods (not in etcd)
    Note right of Att: ❌ Cannot access OpenBao (no valid SA token from outside)
    Note right of Att: ⚠️ New pods during gap run without protection
    Note right of Att: ⚠️ If also has node access: can read unprotected pod memory
    end
      

Defense-in-depth: webhook deletion is visible, recoverable, and limited

The operator’s reconciliation loop re-creates the webhook within seconds. The eBPF DaemonSet and Kubernetes audit logs record who deleted it and when. Even during the gap, existing pods retain their secrets and OpenBao remains inaccessible to the attacker. The window of exposure is new pods only, during a brief gap, with full audit trail.

8. MITM Attack Surfaces

Two critical network paths where man-in-the-middle attacks could compromise the system.

8a. EU OpenBao ↔ US Cluster (Cross-Jurisdiction)

sequenceDiagram
    participant SC as Phantom Sidecar (US)
    participant Net as Network Path
    participant MITM as Potential MITM
    participant OB as EU OpenBao

    rect rgb(55, 30, 30)
    Note over Net,MITM: Attack surface: internet/VPN between jurisdictions
    end

    SC->>Net: TLS ClientHello
    Note over SC,OB: Protection: mTLS with pinned certificates
    SC->>OB: Client cert (signed by Phantom CA) + SA token
    OB->>OB: Verify client cert chain
    OB->>OB: Verify SA token via TokenReview
    OB-->>SC: Secrets (encrypted in TLS tunnel)

    rect rgb(55, 30, 30)
    Note over MITM: MITM sees: encrypted traffic only
    Note over MITM: Cannot forge client cert (needs Phantom CA private key)
    Note over MITM: Cannot forge server cert (pinned in sidecar config)
    Note over MITM: Can: block traffic (DoS) → triggers grace period
    Note over MITM: Can: traffic analysis (volume, timing, frequency)
    end
      

8b. DaemonSet ↔ Sidecar (Intra-Cluster)

sequenceDiagram
    participant DS as eBPF DaemonSet
    participant Node as Node Kernel
    participant SC as Phantom Sidecar
    participant Pod as App Process

    Note over DS,Node: DaemonSet operates at kernel level (eBPF programs)
    Note over DS,SC: No network communication needed

    rect rgb(30, 45, 40)
    Note over DS,Pod: eBPF hooks are kernel-space, not network-based
    DS->>Node: Attach eBPF programs to syscall tracepoints
    Node->>DS: Events: ptrace, /proc/mem reads, process_vm_readv
    Note over DS: No MITM possible — eBPF is in-kernel, not over network
    end

    rect rgb(30, 45, 40)
    Note over SC,Pod: Sidecar ↔ App is localhost (same pod network namespace)
    SC->>Pod: Secrets via env vars (set before process start)
    SC->>Pod: Or: secrets via Unix domain socket (filesystem, not network)
    SC->>Pod: Or: secrets via tmpfs mount (shared volume)
    Note over SC,Pod: No MITM possible — same pod, no network traversal
    end

    rect rgb(55, 30, 30)
    Note over Node: Remaining risk: compromised node kernel
    Note over Node: If attacker has root on node: can intercept eBPF, read tmpfs
    Note over Node: Mitigation: TEE (optional) or eBPF tamper detection
    end
      

MITM surface is narrow

Cross-jurisdiction (EU↔US): mTLS with pinned certificates. Attacker can DoS but not intercept. Intra-cluster (DaemonSet↔Sidecar): No network path exists to MITM — eBPF is kernel-space, secrets are injected via env vars/socket/tmpfs within the same pod. The only real attack is a compromised node kernel, mitigated by optional TEE.
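
The pinning check from flow 8a can be sketched as a standalone helper. The pin format (base64 of the SHA-256 of the DER SubjectPublicKeyInfo, as in HPKP) is an assumption; in the sidecar this would run inside tls.Config.VerifyPeerCertificate, with the SPKI bytes taken from x509.Certificate.RawSubjectPublicKeyInfo.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"errors"
	"fmt"
)

// pinFor computes the pin of a DER-encoded SubjectPublicKeyInfo.
func pinFor(spkiDER []byte) string {
	sum := sha256.Sum256(spkiDER)
	return base64.StdEncoding.EncodeToString(sum[:])
}

// verifySPKIPin succeeds if any key in the presented chain matches the
// pin shipped in the sidecar config. Pinning the key (not the cert) lets
// OpenBao rotate certificates without breaking sidecars.
func verifySPKIPin(chainSPKIs [][]byte, pinned string) error {
	for _, spki := range chainSPKIs {
		if pinFor(spki) == pinned {
			return nil
		}
	}
	return errors.New("no key in presented chain matches pinned SPKI")
}

func main() {
	server := []byte("dummy-spki-bytes") // placeholder for real DER bytes
	pin := pinFor(server)
	fmt.Println("legit server accepted:", verifySPKIPin([][]byte{server}, pin) == nil)
	fmt.Println("mitm rejected:", verifySPKIPin([][]byte{[]byte("attacker-key")}, pin) != nil)
}
```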

Technical Feasibility

Phantom Complexity Breakdown

  • Mutating webhook: well-understood pattern, excellent Go libraries. 2-3 weeks for a senior Go engineer.
  • OpenBao integration (secret fetch, caching, renewal): 3-4 weeks. Three-tier cache adds complexity but is well-scoped.
  • Sidecar injection with mesh awareness: 4-6 weeks. Compatibility matrix (Istio, Linkerd, OTel, Dapr, GKE FUSE) is the time sink.
  • Circuit breaker + operator lifecycle: 2-3 weeks.
  • Cross-provider testing matrix: 4-6 weeks. The hidden cost — testing on GKE Standard, GKE Autopilot, EKS (EC2 + Fargate), and AKS.
  • Attestation (SEV-SNP/TDX): 6-8 weeks. Requires specialized knowledge.
  • Total: ~5-7 months for production-ready Phantom with attestation.

What Needs More Research

  1. Nitro Enclaves integration — fundamentally different from SEV-SNP/TDX. Needs PoC before committing.
  2. eBPF memory-access monitoring — what can eBPF detect that’s actionable? Detect-and-alert vs. detect-and-block?

Key Technical Limitations

What Phantom Cannot Do

  1. Cannot protect data processed in cleartext. Once the app decrypts a secret, data exists in cleartext in application memory. TEE mitigates this but isn’t universal.
  2. Cannot protect against a compromised application. If the application is malicious (supply chain attack), it has legitimate access to decrypted secrets.
  3. Cannot protect Kubernetes metadata. Pod names, labels, annotations, network policies — all visible to the cloud provider.
  4. Cannot protect against hardware-level attacks on TEEs. AMD SEV-SNP and Intel TDX have had side-channel vulnerabilities (CacheWarp, speculative execution).
  5. Cannot enforce key sovereignty after key release. Once a secret is released into the sidecar’s memory, it’s in the cloud provider’s infrastructure.
  6. Cannot protect against legal coercion of the customer. This product protects against US extraterritorial reach, not all legal compulsion.

Protection Model Breakdown

| Scenario | Protected? | Why |
|---|---|---|
| Cloud provider dumps etcd | Yes | Secrets are never in etcd |
| Cloud provider reads node memory (no TEE) | No | Secrets in cleartext in process memory |
| Cloud provider reads node memory (with TEE) | Yes (probably) | TEE encrypts memory, but side-channel attacks exist |
| CLOUD Act subpoena for cloud provider | Yes | Provider has no keys or plaintext to hand over |
| Compromised application exfiltrates secrets | No | App has legitimate access to decrypted secrets |
| OpenBao in EU is compromised | No | All secrets exposed at the source |
| MITM on OpenBao connection (no TEE) | Partial | TLS protects transit, but endpoint isn't verified without attestation |
| Kubernetes API server audit logs | Partial | Pod specs logged (env var names, not values if using socket refs) |
| Node-level debugger / ptrace | No (without TEE) | Standard OS access allows memory inspection |

Attacks NOT Defended Against

  • Supply chain attacks on application container images
  • Side-channel attacks on TEE implementations (timing, power analysis, cache-based)
  • Social engineering of personnel with access to OpenBao
  • Insider threats from the customer’s own team
  • Network-level DDoS preventing connectivity to OpenBao
  • Container escape followed by host memory access (without TEE)
  • Coerced firmware updates on TEE hardware by cloud provider at government request

Cross-Provider Compatibility

Documented Provider Differences: Exceptionally Thorough

The level of detail on GKE private cluster firewall rules, AKS Admissions Enforcer behavior, EKS Fargate limitations, and marketplace packaging constraints is production-grade knowledge.

Phantom Cross-Provider Status

Phantom’s core architecture (webhook + sidecar + external secrets) works across all three providers with provider-specific code paths for networking, identity, and attestation.

Undocumented Issues to Address

  1. GKE Workload Identity Federation — default SA token behavior changes may affect sealed cache key derivation.
  2. EKS Pod Identity — sidecar’s OpenBao auth must support both traditional K8s auth and provider-specific identity federation.
  3. AKS Node Auto-Provisioning (Karpenter) — eBPF programs must handle different kernel versions within the same cluster.
  4. GKE Gateway API migration — network policies may need to understand Gateway API resources.
  5. EKS Access Entries — changes who can interact with the API server and bypass webhooks.
  6. Multi-tenant GKE clusters (GKE Enterprise) — fleet-level policies can override per-cluster webhook configurations.
  7. ARM / Graviton nodes — eBPF programs, sidecar images, and crypto must be multi-arch.
  8. Windows node pools — current architecture is Linux-only. Should be explicitly documented as unsupported.
  9. Spot / preemptible node eviction — sidecar must handle SIGTERM gracefully and clean up sealed cache.
  10. Network policies (Calico/Cilium) — sidecar needs explicit NetworkPolicy rules to reach OpenBao.

Scalability Analysis

Pod Scale Assessment

| Component | 100 Pods | 1,000 Pods | 10,000 Pods |
|---|---|---|---|
| Webhook | Trivial | Fine | Needs horizontal scaling or namespace sharding |
| Sidecar (per-pod) | ~5 GB (50 MB each) | ~50 GB | ~500 GB (significant) |
| OpenBao connections | 100 concurrent | 1,000 (within HA capacity) | 10,000 (pooling mandatory) |
| eBPF programs | Negligible | Moderate (per-node) | Same as 1K if node count is stable |
| Operator | Single replica | Single + leader election | May need sharded reconciliation |

OpenBao — Biggest Scalability Risk

OpenBao as Single External Dependency — Bottleneck Risk

10,000 pod restarts during rolling deployment = 10,000 OpenBao requests in a short window. With 3 secrets/pod at 5-min TTL: ~100 req/s steady-state. A 3-node HA cluster handles this, but deployment bursts could saturate it.

Missing Mitigations

  • Request coalescing — if 50 pods request the same secret simultaneously, OpenBao should be hit once, not 50 times.
  • Batch secret fetch — 3 secrets in one API call instead of 3 sequential calls reduces connection overhead 3x.
  • Staggered renewal — add jitter to TTL to spread renewal load.

eBPF Overhead at Scale

At 100 pods per node, a memory-access tracepoint on sys_read/sys_write could fire millions of times per second. Even incrementing a counter adds 50-200ns per syscall.

Recommendation

eBPF monitoring should be opt-in per namespace, not cluster-wide. The attestation + secret injection provides sufficient security without continuous syscall monitoring.

Sidecar Resource Overhead

  • Memory per sidecar: 30-50 MB
  • Reserved @ 10K pods: 500 GB
  • CPU burst on secret fetch: ~50 ms
  • Rust sidecar alternative: 5-10 MB

Tech Stack

Go — Right Choice for Phantom

Go is Correct For

  • Webhook server (first-class controller-runtime support)
  • Operator/controller (standard K8s operator pattern)
  • Sidecar proxy (network I/O, gRPC)

Consider Rust For

  • Sidecar if memory footprint becomes a scaling issue (5-10MB vs 30-50MB)
  • Cryptographic hot paths (hardware acceleration)

OpenBao vs Alternatives

| Alternative | Pros | Cons |
|---|---|---|
| OpenBao (chosen) | Open-source fork, no BSL risk, proven at scale, transit engine | Younger project, smaller plugin ecosystem |
| HashiCorp Vault | Battle-tested, extensive ecosystem | BSL license, legal risk for a commercial product |
| CyberArk Conjur | Enterprise pedigree, good K8s integration | Less flexible API, proprietary core |
| Cloud KMS (AWS/GCP/Azure) | Native integration, managed | Defeats the entire purpose |
| SOPS + Age/KMS | Simple, file-based | No dynamic secrets, no lease management |
| Infisical | Modern UI, good K8s integration | Less proven at scale, SaaS-first |

OpenBao is the Correct Choice

The only option that is: (a) open-source with permissive license, (b) proven at scale, (c) supports transit encryption + dynamic secrets + PKI, and (d) can be self-hosted in the customer’s jurisdiction.

eBPF vs Alternatives for Monitoring

| Alternative | Pros | Cons |
|---|---|---|
| eBPF (chosen) | Kernel-level visibility, low overhead, no app changes | Kernel version dependencies, CO-RE complexity |
| ptrace-based | Works everywhere | 10-100x performance overhead |
| seccomp-bpf | Blocks syscalls, no overhead for allowed calls | Binary allow/deny only, no monitoring |
| Falco (eBPF-based) | Mature, rule-based, good K8s integration | Additional dependency, overlap |
| auditd | Well-understood kernel audit subsystem | High overhead at scale, log-based |

Recommendation

Make eBPF monitoring a Phase 2 feature, not part of the MVP. If customers demand runtime visibility, integrate with Falco rather than building a custom monitoring framework.

MVP Scope — “Secrets That Never Touch etcd”

Phantom Core Components

  1. Mutating admission webhook that injects phantom-proxy sidecar into labeled pods
  2. Sidecar that fetches secrets from external OpenBao and exposes them via: environment variables, Unix domain socket, and mounted tmpfs file
  3. In-memory cache with TTL-based renewal (skip sealed local cache for MVP)
  4. Pre-flight connectivity check (Job-based, writes to ConfigMap)
  5. Helm chart designed for EKS add-on constraints (no hooks, no lookup)
  6. Single-provider launch: GKE Standard (simplest webhook behavior)

What to Cut from v1

| Feature | Cut? | Reason |
|---|---|---|
| TEE attestation (SEV-SNP/TDX) | Cut from MVP | Can be added as policy upgrade; injection works without it |
| Sealed local cache (tier 2) | Cut from MVP | In-memory cache + grace period is sufficient initially |
| eBPF monitoring | Cut from MVP | Defense-in-depth, not core value proposition |
| gVisor sandbox | Cut from MVP | TEE provides better guarantees anyway |
| Circuit breaker | Include | Critical for production safety |
| Canary injection | Cut from MVP | Nice-to-have, not launch-critical |
| Multi-provider support | GKE first | EKS in v1.1, AKS in v1.2 |

Critical Path to First Deployable Version

  • Week 1-2: Project scaffolding, CI/CD, Helm chart skeleton
  • Week 3-5: Mutating webhook (injection, namespace selection, fail-closed)
  • Week 5-7: Sidecar (OpenBao auth, secret fetch, env var injection, socket API)
  • Week 7-8: In-memory cache with TTL renewal, grace period
  • Week 8-9: Pre-flight connectivity check Job
  • Week 9-10: Circuit breaker implementation
  • Week 10-12: Integration testing on GKE Standard (public + private clusters)
  • Week 12-14: Documentation, Helm chart polish, beta program with 2-3 design partners
  • Week 14-16: GKE Marketplace submission, public launch

Timeline: ~4 months to MVP with 3-4 engineers

This assumes full-time focus and no TEE/eBPF work.

Comparison to Alternatives

Alt 1: Full Confidential Computing (Just Use TEEs)

| Aspect | Phantom Approach | Full CC Approach |
|---|---|---|
| Secret protection | External OpenBao + attestation | Hardware memory encryption |
| Complexity | Custom webhook + sidecar | Node pool config only |
| Cross-provider | Works everywhere (with caveats) | GKE/AKS only; EKS different model |
| Cost | Software license + OpenBao ops | 6-10% perf overhead + higher instance cost |
| Protection scope | Secrets only | All memory, all computation |

Trade-off: Full CC is simpler but more expensive and less available. Phantom works on standard VMs and adds CC as optional enhancement — correct positioning for reaching the broadest market.

Alt 2: Sovereign Cloud (Use EU Providers)

| Aspect | Phantom | Sovereign Cloud |
|---|---|---|
| US access risk | Eliminated by crypto | Eliminated by jurisdiction |
| Cloud maturity | AWS/GCP/Azure (best-in-class) | EU providers lag in services and scale |
| Migration effort | Install operator + OpenBao | Full infrastructure migration |
| Multi-region/global | Yes (US clouds have global regions) | Limited to EU regions |

Alt 3: Client-Side Encryption Libraries

| Aspect | Phantom | Client-Side Libraries |
|---|---|---|
| Application changes | Zero (transparent) | Requires code changes in every app |
| Language support | Any (sidecar-based) | One library per language |
| Coverage | All pods automatically | Only integrated applications |
| Adoption friction | Low (label a namespace) | High (modify every application) |

Alt 4: VPN to On-Premises HSM

Technically works but adds significant operational complexity (VPN management, on-premises infrastructure, latency). Phantom’s managed OpenBao is operationally simpler. However, for customers with existing on-prem HSMs (banks, defense), this should be a supported deployment mode.

Technical Risks

High-Impact Risks

1. OpenBao Project Viability

Smaller contributor base than Vault. If the project loses momentum, you’re building on an under-maintained foundation. Mitigation: Abstract behind an interface; monitor activity; support upstream Vault as alternative backend.

2. TEE Vulnerability Disclosure

A major vulnerability in AMD SEV-SNP or Intel TDX (like CacheWarp) would undermine the attestation story. Mitigation: Position TEE as defense-in-depth, not sole guarantee. Maintain rapid response capability for advisories.

3. Cloud Provider API Changes

The three providers frequently change managed service behavior (AKS default egress removal, Kata CC sunset, GKE Autopilot restrictions). Mitigation: Aggressive compatibility testing in CI, pre-flight checks, and provider DevRel partnerships.

4. Webhook Stability Under Load

A crashed webhook will hold every pod Pending (fail-closed). Operationally catastrophic. Mitigation: Circuit breaker + bypass escape hatch. Add chaos testing to CI.

5. Secret Caching Correctness

Three-tier cache introduces eventual consistency. A rotated secret may be stale for up to 20 minutes — significant during breach response. Mitigation: Implement a “force rotation” signal from operator to sidecar that bypasses the cache.

Dependency Risks

| Dependency | Risk | Severity |
|---|---|---|
| OpenBao | Project momentum, fork sustainability | High |
| AMD SEV-SNP / Intel TDX | Hardware vulnerabilities, firmware updates | Medium |
| controller-runtime (Go) | Well-maintained by K8s SIG | Low |
| cilium/ebpf (Go) | Well-maintained, backed by Isovalent/Cisco | Low |
| SPIFFE/SPIRE | CNCF graduated, active development | Low |
| go-sev-guest | Smaller project, Google-maintained | Medium |

Architecture Improvements

Concrete changes that raise the architecture score to 8.5/10.

A1. DaemonSet Mode — Per-Node Secret Proxy

Offer a DaemonSet mode where one Phantom agent per node handles secrets for all pods via Unix domain socket.

┌────────────────────────────────────┐
│  Node                              │
│  [Pod A] [Pod B] [Pod C] [Pod D]   │
│     │       │       │       │      │
│     └───────┴───┬───┴───────┘  UDS │
│                 │                  │
│    [Phantom DaemonSet, ~80MB]      │
│    [      Shared Cache      ]      │
│                 │ mTLS             │
└─────────────────┼──────────────────┘
                  │
           [  OpenBao EU  ]
| Aspect | Sidecar Mode | DaemonSet Mode |
|---|---|---|
| Memory (100 nodes, 10K pods) | ~500 GB | ~8 GB |
| Pod isolation | Full (per-pod process) | Shared (node-level) |
| Blast radius of crash | 1 pod | All pods on node |
| Secret cache deduplication | No (same secret cached N times) | Yes (one copy per node) |
| Best for | High-security, <500 pods | High-density, >1000 pods |

A2. SecretProvider Interface Abstraction

Abstract the secrets backend behind a SecretProvider interface from day one to reduce OpenBao project risk and widen addressable market.

type SecretProvider interface {
    GetSecret(ctx context.Context, path string, identity PodIdentity) (*Secret, error)
    WatchSecret(ctx context.Context, path string) (<-chan SecretEvent, error)
    RevokeLeases(ctx context.Context, identity PodIdentity) error
    HealthCheck(ctx context.Context) error
}
| Provider | Priority | Sovereignty |
|---|---|---|
| openbao | v1.0 (launch) | Full (EU-hosted) |
| vault | v1.0 (launch) | Full (customer-controlled) |
| aws-secrets-manager | v1.2 | None (US jurisdiction) |
| gcp-secret-manager | v1.2 | None (US jurisdiction) |
| local-file | v1.0 (launch) | N/A (dev/testing) |

A3. Deterministic Compatibility Database

Replace the AI compatibility engine with a CI-verified YAML database of known Helm charts with tested injection results.

# compatibility-db/charts/bitnami/postgresql/16.4.0.yaml
chart:
  repository: bitnami
  name: postgresql
  versions_tested: ["16.4.0", "16.3.x", "15.x"]

injection:
  status: "compatible"    # compatible | partial | incompatible | untested
  mode: "sidecar"         # sidecar | daemonset | both

testing:
  method: "automated"
  platform: "gke-standard"
  k8s_versions: ["1.29", "1.30", "1.31"]

Advantages over AI: deterministic (same input → same output), auditable (CISOs can review), reproducible (CI link proves test), community-driven.
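The lookup logic against this database can stay small. A sketch of matching a concrete chart version against the `versions_tested` patterns (exact versions plus `x`-wildcards), with the database held as an in-memory list for illustration — the function names are hypothetical, not part of Phantom:

```go
package main

import (
	"fmt"
	"strings"
)

// matchesPattern reports whether a concrete chart version such as "16.3.2"
// matches a tested-version pattern such as "16.3.x" or "15.x".
func matchesPattern(version, pattern string) bool {
	v := strings.Split(version, ".")
	p := strings.Split(pattern, ".")
	if len(p) > len(v) {
		return false
	}
	for i, part := range p {
		if part == "x" {
			return true // wildcard covers the remaining components
		}
		if part != v[i] {
			return false
		}
	}
	return len(p) == len(v) // exact pattern must match all components
}

// lookupStatus returns the recorded injection status when any tested pattern
// matches, or "untested" otherwise — the deterministic property the DB claims.
func lookupStatus(patterns []string, status, version string) string {
	for _, p := range patterns {
		if matchesPattern(version, p) {
			return status
		}
	}
	return "untested"
}

func main() {
	tested := []string{"16.4.0", "16.3.x", "15.x"}
	fmt.Println(lookupStatus(tested, "compatible", "16.3.7")) // compatible
	fmt.Println(lookupStatus(tested, "compatible", "17.0.0")) // untested
}
```

Note the default: an unknown version degrades to "untested" rather than a guess, which is exactly the behavior a CISO can audit and an AI engine cannot promise.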

A4. Webhook-Free Mode via CSI Secret Store Driver

Webhook Mode (default)

  • Fully transparent
  • No app changes required
  • Env + file injection
  • Per-process isolation

CSI Mode (alternative)

  • No webhook dependency
  • Standard K8s pattern
  • Works on restricted platforms
  • Requires pod spec changes, file-based only
A5. Multi-Tenancy Architecture for Managed SaaS

Each customer gets their own OpenBao namespace with separate encryption keys, isolated metrics, and network-level separation.

| Layer | Isolation Mechanism |
| --- | --- |
| Secrets | Separate OpenBao namespace (/tenant-id/*), separate policies |
| Encryption keys | Per-tenant unseal keys, separate HSM slots |
| Authentication | Per-tenant Kubernetes auth mounts |
| Network | OpenBao policy: tenant A’s token cannot read /tenant-b/* |
| Audit | Per-tenant audit log bucket, customer-exportable |
| Metrics | tenant_id label on all metrics, per-tenant dashboards |
| Billing | Per-tenant secret access counters, usage tracking |

A6. Offline / Air-Gapped Deployment Mode

Fully self-contained deployment for government and defense customers.

| Component | Online Mode | Air-Gapped Mode |
| --- | --- | --- |
| OpenBao | CloudCondom-managed SaaS | Customer-managed, on-prem |
| Unseal mechanism | CloudCondom HSM | Customer’s on-prem HSM (PKCS#11) |
| Container images | Public registry | Customer’s Harbor/registry mirror |
| Compatibility DB | Auto-updated from CDN | Manual update via USB/media transfer |
| Updates | Automated via Helm | Manual via air-gap transfer process |

A7. StatefulSet with Persistent Secrets

Tie the sealed cache to a PersistentVolumeClaim and use the StatefulSet’s stable pod identity as part of the cache key derivation.

# Cache key derivation for StatefulSets
cache_key = HKDF-SHA256(
    ikm:  service_account_token,
    salt: cluster_hmac_key,
    info: "phantom-statefulset:" + statefulset_name + ":" + pod_ordinal
)
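The derivation above can be sketched with the Go standard library alone (RFC 5869 extract-then-expand via `crypto/hmac`); the input values in `main` are placeholders, not real material:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hkdfSHA256 implements RFC 5869 extract-then-expand for output lengths up to
// one hash block (32 bytes), which is all a cache key needs.
func hkdfSHA256(ikm, salt []byte, info string) []byte {
	// Extract: PRK = HMAC-SHA256(salt, IKM)
	ext := hmac.New(sha256.New, salt)
	ext.Write(ikm)
	prk := ext.Sum(nil)
	// Expand (single block): T(1) = HMAC-SHA256(PRK, info || 0x01)
	exp := hmac.New(sha256.New, prk)
	exp.Write([]byte(info))
	exp.Write([]byte{0x01})
	return exp.Sum(nil)
}

// statefulSetCacheKey mirrors the derivation above: SA token as IKM, the
// cluster HMAC key as salt, and the stable StatefulSet identity in info.
func statefulSetCacheKey(saToken, clusterHMACKey []byte, ssName string, ordinal int) []byte {
	info := fmt.Sprintf("phantom-statefulset:%s:%d", ssName, ordinal)
	return hkdfSHA256(saToken, clusterHMACKey, info)
}

func main() {
	key := statefulSetCacheKey([]byte("sa-token"), []byte("cluster-hmac"), "postgres", 0)
	fmt.Println(hex.EncodeToString(key)) // deterministic for fixed inputs
}
```

Binding the ordinal into the `info` field means `postgres-0` and `postgres-1` derive distinct sealed-cache keys even though they share a ServiceAccount, which is the correctness property the PVC-backed cache depends on.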

Score Impact Summary

| Improvement | Weakness Addressed | Effort | Score Impact |
| --- | --- | --- | --- |
| A1. DaemonSet mode | Sidecar scalability | 2-3 weeks | +0.3 |
| A2. SecretProvider interface | OpenBao lock-in risk | 1 week + ongoing | +0.2 |
| A3. Compatibility DB | AI engine vaporware | 1-2 weeks | +0.2 |
| A4. CSI Secret Store mode | Webhook-only limitation | 2 weeks | +0.1 |
| A5. Multi-tenancy | SaaS architecture gap | Design only | +0.1 |
| A6. Air-gapped mode | Gov/defense market gap | 1 week | +0.05 |
| A7. StatefulSet support | Cache correctness gap | 1 week | +0.05 |
| Total | | | +1.0 |

Verdict

Architecture Score: 8.5 / 10

A well-architected system with a correct trust model, realistic provider knowledge, and an honest assessment of its limitations. Phantom is technically sound and buildable by 3-4 engineers in ~4 months.

Key Strengths

  1. Trust model is correct. “Webhook = UX, crypto = security boundary” is the right architecture.
  2. Exceptional provider-specific knowledge. Production-grade documentation on GKE/EKS/AKS quirks.
  3. Operationally mature design. Circuit breaker, canary injection, three-tier caching, pre-flight checks.
  4. Correct MVP prioritization. Starting with Phantom on managed K8s is the right call.

Key Weaknesses

  1. OpenBao SPOF risk inadequately addressed at scale (10K-pod burst scenario).
  2. EKS confidential computing story is weak. Nitro Enclaves are fundamentally different.
  3. Sidecar resource overhead at scale not addressed: ~500 GB of reserved memory at 10K pods.
  4. No testing strategy. Missing fuzzing, property-based testing, formal verification for crypto paths.

Recommendations

  1. Ship Phantom alone. Nothing else in v1. Phantom on GKE Standard is the MVP. Add EKS in v1.1, AKS in v1.2.
  2. Add a DaemonSet mode as alternative to per-pod sidecars for customers with 1,000+ pods.
  3. Implement request coalescing and batch secret fetching to mitigate OpenBao bottleneck risk.
  4. Abstract the OpenBao dependency behind a SecretProvider interface from day one.
  5. Invest in a testing strategy proportional to security claims: fuzzing, property-based testing, integration tests on all providers.
  6. Be explicit about what’s not protected. Build honest security documentation that CISOs will trust.
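
Recommendation 3 (request coalescing) can be sketched with a small in-flight map, similar in spirit to `golang.org/x/sync/singleflight`. A minimal version, assuming string-valued secrets — the type and method names are illustrative, not Phantom's API:

```go
package main

import (
	"fmt"
	"sync"
)

// call tracks one in-flight backend fetch that later callers wait on.
type call struct {
	done chan struct{}
	val  string
	err  error
}

// Coalescer deduplicates concurrent fetches for the same key so that a burst
// of pods requesting one secret produces a single OpenBao round-trip.
type Coalescer struct {
	mu       sync.Mutex
	inflight map[string]*call
}

func (c *Coalescer) Do(key string, fetch func() (string, error)) (string, error) {
	c.mu.Lock()
	if c.inflight == nil {
		c.inflight = make(map[string]*call)
	}
	if cl, ok := c.inflight[key]; ok { // a fetch is already running: wait on it
		c.mu.Unlock()
		<-cl.done
		return cl.val, cl.err
	}
	cl := &call{done: make(chan struct{})}
	c.inflight[key] = cl
	c.mu.Unlock()

	cl.val, cl.err = fetch() // the single backend round-trip

	c.mu.Lock()
	delete(c.inflight, key)
	c.mu.Unlock()
	close(cl.done) // release all waiters
	return cl.val, cl.err
}

func main() {
	var c Coalescer
	v, err := c.Do("db/password", func() (string, error) { return "s3cr3t", nil })
	fmt.Println(v, err)
}
```

In the 10K-pod burst scenario this turns N identical requests per secret into one, which is the cheapest available mitigation for the OpenBao SPOF before sharding or replicas enter the picture.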