feat(ssh): add identityContent for vault-resolved private keys (#522) #23

Merged
stack72 merged 2 commits from worktree-522 into main 2026-06-01 21:30:11 +00:00
Owner

Summary

Add an optional identityContent field to @swamp/ssh transport auth that accepts inline PEM private key content (e.g. from vault.get()), enabling vault-stored SSH keys to work with @swamp/ssh without external scripting or pre-loaded agents.

  • When identityContent is set, the extension writes the PEM body to a temp file (mode 0600), passes it as -i <tmpfile>, and removes the file in a finally block
  • Mutually exclusive with identityFile — setting both is a validation error, setting neither uses SSH agent (unchanged behavior)
  • Marked z.meta({ sensitive: true }) so swamp redacts it from logs/reports
  • Applied to exec, script, copy, and open methods; forward excluded (operates via ControlMaster, never passes -i)

Why this is correct

The problem: vault.get() resolves to PEM key content, not a file path. identityFile expects a path. Users storing SSH keys in vaults (1Password, HashiCorp Vault, AWS SM) had to either pre-load ssh-agent (fragile in CI) or write a custom extension model duplicating @swamp/ssh's execution logic.

The fix: A new field that accepts the content directly. The temp-file materialization happens in the operations layer (above the runner), so the runner never sees identityContent — it only sees the resolved identityFile path. Clean layering, no runner changes needed.

Key design decisions:

  • Mutual-exclusion .refine() lives on TransportSchema (the discriminated union), not on SshTransportSchema (the member) — putting it on the member would break Zod's discriminated union introspection
  • TransportOverrideSchema also validates mutual exclusion so per-host overrides can't set both
  • Temp files use Deno.makeTempFile with a swamp-ssh-key- prefix, cleaned up in finally even when throwOnHostFailures fires

Changes

File What
schemas.ts identityContent on SshTransportSchema + TransportOverrideSchema, .meta({ sensitive: true }), mutual-exclusion refine
hosts.ts identityContent in mergeSshTransport via pick()
operations.ts materializeTempKeys / cleanupTempKeys helpers, applied to exec/script/copy/open with finally cleanup
ssh.ts Version 2026.06.01.2, upgrade entry
manifest.yaml Version 2026.06.01.2, updated auth modes description
README.md New transport field, new auth subsection with vault example
schemas_test.ts 6 new tests: validation, mutual exclusion, sensitive meta
hosts_test.ts 3 new tests: merge propagation
ssh_test.ts 7 new tests: exec/script/copy/open materialization, mixed fleet, per-host override, failure cleanup

Test plan

  • 16 new unit + integration tests covering schema validation, host merge, all 4 methods (exec/script/copy/open), mixed fleet, per-host override, and failure-path cleanup
  • 212/212 existing tests pass (excluding 3 pre-existing runner_spawn_test.ts failures unrelated to this change)
  • deno check, deno lint, deno fmt, deno install --frozen all clean
  • End-to-end verification against a real openssh-server in Docker: exec (echo hello-from-identityContent → exit 0), script (VAL=42; echo value=$VAL → exit 0), copy (scp + cat round-trip → content matches), temp file cleanup (no swamp-ssh-key-* files remain)

Closes #522

Co-Authored-By: Blake Irvin blakeirvin@me.com

## Summary Add an optional `identityContent` field to `@swamp/ssh` transport auth that accepts inline PEM private key content (e.g. from `vault.get()`), enabling vault-stored SSH keys to work with `@swamp/ssh` without external scripting or pre-loaded agents. - When `identityContent` is set, the extension writes the PEM body to a temp file (mode `0600`), passes it as `-i <tmpfile>`, and removes the file in a `finally` block - Mutually exclusive with `identityFile` — setting both is a validation error, setting neither uses SSH agent (unchanged behavior) - Marked `z.meta({ sensitive: true })` so swamp redacts it from logs/reports - Applied to `exec`, `script`, `copy`, and `open` methods; `forward` excluded (operates via ControlMaster, never passes `-i`) ## Why this is correct **The problem:** `vault.get()` resolves to PEM key *content*, not a file path. `identityFile` expects a path. Users storing SSH keys in vaults (1Password, HashiCorp Vault, AWS SM) had to either pre-load `ssh-agent` (fragile in CI) or write a custom extension model duplicating `@swamp/ssh`'s execution logic. **The fix:** A new field that accepts the content directly. The temp-file materialization happens in the operations layer (above the runner), so the runner never sees `identityContent` — it only sees the resolved `identityFile` path. Clean layering, no runner changes needed. **Key design decisions:** - Mutual-exclusion `.refine()` lives on `TransportSchema` (the discriminated union), not on `SshTransportSchema` (the member) — putting it on the member would break Zod's discriminated union introspection - `TransportOverrideSchema` also validates mutual exclusion so per-host overrides can't set both - Temp files use `Deno.makeTempFile` with a `swamp-ssh-key-` prefix, cleaned up in `finally` even when `throwOnHostFailures` fires ## Changes | File | What | |------|------| | `schemas.ts` | `identityContent` on `SshTransportSchema` + `TransportOverrideSchema`, `.meta({ sensitive: true })`, mutual-exclusion refine | | `hosts.ts` | `identityContent` in `mergeSshTransport` via `pick()` | | `operations.ts` | `materializeTempKeys` / `cleanupTempKeys` helpers, applied to exec/script/copy/open with `finally` cleanup | | `ssh.ts` | Version `2026.06.01.2`, upgrade entry | | `manifest.yaml` | Version `2026.06.01.2`, updated auth modes description | | `README.md` | New transport field, new auth subsection with vault example | | `schemas_test.ts` | 6 new tests: validation, mutual exclusion, sensitive meta | | `hosts_test.ts` | 3 new tests: merge propagation | | `ssh_test.ts` | 7 new tests: exec/script/copy/open materialization, mixed fleet, per-host override, failure cleanup | ## Test plan - [x] 16 new unit + integration tests covering schema validation, host merge, all 4 methods (exec/script/copy/open), mixed fleet, per-host override, and failure-path cleanup - [x] 212/212 existing tests pass (excluding 3 pre-existing `runner_spawn_test.ts` failures unrelated to this change) - [x] `deno check`, `deno lint`, `deno fmt`, `deno install --frozen` all clean - [x] End-to-end verification against a real openssh-server in Docker: exec (`echo hello-from-identityContent` → exit 0), script (`VAL=42; echo value=$VAL` → exit 0), copy (scp + cat round-trip → content matches), temp file cleanup (no `swamp-ssh-key-*` files remain) Closes #522 Co-Authored-By: Blake Irvin <blakeirvin@me.com>
feat(ssh): add identityContent for vault-resolved private keys (#522)
Some checks failed
CI / workflows/s3-bootstrap - lockfile up to date (pull_request) Has been skipped
CI / workflows/gcs-bootstrap - lockfile up to date (pull_request) Has been skipped
CI / ssh - check (pull_request) Successful in 59s
CI / Dependency Audit (pull_request) Successful in 4m2s
CI / cve/dirtyfrag - check (pull_request) Has been skipped
CI / cve/dirtyfrag - lint (pull_request) Has been skipped
CI / cve/dirtyfrag - fmt (pull_request) Has been skipped
CI / cve/dirtyfrag - test (pull_request) Has been skipped
CI / ssh - fmt (pull_request) Successful in 1m5s
CI / cve/mini-shai-hulud - fmt (pull_request) Has been skipped
CI / ssh - lint (pull_request) Successful in 1m5s
CI / cve/mini-shai-hulud - check (pull_request) Has been skipped
CI / cve/mini-shai-hulud - lint (pull_request) Has been skipped
CI / cve/mini-shai-hulud - test (pull_request) Has been skipped
CI / cve/dirtyfrag - lockfile up to date (pull_request) Has been skipped
CI / cve/mini-shai-hulud - lockfile up to date (pull_request) Has been skipped
CI / model/hetzner-cloud - check (pull_request) Has been skipped
CI / ssh - test (pull_request) Successful in 1m14s
CI / model/digitalocean - check (pull_request) Has been skipped
CI / gcp models - lockfiles up to date (pull_request) Has been skipped
CI / cloudflare models - lockfiles up to date (pull_request) Has been skipped
CI / cloudflare models - sample check (pull_request) Has been skipped
CI / codegen - check (pull_request) Has been skipped
CI / ssh - lockfile up to date (pull_request) Successful in 1m31s
CI / codegen - fmt (pull_request) Has been skipped
CI / codegen - lint (pull_request) Has been skipped
CI / CI Security Review (pull_request) Has been skipped
CI / Adversarial Code Review (pull_request) Has been skipped
CI / Claude Code Review (pull_request) Failing after 9m13s
CI / Merge Gate (pull_request) Failing after 28s
cbabefc8f8
Add an optional `identityContent` field to `@swamp/ssh` transport auth
that accepts inline PEM private key content (e.g. from `vault.get()`).
When set, the extension writes the PEM body to a temporary file (mode
0600), passes it as `-i <tmpfile>` for the SSH session, and removes the
file in a `finally` block when the method completes.

This enables users who store SSH private keys in a vault (1Password,
HashiCorp Vault, AWS Secrets Manager, etc.) to use `@swamp/ssh` directly
without writing the key to disk externally or pre-loading an ssh-agent —
both of which break the self-contained model principle.

`identityFile` and `identityContent` are mutually exclusive. Setting
neither is fine (SSH agent auth). The field is marked with
`z.meta({ sensitive: true })` so swamp redacts it from logs and reports.

Temp-key materialization is applied to exec, script, copy, and open.
The forward method is excluded — it operates via an already-open
ControlMaster socket and never passes `-i`.

Verified end-to-end against a real openssh-server in Docker: exec,
script, and copy all succeed with a vault-style PEM key, and temp files
are cleaned up after each method returns.

Co-Authored-By: Blake Irvin <blakeirvin@me.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author
Owner

Code Review

Blocking Issues

  1. Temp key files orphaned on partial materializeTempKeys failure (ssh/extensions/models/_lib/operations.ts:310-314)

    tempPaths.set(host.name, tmpFile) is called after Deno.writeTextFile. If writeTextFile throws (e.g. disk full), the created temp file is never registered in tempPaths, so cleanupTempKeys will never remove it. If the failure occurs on host N, temp files for hosts 1 through N-1 are already in tempPaths but are also never cleaned up: materializeTempKeys throws before returning, the destructuring assignment in the caller never completes, and the try/finally block that calls cleanupTempKeys is never entered. All orphaned files may contain private key material.

    Fix: move tempPaths.set(host.name, tmpFile) to before Deno.writeTextFile, and add a catch block inside materializeTempKeys that calls cleanupTempKeys(tempPaths) then re-throws on failure.

Suggestions

  1. sanitizeResources: false without an explanatory comment (ssh/extensions/models/ssh_test.ts:387,421,452,486,515,578,630)

    CLAUDE.md requires a comment alongside sanitizeResources: false. All seven identityContent tests use the flag without explanation. A brief inline comment would satisfy the project rule.


Everything else looks correct:

  • identityContent marked .meta({ sensitive: true }) so swamp redacts it from logs.
  • Temp file written with mode: 0o600.
  • identityContent cleared to undefined in the patched host before reaching argv builders -- no key material in recorded argv.
  • cleanupTempKeys called in finally in all four methods that materialize keys (exec, script, copy, open). check, close, and forward correctly skip materialization.
  • Schema-level mutual exclusivity enforcement for identityFile / identityContent in both TransportSchema and TransportOverrideSchema.
  • No any types in hand-written code; all named exports; no default exports.
  • Model files under model/gcp/ are auto-generated output (codegen/ changes are present in the same commit).
  • Test coverage is thorough: materialization, content verification, cleanup on success and failure, mixed-fleet scenario, per-host override, and the open ControlMaster path.
## Code Review ### Blocking Issues 1. **Temp key files orphaned on partial `materializeTempKeys` failure** (`ssh/extensions/models/_lib/operations.ts:310-314`) `tempPaths.set(host.name, tmpFile)` is called **after** `Deno.writeTextFile`. If `writeTextFile` throws (e.g. disk full), the created temp file is never registered in `tempPaths`, so `cleanupTempKeys` will never remove it. If the failure occurs on host N, temp files for hosts 1 through N-1 are already in `tempPaths` but are also never cleaned up: `materializeTempKeys` throws before returning, the destructuring assignment in the caller never completes, and the `try/finally` block that calls `cleanupTempKeys` is never entered. All orphaned files may contain private key material. Fix: move `tempPaths.set(host.name, tmpFile)` to **before** `Deno.writeTextFile`, and add a catch block inside `materializeTempKeys` that calls `cleanupTempKeys(tempPaths)` then re-throws on failure. ### Suggestions 1. **`sanitizeResources: false` without an explanatory comment** (`ssh/extensions/models/ssh_test.ts:387,421,452,486,515,578,630`) CLAUDE.md requires a comment alongside `sanitizeResources: false`. All seven `identityContent` tests use the flag without explanation. A brief inline comment would satisfy the project rule. --- Everything else looks correct: - `identityContent` marked `.meta({ sensitive: true })` so swamp redacts it from logs. - Temp file written with `mode: 0o600`. - `identityContent` cleared to `undefined` in the patched host before reaching argv builders -- no key material in recorded `argv`. - `cleanupTempKeys` called in `finally` in all four methods that materialize keys (exec, script, copy, open). `check`, `close`, and `forward` correctly skip materialization. - Schema-level mutual exclusivity enforcement for `identityFile` / `identityContent` in both `TransportSchema` and `TransportOverrideSchema`. - No `any` types in hand-written code; all named exports; no default exports. - Model files under `model/gcp/` are auto-generated output (codegen/ changes are present in the same commit). - Test coverage is thorough: materialization, content verification, cleanup on success and failure, mixed-fleet scenario, per-host override, and the `open` ControlMaster path.
fix(ssh): clean up temp key on write failure, add sanitizeResources comments
Some checks failed
CI / cve/mini-shai-hulud - check (pull_request) Has been skipped
CI / ssh - lint (pull_request) Successful in 1m6s
CI / ssh - fmt (pull_request) Successful in 1m8s
CI / ssh - lockfile up to date (pull_request) Successful in 1m6s
CI / cve/mini-shai-hulud - fmt (pull_request) Has been skipped
CI / cve/mini-shai-hulud - lint (pull_request) Has been skipped
CI / ssh - check (pull_request) Successful in 1m12s
CI / cve/mini-shai-hulud - test (pull_request) Has been skipped
CI / Dependency Audit (pull_request) Successful in 4m5s
CI / cve/dirtyfrag - lockfile up to date (pull_request) Has been skipped
CI / cve/mini-shai-hulud - lockfile up to date (pull_request) Has been skipped
CI / ssh - test (pull_request) Successful in 1m18s
CI / model/digitalocean - check (pull_request) Has been skipped
CI / model/hetzner-cloud - check (pull_request) Has been skipped
CI / model/digitalocean - lockfile up to date (pull_request) Has been skipped
CI / model/hetzner-cloud - lockfile up to date (pull_request) Has been skipped
CI / aws models - sample check (pull_request) Has been skipped
CI / gcp models - sample check (pull_request) Has been skipped
CI / gcp models - lockfiles up to date (pull_request) Has been skipped
CI / cloudflare models - sample check (pull_request) Has been skipped
CI / aws models - lockfiles up to date (pull_request) Has been skipped
CI / cloudflare models - lockfiles up to date (pull_request) Has been skipped
CI / codegen - check (pull_request) Has been skipped
CI / codegen - fmt (pull_request) Has been skipped
CI / codegen - lint (pull_request) Has been skipped
CI / Adversarial Code Review (pull_request) Has been skipped
CI / Claude Code Review (pull_request) Failing after 4m7s
CI / Merge Gate (pull_request) Failing after 32s
CI / codegen - lockfile up to date (pull_request) Has been skipped
CI / CI Security Review (pull_request) Has been skipped
0d411b2c61
Address CI review findings:
- Register temp file path before writeTextFile so cleanup covers it if
  the write fails. Clean up all already-created temp files on error.
- Add explanatory comments on sanitizeResources: false in tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author
Owner

Code Review

Blocking Issues

  1. GCP model .ts files changed without corresponding codegen/ changes.
    The CHANGED FILES list includes nine model/gcp/compute/extensions/models/*.ts files
    (backendservices.ts, forwardingrules.ts, globalforwardingrules.ts, instances.ts,
    regionbackendservices.ts, regionzones.ts, rollouts.ts, routers.ts, zones.ts) and
    model/gcp/vault/extensions/models/matters_exports.ts, alongside their manifests — but no
    codegen/ files appear in the CHANGED FILES list. Per CLAUDE.md, files under model/ are
    auto-generated and must never be hand-edited. Two legitimate explanations exist: (a) these are
    purely version/upgrade-entry bumps produced by the bump-versions script, in which case the
    .ts bodies contain only version and upgrade changes, or (b) these came from a codegen
    regeneration run, in which case the missing codegen/ diff is a problem. If the .ts changes
    contain anything beyond version/upgrades/manifest entries without a corresponding
    codegen/ change, this is a blocking CLAUDE.md violation. Verify that the GCP model .ts
    changes are limited to version/upgrade entries (consistent with a bump-versions run) before
    merging.

Suggestions

  1. ForwardSpecPattern allows out-of-range port numbers.
    schemas.ts:275: const ForwardSpecPattern = /^\d{1,5}:[^:]+:\d{1,5}$/ permits values up to
    99999 for both local and remote ports, which exceeds the maximum valid port of 65535. OpenSSH
    will reject the value at runtime, so this is not a security issue, but adding an explicit
    numeric range check (or tightening the regex to (?:[1-9]\d{0,3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5]))
    would surface the error at schema parse time with a clear message instead of failing deep in
    the SSH handshake.

  2. Mixed-fleet test assumes a specific request-ordering across parallel hosts.
    ssh_test.ts:563-584: the test inspects requests[0] for web-1 and requests[1] for web-2.
    Because the mock executor resolves synchronously (Promise.resolve(...)) the ordering is
    deterministic in practice, but the assumption is subtle and not documented. A comment or
    ordering assertion on requests.map(r => r....) keyed by the host address/argv would make the
    intent explicit and prevent accidental breakage if the executor is ever changed to be async.

## Code Review ### Blocking Issues 1. **GCP model `.ts` files changed without corresponding `codegen/` changes.** The CHANGED FILES list includes nine `model/gcp/compute/extensions/models/*.ts` files (`backendservices.ts`, `forwardingrules.ts`, `globalforwardingrules.ts`, `instances.ts`, `regionbackendservices.ts`, `regionzones.ts`, `rollouts.ts`, `routers.ts`, `zones.ts`) and `model/gcp/vault/extensions/models/matters_exports.ts`, alongside their manifests — but no `codegen/` files appear in the CHANGED FILES list. Per CLAUDE.md, files under `model/` are auto-generated and must never be hand-edited. Two legitimate explanations exist: (a) these are purely version/upgrade-entry bumps produced by the `bump-versions` script, in which case the `.ts` bodies contain only version and upgrade changes, or (b) these came from a codegen regeneration run, in which case the missing `codegen/` diff is a problem. If the `.ts` changes contain anything beyond `version`/`upgrades`/manifest entries without a corresponding `codegen/` change, this is a blocking CLAUDE.md violation. **Verify that the GCP model `.ts` changes are limited to version/upgrade entries (consistent with a `bump-versions` run) before merging.** --- ### Suggestions 1. **`ForwardSpecPattern` allows out-of-range port numbers.** `schemas.ts:275`: `const ForwardSpecPattern = /^\d{1,5}:[^:]+:\d{1,5}$/` permits values up to 99999 for both local and remote ports, which exceeds the maximum valid port of 65535. OpenSSH will reject the value at runtime, so this is not a security issue, but adding an explicit numeric range check (or tightening the regex to `(?:[1-9]\d{0,3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5])`) would surface the error at schema parse time with a clear message instead of failing deep in the SSH handshake. 2. **Mixed-fleet test assumes a specific request-ordering across parallel hosts.** `ssh_test.ts:563-584`: the test inspects `requests[0]` for web-1 and `requests[1]` for web-2. Because the mock executor resolves synchronously (`Promise.resolve(...)`) the ordering is deterministic in practice, but the assumption is subtle and not documented. A comment or ordering assertion on `requests.map(r => r....)` keyed by the host address/argv would make the intent explicit and prevent accidental breakage if the executor is ever changed to be async.
stack72 deleted branch worktree-522 2026-06-01 21:30:11 +00:00
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
swamp-club/swamp-extensions!23
No description provided.