feat(ssh): add identityContent for vault-resolved private keys (#522)

stack72 commented

2026-06-01 18:13:42 +00:00

Owner

Summary

Add an optional identityContent field to @swamp/ssh transport auth that accepts inline PEM private key content (e.g. from vault.get()), enabling vault-stored SSH keys to work with @swamp/ssh without external scripting or pre-loaded agents.

When identityContent is set, the extension writes the PEM body to a temp file (mode 0600), passes it as -i <tmpfile>, and removes the file in a finally block
Mutually exclusive with identityFile — setting both is a validation error, setting neither uses SSH agent (unchanged behavior)
Marked z.meta({ sensitive: true }) so swamp redacts it from logs/reports
Applied to exec, script, copy, and open methods; forward excluded (operates via ControlMaster, never passes -i)

Why this is correct

The problem: vault.get() resolves to PEM key content, not a file path. identityFile expects a path. Users storing SSH keys in vaults (1Password, HashiCorp Vault, AWS SM) had to either pre-load ssh-agent (fragile in CI) or write a custom extension model duplicating @swamp/ssh's execution logic.

The fix: A new field that accepts the content directly. The temp-file materialization happens in the operations layer (above the runner), so the runner never sees identityContent — it only sees the resolved identityFile path. Clean layering, no runner changes needed.

Key design decisions:

Mutual-exclusion .refine() lives on TransportSchema (the discriminated union), not on SshTransportSchema (the member) — putting it on the member would break Zod's discriminated union introspection
TransportOverrideSchema also validates mutual exclusion so per-host overrides can't set both
Temp files use Deno.makeTempFile with a swamp-ssh-key- prefix, cleaned up in finally even when throwOnHostFailures fires

Changes

File	What
`schemas.ts`	`identityContent` on `SshTransportSchema` + `TransportOverrideSchema`, `.meta({ sensitive: true })`, mutual-exclusion refine
`hosts.ts`	`identityContent` in `mergeSshTransport` via `pick()`
`operations.ts`	`materializeTempKeys` / `cleanupTempKeys` helpers, applied to exec/script/copy/open with `finally` cleanup
`ssh.ts`	Version `2026.06.01.2`, upgrade entry
`manifest.yaml`	Version `2026.06.01.2`, updated auth modes description
`README.md`	New transport field, new auth subsection with vault example
`schemas_test.ts`	6 new tests: validation, mutual exclusion, sensitive meta
`hosts_test.ts`	3 new tests: merge propagation
`ssh_test.ts`	7 new tests: exec/script/copy/open materialization, mixed fleet, per-host override, failure cleanup

Test plan

16 new unit + integration tests covering schema validation, host merge, all 4 methods (exec/script/copy/open), mixed fleet, per-host override, and failure-path cleanup
212/212 existing tests pass (excluding 3 pre-existing runner_spawn_test.ts failures unrelated to this change)
deno check, deno lint, deno fmt, deno install --frozen all clean
End-to-end verification against a real openssh-server in Docker: exec (echo hello-from-identityContent → exit 0), script (VAL=42; echo value=$VAL → exit 0), copy (scp + cat round-trip → content matches), temp file cleanup (no swamp-ssh-key-* files remain)

Closes #522

Co-Authored-By: Blake Irvin blakeirvin@me.com

## Summary Add an optional `identityContent` field to `@swamp/ssh` transport auth that accepts inline PEM private key content (e.g. from `vault.get()`), enabling vault-stored SSH keys to work with `@swamp/ssh` without external scripting or pre-loaded agents. - When `identityContent` is set, the extension writes the PEM body to a temp file (mode `0600`), passes it as `-i <tmpfile>`, and removes the file in a `finally` block - Mutually exclusive with `identityFile` — setting both is a validation error, setting neither uses SSH agent (unchanged behavior) - Marked `z.meta({ sensitive: true })` so swamp redacts it from logs/reports - Applied to `exec`, `script`, `copy`, and `open` methods; `forward` excluded (operates via ControlMaster, never passes `-i`) ## Why this is correct **The problem:** `vault.get()` resolves to PEM key *content*, not a file path. `identityFile` expects a path. Users storing SSH keys in vaults (1Password, HashiCorp Vault, AWS SM) had to either pre-load `ssh-agent` (fragile in CI) or write a custom extension model duplicating `@swamp/ssh`'s execution logic. **The fix:** A new field that accepts the content directly. The temp-file materialization happens in the operations layer (above the runner), so the runner never sees `identityContent` — it only sees the resolved `identityFile` path. Clean layering, no runner changes needed. **Key design decisions:** - Mutual-exclusion `.refine()` lives on `TransportSchema` (the discriminated union), not on `SshTransportSchema` (the member) — putting it on the member would break Zod's discriminated union introspection - `TransportOverrideSchema` also validates mutual exclusion so per-host overrides can't set both - Temp files use `Deno.makeTempFile` with a `swamp-ssh-key-` prefix, cleaned up in `finally` even when `throwOnHostFailures` fires ## Changes | File | What | |------|------| | `schemas.ts` | `identityContent` on `SshTransportSchema` + `TransportOverrideSchema`, `.meta({ sensitive: true })`, mutual-exclusion refine | | `hosts.ts` | `identityContent` in `mergeSshTransport` via `pick()` | | `operations.ts` | `materializeTempKeys` / `cleanupTempKeys` helpers, applied to exec/script/copy/open with `finally` cleanup | | `ssh.ts` | Version `2026.06.01.2`, upgrade entry | | `manifest.yaml` | Version `2026.06.01.2`, updated auth modes description | | `README.md` | New transport field, new auth subsection with vault example | | `schemas_test.ts` | 6 new tests: validation, mutual exclusion, sensitive meta | | `hosts_test.ts` | 3 new tests: merge propagation | | `ssh_test.ts` | 7 new tests: exec/script/copy/open materialization, mixed fleet, per-host override, failure cleanup | ## Test plan - [x] 16 new unit + integration tests covering schema validation, host merge, all 4 methods (exec/script/copy/open), mixed fleet, per-host override, and failure-path cleanup - [x] 212/212 existing tests pass (excluding 3 pre-existing `runner_spawn_test.ts` failures unrelated to this change) - [x] `deno check`, `deno lint`, `deno fmt`, `deno install --frozen` all clean - [x] End-to-end verification against a real openssh-server in Docker: exec (`echo hello-from-identityContent` → exit 0), script (`VAL=42; echo value=$VAL` → exit 0), copy (scp + cat round-trip → content matches), temp file cleanup (no `swamp-ssh-key-*` files remain) Closes #522 Co-Authored-By: Blake Irvin <blakeirvin@me.com>

stack72 added 1 commit

2026-06-01 18:13:42 +00:00

feat(ssh): add identityContent for vault-resolved private keys (#522 )

CI / workflows/s3-bootstrap - lockfile up to date (pull_request) Has been skipped

Details

CI / workflows/gcs-bootstrap - lockfile up to date (pull_request) Has been skipped

Details

CI / ssh - check (pull_request) Successful in 59s

Details

CI / Dependency Audit (pull_request) Successful in 4m2s

Details

CI / cve/dirtyfrag - check (pull_request) Has been skipped

Details

CI / cve/dirtyfrag - lint (pull_request) Has been skipped

Details

CI / cve/dirtyfrag - fmt (pull_request) Has been skipped

Details

CI / cve/dirtyfrag - test (pull_request) Has been skipped

Details

CI / ssh - fmt (pull_request) Successful in 1m5s

Details

CI / cve/mini-shai-hulud - fmt (pull_request) Has been skipped

Details

CI / ssh - lint (pull_request) Successful in 1m5s

Details

CI / cve/mini-shai-hulud - check (pull_request) Has been skipped

Details

CI / cve/mini-shai-hulud - lint (pull_request) Has been skipped

Details

CI / cve/mini-shai-hulud - test (pull_request) Has been skipped

Details

CI / cve/dirtyfrag - lockfile up to date (pull_request) Has been skipped

Details

CI / cve/mini-shai-hulud - lockfile up to date (pull_request) Has been skipped

Details

CI / model/hetzner-cloud - check (pull_request) Has been skipped

Details

CI / ssh - test (pull_request) Successful in 1m14s

Details

CI / model/digitalocean - check (pull_request) Has been skipped

Details

CI / gcp models - lockfiles up to date (pull_request) Has been skipped

Details

CI / cloudflare models - lockfiles up to date (pull_request) Has been skipped

Details

CI / cloudflare models - sample check (pull_request) Has been skipped

Details

CI / codegen - check (pull_request) Has been skipped

Details

CI / ssh - lockfile up to date (pull_request) Successful in 1m31s

Details

CI / codegen - fmt (pull_request) Has been skipped

Details

CI / codegen - lint (pull_request) Has been skipped

Details

CI / CI Security Review (pull_request) Has been skipped

Details

CI / Adversarial Code Review (pull_request) Has been skipped

Details

CI / Claude Code Review (pull_request) Failing after 9m13s

Details

CI / Merge Gate (pull_request) Failing after 28s

Details

cbabefc8f8

Add an optional `identityContent` field to `@swamp/ssh` transport auth
that accepts inline PEM private key content (e.g. from `vault.get()`).
When set, the extension writes the PEM body to a temporary file (mode
0600), passes it as `-i <tmpfile>` for the SSH session, and removes the
file in a `finally` block when the method completes.

This enables users who store SSH private keys in a vault (1Password,
HashiCorp Vault, AWS Secrets Manager, etc.) to use `@swamp/ssh` directly
without writing the key to disk externally or pre-loading an ssh-agent —
both of which break the self-contained model principle.

`identityFile` and `identityContent` are mutually exclusive. Setting
neither is fine (SSH agent auth). The field is marked with
`z.meta({ sensitive: true })` so swamp redacts it from logs and reports.

Temp-key materialization is applied to exec, script, copy, and open.
The forward method is excluded — it operates via an already-open
ControlMaster socket and never passes `-i`.

Verified end-to-end against a real openssh-server in Docker: exec,
script, and copy all succeed with a vault-style PEM key, and temp files
are cleaned up after each method returns.

Co-Authored-By: Blake Irvin <blakeirvin@me.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

stack72 commented

2026-06-01 18:29:10 +00:00

Author

Owner

Code Review

Blocking Issues

Temp key files orphaned on partial materializeTempKeys failure (ssh/extensions/models/_lib/operations.ts:310-314)

tempPaths.set(host.name, tmpFile) is called after Deno.writeTextFile. If writeTextFile throws (e.g. disk full), the created temp file is never registered in tempPaths, so cleanupTempKeys will never remove it. If the failure occurs on host N, temp files for hosts 1 through N-1 are already in tempPaths but are also never cleaned up: materializeTempKeys throws before returning, the destructuring assignment in the caller never completes, and the try/finally block that calls cleanupTempKeys is never entered. All orphaned files may contain private key material.

Fix: move tempPaths.set(host.name, tmpFile) to before Deno.writeTextFile, and add a catch block inside materializeTempKeys that calls cleanupTempKeys(tempPaths) then re-throws on failure.

Suggestions

sanitizeResources: false without an explanatory comment (ssh/extensions/models/ssh_test.ts:387,421,452,486,515,578,630)

CLAUDE.md requires a comment alongside sanitizeResources: false. All seven identityContent tests use the flag without explanation. A brief inline comment would satisfy the project rule.

Everything else looks correct:

identityContent marked .meta({ sensitive: true }) so swamp redacts it from logs.
Temp file written with mode: 0o600.
identityContent cleared to undefined in the patched host before reaching argv builders -- no key material in recorded argv.
cleanupTempKeys called in finally in all four methods that materialize keys (exec, script, copy, open). check, close, and forward correctly skip materialization.
Schema-level mutual exclusivity enforcement for identityFile / identityContent in both TransportSchema and TransportOverrideSchema.
No any types in hand-written code; all named exports; no default exports.
Model files under model/gcp/ are auto-generated output (codegen/ changes are present in the same commit).
Test coverage is thorough: materialization, content verification, cleanup on success and failure, mixed-fleet scenario, per-host override, and the open ControlMaster path.

## Code Review ### Blocking Issues 1. **Temp key files orphaned on partial `materializeTempKeys` failure** (`ssh/extensions/models/_lib/operations.ts:310-314`) `tempPaths.set(host.name, tmpFile)` is called **after** `Deno.writeTextFile`. If `writeTextFile` throws (e.g. disk full), the created temp file is never registered in `tempPaths`, so `cleanupTempKeys` will never remove it. If the failure occurs on host N, temp files for hosts 1 through N-1 are already in `tempPaths` but are also never cleaned up: `materializeTempKeys` throws before returning, the destructuring assignment in the caller never completes, and the `try/finally` block that calls `cleanupTempKeys` is never entered. All orphaned files may contain private key material. Fix: move `tempPaths.set(host.name, tmpFile)` to **before** `Deno.writeTextFile`, and add a catch block inside `materializeTempKeys` that calls `cleanupTempKeys(tempPaths)` then re-throws on failure. ### Suggestions 1. **`sanitizeResources: false` without an explanatory comment** (`ssh/extensions/models/ssh_test.ts:387,421,452,486,515,578,630`) CLAUDE.md requires a comment alongside `sanitizeResources: false`. All seven `identityContent` tests use the flag without explanation. A brief inline comment would satisfy the project rule. --- Everything else looks correct: - `identityContent` marked `.meta({ sensitive: true })` so swamp redacts it from logs. - Temp file written with `mode: 0o600`. - `identityContent` cleared to `undefined` in the patched host before reaching argv builders -- no key material in recorded `argv`. - `cleanupTempKeys` called in `finally` in all four methods that materialize keys (exec, script, copy, open). `check`, `close`, and `forward` correctly skip materialization. - Schema-level mutual exclusivity enforcement for `identityFile` / `identityContent` in both `TransportSchema` and `TransportOverrideSchema`. - No `any` types in hand-written code; all named exports; no default exports. - Model files under `model/gcp/` are auto-generated output (codegen/ changes are present in the same commit). - Test coverage is thorough: materialization, content verification, cleanup on success and failure, mixed-fleet scenario, per-host override, and the `open` ControlMaster path.

stack72 added 1 commit

2026-06-01 21:14:33 +00:00

fix(ssh): clean up temp key on write failure, add sanitizeResources comments

CI / cve/mini-shai-hulud - check (pull_request) Has been skipped

Details

CI / ssh - lint (pull_request) Successful in 1m6s

Details

CI / ssh - fmt (pull_request) Successful in 1m8s

Details

CI / ssh - lockfile up to date (pull_request) Successful in 1m6s

Details

CI / cve/mini-shai-hulud - fmt (pull_request) Has been skipped

Details

CI / cve/mini-shai-hulud - lint (pull_request) Has been skipped

Details

CI / ssh - check (pull_request) Successful in 1m12s

Details

CI / cve/mini-shai-hulud - test (pull_request) Has been skipped

Details

CI / Dependency Audit (pull_request) Successful in 4m5s

Details

CI / cve/dirtyfrag - lockfile up to date (pull_request) Has been skipped

Details

CI / cve/mini-shai-hulud - lockfile up to date (pull_request) Has been skipped

Details

CI / ssh - test (pull_request) Successful in 1m18s

Details

CI / model/digitalocean - check (pull_request) Has been skipped

Details

CI / model/hetzner-cloud - check (pull_request) Has been skipped

Details

CI / model/digitalocean - lockfile up to date (pull_request) Has been skipped

Details

CI / model/hetzner-cloud - lockfile up to date (pull_request) Has been skipped

Details

CI / aws models - sample check (pull_request) Has been skipped

Details

CI / gcp models - sample check (pull_request) Has been skipped

Details

CI / gcp models - lockfiles up to date (pull_request) Has been skipped

Details

CI / cloudflare models - sample check (pull_request) Has been skipped

Details

CI / aws models - lockfiles up to date (pull_request) Has been skipped

Details

CI / cloudflare models - lockfiles up to date (pull_request) Has been skipped

Details

CI / codegen - check (pull_request) Has been skipped

Details

CI / codegen - fmt (pull_request) Has been skipped

Details

CI / codegen - lint (pull_request) Has been skipped

Details

CI / Adversarial Code Review (pull_request) Has been skipped

Details

CI / Claude Code Review (pull_request) Failing after 4m7s

Details

CI / Merge Gate (pull_request) Failing after 32s

Details

CI / codegen - lockfile up to date (pull_request) Has been skipped

Details

CI / CI Security Review (pull_request) Has been skipped

Details

0d411b2c61

Address CI review findings:
- Register temp file path before writeTextFile so cleanup covers it if
  the write fails. Clean up all already-created temp files on error.
- Add explanatory comments on sanitizeResources: false in tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

stack72 commented

2026-06-01 21:24:55 +00:00

Author

Owner

Code Review

Blocking Issues

GCP model .ts files changed without corresponding codegen/ changes.
The CHANGED FILES list includes nine model/gcp/compute/extensions/models/*.ts files
(backendservices.ts, forwardingrules.ts, globalforwardingrules.ts, instances.ts,
regionbackendservices.ts, regionzones.ts, rollouts.ts, routers.ts, zones.ts) and
model/gcp/vault/extensions/models/matters_exports.ts, alongside their manifests — but no
codegen/ files appear in the CHANGED FILES list. Per CLAUDE.md, files under model/ are
auto-generated and must never be hand-edited. Two legitimate explanations exist: (a) these are
purely version/upgrade-entry bumps produced by the bump-versions script, in which case the
.ts bodies contain only version and upgrade changes, or (b) these came from a codegen
regeneration run, in which case the missing codegen/ diff is a problem. If the .ts changes
contain anything beyond version/upgrades/manifest entries without a corresponding
codegen/ change, this is a blocking CLAUDE.md violation. Verify that the GCP model .ts
changes are limited to version/upgrade entries (consistent with a bump-versions run) before
merging.

Suggestions

ForwardSpecPattern allows out-of-range port numbers.
schemas.ts:275: const ForwardSpecPattern = /^\d{1,5}:[^:]+:\d{1,5}$/ permits values up to
99999 for both local and remote ports, which exceeds the maximum valid port of 65535. OpenSSH
will reject the value at runtime, so this is not a security issue, but adding an explicit
numeric range check (or tightening the regex to (?:[1-9]\d{0,3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5]))
would surface the error at schema parse time with a clear message instead of failing deep in
the SSH handshake.
Mixed-fleet test assumes a specific request-ordering across parallel hosts.
ssh_test.ts:563-584: the test inspects requests[0] for web-1 and requests[1] for web-2.
Because the mock executor resolves synchronously (Promise.resolve(...)) the ordering is
deterministic in practice, but the assumption is subtle and not documented. A comment or
ordering assertion on requests.map(r => r....) keyed by the host address/argv would make the
intent explicit and prevent accidental breakage if the executor is ever changed to be async.

## Code Review ### Blocking Issues 1. **GCP model `.ts` files changed without corresponding `codegen/` changes.** The CHANGED FILES list includes nine `model/gcp/compute/extensions/models/*.ts` files (`backendservices.ts`, `forwardingrules.ts`, `globalforwardingrules.ts`, `instances.ts`, `regionbackendservices.ts`, `regionzones.ts`, `rollouts.ts`, `routers.ts`, `zones.ts`) and `model/gcp/vault/extensions/models/matters_exports.ts`, alongside their manifests — but no `codegen/` files appear in the CHANGED FILES list. Per CLAUDE.md, files under `model/` are auto-generated and must never be hand-edited. Two legitimate explanations exist: (a) these are purely version/upgrade-entry bumps produced by the `bump-versions` script, in which case the `.ts` bodies contain only version and upgrade changes, or (b) these came from a codegen regeneration run, in which case the missing `codegen/` diff is a problem. If the `.ts` changes contain anything beyond `version`/`upgrades`/manifest entries without a corresponding `codegen/` change, this is a blocking CLAUDE.md violation. **Verify that the GCP model `.ts` changes are limited to version/upgrade entries (consistent with a `bump-versions` run) before merging.** --- ### Suggestions 1. **`ForwardSpecPattern` allows out-of-range port numbers.** `schemas.ts:275`: `const ForwardSpecPattern = /^\d{1,5}:[^:]+:\d{1,5}$/` permits values up to 99999 for both local and remote ports, which exceeds the maximum valid port of 65535. OpenSSH will reject the value at runtime, so this is not a security issue, but adding an explicit numeric range check (or tightening the regex to `(?:[1-9]\d{0,3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5])`) would surface the error at schema parse time with a clear message instead of failing deep in the SSH handshake. 2. **Mixed-fleet test assumes a specific request-ordering across parallel hosts.** `ssh_test.ts:563-584`: the test inspects `requests[0]` for web-1 and `requests[1]` for web-2. Because the mock executor resolves synchronously (`Promise.resolve(...)`) the ordering is deterministic in practice, but the assumption is subtle and not documented. A comment or ordering assertion on `requests.map(r => r....)` keyed by the host address/argv would make the intent explicit and prevent accidental breakage if the executor is ever changed to be async.

stack72 merged commit 69032317bd into main

2026-06-01 21:30:11 +00:00

stack72 deleted branch worktree-522

2026-06-01 21:30:11 +00:00

stack72 referenced this pull request from a commit

2026-06-01 21:30:13 +00:00

feat(ssh): add identityContent for vault-resolved private keys (#522) (#23)