fix(ssh): fail exec/script/copy methods on non-zero exit (#510) #22

Merged
stack72 merged 1 commit from worktree-510 into main 2026-06-01 13:58:08 +00:00
Owner

Summary

  • exec, script, and copy now fail the method when any host process exits non-zero, is killed by a signal, or fails to spawn. Previously these methods always reported succeeded even when the underlying ssh/scp process failed — silently allowing downstream workflow steps gated on dependsOn: { condition: succeeded } to proceed past broken operations.
  • RunResult resources are still written before the error is raised, so per-host diagnostic data (exit code, stderr, timing) is always available regardless of method outcome.
  • Fail-fast "skipped" hosts are excluded from the failure count — only genuine process failures appear in the aggregate error message.

What changed

File Change
ssh/extensions/models/_lib/operations.ts Added throwOnHostFailures() helper; wired into runExec, runScript, runCopy after resource writes
ssh/extensions/models/ssh_test.ts 5 new tests: per-method failure on non-zero exit, resource persistence before throw, fail-fast skip exclusion
ssh/extensions/models/ssh.ts Version bump 2026.05.29.22026.06.01.1, upgrades entry
ssh/manifest.yaml Version bump to match
ssh/README.md exec/script/copy method docs now note failure behavior

Why this is correct

The runForward method (operations.ts:744) already throws on non-zero exit — exec, script, and copy were the outliers. This aligns all command-execution methods with the same contract: a method that returns normally = succeeded; a method that throws = failed. Swamp has no partial-success status, so the throw is the only mechanism to propagate failure to workflow dependency conditions.

Impact

Workflows that previously relied on exec/script/copy always succeeding will now see method failures when the underlying process fails. This is the intended fix — the issue reporter's build pipeline silently skipped a file copy and proceeded to configure a VM with missing data because the copy method masked the scp exit code 255.

Verification

Verified end-to-end against a local @swamp/ssh instance:

$ swamp model method run verify-510 exec --input hosts=all --input command=uptime --json
{
  "error": "exec failed on 1/1 host(s): badhost (exit 255: ssh: connect to host 127.0.0.1 port 22: Connection refused)",
  "code": "method_execution_failed"
}

RunResult resource confirmed persisted with exitCode: 255 and full stderr.

Test plan

  • All 209 existing + new tests pass (deno test)
  • deno check, deno lint, deno fmt --check clean
  • deno install --frozen — no lockfile drift
  • End-to-end verification: method reports failed on non-zero exit
  • End-to-end verification: RunResult resource persisted before throw
  • CI passes

Closes #510

Co-authored-by: stateless
Co-authored-by: Claude Opus 4.6 (1M context) noreply@anthropic.com

## Summary - **`exec`, `script`, and `copy` now fail the method when any host process exits non-zero**, is killed by a signal, or fails to spawn. Previously these methods always reported `succeeded` even when the underlying `ssh`/`scp` process failed — silently allowing downstream workflow steps gated on `dependsOn: { condition: succeeded }` to proceed past broken operations. - RunResult resources are **still written before the error is raised**, so per-host diagnostic data (exit code, stderr, timing) is always available regardless of method outcome. - Fail-fast "skipped" hosts are excluded from the failure count — only genuine process failures appear in the aggregate error message. ## What changed | File | Change | |------|--------| | `ssh/extensions/models/_lib/operations.ts` | Added `throwOnHostFailures()` helper; wired into `runExec`, `runScript`, `runCopy` after resource writes | | `ssh/extensions/models/ssh_test.ts` | 5 new tests: per-method failure on non-zero exit, resource persistence before throw, fail-fast skip exclusion | | `ssh/extensions/models/ssh.ts` | Version bump `2026.05.29.2` → `2026.06.01.1`, upgrades entry | | `ssh/manifest.yaml` | Version bump to match | | `ssh/README.md` | exec/script/copy method docs now note failure behavior | ## Why this is correct The `runForward` method (`operations.ts:744`) already throws on non-zero exit — `exec`, `script`, and `copy` were the outliers. This aligns all command-execution methods with the same contract: a method that returns normally = `succeeded`; a method that throws = `failed`. Swamp has no partial-success status, so the throw is the only mechanism to propagate failure to workflow dependency conditions. ## Impact Workflows that previously relied on `exec`/`script`/`copy` always succeeding will now see method failures when the underlying process fails. This is the **intended fix** — the issue reporter's build pipeline silently skipped a file copy and proceeded to configure a VM with missing data because the `copy` method masked the `scp` exit code 255. ## Verification Verified end-to-end against a local `@swamp/ssh` instance: ``` $ swamp model method run verify-510 exec --input hosts=all --input command=uptime --json { "error": "exec failed on 1/1 host(s): badhost (exit 255: ssh: connect to host 127.0.0.1 port 22: Connection refused)", "code": "method_execution_failed" } ``` RunResult resource confirmed persisted with `exitCode: 255` and full stderr. ## Test plan - [x] All 209 existing + new tests pass (`deno test`) - [x] `deno check`, `deno lint`, `deno fmt --check` clean - [x] `deno install --frozen` — no lockfile drift - [x] End-to-end verification: method reports `failed` on non-zero exit - [x] End-to-end verification: RunResult resource persisted before throw - [ ] CI passes Closes #510 Co-authored-by: stateless Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix(ssh): fail exec/script/copy methods on non-zero exit (#510)
All checks were successful
CI / workflows/gcs-bootstrap - check (pull_request) Has been skipped
CI / workflows/s3-bootstrap - check (pull_request) Has been skipped
CI / workflows/s3-bootstrap - fmt (pull_request) Has been skipped
CI / workflows/gcs-bootstrap - lint (pull_request) Has been skipped
CI / workflows/s3-bootstrap - lint (pull_request) Has been skipped
CI / Dependency Audit (pull_request) Successful in 3m56s
CI / workflows/s3-bootstrap - test (pull_request) Has been skipped
CI / workflows/gcs-bootstrap - fmt (pull_request) Has been skipped
CI / workflows/gcs-bootstrap - test (pull_request) Has been skipped
CI / workflows/gcs-bootstrap - lockfile up to date (pull_request) Has been skipped
CI / workflows/s3-bootstrap - lockfile up to date (pull_request) Has been skipped
CI / cve/dirtyfrag - lint (pull_request) Has been skipped
CI / cve/dirtyfrag - fmt (pull_request) Has been skipped
CI / cve/dirtyfrag - check (pull_request) Has been skipped
CI / cve/mini-shai-hulud - check (pull_request) Has been skipped
CI / cve/mini-shai-hulud - fmt (pull_request) Has been skipped
CI / cve/dirtyfrag - test (pull_request) Has been skipped
CI / cve/mini-shai-hulud - lint (pull_request) Has been skipped
CI / cve/dirtyfrag - lockfile up to date (pull_request) Has been skipped
CI / cve/mini-shai-hulud - test (pull_request) Has been skipped
CI / cve/mini-shai-hulud - lockfile up to date (pull_request) Has been skipped
CI / model/digitalocean - check (pull_request) Has been skipped
CI / ssh - fmt (pull_request) Successful in 51s
CI / ssh - check (pull_request) Successful in 57s
CI / Claude Code Review (pull_request) Successful in 3m41s
CI / Merge Gate (pull_request) Successful in 24s
CI / codegen - lint (pull_request) Has been skipped
CI / codegen - lockfile up to date (pull_request) Has been skipped
CI / CI Security Review (pull_request) Has been skipped
CI / Adversarial Code Review (pull_request) Has been skipped
58dbdf6820
The exec, script, and copy methods in @swamp/ssh silently reported
"succeeded" even when the underlying ssh/scp process exited non-zero.
RunResult resources correctly recorded the exit code and stderr, but the
method status was never checked — so workflows gated on
`dependsOn: { condition: succeeded }` proceeded past failed operations.

Add a shared `throwOnHostFailures` check that runs after all RunResult
resources are written. It inspects results for genuine process failures
(non-zero exit code, signal kill, spawn errors) while excluding fail-fast
"skipped" entries. On failure, it throws an aggregate error naming each
failed host with its exit code and last stderr line.

Verified end-to-end: `swamp model method run ... exec` now returns
`method_execution_failed` when ssh exits 255, and the RunResult resource
is still persisted for diagnostics.

Co-Authored-By: stateless <stateless@users.noreply.swamp.club>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author
Owner

Code Review

Blocking Issues

None.

Suggestions

  1. No test for the signal-kill failure path (operations.ts line 224). throwOnHostFailures handles three failure conditions: non-zero exit, signal kill, and spawn error. Every test executor in ssh_test.ts returns signal null. The signal branch is structurally correct, but a test with a CommandExecutor that resolves with a non-null signal string would give full branch coverage of the new code.

  2. Stderr surfaces in the thrown error message (operations.ts line 233). Including the last line of remote stderr in the method-level error is the right call for operational visibility. No credentials-leak risk: passwords travel via SSHPASS env var, never through stderr. Worth noting as a deliberate design choice rather than an oversight.

## Code Review ### Blocking Issues None. ### Suggestions 1. No test for the signal-kill failure path (operations.ts line 224). throwOnHostFailures handles three failure conditions: non-zero exit, signal kill, and spawn error. Every test executor in ssh_test.ts returns signal null. The signal branch is structurally correct, but a test with a CommandExecutor that resolves with a non-null signal string would give full branch coverage of the new code. 2. Stderr surfaces in the thrown error message (operations.ts line 233). Including the last line of remote stderr in the method-level error is the right call for operational visibility. No credentials-leak risk: passwords travel via SSHPASS env var, never through stderr. Worth noting as a deliberate design choice rather than an oversight.
stack72 deleted branch worktree-510 2026-06-01 13:58:08 +00:00
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
swamp-club/swamp-extensions!22
No description provided.