fix(ssh): fail exec/script/copy methods on non-zero exit (#510)

stack72 commented

2026-06-01 13:46:08 +00:00

Owner

Summary

exec, script, and copy now fail the method when any host process exits non-zero, is killed by a signal, or fails to spawn. Previously these methods always reported succeeded even when the underlying ssh/scp process failed — silently allowing downstream workflow steps gated on dependsOn: { condition: succeeded } to proceed past broken operations.
RunResult resources are still written before the error is raised, so per-host diagnostic data (exit code, stderr, timing) is always available regardless of method outcome.
Fail-fast "skipped" hosts are excluded from the failure count — only genuine process failures appear in the aggregate error message.

What changed

File	Change
`ssh/extensions/models/_lib/operations.ts`	Added `throwOnHostFailures()` helper; wired into `runExec`, `runScript`, `runCopy` after resource writes
`ssh/extensions/models/ssh_test.ts`	5 new tests: per-method failure on non-zero exit, resource persistence before throw, fail-fast skip exclusion
`ssh/extensions/models/ssh.ts`	Version bump `2026.05.29.2` → `2026.06.01.1`, upgrades entry
`ssh/manifest.yaml`	Version bump to match
`ssh/README.md`	exec/script/copy method docs now note failure behavior

Why this is correct

The runForward method (operations.ts:744) already throws on non-zero exit — exec, script, and copy were the outliers. This aligns all command-execution methods with the same contract: a method that returns normally = succeeded; a method that throws = failed. Swamp has no partial-success status, so the throw is the only mechanism to propagate failure to workflow dependency conditions.

Impact

Workflows that previously relied on exec/script/copy always succeeding will now see method failures when the underlying process fails. This is the intended fix — the issue reporter's build pipeline silently skipped a file copy and proceeded to configure a VM with missing data because the copy method masked the scp exit code 255.

Verification

Verified end-to-end against a local @swamp/ssh instance:

$ swamp model method run verify-510 exec --input hosts=all --input command=uptime --json
{
  "error": "exec failed on 1/1 host(s): badhost (exit 255: ssh: connect to host 127.0.0.1 port 22: Connection refused)",
  "code": "method_execution_failed"
}

RunResult resource confirmed persisted with exitCode: 255 and full stderr.

Test plan

All 209 existing + new tests pass (deno test)
deno check, deno lint, deno fmt --check clean
deno install --frozen — no lockfile drift
End-to-end verification: method reports failed on non-zero exit
End-to-end verification: RunResult resource persisted before throw
CI passes

Closes #510

Co-authored-by: stateless
Co-authored-by: Claude Opus 4.6 (1M context) noreply@anthropic.com

## Summary - **`exec`, `script`, and `copy` now fail the method when any host process exits non-zero**, is killed by a signal, or fails to spawn. Previously these methods always reported `succeeded` even when the underlying `ssh`/`scp` process failed — silently allowing downstream workflow steps gated on `dependsOn: { condition: succeeded }` to proceed past broken operations. - RunResult resources are **still written before the error is raised**, so per-host diagnostic data (exit code, stderr, timing) is always available regardless of method outcome. - Fail-fast "skipped" hosts are excluded from the failure count — only genuine process failures appear in the aggregate error message. ## What changed | File | Change | |------|--------| | `ssh/extensions/models/_lib/operations.ts` | Added `throwOnHostFailures()` helper; wired into `runExec`, `runScript`, `runCopy` after resource writes | | `ssh/extensions/models/ssh_test.ts` | 5 new tests: per-method failure on non-zero exit, resource persistence before throw, fail-fast skip exclusion | | `ssh/extensions/models/ssh.ts` | Version bump `2026.05.29.2` → `2026.06.01.1`, upgrades entry | | `ssh/manifest.yaml` | Version bump to match | | `ssh/README.md` | exec/script/copy method docs now note failure behavior | ## Why this is correct The `runForward` method (`operations.ts:744`) already throws on non-zero exit — `exec`, `script`, and `copy` were the outliers. This aligns all command-execution methods with the same contract: a method that returns normally = `succeeded`; a method that throws = `failed`. Swamp has no partial-success status, so the throw is the only mechanism to propagate failure to workflow dependency conditions. ## Impact Workflows that previously relied on `exec`/`script`/`copy` always succeeding will now see method failures when the underlying process fails. This is the **intended fix** — the issue reporter's build pipeline silently skipped a file copy and proceeded to configure a VM with missing data because the `copy` method masked the `scp` exit code 255. ## Verification Verified end-to-end against a local `@swamp/ssh` instance: ``` $ swamp model method run verify-510 exec --input hosts=all --input command=uptime --json { "error": "exec failed on 1/1 host(s): badhost (exit 255: ssh: connect to host 127.0.0.1 port 22: Connection refused)", "code": "method_execution_failed" } ``` RunResult resource confirmed persisted with `exitCode: 255` and full stderr. ## Test plan - [x] All 209 existing + new tests pass (`deno test`) - [x] `deno check`, `deno lint`, `deno fmt --check` clean - [x] `deno install --frozen` — no lockfile drift - [x] End-to-end verification: method reports `failed` on non-zero exit - [x] End-to-end verification: RunResult resource persisted before throw - [ ] CI passes Closes #510 Co-authored-by: stateless Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

stack72 added 1 commit

2026-06-01 13:46:08 +00:00

fix(ssh): fail exec/script/copy methods on non-zero exit (#510 )

CI / workflows/gcs-bootstrap - check (pull_request) Has been skipped

Details

CI / workflows/s3-bootstrap - check (pull_request) Has been skipped

Details

CI / workflows/s3-bootstrap - fmt (pull_request) Has been skipped

Details

CI / workflows/gcs-bootstrap - lint (pull_request) Has been skipped

Details

CI / workflows/s3-bootstrap - lint (pull_request) Has been skipped

Details

CI / Dependency Audit (pull_request) Successful in 3m56s

Details

CI / workflows/s3-bootstrap - test (pull_request) Has been skipped

Details

CI / workflows/gcs-bootstrap - fmt (pull_request) Has been skipped

Details

CI / workflows/gcs-bootstrap - test (pull_request) Has been skipped

Details

CI / workflows/gcs-bootstrap - lockfile up to date (pull_request) Has been skipped

Details

CI / workflows/s3-bootstrap - lockfile up to date (pull_request) Has been skipped

Details

CI / cve/dirtyfrag - lint (pull_request) Has been skipped

Details

CI / cve/dirtyfrag - fmt (pull_request) Has been skipped

Details

CI / cve/dirtyfrag - check (pull_request) Has been skipped

Details

CI / cve/mini-shai-hulud - check (pull_request) Has been skipped

Details

CI / cve/mini-shai-hulud - fmt (pull_request) Has been skipped

Details

CI / cve/dirtyfrag - test (pull_request) Has been skipped

Details

CI / cve/mini-shai-hulud - lint (pull_request) Has been skipped

Details

CI / cve/dirtyfrag - lockfile up to date (pull_request) Has been skipped

Details

CI / cve/mini-shai-hulud - test (pull_request) Has been skipped

Details

CI / cve/mini-shai-hulud - lockfile up to date (pull_request) Has been skipped

Details

CI / model/digitalocean - check (pull_request) Has been skipped

Details

CI / ssh - fmt (pull_request) Successful in 51s

Details

CI / ssh - check (pull_request) Successful in 57s

Details

CI / Claude Code Review (pull_request) Successful in 3m41s

Details

CI / Merge Gate (pull_request) Successful in 24s

Details

CI / codegen - lint (pull_request) Has been skipped

Details

CI / codegen - lockfile up to date (pull_request) Has been skipped

Details

CI / CI Security Review (pull_request) Has been skipped

Details

CI / Adversarial Code Review (pull_request) Has been skipped

Details

58dbdf6820

The exec, script, and copy methods in @swamp/ssh silently reported
"succeeded" even when the underlying ssh/scp process exited non-zero.
RunResult resources correctly recorded the exit code and stderr, but the
method status was never checked — so workflows gated on
`dependsOn: { condition: succeeded }` proceeded past failed operations.

Add a shared `throwOnHostFailures` check that runs after all RunResult
resources are written. It inspects results for genuine process failures
(non-zero exit code, signal kill, spawn errors) while excluding fail-fast
"skipped" entries. On failure, it throws an aggregate error naming each
failed host with its exit code and last stderr line.

Verified end-to-end: `swamp model method run ... exec` now returns
`method_execution_failed` when ssh exits 255, and the RunResult resource
is still persisted for diagnostics.

Co-Authored-By: stateless <stateless@users.noreply.swamp.club>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

stack72 commented

2026-06-01 13:56:17 +00:00

Author

Owner

Code Review

Blocking Issues

None.

Suggestions

No test for the signal-kill failure path (operations.ts line 224). throwOnHostFailures handles three failure conditions: non-zero exit, signal kill, and spawn error. Every test executor in ssh_test.ts returns signal null. The signal branch is structurally correct, but a test with a CommandExecutor that resolves with a non-null signal string would give full branch coverage of the new code.
Stderr surfaces in the thrown error message (operations.ts line 233). Including the last line of remote stderr in the method-level error is the right call for operational visibility. No credentials-leak risk: passwords travel via SSHPASS env var, never through stderr. Worth noting as a deliberate design choice rather than an oversight.

## Code Review ### Blocking Issues None. ### Suggestions 1. No test for the signal-kill failure path (operations.ts line 224). throwOnHostFailures handles three failure conditions: non-zero exit, signal kill, and spawn error. Every test executor in ssh_test.ts returns signal null. The signal branch is structurally correct, but a test with a CommandExecutor that resolves with a non-null signal string would give full branch coverage of the new code. 2. Stderr surfaces in the thrown error message (operations.ts line 233). Including the last line of remote stderr in the method-level error is the right call for operational visibility. No credentials-leak risk: passwords travel via SSHPASS env var, never through stderr. Worth noting as a deliberate design choice rather than an oversight.

stack72 merged commit 6e204a6ed4 into main

2026-06-01 13:58:08 +00:00

stack72 deleted branch worktree-510

2026-06-01 13:58:08 +00:00

stack72 referenced this pull request from a commit

2026-06-01 13:58:10 +00:00

fix(ssh): fail exec/script/copy methods on non-zero exit (#510) (#22)