Debugging Runs
Inspect a pipeline run — journal events, step inputs and outputs, and replay.
When a workflow misbehaves, Triggo gives you one place to figure out why: the run detail view. Every step that ran, every input it received, every byte it returned, every error it threw — all of it is in the journal. This page walks through how to find a run, how to read the journal, what gets redacted, the error classes you'll see most often, and how replay works.
Finding the run
The Runs tab lists every execution in your workspace, newest first. Each card shows the action slug, status badge, source (e.g. webhook, schedule, manual, replay), duration, and timestamp. Click a row to open the run detail.
Three filters sit above the list:
- Action slug — free-text match against the action.
- Status — one of `accepted`, `running`, `waiting_for_approval`, `succeeded`, `failed`, `cancelled`, `timed_out`.
- Source — one of `action`, `webhook`, `schedule`, `manual`, `replay`, `test`, `dry_run`.
Results paginate 20 at a time. There's no search by correlation ID in the UI yet — if you have a correlation ID from an external agent, use the runtime API (`GET /api/v1/runtime/runs`) instead.
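As a sketch of that API lookup: the endpoint path comes from this page, but the `correlationId` query parameter name and the Bearer-token auth header are assumptions about the request shape — check the runtime API reference before relying on them.

```typescript
// Build a runtime-API request to look up runs by correlation ID.
// Assumed: `correlationId` query param and Bearer auth (not confirmed here).
function buildRunLookupRequest(
  baseUrl: string,
  correlationId: string,
  apiToken: string
): { url: string; headers: Record<string, string> } {
  const url = new URL("/api/v1/runtime/runs", baseUrl);
  url.searchParams.set("correlationId", correlationId); // hypothetical param name
  return {
    url: url.toString(),
    headers: { Authorization: `Bearer ${apiToken}` },
  };
}
```

Pass the result to `fetch` (or any HTTP client) to retrieve the matching runs.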
The journal — what each event means
Every step execution writes one or more events to the execution_events table. The run detail renders these in order. There are exactly seven event types. Understand these and you can read any run.
| Event | Meaning | When it fires |
|---|---|---|
| `step_started` | The step has begun executing. Carries the resolved `inputData`. | Written immediately before the connector/code/LLM handler is invoked. |
| `step_completed` | The step finished successfully. Carries `outputData` and `durationMs`. | Written after the handler returns without throwing. |
| `step_failed` | The step threw or returned a failure, and the run will halt. | Written when the handler fails and the step does not have `continueOnFailure` set. |
| `step_failed_continued` | The step failed, but the run keeps going. | Same trigger as `step_failed`, but only when the step's `continueOnFailure` flag is set. See Error Handling. |
| `step_skipped` | The step was not executed because the DAG path leading to it was not taken. | Written when a condition branch evaluates false, or a conditional predecessor routed around this node. |
| `step_timed_out` | The step exceeded the 30 s step timeout. | Written by the step scheduler when the per-step watchdog fires. Error payload has `code: "TIMEOUT"`. |
| `step_pending_approval` | The run is waiting for a human decision. | Written when an approval gate is reached; status transitions to `waiting_for_approval` and execution pauses. |
One nuance worth calling out: for resume/replay, the executor treats step_completed, step_skipped, and step_failed_continued as "this node is done, don't re-run it." That's why a replayed run can pick up after a partial failure without double-firing side effects.
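That "done, don't re-run" rule can be sketched in a few lines. This is an illustrative re-statement of the behavior described above, not the executor's actual code; the event and field names follow this page.

```typescript
type EventType =
  | "step_started"
  | "step_completed"
  | "step_failed"
  | "step_failed_continued"
  | "step_skipped"
  | "step_timed_out"
  | "step_pending_approval";

interface JournalEvent {
  nodeId: string;
  type: EventType;
}

// Events that mark a node as terminal for resume/replay purposes.
const DONE_EVENTS = new Set<EventType>([
  "step_completed",
  "step_skipped",
  "step_failed_continued",
]);

// Collect the node IDs a replayed run would skip over.
function completedNodeIds(journal: JournalEvent[]): Set<string> {
  const done = new Set<string>();
  for (const ev of journal) {
    if (DONE_EVENTS.has(ev.type)) done.add(ev.nodeId);
  }
  return done;
}
```

A node with only `step_started` or `step_failed` events is not in the set, so replay re-executes it — which is exactly why side effects of completed steps don't fire twice.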
Inspecting step input and output
The run detail shows each step's payloads in two panels:
- Input — `inputData` from the `step_started` event. What the step actually saw after field mappings, coercion, and trigger data flowed in.
- Output — `outputData` from `step_completed` (or the `error` payload from `step_failed`/`step_timed_out`).
If a downstream step received something unexpected, this is where you look. Cross-reference the mapping expressions in the step config against what arrived in `inputData`. Undefined references typically point to a typo in a `{{node.field}}` path or a predecessor that wrote a different shape than you assumed.
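To make the failure mode concrete, here is a minimal sketch of how a `{{node.field}}` expression might resolve against upstream outputs. The function name and resolution rules are illustrative, not the engine's actual mapping code.

```typescript
// Resolve a {{node.field}} template against a map of upstream step outputs.
// A typo'd path walks off the object and yields undefined — the same
// symptom you see as an unexpected input in the run detail.
function resolveMapping(
  expr: string,
  outputs: Record<string, unknown>
): unknown {
  const m = /^\{\{\s*([\w.]+)\s*\}\}$/.exec(expr);
  if (!m) return expr; // not a template: pass the literal through
  return m[1].split(".").reduce<unknown>(
    (acc, key) =>
      acc !== null && typeof acc === "object"
        ? (acc as Record<string, unknown>)[key]
        : undefined,
    outputs
  );
}
```

`resolveMapping("{{fetch.user.email}}", outputs)` returns the nested value; misspell `fetch` and you get `undefined` flowing into the downstream step.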
Redaction
All payloads are passed through `sanitize()` before they're persisted. The pass is recursive, depth-capped at 10. It rewrites string values that match any of these patterns:
- OAuth bearer tokens → `Bearer [REDACTED]`
- `api_key=...`, `access_token=...`, `secret_key=...` assignments with 16+ char values → `[REDACTED]`
- Email addresses → `[REDACTED_EMAIL]`
- Russian-format phone numbers (`+7`/`8`) → `[REDACTED_PHONE]`
- Hex or base64-style tokens ≥ 40 chars → `[REDACTED_TOKEN]`
If a field you need to debug shows up redacted, it's because one of those patterns matched. Numbers and booleans pass through untouched — redaction only applies to strings.
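For intuition, the redaction pass can be sketched like this. This is an illustrative re-implementation of the behavior described above — the real `sanitize()` pattern list and regexes may differ.

```typescript
// Illustrative redaction rules, approximating the patterns listed above.
const RULES: Array<[RegExp, string]> = [
  [/Bearer\s+[A-Za-z0-9._-]+/g, "Bearer [REDACTED]"],
  [/\b(api_key|access_token|secret_key)=[^\s&"']{16,}/g, "$1=[REDACTED]"],
  [/[\w.+-]+@[\w-]+\.[\w.]+/g, "[REDACTED_EMAIL]"],
  [/(?:\+7|8)[\s-]?\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{2}[\s-]?\d{2}/g, "[REDACTED_PHONE]"],
  [/\b[A-Za-z0-9+/=]{40,}\b/g, "[REDACTED_TOKEN]"],
];

// Recursive, depth-capped sanitizer: strings are rewritten, everything
// else (numbers, booleans, null) passes through untouched.
function sanitize(value: unknown, depth = 0): unknown {
  if (depth > 10) return value; // depth cap, as in the doc
  if (typeof value === "string") {
    return RULES.reduce((s, [re, sub]) => s.replace(re, sub), value);
  }
  if (Array.isArray(value)) return value.map((v) => sanitize(v, depth + 1));
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) => [
        k,
        sanitize(v, depth + 1),
      ])
    );
  }
  return value;
}
```

Note that the rules apply in order, so a bearer token is collapsed to `Bearer [REDACTED]` before the generic token rule ever sees it.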
Common failure signatures
These are the error code values you'll see most often in a failed run's error payload. All are defined in `CONNECTOR_ERROR_CODES` or set by the executor directly.
AUTH_EXPIRED
The connector's OAuth token, API key, or session is no longer valid. Refresh didn't recover.
What to do: Open the Connections page, reconnect the integration. Then replay the failed run (or let the next scheduled run pick up). AUTH_EXPIRED is not retried automatically — retrying won't fix a dead credential.
RATE_LIMITED
The upstream API returned 429 or otherwise signalled that you're over its limit. If the response included a `Retry-After` header, the executor parses it into `retryAfterMs` on the error payload.
What to do: This one is retryable (it's in `RETRYABLE_ERROR_CODES`). If you set `maxRetries` on the step, the executor will back off and try again. Otherwise, space out your triggers or reduce throughput. If the circuit breaker has tripped for this integration you'll see the calls failing fast — that's by design (see Error Handling).
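The `Retry-After` parsing mentioned above can be sketched as follows. This is a plausible implementation of the HTTP semantics, not necessarily what the executor does; the header allows both delta-seconds and an HTTP-date.

```typescript
// Turn a Retry-After header value into a millisecond delay.
// Returns undefined when the header is unparseable.
function parseRetryAfterMs(
  header: string,
  now: number = Date.now()
): number | undefined {
  const seconds = Number(header);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000); // delta-seconds form
  const date = Date.parse(header); // HTTP-date form
  return Number.isNaN(date) ? undefined : Math.max(0, date - now);
}
```

A value like `"120"` becomes `120000` ms; a past date clamps to `0` so the retry fires immediately.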
VALIDATION_ERROR
A field failed property coercion or the connector rejected the input. Thrown by property-coercer or returned by the connector itself.
What to do: Check the step's field mappings against the connector's declared property types. Missing required fields, wrong types (string where a number was expected), malformed enums. The error message usually names the offending field.
TIMEOUT
Either a step exceeded the 30 s step timeout (the step scheduler sets `code: "TIMEOUT"`) or the whole pipeline exceeded the 300 s global timeout.
What to do: Look at durationMs on surrounding events to confirm which timeout fired. For a single slow step, narrow the work it does or offload to a queue. For a whole-pipeline timeout, the DAG itself is too heavy for one run — split it into smaller workflows linked by webhooks.
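That durationMs check can be sketched as a heuristic. This is a rough aid under the limits stated on this page (30 s per step, 300 s per pipeline), not the executor's actual classification logic.

```typescript
const STEP_TIMEOUT_MS = 30_000;

// Heuristic: if any single step's durationMs is at or past the per-step
// cap, the step watchdog fired; otherwise the run as a whole exhausted
// the 300 s global budget.
function classifyTimeout(stepDurationsMs: number[]): "step" | "pipeline" {
  return stepDurationsMs.some((d) => d >= STEP_TIMEOUT_MS)
    ? "step"
    : "pipeline";
}
```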
UPSTREAM_ERROR
The integration returned a non-specific 5xx or other error that didn't map to the codes above. Often transient.
What to do: Inspect the error message for the remote API's response. If transient, enabling retries (maxRetries: 2–3 with exponential backoff) will usually ride it out. If persistent, it's a real outage on their side — the circuit breaker will trip after five failures in five minutes and protect you from hammering.
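For a feel of what those retries cost in wall-clock time, here is an illustrative exponential backoff schedule. The 1 s base delay and doubling factor are assumptions — the executor's actual base delay and any jitter are not specified on this page.

```typescript
// Delay (ms) before each retry attempt: base, 2*base, 4*base, ...
// Base delay and doubling factor are illustrative assumptions.
function backoffDelays(maxRetries: number, baseMs = 1_000): number[] {
  return Array.from({ length: maxRetries }, (_, attempt) => baseMs * 2 ** attempt);
}
```

With `maxRetries: 3` this yields delays of 1 s, 2 s, and 4 s — enough to ride out a brief upstream blip while staying well inside the step timeout.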
One more you may see: `UNKNOWN` is the fallback when the executor can't classify the error (e.g. an unexpected throw without a `code` field). Treat it as a bug signal — something inside the handler didn't conform to the expected shape.
Replay
The Replay button in the run detail toolbar creates a new run from the failed or completed one. The original run is never modified.
What replay does:
- Clones the original run's `workflow_id`, `workflow_version_id`, and `input`, then inserts a new row with `source: "replay"` and `resume_from_run_id` set to the original.
- Pre-fills a JSON editor with the original `input` — you can edit it before confirming, or replay with the original input as-is.
- Enqueues a fresh BullMQ execution job. The engine re-loads the current pipeline definition at that `workflow_version_id` and uses your current credentials — not a snapshot from when the original ran.
- Resumes from the journal: steps whose node IDs already have `step_completed`, `step_skipped`, or `step_failed_continued` events in the original run are not re-executed. Execution picks up at the failed node.
Only terminal runs can be replayed (`succeeded`, `failed`, `cancelled`, `timed_out`). Replaying a `running` or `waiting_for_approval` run is rejected with a `BAD_REQUEST`.
The Replay button calls the `run.replay` tRPC mutation, which delegates to `executionService.replayRun`.
Related
- Error Handling — retries, continue-on-failure, circuit breaker, auto-pause.
- Limits — step timeout (30 s), pipeline timeout (300 s), per-user run rate limit.
- Agent Integration Overview — running and inspecting runs from external agents over the runtime API.