Debugging Runs
Inspect a pipeline run — journal events, step inputs and outputs, and replay.
When a workflow misbehaves, Triggo gives you one place to figure out why: the run detail view. Every step that ran, every input it received, every byte it returned, every error it threw — all of it is in the journal. This page walks through how to find a run, how to read the journal, what gets redacted, the error classes you'll see most often, and how replay works.
Finding the run
The Runs tab lists every execution in your workspace, newest first. Each card shows the action slug, status badge, source (e.g. webhook, schedule, manual, replay), duration, and timestamp. Click a row to open the run detail.
Three filters sit above the list:
- Action slug — free-text match against the action.
- Status — one of `accepted`, `running`, `waiting_for_approval`, `succeeded`, `failed`, `cancelled`, `timed_out`.
- Source — one of `action`, `webhook`, `schedule`, `manual`, `replay`, `test`, `dry_run`.
Results paginate 20 at a time. There's no search by correlation ID in the UI yet — if you have a correlation ID from an external agent, use the runtime API (`GET /api/v1/runtime/runs`) instead.
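As a sketch of that API lookup: the endpoint path comes from this page, but the `correlationId` query parameter name and the Bearer-token auth header are assumptions about the request shape — check the runtime API reference before relying on them.

```typescript
// Build a runtime-API request to look up runs by correlation ID.
// Assumed: `correlationId` query param and Bearer auth (not confirmed here).
function buildRunLookupRequest(
  baseUrl: string,
  correlationId: string,
  apiToken: string
): { url: string; headers: Record<string, string> } {
  const url = new URL("/api/v1/runtime/runs", baseUrl);
  url.searchParams.set("correlationId", correlationId); // hypothetical param name
  return {
    url: url.toString(),
    headers: { Authorization: `Bearer ${apiToken}` },
  };
}
```

Pass the result to `fetch` (or any HTTP client) to retrieve the matching runs.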
The journal — what each event means
Every step execution writes one or more events to the execution_events table. The run detail renders these in order. There are exactly seven event types. Understand these and you can read any run.
| Event | Meaning | When it fires |
|---|---|---|
| `step_started` | The step has begun executing. Carries the resolved `inputData`. | Written immediately before the connector/code/LLM handler is invoked. |
| `step_completed` | The step finished successfully. Carries `outputData` and `durationMs`. | Written after the handler returns without throwing. |
| `step_failed` | The step threw or returned a failure, and the run will halt. | Written when the handler fails and the step does not have `continueOnFailure` set. |
| `step_failed_continued` | The step failed, but the run keeps going. | Same trigger as `step_failed`, but only when the step's `continueOnFailure` flag is set. See Error Handling. |
| `step_skipped` | The step was not executed because the DAG path leading to it was not taken. | Written when a condition branch evaluates false, or a conditional predecessor routed around this node. |
| `step_timed_out` | The step exceeded the 30 s step timeout. | Written by the step scheduler when the per-step watchdog fires. Error payload has `code: "TIMEOUT"`. |
| `step_pending_approval` | The run is waiting for a human decision. | Written when an approval gate is reached; status transitions to `waiting_for_approval` and execution pauses. |
One nuance worth calling out: for resume/replay, the executor treats step_completed, step_skipped, and step_failed_continued as "this node is done, don't re-run it." That's why a replayed run can pick up after a partial failure without double-firing side effects.
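That "done, don't re-run" rule can be sketched in a few lines. This is an illustrative re-statement of the behavior described above, not the executor's actual code; the event and field names follow this page.

```typescript
type EventType =
  | "step_started"
  | "step_completed"
  | "step_failed"
  | "step_failed_continued"
  | "step_skipped"
  | "step_timed_out"
  | "step_pending_approval";

interface JournalEvent {
  nodeId: string;
  type: EventType;
}

// Events that mark a node as terminal for resume/replay purposes.
const DONE_EVENTS = new Set<EventType>([
  "step_completed",
  "step_skipped",
  "step_failed_continued",
]);

// Collect the node IDs a replayed run would skip over.
function completedNodeIds(journal: JournalEvent[]): Set<string> {
  const done = new Set<string>();
  for (const ev of journal) {
    if (DONE_EVENTS.has(ev.type)) done.add(ev.nodeId);
  }
  return done;
}
```

A node with only `step_started` or `step_failed` events is not in the set, so replay re-executes it — which is exactly why side effects of completed steps don't fire twice.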
Inspecting step input and output
The run detail shows each step's payloads in two panels:
- Input — `inputData` from the `step_started` event. What the step actually saw after field mappings, coercion, and trigger data flowed in.
- Output — `outputData` from `step_completed` (or the `error` payload from `step_failed`/`step_timed_out`).
If a downstream step received something unexpected, this is where you look. Cross-reference the mapping expressions in the step config against what arrived in `inputData`. Undefined references typically point to a typo in a `{{node.field}}` path or a predecessor that wrote a different shape than you assumed.
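To make the failure mode concrete, here is a minimal sketch of how a `{{node.field}}` expression might resolve against upstream outputs. The function name and resolution rules are illustrative, not the engine's actual mapping code.

```typescript
// Resolve a {{node.field}} template against a map of upstream step outputs.
// A typo'd path walks off the object and yields undefined — the same
// symptom you see as an unexpected input in the run detail.
function resolveMapping(
  expr: string,
  outputs: Record<string, unknown>
): unknown {
  const m = /^\{\{\s*([\w.]+)\s*\}\}$/.exec(expr);
  if (!m) return expr; // not a template: pass the literal through
  return m[1].split(".").reduce<unknown>(
    (acc, key) =>
      acc !== null && typeof acc === "object"
        ? (acc as Record<string, unknown>)[key]
        : undefined,
    outputs
  );
}
```

`resolveMapping("{{fetch.user.email}}", outputs)` returns the nested value; misspell `fetch` and you get `undefined` flowing into the downstream step.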
Redaction
All payloads are passed through `sanitize()` before they're persisted. The pass is recursive, depth-capped at 10. It rewrites string values that match any of these patterns:
- OAuth bearer tokens → `Bearer [REDACTED]`
- `api_key=...`, `access_token=...`, `secret_key=...` assignments with 16+ char values → `[REDACTED]`
- Email addresses → `[REDACTED_EMAIL]`
- Russian-format phone numbers (`+7`/`8`) → `[REDACTED_PHONE]`
- Hex or base64-style tokens ≥ 40 chars → `[REDACTED_TOKEN]`
If a field you need to debug shows up redacted, it's because one of those patterns matched. Numbers and booleans pass through untouched — redaction only applies to strings.
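For intuition, the redaction pass can be sketched like this. This is an illustrative re-implementation of the behavior described above — the real `sanitize()` pattern list and regexes may differ.

```typescript
// Illustrative redaction rules, approximating the patterns listed above.
const RULES: Array<[RegExp, string]> = [
  [/Bearer\s+[A-Za-z0-9._-]+/g, "Bearer [REDACTED]"],
  [/\b(api_key|access_token|secret_key)=[^\s&"']{16,}/g, "$1=[REDACTED]"],
  [/[\w.+-]+@[\w-]+\.[\w.]+/g, "[REDACTED_EMAIL]"],
  [/(?:\+7|8)[\s-]?\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{2}[\s-]?\d{2}/g, "[REDACTED_PHONE]"],
  [/\b[A-Za-z0-9+/=]{40,}\b/g, "[REDACTED_TOKEN]"],
];

// Recursive, depth-capped sanitizer: strings are rewritten, everything
// else (numbers, booleans, null) passes through untouched.
function sanitize(value: unknown, depth = 0): unknown {
  if (depth > 10) return value; // depth cap, as in the doc
  if (typeof value === "string") {
    return RULES.reduce((s, [re, sub]) => s.replace(re, sub), value);
  }
  if (Array.isArray(value)) return value.map((v) => sanitize(v, depth + 1));
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) => [
        k,
        sanitize(v, depth + 1),
      ])
    );
  }
  return value;
}
```

Note that the rules apply in order, so a bearer token is collapsed to `Bearer [REDACTED]` before the generic token rule ever sees it.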
Common failure signatures
These are the error code values you'll see most often in a failed run's error payload. All are defined in `CONNECTOR_ERROR_CODES` or set by the executor directly.
AUTH_EXPIRED
The connector's OAuth token, API key, or session is no longer valid. Refresh didn't recover.
What to do: Open the Connections page, reconnect the integration. Then replay the failed run (or let the next scheduled run pick up). AUTH_EXPIRED is not retried automatically — retrying won't fix a dead credential.
RATE_LIMITED
The upstream API returned 429 or otherwise signalled that you're over its limit. If the response included a `Retry-After` header, the executor parses it into `retryAfterMs` on the error payload.
What to do: This one is retryable (it's in `RETRYABLE_ERROR_CODES`). If you set `maxRetries` on the step, the executor will back off and try again. Otherwise, space out your triggers or reduce throughput. If the circuit breaker has tripped for this integration you'll see the calls failing fast — that's by design (see Error Handling).
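The `Retry-After` parsing mentioned above can be sketched as follows. This is a plausible implementation of the HTTP semantics, not necessarily what the executor does; the header allows both delta-seconds and an HTTP-date.

```typescript
// Turn a Retry-After header value into a millisecond delay.
// Returns undefined when the header is unparseable.
function parseRetryAfterMs(
  header: string,
  now: number = Date.now()
): number | undefined {
  const seconds = Number(header);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000); // delta-seconds form
  const date = Date.parse(header); // HTTP-date form
  return Number.isNaN(date) ? undefined : Math.max(0, date - now);
}
```

A value like `"120"` becomes `120000` ms; a past date clamps to `0` so the retry fires immediately.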
VALIDATION_ERROR
A field failed property coercion or the connector rejected the input. Thrown by property-coercer or returned by the connector itself.
What to do: Check the step's field mappings against the connector's declared property types. Missing required fields, wrong types (string where a number was expected), malformed enums. The error message usually names the offending field.
TIMEOUT
Either a step exceeded the 30 s step timeout (the step scheduler sets `code: "TIMEOUT"`) or the whole pipeline exceeded the 300 s global timeout.
What to do: Look at durationMs on surrounding events to confirm which timeout fired. For a single slow step, narrow the work it does or offload to a queue. For a whole-pipeline timeout, the DAG itself is too heavy for one run — split it into smaller workflows linked by webhooks.
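That durationMs check can be sketched as a heuristic. This is a rough aid under the limits stated on this page (30 s per step, 300 s per pipeline), not the executor's actual classification logic.

```typescript
const STEP_TIMEOUT_MS = 30_000;

// Heuristic: if any single step's durationMs is at or past the per-step
// cap, the step watchdog fired; otherwise the run as a whole exhausted
// the 300 s global budget.
function classifyTimeout(stepDurationsMs: number[]): "step" | "pipeline" {
  return stepDurationsMs.some((d) => d >= STEP_TIMEOUT_MS)
    ? "step"
    : "pipeline";
}
```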
UPSTREAM_ERROR
The integration returned a non-specific 5xx or other error that didn't map to the codes above. Often transient.
What to do: Inspect the error message for the remote API's response. If transient, enabling retries (maxRetries: 2–3 with exponential backoff) will usually ride it out. If persistent, it's a real outage on their side — the circuit breaker will trip after five failures in five minutes and protect you from hammering.
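For a feel of what those retries cost in wall-clock time, here is an illustrative exponential backoff schedule. The 1 s base delay and doubling factor are assumptions — the executor's actual base delay and any jitter are not specified on this page.

```typescript
// Delay (ms) before each retry attempt: base, 2*base, 4*base, ...
// Base delay and doubling factor are illustrative assumptions.
function backoffDelays(maxRetries: number, baseMs = 1_000): number[] {
  return Array.from({ length: maxRetries }, (_, attempt) => baseMs * 2 ** attempt);
}
```

With `maxRetries: 3` this yields delays of 1 s, 2 s, and 4 s — enough to ride out a brief upstream blip while staying well inside the step timeout.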
One more you may see: `UNKNOWN` is the fallback when the executor can't classify the error (e.g. an unexpected throw without a `code` field). Treat it as a bug signal — something inside the handler didn't conform to the expected shape.
Replay
The Replay button in the run detail toolbar creates a new run from the failed or completed one. The original run is never modified.
What replay does:
- Clones the original run's `workflow_id`, `workflow_version_id`, and `input`, then inserts a new row with `source: "replay"` and `resume_from_run_id` set to the original.
- Pre-fills a JSON editor with the original `input` — you can edit it before confirming, or replay with the original input as-is.
- Enqueues a fresh BullMQ execution job. The engine re-loads the current pipeline definition at that `workflow_version_id` and uses your current credentials — not a snapshot from when the original ran.
- Resumes from the journal: steps whose node IDs already have `step_completed`, `step_skipped`, or `step_failed_continued` events in the original run are not re-executed. Execution picks up at the failed node.
Only terminal runs can be replayed (`succeeded`, `failed`, `cancelled`, `timed_out`). Replaying a `running` or `waiting_for_approval` run is rejected with a `BAD_REQUEST`.
The Replay button calls the `run.replay` tRPC mutation, which delegates to `executionService.replayRun`.
Related
- Error Handling — retries, continue-on-failure, circuit breaker, auto-pause.
- Limits — step timeout (30 s), pipeline timeout (300 s), per-user run rate limit.
- Agent Integration Overview — running and inspecting runs from external agents over the runtime API.