"Just have the Lambda call the next Lambda" is an architecture choice you will hear within the first five minutes of almost any whiteboard session. It is also the single most reliable way to build a serverless workflow that is hard to debug, easy to break, and expensive to retry. Step Functions exists to fix this, but it has its own cost cliff at scale.
This post simulates three patterns side by side: Step Functions Standard, Step Functions Express, and Lambda-only orchestration, where each function invokes the next. The right choice depends on volume, latency, and how much you value being able to see what your workflow did three weeks ago.
How each bills
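- Step Functions Standard — $25 per million state transitions. Built for low-volume, long-running, durable workflows. Every state an execution enters counts as a transition, whatever its type (Task, Pass, Choice, and so on), and retried states are counted again.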
- Step Functions Express — billed per request plus duration: $1 per million requests, plus a charge per GB-hour for the memory the execution holds while it runs. Optimised for high-volume, short-lived workflows.
- Lambda-orchestrated — just Lambda billing. You pay invocation + duration on every step. No state-transition fee, but no orchestration features either.
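The three billing models above reduce to simple formulas. The following is a minimal Python sketch, not a pricing calculator: the $25 and $1 per-million figures come from the list above, while the per-GB-second and Lambda per-request rates are assumptions that should be checked against the current AWS pricing page for your region. All function and parameter names are illustrative.

```python
# Rates quoted in this post (USD).
SF_STANDARD_PER_MILLION_TRANSITIONS = 25.00
SF_EXPRESS_PER_MILLION_REQUESTS = 1.00

# Assumed rates, NOT taken from this post; verify before relying on them.
ASSUMED_EXPRESS_PER_GB_SECOND = 0.00001667
ASSUMED_LAMBDA_PER_MILLION_REQUESTS = 0.20
ASSUMED_LAMBDA_PER_GB_SECOND = 0.0000166667


def standard_cost(executions: int, transitions_per_execution: int) -> float:
    """Standard: pay per state transition, independent of duration."""
    transitions = executions * transitions_per_execution
    return transitions * SF_STANDARD_PER_MILLION_TRANSITIONS / 1_000_000


def express_cost(executions: int, duration_s: float, memory_gb: float) -> float:
    """Express: pay per request plus memory-weighted duration."""
    request_fee = executions * SF_EXPRESS_PER_MILLION_REQUESTS / 1_000_000
    duration_fee = executions * duration_s * memory_gb * ASSUMED_EXPRESS_PER_GB_SECOND
    return request_fee + duration_fee


def lambda_cost(invocations: int, duration_s: float, memory_gb: float) -> float:
    """Lambda: pay per invocation plus memory-weighted duration."""
    request_fee = invocations * ASSUMED_LAMBDA_PER_MILLION_REQUESTS / 1_000_000
    duration_fee = invocations * duration_s * memory_gb * ASSUMED_LAMBDA_PER_GB_SECOND
    return request_fee + duration_fee


# Example: 1M executions of a 5-step workflow, 200 ms per step.
# The memory sizes here are hypothetical inputs chosen for illustration.
orchestration_standard = standard_cost(1_000_000, 5)
orchestration_express = express_cost(1_000_000, duration_s=1.0, memory_gb=0.064)
compute = lambda_cost(5 * 1_000_000, duration_s=0.2, memory_gb=0.128)
```

Either Step Functions fee is paid on top of the Lambda compute for the tasks themselves, so a fair comparison adds `compute` to both orchestrated options.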
Cost at three workflow shapes
For a 5-step workflow averaging 200 ms of compute per step:
| Workflow volume | SF Standard | SF Express | Lambda-only |
|---|---|---|---|
| 10K executions/mo (long-running) | $1.25 | $4.20 | $2.10 |
| 1M executions/mo (short) | $125 | $84 | $210 |
| 100M executions/mo (short) | $12,500 | $840 | $21,000 |
| 1B executions/mo (short) | $125,000 | $8,400 | $210,000 |
Standard does not scale economically beyond a few million executions per month. Express handles billions cheaply. Lambda-orchestrated is competitive at low volume and ruinous at high volume, and it lacks the retries, error handling, and execution history that Step Functions provides.
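The Standard column in the table can be checked by hand, because it depends only on the transition count. Here is a short sketch of that arithmetic, assuming exactly one transition per step and no retries. The Express and Lambda-only columns also depend on memory size and billed duration, which the post does not state, so they are not reproduced here.

```python
PRICE_PER_MILLION_TRANSITIONS = 25  # USD, the Standard rate quoted in this post
STEPS_PER_EXECUTION = 5             # workflow shape used in the table


def standard_monthly_cost(executions: int) -> float:
    """Monthly Standard cost, assuming one transition per step and no retries."""
    transitions = executions * STEPS_PER_EXECUTION
    return transitions * PRICE_PER_MILLION_TRANSITIONS / 1_000_000


for volume in (10_000, 1_000_000, 100_000_000, 1_000_000_000):
    print(f"{volume:>13,} executions/mo -> ${standard_monthly_cost(volume):,.2f}")
# Prints $1.25, $125.00, $12,500.00 and $125,000.00 for the four volumes.
```

The cost is strictly linear in executions: there is no volume tier that flattens the curve, which is why the cliff appears so quickly.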
Reliability is where Lambda-only loses
Step Functions (either)
Built-in retries with backoff, error catch, parallel branches, visual trace, durable state for long-running flows. Failures are debuggable.
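To make "retries with backoff" and "error catch" concrete, here is a hedged sketch of a state machine definition in Amazon States Language, written as a Python dict so it can be serialised to JSON. The state names and the Lambda ARN are hypothetical placeholders; the `Retry` and `Catch` fields are the standard ones.

```python
import json

# Hypothetical definition: state names and the ARN are placeholders.
definition = {
    "StartAt": "ChargeCard",
    "States": {
        "ChargeCard": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:charge-card",
            "Retry": [
                {
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 2,   # wait before the first retry
                    "MaxAttempts": 3,       # retries after the initial attempt
                    "BackoffRate": 2.0,     # multiplier applied to each wait
                }
            ],
            # Anything still failing after the retries goes to a failure state.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "End": True,
        },
        "NotifyFailure": {"Type": "Fail", "Error": "ChargeFailed"},
    },
}

retry = definition["States"]["ChargeCard"]["Retry"][0]
# Wait before retry n is IntervalSeconds * BackoffRate ** n, for n = 0, 1, 2.
waits = [
    retry["IntervalSeconds"] * retry["BackoffRate"] ** n
    for n in range(retry["MaxAttempts"])
]
print(waits)  # [2.0, 4.0, 8.0]
print(json.dumps(definition, indent=2))
```

The retry policy is declarative data rather than code, so it shows up in the execution trace without anyone writing logging for it.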
Lambda-only
Each step is a direct Lambda invocation, synchronous or asynchronous. Retries are bespoke. State exists only in the payloads passed between functions, so a partial failure leaves you guessing what completed.
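For contrast, this is roughly what "bespoke retries" means in practice. The sketch below is a simplified, in-memory stand-in: in a real chain each step callable would wrap a Lambda invocation, and the `completed` list would have to be persisted somewhere durable. All names here are hypothetical.

```python
import time
from typing import Callable, List


def call_with_retries(step: Callable[[dict], dict], payload: dict,
                      max_attempts: int = 3, base_delay_s: float = 0.0) -> dict:
    """Hand-rolled retry with exponential backoff around a single step."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step(payload)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            time.sleep(base_delay_s * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


def run_chain(steps: List[Callable[[dict], dict]], payload: dict,
              completed: List[str]) -> dict:
    """Run steps in order, recording each finished step name in `completed`."""
    for step in steps:
        payload = call_with_retries(step, payload)
        completed.append(step.__name__)
    return payload


def validate(p: dict) -> dict:
    return {**p, "valid": True}


def charge(p: dict) -> dict:
    raise RuntimeError("downstream blip")


def ship(p: dict) -> dict:
    return {**p, "shipped": True}


completed: List[str] = []
try:
    run_chain([validate, charge, ship], {"order": 1}, completed)
except RuntimeError:
    pass
print(completed)  # ['validate']
```

Without the `completed` bookkeeping, nothing records that `validate` ran and `ship` did not. That bookkeeping is the part Step Functions provides for free.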
The "async invoke" trap
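By default, an asynchronous Lambda invocation that fails is retried up to two more times without the caller being told. After the last attempt the event is discarded unless you have configured a dead-letter queue or an on-failure destination. Teams discover this after the first incident where a downstream step quietly executed three times.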
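One common defence against repeated delivery is to make the step idempotent. The sketch below simulates the failure mode locally: the handler does its work, fails afterwards, and is then delivered twice more. The `idempotency_key` field and the in-memory set are assumptions for illustration; a real handler would need a durable store with an atomic conditional write.

```python
processed_keys = set()  # stand-in for a durable store (assumption)
charges = []            # the side effect that must happen exactly once


def handler(event, context=None, fail_after_work=False):
    """Skip the side effect if this event's key has already been processed."""
    key = event["idempotency_key"]  # hypothetical field set by the caller
    if key in processed_keys:
        return "duplicate"
    charges.append(event["amount"])
    processed_keys.add(key)
    if fail_after_work:
        # Simulates a timeout or crash after the work was already done.
        raise TimeoutError("simulated failure after the side effect")
    return "processed"


event = {"idempotency_key": "order-42", "amount": 100}
outcomes = []
try:
    outcomes.append(handler(event, fail_after_work=True))  # initial delivery
except TimeoutError:
    outcomes.append("error")
outcomes.append(handler(event))  # first retry
outcomes.append(handler(event))  # second retry

print(outcomes)  # ['error', 'duplicate', 'duplicate']
print(charges)   # [100]
```

Without the key check, the same three deliveries would append to `charges` three times.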
When each one is right
- Step Functions Standard — long-running workflows (minutes to a year), human approval steps, anything where audit trail matters. Order fulfilment, batch ETL, data pipelines.
- Step Functions Express — high-volume short workflows under 5 minutes. API request validation, content transforms, fan-out/fan-in jobs.
- Lambda-only — two-step workflows where the second step is fire-and-forget, or where you genuinely cannot tolerate the ~30 ms Step Functions overhead.
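The guidance in the list above can be written down as a small decision function. This is one possible encoding of the post's rules of thumb, not an official selection algorithm; the parameter names are hypothetical and the 300-second threshold is the 5-minute limit mentioned for Express.

```python
EXPRESS_MAX_DURATION_S = 300  # the "under 5 minutes" limit quoted above


def choose_orchestrator(max_duration_s: float,
                        needs_audit_trail: bool,
                        needs_human_approval: bool,
                        step_count: int,
                        last_step_fire_and_forget: bool,
                        cannot_tolerate_overhead: bool = False) -> str:
    """Map workflow traits to one of the three patterns compared in this post."""
    # Long-running, audited, or human-in-the-loop flows need durable state.
    if (needs_audit_trail or needs_human_approval
            or max_duration_s > EXPRESS_MAX_DURATION_S):
        return "Step Functions Standard"
    # Lambda-only is reserved for the two narrow cases in the list.
    if cannot_tolerate_overhead or (step_count == 2 and last_step_fire_and_forget):
        return "Lambda-only"
    # Everything else that is short and high-volume fits Express.
    return "Step Functions Express"


print(choose_orchestrator(3600, True, False, 8, False))   # Step Functions Standard
print(choose_orchestrator(2, False, False, 5, False))     # Step Functions Express
print(choose_orchestrator(1, False, False, 2, True))      # Lambda-only
```

Checking durability requirements first reflects the ordering of the argument: cost only decides between options that are already safe enough.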
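Ask yourself: "Will I need to ask three weeks from now what this workflow did for execution ID xyz?" If yes, use Step Functions. The execution history alone justifies the orchestration cost in any non-trivial workflow. Keep in mind that Standard retains execution history in the service itself, while Express records it only if you enable logging to CloudWatch Logs.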
Simulating it on pinpole
Drop a Step Functions node on the canvas, configure state transitions per execution and execution rate, choose Standard or Express. Compare to a chain of Lambdas with the same compute profile. The simulator returns total cost, end-to-end latency, and failure-recovery cost (how much retry traffic costs when a downstream blip occurs). The "expensive" option is rarely the one you expect.
Stop wiring Lambdas to Lambdas. Simulate the orchestrator option first.
The cost of one production debugging session usually pays for Step Functions for a year.