Step Functions vs Lambda orchestration: a simulation

"Just have the Lambda call the next Lambda" is an architecture choice you will hear within five minutes of any whiteboard session. It is also the single most reliable way to build a serverless workflow that is hard to debug, easy to break, and expensive to retry. Step Functions exists to fix this, but it has its own cost cliff at scale.

This post simulates three patterns side by side: Step Functions Standard, Step Functions Express, and orchestrating-in-Lambda. The right choice depends on volume, latency, and how much you value being able to see what your workflow did three weeks ago.

How each bills

Step Functions Standard — $25 per million state transitions. Built for low-volume, long-running, durable workflows. Every state type counts, not only Task: Pass, Choice, and Wait states are billed as transitions too, and so are retries.
Step Functions Express — billed per request + duration ($1 per million requests + GB-hour). Optimised for high-volume, short-lived workflows.
Lambda-orchestrated — just Lambda billing. You pay invocation + duration on every step. No state-transition fee, but no orchestration features either.
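The three billing schemes above can be sketched as a rough cost model. This is a minimal sketch, not the simulator's model: the function names are hypothetical, the Standard and Express request prices come from the text, and the GB-second and Lambda request prices are assumed list prices that should be checked against the current AWS pricing pages. Orchestration fees are kept separate from the Lambda compute that a Step Functions workflow still pays for.

```python
# Rough monthly-cost model for the three billing schemes.
# Prices marked "assumption" are not from the article; verify before use.

STANDARD_PER_TRANSITION = 25.0 / 1_000_000  # $25 per million transitions (from the text)
EXPRESS_PER_REQUEST = 1.0 / 1_000_000       # $1 per million requests (from the text)
EXPRESS_PER_GB_SECOND = 0.00001667          # assumption: first duration tier
LAMBDA_PER_REQUEST = 0.20 / 1_000_000       # assumption: $0.20 per million invocations
LAMBDA_PER_GB_SECOND = 0.0000166667         # assumption: x86 duration rate


def lambda_compute_cost(executions, steps, step_seconds, memory_gb):
    """Invocation plus duration charges for the functions doing the work."""
    invocations = executions * steps
    gb_seconds = invocations * step_seconds * memory_gb
    return invocations * LAMBDA_PER_REQUEST + gb_seconds * LAMBDA_PER_GB_SECOND


def standard_orchestration_cost(executions, transitions_per_execution):
    """Standard workflows: a flat fee per state transition."""
    return executions * transitions_per_execution * STANDARD_PER_TRANSITION


def express_orchestration_cost(executions, workflow_seconds, workflow_memory_gb):
    """Express workflows: a fee per request plus memory x duration."""
    request_fee = executions * EXPRESS_PER_REQUEST
    duration_fee = (executions * workflow_seconds * workflow_memory_gb
                    * EXPRESS_PER_GB_SECOND)
    return request_fee + duration_fee


# 1M executions of a 5-transition workflow on Standard:
print(f"{standard_orchestration_cost(1_000_000, 5):.2f}")  # 125.00
```

Only the Standard figure is fully determined by the prices stated in the text. The Express and Lambda results depend on memory size and duration, which are inputs here rather than known values.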

Cost at three workflow shapes

For a 5-step workflow with average 200 ms compute per step:

Workflow volume	SF Standard	SF Express	Lambda-only
10K executions/mo (long-running)	$1.25	$4.20	$2.10
1M executions/mo (short)	$125	$84	$210
100M executions/mo (short)	$12,500	$840	$21,000
1B executions/mo (short)	$125,000	$8,400	$210,000

Standard does not scale economically beyond a few million executions per month. Express handles billions cheaply. Lambda-orchestrated is competitive at low volume and ruinous at high — and lacks the safety net.

Reliability is where Lambda-only loses

Step Functions (either)

Built-in retries with backoff, error catch, parallel branches, visual trace, durable state for long-running flows. Failures are debuggable.
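To make "built-in retries with backoff, error catch" concrete, here is a minimal sketch of a state machine definition in Amazon States Language, built as a Python dictionary and serialised to JSON. The state names and the function ARN are placeholders, not taken from any real workflow.

```python
import json

# Hypothetical two-state workflow; names and the ARN are placeholders.
definition = {
    "Comment": "Sketch: a task with retry, backoff and a catch-all handler",
    "StartAt": "ChargeCard",
    "States": {
        "ChargeCard": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:charge-card",
            # Retry up to 3 times, waiting 2 s, 4 s, then 8 s between attempts.
            "Retry": [
                {
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }
            ],
            # Anything still failing after the retries is routed to a handler state.
            "Catch": [
                {"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}
            ],
            "End": True,
        },
        "NotifyFailure": {
            "Type": "Fail",
            "Error": "ChargeFailed",
            "Cause": "Card charge failed after retries",
        },
    },
}

asl_json = json.dumps(definition, indent=2)
print(asl_json)
```

The retry and catch policy is declared as data next to the step it protects, so it shows up in the execution history instead of being buried in handler code.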

Lambda-only

Each step is an HTTP/async invocation. Retries are bespoke. State lives only in the payload passed from one function to the next. A partial failure leaves you guessing which steps completed.
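A sketch of what "bespoke retries" tends to look like in practice. The steps are plain callables standing in for Lambda invocations, and `invoke_with_retry` and `run_chain` are hypothetical helper names. The point is that the backoff logic and the record of progress both live in code you have to write and maintain.

```python
import time


def invoke_with_retry(invoke, payload, max_attempts=3, base_delay=0.1,
                      sleep=time.sleep):
    """Call invoke(payload), retrying with exponential backoff on any exception."""
    for attempt in range(1, max_attempts + 1):
        try:
            return invoke(payload)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: the caller has to decide what happens next
            sleep(base_delay * 2 ** (attempt - 1))


def run_chain(steps, payload, sleep=time.sleep):
    """Run (name, callable) steps in order, feeding each result to the next step."""
    completed = []  # in-memory only: lost if this process dies mid-chain
    for name, invoke in steps:
        payload = invoke_with_retry(invoke, payload, sleep=sleep)
        completed.append(name)
    return payload, completed


if __name__ == "__main__":
    steps = [("validate", lambda p: p + 1), ("enrich", lambda p: p * 2)]
    print(run_chain(steps, 1))
```

If the process running `run_chain` is killed between steps, the `completed` list disappears with it, which is the "guessing which steps completed" problem described above.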

The "async invoke" trap

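Async Lambda invocations are retried twice by default, silently, and the event is then discarded unless a dead-letter queue or on-failure destination is configured. Teams discover this after the first incident where a downstream step quietly executed three times.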
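One common defence against repeated delivery is an idempotency check keyed on a unique event ID. The sketch below simulates it in plain Python: the in-memory set is a stand-in for a durable store, the `id` and `amount` fields are hypothetical, and a real implementation would need the check-and-record step to be atomic.

```python
processed_ids = set()  # stand-in for a durable store such as a database table
charges = []           # the side effect that must not be repeated


def handler(event, context=None):
    """Apply the side effect at most once per event id."""
    event_id = event["id"]  # hypothetical field carrying a unique key
    if event_id in processed_ids:
        return "duplicate-skipped"
    charges.append(event["amount"])
    processed_ids.add(event_id)
    return "processed"


# Simulate one delivery plus two retries of the same event.
event = {"id": "order-123", "amount": 50}
results = [handler(event) for _ in range(3)]
print(results)  # ['processed', 'duplicate-skipped', 'duplicate-skipped']
print(charges)  # [50]
```

Without the guard, the same three deliveries would append three charges.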

When each one is right

Step Functions Standard — long-running workflows (minutes to a year), human approval steps, anything where audit trail matters. Order fulfilment, batch ETL, data pipelines.
Step Functions Express — high-volume short workflows under 5 minutes. API request validation, content transforms, fan-out/fan-in jobs.
Lambda-only — two-step workflows where the second step is fire-and-forget, or where you genuinely cannot tolerate the ~30 ms Step Functions overhead.
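The guidance above can be written down as a small decision function. This is only a sketch of the rules as stated in this post; the function name, parameters, and the order in which the rules are checked are assumptions.

```python
EXPRESS_MAX_SECONDS = 5 * 60  # Express duration limit cited in the text


def choose_pattern(max_duration_s, needs_audit_trail=False,
                   has_human_approval=False, steps=3,
                   fire_and_forget=False, overhead_intolerant=False):
    """Return the orchestration pattern suggested by the rules in this post."""
    # Durable, auditable, or long-running work goes to Standard.
    if needs_audit_trail or has_human_approval or max_duration_s > EXPRESS_MAX_SECONDS:
        return "Step Functions Standard"
    # Lambda-only is reserved for the two narrow cases named above.
    if overhead_intolerant or (steps == 2 and fire_and_forget):
        return "Lambda-only"
    # Everything else that is short-lived fits Express.
    return "Step Functions Express"


print(choose_pattern(3600, has_human_approval=True))      # Step Functions Standard
print(choose_pattern(30))                                 # Step Functions Express
print(choose_pattern(2, steps=2, fire_and_forget=True))   # Lambda-only
```

Audit and duration requirements are checked first because they are hard constraints; cost and latency preferences only matter once those are satisfied.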

The decision question

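"Will I need to ask three weeks from now what this workflow did for execution ID xyz?" If yes, Step Functions. The execution history alone justifies the per-transition cost in any non-trivial workflow. Only Standard keeps that history in the service itself (for 90 days after an execution ends); Express executions are recorded only if you enable CloudWatch Logs.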

Simulating it on pinpole

Drop a Step Functions node on the canvas, configure state transitions per execution and execution rate, choose Standard or Express. Compare to a chain of Lambdas with the same compute profile. The simulator returns total cost, end-to-end latency, and failure-recovery cost (how much retry traffic costs when a downstream blip occurs). The "expensive" option is rarely the one you expect.
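As a back-of-the-envelope illustration of failure-recovery cost, the sketch below counts the extra Lambda invocations a downstream blip generates. It is not pinpole's model. It assumes an orchestrator retries only the failed state, while a Lambda chain with no checkpoint restarts from step 1. All names and numbers are hypothetical.

```python
def extra_invocations_orchestrated(failures, retries_per_failure):
    """An orchestrator retries only the state that failed."""
    return failures * retries_per_failure


def extra_invocations_chain(failures, retries_per_failure, failed_step):
    """A chain with no checkpoint re-runs steps 1..failed_step on every retry."""
    return failures * retries_per_failure * failed_step


def recovery_cost(extra_invocations, cost_per_invocation):
    """Convert extra invocations into dollars at a given per-invocation cost."""
    return extra_invocations * cost_per_invocation


# Hypothetical blip: 10,000 executions fail at step 4 of 5; each needs 2 retries.
orchestrated = extra_invocations_orchestrated(10_000, 2)       # 20,000
chained = extra_invocations_chain(10_000, 2, failed_step=4)    # 80,000
print(orchestrated, chained)
```

On Standard workflows each retry also adds a billed state transition, so the orchestrated figure is not free, but it grows with the number of retries, not with the position of the failing step.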

Stop wiring Lambdas to Lambdas. Simulate the orchestrator option first.

The cost of one production debugging session usually pays for Step Functions for a year.

