Orchestration Serverless Engineering · May 2026 · 8 min read AWS

Step Functions vs orchestrating-in-Lambda: a cost, latency, and reliability simulation

Solutions Architect Workflow-heavy platform May 2026

"Just have the Lambda call the next Lambda" is an architecture choice you will hear within the first five minutes of almost any whiteboard session. It is also the single most reliable way to build a serverless workflow that is hard to debug, easy to break, and expensive to retry. Step Functions exists to fix this, but it has its own cost cliff at scale.

This post simulates three patterns side by side: Step Functions Standard, Step Functions Express, and orchestrating-in-Lambda. The right choice depends on volume, latency, and how much you value being able to see what your workflow did three weeks ago.

How each bills

Cost at three workflow shapes

For a 5-step workflow averaging 200 ms of compute per step, simulated monthly cost in USD:

Workflow volume                     SF Standard   SF Express   Lambda-only
10K executions/mo (long-running)    $1.25         $4.20        $2.10
1M executions/mo (short)            $125          $84          $210
100M executions/mo (short)          $12,500       $840         $21,000
1B executions/mo (short)            $125,000      $8,400       $210,000

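Standard does not scale economically beyond a few million executions per month: its bill grows linearly with state transitions, from $125 at 1M executions to $125,000 at 1B. Express handles billions cheaply. Lambda-orchestrated is competitive at low volume and ruinous at high volume, and it lacks the built-in retries and execution history that Step Functions provides.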
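The Standard column can be reproduced with simple arithmetic, which is a useful sanity check on any simulation. The sketch below assumes a price of $0.025 per 1,000 state transitions and exactly one transition per step; both are assumptions, not figures from this post, so check current pricing for your region. The other two columns are copied from the table as-is, because they depend on memory size and duration settings that are not listed here.

```python
# Monthly cost figures copied from the table above (USD).
TABLE = {
    10_000: {"standard": 1.25, "express": 4.20, "lambda_only": 2.10},
    1_000_000: {"standard": 125.0, "express": 84.0, "lambda_only": 210.0},
    100_000_000: {"standard": 12_500.0, "express": 840.0, "lambda_only": 21_000.0},
    1_000_000_000: {"standard": 125_000.0, "express": 8_400.0, "lambda_only": 210_000.0},
}

# Assumption: Standard bills $0.025 per 1,000 state transitions.
STANDARD_PRICE_PER_TRANSITION = 0.025 / 1_000
# Assumption: a 5-step workflow incurs exactly 5 billable transitions.
TRANSITIONS_PER_EXECUTION = 5


def standard_cost(executions: int) -> float:
    """Monthly Standard cost under the two assumptions above."""
    return executions * TRANSITIONS_PER_EXECUTION * STANDARD_PRICE_PER_TRANSITION


def cheapest(volume: int) -> str:
    """Name of the cheapest pattern in the table at a given volume."""
    row = TABLE[volume]
    return min(row, key=row.get)


for volume, row in TABLE.items():
    print(f"{volume:>13,}  computed Standard ${standard_cost(volume):,.2f}  "
          f"table ${row['standard']:,.2f}  cheapest: {cheapest(volume)}")
```

Under these assumptions the computed Standard figure matches the table at every volume, and the cheapest option flips from Standard to Express somewhere between 10K and 1M executions per month.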

Reliability is where Lambda-only loses

Step Functions (either)

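Built-in retries with backoff, error catch, parallel branches, and a visual trace, so failures are debuggable. Durable state for long-running flows is a Standard feature: Standard executions can run for up to a year, while Express executions are capped at five minutes and record their history in CloudWatch Logs only if logging is enabled.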

Lambda-only

Each step is an HTTP or async invocation. Retries are bespoke. State lives only in the payload passed from one function to the next. A partial failure leaves you guessing what completed.
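To make the contrast concrete, here is a minimal sketch of what "built-in retries with backoff" and "error catch" look like in a state machine definition, written as a Python dict that serializes to Amazon States Language JSON. The state names and the function ARN are hypothetical placeholders. The helper computes the wait before each retry from the retrier's fields and ignores optional settings such as a maximum delay or jitter.

```python
import json

# Hypothetical task state; the ARN and the state names are placeholders.
charge_card_state = {
    "Type": "Task",
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:charge-card",
    "Retry": [
        {
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 2,   # wait before the first retry
            "MaxAttempts": 3,       # number of retries, not counting the first try
            "BackoffRate": 2.0,     # multiplier applied to the wait on each retry
        }
    ],
    # Anything still failing after the retries is routed to a handler state.
    "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
    "Next": "ShipOrder",
}


def retry_delays(retrier: dict) -> list:
    """Seconds waited before each retry: IntervalSeconds * BackoffRate ** n."""
    return [
        retrier["IntervalSeconds"] * retrier["BackoffRate"] ** attempt
        for attempt in range(retrier["MaxAttempts"])
    ]


print(json.dumps(charge_card_state, indent=2))
print(retry_delays(charge_card_state["Retry"][0]))
```

In a Lambda-only chain, every line of that `Retry` and `Catch` configuration becomes hand-written code inside each function.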

The "async invoke" trap

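By default, an asynchronous Lambda invocation that fails is retried twice with no signal to the caller. After the third attempt the event goes to a dead-letter queue or on-failure destination if one is configured, and is discarded otherwise. Teams tend to discover this after the first incident in which a downstream step quietly executed three times.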
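If a step can be delivered more than once, the usual defense is to make it idempotent. The sketch below simulates three deliveries of the same event against a handler that checks an idempotency key before doing any work. The `idempotency_key` field, the handler shape, and the in-memory store are illustrative assumptions; a real system would need a durable store with an atomic conditional write.

```python
def make_handler(store: dict, side_effects: list):
    """Build a handler that performs its side effect at most once per key.

    `store` stands in for a durable table; `side_effects` records real work done.
    """
    def handle(event: dict) -> dict:
        key = event["idempotency_key"]  # hypothetical field set by the caller
        if key in store:
            # Duplicate delivery: return the earlier result, do no new work.
            return store[key]
        side_effects.append(("charge", event["amount"]))
        store[key] = {"status": "charged", "amount": event["amount"]}
        return store[key]

    return handle


store, side_effects = {}, []
handle = make_handler(store, side_effects)

# Simulate the original delivery plus two retries of the same event.
event = {"idempotency_key": "order-42", "amount": 1999}
results = [handle(event) for _ in range(3)]

print(f"deliveries: {len(results)}, side effects: {len(side_effects)}")
```

All three deliveries return the same result, but the charge is recorded only once.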

When each one is right

The decision question

"Will I need to ask three weeks from now what this workflow did for execution ID xyz?" If yes, Step Functions. The execution history alone justifies the per-transition cost in any non-trivial workflow.

Simulating it on pinpole

Drop a Step Functions node on the canvas, configure state transitions per execution and execution rate, choose Standard or Express. Compare to a chain of Lambdas with the same compute profile. The simulator returns total cost, end-to-end latency, and failure-recovery cost (how much retry traffic costs when a downstream blip occurs). The "expensive" option is rarely the one you expect.
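A rough way to reason about failure-recovery cost before running a simulation is to count step invocations after a single downstream blip. The sketch below is not pinpole's model. It assumes an orchestrator retries only the failed step, and that a Lambda-only chain with no persisted state has to be re-triggered from the first step. Both are simplifying assumptions, and the function names are hypothetical.

```python
def runs_with_step_retry(total_steps: int, failed_step: int) -> int:
    """Step invocations when only the failed step is retried once.

    Every step runs once, and the failed step runs one extra time.
    """
    assert 1 <= failed_step <= total_steps
    return total_steps + 1


def runs_with_chain_restart(total_steps: int, failed_step: int) -> int:
    """Step invocations when the whole chain restarts after a failure.

    Steps 1..failed_step ran during the first attempt, then all steps run again.
    """
    assert 1 <= failed_step <= total_steps
    return failed_step + total_steps


TOTAL_STEPS = 5
for failed_step in range(1, TOTAL_STEPS + 1):
    print(
        f"failure at step {failed_step}: "
        f"step retry = {runs_with_step_retry(TOTAL_STEPS, failed_step)} runs, "
        f"chain restart = {runs_with_chain_restart(TOTAL_STEPS, failed_step)} runs"
    )
```

The later the failure occurs in the chain, the wider the gap, which is why retry traffic deserves its own line in the comparison.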

Stop wiring Lambdas to Lambdas. Simulate the orchestrator option first.

The cost of one production debugging session usually pays for Step Functions for a year.
