Lambda · Cold Starts · Spike Traffic · Pre-Deploy Simulation · 4,000 words · March 2026

How We Model Lambda Cold-Start Behaviour Under Spike Traffic

By a Senior AWS Solutions Architect · Growth-Stage Technology Company Engineering Blog

There is a particular category of AWS incident that I have started calling the "everything looked fine in testing" failure. It goes like this. You design a serverless API. You configure a Lambda function with sensible defaults, wire it through API Gateway, point it at DynamoDB, and test it in your dev environment with the handful of engineers pinging it throughout the day. Everything looks healthy. Latency is acceptable. Costs are tracking to plan.

Then you run a campaign. Or you land on the front page. Or your sales team does their job too well and signs a new customer who brings three thousand of their users on day one. Your traffic goes from three hundred requests per second to three thousand in the space of a minute. And your Lambda function, which has never had to spin up more than a dozen concurrent instances at once, is now being asked to handle a hundred.

p99 latency during spike: 2,400 ms · concurrent cold starts at peak: 90 · p99 latency at steady state: 80 ms

Customers leave. The Slack channel lights up. You are spending a Saturday explaining to your CTO why the architecture that "passed all our tests" just fell over under a load it should have anticipated. I have been in this situation more than once. The second time was when I stopped treating load testing as a post-deployment activity.

This post is about how I now model Lambda cold-start behaviour under spike traffic before a single resource is provisioned — and specifically how I use pinpole to make that modelling rigorous, reproducible, and tied directly to the deployment that follows.

The Cold Start Problem, Precisely Stated

Before going into tooling and workflow, it is worth being precise about what we are actually modelling. Lambda's execution model does not maintain persistent servers. When an invocation arrives and no warm execution environment exists for that function, Lambda must provision one. That provisioning sequence involves selecting a host, initialising the execution environment, loading the runtime, and executing your initialisation code — the logic in your function's module-level scope that runs once per environment, not once per invocation. The total elapsed time for this sequence is the cold start duration.
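That sequence can be sketched as a simple additive model: a cold invocation pays every provisioning phase, a warm one pays none of them. The phase durations below are illustrative assumptions for a Node.js function, not AWS-published figures:

```python
# Illustrative model: a cold start is the sum of sequential provisioning
# phases; a warm invocation skips all of them and pays only handler time.
# Phase values are assumed for illustration, not AWS-published numbers.
COLD_START_PHASES_MS = {
    "select_host": 10,        # placement of the execution environment
    "init_environment": 80,   # sandbox / microVM creation
    "load_runtime": 120,      # e.g. Node.js 20.x runtime startup
    "run_init_code": 250,     # module-level scope: SDK clients, config, pools
}

def invocation_latency_ms(handler_ms: float, cold: bool) -> float:
    """Total latency: cold starts prepend every provisioning phase."""
    penalty = sum(COLD_START_PHASES_MS.values()) if cold else 0
    return penalty + handler_ms

# Under these assumptions a 50 ms handler costs 510 ms cold vs 50 ms warm.
print(invocation_latency_ms(50, cold=True))   # → 510
print(invocation_latency_ms(50, cold=False))  # → 50
```

The asymmetry is the point: the handler is identical in both cases, yet the cold path is an order of magnitude slower before your code even runs.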

Cold start duration is not a constant. It varies along several dimensions: the runtime you choose, the memory allocation (which scales CPU proportionally), the amount of work in your module-level initialisation code, and the concurrency dynamics of the invocation pattern itself — which is where spikes come in.

⚡ The spike dynamic

Cold starts are not primarily a problem at steady state. The cold start problem is a spike problem. When traffic increases rapidly, Lambda must provision new environments in parallel. Under a genuine traffic spike, you can find yourself with dozens or hundreds of concurrent cold starts happening simultaneously — all spiking p99 latency at the moment when your users' experience is most consequential. Steady-state load testing does not expose this.
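A back-of-the-envelope way to see why spikes are the problem: required concurrency follows Little's law (arrival rate × mean time in system), and every required environment that is not already warm incurs a cold start. A minimal sketch, with assumed example numbers:

```python
import math

def concurrent_cold_starts(new_rps: float, avg_duration_s: float,
                           warm_environments: int) -> int:
    """Environments that must be provisioned cold when traffic steps up.

    Required concurrency follows Little's law: arrivals/sec x mean
    time-in-system. The shortfall against the warm pool is served cold.
    """
    required = math.ceil(new_rps * avg_duration_s)
    return max(0, required - warm_environments)

# Example (assumed figures): steady state of 300 RPS with a 100 ms handler
# keeps ~30 environments warm. A step to 3,000 RPS needs 300 environments,
# so 270 of them are provisioned cold — all at once.
print(concurrent_cold_starts(3000, 0.1, 30))  # → 270
print(concurrent_cold_starts(300, 0.1, 30))   # → 0 at steady state
```

This is why the same function that shows near-zero cold starts all week can generate hundreds of them in the first seconds of a spike.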

Why "Deploy to Discover" Is Not a Testing Strategy

For most of my career, the accepted practice for understanding how a Lambda architecture behaved under spike traffic was to deploy it and generate spike traffic against the live environment. This workflow has real problems, and I do not think the engineering community has been honest enough about them.

pinpole changes that constraint directly. The core value proposition — the reason I now use it as the primary validation tool for any Lambda-heavy architecture — is that it runs traffic simulation against architecture designs before any infrastructure is provisioned.

Building the Simulation Model: Canvas and Configuration

Before running any simulation, I spend time on the canvas getting the architecture right. This is not busywork — the fidelity of the simulation depends directly on the fidelity of the model.

For a typical Lambda API, my starting canvas is: Route 53 → CloudFront → API Gateway → Lambda → DynamoDB. pinpole enforces compatibility and directionality rules in real time as I wire services together — if I attempt an invalid connection, the platform blocks it before it is created.

The Lambda node configuration panel is where most of the cold-start-relevant decisions live:

| Config Parameter | Baseline Value | Cold Start Relevance | Notes |
|---|---|---|---|
| Runtime | Node.js 20.x | High | Directly factors into cold start latency model |
| Memory Allocation | 512 MB | High | More CPU → faster init; non-linear cost relationship |
| Reserved Concurrency | Explicit (not default) | Critical | Defines throttle ceiling; reduces pool for other functions |
| Provisioned Concurrency | 0 (baseline run) | Intentional | Set to zero first to observe the cold start problem honestly |
⚠ Critical: Concurrency and Simulation State

If you change Lambda concurrency settings while a simulation is paused, you must stop the simulation fully and restart it. Concurrency values are applied at simulation initialisation. Resuming a paused run after changing concurrency will not pick up the new values — you will be looking at results that reflect the previous configuration.

The Spike Pattern Simulation: What the Metrics Tell You

With the canvas wired and Lambda configured at baseline, I set up the first spike simulation. pinpole provides four traffic patterns: Constant, Ramp, Spike, and Wave. For cold start modelling, Spike is the right choice and Constant will actively mislead you. Under a Constant pattern at your expected steady-state RPS, Lambda has time to maintain a pool of warm environments and cold starts are infrequent. The metrics look healthy, and you might conclude the architecture is production-ready.
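The difference between the patterns is easiest to see as a per-second RPS series. This is a sketch of the shapes as I use them — pinpole's internal traffic generator is not public, so the exact curves are my assumption:

```python
def traffic_pattern(pattern: str, baseline_rps: int, peak_rps: int,
                    duration_s: int, spike_at_s: int = 30) -> list[int]:
    """Per-second RPS series for three of the four traffic shapes.

    A sketch of Constant vs Ramp vs Spike (Wave omitted for brevity);
    the real generator's curves are an assumption here.
    """
    if pattern == "constant":
        return [baseline_rps] * duration_s
    if pattern == "spike":
        # Steady baseline, then a near-instant step to peak.
        return [baseline_rps if t < spike_at_s else peak_rps
                for t in range(duration_s)]
    if pattern == "ramp":
        # Linear climb from baseline to peak over the full run.
        step = (peak_rps - baseline_rps) / max(duration_s - 1, 1)
        return [round(baseline_rps + step * t) for t in range(duration_s)]
    raise ValueError(f"unknown pattern: {pattern}")

spike = traffic_pattern("spike", 1000, 10000, 60)
print(max(spike) // spike[0])  # → 10, i.e. a 10x step at the spike boundary
```

The Ramp pattern reaches the same peak, but gradually — which gives Lambda time to warm environments incrementally. Only the step shape of Spike forces the parallel provisioning burst this post is about.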

For spike testing a Lambda API, my three-scenario approach is:

Scenario 1 — Baseline (1,000 RPS, Constant)

Constant traffic at expected daily load. Confirms steady-state health. Cold starts are infrequent here — this is your sanity check, not your stress test.

Scenario 2 — Peak (3,000–5,000 RPS, Spike)

Spike at 3–5× baseline. Your expected high-traffic period: a busy Monday morning, a sales campaign, a feature launch. Watch that alert counter.

Scenario 3 — Stress (10,000 RPS, Spike)

Spike at 10× baseline. The scenario where the architecture either holds or it does not. This is where cold starts become production incidents.

What I am watching during the Spike simulation is Lambda's latency metric in the node panel. Under Constant load at 1,000 RPS, Lambda latency might sit at 150–200ms. Under a Spike at 10,000 RPS, I typically observe latency spike sharply in the first several seconds as Lambda provisions new execution environments in parallel, then stabilise as warm instances fill the concurrency pool. The shape and magnitude of that initial spike is the data I am after.
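The shape described above falls out of even a crude warm-pool model. This is my own deliberate simplification — one-second ticks, no environment reaping, no placement effects — not pinpole's simulation engine:

```python
import math

def simulate_cold_starts(rps_series: list[int],
                         avg_duration_s: float = 0.1) -> list[int]:
    """Second-by-second warm-pool model (a deliberate simplification).

    Each second, required concurrency (Little's law) is compared to the
    warm pool; the shortfall is served by cold starts, which then join
    the pool. No environment reaping is modelled.
    """
    warm, cold_per_s = 0, []
    for rps in rps_series:
        required = math.ceil(rps * avg_duration_s)
        cold_per_s.append(max(0, required - warm))
        warm = max(warm, required)  # newly provisioned environments stay warm
    return cold_per_s

constant = simulate_cold_starts([1000] * 10)
spike = simulate_cold_starts([1000] * 5 + [10000] * 5)
print(constant)  # cold starts only in the first second, then none
print(spike)     # a second burst of cold starts at the spike boundary
```

Even this toy model reproduces the signature: a Constant run pays its cold starts once and looks healthy forever after, while the Spike run shows a sharp second burst exactly at the traffic step — the burst that drives the p99 excursion.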

The AI Recommendation Cycle: Closing the Cold Start Gap

After the baseline spike simulation, I request AI recommendations. The pinpole recommendation engine analyses the current architecture and simulation results and returns prioritised, categorised findings. For a Lambda API with no provisioned concurrency running under spike traffic, the recommendations follow a predictable priority order that I have found is also the correct order to address them:

1. Add CloudFront (WARNING)

At high RPS, CloudFront absorbs cacheable requests before they reach API Gateway and Lambda — reducing the effective invocation rate and smoothing the spike. A burst that causes 10,000 Lambda invocations per second at origin may translate to only 2,000–3,000 after cache hits absorb the rest. This does not eliminate cold starts, but it reduces their frequency and the peak concurrency demand that drives them.

2. Enable Provisioned Concurrency (INFO)

The direct cold start mitigation. Pre-initialises a specified number of execution environments, keeping them warm with no cold start delay. My heuristic for the initial value is 20–30% of expected peak concurrency — covering the rapid burst at the start of a spike, while on-demand scaling fills in remaining capacity as the spike sustains. Stop the simulation fully, apply the change, restart, and re-run Spike.

3. Introduce Circuit Breaker Pattern (WARNING)

Once Lambda's own cold start behaviour is addressed, the simulation often surfaces downstream risk. Without a circuit breaker, Lambda will continue invoking degraded downstream services, queuing up invocations and exhausting its concurrency pool waiting for timeouts that may not come. Verify the circuit breaker thresholds match your downstream SLAs explicitly.

4. Implement Asynchronous Processing via SQS (INFO)

For write-path Lambda functions, introducing SQS between API Gateway and Lambda converts the invocation model from push to pull. Lambda controls the consumption rate, naturally smoothing traffic spikes — the burst fills the queue, and Lambda works through messages at a controlled rate. Note: SQS visibility timeout must exceed your Lambda timeout with margin.

5. Configure Lambda Auto Scaling (INFO)

Ensure Lambda's concurrency limits can grow with sustained load. The auto-scaling configuration sets targets for concurrency scaling — typically keeping utilisation within 60–70% of reserved concurrency at expected peak RPS. This provides headroom for unexpected additional spikes without hitting the hard throttle ceiling.
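The provisioned-concurrency heuristic from recommendation 2 is mechanical enough to write down. A minimal sketch of the sizing arithmetic — Little's law for expected peak concurrency, then the 20–30% coverage fraction as the initial pre-warmed value:

```python
import math

def provisioned_concurrency_target(peak_rps: float, avg_duration_s: float,
                                   coverage: float = 0.25) -> int:
    """Initial provisioned-concurrency value per the 20-30% heuristic.

    Expected peak concurrency follows Little's law (RPS x mean duration);
    `coverage` is the fraction pre-warmed, with on-demand scaling filling
    the rest as the spike sustains. The 0.25 default is a starting point
    to refine against re-run simulation results, not a fixed rule.
    """
    peak_concurrency = math.ceil(peak_rps * avg_duration_s)
    return math.ceil(peak_concurrency * coverage)

# Stress scenario: 10,000 RPS at an assumed 100 ms mean duration gives
# ~1,000 peak concurrency, so pre-warm 250 environments at the midpoint.
print(provisioned_concurrency_target(10_000, 0.1))  # → 250
```

After applying a value like this, remember the rule from earlier: stop the simulation fully and restart it, because concurrency changes are only applied at simulation initialisation.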

Execution History: The Version Record of Your Optimisation Journey

Every simulation run in pinpole is saved automatically to the Execution History log. Each entry records the run number and status, timestamp, duration, peak RPS, and the estimated monthly cost of the simulated architecture at that load level.

The Version Workflow Viewer stores the exact architecture snapshot associated with each run. I can select any historical run and inspect the exact canvas state at that point — the services present, the connections wired, and every configuration value on every node. For cold start modelling work, this creates a precise, version-controlled record of the optimisation journey.

When I hand an architecture to a team for implementation — or when a new engineer joins and needs to understand why the architecture is configured the way it is — the simulation history is the evidence. It is a design artefact that carries its own rationale, which is a capability that no draw.io diagram or Lucidchart export has ever provided.

Deploying the Validated Architecture

Once the architecture passes the spike simulation with all WARNING-level recommendations addressed and p99 latency within the target budget, pinpole's deploy-to-cloud workflow takes over.

✓ Security model

The deployment uses a secure STS cross-account IAM workflow. pinpole does not store credentials — the integration is established through a one-time IAM role configuration in the target AWS account, and each deployment uses short-lived STS tokens. This is the right security model; I would not adopt a deployment tool that stored long-lived AWS credentials.

The recommended promotion sequence is Canvas → ST (System Test) → UAT → PR (Production). I do not deploy directly from canvas to production. The ST and UAT stages confirm that the architecture behaves correctly in a real AWS account — with real Lambda cold starts, real DynamoDB latency, real API Gateway throttle enforcement — before production traffic is at risk.

A Note on What Competitors Do and Do Not Offer

The absence of pre-deployment traffic simulation is not an oversight in competing tools — it reflects a genuinely hard engineering problem. Simulating how a Lambda function behaves under spike traffic without deploying it requires a model that accounts for runtime, memory, initialisation code, invocation model, concurrency pool dynamics, and the interaction between provisioned and on-demand concurrency.

| Tool | Pre-Deploy Spike Simulation | Verdict |
|---|---|---|
| Cloudcraft (Datadog) | ✗ | Excellent diagrams, no traffic modelling |
| Brainboard | ✗ | IaC generation with cost-estimate hints only; zero simulation capability |
| System Initiative | ✗ | Validates wiring correctness, not throughput or latency |
| AWS Infrastructure Composer | ✗ | 1,134+ resource types, no performance modelling |
| k6 / Gatling / JMeter | ✗ | Excellent post-deploy load-testing tools, not pre-deploy |
| pinpole | ✓ | The only tool that simulates spike traffic pre-deploy |

The Broader Discipline: Simulation as Engineering Standard

Cold start modelling is the use case that motivated me to adopt pinpole, but it is not the only simulation I run. The workflow generalises across every architectural question that involves load-dependent behaviour. DynamoDB hot partition risk is not visible under steady-state load — it appears under spike traffic when a campaign drives thousands of writes per second to a poorly chosen partition key. API Gateway throttle limits are not hit at daily average load — they are hit at the peak of a content launch.
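The hot partition case is worth making concrete. DynamoDB enforces a per-partition throughput ceiling (on the order of 1,000 write units per second), so aggregate table capacity does not help when a skewed partition key funnels a spike into one partition. A minimal sketch with hypothetical campaign numbers:

```python
# DynamoDB enforces a per-partition write ceiling (~1,000 write units/s);
# aggregate table capacity does not help if one key absorbs the writes.
PARTITION_WRITE_LIMIT = 1000

def throttled_partitions(writes_per_key: dict[str, int]) -> list[str]:
    """Partition keys whose per-second write rate exceeds the ceiling."""
    return [key for key, wps in writes_per_key.items()
            if wps > PARTITION_WRITE_LIMIT]

# Hypothetical campaign: 3,000 writes/s total — well within table capacity —
# but a date-based partition key sends nearly all of them to today's key.
spike_writes = {"2026-03-14": 2800, "2026-03-13": 150, "2026-03-12": 50}
print(throttled_partitions(spike_writes))  # → ['2026-03-14']
```

Under steady daily load the same key design never approaches the ceiling, which is exactly why this failure mode, like cold starts, only appears under simulated spike traffic.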

Every architectural decision that has load-dependent consequences should be validated under simulated spike traffic before deployment, not after. The cost of discovering a cold start problem in simulation is zero. The cost of discovering it in production — incident response time, engineer weekend hours, customer experience degradation — is substantial.

Summary: The Spike Simulation Checklist

For engineers who want to apply this workflow to their own Lambda architectures, the sequence I follow on every new architecture:

1. Model the full request path on the canvas, not just the Lambda function.
2. Configure Lambda explicitly: runtime, memory, reserved concurrency, and provisioned concurrency at zero for the first run.
3. Run a Constant pattern at expected daily load as the steady-state sanity check.
4. Run Spike scenarios at 3–5× and 10× baseline and watch Lambda's latency as environments provision in parallel.
5. Request AI recommendations and apply them in priority order, stopping and restarting the simulation fully after any concurrency change.
6. Re-run the Spike scenarios until p99 latency is within budget and all WARNING-level findings are addressed.
7. Review the Execution History so the record of the optimisation journey travels with the design.
8. Promote through ST and UAT before production.

The Saturday incident that opened this post was the last time I discovered a cold start problem in production.

The workflow described here is why. Run a Spike simulation on an architecture you think is ready to deploy — the first time you watch Lambda latency spike to two seconds, you will understand why this belongs at the beginning of infrastructure delivery, not the end.

Start 14-day free trial →

Senior AWS Solutions Architect at a growth-stage technology company. AWS Solutions Architect — Professional. Focuses on serverless architecture design, infrastructure cost optimisation, and pre-deployment simulation as a standard engineering practice.

Tags: AWS · Lambda · Cold Starts · Serverless · Spike Traffic · Shift-Left · pinpole · FinOps

This post reflects the author's independent experience using pinpole in production architecture work. A 14-day free trial with full feature access — including Spike, Ramp, and Wave traffic patterns, AI recommendations, execution history, and deploy-to-cloud — requires no credit card to start.