Designing for traffic spikes: simulating Black Friday before it happens

Every team that has ever survived a Black Friday or a successful product launch has a war story. Every team that hasn't survived one also has a war story. The difference between the two groups isn't talent or budget; it's whether anyone simulated the spike before it arrived. Steady-state load tests catch the wrong class of problem. Spike simulations catch the right one.

What spike traffic actually breaks

Lambda regional concurrency limit. Default 1,000 per account per region. A 5,000-RPS spike on functions with a 200 ms average duration needs about 1,000 concurrent executions (5,000 × 0.2 s), so it hits the wall in seconds.
API Gateway throttling defaults. 10,000 RPS by default, shared across every API in the account and region. Often left untouched. A burst on top of sustained traffic blows through it.
ALB pre-warming. ALBs scale on a slope of minutes, not seconds. A 10× spike that lands in 30 seconds will see 5xxs before scaling catches up.
Downstream cascades. The frontend scales; the auth service doesn't. The auth service starts throttling; clients retry; the retry storm consumes capacity that should be serving real traffic.
Connection pool exhaustion. The Lambda or container fleet scales; each new instance opens a connection to RDS; you blow the connection limit and the database goes hostile.
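The Lambda arithmetic above is Little's Law: requests in flight equal arrival rate times time in the system. A minimal sketch of that check follows. The function name and the 250 ms "slightly slower" case are illustrative assumptions, not figures from a real workload.

```python
import math

DEFAULT_REGIONAL_LIMIT = 1000  # default Lambda concurrency limit cited above


def required_concurrency(rps: float, avg_duration_ms: float) -> int:
    """Little's Law: in-flight requests = arrival rate x time in system."""
    return math.ceil(rps * avg_duration_ms / 1000)


# Baseline traffic is nowhere near the limit.
print(required_concurrency(200, 200))    # 40

# The spike lands exactly on the default limit, with zero headroom.
print(required_concurrency(5000, 200))   # 1000

# Assumed case: durations stretch to 250 ms under load, demand exceeds the limit.
print(required_concurrency(5000, 250))   # 1250
```

The useful part is the last line: a spike that fits on paper stops fitting as soon as average duration creeps up, which is what usually happens under load.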

Four spike patterns worth simulating

Sale-launch spike

0→10× in <30 seconds, sustained 10× for 15 minutes, then taper. Tests instant-scale capability of every layer.

Email-blast wave

Multiple 3× spikes over an hour as email batches deliver. Tests retry storm tolerance and warm-pool stability.

Geographic cascade

Wave moving from one timezone to the next. Tests regional scaling and connection pool churn.

Influencer flash

0→50× in 60 seconds, sustained 5 minutes, gone. Tests circuit breakers and graceful-degradation paths.
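These shapes are easy to write down as per-second RPS targets for whatever load generator or simulator you use. The sketch below is one way to do it, assuming a linear ramp and linear taper. The function name, the 200 RPS baseline, and the taper lengths are illustrative assumptions; the multipliers, ramp times, and hold times come from the patterns above.

```python
def spike_profile(baseline_rps, multiplier, ramp_s, hold_s, taper_s):
    """Return a list of target RPS values, one per second.

    Shape: linear ramp from baseline to peak, flat hold at peak,
    then a linear taper back down to baseline.
    """
    peak = baseline_rps * multiplier
    profile = []
    for t in range(ramp_s):
        profile.append(baseline_rps + (peak - baseline_rps) * t / ramp_s)
    profile.extend([float(peak)] * hold_s)
    for t in range(taper_s):
        profile.append(peak - (peak - baseline_rps) * (t + 1) / taper_s)
    return profile


# Sale-launch spike: 10x within 30 s, held for 15 minutes, assumed 5-minute taper.
sale_launch = spike_profile(200, 10, ramp_s=30, hold_s=900, taper_s=300)

# Influencer flash: 50x within 60 s, held for 5 minutes, assumed 10-second drop-off.
influencer_flash = spike_profile(200, 50, ramp_s=60, hold_s=300, taper_s=10)

print(len(sale_launch), max(sale_launch))            # 1230 2000.0
print(len(influencer_flash), max(influencer_flash))  # 370 10000.0
```

An email-blast wave could be approximated by concatenating several short 3× profiles with gaps between them, and a geographic cascade by running offset copies per region.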

A simulation walkthrough

Take a typical e-commerce checkout: CloudFront → API Gateway → Lambda → DynamoDB + RDS (orders) + Stripe (external). On the pinpole canvas, set the traffic source to Spike at 5,000 RPS peak from a 200 RPS baseline. Run the simulation.

What surfaces:

Lambda concurrency hits 1,000 within 12 seconds. API Gateway starts returning 429s on overflow.
DynamoDB shows throttling on a hot partition. Order write throughput is bottlenecked by a single partition key (likely the customer ID).
RDS connection count spikes as Lambda scales. The simulator surfaces "approaching connection limit at 850/1000."
Stripe call latency grows as the external service throttles. Lambda functions hang waiting for the response, consuming concurrency, accelerating the death spiral.
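The death spiral in the last item can be approximated with a steady-state model: with a fixed concurrency limit, the maximum sustainable throughput is the limit divided by per-request latency, so every extra millisecond spent waiting on a downstream call lowers the ceiling. The sketch below is a simplification under that assumption. The function name and the latency values are illustrative, not output from the simulator.

```python
def throttled_fraction(rps: float, latency_ms: float, concurrency_limit: int) -> float:
    """Steady-state share of requests rejected once concurrency is saturated."""
    max_throughput = concurrency_limit * 1000 / latency_ms  # requests per second
    if rps <= max_throughput:
        return 0.0
    return 1 - max_throughput / rps


# At 200 ms per request, a 1,000 concurrency limit just absorbs 5,000 RPS.
print(throttled_fraction(5000, 200, 1000))  # 0.0

# If waiting on the external call doubles latency to 400 ms (assumed value),
# half of the traffic no longer fits.
print(throttled_fraction(5000, 400, 1000))  # 0.5
```

The model ignores retries, which in practice push the effective RPS up and make the rejected share worse.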

None of this happens in steady-state load tests. All of it happens in production launches.

What you actually fix before the launch

Request a regional Lambda concurrency limit raise (it's free, takes ~24 hours).
Pre-warm Provisioned Concurrency to the floor of the expected spike. Auto-scale on top.
Configure API Gateway throttling explicitly. Don't rely on regional defaults.
Put RDS Proxy in front of the database to pool connections, and move Stripe-style external calls behind a queue with an explicit DLQ.
Add timeouts and circuit breakers on every external call. The default Lambda timeout (3 seconds) is rarely the right number for downstream-aware design.
Pre-warm ALBs by ramping traffic over the 30 minutes before the launch.
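For the circuit-breaker item, a minimal single-threaded sketch is shown here. The class name, thresholds, and injectable clock are illustrative assumptions rather than a specific library's API, and the wrapped call is expected to enforce its own client-side timeout. A production version would also need thread safety and metrics.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                # Fail fast instead of holding concurrency on a sick dependency.
                raise CircuitOpenError("downstream marked unhealthy")
            # Cool-down elapsed: let a trial call through (half-open state).
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Any success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Usage would look like `breaker.call(charge_card, order)`, where `charge_card` is a hypothetical function wrapping the payment request. When the breaker is open, the handler can return a degraded response or enqueue the work instead of waiting on the external service.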

The rule we follow

No major launch is approved until a Spike simulation passes at 1.5× the expected peak. If it fails, the launch slips. The cost of one delayed launch is small. The cost of one failed launch, in lost revenue, refunds, and reputational damage, is the next quarter's roadmap.

Simulating spikes on pinpole

The canvas Spike traffic pattern propagates the spike through every node in the architecture and surfaces failure modes per-service. You see exactly where the architecture breaks, in what order, and at what RPS. The AI recommendations engine flags the fix for each. Run, fix, re-run. By the time the launch arrives, every prevented failure is documentation, not a war story.

Black Friday is a deterministic problem. Simulate it before it simulates you.
