Reliability Capacity Engineering · May 2026 · 9 min read AWS

Designing for traffic spikes: simulating Black Friday and product launches before they happen

Reliability Engineer E-commerce platform May 2026

Every team that has ever survived a Black Friday or a successful product launch has a war story. Every team that hasn't survived one also has a war story. The difference between the two groups isn't talent or budget. It's whether anyone simulated the spike before it arrived. Steady-state load tests catch the wrong class of problem. Spike tests catch the right one.

What spike traffic actually breaks

Four spike patterns worth simulating

Sale-launch spike

0→10× in <30 seconds, sustained 10× for 15 minutes, then taper. Tests instant-scale capability of every layer.

Email-blast wave

Multiple 3× spikes over an hour as email batches deliver. Tests retry storm tolerance and warm-pool stability.

Geographic cascade

Wave moving from one timezone to the next. Tests regional scaling and connection pool churn.

Influencer flash

0→50× in 60 seconds, sustained 5 minutes, gone. Tests circuit breakers and graceful-degradation paths.
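
The four patterns above can be written down as load profiles before they go anywhere near a simulator. The sketch below is a minimal illustration, not pinpole's format: it expresses each pattern as a multiplier over normal traffic (1.0 = baseline) at a given second. Several details are assumptions, because the descriptions above leave them open: the 10-minute taper on the sale launch, the 30-second drop-off on the influencer flash, four email batches spaced 15 minutes apart, and modelling the geographic cascade as the same profile shifted by one hour per region. The helper names (piecewise, shifted) are hypothetical.

```python
from bisect import bisect_right


def piecewise(points):
    """Build f(t) that linearly interpolates between (seconds, multiplier) points."""
    times = [p[0] for p in points]

    def profile(t):
        # Outside the described window, hold the first/last multiplier.
        if t <= times[0]:
            return points[0][1]
        if t >= times[-1]:
            return points[-1][1]
        i = bisect_right(times, t)
        (t0, m0), (t1, m1) = points[i - 1], points[i]
        return m0 + (m1 - m0) * (t - t0) / (t1 - t0)

    return profile


def shifted(profile, offset_s):
    """Delay a profile by offset_s seconds (one region of a geographic cascade)."""
    return lambda t: profile(t - offset_s)


# Sale launch: 10x within 30 s, held for 15 min, then a taper (10 min assumed).
sale_launch = piecewise([(0, 1.0), (30, 10.0), (930, 10.0), (1530, 1.0)])

# Influencer flash: 50x within 60 s, held for 5 min, then gone (30 s drop assumed).
influencer_flash = piecewise([(0, 1.0), (60, 50.0), (360, 50.0), (390, 1.0)])

# Email blast: 3x spikes over an hour (4 batches, 15 min apart, all assumed).
_email_points = []
for batch in range(4):
    start = batch * 900
    _email_points += [
        (start, 1.0),
        (start + 60, 3.0),
        (start + 240, 3.0),
        (start + 300, 1.0),
    ]
email_blast = piecewise(_email_points)

# Geographic cascade: the same wave arriving one hour later per region (assumed).
geographic_cascade = [shifted(sale_launch, 3600 * region) for region in range(3)]
```

Multiplying a profile by the baseline gives the request rate to drive, for example 200 * sale_launch(t) for a 200 RPS baseline.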

A simulation walkthrough

Take a typical e-commerce checkout: CloudFront → API Gateway → Lambda → DynamoDB + RDS (orders) + Stripe (external). On the pinpole canvas, set the traffic source to Spike at 5,000 RPS peak from a 200 RPS baseline. Run the simulation.

What surfaces:

  1. Lambda concurrency hits 1,000 within 12 seconds. API Gateway starts returning 429s on overflow.
  2. DynamoDB throttles on a hot partition. Order write throughput is bottlenecked by a single partition key (likely the customer ID).
  3. RDS connection count spikes as Lambda scales. The simulator surfaces "approaching connection limit at 850/1000."
  4. Stripe call latency grows as the external service throttles. Lambda functions hang waiting for the response, consuming concurrency, accelerating the death spiral.
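
Findings 1 and 4 are linked by simple arithmetic. By Little's law, the number of requests in flight equals the arrival rate multiplied by the time each request spends in the system. The sketch below works that through for the walkthrough's numbers. The 200 ms and 800 ms durations are illustrative assumptions, not figures from the simulation, and the function names are hypothetical.

```python
def required_concurrency(rps, avg_duration_ms):
    """Little's law: in-flight requests = arrival rate x time in system."""
    return rps * avg_duration_ms / 1000


def max_sustainable_rps(concurrency_limit, avg_duration_ms):
    """Highest request rate a fixed concurrency ceiling can absorb."""
    return concurrency_limit * 1000 / avg_duration_ms


LAMBDA_CONCURRENCY_LIMIT = 1000  # ceiling hit in the walkthrough
PEAK_RPS = 5000                  # spike peak from the walkthrough
HEALTHY_MS = 200                 # assumed duration with a fast Stripe call
DEGRADED_MS = 800                # assumed duration once Stripe slows down

# At the assumed healthy duration, the peak needs exactly the ceiling.
healthy_need = required_concurrency(PEAK_RPS, HEALTHY_MS)

# Once the external call slows, the same ceiling absorbs far less traffic.
degraded_capacity = max_sustainable_rps(LAMBDA_CONCURRENCY_LIMIT, DEGRADED_MS)

# And the same peak would need four times the concurrency.
degraded_need = required_concurrency(PEAK_RPS, DEGRADED_MS)
```

Under these assumptions, a fourfold increase in function duration cuts the sustainable rate from 5,000 RPS to 1,250 RPS without any change in traffic, which is why a slow external dependency turns a concurrency ceiling into a spiral. If each concurrent execution also holds its own database connection (an assumption, with no pooling layer in between), the RDS connection count tracks the same number.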

None of this happens in steady-state load tests. All of it happens in production launches.

What you actually fix before the launch

The rule we follow

No major launch is approved until a Spike simulation passes at 1.5× the expected peak. If it fails, the launch slips. The cost of one delayed launch is small. The cost of one failed launch (lost revenue, refunds, and reputational damage) is the next quarter's roadmap.
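
As a worked example of the rule, the sketch below turns it into a pass/fail check. The function name and its inputs are hypothetical, and the 5,000 RPS expected peak is borrowed from the walkthrough purely for illustration.

```python
def launch_gate(expected_peak_rps, highest_passing_rps, safety_factor=1.5):
    """Return (approved, required_rps) for the 1.5x-of-expected-peak rule."""
    required_rps = expected_peak_rps * safety_factor
    return highest_passing_rps >= required_rps, required_rps


# An expected peak of 5,000 RPS means the simulation must pass at 7,500 RPS.
approved, required = launch_gate(expected_peak_rps=5000, highest_passing_rps=6000)
# Here the architecture only held up to 6,000 RPS, so the launch slips.
```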

Simulating spikes on pinpole

The canvas Spike traffic pattern propagates the spike through every node in the architecture and surfaces failure modes per-service. You see exactly where the architecture breaks, in what order, and at what RPS. The AI recommendations engine flags the fix for each. Run, fix, re-run. By the time the launch arrives, every prevented failure is documentation, not a war story.

Black Friday is a deterministic problem. Simulate it before it simulates you.

