
Lambda throttled at 4x load — we caught it before deploy

PinPole Engineering · April 2026

It's 2am. PagerDuty fires. Your on-call engineer spends three hours tracing a cascading timeout back to a Lambda concurrency limit. The fix takes five minutes. The investigation takes three hours. The incident report takes a day. The trust your team loses in the deployment process takes longer to rebuild.

This isn't a hypothetical. It's the most common production incident pattern we hear from the engineers who sign up for PinPole. The architecture was designed correctly. The code was tested. The limit that broke everything was a configuration value that nobody thought to check under spike traffic.

The scenario: event processing at launch scale

Here's the architecture. It's standard, well-designed, and it will break.

Architecture

API Gateway → Lambda (event-processor) → DynamoDB (events-table) → SQS (async-queue)

Expected traffic: 3,000 RPS steady state. Launch spike: 12,400 RPS.

The Lambda function has a default concurrency limit of 1,000. At the steady-state 3,000 RPS, each invocation takes ~300ms, so you need roughly 3,000 × 0.3 = 900 concurrent executions. That fits. Barely.

At launch spike — 12,400 RPS — you need 3,720 concurrent executions. The limit is 1,000. Lambda throttles. Requests queue. API Gateway times out. Your users see 500 errors.
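The arithmetic behind those numbers is Little's law: required concurrency equals arrival rate times mean duration. A quick sketch you can run yourself:

# Little's law: required concurrency = requests/sec * mean duration (sec).
DEFAULT_LIMIT = 1_000  # AWS default regional concurrency limit

def required_concurrency(rps: float, mean_duration_s: float) -> float:
    return rps * mean_duration_s

for label, rps in [("steady state", 3_000), ("launch spike", 12_400)]:
    needed = required_concurrency(rps, 0.300)
    verdict = "fits" if needed <= DEFAULT_LIMIT else "throttles"
    print(f"{label}: {needed:,.0f} needed vs {DEFAULT_LIMIT:,} limit ({verdict})")

Steady state prints 900 against the 1,000 limit. The spike prints 3,720. Same function, same limit, opposite outcomes.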

What PinPole shows you

Drag the four services onto the canvas. Wire them together. Set the Lambda concurrency to 1,000 (the default). Select Spike traffic pattern. Set peak RPS to 12,400. Hit Simulate.

Here's what the per-node metrics show:

API Gateway: 12,400 req/s ✓
Lambda: 1,000 / 3,720 needed ✕
DynamoDB: 2,340 / 3,000 WCU ⚠
SQS: queue depth 14,230 ✕

The simulation catches three problems in under 30 seconds:

1. Lambda concurrency is the bottleneck. At 12,400 RPS with 300ms execution time, you need 3,720 concurrent executions. You have 1,000. PinPole's recommendation: increase reserved concurrency to 4,000, or enable provisioned concurrency for launch.

2. DynamoDB WCU is close to the limit. At 2,340 out of 3,000 provisioned WCU, you're at 78% capacity during the spike. The recommendation engine flags this as an advisory — it won't fail, but you have no headroom.

3. SQS queue depth is exploding. Because Lambda is throttling, messages are backing up. The queue depth of 14,230 means your async processing is 11 minutes behind. The recommendation: this is a downstream effect of the Lambda bottleneck. Fix concurrency first, re-simulate. (All three checks reduce to simple arithmetic; a sketch follows this list.)
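PinPole's engine models queueing and propagation, but the core of each finding reduces to arithmetic. A minimal back-of-envelope version of the three checks; the 75% advisory threshold and the SQS drain rate are assumptions chosen to reproduce the numbers above, not PinPole internals:

# Back-of-envelope versions of the three findings. The 75% advisory
# threshold and the SQS drain rate are assumptions for this sketch.

def check_lambda(rps: float, duration_s: float, limit: int) -> str:
    needed = rps * duration_s
    return f"Lambda: {needed:,.0f} needed / {limit:,} limit: " + (
        "ok" if needed <= limit else "throttled")

def check_dynamodb(consumed_wcu: int, provisioned_wcu: int) -> str:
    utilization = consumed_wcu / provisioned_wcu
    status = "ok" if utilization < 0.75 else "advisory, low headroom"
    return f"DynamoDB: {utilization:.0%} of provisioned WCU: {status}"

def check_sqs(depth: int, drain_rate_per_s: float) -> str:
    lag_min = depth / drain_rate_per_s / 60
    return f"SQS: depth {depth:,}, roughly {lag_min:.0f} min behind"

print(check_lambda(12_400, 0.300, 1_000))  # throttled
print(check_dynamodb(2_340, 3_000))        # 78%: advisory
print(check_sqs(14_230, 21.5))             # drain rate assumed for ~11 min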

Fix it, re-simulate, confirm

Click the Lambda recommendation. One click applies the fix — concurrency goes to 4,000. PinPole re-queues the simulation automatically.
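Outside the simulator, the same fix is one API call. A sketch with boto3, using the function name from the diagram; note that reserving 4,000 executions first requires raising the account-level concurrency quota (default 1,000) through Service Quotas:

import boto3

lambda_client = boto3.client("lambda")

# Reserve 4,000 concurrent executions for the event processor.
# Prerequisite: raise the regional account concurrency quota first,
# since reserved concurrency can't exceed it (minus 100 kept unreserved).
lambda_client.put_function_concurrency(
    FunctionName="event-processor",
    ReservedConcurrentExecutions=4_000,
)

# For the launch window itself, provisioned concurrency on a version
# or alias ("live" is an assumption here) also removes cold starts:
# lambda_client.put_provisioned_concurrency_config(
#     FunctionName="event-processor",
#     Qualifier="live",
#     ProvisionedConcurrentExecutions=4_000,
# )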

API Gateway: 12,400 req/s ✓
Lambda: 3,720 / 4,000 ✓
DynamoDB: 2,340 / 3,000 WCU ⚠
SQS: queue depth 42 ✓

Lambda passes. SQS queue depth drops from 14,230 to 42. DynamoDB is still at 78%, but that's a business decision, not an incident risk.
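If you decide the headroom is worth buying, that change is just as small. A sketch, assuming the events-table name from the diagram; the read capacity value is an assumption, since the simulation above only reports writes:

import boto3

dynamodb = boto3.client("dynamodb")

# Raise write capacity ahead of launch. update_table requires both
# values; the RCU figure is carried over as an assumption.
dynamodb.update_table(
    TableName="events-table",
    ProvisionedThroughput={
        "ReadCapacityUnits": 3_000,   # assumption: reads not shown above
        "WriteCapacityUnits": 4_000,  # ~58% utilization at the 2,340 spike
    },
)

Switching the table to on-demand mode (BillingMode="PAY_PER_REQUEST") is the other common answer; it removes the ceiling entirely in exchange for per-request pricing.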

Total time: under two minutes. No infrastructure provisioned. No cloud spend. No 2am wake-up call.

The cost of not simulating

Without PinPole, this is the sequence: you deploy. You launch. Traffic spikes. Lambda throttles. API Gateway returns 500s. PagerDuty fires. Your on-call engineer investigates. Three hours later, they find the concurrency limit. They increase it. They wait for the fix to propagate. The launch window is over. The incident report goes to engineering leadership.

The fix is the same five-minute configuration change either way. The difference is when you find it.

The pattern: Every architecture has a limit that will break under spike traffic. The question is whether you find it at design time or at 2am. PinPole finds it at design time.

Try it yourself

Build this exact architecture on the PinPole canvas. Run the spike simulation. See the throttling happen in the simulation, not in production.

Simulate before you deploy

No cloud account required. No credit card. Free tier available.

Start for free →