Engineering Blog · March 2026 · 14 min read

AWS DynamoDB vs RDS: a simulation-based cost and performance comparison at 10K, 100K, and 1M RPS

Senior AWS Solutions Architect · March 2026 · 10 simulation runs · 3 scale tiers · 5 database configs

I have made this mistake exactly once. About three years into my AWS career, I inherited a Lambda-based API with DynamoDB on the backend and was tasked with migrating it to Aurora PostgreSQL because the data model had grown relational and the team wanted proper foreign key constraints. The migration went smoothly in UAT. We promoted to production on a Tuesday night. By Thursday morning, Lambda concurrency was exhausted, Aurora was throwing connection pool errors, and I was sitting in a war room with the CTO trying to explain why a database migration — not a code change — had caused a full API outage at 80K RPS.

I had never tested what the connection behaviour would actually look like at production load. I had assumed it would be fine. It was not fine.

You cannot assume your way through database selection at scale. The choice between DynamoDB and RDS has enormous cost and performance implications that diverge dramatically depending on your traffic profile — and those implications are nearly impossible to reason about accurately until you have simulation data in front of you.

This post documents a structured comparison I ran using pinpole's pre-deployment traffic simulation. The methodology is deliberately practical: build the architecture on a canvas, configure the services to production-realistic specs, run simulations at 10K, 100K, and 1M RPS using different traffic patterns, read the per-node metrics and estimated costs, apply the AI recommendations, and iterate. No deployed infrastructure. No AWS bill until the design is validated. No Thursday morning war rooms.

The architecture under test

The test topology is a representative serverless API stack — the pattern most common at growth-stage companies — running in a single AWS region with multi-AZ availability. The critical path on the pinpole canvas is:

Route 53 → CloudFront → API Gateway → Lambda → [ DynamoDB | RDS/Aurora ]

Supporting services include WAF (in front of CloudFront), SQS for the write decoupling path, and ElastiCache for the RDS comparison scenario. I ran two separate canvases — one for each database — and a third for Aurora Serverless v2 as a middle-ground comparison.

For the RDS scenarios, RDS Proxy is always present. If you're running Lambda against RDS at any significant scale without a proxy managing the connection pool, you will exhaust database connections under burst load. This is, in fact, essentially the architecture bug that caused my Tuesday night migration disaster. Lambda creates a new execution environment for each concurrent invocation — without a proxy, it creates a new database connection for each one.
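The failure mode above is just Little's law. A back-of-envelope sketch makes it concrete (illustrative numbers, not figures from the incident):

```python
import math

def peak_connections(rps: float, avg_duration_s: float) -> int:
    """Little's law: concurrent executions ~= arrival rate x average duration.
    Without RDS Proxy, each concurrent Lambda execution environment tends to
    hold its own database connection."""
    return math.ceil(rps * avg_duration_s)

# 80K RPS with a 50ms average handler duration:
direct = peak_connections(80_000, 0.05)  # 4,000 simultaneous DB connections
# RDS caps Postgres max_connections at a few thousand for mid-size instance
# classes, so slower handlers or a burst overshoot exhaust the pool quickly.
# RDS Proxy multiplexes those client connections onto a small pinned pool:
proxy_pool_size = 200  # illustrative proxy-to-database pool size
print(direct, proxy_pool_size)
```

Note how sensitive the result is to handler duration: a downstream slowdown that stretches the average from 50ms to 500ms multiplies the connection demand by ten, which is exactly when a proxy-less configuration falls over.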

All configurations were set to production-realistic values: Lambda at 1769MB memory (1 vCPU equivalent), 30-second timeout; API Gateway with a 10K RPS burst limit per region; DynamoDB in both on-demand and provisioned capacity modes; RDS PostgreSQL on db.r6g instances scaled to workload; Aurora Serverless v2 with ACU limits appropriate to each tier.

Test methodology

pinpole's simulation engine propagates synthetic load through the architecture and reports real-time metrics per node: current RPS, latency (p50 and p99), health status, utilisation percentage, and a live monthly cost estimate. I ran four traffic patterns at each RPS tier, including sustained-baseline, spike, and wave profiles.

Each simulation run was saved to pinpole's Execution History. The version comparison view was particularly useful when iterating on provisioned DynamoDB capacity settings — pulling up two simulation runs and seeing the cost and latency delta directly saved significant time. I requested AI recommendations after each major configuration change.
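For readers who want to reproduce the shape of these runs outside pinpole, here is a toy generator for the sustained, spike, and wave profiles. The formulas are my own approximation, not the engine's:

```python
import math

def traffic_rps(pattern: str, t_seconds: int, base_rps: float) -> float:
    """Toy load shapes. 'sustained' holds the base RPS, 'wave' follows a
    diurnal sine around the base, and 'spike' applies the 10x instantaneous
    burst used in the 10K -> 100K test for a 60-second window."""
    if pattern == "sustained":
        return base_rps
    if pattern == "wave":
        # one full cycle per 24h, swinging +/-50% around the base
        return base_rps * (1 + 0.5 * math.sin(2 * math.pi * t_seconds / 86_400))
    if pattern == "spike":
        # 10x burst starting at t=3600 and lasting 60 seconds
        return base_rps * (10 if 3_600 <= t_seconds < 3_660 else 1)
    raise ValueError(f"unknown pattern: {pattern}")

print(traffic_rps("spike", 3_630, 10_000))  # -> 100000 at the burst peak
```

Feeding a shape like this into any load generator (k6, Gatling) post-deploy is the natural way to validate that the simulation's projections held.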

Results: 10K RPS

Ten thousand requests per second is solidly "well-funded startup" territory: significant enough to require proper production engineering, but within range of multiple database options without heroic infrastructure.

Performance

| Configuration | p50 Latency | p99 Latency | Throttle Events |
| --- | --- | --- | --- |
| DynamoDB (on-demand) | 2ms | 7ms | 0 |
| DynamoDB (provisioned, 9K RCU / 1K WCU) | 2ms | 7ms | 0 |
| RDS PostgreSQL db.r6g.2xlarge + Proxy | 3ms | 11ms | 0 |
| Aurora MySQL db.r6g.2xlarge + Proxy | 4ms | 13ms | 0 |
| Aurora Serverless v2 (0.5–16 ACU) | 4ms | 15ms | 0 |

At 10K RPS, all configurations are healthy. DynamoDB's single-digit millisecond consistency advantage is real but modest — a 4ms p99 difference is unlikely to drive a database selection decision on its own at this scale.

The spike pattern is where differences begin to surface. Under a 10× burst (10K to 100K instantaneous), DynamoDB on-demand absorbs the spike without configuration changes. Provisioned DynamoDB triggers auto-scaling, which takes 3–7 minutes to respond fully; during that window, pinpole flagged elevated p99 latency of up to 28ms and issued a WARNING recommending more aggressive scale-out settings. Aurora Serverless v2 handled the spike gracefully. RDS PostgreSQL on a fixed instance showed connection pool pressure at the spike peak: the proxy absorbed the burst, but p99 climbed to 38ms for about 90 seconds.

First signal

If your traffic is spiky and unpredictable, the managed scaling story for DynamoDB on-demand and Aurora Serverless v2 is genuinely better than fixed-instance RDS at this scale tier. The simulation makes this visible in minutes.
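A back-of-envelope model shows why the scale-out lag matters. The figures are the ones from the spike test above; DynamoDB burst capacity would absorb some of this in practice:

```python
def excess_requests_during_lag(provisioned_rps: int, spike_rps: int,
                               lag_seconds: int) -> int:
    """Requests arriving above provisioned capacity while auto-scaling reacts.
    Treat this as a worst-case ceiling: burst capacity and adaptive capacity
    absorb part of it in practice, which is why the simulation showed elevated
    p99 latency rather than mass throttling."""
    return max(0, spike_rps - provisioned_rps) * lag_seconds

# 10K provisioned, 100K spike, 5 minutes of scale-out lag (midpoint of 3-7 min):
print(excess_requests_during_lag(10_000, 100_000, 300))  # -> 27000000
```

Twenty-seven million requests arriving above capacity during one scale-out window is the headroom an on-demand table gives you for free, and the risk a provisioned table has to engineer around.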

Cost — estimated monthly at 10K RPS sustained

| Configuration | Monthly Estimate | Key Cost Drivers |
| --- | --- | --- |
| DynamoDB (on-demand) | ~$8,400 | $0.25/M reads, $1.25/M writes; per-request pricing is expensive at sustained load |
| DynamoDB (provisioned) | ~$1,150 | Fixed RCU/WCU capacity; highly efficient at predictable sustained load |
| RDS PostgreSQL db.r6g.2xlarge + Proxy + Storage | ~$870 | Instance hours + proxy vCPU + 500GB gp3 storage |
| Aurora MySQL db.r6g.2xlarge + Proxy + Storage | ~$980 | Aurora per-I/O charges add to base instance cost |
| Aurora Serverless v2 (avg 4 ACU) | ~$720 | ACU-hour pricing at moderate utilisation; storage separate |

⚠ Biggest surprise at 10K RPS

DynamoDB on-demand is nearly 10× more expensive than a well-configured RDS instance for sustained, predictable traffic. DynamoDB's reputation as the "serverless database" leads engineers to assume it is cheap at modest scales. For a product with a consistent diurnal load pattern — which describes most B2B SaaS — provisioned DynamoDB or Aurora Serverless v2 delivers dramatically better cost efficiency.
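For intuition, the on-demand figure can be approximated directly from the published per-million request prices. The read/write mix below is an assumption for illustration; the simulation's ~$8,400 estimate implies a slightly read-heavier mix:

```python
def on_demand_monthly_cost(rps: float, read_fraction: float,
                           read_price_per_m: float = 0.25,
                           write_price_per_m: float = 1.25) -> float:
    """Rough DynamoDB on-demand request cost: assumes one read/write request
    unit per API request and ignores storage, backups, and item-size effects."""
    requests_per_month = rps * 30 * 24 * 3600  # 30-day month
    reads = requests_per_month * read_fraction
    writes = requests_per_month - reads
    return reads / 1e6 * read_price_per_m + writes / 1e6 * write_price_per_m

# 10K RPS at an assumed 90/10 read/write mix:
print(round(on_demand_monthly_cost(10_000, 0.90)))  # -> 9072
```

The structural point survives any reasonable mix: a sustained 10K RPS is roughly 26 billion requests a month, and per-request pricing on 26 billion of anything adds up.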

pinpole's AI recommendations at this tier flagged two cost items I acted on immediately: switching DynamoDB from on-demand to provisioned mode (reducing the estimated monthly cost by $7,250 on that canvas alone) and right-sizing the Aurora read replica from r6g.2xlarge to r6g.xlarge given actual utilisation was sitting at 34%.

Results: 100K RPS

One hundred thousand requests per second is a different environment. This is where architectural decisions made at the 10K tier begin to compound, and where I have personally seen the most costly post-deployment surprises. The AWS service quotas alone require more deliberate configuration — API Gateway's regional limit of 10K RPS is a hard wall that requires a service quota increase, and Lambda concurrency at 100K RPS means you're meaningfully consuming your regional concurrency pool.

Performance

| Configuration | p50 Latency | p99 Latency | Throttle Events |
| --- | --- | --- | --- |
| DynamoDB (on-demand) | 2ms | 8ms | 0 |
| DynamoDB (provisioned + auto-scaling) | 2ms | 9ms | 0 (tuned scale-out) |
| RDS PostgreSQL db.r6g.8xlarge + Proxy | 5ms | 22ms | 0 |
| Aurora MySQL db.r6g.4xlarge + 2 replicas + Proxy | 5ms | 17ms | 0 |
| Aurora Serverless v2 (2–64 ACU) | 5ms | 19ms | 0 |

DynamoDB's latency profile remains essentially flat moving from 10K to 100K RPS. Because there is no connection state and request routing is partition-key-based, the service does not exhibit the connection management pressure that relational databases do as concurrency climbs.

During the Spike pattern test, RDS PostgreSQL at this scale exhibited a problem that no amount of static configuration review would have surfaced without simulation: hot partition behaviour on the RDS Proxy connection pool under sudden burst. When traffic spiked from 100K to 300K instantaneous RPS, proxy connection routing created transient contention on a subset of database connections. pinpole flagged this as a WARNING, recommended adding a second RDS Proxy instance in a different AZ, and estimated that this change would reduce p99 latency under spike conditions from 89ms to approximately 31ms. I verified this by applying the recommendation to the canvas and re-running the spike simulation — the p99 in the updated run was 29ms.

⚠ What simulation caught that a code review could not

The RDS Proxy AZ distribution issue under spike load is not visible in a diagram, a code review, or a static cost estimate. It requires a traffic model that reflects real burst conditions against the actual connection pool configuration. Simulation flagged it before a dollar of infrastructure was provisioned.

Cost — estimated monthly at 100K RPS sustained

| Configuration | Monthly Estimate | Notes |
| --- | --- | --- |
| DynamoDB (on-demand) | ~$84,000 | Per-request pricing scales linearly; prohibitive at sustained high volume |
| DynamoDB (provisioned + auto-scaling) | ~$9,800 | Highly efficient; requires accurate capacity planning |
| RDS PostgreSQL db.r6g.8xlarge + Proxy + Replicas | ~$5,200 | Large instance + Multi-AZ + 2 read replicas + proxy overhead |
| Aurora MySQL mid-tier + 2 replicas + Proxy | ~$6,500 | Per-I/O charges and replica costs compound |
| Aurora Serverless v2 (avg 12 ACU) | ~$4,900 | Best cost-efficiency at this tier for variable workloads |

The $74K/month insight — DynamoDB on-demand vs provisioned at 100K RPS

  • DynamoDB on-demand (100K RPS sustained): ~$84,000/mo
  • DynamoDB provisioned + auto-scaling: ~$9,800/mo
  • Monthly saving from one configuration change: ~$74,200/mo
  • Annual saving: ~$890,000/yr
  • Time to identify this on the pinpole canvas: under 1 hour

pinpole's AI recommendations at this tier produced five items, two of which I found genuinely non-obvious. First, adding ElastiCache in front of DynamoDB for read-heavy workloads (>80% reads) — estimated to reduce provisioned DynamoDB cost from $9,800 to approximately $6,100/month at 80% cache hit ratio. Second, a partition key diversification warning — the simulation flagged a hot partition risk based on the key schema configuration. If a single partition key value is receiving a disproportionate share of requests (common in session-key-based access patterns), DynamoDB will throttle at the partition level even if provisioned throughput is sufficient in aggregate. pinpole recommended a write sharding pattern using a composite key suffix.
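One common implementation of that composite-suffix write sharding pattern looks like the sketch below. The shard count and key names are illustrative, not pinpole's output:

```python
import hashlib

SHARDS = 10  # suffix count; sized to spread one hot key across partitions

def sharded_pk(natural_key: str, request_id: str) -> str:
    """Write side: append a deterministic suffix so writes for a single hot
    key land on SHARDS different partition key values."""
    suffix = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % SHARDS
    return f"{natural_key}#{suffix}"

def all_shard_keys(natural_key: str) -> list[str]:
    """Read side: scatter-gather with one Query per shard suffix."""
    return [f"{natural_key}#{i}" for i in range(SHARDS)]

# 1,000 writes against one hot session key spread across all 10 shard keys:
keys = {sharded_pk("session#abc", f"req-{i}") for i in range(1_000)}
print(sorted(keys))
```

The trade-off is explicit: writes scale out across partitions, but every read of the natural key now costs SHARDS queries, which is why this pattern pairs well with the ElastiCache recommendation for read-heavy workloads.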

Results: 1M RPS

One million requests per second is, for most companies, a theoretical exercise. But it is the scenario that most starkly illuminates the architectural ceiling of each option, and it is directly relevant to companies in rapid growth who need to know whether their current architecture can survive 10× scale.

At this scale, the AWS service quota landscape changes the design conversation materially. Lambda has a regional concurrency default of 1,000 concurrent executions; at 1M RPS with even 100ms average Lambda execution time, you need 100,000 concurrent Lambda executions, which requires advance quota negotiation with AWS. API Gateway is region-limited to 10K RPS by default and requires multiple regional endpoints or an account-level limit increase to reach 1M RPS aggregate throughput. pinpole's Node Configuration panel surfaces these quota limits and engineering notes before simulation, which I'd strongly recommend reading at this tier.
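The quota arithmetic is simple enough to sanity-check by hand:

```python
import math

DEFAULT_LAMBDA_CONCURRENCY = 1_000  # default regional quota (raiseable)
DEFAULT_APIGW_RPS = 10_000          # default regional throttle (raiseable)

def required_concurrency(rps: float, avg_duration_s: float) -> int:
    """Concurrent Lambda executions needed: arrival rate x execution time."""
    return math.ceil(rps * avg_duration_s)

needed = required_concurrency(1_000_000, 0.100)
print(needed)                                # -> 100000
print(needed // DEFAULT_LAMBDA_CONCURRENCY)  # -> 100 (100x the default quota)
print(1_000_000 // DEFAULT_APIGW_RPS)        # -> 100 (100x the APIGW default)
```

Both limits sit two orders of magnitude below the target, which is why the quota conversation has to start before the architecture conversation finishes.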

Performance

| Configuration | p50 Latency | p99 Latency | Throttle Risk | Operational Complexity |
| --- | --- | --- | --- | --- |
| DynamoDB (on-demand) | 2ms | 9ms | Low | Low |
| DynamoDB (provisioned + adaptive scaling) | 2ms | 11ms | Medium | Medium |
| Aurora db.r6g.16xlarge + 4 replicas + Proxy fleet | 9ms | 52ms | High | Very High |
| Aurora Serverless v2 (8–256 ACU) | 8ms | 44ms | Medium | Medium |
| RDS PostgreSQL (multi-instance sharding) | 10ms | 68ms | High | Extreme |

At 1M RPS, the gap between DynamoDB and relational databases is no longer a latency preference — it is an architectural boundary. Aurora at 1M RPS requires the largest available instance class, four or more read replicas, a fleet of RDS Proxy instances, and connection routing logic at the application layer. The p99 latency of 52ms reflects the connection management overhead under extreme concurrency.

RDS PostgreSQL at 1M RPS through traditional vertical and horizontal scaling is, candidly, not a practical architecture. The simulation made this visible within minutes: connection pool exhaustion warnings, proxy fleet sizing approaching the limits of a viable configuration, and estimated operational complexity that translates directly into incident risk. If you need a relational data model at 1M RPS, Aurora with a carefully engineered multi-replica topology is the viable path — but you are accepting significantly higher latency and a meaningful operational burden.

Cost — estimated monthly at 1M RPS sustained

| Configuration | Monthly Estimate | Notes |
| --- | --- | --- |
| DynamoDB (on-demand) | ~$840,000 | Per-request at this scale: not viable as a sustained architecture |
| DynamoDB (provisioned + adaptive scaling + DAX) | ~$109,000 | DAX cache fleet adds ~$14K/month; provisioned capacity ~$95K |
| Aurora db.r6g.16xlarge + 4 replicas + Proxy fleet | ~$38,000 | Fixed infrastructure cost becomes relatively efficient at extreme scale |
| Aurora Serverless v2 (avg 48 ACU) | ~$32,000 | ACU scaling at max range; cost-effective but latency trade-off remains |
| RDS PostgreSQL (sharded) | ~$41,000 | Multiple independent instance clusters + proxy overhead + operational cost |

The second major inversion

At 1M RPS, Aurora becomes cheaper than provisioned DynamoDB. The economics flip because Aurora's cost structure is essentially fixed-infrastructure — you pay for the instance regardless of RPS — while DynamoDB provisioned still scales with throughput capacity units. At extreme sustained load, a well-engineered Aurora cluster represents better cost-per-request than DynamoDB. The database selection decision that looks correct at 10K RPS may need to be revisited at 1M RPS.
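Using the estimates from the tables above, and assuming DynamoDB provisioned cost scales roughly linearly with throughput while the Aurora topology stays fixed (a simplification: the Aurora fleet would shrink at lower RPS), the crossover can be located with a quick sketch:

```python
def provisioned_dynamodb_monthly(rps: float) -> float:
    """Linear fit through this post's two provisioned estimates:
    ~$9.8K at 100K RPS and ~$95K (capacity portion, excluding DAX) at 1M RPS."""
    slope = (95_000 - 9_800) / (1_000_000 - 100_000)  # dollars per extra RPS
    return 9_800 + slope * (rps - 100_000)

AURORA_CLUSTER_MONTHLY = 38_000  # the fixed 1M-RPS Aurora topology above

rps = 100_000
while provisioned_dynamodb_monthly(rps) < AURORA_CLUSTER_MONTHLY:
    rps += 10_000
print(rps)  # -> 400000: the lines cross at roughly 400K RPS
```

Under these assumptions the inversion point sits somewhere around 400K RPS sustained; treat it as a rough marker for when to re-run the comparison, not a precise threshold.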

The decision framework

After running simulations across five database configurations at three scale tiers, here is the decision framework that emerged:

Choose DynamoDB when:

  • Your data model is genuinely key-value or document-oriented, without complex relational queries
  • Traffic is unpredictable or highly spiky — on-demand mode is the best burst buffer in AWS's catalogue
  • Operational simplicity is a primary constraint: no instance management, no patching windows
  • You need consistent single-digit millisecond latency at any scale, including 1M+ RPS
  • Your write volume is low relative to read volume

Choose RDS / Aurora when:

  • Your data model is relational and you cannot avoid JOINs, foreign key constraints, or complex aggregation
  • Your workload is sustained and predictable — fixed-infrastructure pricing wins above ~50K RPS
  • You have existing SQL expertise on the team
  • You need ACID transactions across multiple entity types
  • At 1M RPS with relational requirements: Aurora with replicas is the viable path

Choose Aurora Serverless v2 when:

  • You want relational capability with auto-scaling operational simplicity
  • Traffic has a diurnal pattern — ACU-based pricing means you pay less during quiet hours
  • You're between 10K and 200K RPS and don't want to manage read replica fleet sizing manually
  • The p99 latency difference versus DynamoDB (typically 10–20ms) is acceptable
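As a summary, the framework above can be collapsed into a toy routing function. The thresholds are the ones from this post's simulations, not general-purpose rules:

```python
def suggest_database(relational: bool, spiky: bool, rps: int) -> str:
    """Toy encoding of the decision framework above; treat the thresholds
    as starting points for your own simulation, not as rules."""
    if relational:
        if rps > 200_000:
            # above the Serverless v2 sweet spot: engineered replica topology
            return "Aurora + read replicas + proxy fleet"
        if spiky or rps >= 10_000:
            return "Aurora Serverless v2"
        return "RDS PostgreSQL + RDS Proxy"
    # key-value / document model
    if spiky:
        return "DynamoDB on-demand"
    return "DynamoDB provisioned + auto-scaling"

print(suggest_database(relational=False, spiky=False, rps=100_000))
# -> DynamoDB provisioned + auto-scaling
```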

What simulation changed in my thinking

Running these scenarios in pinpole changed three specific assumptions I had carried from experience alone:

1. DynamoDB on-demand is not the frugal choice at sustained high volume

I knew this intellectually, but seeing the $84K on-demand estimate next to $9.8K for provisioned capacity at 100K RPS, derived from the same traffic simulation on canvases I built in under an hour, made it immediately actionable. The recommendation to switch to provisioned mode is the kind of change that saves a growth-stage company $74,000 a month, and it often does not happen because no one has run the numbers with sufficient precision at design time.

2. Hot partition detection is nearly impossible without simulation

pinpole's warning about partition key heat distribution at 100K and 1M RPS is not something you can catch in a code review. It requires a traffic model that reflects real access patterns against a data schema. The simulation flagged it early; a production incident would have flagged it much more expensively.

3. The RDS Proxy AZ distribution issue under spike load is a real architectural gap

Before running the spike pattern simulation at 100K RPS, I would have said my Aurora + RDS Proxy configuration was production-ready. The simulation showed a p99 spike to 89ms under burst conditions due to proxy connection routing contention. Adding a second proxy instance in a second AZ — a five-minute canvas change — resolved it in the simulation. That is a change I would not have made without the simulation data, and it would have manifested as an intermittent latency spike in production.

On tooling and methodology

Before pinpole, my comparative analysis methodology for database selection was: read AWS documentation, apply mental models from past experience, build a rough cost estimate in the AWS Pricing Calculator (manual, static, no traffic modelling), and then — if time permitted — deploy a test environment and run k6 or Gatling against it. That last step, the only step that actually produces per-node latency and cost data under realistic traffic patterns, was often skipped because it required provisioned infrastructure, someone to set it up, and a bill to run it.

Compared on the four capabilities that matter here (diagrams, cost estimation, traffic simulation, and pre-deploy validation), the landscape looks like this:

  • Cloudcraft (Datadog): visualisation, not validation
  • Brainboard: partial coverage; design faster, but still deploy to discover
  • System Initiative: configuration only; wiring correctness, not throughput
  • k6 / Gatling / JMeter: excellent post-deploy, not pre-deploy
  • AWS Distributed Load Testing: post-deploy; requires deployed infrastructure
  • pinpole: diagrams, live cost estimation, and pre-deploy traffic simulation; the only tool covering all four

The specific capability I relied on for this comparison — running 10 RPS to 1M RPS traffic simulations with configurable spike and wave patterns against a designed (not deployed) architecture, with per-node latency and live cost estimation — is, as far as I can tell, genuinely unique to pinpole in the current tooling landscape.

The simulation results in this post are projections, not production measurements. They should be treated as design-time decision support data — directionally reliable, not operationally authoritative. After selecting an architecture based on simulation, I still promote through ST and UAT environments before production, and I still run post-deployment load testing to validate that simulation projections held. But the simulation narrows the decision space dramatically before any infrastructure is provisioned.

Summary

Across ten simulation runs covering three RPS tiers, four traffic patterns, and five database configurations, the one-paragraph summary is this:

At 10K RPS, DynamoDB on-demand is expensive and a well-configured Aurora Serverless v2 or provisioned DynamoDB is the cost-efficient choice. At 100K RPS, the operational complexity of RDS becomes visible and Aurora Serverless v2 or provisioned DynamoDB are the clear winners depending on whether you need SQL. At 1M RPS, DynamoDB wins on latency and operational simplicity if your data model permits it; Aurora wins on cost if you can tolerate 40–50ms p99 and the operational engineering required to run it at that scale. In no scenario does DynamoDB on-demand make economic sense for sustained production load above 20K RPS.

The Tuesday night migration that taught me this lesson should have cost an afternoon with a canvas and a simulation.

Run the simulation before you deploy, not after you have a problem. The DynamoDB hot partition warning alone may save you a weekend. The free tier includes 5 simulations per month — no credit card required.


Senior AWS Solutions Architect. AWS Solutions Architect — Professional. Questions, corrections, or war stories of your own? The comment thread is open.

Tags: AWS · DynamoDB · RDS · Aurora · Database Architecture · Performance Engineering · Cost Optimisation · Serverless · pinpole