Performance FinOps Engineering · May 2026 · 7 min read AWS

CloudFront vs API Gateway vs ElastiCache: a simulation-led head-to-head at 1M RPS

Edge Performance Engineer Global SaaS May 2026

Most architectures have all three caching layers available, and most use exactly one of them, usually the one the system's designer knew best. The result is predictable: requests get cached too late in the path, origins get pounded, bills inflate, and somebody adds a fourth caching layer in the application code.

The three layers do different things. They compose. This piece walks through what each one is actually for, and what happens when you wire them together correctly.

The three layers

Cost at 1M RPS, simulated

Assume 80% cacheable traffic, 5 KB average response, mixed global users. From a pinpole canvas simulation:

| Caching strategy | Origin hits/sec | p99 latency (global) | Monthly cost |
|---|---|---|---|
| None (direct to origin) | 1,000,000 | 320ms | $520k+ (huge origin) |
| CloudFront only | ~200,000 | 45ms hits / 220ms miss | $48k (CF) + $90k origin |
| API Gateway cache (regional) | ~200,000 | 180ms hits / 240ms miss | $210k (API GW + cache) |
| ElastiCache only | 1,000,000 to cache | 2ms cache / 30ms miss | $30k + $90k origin |
| CloudFront + ElastiCache (composed) | ~40,000 | 40ms global hits | $48k + $15k + $30k |

The pattern: composition beats any single layer. CloudFront takes the global edge hits. ElastiCache absorbs everything that gets past it, with programmatic invalidation. API Gateway cache is rarely the right primary layer in 2026, because its provisioned cache capacity is too expensive per GB.
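The origin-hit figures in the table follow from multiplying miss rates through the layers. A minimal sketch of that arithmetic: the 80% edge hit rate comes from the stated assumption of 80% cacheable traffic, while the 80% in-VPC hit rate is an assumption inferred from the ~40,000 figure, not a number the simulation reports. The function name `residual_rps` is hypothetical.

```python
def residual_rps(incoming_rps: float, hit_rates: list[float]) -> list[float]:
    """Return the request rate that leaks past each layer, in path order."""
    remaining = incoming_rps
    leaked = []
    for rate in hit_rates:
        # Each layer only sees what the previous layer missed.
        remaining = remaining * (1.0 - rate)
        leaked.append(remaining)
    return leaked


TOTAL_RPS = 1_000_000
EDGE_HIT_RATE = 0.80    # from the stated assumption: 80% cacheable traffic
IN_VPC_HIT_RATE = 0.80  # assumption: inferred from the ~40,000 origin hits/sec row

after_edge, after_in_vpc = residual_rps(TOTAL_RPS, [EDGE_HIT_RATE, IN_VPC_HIT_RATE])
print(round(after_edge), round(after_in_vpc))  # prints: 200000 40000
```

Because miss rates multiply, a second layer with the same hit rate removes a much smaller absolute number of requests than the first, yet it still cuts origin load by the same factor.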

When each is the right primary layer

CloudFront

Anything cacheable by URL, anything served globally, anything where edge latency matters. Static + dynamic API responses with appropriate Cache-Control.
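As a rough illustration of "appropriate Cache-Control", the sketch below builds a header value that gives the edge a longer TTL than the browser. It relies on the standard `s-maxage` directive, which applies to shared caches such as CloudFront; the effective edge TTL still depends on the minimum and maximum TTL in the distribution's cache policy. The helper name `cache_control` and the example TTLs are assumptions, not values from the simulation.

```python
def cache_control(edge_ttl_s: int, browser_ttl_s: int = 0, shared: bool = True) -> str:
    """Build a Cache-Control value for an origin response.

    shared=False marks a per-user response that no cache should store.
    """
    if edge_ttl_s < 0 or browser_ttl_s < 0:
        raise ValueError("TTLs must be non-negative")
    if not shared:
        return "private, no-store"
    # max-age governs browsers; s-maxage overrides it for shared caches.
    return f"public, max-age={browser_ttl_s}, s-maxage={edge_ttl_s}"


# Hypothetical example: product listing cached 5 minutes at the edge, 1 minute in browsers.
listing_header = cache_control(edge_ttl_s=300, browser_ttl_s=60)
# Hypothetical example: account page that must never be cached.
account_header = cache_control(edge_ttl_s=0, shared=False)
print(listing_header)
print(account_header)
```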

API Gateway cache

Niche: REST API with regional traffic, cacheable by query params, where CloudFront is not in the path. Most teams should not be reaching for this.

ElastiCache

Application-level caching with explicit invalidation, session state, leaderboards, rate-limiting counters, anything that needs sub-ms reads with programmable eviction.

The composition that wins

  1. CloudFront at the edge. Cache anything that can be cached by URL. Set sensible TTLs. Use Cache-Control headers from the origin.
  2. ElastiCache in-VPC. Cache the computed responses and the database reads. Invalidate on write.
  3. Origin only for cache misses. The origin shouldn't see 1M RPS — it should see whatever leaks through both upstream layers.
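Step 2 above, "invalidate on write", is the cache-aside pattern. The sketch below shows the control flow only. `InMemoryCache` is a hypothetical stand-in for a Redis-style client so the example runs without a cluster, and `database` is a plain dict standing in for the real data store; in production the same three operations would map to get, set-with-expiry, and delete calls against ElastiCache.

```python
import time
from typing import Optional


class InMemoryCache:
    """Hypothetical stand-in for a Redis-style cache with per-key TTLs."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: str, ttl_s: float) -> None:
        self._data[key] = (value, time.monotonic() + ttl_s)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


database: dict[str, str] = {"user:1": "Ada"}  # hypothetical data store
cache = InMemoryCache()
db_reads = 0  # counts how often a read falls through to the data store


def read(key: str) -> Optional[str]:
    """Cache-aside read: try the cache, fall back to the store, then populate."""
    global db_reads
    cached = cache.get(key)
    if cached is not None:
        return cached
    db_reads += 1
    value = database.get(key)
    if value is not None:
        cache.set(key, value, ttl_s=60)
    return value


def write(key: str, value: str) -> None:
    """Write to the store first, then drop the cached copy."""
    database[key] = value
    cache.delete(key)


first = read("user:1")    # miss: goes to the data store
second = read("user:1")   # hit: served from cache
reads_before_write = db_reads
write("user:1", "Grace")  # invalidates the cached entry
after_write = read("user:1")  # miss again, returns the new value
```

Deleting the key on write, rather than updating it in place, keeps the write path simple and lets the next read repopulate the entry. The TTL acts as a backstop if an invalidation is ever missed.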

Skip API Gateway cache unless you have a specific reason to use it. The cache is billed at a flat hourly rate for the provisioned size, whether or not it is used, which makes it economically unattractive compared to either neighbour.

The invalidation question

"How will I invalidate this when the data changes?" is the question that decides between CloudFront and ElastiCache. CloudFront invalidation is global, slow (it can take minutes to propagate), and billed per path after the first 1,000 paths each month. ElastiCache invalidation is instant, programmatic, and free. Use the right tool for the invalidation cadence.
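To see how invalidation cadence turns into cost, here is a small estimate of the monthly CloudFront invalidation bill. The 1,000 free paths come from the text above. The per-path price is an assumption based on the commonly quoted $0.005 per path and should be checked against current AWS pricing. The function name is hypothetical.

```python
FREE_PATHS_PER_MONTH = 1_000       # from the text: no charge up to 1,000/month
PRICE_PER_EXTRA_PATH_USD = 0.005   # assumption: verify against current AWS pricing


def monthly_invalidation_cost(
    paths_per_month: int,
    free_paths: int = FREE_PATHS_PER_MONTH,
    price_per_path: float = PRICE_PER_EXTRA_PATH_USD,
) -> float:
    """Estimate the monthly invalidation charge in USD."""
    if paths_per_month < 0:
        raise ValueError("paths_per_month must be non-negative")
    billable = max(0, paths_per_month - free_paths)
    return billable * price_per_path


# Hypothetical cadences: one path per deploy vs. one path per content write.
per_deploy = monthly_invalidation_cost(paths_per_month=200)
per_write = monthly_invalidation_cost(paths_per_month=2_000_000)
print(f"per-deploy: ${per_deploy:,.2f}  per-write: ${per_write:,.2f}")
```

Under these assumptions, invalidating per deploy stays inside the free allowance, while invalidating per write costs thousands of dollars a month. That gap is why write-driven invalidation belongs in ElastiCache and CloudFront is better served by TTLs or versioned URLs.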

Simulating layered caching on pinpole

The canvas lets you add CloudFront, API Gateway cache, and ElastiCache nodes upstream of an origin. Each node has hit-rate and TTL configuration. Run the simulation and see exactly how many requests each layer absorbs, and what the residual load on the origin looks like. Optimise the layer with the lowest cost-per-request first.

Three caching layers, one architecture: composed correctly, not chosen one at a time.

Simulate CloudFront + ElastiCache + origin on the canvas. See per-layer hit rates and cost in real time.
