Most architectures have all three caching layers available, and most use exactly one of them, usually whichever one the system's designer knew best. The result is predictable: requests get cached too late in the path, origins get pounded, bills inflate, and somebody adds a fourth caching layer in the application code.
The three layers do different things. They compose. This piece walks through what each one is actually for, and what happens when you wire them together correctly.
The three layers
- CloudFront — global edge cache, ~450 PoPs. Caches anything cacheable via HTTP semantics. Sub-50 ms RTT for cache hits. Cheapest layer per request.
- API Gateway cache: regional cache, available on REST APIs (HTTP APIs don't support it). Tied to a stage, sized in GB (0.5–237). Decent for shielding the origin from cacheable API responses.
- ElastiCache (Redis/Memcached): in-VPC application cache. Programmable, sub-ms reads, application-level invalidation. The most flexible layer, but it requires explicit cache-aside code.
Cost at 1M RPS, simulated
Assume 80% cacheable traffic, 5 KB average response, mixed global users. From a pinpole canvas simulation:
| Caching strategy | Origin hits/sec | p99 latency (global) | Monthly cost |
|---|---|---|---|
| None (direct to origin) | 1,000,000 | 320ms | $520k+ (huge origin) |
| CloudFront only | ~200,000 | 45ms hits / 220ms miss | $48k (CF) + $90k origin |
| API Gateway cache (regional) | ~200,000 | 180ms hits / 240ms miss | $210k (API GW + cache) |
| ElastiCache only | 1,000,000 to cache | 2ms cache / 30ms miss | $30k + $90k origin |
| CloudFront + ElastiCache (composed) | ~40,000 | 40ms global hits | $48k + $15k + $30k |
The pattern: composition beats any single layer. CloudFront takes the global edge hits. ElastiCache absorbs most of what gets past it (in this simulation, origin traffic drops from ~200,000 to ~40,000 hits/sec), with programmatic invalidation. API Gateway cache is rarely the right primary layer in 2026: it costs more than either neighbour for the capacity you provision.
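The arithmetic behind the composed row is multiplicative: each layer only sees what the layer in front of it missed. Here is a minimal sketch of that model, assuming an 80% hit rate at CloudFront (matching the 80% cacheable share above) and an 80% hit rate at ElastiCache on whatever leaks past CloudFront (our inference from the ~200,000 to ~40,000 drop in the table). `Layer` and `residual_traffic` are hypothetical names, not part of any AWS or pinpole API.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    hit_rate: float  # fraction of the requests reaching this layer that it serves


def residual_traffic(rps: float, layers: list[Layer]) -> dict[str, float]:
    """Requests/sec served by each layer in order, plus what is left for the origin."""
    served: dict[str, float] = {}
    remaining = rps
    for layer in layers:
        hits = remaining * layer.hit_rate  # this layer only sees upstream misses
        served[layer.name] = hits
        remaining -= hits
    served["origin"] = remaining
    return served


if __name__ == "__main__":
    stack = [Layer("cloudfront", 0.8), Layer("elasticache", 0.8)]
    for name, rps in residual_traffic(1_000_000, stack).items():
        print(f"{name:>12}: {rps:>10,.0f} req/s")
```

Under those assumptions the origin sees 1,000,000 × 0.2 × 0.2 = 40,000 req/s, in line with the composed row. Raising either hit rate shrinks origin load multiplicatively, which is why stacking layers beats tuning one.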
When each is the right primary layer
CloudFront
Anything cacheable by URL, anything served globally, anything where edge latency matters. That covers static assets and dynamic API responses, provided the origin sends appropriate Cache-Control headers.
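What "appropriate Cache-Control" means depends on the response. A minimal sketch, assuming three hypothetical response categories and illustrative TTLs: the origin picks a header per category, and a shared cache such as CloudFront prefers `s-maxage` over `max-age` (standard HTTP caching semantics), within the minimum and maximum TTLs of the distribution's cache policy. `shared_cache_ttl` is a simplified parser for illustration and does not model that cache-policy clamp.

```python
from typing import Optional

# Hypothetical categories and TTLs; tune them to how often each kind of content changes.
POLICIES = {
    # Fingerprinted static assets: the URL changes whenever the content does.
    "static": "public, max-age=31536000, immutable",
    # Shared API responses: browsers treat them as immediately stale, CloudFront keeps them 60 s.
    "api": "public, max-age=0, s-maxage=60",
    # Per-user responses must never land in a shared cache.
    "private": "private, no-store",
}


def cache_control(kind: str) -> str:
    """Cache-Control value the origin should send for a response category."""
    return POLICIES[kind]


def shared_cache_ttl(header: str) -> Optional[int]:
    """TTL a shared cache would derive from a header (simplified: no cache-policy clamp)."""
    directives = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        directives[name.lower()] = value
    if "no-store" in directives or "private" in directives:
        return None  # not storable in a shared cache
    for key in ("s-maxage", "max-age"):  # s-maxage wins for shared caches
        if directives.get(key):
            return int(directives[key])
    return None
```

Splitting browser and edge TTLs this way lets CloudFront absorb repeat requests for a minute while clients still pick up changes quickly.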
API Gateway cache
Niche: a REST API with regional traffic and responses cacheable by query parameters, where CloudFront is not in the path. Most teams should not be reaching for this.
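If you do reach for it, the cache is configured per stage. This sketch builds the `patchOperations` list for boto3's `apigateway` `update_stage` call, enabling a stage cache of a given size and a default TTL for every method. The size must be one of the fixed tiers and the TTL is capped at 3,600 seconds; the helper name, API ID, and stage name are hypothetical. The fixed tiers are also why the cost doesn't shrink with traffic: you pay for the provisioned size.

```python
# Fixed cache-size tiers (GB) accepted by API Gateway for a REST API stage.
VALID_CACHE_SIZES_GB = ("0.5", "1.6", "6.1", "13.5", "28.4", "58.2", "118", "237")
MAX_TTL_SECONDS = 3600


def stage_cache_patch(size_gb: str, ttl_seconds: int = 300) -> list[dict[str, str]]:
    """patchOperations that turn on a stage cache and set a TTL for every method."""
    if size_gb not in VALID_CACHE_SIZES_GB:
        raise ValueError(f"unsupported cache size {size_gb!r}")
    if not 0 <= ttl_seconds <= MAX_TTL_SECONDS:
        raise ValueError("ttl_seconds must be between 0 and 3600")
    return [
        {"op": "replace", "path": "/cacheClusterEnabled", "value": "true"},
        {"op": "replace", "path": "/cacheClusterSize", "value": size_gb},
        # "/*/*" applies the method setting to every resource path and HTTP method.
        {"op": "replace", "path": "/*/*/caching/enabled", "value": "true"},
        {"op": "replace", "path": "/*/*/caching/ttlInSeconds", "value": str(ttl_seconds)},
    ]


# Applying it (needs AWS credentials; "abc123" and "prod" are placeholders):
#   import boto3
#   boto3.client("apigateway").update_stage(
#       restApiId="abc123", stageName="prod", patchOperations=stage_cache_patch("0.5"))
```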
ElastiCache
Application-level caching with explicit invalidation, session state, leaderboards, rate-limiting counters, anything that needs sub-ms reads with programmable eviction.
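Rate-limiting counters are a good example of something only the programmable layer can do. Below is a minimal fixed-window limiter, assuming a redis-py-style client: it relies only on `incr` and `expire`, both of which `redis.Redis` provides. `FakeRedis` is a hypothetical in-memory stand-in so the sketch runs without a cluster, and the key format is illustrative.

```python
import time
from typing import Optional


class FakeRedis:
    """In-memory stand-in exposing the two redis-py methods the limiter uses."""

    def __init__(self) -> None:
        self.counts: dict[str, int] = {}
        self.ttls: dict[str, int] = {}

    def incr(self, name: str, amount: int = 1) -> int:
        self.counts[name] = self.counts.get(name, 0) + amount
        return self.counts[name]

    def expire(self, name: str, seconds: int) -> bool:
        self.ttls[name] = seconds  # recorded only; this fake never evicts
        return name in self.counts


def allow_request(store, client_id: str, limit: int, window_seconds: int,
                  now: Optional[float] = None) -> bool:
    """Fixed-window limiter: at most `limit` requests per client per window."""
    now = time.time() if now is None else now
    window = int(now // window_seconds)
    key = f"ratelimit:{client_id}:{window}"  # one counter per client per window
    count = store.incr(key)
    if count == 1:
        # First request in this window: let the cache drop the key when the window ends.
        store.expire(key, window_seconds)
    return count <= limit
```

In production you would pass a `redis.Redis(...)` client instead of `FakeRedis()`. `INCR` followed by `EXPIRE` is two round trips and not atomic; a pipeline created with `transaction=True` or a small Lua script closes that gap.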
The composition that wins
- CloudFront at the edge. Cache anything that can be cached by URL. Set TTLs that match how often each path changes, and drive them from the origin's Cache-Control headers.
- ElastiCache in-VPC. Cache the computed responses and the database reads. Invalidate on write.
- Origin only for cache misses. The origin shouldn't see 1M RPS — it should see whatever leaks through both upstream layers.
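The ElastiCache step is the classic cache-aside pattern: read through the cache, fill it on a miss, and delete the key on write so the next read repopulates it. A minimal sketch, assuming a redis-py-style client (`get`, `set(..., ex=...)`, `delete`) and a dict standing in for the database; `FakeCache`, the `product:` key scheme, and the 300-second TTL are hypothetical.

```python
import json
from typing import Any, Optional


class FakeCache:
    """In-memory stand-in for the redis-py calls used below (get, set with ex, delete)."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.ttls: dict[str, Optional[int]] = {}

    def get(self, name: str) -> Optional[str]:
        return self.data.get(name)

    def set(self, name: str, value: str, ex: Optional[int] = None) -> bool:
        self.data[name] = value
        self.ttls[name] = ex
        return True

    def delete(self, *names: str) -> int:
        return sum(1 for n in names if self.data.pop(n, None) is not None)


def product_key(product_id: str) -> str:
    return f"product:{product_id}"


def get_product(cache, db: dict, product_id: str, ttl_seconds: int = 300) -> dict[str, Any]:
    key = product_key(product_id)
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: the database is not touched
    row = db[product_id]  # cache miss: read from the source of truth
    cache.set(key, json.dumps(row), ex=ttl_seconds)  # TTL bounds staleness if a delete is missed
    return row


def update_product(cache, db: dict, product_id: str, fields: dict[str, Any]) -> None:
    db[product_id] = {**db[product_id], **fields}  # write the database first...
    cache.delete(product_key(product_id))  # ...then invalidate so the next read refills
```

Deleting rather than overwriting on write keeps the write path simple. A concurrent reader can still write back a stale value in the gap between its miss and its `set`; the TTL is what eventually bounds that.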
Skip API Gateway cache unless you have a specific reason to use it. It is billed hourly for the provisioned cache size whether or not the cache is hit, which makes it economically unattractive compared with either neighbour.
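"How will I invalidate this when the data changes?" is the question that decides between CloudFront and ElastiCache. CloudFront invalidation is global, slow (minutes), and billed per path once you pass 1,000 paths a month. ElastiCache invalidation is instant, programmatic, and carries no per-invalidation charge. Match the tool to the invalidation cadence: content that changes on deploy suits CloudFront TTLs and occasional invalidations, while data that changes on every write belongs in ElastiCache.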
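To see why cadence matters, put numbers on it. This sketch builds the `InvalidationBatch` argument for boto3's CloudFront `create_invalidation` call and estimates the monthly bill, assuming the 1,000 free paths per month mentioned above and a price of $0.005 per additional path (check current CloudFront pricing before relying on it). A wildcard such as `/products/*` counts as one path. The helper names are hypothetical.

```python
import time
from typing import Optional

FREE_PATHS_PER_MONTH = 1_000
PRICE_PER_PATH_USD = 0.005  # assumption: verify against current CloudFront pricing


def invalidation_batch(paths: list[str], caller_reference: Optional[str] = None) -> dict:
    """InvalidationBatch payload for CloudFront's create_invalidation."""
    return {
        "Paths": {"Quantity": len(paths), "Items": paths},
        # Use a fresh value per new invalidation; CloudFront uses it to recognise retries.
        "CallerReference": caller_reference or str(time.time_ns()),
    }


def monthly_invalidation_cost(paths_per_month: int) -> float:
    """Dollars billed for invalidation paths beyond the monthly free allowance."""
    billable = max(0, paths_per_month - FREE_PATHS_PER_MONTH)
    return billable * PRICE_PER_PATH_USD


# Submitting a batch (needs AWS credentials; the distribution ID is a placeholder):
#   import boto3
#   boto3.client("cloudfront").create_invalidation(
#       DistributionId="E123EXAMPLE", InvalidationBatch=invalidation_batch(["/products/42"]))
```

For example, invalidating one product path per write at a hypothetical 1,000 writes a day is about 30,000 paths a month, or (30,000 − 1,000) × $0.005 = $145 under the assumed price, and each invalidation still takes minutes to propagate. The equivalent ElastiCache `delete` calls are immediate and cost nothing extra.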
Simulating layered caching on pinpole
The canvas lets you add CloudFront, API Gateway cache, and ElastiCache nodes upstream of an origin. Each node has hit-rate and TTL configuration. Run the simulation to see exactly how many requests each layer absorbs and how much residual traffic reaches the origin. Optimise the layer with the lowest cost per request first.
Three caching layers, one architecture — composed correctly, not picked one-of.
Simulate CloudFront + ElastiCache + origin on the canvas. See per-layer hit rates and cost in real time.