Recommendations engine - part of the PinPole optimize workflow

Architecture warnings, fixed before
you spend a dollar on AWS.

After a simulation run, PinPole's AI engine analyses your architecture and returns a prioritised set of findings - missing services, misconfigured concurrency, structural risks - each with a rationale, an expected impact, and a one-click implementation button.

Try it free · Read the docs
No AWS account required · 3 AI calls/month on Free · Unlimited on Pro, Team, Enterprise · 14-day free trial - no card required
Enable Provisioned Concurrency - reduce cold starts by 90%
Add CloudFront distribution - absorb cacheable reads at edge
Implement SQS async processing - decouple write path from Lambda
Add circuit breaker pattern - prevent cascade failures from DynamoDB throttling
Add DynamoDB DAX caching layer - reduce read latency by up to 10×
Configure Lambda auto scaling - concurrency limits grow with load
Implement exponential backoff - resilient retries on downstream degradation

Run a simulation.
Get Recommendations.
Refresh as you iterate.

The AI engine reads your architecture canvas and the live simulation results together. It doesn't scan a static diagram - it sees what actually happened under load and surfaces findings specific to your services, your configuration, and your traffic pattern.

01
Run your simulation
Set your target RPS and traffic pattern - Constant, Ramp, Spike, or Wave - and start the simulation. The panel shows live per-node health, current RPS, elapsed time, and a running monthly cost estimate. Let it stabilise before moving to recommendations.
02
Select Get Recommendations
Hit Get Recommendations in the simulation panel. The engine analyses your canvas and simulation results and returns a ranked set of findings - typically 4–8 items for a new architecture. Each finding includes a severity level, a category, a full rationale, and an expected impact.
03
Expand, read, implement
Expand any recommendation to read the full rationale before accepting. When ready, click Implement - the change is applied to the canvas immediately. New services are added and wired. Configuration changes are applied to the relevant node.
04
Re-simulate and Refresh Recommendations
After implementing a change, re-run the simulation to confirm the expected effect on latency, cost, and health. Then use Refresh Recommendations to re-analyse the updated architecture. New findings may emerge that were not visible before - the engine works against the current state.
Simulation - 1.0K RPS · Constant
0.5s elapsed
Current RPS
1.0K
Est. Cost
$12,029
/mo
Node Metrics
api-gateway healthy  1,021 RPS · 10ms
cloudfront healthy  1,042 RPS · 16.7ms
lambda healthy  0 RPS · 156.3ms
ec2 healthy  0 RPS · 10ms
rds healthy  0 RPS · 10ms
Get Recommendations

Prioritised findings.
One-click implementation.

Every recommendation is categorised, severity-ranked, and expandable. Read the full rationale, review the expected impact, then implement directly from the panel - no manual canvas edits required.

Refresh Recommendations (6)
Recommendations 6
⚠ WARNING modify config
Enable Provisioned Concurrency for Request Processor Lambda

The Request Processor Lambda experiences a significant number of cold starts (1,488), which increases latency to 156.3ms. Enabling provisioned concurrency keeps a set number of Lambda instances initialised and ready to respond, reducing cold-start latency and improving response times during traffic spikes.

Impact
Reduces latency and cold start delays for Request Processor Lambda, improving overall API responsiveness during spikes.
Expected
Reduce latency by up to 50%, decrease cold starts by 90%
+ 2 more recommendations
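The figures on a card like this can be sanity-checked with a simple blended-latency model. The sketch below is illustrative, not PinPole's internal calculation - the request count, warm latency, and cold-start latency are assumed values chosen to roughly reproduce the 156.3ms figure:

```python
def avg_latency_ms(requests, cold_starts, warm_ms, cold_ms):
    """Average latency as a traffic-weighted blend of warm and cold invocations."""
    cold_frac = cold_starts / requests
    return cold_frac * cold_ms + (1 - cold_frac) * warm_ms

# Assumed window: 8,000 requests, 1,488 of them cold (matching the card),
# with ~10ms warm path and ~800ms cold start.
before = avg_latency_ms(8_000, 1_488, warm_ms=10, cold_ms=800)  # ≈ 156.9 ms
# Provisioned concurrency keeps instances initialised: model a 90% cold-start cut.
after = avg_latency_ms(8_000, 149, warm_ms=10, cold_ms=800)     # ≈ 24.7 ms
```

Even under these rough assumptions, cutting cold starts by 90% removes most of the blended latency - which is why the card's "reduce latency by up to 50%" estimate is plausible.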

Recommendations are returned in severity order - address WARNINGs before INFOs. Each card shows enough context to act, with the full rationale one expand away. The Implement button applies the change immediately: services are added, wires are drawn, configuration is updated.

modify config
Configuration adjustments
Concurrency settings, memory allocation, caching TTLs, timeouts, SQS visibility windows - changes to existing node configuration.
add service
Missing services
Services that would materially improve performance, resilience, or cost. CloudFront for edge caching, DAX for DynamoDB reads, SQS for write path decoupling.
architecture
Structural changes
Circuit breakers, async decoupling, retry patterns, fan-out topology - changes to how services connect and communicate.
scaling
Auto-scaling gaps
Missing or misconfigured auto-scaling policies that will cause services to hit hard concurrency or throughput limits under increasing load.
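The retry patterns proposed under the architecture category follow a standard shape: capped exponential backoff with jitter. A minimal sketch (the function name and defaults are illustrative, not PinPole's API):

```python
import random

def backoff_delays(attempts, base_s=0.1, cap_s=10.0, seed=None):
    """Capped exponential backoff with full jitter.

    Each retry waits a random duration between 0 and min(cap, base * 2^attempt),
    which spreads retries out and avoids synchronised retry storms.
    """
    rng = random.Random(seed)
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap_s, base_s * 2 ** attempt)
        delays.append(rng.uniform(0, ceiling))  # full jitter
    return delays
```

The jitter matters as much as the exponent: without it, every client that failed at the same moment retries at the same moment, re-creating the spike that caused the downstream degradation.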

WARNING first.
Then INFO.

Every finding is assigned a severity level that signals urgency. The ordering is deliberate - address WARNINGs before INFOs, and re-simulate between each change.

WARNING

Flags a configuration that poses a real failure risk or a significant cost inefficiency at the simulated load. These findings require action before deployment - they are not advisory.

Examples:
Lambda throttling at target RPS
API Gateway as single point of failure
Missing CloudFront on high-read API
Lambda concurrency limit will be exceeded
INFO

An improvement opportunity that is not blocking but should be addressed before production - these findings represent meaningful performance or cost gains that are not urgent at current load.

Examples:
Lambda provisioned concurrency not set
DynamoDB on-demand mode suboptimal
SQS visibility timeout too short
No exponential backoff on retry paths
Apply one recommendation at a time. Batch-applying all recommendations in one pass obscures which change had the most impact. Apply WARNING items, re-simulate, then review INFO items in the new state. Refreshing recommendations after each re-simulation ensures the engine is always analysing the current architecture.
Typical Lambda API optimization sequence
01
Add CloudFront WARNING
Reduces API Gateway load and p99 latency. Absorbs cacheable reads at edge before they reach Lambda.
02
Introduce Circuit Breaker WARNING
Prevents Lambda cascade failures when downstream services (DynamoDB, RDS) degrade under load.
03
Enable Provisioned Concurrency INFO
Pre-initialises Lambda execution environments, eliminating cold start latency under spike traffic.
04
Implement SQS Async Processing INFO
Decouples the write path. API Gateway accepts requests into the queue; Lambda processes at its own pace.
05
Configure Auto Scaling INFO
Ensures concurrency limits scale with load. Required for architectures serving variable or unpredictable traffic.
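The circuit breaker in step 02 is a small state machine - closed, open, half-open. A generic illustration of the pattern (not PinPole's implementation), where the breaker fails fast while open and allows a single probe call after a cool-down:

```python
import time

class CircuitBreaker:
    """Trip open after `max_failures` consecutive errors; probe again after `reset_s`."""

    def __init__(self, max_failures=3, reset_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_s = reset_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # timestamp when the circuit tripped, or None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_s:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one probe through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip the circuit
            raise
        self.failures = 0  # any success closes the circuit
        return result
```

While the circuit is open, Lambda stops hammering the degraded DynamoDB or RDS dependency and returns errors immediately - which is what prevents the cascade.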

Read the rationale.
Click Implement.
Re-simulate.

No manual canvas edits. No digging through documentation to figure out how to wire a new service. The Implement button applies the recommendation directly - new services appear on the canvas, connections are drawn, and affected node configuration is updated.

What Implement does

The canvas updates
automatically.

Each recommendation maps to a specific, reversible change on the canvas. Implement applies it immediately - the change is visible in the canvas, and the simulation state is reset so your next run reflects the updated architecture.

Add service recommendations - a new service node appears on the canvas, wired to the correct upstream and downstream services according to the recommended topology.
Modify config recommendations - the relevant node's configuration is updated in place. Open the Node Configuration panel to review the applied change.
Architecture recommendations - structural changes are applied: new connection types are added, existing wires may be re-routed, and protective patterns (circuit breakers, retry logic) are configured on the affected nodes.
After implementing any recommendation, re-run the simulation before accepting the next one. Confirm the expected effect on latency, cost, and health - then use Refresh Recommendations to re-analyse the updated state.
⚠ WARNING add service
Add CloudFront distribution in front of API Gateway
API Gateway is receiving the full 1,041 RPS directly. A CloudFront distribution at the edge will absorb cacheable requests, reduce API Gateway load, and lower end-user latency through edge PoP routing.
Expected: Reduce API Gateway load by ~60%, latency by 40ms at edge
Implement
↓  canvas updated
Applied to canvas
CloudFront node added to canvas
Wired: CloudFront → API Gateway
Latency routing tag applied to node
PriceClass_100 configured
New service added & wired
For add service recommendations, the new node appears on the canvas connected to the correct upstream and downstream services. No manual wiring required.
Node configuration updated
For modify config recommendations, the affected node's properties are updated in place. Open Node Configuration to inspect the change before re-running.
Architecture restructured
For architecture recommendations, connections are re-routed, protective patterns are applied, and async paths are introduced - exactly as specified in the recommendation rationale.

Get more from
every recommendation cycle.

A few habits that make the optimize loop measurably more effective - drawn from the recommendation patterns that emerge most frequently in real simulation sessions.

01 -
One recommendation at a time
Apply a single recommendation, re-simulate, then review the next. Batch-applying all recommendations obscures which change had the most impact, making future debugging harder and the iteration history less useful.
02 -
Read the rationale before implementing
Expand each recommendation and read the full rationale. Recommendations are contextual - understanding why a change is being proposed helps you assess whether the trade-off is appropriate for your specific use case and traffic profile.
03 -
Use Refresh Recommendations after canvas changes
The engine analyses the current state of your canvas. After significant changes - whether from implementing a recommendation or making manual edits - use Refresh Recommendations to ensure the next set of findings reflects the updated architecture.
04 -
Dismiss irrelevant recommendations explicitly
If a recommendation doesn't apply to your use case, dismiss it rather than leaving it open. This keeps your recommendation history accurate and ensures future Refresh calls surface only relevant findings - not repeated noise from items you've already considered.
05 -
Run clean simulations after Lambda changes
After changing Lambda concurrency settings, stop the simulation fully and start a new run rather than resuming. Concurrency changes are applied at simulation initialisation - resuming a paused run will not pick up the new values, making the subsequent recommendation analysis unreliable.
06 -
Do not skip recommendations on a zero-alert run
A clean simulation with no WARNING alerts confirms the architecture handles load without breaching limits. It does not mean the architecture is optimized. Proceed to Recommendations even when alerts are zero - INFO-level findings often surface meaningful cost and latency improvements that aren't visible in the health indicators.
Plan availability

Recommendations are available on all plans. Free includes 3 AI calls per month. Pro, Team, and Enterprise include unlimited calls. Additional AI credits are available as add-ons on any paid plan.

3 calls / mo - Free
Unlimited calls / mo - Pro+
$0.03 per call add-on

Your next architecture
already has warnings.
Find them first.

Design on the canvas, run a simulation, and get recommendations before a dollar is spent on AWS. Every finding comes with a one-click fix.

No AWS account required to simulate · 14-day free trial on all paid plans · Cancel anytime