Performance Tuning

Profiling

Find slow code with distributed tracing.

Using X-Ray

Enable X-Ray (see Level 3, Lesson 5) and find:

  • Which services are slowest?
  • Which SQL queries are slowest?
  • Where are bottlenecks?

Using CloudWatch Logs Insights

fields @duration, @initDuration
| stats avg(@duration), max(@duration) by bin(1m)

Identify slow time windows.
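For finer granularity than the built-in @duration, you can log your own per-section timings as structured JSON and query them the same way. A minimal sketch (the timed helper and the db_query label are illustrative, not a Lambda or CloudWatch API):

```python
import json
import time
from contextlib import contextmanager

@contextmanager
def timed(label, timings):
    """Record elapsed wall-clock time for a code section, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = round((time.perf_counter() - start) * 1000, 2)

timings = {}
with timed("db_query", timings):
    time.sleep(0.05)  # stand-in for a real database call

# One structured log line; Logs Insights can then run stats avg(db_query) on it
print(json.dumps(timings))
```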


Simple Explanation

What it is

Performance tuning is the practice of making serverless functions faster and more efficient.

Why we need it

At scale, small inefficiencies become big costs. Faster code is cheaper and gives users a better experience.

Benefits

  • Lower latency for users.
  • Lower compute cost per request.
  • More headroom during spikes.

Tradeoffs

  • Requires measurement before changes make sense.
  • Over-optimization can hurt clarity and maintainability.

Real-world examples (architecture only)

  • Slow response -> Parallelize calls -> Faster output.
  • High cost -> Reduce payload size -> Lower duration.

Code Level Optimization

1. Reduce Serialization

Parsing JSON is expensive:

import json

# ❌ Parse multiple times
data1 = json.loads(event.get("body") or "{}")
data2 = json.loads(event.get("body") or "{}")

# ✅ Parse once
data = json.loads(event.get("body") or "{}")
data1 = data
data2 = data
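If the same raw body is handed to several helpers, "parse once" can be made automatic by memoizing on the raw string. A sketch using functools.lru_cache (parse_body is a hypothetical helper, not part of any AWS SDK):

```python
import json
from functools import lru_cache

@lru_cache(maxsize=128)
def parse_body(raw: str):
    """Parse a JSON body once; repeat calls with the same string hit the cache."""
    return json.loads(raw or "{}")

body = '{"userId": "u-1"}'
data1 = parse_body(body)
data2 = parse_body(body)  # cache hit: json.loads ran only once
```

Because callers share the cached dict, treat the result as read-only.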

2. Parallel Queries

# ❌ Sequential: 100ms + 100ms + 100ms = 300ms
user_id = event.get("userId")
user = get_user(user_id)
orders = get_orders(user_id)
preferences = get_preferences(user_id)

# ✅ Parallel: max(100ms, 100ms, 100ms) = 100ms
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor() as executor:
    user, orders, preferences = executor.map(
        lambda fn: fn(user_id),
        [get_user, get_orders, get_preferences],
    )
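The speedup is easy to demonstrate locally with simulated I/O (the 0.1 s sleeps stand in for ~100 ms network calls; fake_io_call is illustrative):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io_call(name):
    time.sleep(0.1)  # stand-in for a ~100 ms network or database call
    return name

start = time.perf_counter()
with ThreadPoolExecutor() as executor:
    results = list(executor.map(fake_io_call, ["user", "orders", "prefs"]))
elapsed = time.perf_counter() - start
# Three 100 ms calls overlap and finish in roughly 100 ms, not 300 ms
```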

3. Batch Operations

# ❌ 1,000 sequential DynamoDB calls ≈ 1,000 ms of latency
user_ids = event.get("userIds", [])
for user_id in user_ids:
    ddb.get_item(TableName="Users", Key={"id": {"S": user_id}})

# ✅ Batched calls (max 100 keys per request) ≈ 50 ms each
ddb.batch_get_item(
    RequestItems={
        "Users": {
            "Keys": [{"id": {"S": user_id}} for user_id in user_ids]
        }
    }
)
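Because batch_get_item accepts at most 100 keys per request, 1,000 users still need to be split into batches. A small helper sketch (the 100-key limit is a real DynamoDB constraint; chunked is an illustrative helper):

```python
def chunked(items, size=100):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

user_ids = [f"user-{i}" for i in range(250)]
batches = chunked(user_ids)
# 250 ids -> chunks of 100, 100, 50; issue one batch_get_item per chunk
```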

4. Cache Aggressively

cache = {}  # module-level: survives across warm invocations


def get_user(user_id):
    if user_id in cache:
        return cache[user_id]

    user = ddb.get_item(TableName="Users", Key={"id": {"S": user_id}})
    cache[user_id] = user
    return user
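A plain dict never expires, so a warm container can serve stale data indefinitely. A sketch of a TTL variant (fetch_user and the 60 s TTL are illustrative assumptions):

```python
import time

_cache = {}  # user_id -> (expires_at, value)
TTL_SECONDS = 60

def get_user_cached(user_id, fetch_user, now=time.monotonic):
    """Return a cached user unless the entry is older than TTL_SECONDS."""
    entry = _cache.get(user_id)
    if entry and entry[0] > now():
        return entry[1]
    value = fetch_user(user_id)
    _cache[user_id] = (now() + TTL_SECONDS, value)
    return value

calls = []
def fetch_user(uid):
    calls.append(uid)  # track how often the real lookup runs
    return {"id": uid}

get_user_cached("u-1", fetch_user)
get_user_cached("u-1", fetch_user)  # served from cache; fetch_user ran once
```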

Database Optimization

DynamoDB

1. Use appropriate indexes:

ItemsTable:
  Type: AWS::DynamoDB::Table
  Properties:
    GlobalSecondaryIndexes:
      - IndexName: UserIdIndex
        KeySchema:
          - AttributeName: userId
            KeyType: HASH   # partition key
          - AttributeName: createdAt
            KeyType: RANGE  # sort key
        Projection:
          ProjectionType: ALL

Query by userId without scanning the whole table.

2. Projection Expression:

# ❌ Fetch all attributes
ddb.query(TableName="Items", **params)

# ✅ Fetch only needed attributes ("name" is a reserved word, so alias it)
ddb.query(
    TableName="Items",
    ProjectionExpression="id, #n, price",
    ExpressionAttributeNames={"#n": "name"},
    **params,
)

RDS

1. Query optimization:

-- ❌ Slow: Full table scan
SELECT * FROM users WHERE name = 'John';

-- ✅ Fast: use an index
CREATE INDEX idx_name ON users(name);
SELECT * FROM users WHERE name = 'John';

2. Connection pooling:

Use RDS Proxy (covered in Lesson 1).

Lambda Optimization

1. Ephemeral Storage

Increase /tmp storage if processing files:

MyFunction:
  Type: AWS::Serverless::Function
  Properties:
    EphemeralStorage:
      Size: 10240  # 10 GB (default is 512 MB)

Process large files faster.

2. Memory & CPU

More memory also buys more CPU, allocated in proportion:

MyFunction:
  Type: AWS::Serverless::Function
  Properties:
    MemorySize: 3008  # ~1.7 vCPUs (maximum is 10240 MB)

Higher per-millisecond price, but if duration drops enough, total cost falls.
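The tradeoff is simple arithmetic: Lambda bills GB-seconds, so doubling memory pays off whenever duration drops by more than half. A sketch (the $0.0000166667/GB-s rate is the published x86 price in many regions; the durations are hypothetical measurements):

```python
PRICE_PER_GB_SECOND = 0.0000166667  # x86 Lambda rate in many regions

def invocation_cost(memory_mb, duration_ms):
    """Compute-only cost of one invocation, in dollars."""
    return (memory_mb / 1024) * (duration_ms / 1000) * PRICE_PER_GB_SECOND

slow = invocation_cost(1024, 400)  # 1 GB for 400 ms -> 0.40 GB-s
fast = invocation_cost(2048, 180)  # 2 GB for 180 ms -> 0.36 GB-s
# Doubling memory here is both faster AND slightly cheaper per request
```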

3. Architecture

Use the appropriate architecture:

# Check the function's current architecture
aws lambda get-function-configuration \
  --function-name myfunction \
  --query Architectures

# Switch to ARM (Graviton) where dependencies support it: cheaper
aws lambda update-function-configuration \
  --function-name myfunction \
  --architectures arm64

Graviton (arm64) functions are priced roughly 20% lower per GB-second and often run faster, but verify that any native dependencies ship arm64 builds first.

API Gateway Optimization

1. Caching

API Gateway can cache responses per method. For example, to cache GET /items responses for 5 minutes:

MyApi:
  Type: AWS::Serverless::Api
  Properties:
    CacheClusterEnabled: true
    CacheClusterSize: '0.5'  # cache size in GB
    MethodSettings:
      - ResourcePath: '/items'
        HttpMethod: GET
        CachingEnabled: true
        CacheTtlInSeconds: 300

2. Compression

Enable payload compression for large responses:

MyApi:
  Type: AWS::Serverless::Api
  Properties:
    MinimumCompressionSize: 1024  # compress responses larger than 1 KB

API Gateway then gzips responses over 1 KB for clients that send Accept-Encoding: gzip.
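The effect of the 1 KB threshold is easy to see locally with Python's gzip module on a repetitive JSON payload (the payload is fabricated; real ratios depend on content):

```python
import gzip
import json

# A typical API-style payload: many records with repeated field names
payload = json.dumps([{"id": i, "status": "active"} for i in range(200)]).encode()
compressed = gzip.compress(payload)

ratio = len(compressed) / len(payload)
# Repetitive JSON typically compresses to a small fraction of its raw size
```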

Network Optimization

1. VPC Configuration

Lambda → RDS in a VPC:

HelloWorld:
  Properties:
    VpcConfig:
      SecurityGroupIds:
        - sg-12345
      SubnetIds:
        - subnet-12345

VPC attachment once added seconds to cold starts; Hyperplane ENIs have cut that dramatically, but still skip it unless you need private resources such as RDS.

2. CloudFront

Cache API responses globally:

CloudFrontDistribution:
  Type: AWS::CloudFront::Distribution
  Properties:
    DistributionConfig:
      CacheBehaviors:
        - PathPattern: '/api/*'
          TargetOriginId: myapi
          ViewerProtocolPolicy: https-only
          # Managed "CachingOptimized" policy, referenced by its ID
          CachePolicyId: 658327ea-f89d-4fab-a63d-7e88639e58f6

Serving cached responses from edge locations is a massive latency improvement for global users.

Benchmarking

Test after each optimization:

# Time function execution
time sam local invoke MyFunction -e event.json

# Load test with increasing concurrency
for c in 1 10 100 1000; do
  ab -n 10000 -c "$c" https://api.example.com/
done

# Compare before/after times

Track improvements over time.
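Comparing before/after times is most meaningful on percentiles, not averages. A minimal nearest-rank percentile sketch over collected latency samples (the samples here are fabricated):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample covering p% of the distribution."""
    s = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[k]

latencies_ms = list(range(1, 101))  # stand-in for measured request latencies
p50 = percentile(latencies_ms, 50)  # -> 50
p99 = percentile(latencies_ms, 99)  # -> 99
```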

Monitoring Performance

P99 Latency

Report P99, not P50: averages and medians hide tail latency.

P99DurationAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    Namespace: AWS/Lambda
    MetricName: Duration
    ExtendedStatistic: p99  # percentiles use ExtendedStatistic, not Statistic
    Threshold: 500          # alert if P99 > 500 ms
    ComparisonOperator: GreaterThanThreshold
    EvaluationPeriods: 1
    Period: 60

SLOs (Service Level Objectives)

Commit to users:

  • 95% of requests < 200ms
  • 99.9% uptime

Monitor against SLOs continuously.
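Checking a latency SLO is a one-liner over the same latency samples. A sketch for the "95% of requests < 200ms" objective (the samples are fabricated):

```python
def slo_compliance(latencies_ms, threshold_ms=200):
    """Fraction of requests faster than the latency threshold."""
    return sum(1 for t in latencies_ms if t < threshold_ms) / len(latencies_ms)

samples = [120] * 96 + [450] * 4      # 96% fast, 4% slow
ok = slo_compliance(samples) >= 0.95  # SLO met: 0.96 >= 0.95
```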

Real-World Example

Optimization Journey:

Before:

  • Avg latency: 800ms
  • P99: 2000ms
  • Errors: 0.5%

Optimization 1: Add caching

  • Avg latency: 600ms
  • P99: 1500ms

Optimization 2: Parallel queries

  • Avg latency: 300ms
  • P99: 800ms

Optimization 3: Increase memory

  • Avg latency: 150ms
  • P99: 400ms

Result: 5.3× faster average latency (800 ms → 150 ms) and a 5× better P99.

Best Practices

  1. Measure first — Use profiling, not guesswork
  2. Parallelize — Independent I/O calls can overlap instead of queuing
  3. Cache aggressively — Every ms matters
  4. Right-size memory — More memory = more CPU = faster
  5. Use CloudFront — Massive latency improvement for global apps

Hands-On: Profile & Optimize

  1. Deploy API endpoint
  2. Measure latency (X-Ray or CloudWatch Logs Insights)
  3. Identify bottleneck
  4. Apply optimization
  5. Remeasure latency
  6. Calculate improvement

Key Takeaway

Performance is a feature. Every optimization compounds. Small improvements add up to transformative speedups.