
Concurrency & Cold Starts

Understanding Concurrency

Concurrency = number of Lambda instances running simultaneously.

Concurrency burst example

Lambda scales automatically with load.


Simple Explanation

What it is

Concurrency is how many function instances run at the same time. Cold starts happen when a new instance spins up for the first request.

Why we need it

At scale, you can hit limits or slowdowns. Understanding concurrency and cold starts helps you keep latency low during traffic spikes.

Benefits

  • Predictable performance when traffic surges.
  • Fewer throttles with the right limits and buffering.
  • Better user experience with warmed instances.

Tradeoffs

  • Warm capacity costs money even when idle.
  • Over-provisioning can waste budget.

Real-world examples (architecture only)

  • Sale launch -> Pre-warm functions -> Smooth checkout.
  • Traffic spike -> Queue buffer -> Slow but stable processing.

Reserved vs. Throttled Concurrency

Account-Level Limit

  • Default: 1,000 concurrent executions per region
  • Can be increased via a Service Quotas request

Reserved Concurrency

Reserve capacity for critical functions:

CriticalAPI:
  Type: AWS::Serverless::Function
  Properties:
    ReservedConcurrentExecutions: 500  # Always available to this function

Other functions share remaining capacity.
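The arithmetic behind that sharing is simple: reservations are carved out of the account limit, and every unreserved function competes for the remainder. A minimal sketch using the numbers above:

```python
def unreserved_pool(account_limit, reservations):
    """Concurrency left over for functions without a reservation."""
    return account_limit - sum(reservations)

# Default regional limit of 1,000 with CriticalAPI reserving 500:
print(unreserved_pool(1000, [500]))  # -> 500
```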

Provisioned Concurrency

Keep instances warmed and ready (eliminates cold starts):

CriticalAPI:
  Type: AWS::Serverless::Function
  Properties:
    AutoPublishAlias: live  # SAM attaches provisioned concurrency to an alias
    ProvisionedConcurrencyConfig:
      ProvisionedConcurrentExecutions: 100  # 100 instances kept warm
    ReservedConcurrentExecutions: 500       # Up to 500 total

Cost: roughly $0.015 per GB-hour of provisioned capacity, billed whenever it is configured, even while idle (check current AWS pricing for your region).
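To budget for this, a back-of-envelope estimate helps. A sketch, assuming a us-east-1-style rate of $0.015 per GB-hour (an assumption, not authoritative; check current AWS pricing):

```python
def provisioned_cost_per_month(units, memory_gb,
                               price_per_gb_hour=0.015,  # assumed rate
                               hours_per_month=730):
    """Approximate monthly cost of keeping `units` instances warm."""
    return units * memory_gb * price_per_gb_hour * hours_per_month

# 100 warm instances at 128 MB (0.125 GB):
print(provisioned_cost_per_month(100, 0.125))  # -> 136.875
```

At larger memory sizes the cost scales linearly, so 100 warm 1 GB instances land nearer $1,100/month.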

Cold Starts

First invocation is slow because AWS must:

  1. Allocate compute
  2. Start runtime
  3. Load code
  4. Execute handler

Duration by Runtime

Runtime   Cold Start
Python    300-500ms
Java      1000-2000ms
.NET      500-1000ms

Subsequent invocations: < 50ms
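You can observe this from inside a function: module-level code runs once per cold start, so a module global can distinguish the first invocation on an instance from warm ones. A minimal sketch (not an official AWS API; it just exploits Lambda's container reuse):

```python
import time

_initialized_at = time.time()  # executes once, during the cold start
_is_cold = True


def handler(event, context):
    global _is_cold
    cold = _is_cold
    _is_cold = False  # every later invocation on this instance is warm
    return {
        "cold_start": cold,
        "instance_age_s": round(time.time() - _initialized_at, 3),
    }
```

Logging `cold_start` lets you measure how often users actually hit a cold start in production.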

Minimize Cold Starts

1. Optimize code bundle:

# Check size
du -h function.zip

# Install only needed deps into a build folder
python -m pip install -r requirements.txt -t build

# Zip minimal package
cd build && zip -r ../function.zip . && cd ..

2. Prefer a lightweight runtime:

Interpreted runtimes like Python start far faster than JVM-based ones; Java has the largest cold starts in the table above.

3. Lazy load imports:

# ❌ Imports paid during the cold start
import boto3
import requests

# ✅ Imports deferred out of the cold start
def handler(event, context):
    import boto3
    import requests
    # Use the AWS SDK and HTTP client here

Note that deferring moves the import cost to the first invocation rather than removing it, so it helps most when some code paths never need the heavy imports.

4. Use Provisioned Concurrency:

For latency-sensitive APIs where an extra 100ms matters, use Provisioned Concurrency.

Cost vs. benefit: keeping a small fleet warm runs on the order of $100/month, in exchange for eliminating cold starts on that capacity.

Throttling

When concurrent executions exceed limit, new invocations are throttled.

Sync Invocations (API Gateway)

Throttled invocation returns 429:

Throttling flow
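On the caller's side, a 429 should be retried with exponential backoff and jitter rather than immediately. A sketch with an injected `invoke` callable returning `(status, body)`; the shape is illustrative, not an AWS SDK interface:

```python
import random
import time


def call_with_backoff(invoke, max_attempts=5, base_delay=0.1):
    """Retry `invoke` while it reports throttling (HTTP 429)."""
    for attempt in range(max_attempts):
        status, body = invoke()
        if status != 429:
            return status, body
        # Exponential backoff with jitter: ~0.1s, 0.2s, 0.4s, ...
        time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
    return status, body  # still throttled after all attempts
```

The jitter prevents a thundering herd of clients all retrying on the same schedule.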

Async Invocations (S3, SNS triggers)

Throttled invocations are retried automatically with backoff, for up to 6 hours by default; function errors are retried up to 2 times.

Events that still fail are sent to the DLQ or failure destination, if one is configured.

Handling Bursty Traffic

Before Optimization

Black Friday surge:

  • 100 req/sec → 1,000 req/sec suddenly
  • Cold starts spike latency
  • Some requests throttled
  • 5% error rate

After Optimization

Same surge:

  • Provisioned Concurrency: 100 warm
  • Handles first 100 instantly
  • Auto-scales remaining
  • 0% error rate

Connection Pooling

Reuse connections to avoid setup overhead:

# ❌ Bad: New connection per request
def handler(event, context):
    conn = db.connect()
    result = conn.query("SELECT ...")
    conn.close()
    return result


# ✅ Good: Reuse connection across warm invocations
_conn = None


def get_connection():
    global _conn
    if _conn is None:
        _conn = db.connect()
    return _conn


def handler(event, context):
    conn = get_connection()
    result = conn.query("SELECT ...")
    return result

Connection persists across warm invocations (huge speedup).

Database Connection Limits

High concurrency means many Lambda instances, and each instance opens its own database connection.

Problem

1,000 Lambda instances × 1 connection each = 1,000 DB connections

Most database instances accept only a few hundred concurrent connections (for RDS, max_connections scales with instance memory).

Solution: RDS Proxy

Maintain limited pool of connections to RDS:

1,000 Lambda instances
        ↓
RDS Proxy (pool of 100 connections)
        ↓
Database (a load it can handle)

RDS Proxy multiplexes many Lambda connections over the small pool, acting as a buffer during spikes.
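Conceptually the proxy is a bounded pool: any number of callers, a fixed number of connections. A toy simulation with a semaphore (10 "connections", 100 "Lambda instances"; not how RDS Proxy is implemented, just the queueing behavior it produces):

```python
import threading

POOL_SIZE = 10  # stand-in for the proxy's fixed connection pool
_pool = threading.Semaphore(POOL_SIZE)
_lock = threading.Lock()
in_use = 0
peak = 0


def query(_request_id):
    """Each 'Lambda instance' must borrow a pooled connection."""
    global in_use, peak
    with _pool:  # blocks while all connections are busy
        with _lock:
            in_use += 1
            peak = max(peak, in_use)
        # ... the actual query would run here ...
        with _lock:
            in_use -= 1


threads = [threading.Thread(target=query, args=(i,)) for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak <= POOL_SIZE)  # -> True: concurrency never exceeded the pool
```

Callers beyond the pool size simply wait their turn, which is exactly the buffering the database needs.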

Setup

DBProxy:
  Type: AWS::RDS::DBProxy
  Properties:
    DBProxyName: my-proxy
    EngineFamily: MYSQL
    RoleArn: !GetAtt ProxyRole.Arn
    Auth:
      - AuthScheme: SECRETS
        SecretArn: !Ref DBSecret  # credentials come from Secrets Manager
    VpcSubnetIds: !Ref PrivateSubnets

ProxyTargetGroup:
  Type: AWS::RDS::DBProxyTargetGroup
  Properties:
    DBProxyName: !Ref DBProxy
    TargetGroupName: default
    DBInstanceIdentifiers:
      - !Ref MyDatabase

Lambda:
  Environment:
    Variables:
      DB_HOST: !GetAtt DBProxy.Endpoint

(Auth and VpcSubnetIds are required; DBSecret and PrivateSubnets are placeholders for your own resources.)

DynamoDB Auto-Scaling

DynamoDB can absorb high Lambda concurrency, but only within the table's configured throughput.

If you set provisioned capacity:

ItemsTable:
  Type: AWS::DynamoDB::Table
  Properties:
    ProvisionedThroughput:
      ReadCapacityUnits: 100
      WriteCapacityUnits: 100
    # If traffic > 100 writes/sec, you get throttled
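The effect of provisioned throughput can be modeled as a simple per-second cap (real DynamoDB also grants burst credits, so this sketch is pessimistic):

```python
def simulate_second(provisioned_wcu, write_requests):
    """One-second slice: writes beyond provisioned capacity are throttled."""
    accepted = min(write_requests, provisioned_wcu)
    throttled = write_requests - accepted
    return accepted, throttled


# 150 writes/sec against the 100-WCU table above:
print(simulate_second(100, 150))  # -> (100, 50)
```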

Solution: Use on-demand billing:

ItemsTable:
  Type: AWS::DynamoDB::Table
  Properties:
    BillingMode: PAY_PER_REQUEST
    # Auto-scales to any traffic level

Monitoring Concurrency

ConcurrencyAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    Namespace: AWS/Lambda
    MetricName: ConcurrentExecutions
    Statistic: Maximum
    Period: 60
    EvaluationPeriods: 1
    Threshold: 900  # Alert at 90% of the default 1,000 limit
    ComparisonOperator: GreaterThanThreshold

ThrottleAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    Namespace: AWS/Lambda
    MetricName: Throttles
    Statistic: Sum
    Period: 60
    EvaluationPeriods: 1
    Threshold: 1
    ComparisonOperator: GreaterThanOrEqualToThreshold

Rate Limiting with Usage Plans

Use API Gateway usage plans to rate-limit clients:

APIKey:
  Type: AWS::ApiGateway::ApiKey
  Properties:
    Enabled: true

UsagePlan:
  Type: AWS::ApiGateway::UsagePlan
  Properties:
    ApiStages:
      - ApiId: !Ref MyApi
        Stage: prod
    Quota:
      Limit: 10000
      Period: DAY
    Throttle:
      BurstLimit: 500  # Short spikes up to 500 req/sec
      RateLimit: 100   # Sustained average of 100 req/sec

Clients exceeding limits get 429.

Load Testing

Test concurrency before going live:

# Using Apache Bench
ab -n 10000 -c 1000 https://api.example.com/

# 10,000 total requests
# 1,000 concurrent

Observe:

  • Response time under load
  • Any throttling errors
  • Cold start impact
  • Database connections
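For a quick sanity check before pointing ab at a deployed endpoint, you can drive a handler locally with a thread pool and inspect latency under concurrency (a rough sketch; it exercises your code, not API Gateway or Lambda scaling):

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def handler(event, context):
    time.sleep(0.01)  # stand-in for real work (~10ms)
    return {"status": 200}


def load_test(n_requests=200, concurrency=50):
    latencies = []  # list.append is thread-safe in CPython

    def one_call(i):
        start = time.perf_counter()
        handler({"id": i}, None)
        latencies.append(time.perf_counter() - start)

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(one_call, range(n_requests)))

    return {
        "p50_ms": statistics.median(latencies) * 1000,
        "max_ms": max(latencies) * 1000,
    }


print(load_test())
```

Swap the body of `handler` for your real one to see how shared state and connection reuse behave under parallel calls.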

Best Practices

  1. Reserve capacity for critical functions — Prevent throttling
  2. Use RDS Proxy for databases — Avoid connection pool exhaustion
  3. Enable Provisioned Concurrency for APIs — Eliminate cold starts
  4. Use on-demand DynamoDB — Auto-scale throughput
  5. Load test regularly — Know your limits before users do

Hands-On: Scale Test

  1. Deploy function with 10s timeout
  2. Use Apache Bench to load test: 100 concurrent requests
  3. Observe response times and errors
  4. Add Provisioned Concurrency
  5. Re-test, see improvement

Key Takeaway

Concurrency is Lambda's superpower. It scales automatically, but you must understand limits, cold starts, and how to prepare for traffic spikes.