
Concurrency & Cold Starts

Understanding Concurrency

Concurrency = number of Lambda instances running simultaneously.

Concurrency burst example

Lambda scales automatically with load.


Simple Explanation

What it is

Concurrency is how many function instances run at the same time. Cold starts happen when a new instance spins up for the first request.

Why we need it

At scale, you can hit limits or slowdowns. Understanding concurrency and cold starts helps you keep latency low during traffic spikes.

Benefits

  • Predictable performance when traffic surges.
  • Fewer throttles with the right limits and buffering.
  • Better user experience with warmed instances.

Tradeoffs

  • Warm capacity costs money even when idle.
  • Over-provisioning can waste budget.

Real-world examples (architecture only)

  • Sale launch -> Pre-warm functions -> Smooth checkout.
  • Traffic spike -> Queue buffer -> Slow but stable processing.

Reserved vs. Throttled Concurrency

Account-Level Limit

  • Default: 1,000 concurrent executions per region
  • Can be increased via a Service Quotas request

Reserved Concurrency

Reserve capacity for critical functions:

CriticalAPI:
  Type: AWS::Serverless::Function
  Properties:
    ReservedConcurrentExecutions: 500  # Always available to this function

Other functions share remaining capacity.
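The arithmetic behind that sharing is simple: reservations are carved out of the account limit, and every unreserved function competes for the remainder. A minimal sketch using the numbers above:

```python
def unreserved_pool(account_limit, reservations):
    """Concurrency left over for functions without a reservation."""
    return account_limit - sum(reservations)

# Default regional limit of 1,000 with CriticalAPI reserving 500:
print(unreserved_pool(1000, [500]))  # -> 500
```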

Provisioned Concurrency

Keep instances warmed and ready (eliminates cold starts):

CriticalAPI:
  Type: AWS::Serverless::Function
  Properties:
    AutoPublishAlias: live  # SAM attaches provisioned concurrency to an alias
    ProvisionedConcurrencyConfig:
      ProvisionedConcurrentExecutions: 100  # 100 instances kept warm
    ReservedConcurrentExecutions: 500       # Up to 500 total

Cost: roughly $0.015 per GB-hour of provisioned capacity, billed whenever it is configured, even while idle (check current AWS pricing for your region).
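To budget for this, a back-of-envelope estimate helps. A sketch, assuming a us-east-1-style rate of $0.015 per GB-hour (an assumption, not authoritative; check current AWS pricing):

```python
def provisioned_cost_per_month(units, memory_gb,
                               price_per_gb_hour=0.015,  # assumed rate
                               hours_per_month=730):
    """Approximate monthly cost of keeping `units` instances warm."""
    return units * memory_gb * price_per_gb_hour * hours_per_month

# 100 warm instances at 128 MB (0.125 GB):
print(provisioned_cost_per_month(100, 0.125))  # -> 136.875
```

At larger memory sizes the cost scales linearly, so 100 warm 1 GB instances land nearer $1,100/month.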

Cold Starts

First invocation is slow because AWS must:

  1. Allocate compute
  2. Start runtime
  3. Load code
  4. Execute handler

Duration by Runtime

Runtime   Cold Start
Python    300-500ms
Java      1000-2000ms
.NET      500-1000ms

Subsequent invocations: < 50ms
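You can observe this from inside a function: module-level code runs once per cold start, so a module global can distinguish the first invocation on an instance from warm ones. A minimal sketch (not an official AWS API; it just exploits Lambda's container reuse):

```python
import time

_initialized_at = time.time()  # executes once, during the cold start
_is_cold = True


def handler(event, context):
    global _is_cold
    cold = _is_cold
    _is_cold = False  # every later invocation on this instance is warm
    return {
        "cold_start": cold,
        "instance_age_s": round(time.time() - _initialized_at, 3),
    }
```

Logging `cold_start` lets you measure how often users actually hit a cold start in production.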

Minimize Cold Starts

1. Optimize code bundle:

# Check size
du -h function.zip

# Install only needed deps into a build folder
python -m pip install -r requirements.txt -t build

# Zip minimal package
cd build && zip -r ../function.zip . && cd ..

2. Prefer a lightweight runtime:

Interpreted runtimes like Python start far faster than JVM-based ones; Java has the largest cold starts in the table above.

3. Lazy load imports:

# ❌ Imports paid during the cold start
import boto3
import requests

# ✅ Imports deferred out of the cold start
def handler(event, context):
    import boto3
    import requests
    # Use the AWS SDK and HTTP client here

Note that deferring moves the import cost to the first invocation rather than removing it, so it helps most when some code paths never need the heavy imports.

4. Use Provisioned Concurrency:

For latency-sensitive APIs where an extra 100ms matters, use Provisioned Concurrency.

Cost vs. benefit: keeping a small fleet warm runs on the order of $100/month, in exchange for eliminating cold starts on that capacity.

Throttling

When concurrent executions exceed limit, new invocations are throttled.

Sync Invocations (API Gateway)

Throttled invocation returns 429:

Throttling flow
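On the caller's side, a 429 should be retried with exponential backoff and jitter rather than immediately. A sketch with an injected `invoke` callable returning `(status, body)`; the shape is illustrative, not an AWS SDK interface:

```python
import random
import time


def call_with_backoff(invoke, max_attempts=5, base_delay=0.1):
    """Retry `invoke` while it reports throttling (HTTP 429)."""
    for attempt in range(max_attempts):
        status, body = invoke()
        if status != 429:
            return status, body
        # Exponential backoff with jitter: ~0.1s, 0.2s, 0.4s, ...
        time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
    return status, body  # still throttled after all attempts
```

The jitter prevents a thundering herd of clients all retrying on the same schedule.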

Async Invocations (S3, SNS triggers)

Throttled invocations are retried automatically with backoff, for up to 6 hours by default; function errors are retried up to 2 times.

Events that still fail are sent to the DLQ or failure destination, if one is configured.

Handling Bursty Traffic

Before Optimization

Black Friday surge:

  • 100 req/sec → 1,000 req/sec suddenly
  • Cold starts spike latency
  • Some requests throttled
  • 5% error rate

After Optimization

Same surge:

  • Provisioned Concurrency: 100 warm
  • Handles first 100 instantly
  • Auto-scales remaining
  • 0% error rate

Connection Pooling

Reuse connections to avoid setup overhead:

# ❌ Bad: New connection per request
def handler(event, context):
    conn = db.connect()
    result = conn.query("SELECT ...")
    conn.close()
    return result


# ✅ Good: Reuse connection across warm invocations
_conn = None


def get_connection():
    global _conn
    if _conn is None:
        _conn = db.connect()
    return _conn


def handler(event, context):
    conn = get_connection()
    result = conn.query("SELECT ...")
    return result

Connection persists across warm invocations (huge speedup).

Database Connection Limits

High concurrency means many Lambda instances, and each instance opens its own database connection.

Problem

1,000 Lambda instances × 1 connection each = 1,000 DB connections

Most database instances accept only a few hundred concurrent connections (for RDS, max_connections scales with instance memory).

Solution: RDS Proxy

Maintain limited pool of connections to RDS:

1,000 Lambda instances
        ↓
RDS Proxy (pool of 100 connections)
        ↓
Database (a load it can handle)

RDS Proxy multiplexes many Lambda connections over the small pool, acting as a buffer during spikes.
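Conceptually the proxy is a bounded pool: any number of callers, a fixed number of connections. A toy simulation with a semaphore (10 "connections", 100 "Lambda instances"; not how RDS Proxy is implemented, just the queueing behavior it produces):

```python
import threading

POOL_SIZE = 10  # stand-in for the proxy's fixed connection pool
_pool = threading.Semaphore(POOL_SIZE)
_lock = threading.Lock()
in_use = 0
peak = 0


def query(_request_id):
    """Each 'Lambda instance' must borrow a pooled connection."""
    global in_use, peak
    with _pool:  # blocks while all connections are busy
        with _lock:
            in_use += 1
            peak = max(peak, in_use)
        # ... the actual query would run here ...
        with _lock:
            in_use -= 1


threads = [threading.Thread(target=query, args=(i,)) for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak <= POOL_SIZE)  # -> True: concurrency never exceeded the pool
```

Callers beyond the pool size simply wait their turn, which is exactly the buffering the database needs.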

Setup

DBProxy:
  Type: AWS::RDS::DBProxy
  Properties:
    DBProxyName: my-proxy
    EngineFamily: MYSQL
    RoleArn: !GetAtt ProxyRole.Arn
    Auth:
      - AuthScheme: SECRETS
        SecretArn: !Ref DBSecret  # credentials come from Secrets Manager
    VpcSubnetIds: !Ref PrivateSubnets

ProxyTargetGroup:
  Type: AWS::RDS::DBProxyTargetGroup
  Properties:
    DBProxyName: !Ref DBProxy
    TargetGroupName: default
    DBInstanceIdentifiers:
      - !Ref MyDatabase

Lambda:
  Environment:
    Variables:
      DB_HOST: !GetAtt DBProxy.Endpoint

(Auth and VpcSubnetIds are required; DBSecret and PrivateSubnets are placeholders for your own resources.)

DynamoDB Auto-Scaling

DynamoDB can absorb high Lambda concurrency, but only within the table's configured throughput.

If you set provisioned capacity:

ItemsTable:
  Type: AWS::DynamoDB::Table
  Properties:
    ProvisionedThroughput:
      ReadCapacityUnits: 100
      WriteCapacityUnits: 100
    # If traffic > 100 writes/sec, you get throttled
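The effect of provisioned throughput can be modeled as a simple per-second cap (real DynamoDB also grants burst credits, so this sketch is pessimistic):

```python
def simulate_second(provisioned_wcu, write_requests):
    """One-second slice: writes beyond provisioned capacity are throttled."""
    accepted = min(write_requests, provisioned_wcu)
    throttled = write_requests - accepted
    return accepted, throttled


# 150 writes/sec against the 100-WCU table above:
print(simulate_second(100, 150))  # -> (100, 50)
```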

Solution: Use on-demand billing:

ItemsTable:
  Type: AWS::DynamoDB::Table
  Properties:
    BillingMode: PAY_PER_REQUEST
    # Auto-scales to any traffic level

Monitoring Concurrency

ConcurrencyAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    Namespace: AWS/Lambda
    MetricName: ConcurrentExecutions
    Statistic: Maximum
    Period: 60
    EvaluationPeriods: 1
    Threshold: 900  # Alert at 90% of the default 1,000 limit
    ComparisonOperator: GreaterThanThreshold

ThrottleAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    Namespace: AWS/Lambda
    MetricName: Throttles
    Statistic: Sum
    Period: 60
    EvaluationPeriods: 1
    Threshold: 1
    ComparisonOperator: GreaterThanOrEqualToThreshold

Rate Limiting with Usage Plans

Use API Gateway usage plans to rate-limit clients:

APIKey:
  Type: AWS::ApiGateway::ApiKey
  Properties:
    Enabled: true

UsagePlan:
  Type: AWS::ApiGateway::UsagePlan
  Properties:
    ApiStages:
      - ApiId: !Ref MyApi
        Stage: prod
    Quota:
      Limit: 10000
      Period: DAY
    Throttle:
      BurstLimit: 500  # Short spikes up to 500 req/sec
      RateLimit: 100   # Sustained average of 100 req/sec

Clients exceeding limits get 429.

Load Testing

Test concurrency before going live:

# Using Apache Bench
ab -n 10000 -c 1000 https://api.example.com/

# 10,000 total requests
# 1,000 concurrent

Observe:

  • Response time under load
  • Any throttling errors
  • Cold start impact
  • Database connections
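For a quick sanity check before pointing ab at a deployed endpoint, you can drive a handler locally with a thread pool and inspect latency under concurrency (a rough sketch; it exercises your code, not API Gateway or Lambda scaling):

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def handler(event, context):
    time.sleep(0.01)  # stand-in for real work (~10ms)
    return {"status": 200}


def load_test(n_requests=200, concurrency=50):
    latencies = []  # list.append is thread-safe in CPython

    def one_call(i):
        start = time.perf_counter()
        handler({"id": i}, None)
        latencies.append(time.perf_counter() - start)

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(one_call, range(n_requests)))

    return {
        "p50_ms": statistics.median(latencies) * 1000,
        "max_ms": max(latencies) * 1000,
    }


print(load_test())
```

Swap the body of `handler` for your real one to see how shared state and connection reuse behave under parallel calls.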

Best Practices

  1. Reserve capacity for critical functions — Prevent throttling
  2. Use RDS Proxy for databases — Avoid connection pool exhaustion
  3. Enable Provisioned Concurrency for APIs — Eliminate cold starts
  4. Use on-demand DynamoDB — Auto-scale throughput
  5. Load test regularly — Know your limits before users do

Hands-On: Scale Test

  1. Deploy function with 10s timeout
  2. Use Apache Bench to load test: 100 concurrent requests
  3. Observe response times and errors
  4. Add Provisioned Concurrency
  5. Re-test, see improvement

Key Takeaway

Concurrency is Lambda's superpower. It scales automatically, but you must understand limits, cold starts, and how to prepare for traffic spikes.