Handling Bursty Workloads
The Problem
Normal traffic: 10 requests/second
Black Friday: 10,000 requests/second (suddenly)
Traditional servers: get slammed and go down.
Serverless: Scales automatically, but you might hit limits.
Simple Explanation
What it is
Bursty workloads are traffic spikes that appear suddenly, like flash sales or viral events.
Why we need it
If you do not plan for bursts, your system can throttle or fail at the exact moment users care most.
Benefits
- Stable performance during spikes.
- Reduced failures by buffering or rate limiting.
- Predictable user experience even under load.
Tradeoffs
- Extra infrastructure like queues and caches.
- Potential delays if you buffer too much work.
Real-world examples (architecture only)
- Flash sale -> Queue + worker functions -> Steady processing.
- Viral post -> CDN + cache -> Lower backend load.
Burst Capacity
Lambda has burst capacity, but it is neither instant nor unlimited. Historically, large regions like us-east-1 allowed an initial burst of up to 3,000 concurrent executions, then added 500 more per minute. Since late 2023, each function instead scales by up to 1,000 additional concurrent executions every 10 seconds:
Second 0: 1,000 concurrent
Second 10: 2,000 concurrent
Second 20: 3,000 concurrent
...growth continues until the account concurrency limit
Burst capacity buys you time to scale smoothly, but a steep enough spike can still outrun it.
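If you know the spike you expect, you can sanity-check whether this ramp keeps up. A back-of-the-envelope sketch, assuming the late-2023 rate of 1,000 new concurrent executions per 10 seconds and a hypothetical raised account limit:

# Rough estimate of how long Lambda needs to reach a target concurrency.
# Assumes the late-2023 scaling model: +1,000 concurrent executions
# unlocked every 10 seconds per function.
SCALE_STEP = 1_000      # concurrency unlocked per interval
STEP_SECONDS = 10       # interval length in seconds
ACCOUNT_LIMIT = 10_000  # assumption: raised account limit (default is 1,000)

def seconds_to_reach(target: int) -> int:
    """Seconds until the function can run `target` invocations in parallel."""
    if target > ACCOUNT_LIMIT:
        raise ValueError("target exceeds the account concurrency limit")
    # Ceiling division, minus one interval covered by the initial burst.
    intervals = max(0, -(-target // SCALE_STEP) - 1)
    return intervals * STEP_SECONDS

print(seconds_to_reach(9_000))  # 80 -> about 80 seconds to absorb 9,000 concurrent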
Queue-Based Architecture
Instead of direct API calls, use queues:
Spike in traffic
↓
SQS queue buffers requests
↓
Lambda processes at steady rate
↓
No throttling, no 503 errors
↓
Users wait in queue (better than fail)
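On the producer side, the API-facing function only enqueues and acknowledges. A minimal boto3 sketch (the QUEUE_URL environment variable and the 202 response shape are assumptions about your setup):

import json
import os

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["QUEUE_URL"]  # assumption: injected by your template

def handler(event, context):
    # Buffer the work instead of doing it inline; SQS absorbs the spike.
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"body": event.get("body")}),
    )
    # 202 Accepted: queued for processing, not processed yet.
    return {"statusCode": 202, "body": json.dumps({"status": "queued"})}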
Implementation
# API function enqueues and returns immediately (fast response)
Queue:
  Type: AWS::SQS::Queue
  Properties:
    VisibilityTimeout: 300  # longer than the function timeout

ProcessingFunction:
  Type: AWS::Serverless::Function
  # Processes the queue at a steady rate
  Properties:
    Events:
      SQSEvent:
        Type: SQS
        Properties:
          Queue: !GetAtt Queue.Arn
          BatchSize: 10
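On the consumer side, the ProcessingFunction handler receives batches of queue records. A minimal sketch (process_order is a hypothetical placeholder for your real work):

import json

def process_order(payload):
    ...  # hypothetical: whatever your real per-message work is

def handler(event, context):
    # SQS delivers up to BatchSize records per invocation.
    for record in event["Records"]:
        process_order(json.loads(record["body"]))
    # Returning without an exception deletes the whole batch from the queue.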
DynamoDB Throttling
During a burst, DynamoDB will throttle requests that exceed your provisioned capacity:
# ❌ Provisioned: fixed 100 writes/sec, extra writes throttle
ItemsTable:
  Type: AWS::DynamoDB::Table
  Properties:
    ProvisionedThroughput:
      ReadCapacityUnits: 100
      WriteCapacityUnits: 100

# ✅ On-demand: capacity scales with the burst
ItemsTable:
  Type: AWS::DynamoDB::Table
  Properties:
    BillingMode: PAY_PER_REQUEST
Use on-demand for apps expecting bursts. (Even on-demand tables can throttle briefly when traffic more than doubles the previous peak, so clients should still retry.)
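For those brief throttles, let the client absorb the retry. botocore's retry configuration is a real knob for this; the table name and item are illustrative:

import boto3
from botocore.config import Config

# Adaptive retry mode backs off and client-side rate-limits when
# DynamoDB starts returning throttling errors.
dynamodb = boto3.resource(
    "dynamodb",
    config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
)
table = dynamodb.Table("ItemsTable")  # illustrative table name

table.put_item(Item={"pk": "order#123", "total": 42})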
API Rate Limiting
Prevent abuse during bursts:
UsagePlan:
  Type: AWS::ApiGateway::UsagePlan
  Properties:
    ApiStages:
      - ApiId: !Ref MyApi
        Stage: prod  # stage name, adjust to yours
    Throttle:
      BurstLimit: 5000   # absorb short spikes
      RateLimit: 2000    # sustained requests/sec
    Quota:
      Limit: 100000000   # daily request limit
      Period: DAY
Clients that exceed these limits receive HTTP 429 (Too Many Requests).
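Well-behaved clients should back off on 429 instead of retrying immediately. A minimal sketch with the requests library (URL and retry policy are illustrative):

import random
import time

import requests

def get_with_backoff(url, max_retries=5):
    """Retry on 429 with exponential backoff, honoring Retry-After."""
    for attempt in range(max_retries):
        response = requests.get(url)
        if response.status_code != 429:
            return response
        # Honor Retry-After if the server sent one, else back off exponentially.
        retry_after = response.headers.get("Retry-After")
        delay = float(retry_after) if retry_after else 2 ** attempt + random.random()
        time.sleep(delay)
    raise RuntimeError("rate limited: retries exhausted")

resp = get_with_backoff("https://api.example.com/items")  # illustrative URL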
Graceful Degradation
When near capacity, degrade service instead of failing:
CAPACITY_THRESHOLD = 0.8  # start shedding load at 80% of capacity

def handle_request(event):
    metrics = get_lambda_metrics()  # see the sketch below
    utilization = metrics["concurrentExecutions"] / metrics["limit"]
    if utilization > CAPACITY_THRESHOLD:
        # Near capacity: serve only requests marked high priority.
        if "x-priority" not in (event.get("headers") or {}):
            return {"statusCode": 503, "body": "Service busy"}
    return process_normally(event)
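get_lambda_metrics is left abstract above. One way to approximate it with real boto3 calls (CloudWatch's one-minute granularity makes this an estimate, not a live reading):

from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
lambda_client = boto3.client("lambda")

def get_lambda_metrics():
    now = datetime.now(timezone.utc)
    # Peak concurrency over the last five minutes, one-minute resolution.
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName="ConcurrentExecutions",
        StartTime=now - timedelta(minutes=5),
        EndTime=now,
        Period=60,
        Statistics=["Maximum"],
    )
    current = max((p["Maximum"] for p in stats["Datapoints"]), default=0)
    limit = lambda_client.get_account_settings()["AccountLimit"]["ConcurrentExecutions"]
    return {"concurrentExecutions": current, "limit": limit}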
Circuit Breaker
Stop calling overloaded services:
import time

import requests

FAILURE_LIMIT = 5
COOLDOWN_SECONDS = 60

failure_count = 0
circuit_open = False
last_failure_time = 0

def call_external_service():
    global failure_count, circuit_open, last_failure_time
    if circuit_open:
        # Cooldown elapsed? Close the circuit and allow a retry.
        if time.time() - last_failure_time > COOLDOWN_SECONDS:
            circuit_open = False
            failure_count = 0
        else:
            raise Exception("Circuit breaker open")
    try:
        response = requests.get("https://api.external.com/data")
        response.raise_for_status()  # count HTTP errors as failures too
        failure_count = 0  # reset on success
        return response.json()
    except Exception:
        failure_count += 1
        last_failure_time = time.time()
        if failure_count > FAILURE_LIMIT:
            circuit_open = True  # stop calling the failing service
        raise
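One serverless caveat: these globals live inside a single warm container, so every concurrent Lambda instance keeps its own breaker state. That is usually fine, since each instance trips its breaker after a few failures. If you need one shared breaker across all instances, the open/closed flag has to live somewhere external, such as DynamoDB or ElastiCache.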
Caching for Burst Mitigation
Reduce load with aggressive caching:
import time

cache = {}  # module-level: survives across warm invocations
CACHE_TTL = 60  # 1 minute

def get_expensive_data(key):
    cached = cache.get(key)
    if cached and (time.time() - cached["time"]) < CACHE_TTL:
        return cached["value"]  # hit: skip the database entirely
    value = expensive_query(key)  # miss: do the real work
    cache[key] = {"value": value, "time": time.time()}
    return value
During a burst, cache hits avoid database queries entirely. Note the cache is per container: each warm Lambda instance fills its own copy, so use ElastiCache or DAX if you need a cache shared across instances.
Scheduled Scaling
Pre-scale before expected spike:
One supported way is an Application Auto Scaling scheduled action on the function's provisioned concurrency (function name and alias here are illustrative):
ProvisionedConcurrencyTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    ServiceNamespace: lambda
    ScalableDimension: lambda:function:ProvisionedConcurrency
    ResourceId: function:CheckoutFunction:live  # function:<name>:<alias>
    MinCapacity: 10
    MaxCapacity: 1000
    ScheduledActions:
      - ScheduledActionName: black-friday-pre-scale
        Schedule: cron(0 4 * * ? *)  # 11 PM EST = 4 AM UTC, before the midnight sale
        ScalableTargetAction:
          MinCapacity: 1000
Or use EventBridge to schedule the increase and decrease:
# Scale up at 10 PM UTC every Friday
aws events put-rule \
  --name scale-up \
  --schedule-expression "cron(0 22 ? * FRI *)"
# The rule's target is a Lambda that raises provisioned concurrency
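That Lambda can raise the setting directly. put_provisioned_concurrency_config is the real Lambda API; the function name, alias, and target value are illustrative:

import boto3

lambda_client = boto3.client("lambda")

def handler(event, context):
    # Pre-warm 1,000 execution environments before the sale starts.
    lambda_client.put_provisioned_concurrency_config(
        FunctionName="CheckoutFunction",       # illustrative
        Qualifier="live",                      # alias or version
        ProvisionedConcurrentExecutions=1000,  # illustrative target
    )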
Load Testing
Simulate bursts before launch:
# Simulate a burst: 1,000 virtual users making 100 requests each
artillery quick --count 1000 --num 100 \
  https://api.example.com
# Monitor:
# - Response times
# - Error rates
# - DynamoDB throttling
# - Lambda throttling
Fix bottlenecks before real traffic arrives.
Monitoring Bursts
Alert when a burst happens:
BurstDetectionAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    Namespace: AWS/Lambda
    MetricName: Invocations
    Statistic: Sum
    Period: 60
    EvaluationPeriods: 1
    Threshold: 5000  # 5,000/min ≈ 83/sec
    ComparisonOperator: GreaterThanThreshold
    # Add AlarmActions (e.g., an SNS topic) to actually get notified
Real Example: Black Friday
Preparation:
- Load test with 10x expected traffic
- Enable on-demand DynamoDB
- Increase Lambda provisioned concurrency 10x
- Set up queue-based processing
- Enable caching
- Set up alarms
Result:
- Orders per hour: 1,000,000 (vs. 100,000 normal)
- Errors: < 0.01%
- Cost: 5x (acceptable)
- ROI: Massive sales spike = 50x revenue increase
Best Practices
- Use on-demand for databases — Auto-scales with bursts
- Queue requests — Smooth out spikes
- Cache aggressively — Reduce load during burst
- Load test — Know your limits before users do
- Monitor strictly — Alert on anomalies
- Graceful degradation — Better degraded than down
Hands-On: Burst Handling
- Create API with mock 1-second response time
- Load test with normal traffic (100 req/sec)
- Load test with burst (5,000 req/sec)
- Add queue-based processing
- Re-test burst, see improvement
Key Takeaway
Bursts are inevitable. Prepare with queuing, caching, and on-demand scaling. You won't be caught off guard.