Handling Bursty Workloads
The Problem
Normal traffic: 10 requests/second
Black Friday: 10,000 requests/second (suddenly)
Traditional servers: get slammed and go down.
Serverless: Scales automatically, but you might hit limits.
Simple Explanation
What it is
Bursty workloads are traffic spikes that appear suddenly, like flash sales or viral events.
Why we need it
If you do not plan for bursts, your system can throttle or fail at the exact moment users care most.
Benefits
- Stable performance during spikes.
- Reduced failures by buffering or rate limiting.
- Predictable user experience even under load.
Tradeoffs
- Extra infrastructure like queues and caches.
- Potential delays if you buffer too much work.
Real-world examples (architecture only)
- Flash sale -> Queue + worker functions -> Steady processing.
- Viral post -> CDN + cache -> Lower backend load.
Burst Capacity
Lambda has burst capacity, but it is neither instant nor unlimited. Historically, large regions like us-east-1 allowed an initial burst of up to 3,000 concurrent executions, then added 500 more per minute. Since late 2023, each function instead scales by up to 1,000 additional concurrent executions every 10 seconds:
Second 0: 1,000 concurrent
Second 10: 2,000 concurrent
Second 20: 3,000 concurrent
...growth continues until the account concurrency limit
Burst capacity buys you time to scale smoothly, but a steep enough spike can still outrun it.
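If you know the spike you expect, you can sanity-check whether this ramp keeps up. A back-of-the-envelope sketch, assuming the late-2023 rate of 1,000 new concurrent executions per 10 seconds and a hypothetical raised account limit:

# Rough estimate of how long Lambda needs to reach a target concurrency.
# Assumes the late-2023 scaling model: +1,000 concurrent executions
# unlocked every 10 seconds per function.
SCALE_STEP = 1_000      # concurrency unlocked per interval
STEP_SECONDS = 10       # interval length in seconds
ACCOUNT_LIMIT = 10_000  # assumption: raised account limit (default is 1,000)

def seconds_to_reach(target: int) -> int:
    """Seconds until the function can run `target` invocations in parallel."""
    if target > ACCOUNT_LIMIT:
        raise ValueError("target exceeds the account concurrency limit")
    # Ceiling division, minus one interval covered by the initial burst.
    intervals = max(0, -(-target // SCALE_STEP) - 1)
    return intervals * STEP_SECONDS

print(seconds_to_reach(9_000))  # 80 -> about 80 seconds to absorb 9,000 concurrent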
Queue-Based Architecture
Instead of direct API calls, use queues:
Spike in traffic
↓
SQS queue buffers requests
↓
Lambda processes at steady rate
↓
No throttling, no 503 errors
↓
Users wait in queue (better than fail)
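On the producer side, the API-facing function only enqueues and acknowledges. A minimal boto3 sketch (the QUEUE_URL environment variable and the 202 response shape are assumptions about your setup):

import json
import os

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["QUEUE_URL"]  # assumption: injected by your template

def handler(event, context):
    # Buffer the work instead of doing it inline; SQS absorbs the spike.
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"body": event.get("body")}),
    )
    # 202 Accepted: queued for processing, not processed yet.
    return {"statusCode": 202, "body": json.dumps({"status": "queued"})}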
Implementation
# API function enqueues and returns immediately (fast response)
Queue:
  Type: AWS::SQS::Queue
  Properties:
    VisibilityTimeout: 300  # longer than the function timeout

ProcessingFunction:
  Type: AWS::Serverless::Function
  # Processes the queue at a steady rate
  Properties:
    Events:
      SQSEvent:
        Type: SQS
        Properties:
          Queue: !GetAtt Queue.Arn
          BatchSize: 10
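On the consumer side, the ProcessingFunction handler receives batches of queue records. A minimal sketch (process_order is a hypothetical placeholder for your real work):

import json

def process_order(payload):
    ...  # hypothetical: whatever your real per-message work is

def handler(event, context):
    # SQS delivers up to BatchSize records per invocation.
    for record in event["Records"]:
        process_order(json.loads(record["body"]))
    # Returning without an exception deletes the whole batch from the queue.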
DynamoDB Throttling
During a burst, DynamoDB will throttle requests that exceed your provisioned capacity:
# ❌ Provisioned: fixed 100 writes/sec, extra writes throttle
ItemsTable:
  Type: AWS::DynamoDB::Table
  Properties:
    ProvisionedThroughput:
      ReadCapacityUnits: 100
      WriteCapacityUnits: 100

# ✅ On-demand: capacity scales with the burst
ItemsTable:
  Type: AWS::DynamoDB::Table
  Properties:
    BillingMode: PAY_PER_REQUEST
Use on-demand for apps expecting bursts. (Even on-demand tables can throttle briefly when traffic more than doubles the previous peak, so clients should still retry.)
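For those brief throttles, let the client absorb the retry. botocore's retry configuration is a real knob for this; the table name and item are illustrative:

import boto3
from botocore.config import Config

# Adaptive retry mode backs off and client-side rate-limits when
# DynamoDB starts returning throttling errors.
dynamodb = boto3.resource(
    "dynamodb",
    config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
)
table = dynamodb.Table("ItemsTable")  # illustrative table name

table.put_item(Item={"pk": "order#123", "total": 42})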
API Rate Limiting
Prevent abuse during bursts:
UsagePlan:
  Type: AWS::ApiGateway::UsagePlan
  Properties:
    ApiStages:
      - ApiId: !Ref MyApi
        Stage: prod  # stage name, adjust to yours
    Throttle:
      BurstLimit: 5000   # absorb short spikes
      RateLimit: 2000    # sustained requests/sec
    Quota:
      Limit: 100000000   # daily request limit
      Period: DAY
Clients that exceed these limits receive HTTP 429 (Too Many Requests).
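Well-behaved clients should back off on 429 instead of retrying immediately. A minimal sketch with the requests library (URL and retry policy are illustrative):

import random
import time

import requests

def get_with_backoff(url, max_retries=5):
    """Retry on 429 with exponential backoff, honoring Retry-After."""
    for attempt in range(max_retries):
        response = requests.get(url)
        if response.status_code != 429:
            return response
        # Honor Retry-After if the server sent one, else back off exponentially.
        retry_after = response.headers.get("Retry-After")
        delay = float(retry_after) if retry_after else 2 ** attempt + random.random()
        time.sleep(delay)
    raise RuntimeError("rate limited: retries exhausted")

resp = get_with_backoff("https://api.example.com/items")  # illustrative URL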
Graceful Degradation
When near capacity, degrade service instead of failing:
CAPACITY_THRESHOLD = 0.8  # start shedding load at 80% of capacity

def handle_request(event):
    metrics = get_lambda_metrics()  # see the sketch below
    utilization = metrics["concurrentExecutions"] / metrics["limit"]
    if utilization > CAPACITY_THRESHOLD:
        # Near capacity: serve only requests marked high priority.
        if "x-priority" not in (event.get("headers") or {}):
            return {"statusCode": 503, "body": "Service busy"}
    return process_normally(event)
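get_lambda_metrics is left abstract above. One way to approximate it with real boto3 calls (CloudWatch's one-minute granularity makes this an estimate, not a live reading):

from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
lambda_client = boto3.client("lambda")

def get_lambda_metrics():
    now = datetime.now(timezone.utc)
    # Peak concurrency over the last five minutes, one-minute resolution.
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName="ConcurrentExecutions",
        StartTime=now - timedelta(minutes=5),
        EndTime=now,
        Period=60,
        Statistics=["Maximum"],
    )
    current = max((p["Maximum"] for p in stats["Datapoints"]), default=0)
    limit = lambda_client.get_account_settings()["AccountLimit"]["ConcurrentExecutions"]
    return {"concurrentExecutions": current, "limit": limit}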
Circuit Breaker
Stop calling overloaded services:
import time

import requests

FAILURE_LIMIT = 5
COOLDOWN_SECONDS = 60

failure_count = 0
circuit_open = False
last_failure_time = 0

def call_external_service():
    global failure_count, circuit_open, last_failure_time
    if circuit_open:
        # Cooldown elapsed? Close the circuit and allow a retry.
        if time.time() - last_failure_time > COOLDOWN_SECONDS:
            circuit_open = False
            failure_count = 0
        else:
            raise Exception("Circuit breaker open")
    try:
        response = requests.get("https://api.external.com/data")
        response.raise_for_status()  # count HTTP errors as failures too
        failure_count = 0  # reset on success
        return response.json()
    except Exception:
        failure_count += 1
        last_failure_time = time.time()
        if failure_count > FAILURE_LIMIT:
            circuit_open = True  # stop calling the failing service
        raise
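One serverless caveat: these globals live inside a single warm container, so every concurrent Lambda instance keeps its own breaker state. That is usually fine, since each instance trips its breaker after a few failures. If you need one shared breaker across all instances, the open/closed flag has to live somewhere external, such as DynamoDB or ElastiCache.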
Caching for Burst Mitigation
Reduce load with aggressive caching:
import time

cache = {}  # module-level: survives across warm invocations
CACHE_TTL = 60  # 1 minute

def get_expensive_data(key):
    cached = cache.get(key)
    if cached and (time.time() - cached["time"]) < CACHE_TTL:
        return cached["value"]  # hit: skip the database entirely
    value = expensive_query(key)  # miss: do the real work
    cache[key] = {"value": value, "time": time.time()}
    return value
During a burst, cache hits avoid database queries entirely. Note the cache is per container: each warm Lambda instance fills its own copy, so use ElastiCache or DAX if you need a cache shared across instances.
Scheduled Scaling
Pre-scale before expected spike:
One supported way is an Application Auto Scaling scheduled action on the function's provisioned concurrency (function name and alias here are illustrative):
ProvisionedConcurrencyTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    ServiceNamespace: lambda
    ScalableDimension: lambda:function:ProvisionedConcurrency
    ResourceId: function:CheckoutFunction:live  # function:<name>:<alias>
    MinCapacity: 10
    MaxCapacity: 1000
    ScheduledActions:
      - ScheduledActionName: black-friday-pre-scale
        Schedule: cron(0 4 * * ? *)  # 11 PM EST = 4 AM UTC, before the midnight sale
        ScalableTargetAction:
          MinCapacity: 1000
Or use EventBridge to schedule the increase and decrease:
# Scale up at 10 PM UTC every Friday
aws events put-rule \
  --name scale-up \
  --schedule-expression "cron(0 22 ? * FRI *)"
# The rule's target is a Lambda that raises provisioned concurrency
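That Lambda can raise the setting directly. put_provisioned_concurrency_config is the real Lambda API; the function name, alias, and target value are illustrative:

import boto3

lambda_client = boto3.client("lambda")

def handler(event, context):
    # Pre-warm 1,000 execution environments before the sale starts.
    lambda_client.put_provisioned_concurrency_config(
        FunctionName="CheckoutFunction",       # illustrative
        Qualifier="live",                      # alias or version
        ProvisionedConcurrentExecutions=1000,  # illustrative target
    )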
Load Testing
Simulate bursts before launch:
# Simulate a burst: 1,000 virtual users making 100 requests each
artillery quick --count 1000 --num 100 \
  https://api.example.com
# Monitor:
# - Response times
# - Error rates
# - DynamoDB throttling
# - Lambda throttling
Fix bottlenecks before real traffic arrives.
Monitoring Bursts
Alert when a burst happens:
BurstDetectionAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    Namespace: AWS/Lambda
    MetricName: Invocations
    Statistic: Sum
    Period: 60
    EvaluationPeriods: 1
    Threshold: 5000  # 5,000/min ≈ 83/sec
    ComparisonOperator: GreaterThanThreshold
    # Add AlarmActions (e.g., an SNS topic) to actually get notified
Real Example: Black Friday
Preparation:
- Load test with 10x expected traffic
- Enable on-demand DynamoDB
- Increase Lambda provisioned concurrency 10x
- Set up queue-based processing
- Enable caching
- Set up alarms
Result:
- Orders per hour: 1,000,000 (vs. 100,000 normal)
- Errors: < 0.01%
- Cost: 5x (acceptable)
- ROI: Massive sales spike = 50x revenue increase
Best Practices
- Use on-demand for databases — Auto-scales with bursts
- Queue requests — Smooth out spikes
- Cache aggressively — Reduce load during burst
- Load test — Know your limits before users do
- Monitor strictly — Alert on anomalies
- Graceful degradation — Better degraded than down
Hands-On: Burst Handling
- Create API with mock 1-second response time
- Load test with normal traffic (100 req/sec)
- Load test with burst (5,000 req/sec)
- Add queue-based processing
- Re-test burst, see improvement
Key Takeaway
Bursts are inevitable. Prepare with queuing, caching, and on-demand scaling. You won't be caught off guard.