Debugging: AWS & GCP Strategies
Serverless applications are hard to debug—you can't SSH into the runtime, can't attach to a process. Instead, you rely on logs, metrics, and distributed tracing. Both AWS and GCP offer tools and best practices for finding and fixing issues quickly.
Simple Explanation
What it is
Debugging is the process of finding the real cause of a problem and proving the fix works.
Why we need it
In serverless, failures are often spread across services. You need a structured approach so you do not guess and hope.
Benefits
- Faster root-cause discovery when incidents happen.
- Less downtime because fixes are targeted.
- Better confidence when shipping changes.
Tradeoffs
- More tooling to learn (logs, traces, metrics).
- Requires discipline to reproduce issues properly.
Real-world examples (architecture only)
- Bug in payment flow -> Trace shows failure in third-party API.
- Timeout spike -> Logs show slow database query.
Part 1: AWS Debugging
Debugging Strategies
1. Reproduction
Can you reproduce the issue locally?
# Get the exact event from CloudWatch
aws logs get-log-events \
--log-group-name /aws/lambda/myfunction \
--log-stream-name '2026/02/08/[$LATEST]abc123'
# Copy event JSON
# Test locally with SAM
sam local invoke MyFunction -e event.json
2. Isolation
Test components independently:
# Test Lambda handler separately
from index import handler
event = {"id": "123"}
print(handler(event, None))
# Test database connection
from db import connect_db
try:
    connect_db()
    print("Connected")
except Exception as exc:
    print(f"Connection failed: {exc}")
# Test external API
from api import call_api
print(call_api("https://api.example.com"))
3. Add Logging
Methodically add logs to narrow down the issue:
import json
def handler(event, context):
    print("1. Entry - Event:", event)
    data = json.loads(event.get("body") or "{}")
    print("2. Parsed - Data:", data)
    result = database.query(data)
    print("3. Query - Result:", result)
    formatted = format_result(result)
    print("4. Formatted:", formatted)
    return formatted
Gradually remove logs as you understand the flow.
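If you prefer not to remove them, the same checkpoints can use the stdlib logging module and be silenced with a level change instead of deletion. A sketch, with the query and formatting steps from the example above elided:

```python
import json
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)  # raise to INFO or WARNING in production

def handler(event, context):
    logger.debug("1. Entry - event: %s", event)
    data = json.loads(event.get("body") or "{}")
    logger.debug("2. Parsed - data: %s", data)
    # ... query and format as in the example above ...
    return {"statusCode": 200, "body": json.dumps(data)}
```

The `%s` arguments are only formatted when the level is enabled, so the checkpoints cost almost nothing once the level is raised.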
4. Breakpoint Debugging
Full IDE debugging with SAM:
sam local start-api --debug-port 5890
In VS Code, attach debugger:
// .vscode/launch.json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Attach to SAM (Python)",
      "type": "python",
      "request": "attach",
      "connect": {
        "host": "localhost",
        "port": 5890
      }
    }
  ]
}
Common Issues & Fixes
Lambda Timeout
Function takes too long:
Symptoms:
- "Task timed out after X seconds"
- Incomplete logs
Debug:
import time

def handler(event, context):
    start_time = time.time()
    print("Time remaining (ms):", context.get_remaining_time_in_millis())
    # Your code
    print("Handler duration:", int((time.time() - start_time) * 1000), "ms")
Fix:
- Increase timeout in Lambda config
- Optimize slow operations
- Use async/await properly
- Parallelize operations
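Independent I/O-bound calls can be parallelized with a thread pool; `fetch_one` here is a hypothetical stand-in for a slow downstream call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_one(item_id):
    # Hypothetical slow downstream call (API, database, etc.)
    time.sleep(0.1)
    return {"id": item_id}

def fetch_all(ids):
    # Run the independent lookups concurrently: the batch takes
    # roughly one call's latency instead of the sum of all of them
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(fetch_one, ids))
```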
Out of Memory
Function uses too much memory:
Symptoms:
- "Process exited before completing request"
- Sudden termination
Debug:
import resource
usage_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("Memory usage (KB):", usage_kb)
Fix:
- Stream large files instead of loading in memory
- Release unused references
- Increase Lambda memory allocation
- Use appropriate data structures
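The streaming point can be illustrated without any cloud SDK: process a file in fixed-size chunks so peak memory stays bounded. With S3, the equivalent is iterating the object's body in chunks rather than calling `read()` once.

```python
import hashlib

def sha256_streaming(path, chunk_size=1024 * 1024):
    # Reads at most chunk_size bytes at a time, so memory use is
    # bounded no matter how large the file is
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```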
Permission Denied
IAM role lacks permissions:
Symptoms:
- "User: arn:aws:iam::... is not authorized to perform: s3:GetObject"
Debug: Check Lambda execution role:
aws iam get-role-policy \
--role-name MyLambdaRole \
--policy-name S3Access
Fix: Add permission to role:
{
"Effect": "Allow",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::my-bucket/*"
}
Cold Start Delays
First invocation is slow:
Symptoms:
- First request takes 1-2 seconds
- Subsequent requests are fast
Debug:
import time

start_time = time.time()  # module scope: set once, at cold start

def handler(event, context):
    duration_ms = int((time.time() - start_time) * 1000)
    print("Duration since init:", duration_ms, "ms")
# Cold invocations also report an "Init Duration" in the CloudWatch REPORT line
Fix:
- Optimize code bundle size (remove unused dependencies)
- Prefer lightweight runtimes (Python over Java)
- Reduce VPC overhead (if using VPC)
- Set provisioned concurrency for critical functions
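Keeping module-scope initialization small shortens the cold start itself; whatever setup you do keep should live at module scope so warm invocations reuse it instead of repeating it. A sketch, where `connect` is a hypothetical expensive setup call:

```python
import time

def connect():
    # Hypothetical expensive setup (SDK client, connection pool)
    time.sleep(0.2)
    return {"created_at": time.time()}

# Runs once, during the cold start; warm invocations reuse the result
CLIENT = connect()

def handler(event, context):
    # Warm invocations skip the 200 ms setup entirely
    return {"client_age_ms": int((time.time() - CLIENT["created_at"]) * 1000)}
```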
DynamoDB Not Found
Cannot access DynamoDB table:
Symptoms:
- "ResourceNotFoundException"
- "Requested resource not found"
Debug:
# Verify table exists
aws dynamodb describe-table --table-name Items
# Verify Lambda can access it (IAM check)
# Verify table name matches
Fix:
- Check table name spelling (case-sensitive)
- Verify Lambda IAM role has dynamodb:GetItem, etc.
- Check table is in same region as Lambda
Debugging Tools
AWS X-Ray
[Lesson 5] covers this in detail. Enables distributed tracing.
CloudWatch Logs Insights
Query logs to find patterns:
fields @timestamp, @message, @duration
| filter @message like /ERROR/
| stats count() as errors, avg(@duration) by @logStream
AWS Lambda Insights
CloudWatch extension for performance:
- Add extension to Lambda
- View metrics: CPU, memory allocation, duration
- Identify performance bottlenecks
SAM Local Debugging
Debug locally before deployment:
# Run function locally with event
sam local invoke MyFunction -e event.json
# Start API locally and expose a debugger port
sam local start-api --debug-port 5890
# Attach IDE debugger to port 5890
Remote Debugging
Debug production issues with temporary logging:
import json
import os
DEBUG = os.environ.get("DEBUG") == "true"
def handler(event, context):
    if DEBUG:
        print("Full event:", json.dumps(event, indent=2))
        # Caution: env vars often hold secrets; never leave this enabled
        print("All env vars:", dict(os.environ))
    # Your code
Enable for specific invocation:
aws lambda update-function-configuration \
--function-name MyFunction \
--environment Variables={DEBUG=true}
# Test
curl https://api.example.com/test
# Disable
aws lambda update-function-configuration \
--function-name MyFunction \
--environment Variables={DEBUG=false}
Part 2: GCP Debugging
GCP Debugging Strategies
1. Reproduction
Get the exact triggering event and test locally:
# View function execution logs
gcloud functions logs read my-function --limit 50
# Export specific log entries
gcloud logging read 'resource.type="cloud_function"' \
--format json > logs.json
# Test locally with Functions Framework
functions-framework --target myFunction --debug
2. Isolation
Test components independently:
from db import connect_firestore
from api import call_api
try:
    connect_firestore()
    print("Connected")
except Exception as exc:
    print(f"Connection failed: {exc}")
print(call_api("https://api.example.com"))
3. Cloud Debugger
Note: Google's Cloud Debugger service was deprecated and shut down in 2023; the open-source Snapshot Debugger is its successor. The original flow attached a real-time debugger to your function:
import googleclouddebugger

googleclouddebugger.enable(
    service="my-function",
    version="1.0.0",
)

def my_function(request):
    print("Request:", request.get_json(silent=True))
    # Set breakpoints in Cloud Console
    return ("Hello", 200)
In Cloud Console you could browse source code, set breakpoints, and inspect variables without stopping execution.
4. Structured Logging for Debugging
Use JSON-formatted logs for powerful filtering:
import json
import uuid
import functions_framework
from google.cloud import logging as cloud_logging
logging_client = cloud_logging.Client()
log = logging_client.logger("debug-logs")
@functions_framework.http
def debug_demo(request):
    request_id = request.headers.get("x-request-id", str(uuid.uuid4()))
    log.log_struct({
        "requestId": request_id,
        "message": "Request received",
        "method": request.method,
        "path": request.path,
    }, severity="DEBUG")
    try:
        data = request.get_json(silent=True) or {}
        log.log_struct({
            "requestId": request_id,
            "message": "Body parsed",
            "data": data,
        }, severity="DEBUG")
        result = process_data(data)
        log.log_struct({
            "requestId": request_id,
            "message": "Processing complete",
            "result": result,
        }, severity="INFO")
        return (json.dumps(result), 200)
    except Exception as exc:
        log.log_struct({
            "requestId": request_id,
            "message": "Error occurred",
            "error": str(exc),
        }, severity="ERROR")
        return (json.dumps({"error": str(exc)}), 500)
Cloud Logging Log Explorer
Filter and search logs in Cloud Console:
resource.type="cloud_function"
resource.labels.function_name="my-function"
severity="ERROR"
jsonPayload.requestId="abc-123"
Cloud Profiler
Identify performance bottlenecks:
import googlecloudprofiler

googlecloudprofiler.start(
    service="my-function",
    service_version="1.0",
)
Generate CPU and memory profiles automatically.
Common Issues & Fixes
Issue 1: Timeout
Function takes too long:
AWS Symptoms:
- "Task timed out after X seconds"
- CloudWatch shows incomplete logs
AWS Debug:
import time

def handler(event, context):
    start_time = time.time()
    print("Time remaining (ms):", context.get_remaining_time_in_millis())
    # Your slow code
    print("Total time:", int((time.time() - start_time) * 1000), "ms")
GCP Symptoms:
- "Error: function execution timeout"
- Cloud Logging shows incomplete traces
GCP Debug:
import time
from datetime import datetime, timezone

def my_function(request):
    start = time.time()
    print("Starting at", datetime.now(timezone.utc).isoformat())
    # Your slow code
    print("Completed in", int((time.time() - start) * 1000), "ms")
    return ("Done", 200)
Fix (Both):
- Increase timeout setting
- Optimize slow database queries
- Use async operations properly
- Parallelize independent tasks
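"Use async operations properly" and "parallelize independent tasks" combine naturally with asyncio.gather; `get_item` here is a hypothetical non-blocking downstream call:

```python
import asyncio

async def get_item(item_id):
    # Hypothetical non-blocking downstream call
    await asyncio.sleep(0.1)
    return {"id": item_id}

async def get_items(ids):
    # gather() awaits the calls concurrently: total latency is
    # roughly one call's, not the sum of all of them
    return await asyncio.gather(*(get_item(i) for i in ids))

results = asyncio.run(get_items(["a", "b", "c"]))
```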
Issue 2: Out of Memory
AWS Symptoms:
- "Process exited before completing request"
- Sudden termination in logs
AWS Debug:
import resource
usage_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("Memory (KB):", usage_kb)
GCP Symptoms:
- "Error: resource exhausted"
- Function crashes with no error message
GCP Debug:
import resource
import threading
import time

def sample_memory(seconds=10):
    # Log peak RSS once per second from a background thread,
    # instead of blocking the function with an infinite loop
    for _ in range(seconds):
        usage_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print(f"Memory: {round(usage_kb / 1024, 2)} MB")
        time.sleep(1)

threading.Thread(target=sample_memory, daemon=True).start()
Fix (Both):
- Stream large files instead of loading entire file in memory
- Release unused object references
- Increase memory allocation
- Use appropriate data structures (Set vs Array for millions of items)
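The data-structure point is easy to demonstrate: membership tests against a set are O(1), while a list scan is O(n), which matters once you probe it many times per invocation. A small timing sketch:

```python
import time

ids_list = [str(i) for i in range(100_000)]
ids_set = set(ids_list)

def time_lookups(container, probe, repeats=100):
    start = time.perf_counter()
    for _ in range(repeats):
        probe in container  # O(n) scan for a list, O(1) hash for a set
    return time.perf_counter() - start

slow = time_lookups(ids_list, "99999")  # worst case: last element
fast = time_lookups(ids_set, "99999")
print(f"list: {slow:.4f}s  set: {fast:.6f}s")
```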
Issue 3: Permission Denied
AWS Symptoms:
- "User: arn:aws:iam::... is not authorized to perform: s3:GetObject"
AWS Debug:
aws iam get-role-policy --role-name MyLambdaRole --policy-name S3Access
GCP Symptoms:
- "Error: permission denied on resource"
- "Cloud IAM says you don't have access to..."
GCP Debug:
gcloud functions describe my-function --format=json | grep serviceAccountEmail
# List the roles granted to that service account on the project
gcloud projects get-iam-policy PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:SERVICE_ACCOUNT"
Fix (AWS):
{
"Effect": "Allow",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::my-bucket/*"
}
Fix (GCP):
gcloud projects add-iam-policy-binding PROJECT_ID \
--member serviceAccount:SA_EMAIL \
--role roles/storage.objectViewer
Issue 4: Cold Start Delays
AWS Symptoms:
- First request takes 1-2 seconds
- CloudWatch shows high Duration for first invocation
GCP Symptoms:
- First request takes 0.5-2 seconds
- Subsequent requests are fast
Debug (Both): Check logs for first vs second invocation. Cold starts are normal but can be optimized.
Fix (Both):
- Reduce code bundle size (remove unused dependencies)
- Use lightweight runtimes (e.g. Python over Java)
- Minimize initialization code outside handler
- AWS: Use provisioned concurrency
- GCP: Use min-instances setting
Issue 5: Firestore/DynamoDB Not Found
AWS Symptoms:
- "ResourceNotFoundException: Requested resource not found"
AWS Debug:
aws dynamodb describe-table --table-name Items
aws iam get-role-policy --role-name LambdaRole --policy-name DynamoDB
GCP Symptoms:
- "Error: failed to get document"
- "PERMISSION_DENIED: permission denied"
GCP Debug:
gcloud firestore databases list
gcloud projects get-iam-policy PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:FUNCTION_SERVICE_ACCOUNT"
Fix (AWS):
- Check table name (case-sensitive)
- Verify Lambda IAM role has dynamodb:* permissions
- Verify table exists in same region
Fix (GCP):
- Check Firestore database is initialized
- Verify service account has the roles/datastore.user role
- Check database location is correct
AWS vs. GCP Debugging Tools
| Tool/Capability | AWS | GCP |
|---|---|---|
| Local testing | SAM CLI, docker-lambda | Functions Framework |
| IDE debugging | VS Code + SAM, IntelliJ | Cloud Code (VS Code) |
| Production debugging | X-Ray (distributed tracing) | Cloud Debugger, Cloud Profiler |
| Log querying | CloudWatch Logs Insights | Cloud Logging Log Explorer |
| Performance profiling | Lambda Insights | Cloud Profiler |
| Live breakpoints | VS Code + SAM (local only) | Cloud Debugger (deprecated) |
| Error tracking | CloudWatch + third-party | Error Reporting |
| Memory/CPU graphs | CloudWatch metrics | Cloud Monitoring |
Key Differences
- Local debugging: SAM offers more mature tooling; GCP uses Functions Framework (simpler)
- Remote debugging: AWS uses X-Ray for tracing; GCP uses Cloud Debugger for live breakpoints
- Log analysis: CloudWatch Insights uses custom query language; Cloud Logging uses simpler filter syntax
- Performance: AWS has Lambda Insights extension; GCP has Cloud Profiler
Best Practices (Both Platforms)
- Log liberally in development — Be conservative in production
- Use log levels — DEBUG, INFO, WARN, ERROR (structured logging)
- Include context — Request ID, user ID, timestamps in every log
- Test error paths — Don't just test happy path
- Keep previous versions — For quick rollback during intensive debugging
- Correlate traces — Use request/trace IDs to follow requests across services
- Monitor memory — Set alerts for high memory usage trends
- Profile in production — GCP Profiler and AWS X-Ray work on real traffic
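A minimal stdlib sketch of the "include context" and "correlate traces" practices: emit one JSON object per log line carrying a request ID, which either platform's log query tools can then filter on.

```python
import json
import time
import uuid

def log_event(request_id, severity, message, **fields):
    # One JSON object per line: queryable by requestId in
    # CloudWatch Logs Insights or the Cloud Logging Log Explorer
    entry = {
        "timestamp": time.time(),
        "severity": severity,
        "requestId": request_id,
        "message": message,
        **fields,
    }
    print(json.dumps(entry))
    return entry

request_id = str(uuid.uuid4())
log_event(request_id, "INFO", "Request received", path="/items")
log_event(request_id, "ERROR", "Lookup failed", itemId="123")
```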
Hands-On: Multi-Cloud Debugging
AWS Lambda
- Create function with intentional bug:
aws lambda create-function \
  --function-name debug-demo \
  --runtime python3.12 \
  --handler lambda_function.handler \
  --role arn:aws:iam::ACCOUNT_ID:role/LambdaExecutionRole \
  --zip-file fileb://function.zip
- Invoke and check logs:
aws lambda invoke \
  --function-name debug-demo \
  --cli-binary-format raw-in-base64-out \
  --payload '{"id": "123"}' \
  response.json
aws logs tail /aws/lambda/debug-demo --follow
- Add debug logging and redeploy:
aws lambda update-function-code \
--function-name debug-demo \
--zip-file fileb://function-v2.zip
Google Cloud
- Deploy function:
gcloud functions deploy debug-demo \
--runtime python312 \
--trigger-http \
--allow-unauthenticated
- View logs:
gcloud functions logs read debug-demo --limit 50
- Query with Cloud Logging:
gcloud logging read \
'resource.type="cloud_function" AND resource.labels.function_name="debug-demo"' \
--limit 50
Key Takeaway
Debugging serverless requires different tools and mindset than traditional development. Structured logging is your best friend—add request IDs, log at every decision point, and use your platform's querying tools to find patterns. Local simulation helps catch issues early; production debugging relies on logs, metrics, and trace IDs.