Level 3: Operate
Purpose
Manage and monitor serverless applications in production across AWS and Google Cloud. Build observability and resilience into them, and keep their costs under control.
Simple Explanation
What it is
This level teaches you how to keep a serverless system healthy in production. You will learn how to see what is happening, detect problems early, and fix issues without guessing.
Why we need it
Serverless hides servers, but it does not remove production problems. Logs, metrics, and traces are how you tell whether users are being served well or the system is failing.
Benefits
- Clear visibility into errors, latency, and traffic.
- Faster recovery because you can diagnose issues quickly.
- Lower cost when you spot waste early.
Tradeoffs
- More tools to learn and configure.
- Ongoing maintenance for dashboards and alerts.
Real-world examples (architecture only)
- API errors spike -> Alert -> Rollback -> Recovery.
- High latency -> Trace -> Identify slow database query.
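The first flow above (errors spike, then an alert fires) can be sketched in a few lines. This is only an illustration of the idea, not how CloudWatch Alarms or Cloud Monitoring alerting policies are implemented. The class name `ErrorRateAlarm` and the `window`, `threshold`, and `min_samples` values are hypothetical choices for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ErrorRateAlarm:
    """Fires when the error rate over the most recent requests exceeds a threshold."""

    window: int = 100        # hypothetical: how many recent requests to look at
    threshold: float = 0.05  # hypothetical: fire above a 5% error rate
    min_samples: int = 20    # do not fire on a handful of requests
    _outcomes: deque = field(default_factory=deque, init=False, repr=False)

    def record(self, is_error: bool) -> bool:
        """Record one request outcome and return True if the alarm is firing."""
        self._outcomes.append(is_error)
        if len(self._outcomes) > self.window:
            self._outcomes.popleft()  # drop the oldest outcome
        return self.firing()

    def error_rate(self) -> float:
        """Fraction of failed requests in the current window."""
        if not self._outcomes:
            return 0.0
        return sum(self._outcomes) / len(self._outcomes)

    def firing(self) -> bool:
        """True when there are enough samples and the rate is above the threshold."""
        enough = len(self._outcomes) >= self.min_samples
        return enough and self.error_rate() > self.threshold


if __name__ == "__main__":
    alarm = ErrorRateAlarm(window=50, threshold=0.10, min_samples=10)
    for _ in range(40):
        alarm.record(False)      # healthy traffic
    for _ in range(10):
        alarm.record(True)       # error spike
    if alarm.firing():
        print("alert: error rate", alarm.error_rate(), "-> consider rollback")
```

The `min_samples` guard reflects a common design choice: a single failed request out of two should not page anyone, so the alarm waits until the window holds enough data to be meaningful.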
Who It's For
- Developers running production serverless
- DevOps/SREs supporting serverless systems
- Prerequisites: Completed Level 2: Build
What You Will Build
- Multi-cloud logging and monitoring
- Alerting and dashboards (AWS & GCP)
- Debugging strategies
- Error tracking and resilience
- Cost monitoring across clouds
Lesson Agenda
- Logging Across Clouds — CloudWatch vs. Cloud Logging
- Monitoring & Alerts — Metrics and dashboards (AWS & GCP)
- Debugging Techniques — Find and fix production issues
- Error Handling — Resilience patterns
- Tracing & Observability — X-Ray vs. Cloud Trace
- Cost Optimization — Multi-cloud cost tracking
AWS ↔ GCP Service Map
| Observability Layer | AWS | Google Cloud |
|---|---|---|
| Logging | CloudWatch Logs | Cloud Logging |
| Metrics | CloudWatch Metrics | Cloud Monitoring |
| Dashboards | CloudWatch Dashboards | Cloud Monitoring Dashboards |
| Alarms | CloudWatch Alarms | Alerting Policies |
| Distributed Tracing | X-Ray | Cloud Trace |
| Profiling | CodeGuru Profiler (Lambda Insights for runtime metrics) | Cloud Profiler |
| Error Tracking | CloudWatch Logs Insights | Error Reporting |
| Log Analysis | Logs Insights (SQL-like) | Log Analytics (SQL) |
| Cost Tracking | Cost Explorer / CloudWatch billing metrics | Cost Management / Cloud Monitoring |
| APM Integration | Datadog, New Relic, Splunk | Datadog, New Relic, Cloud APM |
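As a small illustration of the Logging row in the map above, one log format can often serve both clouds. The sketch below writes one JSON object per line to stdout. It assumes the function runs on a platform that forwards stdout to the logging service (for example AWS Lambda to CloudWatch Logs, or Cloud Run and Cloud Functions to Cloud Logging), and that the service parses JSON lines into queryable fields. The helper name `log_event` and the example field names are hypothetical.

```python
import json
import sys


def log_event(severity: str, message: str, **fields) -> str:
    """Write one structured log line to stdout and return it.

    `severity` and `message` are the field names Cloud Logging treats
    specially in JSON output; on AWS they are ordinary JSON fields that
    can be filtered on in Logs Insights queries.
    """
    entry = {"severity": severity, "message": message, **fields}
    # default=str keeps the logger from raising on values such as datetimes.
    line = json.dumps(entry, default=str)
    print(line, file=sys.stdout, flush=True)
    return line


if __name__ == "__main__":
    # Hypothetical fields; use whatever identifies a request in your system.
    log_event("ERROR", "payment failed", order_id="A-1", latency_ms=412)
```

Keeping each event on a single line matters here: both services treat a line as one log entry, so a multi-line JSON dump would be split into fragments that cannot be parsed.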
Duration: 2 weeks
Time per lesson: 30–40 minutes
Focus: Observability, resilience, multi-cloud
Next level: Level 4: Scale (global deployments)