AWS Disaster Recovery for CloudOps
Plan and operate AWS disaster recovery with RTO/RPO, backups, pilot light, warm standby, multi-site active-active, AWS Backup, and Route 53 failover.
What you'll learn
- Explain RTO and RPO in practical operational terms
- Compare backup and restore, pilot light, warm standby, and active-active
- Use AWS Backup for centralized backup policies
- Design DR runbooks and test failover safely
RTO and RPO
| Term | Question it answers |
|---|---|
| RTO | How long can the service be down? |
| RPO | How much data can we afford to lose? |
Shorter RTO/RPO usually means more running infrastructure, more replication, and higher cost.
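To make the two terms concrete, the following minimal sketch compares one incident (or game day) against agreed targets. The class and function names, the 4-hour RTO, the 1-hour RPO, and the timestamps are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryTargets:
    rto: timedelta  # longest acceptable outage
    rpo: timedelta  # largest acceptable window of lost data


def evaluate_recovery(targets, outage_start, service_restored, last_recovery_point):
    """Compare one incident or game day against the agreed targets."""
    downtime = service_restored - outage_start
    # Data written after the last recovery point is lost when the outage begins.
    data_loss_window = outage_start - last_recovery_point
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= targets.rto,
        "rpo_met": data_loss_window <= targets.rpo,
    }


# Hypothetical targets and timestamps, for illustration only.
targets = RecoveryTargets(rto=timedelta(hours=4), rpo=timedelta(hours=1))
result = evaluate_recovery(
    targets,
    outage_start=datetime(2024, 1, 10, 9, 0),
    service_restored=datetime(2024, 1, 10, 11, 30),
    last_recovery_point=datetime(2024, 1, 10, 7, 15),
)
print(result["rto_met"], result["rpo_met"])  # True False
```

In this example the service came back within the RTO (2 h 30 min of downtime), but the last recovery point was 1 h 45 min old, so the RPO was missed. Closing that gap would mean more frequent backups or continuous replication.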
DR Strategies
| Strategy | Cost | Recovery speed | Pattern |
|---|---|---|---|
| Backup and restore | Low | Slow | Restore infrastructure and data after failure |
| Pilot light | Low-medium | Medium | Core data layer is ready; app layer scales up during DR |
| Warm standby | Medium | Fast | Smaller full environment already running |
| Multi-site active-active | High | Fastest | Traffic served from multiple Regions continuously |
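One way to read the table is "pick the cheapest strategy that is still fast enough." The sketch below encodes the table's Cost and Recovery speed columns as ordinal ranks. The ranks are an assumption made for illustration, not AWS-published numbers, and `cheapest_strategy` is a hypothetical helper.

```python
# (strategy, cost rank, recovery speed rank); a higher rank means
# more cost or faster recovery. Ranks mirror the order of the table rows.
STRATEGIES = [
    ("Backup and restore", 1, 1),
    ("Pilot light", 2, 2),
    ("Warm standby", 3, 3),
    ("Multi-site active-active", 4, 4),
]

SPEED_RANK = {"Slow": 1, "Medium": 2, "Fast": 3, "Fastest": 4}


def cheapest_strategy(required_speed: str) -> str:
    """Return the lowest-cost strategy at least as fast as required_speed."""
    needed = SPEED_RANK[required_speed]
    candidates = [s for s in STRATEGIES if s[2] >= needed]
    return min(candidates, key=lambda s: s[1])[0]


print(cheapest_strategy("Fast"))  # Warm standby
```

A real decision would replace the ranks with measured recovery times from tests and actual monthly cost per strategy.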
Service Building Blocks
- S3 Cross-Region Replication for object copies.
- EBS snapshot copy for volume recovery.
- RDS automated backups and cross-Region snapshot copies.
- Aurora Global Database for low-lag cross-Region replication and fast promotion of a secondary Region.
- AWS Backup for centralized policies and vaults.
- Route 53 failover routing for DNS-level traffic shift.
- CloudFormation StackSets for repeatable baseline infrastructure.
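As an illustration of the Route 53 building block, this sketch builds a primary/secondary pair of failover alias records in the shape that the Route 53 `ChangeResourceRecordSets` API accepts as a change batch. The domain name, load balancer DNS names, hosted zone IDs, and health check ID are placeholders, and the helper names are hypothetical.

```python
def failover_record(name, role, alias_dns_name, alias_zone_id, health_check_id=None):
    """Build one alias record set for Route 53 failover routing."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": role.lower(),  # must be unique per record in the set
        "Failover": role,
        "AliasTarget": {
            "HostedZoneId": alias_zone_id,
            "DNSName": alias_dns_name,
            "EvaluateTargetHealth": True,
        },
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return record


def failover_change_batch(primary, secondary):
    """Wrap both records in an UPSERT change batch."""
    return {
        "Comment": "Failover pair for DR",
        "Changes": [
            {"Action": "UPSERT", "ResourceRecordSet": record}
            for record in (primary, secondary)
        ],
    }


# Placeholder values; substitute real names, zone IDs, and health check IDs.
batch = failover_change_batch(
    failover_record("app.example.com.", "PRIMARY",
                    "primary-alb.example.com.", "ZPRIMARYEXAMPLE",
                    health_check_id="hc-primary-example"),
    failover_record("app.example.com.", "SECONDARY",
                    "dr-alb.example.com.", "ZDREXAMPLE"),
)
print([c["ResourceRecordSet"]["Failover"] for c in batch["Changes"]])
```

With boto3, a batch like this could be passed as the `ChangeBatch` argument of the Route 53 client's `change_resource_record_sets` call, together with the `HostedZoneId` of the zone that holds the record.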
AWS Backup
AWS Backup centralizes backup policies across supported services.
Core pieces:
- Backup plan.
- Backup rule and schedule.
- Backup vault.
- Resource assignment by tags or ARNs.
- Lifecycle to cold storage where supported.
- Cross-account or cross-region copy.
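The lifecycle and copy pieces interact: AWS Backup documents that a backup moved to cold storage must stay there for at least 90 days, so the retention period has to exceed the cold-storage transition by that margin. The sketch below builds a rule dictionary in the shape used by the `CreateBackupPlan` API and checks that constraint first. The helper names, vault names, and ARN are hypothetical, and the 90-day figure is worth re-checking against current AWS documentation.

```python
MIN_DAYS_IN_COLD_STORAGE = 90  # documented AWS Backup minimum; verify before relying on it


def build_lifecycle(delete_after_days, cold_after_days=None):
    """Return a Lifecycle dict, rejecting retention that is too short for cold storage."""
    lifecycle = {"DeleteAfterDays": delete_after_days}
    if cold_after_days is not None:
        if delete_after_days < cold_after_days + MIN_DAYS_IN_COLD_STORAGE:
            raise ValueError(
                "DeleteAfterDays must be at least "
                f"{MIN_DAYS_IN_COLD_STORAGE} days after MoveToColdStorageAfterDays"
            )
        lifecycle["MoveToColdStorageAfterDays"] = cold_after_days
    return lifecycle


def build_rule(rule_name, vault_name, schedule, lifecycle, copy_to_vault_arn=None):
    """Return one backup rule, optionally copying to a vault in another Region or account."""
    rule = {
        "RuleName": rule_name,
        "TargetBackupVaultName": vault_name,
        "ScheduleExpression": schedule,
        "Lifecycle": lifecycle,
    }
    if copy_to_vault_arn:
        rule["CopyActions"] = [
            {"DestinationBackupVaultArn": copy_to_vault_arn, "Lifecycle": lifecycle}
        ]
    return rule


# Placeholder vault name and ARN, for illustration only.
rule = build_rule(
    "monthly-archive",
    "Default",
    "cron(0 5 1 * ? *)",  # 05:00 UTC on the first day of each month
    build_lifecycle(delete_after_days=365, cold_after_days=30),
    copy_to_vault_arn="arn:aws:backup:us-west-2:111122223333:backup-vault:dr-vault",
)
print(rule["Lifecycle"])
```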
Hands-on: Create a Basic AWS Backup Plan
Goal: Back up tagged resources daily and retain backups for 35 days.
- Tag resources to protect with `Backup = daily`.
- Open AWS Backup > Backup plans.
- Choose Create backup plan.
- Start from a new plan and name it `daily-cloudops-backup`.
- Add a rule with daily frequency, an off-peak backup window, and deletion after 35 days.
- Assign resources by tag with key `Backup` and value `daily`.
- Create the plan.
- Start an on-demand backup to validate permissions.
- Confirm the recovery point appears in the vault.
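The console steps above can also be sketched in code. The functions below build the plan and the tag-based selection as dictionaries in the shape used by the AWS Backup `CreateBackupPlan` and `CreateBackupSelection` APIs, then hand them to a client object passed in by the caller. The rule name, the `Default` vault, the 05:00 UTC schedule, the window lengths, and the IAM role ARN are assumptions for illustration.

```python
def daily_backup_plan(plan_name="daily-cloudops-backup", vault_name="Default"):
    """Daily rule with 35-day retention, matching the hands-on goal."""
    return {
        "BackupPlanName": plan_name,
        "Rules": [
            {
                "RuleName": "daily-35-day-retention",
                "TargetBackupVaultName": vault_name,  # assumes this vault exists
                "ScheduleExpression": "cron(0 5 ? * * *)",  # 05:00 UTC every day
                "StartWindowMinutes": 60,
                "CompletionWindowMinutes": 180,
                "Lifecycle": {"DeleteAfterDays": 35},
            }
        ],
    }


def tag_selection(iam_role_arn, key="Backup", value="daily"):
    """Select every supported resource tagged key=value."""
    return {
        "SelectionName": f"{key}-{value}",
        "IamRoleArn": iam_role_arn,
        "ListOfTags": [
            {
                "ConditionType": "STRINGEQUALS",
                "ConditionKey": key,
                "ConditionValue": value,
            }
        ],
    }


def create_plan(backup_client, iam_role_arn):
    """Create the plan, attach the tag selection, and return the plan ID."""
    plan = backup_client.create_backup_plan(BackupPlan=daily_backup_plan())
    plan_id = plan["BackupPlanId"]
    backup_client.create_backup_selection(
        BackupPlanId=plan_id,
        BackupSelection=tag_selection(iam_role_arn),
    )
    return plan_id
```

Assuming boto3 is installed and credentials are configured, `backup_client` would be `boto3.client("backup")`. Passing the client in keeps the sketch testable with a stub and leaves Region and credential choices to the caller.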
Hands-on: S3 Cross-Region DR Bucket
- Create a source bucket in Region A and a destination bucket in Region B.
- Enable versioning on both buckets.
- Configure replication from source to destination.
- Choose or create the replication IAM role.
- Upload a new object to the source bucket.
- Confirm it appears in the destination bucket.
- Test recovery by reading the object from the destination Region.
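A scripted version of the same steps might look like the sketch below. It enables versioning on both buckets and applies a single replicate-everything rule to the source bucket, using the request shapes of the S3 `PutBucketVersioning` and `PutBucketReplication` APIs. The function names, the rule ID, and the choice not to replicate delete markers are assumptions. The two client arguments stand for S3 clients created in Region A and Region B.

```python
def replication_configuration(role_arn, destination_bucket, rule_id="dr-replication"):
    """One rule that replicates every new object to the destination bucket."""
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": rule_id,
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix matches all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


def configure_replication(s3_source, s3_destination, source_bucket,
                          destination_bucket, role_arn):
    """Enable versioning on both buckets, then attach the replication rule."""
    versioning = {"Status": "Enabled"}
    s3_source.put_bucket_versioning(
        Bucket=source_bucket, VersioningConfiguration=versioning
    )
    s3_destination.put_bucket_versioning(
        Bucket=destination_bucket, VersioningConfiguration=versioning
    )
    s3_source.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=replication_configuration(role_arn, destination_bucket),
    )
```

Replication applies to objects written after the rule is in place, which is why the hands-on uploads a new object to test it. Objects that already existed in the source bucket need a separate mechanism such as S3 Batch Replication.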
Hands-on: DR Runbook for a Web App
Use this as a numbered runbook for a simple application:
1. Confirm the incident scope and declare DR if the primary Region is unavailable.
2. Freeze deployments to the primary Region.
3. Validate replicated S3 data in the DR Region.
4. Restore or promote the database in the DR Region.
5. Deploy or scale up the application stack in the DR Region.
6. Run smoke tests against the DR load balancer.
7. Update Route 53 failover or weighted records to send traffic to DR.
8. Monitor error rate, latency, and database health.
9. Record the exact failover time and data recovery point.
10. After primary recovers, plan failback as a separate controlled change.
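A runbook like this can be rehearsed as code. The sketch below is a hypothetical runner, not an AWS feature: it executes steps in order, timestamps each one (useful for recording the failover time), and stops at the first failure so that traffic is never shifted after a failed smoke test. The step functions are stubs standing in for real automation.

```python
from datetime import datetime, timezone


def run_runbook(steps, now=lambda: datetime.now(timezone.utc)):
    """Run (title, action) pairs in order; stop at the first failing step."""
    log = []
    for number, (title, action) in enumerate(steps, start=1):
        started = now()
        try:
            action()
            status = "ok"
        except Exception as exc:
            status = f"failed: {exc}"
        log.append({"step": number, "title": title,
                    "started": started, "status": status})
        if status != "ok":
            break  # later steps, such as the DNS shift, are not attempted
    return log


def failing_smoke_test():
    # Stub that simulates an unhealthy DR stack.
    raise RuntimeError("smoke test returned HTTP 503")


steps = [
    ("Freeze deployments to the primary Region", lambda: None),
    ("Promote the database in the DR Region", lambda: None),
    ("Run smoke tests against the DR load balancer", failing_smoke_test),
    ("Update Route 53 records to send traffic to DR", lambda: None),
]

log = run_runbook(steps)
for entry in log:
    print(entry["step"], entry["title"], "->", entry["status"])
```

Here the run stops at step 3, and the Route 53 update never executes. During a game day, the resulting log doubles as the record of what ran and when.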
Practice matters
A DR plan that has never been tested is only a theory. Schedule game days and record gaps as operational work items.
Common SOA-C03 Exam Questions
Q: Which DR strategy is cheapest but slowest? A: Backup and restore.
Q: Which service centralizes backup policies across AWS services? A: AWS Backup.
Q: How do you automate DNS failover between Regions? A: Route 53 failover routing with health checks.
What to Learn Next
- Route 53 and DNS for CloudOps - failover records and health checks
- Amazon S3 for CloudOps - replication, object lock, and lifecycle
- AWS Account Management - cross-account backup and logging governance
