AWS CloudFormation for CloudOps
Infrastructure as Code for operations. Covers CloudFormation stack management, StackSets, drift detection, cfn-init bootstrapping, and troubleshooting — key SOA-C03 topics.
What you'll learn
- Write CloudFormation templates using all major sections
- Bootstrap EC2 instances using cfn-init and cfn-signal
- Deploy stacks across multiple accounts/regions with StackSets
- Detect and remediate configuration drift
- Troubleshoot stack failures and rollback scenarios
CloudFormation Fundamentals
AWS CloudFormation treats your infrastructure as code — you define resources in a template (YAML or JSON) and CloudFormation creates, updates, and deletes them as a unit called a stack.
Template (YAML/JSON)
→ CloudFormation service
→ Stack (group of AWS resources created together)
Template sections
AWSTemplateFormatVersion: "2010-09-09"
Description: "My CloudFormation template"
Parameters: # Input values passed at deploy time
Mappings: # Lookup tables (region → AMI ID)
Conditions: # Conditional resource creation
Resources: # REQUIRED — the actual AWS resources
Outputs: # Values exported from the stack
Resources (required section)
Resources:
  MyEC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-0c94855ba95c71c99
      InstanceType: t3.micro
      KeyName: !Ref KeyPairParam
      Tags:
        - Key: Name
          Value: CloudOps-Demo
Parameters
Parameters:
  EnvironmentType:
    Type: String
    Default: dev
    AllowedValues: [dev, staging, prod]
    Description: "Deployment environment"
  InstanceType:
    Type: String
    Default: t3.micro
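The `KeyName: !Ref KeyPairParam` line in the Resources example relies on a parameter that is not declared anywhere above. A minimal sketch of how it might be declared, using an AWS-specific parameter type so CloudFormation validates the value at deploy time. The description text and the allowed instance sizes are illustrative assumptions, not part of the original template.

```yaml
Parameters:
  KeyPairParam:
    # AWS-specific type: the value must be the name of an existing EC2 key pair
    Type: AWS::EC2::KeyPair::KeyName
    Description: "Name of an existing EC2 key pair (assumed to exist in the target region)"
  InstanceType:
    Type: String
    Default: t3.micro
    # Illustrative allow-list; adjust to the sizes you actually permit
    AllowedValues: [t3.micro, t3.small, t3.medium]
    ConstraintDescription: "Must be one of the listed t3 sizes"
```

With a typed parameter, a misspelled key pair name is rejected before any resource is created, rather than surfacing later as a failed instance launch.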
Mappings
Mappings:
  RegionMap:
    us-east-1:
      AMI: ami-0c94855ba95c71c99
    eu-west-1:
      AMI: ami-0713f98de93617bb4
# Usage
ImageId: !FindInMap [RegionMap, !Ref AWS::Region, AMI]
Outputs & Exports
Outputs:
  InstancePublicIP:
    Description: "Public IP of the EC2 instance"
    Value: !GetAtt MyEC2Instance.PublicIp
    Export:
      Name: !Sub "${AWS::StackName}-PublicIP"
# Cross-stack import
Value: !ImportValue prod-stack-PublicIP
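Exports and imports live in two different templates. The sketch below shows both sides as two YAML documents, assuming a hypothetical producer stack named `network-stack` that owns a VPC and a consumer stack that needs its ID. The resource names, the stack name, and the CIDR range are illustrative.

```yaml
# Producer template, deployed as a stack named "network-stack" (hypothetical name)
Resources:
  MyVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
Outputs:
  VPCId:
    Description: "ID of the shared VPC"
    Value: !Ref MyVPC
    Export:
      # Resolves to "network-stack-VPCId" for this stack name
      Name: !Sub "${AWS::StackName}-VPCId"
---
# Consumer template, deployed as a separate stack in the same account and region
Resources:
  AppSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: "Security group for the app tier"
      # The import name must match the export name exactly
      VpcId: !ImportValue network-stack-VPCId
```

While any stack imports an exported value, CloudFormation will not let you delete the producer stack or change that export, so remove the imports first when refactoring.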
Intrinsic Functions
| Function | Purpose | Example |
|---|---|---|
| `!Ref` | Reference a resource or parameter | `!Ref MyBucket` |
| `!GetAtt` | Get resource attribute | `!GetAtt MyLB.DNSName` |
| `!Sub` | String substitution | `!Sub "arn:aws:s3:::${BucketName}"` |
| `!FindInMap` | Lookup in Mappings | `!FindInMap [Map, Key1, Key2]` |
| `!If` | Conditional value | `!If [IsProd, m5.xlarge, t3.micro]` |
| `!ImportValue` | Cross-stack import | `!ImportValue vpc-VPCId` |
| `!Join` | Join strings | `!Join [":", [a, b, c]]` |
| `!Select` | Select from list | `!Select [0, !GetAZs ""]` |
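The `!If` function only works together with a named condition from the Conditions section, which has no example of its own above. A minimal sketch, reusing the `EnvironmentType` parameter from the Parameters section; the `IsProd` condition name matches the table, while the `ProdOnlyVolume` resource and its size are assumptions made for illustration.

```yaml
Parameters:
  EnvironmentType:
    Type: String
    Default: dev
    AllowedValues: [dev, staging, prod]

Conditions:
  # True only when the stack is deployed with EnvironmentType=prod
  IsProd: !Equals [!Ref EnvironmentType, prod]

Resources:
  MyEC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-0c94855ba95c71c99
      # Conditional property value: larger instance in prod only
      InstanceType: !If [IsProd, m5.xlarge, t3.micro]

  ProdOnlyVolume:
    Type: AWS::EC2::Volume
    # Conditional resource: created only when IsProd is true
    Condition: IsProd
    Properties:
      Size: 100
      AvailabilityZone: !GetAtt MyEC2Instance.AvailabilityZone
```

A condition can therefore gate either a single property value (through `!If`) or an entire resource (through the `Condition` attribute).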
EC2 Bootstrapping with cfn-init
cfn-init is a CloudFormation helper script that reads metadata from your template to configure EC2 instances — more powerful and idempotent than User Data scripts.
cfn-init vs User Data
| | User Data | cfn-init |
|---|---|---|
| Runs | Once on first boot | On every cfn-init call |
| Idempotent | No | Yes |
| Handles | Simple bash | Packages, files, services, commands |
| Feedback to CF | None | Via cfn-signal |
cfn-init example
Resources:
  MyInstance:
    Type: AWS::EC2::Instance
    Metadata:
      AWS::CloudFormation::Init:
        config:
          packages:
            yum:
              httpd: []
              php: []
          files:
            /var/www/html/index.php:
              content: !Sub |
                <?php echo "Hello from ${AWS::StackName}"; ?>
              mode: "000644"
              owner: apache
              group: apache
          services:
            sysvinit:
              httpd:
                enabled: true
                ensureRunning: true
    Properties:
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          yum update -y aws-cfn-bootstrap
          /opt/aws/bin/cfn-init -v \
            --stack ${AWS::StackName} \
            --resource MyInstance \
            --region ${AWS::Region}
          /opt/aws/bin/cfn-signal -e $? \
            --stack ${AWS::StackName} \
            --resource MyInstance \
            --region ${AWS::Region}
cfn-signal & Wait Conditions
cfn-signal tells CloudFormation whether the instance bootstrapped successfully. Combined with a CreationPolicy, this makes CloudFormation wait for the signal before marking the resource as CREATE_COMPLETE.
Resources:
  MyInstance:
    Type: AWS::EC2::Instance
    CreationPolicy:
      ResourceSignal:
        Timeout: PT15M # Wait up to 15 minutes for signal
        Count: 1
    Properties:
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          # ... setup commands ...
          # Pass the exit code of the last setup command: 0 signals success, non-zero signals failure
          /opt/aws/bin/cfn-signal -e $? \
            --stack ${AWS::StackName} \
            --resource MyInstance \
            --region ${AWS::Region}
Warning
If cfn-signal is never received (e.g., the bootstrap fails silently), CloudFormation waits for the full timeout, marks the resource as CREATE_FAILED, and rolls back the stack. Always check /var/log/cfn-init.log and /var/log/cloud-init-output.log when debugging.
Stack Lifecycle Management
Deletion Policy
Controls what happens to a resource when the stack is deleted:
Resources:
  MyDatabase:
    Type: AWS::RDS::DBInstance
    DeletionPolicy: Snapshot # Take a final snapshot before deletion (only for resource types that support snapshots)
  MyBucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain # Keep the bucket and its objects after stack delete (S3 does not support Snapshot)
  MyLambda:
    Type: AWS::Lambda::Function
    DeletionPolicy: Delete # Default for most resource types: delete the resource
UpdateReplacePolicy
Similar to DeletionPolicy, but controls what happens to the old resource when CloudFormation replaces it during an update:
MyVolume:
  Type: AWS::EC2::Volume
  UpdateReplacePolicy: Snapshot # Snapshot old volume before replacing
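The two attributes cover different events, so data-bearing resources usually need both. A minimal sketch, assuming a hypothetical `DataVolume` resource with an illustrative size; the first Availability Zone of the region is picked only to keep the fragment self-contained.

```yaml
Resources:
  DataVolume:
    Type: AWS::EC2::Volume
    # Applies when the stack is deleted or the resource is removed from the template
    DeletionPolicy: Snapshot
    # Applies when an update forces CloudFormation to replace the volume
    UpdateReplacePolicy: Snapshot
    Properties:
      AvailabilityZone: !Select [0, !GetAZs ""]
      Size: 100
      Encrypted: true
```

Setting only `DeletionPolicy` would still let a replacing update discard the old volume without a snapshot.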
Stack Policy
A stack policy is a JSON document that protects specific resources from being updated or deleted during a stack update.
{
  "Statement": [
    {
      "Effect": "Deny",
      "Action": "Update:Replace",
      "Principal": "*",
      "Resource": "LogicalResourceId/ProductionDB"
    },
    {
      "Effect": "Allow",
      "Action": "Update:*",
      "Principal": "*",
      "Resource": "*"
    }
  ]
}
Termination Protection
Prevents accidental stack deletion:
aws cloudformation update-termination-protection \
--stack-name my-prod-stack \
--enable-termination-protection
Nested Stacks
A nested stack is a stack created as a resource within another stack. Used to break large templates into reusable modules.
Resources:
  VPCStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://s3.amazonaws.com/my-bucket/vpc.yaml
      Parameters:
        CIDRRange: 10.0.0.0/16
  AppStack:
    Type: AWS::CloudFormation::Stack
    DependsOn: VPCStack
    Properties:
      TemplateURL: https://s3.amazonaws.com/my-bucket/app.yaml
      Parameters:
        VPCId: !GetAtt VPCStack.Outputs.VPCId
Nested vs cross-stack
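Nested stacks use !GetAtt Stack.Outputs.Key to pass values from a child stack back to the parent, which can then hand them to other children as parameters. Cross-stack references use Export/!ImportValue between independent stacks. Nested stacks are created, updated, and deleted as one unit through the parent; stacks linked by cross-stack references each keep their own lifecycle.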
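For `!GetAtt VPCStack.Outputs.VPCId` to resolve, the child template has to declare an output with that exact name. A sketch of what the child `vpc.yaml` might contain; its contents are not shown in this guide, so everything except the `CIDRRange` parameter and the `VPCId` output name is an assumption.

```yaml
# vpc.yaml: hypothetical contents of the child template referenced by VPCStack
AWSTemplateFormatVersion: "2010-09-09"
Description: "Child template that creates a VPC"

Parameters:
  CIDRRange:
    # Receives the value passed in the parent's Parameters block
    Type: String

Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: !Ref CIDRRange

Outputs:
  VPCId:
    # Read by the parent as !GetAtt VPCStack.Outputs.VPCId
    Description: "ID of the VPC created by this child stack"
    Value: !Ref VPC
```

No `Export` is needed here, because the parent reads the output directly rather than through `!ImportValue`.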
CloudFormation StackSets
StackSets deploy a single CloudFormation template across multiple AWS accounts and regions simultaneously.
Use cases
- Deploy security baselines (IAM roles, CloudTrail, Config) to all accounts in an AWS Organization
- Ensure consistent tagging policies across regions
- Deploy compliance controls via AWS Organizations
Deployment models
| Model | Description |
|---|---|
| Self-managed | You manually specify target accounts and execution roles |
| Service-managed | AWS Organizations integration — auto-deploy to new accounts |
Hands-on: Deploy to all accounts in an OU
1. Enable trusted access for StackSets in AWS Organizations
2. Create StackSet:
- Template: s3://my-bucket/security-baseline.yaml
- Permissions: Service-managed (Organizations)
- Deployment targets: Organizational Unit (OU) ID
- Regions: us-east-1, eu-west-1
- Failure tolerance: 0 (stop on first failure)
3. Monitor stack instances for each account/region
Warning
StackSets with service-managed permissions auto-deploy to new accounts joining the OU. This is powerful but means every new account gets the template — ensure the template is safe for all account types.
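The hands-on steps above can also be expressed as a template deployed from the Organizations management account. This is a sketch under stated assumptions rather than a verified deployment: the StackSet name, the OU ID `ou-abcd-11111111`, and the HTTPS form of the bucket URL are placeholders, and the `CAPABILITY_NAMED_IAM` capability assumes the baseline template creates named IAM resources.

```yaml
Resources:
  SecurityBaselineStackSet:
    Type: AWS::CloudFormation::StackSet
    Properties:
      StackSetName: security-baseline          # hypothetical name
      PermissionModel: SERVICE_MANAGED         # AWS Organizations integration
      AutoDeployment:
        Enabled: true                          # deploy to accounts that join the OU later
        RetainStacksOnAccountRemoval: false
      Capabilities:
        - CAPABILITY_NAMED_IAM                 # assumption: the baseline creates IAM roles
      # Assumed HTTPS URL for the security-baseline.yaml object in my-bucket
      TemplateURL: https://my-bucket.s3.amazonaws.com/security-baseline.yaml
      OperationPreferences:
        FailureToleranceCount: 0               # stop on first failure
      StackInstancesGroup:
        - DeploymentTargets:
            OrganizationalUnitIds:
              - ou-abcd-11111111               # placeholder OU ID
          Regions:
            - us-east-1
            - eu-west-1
```

Keeping the StackSet definition in a template means the target OUs and regions are reviewed and versioned like any other infrastructure change.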
Drift Detection
Drift occurs when a resource's actual configuration differs from what CloudFormation deployed (e.g., someone manually changed a security group rule in the console).
Deployed by CF: SG allows port 443
Manual change: Someone added port 80 (not in template)
→ Drift detected on the security group resource
Detect drift
# Initiate drift detection
aws cloudformation detect-stack-drift --stack-name my-stack
# Check status (drift detection is async)
aws cloudformation describe-stack-drift-detection-status \
--stack-drift-detection-id <id>
# View drifted resources
aws cloudformation describe-stack-resource-drifts \
--stack-name my-stack \
--stack-resource-drift-status-filters MODIFIED DELETED
Remediate drift
- Option 1: Update the template to match the manual change, then update the stack
- Option 2: Revert the manual change, then run drift detection again to confirm clean
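Option 1 for the security group illustration above might look like the following sketch, where the manually added port 80 rule is written into the template so that the deployed state and the template agree again. The resource name, the `VpcId` parameter, and the open CIDR ranges are assumptions for illustration.

```yaml
Parameters:
  VpcId:
    Type: AWS::EC2::VPC::Id

Resources:
  WebSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: "Web tier security group"
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        # Rule originally deployed by CloudFormation
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: 0.0.0.0/0
        # Rule that was added manually, now adopted into the template
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0
```

After the stack update, rerunning drift detection should report the security group as in sync, provided no other properties were changed by hand.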
Troubleshooting CloudFormation
Stack failure scenarios
| Error | Cause | Fix |
|---|---|---|
| `ROLLBACK_COMPLETE` | Resource creation failed | Check stack events, fix template, delete and redeploy |
| `UPDATE_ROLLBACK_FAILED` | Stack failed to roll back | Use ContinueUpdateRollback API to skip failed resources |
| `CREATE_FAILED` + no rollback | Stack was created with --on-failure DO_NOTHING for debugging, so failed resources are kept | Inspect instance logs, then delete the stack yourself |
| cfn-signal timeout | Bootstrap didn't complete in time | SSH in, check /var/log/cfn-init.log |
Checking stack events
# Show only CREATE_FAILED events (events are returned most recent first)
aws cloudformation describe-stack-events \
--stack-name my-stack \
--query 'StackEvents[?ResourceStatus==`CREATE_FAILED`]'
Common root causes
- IAM permissions — CloudFormation role doesn't have permission to create the resource
- Resource limit exceeded — e.g., VPC limit, EIP limit
- Invalid parameter values — wrong AMI ID for the region
- Circular dependencies — two resources reference each other; use DependsOn carefully
- cfn-signal not received — bootstrap script failed; check instance logs
CloudFormation for ASG Rolling Updates
For Auto Scaling Groups, use the UpdatePolicy attribute to control how instances are replaced during a stack update:
Resources:
  MyASG:
    Type: AWS::AutoScaling::AutoScalingGroup
    UpdatePolicy:
      AutoScalingRollingUpdate:
        MinInstancesInService: 1 # Keep at least 1 running during update
        MaxBatchSize: 2 # Replace 2 at a time
        PauseTime: PT5M # With WaitOnResourceSignals, wait up to 5 min for each batch to signal
        WaitOnResourceSignals: true # Wait for cfn-signal from new instances
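`WaitOnResourceSignals` only helps if the new instances actually send a signal that names the Auto Scaling group. A sketch of the matching launch template and group, assuming an Amazon Linux AMI with the CloudFormation helper scripts installed; the AMI ID is reused from the earlier examples, and the group sizes and use of the region's default Availability Zones are illustrative.

```yaml
Resources:
  MyLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        ImageId: ami-0c94855ba95c71c99
        InstanceType: t3.micro
        UserData:
          Fn::Base64: !Sub |
            #!/bin/bash
            # ... setup commands ...
            # Signal the Auto Scaling group resource, not the instance
            /opt/aws/bin/cfn-signal -e $? \
              --stack ${AWS::StackName} \
              --resource MyASG \
              --region ${AWS::Region}

  MyASG:
    Type: AWS::AutoScaling::AutoScalingGroup
    UpdatePolicy:
      AutoScalingRollingUpdate:
        MinInstancesInService: 1
        MaxBatchSize: 2
        PauseTime: PT5M
        WaitOnResourceSignals: true
    Properties:
      MinSize: "2"
      MaxSize: "4"
      AvailabilityZones: !GetAZs ""
      LaunchTemplate:
        LaunchTemplateId: !Ref MyLaunchTemplate
        Version: !GetAtt MyLaunchTemplate.LatestVersionNumber
```

If a batch does not signal within `PauseTime`, the update fails and CloudFormation rolls back, which stops a broken launch configuration from replacing the whole fleet.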
Common SOA-C03 Exam Questions
Q: A CloudFormation stack update is failing to roll back. What command do you use?
Use aws cloudformation continue-update-rollback. You can specify --resources-to-skip to bypass resources that are blocking the rollback.
Q: How do you prevent someone from accidentally deleting a production RDS instance via a stack update?
Add DeletionPolicy: Retain to the RDS resource AND set a Stack Policy that denies Update:Replace and Update:Delete on that resource.
Q: You need to deploy a CloudTrail configuration to 50 accounts across 3 regions. What service do you use?
Use CloudFormation StackSets with service-managed permissions (AWS Organizations integration). Target the OU and list the three regions; StackSets then creates a stack instance for every account and region combination.
Q: Your cfn-init bootstrap failed silently. Where do you look?
Check /var/log/cfn-init.log and /var/log/cloud-init-output.log on the instance. Set --on-failure DO_NOTHING when creating the stack so the instance isn't terminated before you can inspect it.
What to Learn Next
- AWS Systems Manager — SSM Automation complements CloudFormation for Day 2 operations
- AWS CloudWatch Monitoring — monitor stack operations and trigger remediations
- AWS Account Management — StackSets with Organizations for multi-account deployments
