AWS CloudFormation for CloudOps

Intermediate | Topic | 45 min | 8 min read | 26 Apr 2026 | AWS

Infrastructure as Code for operations. Covers CloudFormation stack management, StackSets, drift detection, cfn-init bootstrapping, and troubleshooting — key SOA-C03 topics.

What you'll learn

  • Write CloudFormation templates using all major sections
  • Bootstrap EC2 instances using cfn-init and cfn-signal
  • Deploy stacks across multiple accounts/regions with StackSets
  • Detect and remediate configuration drift
  • Troubleshoot stack failures and rollback scenarios

Prerequisites

Relevant for certifications

SOA-C03, DVA-C02

CloudFormation Fundamentals

AWS CloudFormation treats your infrastructure as code — you define resources in a template (YAML or JSON) and CloudFormation creates, updates, and deletes them as a unit called a stack.

Template (YAML/JSON)
  → CloudFormation service
    → Stack (group of AWS resources created together)

Template sections

AWSTemplateFormatVersion: "2010-09-09"
Description: "My CloudFormation template"

Parameters:       # Input values passed at deploy time
Mappings:         # Lookup tables (region → AMI ID)
Conditions:       # Conditional resource creation
Resources:        # REQUIRED — the actual AWS resources
Outputs:          # Values exported from the stack
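The Conditions section is listed above but not demonstrated elsewhere in this topic. A minimal sketch of how it might be used, assuming a hypothetical AuditBucket that should exist only in production (the resource names are illustrative, not part of the original template):

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: "Sketch: conditional resource creation"

Parameters:
  EnvironmentType:
    Type: String
    Default: dev
    AllowedValues: [dev, staging, prod]

Conditions:
  # True only when the stack is deployed with EnvironmentType=prod
  IsProd: !Equals [!Ref EnvironmentType, prod]

Resources:
  AppBucket:
    Type: AWS::S3::Bucket

  # Hypothetical extra resource, created only when IsProd is true
  AuditBucket:
    Type: AWS::S3::Bucket
    Condition: IsProd

Outputs:
  AuditBucketName:
    Condition: IsProd          # Outputs can be conditional too
    Value: !Ref AuditBucket
```

A condition is evaluated once per create or update from the parameter values, so changing EnvironmentType from dev to prod on a later update would add the conditional resource.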

Resources (required section)

Resources:
  MyEC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-0c94855ba95c71c99
      InstanceType: t3.micro
      KeyName: !Ref KeyPairParam
      Tags:
        - Key: Name
          Value: CloudOps-Demo

Parameters

Parameters:
  EnvironmentType:
    Type: String
    Default: dev
    AllowedValues: [dev, staging, prod]
    Description: "Deployment environment"

  InstanceType:
    Type: String
    Default: t3.micro

Mappings

Mappings:
  RegionMap:
    us-east-1:
      AMI: ami-0c94855ba95c71c99
    eu-west-1:
      AMI: ami-0713f98de93617bb4

# Usage
ImageId: !FindInMap [RegionMap, !Ref AWS::Region, AMI]

Outputs & Exports

Outputs:
  InstancePublicIP:
    Description: "Public IP of the EC2 instance"
    Value: !GetAtt MyEC2Instance.PublicIp
    Export:
      Name: !Sub "${AWS::StackName}-PublicIP"

# Cross-stack import
Value: !ImportValue prod-stack-PublicIP
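When the exporting stack's name is not fixed, the export name can be built from a parameter. A sketch under that assumption: SourceStackName and the SSM parameter that stores the value are hypothetical, chosen only to give the import a consumer.

```yaml
Parameters:
  SourceStackName:
    Type: String
    Default: prod-stack
    Description: "Name of the stack that exports <stack-name>-PublicIP"

Resources:
  # Hypothetical consumer: stores the imported IP in an SSM parameter
  UpstreamIpParameter:
    Type: AWS::SSM::Parameter
    Properties:
      Name: /demo/upstream-ip
      Type: String
      Value:
        # Long form is used because YAML cannot chain two short-form tags
        # (!ImportValue !Sub ...) on the same value
        Fn::ImportValue: !Sub "${SourceStackName}-PublicIP"
```

Keep in mind that an export cannot be deleted or modified while another stack still imports it.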

Intrinsic Functions

| Function | Purpose | Example |
| --- | --- | --- |
| !Ref | Reference a resource or parameter | !Ref MyBucket |
| !GetAtt | Get resource attribute | !GetAtt MyLB.DNSName |
| !Sub | String substitution | !Sub "arn:aws:s3:::${BucketName}" |
| !FindInMap | Lookup in Mappings | !FindInMap [Map, Key1, Key2] |
| !If | Conditional value | !If [IsProd, m5.xlarge, t3.micro] |
| !ImportValue | Cross-stack import | !ImportValue vpc-VPCId |
| !Join | Join strings | !Join [":", [a, b, c]] |
| !Select | Select from list | !Select [0, !GetAZs ""] |
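The functions are usually combined rather than used in isolation. The following sketch, built around a hypothetical DataVolume resource, shows several of them working together; the sizes and tag format are illustrative assumptions.

```yaml
Parameters:
  EnvironmentType:
    Type: String
    Default: dev
    AllowedValues: [dev, staging, prod]

Conditions:
  IsProd: !Equals [!Ref EnvironmentType, prod]

Resources:
  DataVolume:
    Type: AWS::EC2::Volume
    Properties:
      # First Availability Zone of the region the stack is deployed in
      AvailabilityZone: !Select [0, !GetAZs ""]
      # Larger volume in prod, smaller elsewhere (sizes in GiB)
      Size: !If [IsProd, 100, 10]
      Tags:
        - Key: Name
          # For a stack named my-stack with EnvironmentType=prod: "my-stack:prod:data"
          Value: !Join [":", [!Ref "AWS::StackName", !Ref EnvironmentType, data]]

Outputs:
  VolumeDescription:
    # ${DataVolume} inside !Sub behaves like !Ref and yields the volume ID
    Value: !Sub "Volume ${DataVolume} in ${AWS::Region} for ${EnvironmentType}"
```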

EC2 Bootstrapping with cfn-init

cfn-init is a CloudFormation helper script that reads metadata from your template to configure EC2 instances — more powerful and idempotent than User Data scripts.

cfn-init vs User Data

| Aspect | User Data | cfn-init |
| --- | --- | --- |
| Runs | Once on first boot | On every cfn-init call |
| Idempotent | No | Yes |
| Handles | Simple bash | Packages, files, services, commands |
| Feedback to CF | None | Via cfn-signal |

cfn-init example

Resources:
  MyInstance:
    Type: AWS::EC2::Instance
    Metadata:
      AWS::CloudFormation::Init:
        config:
          packages:
            yum:
              httpd: []
              php: []
          files:
            /var/www/html/index.php:
              content: !Sub |
                <?php echo "Hello from ${AWS::StackName}"; ?>
              mode: "000644"
              owner: apache
              group: apache
          services:
            sysvinit:
              httpd:
                enabled: true
                ensureRunning: true
    Properties:
      ImageId: ami-0c94855ba95c71c99   # placeholder; AMI IDs are region-specific
      InstanceType: t3.micro
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          yum update -y aws-cfn-bootstrap
          /opt/aws/bin/cfn-init -v \
            --stack ${AWS::StackName} \
            --resource MyInstance \
            --region ${AWS::Region}
          /opt/aws/bin/cfn-signal -e $? \
            --stack ${AWS::StackName} \
            --resource MyInstance \
            --region ${AWS::Region}
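The example above covers packages, files, and services. The comparison table also lists commands, which can be sketched as follows. The directory and file names are hypothetical, and the UserData that invokes cfn-init is omitted because it is the same as in the example above.

```yaml
Resources:
  MyInstance:
    Type: AWS::EC2::Instance
    Metadata:
      AWS::CloudFormation::Init:
        config:
          commands:
            # Commands run in alphabetical order of their keys
            01_create_app_dir:
              command: mkdir -p /opt/demo-app
              # Run only when the test exits 0, i.e. the directory is missing
              test: test ! -d /opt/demo-app
            02_write_marker:
              command: echo "$STACK" > /opt/demo-app/stack-name
              cwd: /opt/demo-app
              env:
                STACK: !Ref AWS::StackName
    Properties:
      ImageId: ami-0c94855ba95c71c99   # placeholder; AMI IDs are region-specific
      InstanceType: t3.micro
```

The test key is one way to keep a command safe to re-run, which matters if cfn-init is invoked more than once on the same instance.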

cfn-signal & Wait Conditions

cfn-signal tells CloudFormation whether the instance bootstrapped successfully. Combined with a CreationPolicy, this makes CloudFormation wait for the signal before marking the resource as CREATE_COMPLETE.

Resources:
  MyInstance:
    Type: AWS::EC2::Instance
    CreationPolicy:
      ResourceSignal:
        Timeout: PT15M    # Wait up to 15 minutes for signal
        Count: 1
    Properties:
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          # ... setup commands ...
          # Report the exit code of the last setup command: 0 = success, non-zero = failure
          /opt/aws/bin/cfn-signal -e $? \
            --stack ${AWS::StackName} \
            --resource MyInstance \
            --region ${AWS::Region}

Warning

If cfn-signal is never received (e.g., bootstrap fails silently), CloudFormation waits the full timeout, then marks the resource as CREATE_FAILED and rolls the stack back. Always check /var/log/cfn-init.log and /var/log/cloud-init-output.log when debugging.
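A CreationPolicy is the usual choice for EC2 instances and Auto Scaling groups. For completeness, here is a sketch of the older wait condition pattern, in which a separate WaitCondition resource waits on a handle URL. The resource names BootstrapWaitHandle and BootstrapWaitCondition are assumptions for illustration.

```yaml
Resources:
  BootstrapWaitHandle:
    Type: AWS::CloudFormation::WaitConditionHandle

  BootstrapWaitCondition:
    Type: AWS::CloudFormation::WaitCondition
    DependsOn: MyInstance          # start waiting only after the instance exists
    Properties:
      Handle: !Ref BootstrapWaitHandle
      Timeout: "900"               # seconds (15 minutes)
      Count: 1

  MyInstance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-0c94855ba95c71c99   # placeholder; AMI IDs are region-specific
      InstanceType: t3.micro
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          # ... setup commands ...
          # The handle resolves to a presigned URL; cfn-signal posts the result to it
          /opt/aws/bin/cfn-signal -e $? "${BootstrapWaitHandle}"
```

Here the instance itself reaches CREATE_COMPLETE as soon as it launches; it is the wait condition that fails the stack if no success signal arrives in time.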


Stack Lifecycle Management

Deletion Policy

Controls what happens to a resource when the stack is deleted:

Resources:
  MyDatabase:
    Type: AWS::RDS::DBInstance
    DeletionPolicy: Snapshot     # Take a final snapshot before deletion (snapshot-capable types only, e.g. RDS, EBS volumes)

  MyBucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain       # Keep the bucket and its objects after stack delete (S3 does not support Snapshot)

  MyLambda:
    Type: AWS::Lambda::Function
    DeletionPolicy: Delete       # Default for most resource types: delete the resource

UpdateReplace Policy

Similar to DeletionPolicy, but controls what happens to the old resource when CloudFormation replaces it during an update:

  MyVolume:
    Type: AWS::EC2::Volume
    UpdateReplacePolicy: Snapshot   # Snapshot old volume before replacing
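The two policies are independent: DeletionPolicy does not apply when a resource is replaced, and UpdateReplacePolicy does not apply when the stack is deleted. A sketch of setting both on the same resource, with an illustrative size and Availability Zone:

```yaml
Resources:
  MyVolume:
    Type: AWS::EC2::Volume
    DeletionPolicy: Snapshot        # stack deleted, or resource removed from the template
    UpdateReplacePolicy: Snapshot   # old volume replaced during a stack update
    Properties:
      AvailabilityZone: !Select [0, !GetAZs ""]
      Size: 20                      # GiB, illustrative value
```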

Stack Policy

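A stack policy is a JSON document that protects specific resources from being updated, replaced, or deleted during a stack update. Once a stack policy is set, every resource is protected by default, which is why the example ends with an explicit Allow for everything else.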

{
  "Statement": [
    {
      "Effect": "Deny",
      "Action": "Update:Replace",
      "Principal": "*",
      "Resource": "LogicalResourceId/ProductionDB"
    },
    {
      "Effect": "Allow",
      "Action": "Update:*",
      "Principal": "*",
      "Resource": "*"
    }
  ]
}

Termination Protection

Prevents accidental stack deletion:

aws cloudformation update-termination-protection \
  --stack-name my-prod-stack \
  --enable-termination-protection

Nested Stacks

A nested stack is a stack created as a resource within another stack. Used to break large templates into reusable modules.

Resources:
  VPCStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://s3.amazonaws.com/my-bucket/vpc.yaml
      Parameters:
        CIDRRange: 10.0.0.0/16

  AppStack:
    Type: AWS::CloudFormation::Stack
    DependsOn: VPCStack
    Properties:
      TemplateURL: https://s3.amazonaws.com/my-bucket/app.yaml
      Parameters:
        VPCId: !GetAtt VPCStack.Outputs.VPCId

Nested vs cross-stack

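Nested stacks use !GetAtt Stack.Outputs.Key to pass values from a child stack to its parent. Cross-stack references use Export/!ImportValue between independent stacks. Nested stacks are created, updated, and deleted as one unit through the parent; stacks linked by cross-stack references keep their own lifecycles, although an exported value cannot be changed or removed while another stack imports it.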
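For the parent's !GetAtt VPCStack.Outputs.VPCId to resolve, the child template must declare a matching output. A minimal sketch of what vpc.yaml might contain, assuming it creates a single VPC (the actual child template is not shown in this topic):

```yaml
# vpc.yaml (child template; file name matches the TemplateURL used above)
Parameters:
  CIDRRange:
    Type: String
    Default: 10.0.0.0/16

Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: !Ref CIDRRange
      EnableDnsSupport: true
      EnableDnsHostnames: true

Outputs:
  # The parent reads this as !GetAtt VPCStack.Outputs.VPCId
  VPCId:
    Value: !Ref VPC
```

No Export is needed here: nested stack outputs are read directly by the parent, so they do not occupy the account-wide export namespace.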


CloudFormation StackSets

StackSets deploy a single CloudFormation template to multiple AWS accounts and regions in one operation.

Use cases

  • Deploy security baselines (IAM roles, CloudTrail, Config) to all accounts in an AWS Organization
  • Ensure consistent tagging policies across regions
  • Deploy compliance controls via AWS Organizations

Deployment models

| Model | Description |
| --- | --- |
| Self-managed | You manually specify target accounts and execution roles |
| Service-managed | AWS Organizations integration; auto-deploy to new accounts |

Hands-on: Deploy to all accounts in an OU

1. Enable trusted access for StackSets in AWS Organizations
2. Create StackSet:
   - Template: s3://my-bucket/security-baseline.yaml
   - Permissions: Service-managed (Organizations)
   - Deployment targets: Organizational Unit (OU) ID
   - Regions: us-east-1, eu-west-1
   - Failure tolerance: 0 (stop on first failure)
3. Monitor stack instances for each account/region
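The steps above can also be expressed as a template, using the AWS::CloudFormation::StackSet resource deployed from the Organizations management account (or a delegated administrator). This is a sketch: the OU ID is a placeholder, the HTTPS form of the template location is an assumption based on the s3:// path above, and the capability is only needed if the baseline template creates named IAM resources.

```yaml
Resources:
  SecurityBaselineStackSet:
    Type: AWS::CloudFormation::StackSet
    Properties:
      StackSetName: security-baseline
      PermissionModel: SERVICE_MANAGED
      AutoDeployment:
        Enabled: true                        # deploy to accounts that join the OU later
        RetainStacksOnAccountRemoval: false
      Capabilities:
        - CAPABILITY_NAMED_IAM               # assumption: template creates named IAM roles
      TemplateURL: https://s3.amazonaws.com/my-bucket/security-baseline.yaml
      OperationPreferences:
        FailureToleranceCount: 0             # stop on first failure
        MaxConcurrentCount: 1
      StackInstancesGroup:
        - DeploymentTargets:
            OrganizationalUnitIds:
              - ou-xxxx-xxxxxxxx             # placeholder OU ID
          Regions:
            - us-east-1
            - eu-west-1
```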

Warning

StackSets with service-managed permissions and automatic deployment enabled deploy to every new account that joins the OU. This is powerful, but it means each new account gets the template, so ensure the template is safe for all account types.


Drift Detection

Drift occurs when a resource's actual configuration differs from what CloudFormation deployed (e.g., someone manually changed a security group rule in the console).

Deployed by CF: SG allows port 443
Manual change: Someone added port 80 (not in template)
→ Drift detected on the security group resource

Detect drift

# Initiate drift detection
aws cloudformation detect-stack-drift --stack-name my-stack

# Check status (drift detection is async)
aws cloudformation describe-stack-drift-detection-status \
  --stack-drift-detection-id <id>

# View drifted resources
aws cloudformation describe-stack-resource-drifts \
  --stack-name my-stack \
  --stack-resource-drift-status-filters MODIFIED DELETED

Remediate drift

  • Option 1: Update the template to match the manual change, then update the stack
  • Option 2: Revert the manual change, then run drift detection again to confirm clean

Troubleshooting CloudFormation

Stack failure scenarios

| Status / symptom | Cause | Fix |
| --- | --- | --- |
| ROLLBACK_COMPLETE | Resource creation failed | Check stack events, fix template, delete and redeploy |
| UPDATE_ROLLBACK_FAILED | Stack failed to roll back | Use ContinueUpdateRollback API to skip failed resources |
| CREATE_FAILED + no rollback | Stack was created with --on-failure DO_NOTHING for debugging | Inspect instance logs, then delete the stack manually |
| cfn-signal timeout | Bootstrap didn't complete in time | SSH in, check /var/log/cfn-init.log |

Checking stack events

# Show only CREATE_FAILED events (events are returned most recent first)
aws cloudformation describe-stack-events \
  --stack-name my-stack \
  --query 'StackEvents[?ResourceStatus==`CREATE_FAILED`]'

Common root causes

  1. IAM permissions — CloudFormation role doesn't have permission to create the resource
  2. Resource limit exceeded — e.g., VPC limit, EIP limit
  3. Invalid parameter values — wrong AMI ID for the region
  4. Circular dependencies — two resources reference each other; use DependsOn carefully
  5. cfn-signal not received — bootstrap script failed; check instance logs
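A common circular dependency (item 4) is two security groups that each allow traffic from the other in their inline rules. One way to break the cycle is to move the rules into separate ingress resources, sketched below; the ports and the VpcId parameter are illustrative assumptions.

```yaml
Parameters:
  VpcId:
    Type: AWS::EC2::VPC::Id

Resources:
  WebSG:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Web tier
      VpcId: !Ref VpcId

  AppSG:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: App tier
      VpcId: !Ref VpcId

  # Rules live in their own resources, so neither group definition
  # references the other and both groups can be created first.
  WebToApp:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !GetAtt AppSG.GroupId
      SourceSecurityGroupId: !GetAtt WebSG.GroupId
      IpProtocol: tcp
      FromPort: 8080
      ToPort: 8080

  AppToWeb:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !GetAtt WebSG.GroupId
      SourceSecurityGroupId: !GetAtt AppSG.GroupId
      IpProtocol: tcp
      FromPort: 443
      ToPort: 443
```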

CloudFormation for ASG Rolling Updates

For Auto Scaling Groups, use the UpdatePolicy attribute to control how instances are replaced during a stack update:

Resources:
  MyASG:
    Type: AWS::AutoScaling::AutoScalingGroup
    UpdatePolicy:
      AutoScalingRollingUpdate:
        MinInstancesInService: 1      # Keep at least 1 running during update
        MaxBatchSize: 2               # Replace 2 at a time
        PauseTime: PT5M               # With WaitOnResourceSignals: max wait for signals from each batch
        WaitOnResourceSignals: true   # Wait for cfn-signal from new instances
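WaitOnResourceSignals only helps if the new instances actually send a signal, and the signal must name the Auto Scaling group rather than the instance. A sketch of how the pieces might fit together, assuming a launch template named MyLaunchTemplate, a default VPC, and illustrative capacity values:

```yaml
Resources:
  MyLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        ImageId: ami-0c94855ba95c71c99   # placeholder; AMI IDs are region-specific
        InstanceType: t3.micro
        UserData:
          Fn::Base64: !Sub |
            #!/bin/bash
            # ... setup commands ...
            # Signal the ASG resource, not the individual instance
            /opt/aws/bin/cfn-signal -e $? \
              --stack ${AWS::StackName} \
              --resource MyASG \
              --region ${AWS::Region}

  MyASG:
    Type: AWS::AutoScaling::AutoScalingGroup
    CreationPolicy:
      ResourceSignal:
        Count: 2                      # matches DesiredCapacity on first create
        Timeout: PT15M
    UpdatePolicy:
      AutoScalingRollingUpdate:
        MinInstancesInService: 1
        MaxBatchSize: 1
        PauseTime: PT10M              # signal timeout per batch
        WaitOnResourceSignals: true
    Properties:
      MinSize: "1"
      MaxSize: "3"
      DesiredCapacity: "2"
      AvailabilityZones: !GetAZs ""   # assumption: default VPC in each AZ
      LaunchTemplate:
        LaunchTemplateId: !Ref MyLaunchTemplate
        Version: !GetAtt MyLaunchTemplate.LatestVersionNumber
```

Because the version is read from LatestVersionNumber, editing the launch template data changes the group's launch template version, which is what triggers the rolling update.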

Common SOA-C03 Exam Questions

Q: A CloudFormation stack update is failing to roll back. What command do you use?
A: Use aws cloudformation continue-update-rollback. You can specify --resources-to-skip to bypass resources that are blocking the rollback.

Q: How do you prevent someone from accidentally deleting a production RDS instance via a stack update?
A: Add DeletionPolicy: Retain on the RDS resource and apply a stack policy that denies Update:Replace and Update:Delete on that resource.

Q: You need to deploy a CloudTrail configuration to 50 accounts across 3 regions. What service do you use?
A: CloudFormation StackSets with service-managed permissions (AWS Organizations integration). Target the OU and list the three regions; StackSets creates a stack instance in every account in the OU for each listed region.

Q: Your cfn-init bootstrap failed silently. Where do you look?
A: Check /var/log/cfn-init.log and /var/log/cloud-init-output.log on the instance. Set --on-failure DO_NOTHING when creating the stack so the instance isn't terminated before you can inspect it.


What to Learn Next

  1. AWS Systems Manager — SSM Automation complements CloudFormation for Day 2 operations
  2. AWS CloudWatch Monitoring — monitor stack operations and trigger remediations
  3. AWS Account Management — StackSets with Organizations for multi-account deployments
