Amazon S3 for CloudOps — Storage, Security & Data Management
S3 operations and governance for CloudOps. Versioning, replication, lifecycle policies, security, event notifications, Athena queries, and storage classes — all tested on SOA-C03.
What you'll learn
- Manage S3 versioning, MFA delete, and replication
- Design lifecycle policies to automate storage class transitions
- Configure bucket policies, ACLs, and access controls
- Use S3 Event Notifications to trigger downstream processing
- Query S3 data with Athena
- Perform bulk operations with S3 Batch Operations
S3 Storage Classes
Choose the right storage class based on access frequency and retrieval requirements:
| Class | Use Case | Retrieval | Min Duration | Cost |
|---|---|---|---|---|
| S3 Standard | Frequently accessed | Milliseconds | None | Highest |
| S3 Standard-IA | Infrequent access, rapid retrieval | Milliseconds | 30 days | Lower storage, retrieval fee |
| S3 One Zone-IA | Infrequent, re-creatable data | Milliseconds | 30 days | 20% cheaper than IA |
| S3 Intelligent-Tiering | Unknown or changing access patterns | Milliseconds | None | Monitoring fee per object |
| S3 Glacier Instant | Archive, occasional access | Milliseconds | 90 days | Low |
| S3 Glacier Flexible | Archive, minutes-to-hours retrieval | 1–12 hours | 90 days | Very low |
| S3 Glacier Deep Archive | Long-term archive, annual access | 12–48 hours | 180 days | Lowest |
Intelligent-Tiering
Intelligent-Tiering automatically moves objects between access tiers based on usage, with no retrieval fees. Optionally activate the Deep Archive Access tier to move objects not accessed for 180+ days into Deep Archive-level pricing. Ideal for unpredictable access patterns.
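The storage-class table above can be read as a small decision procedure. The sketch below is a toy helper, not an AWS API: the access-frequency thresholds are illustrative assumptions, while the retrieval-time bands match the table.

```python
# Toy helper mirroring the storage-class table above.
# The access-frequency thresholds are illustrative assumptions, not AWS rules.
def suggest_storage_class(accesses_per_month: float, max_retrieval_hours: float) -> str:
    """Suggest an S3 storage class from rough access frequency and retrieval needs."""
    if accesses_per_month >= 1:
        return "STANDARD" if accesses_per_month > 4 else "STANDARD_IA"
    # Archive tiers: pick by how long a restore is allowed to take
    if max_retrieval_hours < 1:
        return "GLACIER_IR"      # Glacier Instant Retrieval: milliseconds
    if max_retrieval_hours <= 12:
        return "GLACIER"         # Glacier Flexible Retrieval: 1-12 hours
    return "DEEP_ARCHIVE"        # Glacier Deep Archive: 12-48 hours

print(suggest_storage_class(30, 0))    # hot data -> STANDARD
print(suggest_storage_class(0.1, 48))  # cold archive -> DEEP_ARCHIVE
```

For unknown or shifting access patterns, skip the guesswork entirely and use Intelligent-Tiering, as described above.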
S3 Versioning
Versioning keeps all versions of an object in a bucket — protecting against accidental deletion and overwrites.
Enable versioning on a bucket → all uploads create a new version
Delete an object → creates a delete marker (object not actually removed)
Restore deleted object → delete the delete marker
Permanently delete → specify version ID in delete request
Key facts
- Once versioning is enabled, it cannot be disabled — only suspended
- Suspended versioning still keeps existing versions; new objects get version ID `null`
- Costs: you're billed for every version stored
# Enable versioning
aws s3api put-bucket-versioning \
--bucket my-bucket \
--versioning-configuration Status=Enabled
# List all versions
aws s3api list-object-versions --bucket my-bucket
# Restore a deleted object (remove the delete marker)
aws s3api delete-object \
--bucket my-bucket \
--key myfile.txt \
--version-id <delete-marker-version-id>
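The delete-marker flow above can be modelled in a few lines. This is a toy in-memory sketch for intuition only, not an AWS client: a plain delete pushes a marker on top of the version stack, and deleting the marker by version ID restores the object.

```python
# Toy in-memory model of S3 versioning: deletes add a delete marker,
# and removing the marker restores the object. Not an AWS API.
import itertools

class VersionedBucket:
    def __init__(self):
        self._versions = {}        # key -> list of (version_id, body or "DELETE_MARKER")
        self._ids = itertools.count(1)

    def put(self, key, body):
        vid = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((vid, body))
        return vid

    def delete(self, key, version_id=None):
        stack = self._versions.get(key, [])
        if version_id is None:     # simple delete -> add a delete marker on top
            vid = f"v{next(self._ids)}"
            stack.append((vid, "DELETE_MARKER"))
            self._versions[key] = stack
            return vid
        # delete with a version ID -> permanently remove that version (or marker)
        self._versions[key] = [(v, b) for v, b in stack if v != version_id]

    def get(self, key):
        stack = self._versions.get(key, [])
        if not stack or stack[-1][1] == "DELETE_MARKER":
            return None            # latest version is a delete marker: appears deleted
        return stack[-1][1]

b = VersionedBucket()
b.put("myfile.txt", "hello")
marker = b.delete("myfile.txt")            # object now appears deleted
assert b.get("myfile.txt") is None
b.delete("myfile.txt", version_id=marker)  # remove the marker -> restored
assert b.get("myfile.txt") == "hello"
```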
MFA Delete
MFA Delete adds an extra protection layer — it requires MFA authentication to:
- Permanently delete a versioned object
- Suspend versioning
# Enable MFA Delete (requires root account + MFA device)
aws s3api put-bucket-versioning \
--bucket my-bucket \
--versioning-configuration Status=Enabled,MFADelete=Enabled \
--mfa "arn:aws:iam::123456789:mfa/root-account-mfa-device 123456"
Warning
Only the root account can enable/disable MFA Delete. It cannot be set via IAM users — even admins.
S3 Replication
Cross-Region Replication (CRR) and Same-Region Replication (SRR) automatically copy objects between buckets.
Requirements
- Versioning must be enabled on both source and destination buckets
- IAM role with permissions to read source and write destination
- Replication only applies to new objects after replication is enabled (not existing objects)
Use cases
| Type | Use case |
|---|---|
| CRR | Disaster recovery, latency reduction for global users, compliance (data residency) |
| SRR | Log aggregation, test/prod sync in same region, audit copies |
# Enable replication
aws s3api put-bucket-replication \
  --bucket source-bucket \
  --replication-configuration '{
    "Role": "arn:aws:iam::123456789:role/s3-replication-role",
    "Rules": [{
      "Status": "Enabled",
      "Priority": 1,
      "Filter": { "Prefix": "logs/" },
      "DeleteMarkerReplication": { "Status": "Disabled" },
      "Destination": {
        "Bucket": "arn:aws:s3:::destination-bucket",
        "StorageClass": "STANDARD_IA"
      }
    }]
  }'
Replication advanced features
| Feature | Description |
|---|---|
| Replication Time Control (RTC) | 99.99% of objects replicated within 15 minutes — with SLA |
| Cross-account replication | Requires bucket policy on destination to allow source account |
| Bidirectional replication | Configure replication in both directions; be careful of loops |
| Replica modification sync | Sync metadata changes after replication |
| Delete marker replication | Optionally replicate delete markers (disabled by default) |
S3 Lifecycle Rules
Lifecycle rules automate transitioning objects to cheaper storage classes and deleting old objects.
Transition actions
S3 Standard
→ after 30 days → S3 Standard-IA
→ after 90 days → S3 Glacier Instant
→ after 180 days → S3 Glacier Deep Archive
Expiration actions
→ after 365 days → DELETE
# Example lifecycle configuration
Rules:
- ID: "move-logs-to-archive"
Status: Enabled
Filter:
Prefix: "logs/"
Transitions:
- Days: 30
StorageClass: STANDARD_IA
- Days: 90
StorageClass: GLACIER
Expiration:
Days: 365
- ID: "clean-incomplete-multipart"
Status: Enabled
AbortIncompleteMultipartUpload:
DaysAfterInitiation: 7
- ID: "delete-old-versions"
Status: Enabled
NoncurrentVersionExpiration:
NoncurrentDays: 30
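A common failure mode (see the exam question at the end of this page) is a transition that violates a minimum storage duration or is out of order. The sketch below is a hypothetical pre-flight check, not an AWS validator; it encodes only the two constraints discussed on this page (30-day minimum for the IA classes, and transitions must occur in increasing order).

```python
# Sanity-check sketch for lifecycle transitions, covering the constraints noted
# on this page: IA classes need Days >= 30, and steps must be in increasing order.
MIN_TRANSITION_DAYS = {"STANDARD_IA": 30, "ONEZONE_IA": 30}

def validate_transitions(transitions):
    """transitions: list of {'Days': int, 'StorageClass': str}. Returns problems found."""
    problems = []
    last_days = -1
    for t in transitions:
        days, cls = t["Days"], t["StorageClass"]
        if days <= last_days:
            problems.append(f"{cls} at day {days}: transitions must be in increasing order")
        if days < MIN_TRANSITION_DAYS.get(cls, 0):
            problems.append(f"{cls} at day {days}: below the {MIN_TRANSITION_DAYS[cls]}-day minimum")
        last_days = days
    return problems

rules = [{"Days": 30, "StorageClass": "STANDARD_IA"},
         {"Days": 90, "StorageClass": "GLACIER"}]
print(validate_transitions(rules))                                 # [] -> valid
print(validate_transitions([{"Days": 7, "StorageClass": "STANDARD_IA"}]))  # 1 problem
```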
S3 Analytics
Use S3 Analytics to analyse access patterns before creating lifecycle rules — it suggests optimal transition points based on actual usage data. Takes 24–48 hours to populate.
S3 Event Notifications
Trigger downstream processing when objects are created, deleted, or restored.
Event types
| Event | Trigger |
|---|---|
| s3:ObjectCreated:* | Any upload (PutObject, PostObject, Copy, CompleteMultipartUpload) |
| s3:ObjectRemoved:* | Delete or delete marker creation |
| s3:ObjectRestore:* | Glacier restore initiated/completed |
| s3:Replication:* | Replication failure events |
Targets
- SNS — fan-out to multiple subscribers
- SQS — queue for async processing
- Lambda — serverless processing triggered by uploads
{
"LambdaFunctionConfigurations": [{
"LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789:function:process-upload",
"Events": ["s3:ObjectCreated:*"],
"Filter": {
"Key": {
"FilterRules": [{"Name": "suffix", "Value": ".jpg"}]
}
}
}]
}
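On the receiving side, a Lambda handler pulls the bucket and key out of the notification payload. One detail worth knowing: object keys arrive URL-encoded (spaces become `+`), so decode them before calling S3. This is a minimal sketch; the event below is trimmed to only the fields the handler reads.

```python
# Minimal sketch of the Lambda side of the notification above: extract the
# bucket and key from each record. Keys arrive URL-encoded, hence unquote_plus.
from urllib.parse import unquote_plus

def handler(event, context=None):
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # "my+photo.jpg" -> "my photo.jpg"
        processed.append((bucket, key))
    return processed

# Shape of a real S3 notification event, trimmed to the fields used above
event = {"Records": [{"s3": {"bucket": {"name": "my-bucket"},
                             "object": {"key": "uploads/my+photo.jpg"}}}]}
print(handler(event))   # [('my-bucket', 'uploads/my photo.jpg')]
```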
EventBridge integration
Alternatively, route S3 events through Amazon EventBridge for more flexible routing, filtering, and targeting. EventBridge supports 20+ target types vs the 3 native S3 notification targets.
S3 Security
Bucket Policies
Bucket policies are JSON-based resource policies attached to a bucket — they control access for any AWS principal:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowPublicRead",
"Effect": "Allow",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::my-website-bucket/*"
},
{
"Sid": "DenyNonSSL",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"],
"Condition": {
"Bool": {"aws:SecureTransport": "false"}
}
}
]
}
Block Public Access
The Block Public Access setting (at account or bucket level) is the guardrail that prevents any public access regardless of bucket policies:
# Block all public access at account level
aws s3control put-public-access-block \
--account-id 123456789 \
--public-access-block-configuration \
BlockPublicAcls=true,IgnorePublicAcls=true,\
BlockPublicPolicy=true,RestrictPublicBuckets=true
S3 Access Logs
Enable server access logging to capture all requests to a bucket:
aws s3api put-bucket-logging \
--bucket my-bucket \
--bucket-logging-status '{
"LoggingEnabled": {
"TargetBucket": "my-access-logs-bucket",
"TargetPrefix": "my-bucket-logs/"
}
}'
Warning
Don't enable access logs on the same bucket you're logging to — it will create an infinite logging loop.
S3 Object Lock & Glacier Vault Lock
| Feature | Description | Use case |
|---|---|---|
| Object Lock — Governance mode | Prevent deletion unless user has special IAM permission | Controlled protection |
| Object Lock — Compliance mode | No one (including root) can delete before retention period | Regulatory compliance (SEC, FINRA) |
| S3 Glacier Vault Lock | Lock a Glacier vault policy permanently | WORM (Write Once Read Many) archiving |
IAM Access Analyzer for S3
Automatically reviews bucket policies and ACLs to identify buckets shared publicly or cross-account — surfaced in the S3 console as findings.
Amazon Athena
Athena is a serverless interactive query service that lets you run SQL directly on data stored in S3.
Data in S3 (CSV, JSON, Parquet, ORC, Avro)
→ Define schema in Glue Data Catalog
→ Query with standard SQL in Athena
→ Pay per TB scanned
Common CloudOps use cases
- Query CloudTrail logs stored in S3
- Query VPC Flow Logs for network analysis
- Query CloudWatch Logs exports
- Query S3 Inventory reports
Example: Query CloudTrail logs
-- Find who deleted an S3 bucket
SELECT eventtime, useridentity.username, sourceipaddress
FROM cloudtrail_logs
WHERE eventsource = 's3.amazonaws.com'
AND eventname = 'DeleteBucket'
AND eventtime > '2026-04-01'
ORDER BY eventtime DESC;
Performance optimisation
- Partitioning: partition by `year/month/day` to reduce data scanned
- Columnar formats: use Parquet or ORC for 10x less data scanned vs CSV
- Compression: GZIP, Snappy for reducing storage and scan costs
- Workgroup limits: cap maximum bytes scanned to prevent expensive queries
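These optimisations translate directly into dollars. The back-of-envelope sketch below assumes Athena's published $5 per TB scanned; the 10x compression ratio for Parquet is the illustrative figure from the list above, and actual ratios depend on your data.

```python
# Back-of-envelope Athena cost sketch. Assumes $5 per TB scanned (published
# Athena pricing); the 10x Parquet reduction is illustrative, not guaranteed.
PRICE_PER_TB = 5.00

def scan_cost(bytes_scanned: int) -> float:
    """Cost in USD for a query that scans the given number of bytes."""
    return round(bytes_scanned / (1024 ** 4) * PRICE_PER_TB, 4)

csv_bytes = 1 * 1024 ** 4        # 1 TB of raw CSV
parquet_bytes = csv_bytes // 10  # columnar + compressed, ~10x smaller scan

print(scan_cost(csv_bytes))      # 5.0
print(scan_cost(parquet_bytes))  # 0.5
```

Partition pruning compounds this: a table partitioned by day that is queried for one day out of a year scans ~1/365 of the data on top of the format savings.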
S3 Batch Operations
Run bulk operations on millions of S3 objects at once:
| Operation | Description |
|---|---|
| Copy | Copy objects to another bucket or storage class |
| Replace ACL | Apply new ACL to many objects |
| Restore from Glacier | Bulk restore |
| Invoke Lambda | Call a Lambda function for each object |
| Apply Object Lock | Bulk apply retention settings |
| Replicate | Replicate existing objects (replication only covers new objects by default) |
# Create a batch operations job
aws s3control create-job \
  --account-id 123456789 \
  --operation '{"S3PutObjectCopy": {"TargetResource": "arn:aws:s3:::dest-bucket"}}' \
  --manifest '{"Spec": {"Format": "S3InventoryReport_CSV_20161130"},
    "Location": {"ObjectArn": "arn:aws:s3:::my-bucket/inventory/manifest.json",
                 "ETag": "<manifest-etag>"}}' \
  --report '{"Bucket": "arn:aws:s3:::my-reports-bucket", "Enabled": true,
    "Format": "Report_CSV_20180820", "ReportScope": "AllTasks"}' \
  --priority 10 \
  --role-arn arn:aws:iam::123456789:role/BatchOpsRole
S3 Inventory
S3 Inventory generates CSV/ORC reports of all objects in a bucket — useful for auditing, lifecycle management planning, and Batch Operations manifests.
Schedule: daily or weekly
Destination: another S3 bucket
Format: CSV, ORC, or Parquet
Optional fields: size, last modified, storage class, replication status, encryption status
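Each inventory delivery includes a `manifest.json` that lists the actual data files, which is what Batch Operations consumes. The sketch below parses the documented manifest shape; the bucket and file names are made-up examples.

```python
# Sketch of reading an S3 Inventory manifest.json to find the report data files.
# The structure follows the documented manifest shape; names here are made up.
import json

manifest_json = """{
  "sourceBucket": "my-bucket",
  "destinationBucket": "arn:aws:s3:::my-inventory-bucket",
  "fileFormat": "CSV",
  "files": [
    {"key": "my-bucket/config-id/data/example-1.csv.gz",
     "size": 2147483647,
     "MD5checksum": "f11166069f1990abeb9c97ace9cdfabc"}
  ]
}"""

manifest = json.loads(manifest_json)
data_keys = [f["key"] for f in manifest["files"]]   # the gzipped CSV chunks to read
print(manifest["fileFormat"], data_keys)
```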
Multi-part Upload
For objects larger than 100 MB, multi-part upload is recommended:
- Upload parts in parallel → faster
- Resume failed uploads (retry individual parts)
- Required for objects > 5 GB
# Single AWS CLI command handles multi-part automatically
aws s3 cp large-file.zip s3://my-bucket/ \
--storage-class INTELLIGENT_TIERING
# Check for incomplete multi-part uploads (these cost money!)
aws s3api list-multipart-uploads --bucket my-bucket
# Lifecycle rule to auto-abort incomplete uploads after 7 days
# (see lifecycle rules section above)
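The CLI handles part sizing for you, but the limits behind it are worth knowing: at most 10,000 parts per upload, each between 5 MiB and 5 GiB (the last part may be smaller). The sketch below shows the arithmetic an uploader must do; the 8 MiB default mirrors the AWS CLI's default chunk size, taken here as an assumption.

```python
# Part-size math behind multipart upload: at most 10,000 parts per upload,
# each at least 5 MiB (except the last). The 8 MiB default is an assumption
# mirroring the AWS CLI's default chunk size.
import math

MAX_PARTS = 10_000
MIN_PART = 5 * 1024 ** 2          # 5 MiB minimum part size

def plan_parts(object_size: int, part_size: int = 8 * 1024 ** 2):
    """Return (part_size, part_count), growing the part size if needed."""
    part_size = max(part_size, MIN_PART)
    if math.ceil(object_size / part_size) > MAX_PARTS:
        part_size = math.ceil(object_size / MAX_PARTS)  # grow parts to fit the limit
    return part_size, math.ceil(object_size / part_size)

print(plan_parts(100 * 1024 ** 2))        # 100 MiB file: default-size parts
size, count = plan_parts(5 * 1024 ** 4)   # 5 TiB object (the S3 maximum size)
assert count <= MAX_PARTS                 # part size was grown to stay in bounds
```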
Hands-on: S3 Cross-Region Replication for DR
Goal: Replicate prod bucket in us-east-1 to eu-west-1 for disaster recovery
1. Create source bucket: prod-data-us-east-1 (versioning: enabled)
2. Create destination bucket: prod-data-eu-west-1 (versioning: enabled)
3. Create IAM role: s3-replication-role
Policy:
- Allow: s3:GetObject, s3:GetObjectVersion, s3:GetObjectVersionAcl on source
- Allow: s3:ReplicateObject, s3:ReplicateDelete on destination
4. Configure replication rule on source bucket:
- Destination: prod-data-eu-west-1
- IAM role: s3-replication-role
- Enable Replication Time Control (RTC) for 15-min SLA
5. Enable Delete marker replication: Yes (for full sync)
6. For existing objects: use S3 Batch Operations with Copy to sync retroactively
7. Verify: upload a test file to source → confirm it appears in destination within 15 min
Hands-on: Create a Secure S3 Bucket
Goal: Create a private bucket with encryption, versioning, and public-access guardrails.
- Open S3 > Create bucket.
- Enter a globally unique bucket name such as `cloudops-lab-<account-id>-<region>`.
- Choose the Region closest to your lab resources.
- Keep Block all public access enabled.
- Enable Bucket Versioning.
- Set default encryption to SSE-S3 for a simple lab, or SSE-KMS if you need key-level audit and control.
- Add tags such as `Environment = lab` and `Owner = cloudops`.
- Create the bucket.
- Upload a test file.
- Open the object and confirm it is not public.
- Delete the object, then use Show versions to see the delete marker.
- Remove the delete marker to restore the object.
Hands-on: Add a Bucket Policy That Requires HTTPS
Goal: Deny any request that does not use TLS.
- Open the bucket.
- Go to Permissions > Bucket policy.
- Add a deny statement like this, replacing the bucket name:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "DenyInsecureTransport",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::cloudops-lab-bucket",
"arn:aws:s3:::cloudops-lab-bucket/*"
],
"Condition": {
"Bool": {
"aws:SecureTransport": "false"
}
}
}
]
}
- Save the policy.
- Test normal HTTPS access with the AWS CLI.
- Keep Block Public Access enabled unless you are intentionally building a public static website.
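As a quick audit step, you can check a policy document for the HTTPS-only deny pattern before attaching it. The sketch below is a hypothetical helper that looks only for the exact shape used above (a `Deny` with `aws:SecureTransport: false`), not a general policy analyser.

```python
# Quick sketch that checks a bucket policy document for the HTTPS-only deny
# pattern used above. It only matches the specific shape shown, nothing more.
import json

def enforces_tls(policy: dict) -> bool:
    """True if any statement denies requests where aws:SecureTransport is false."""
    for stmt in policy.get("Statement", []):
        cond = stmt.get("Condition", {}).get("Bool", {})
        if stmt.get("Effect") == "Deny" and cond.get("aws:SecureTransport") == "false":
            return True
    return False

policy = json.loads("""{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyInsecureTransport",
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": ["arn:aws:s3:::cloudops-lab-bucket/*"],
    "Condition": {"Bool": {"aws:SecureTransport": "false"}}
  }]
}""")

print(enforces_tls(policy))           # True
print(enforces_tls({"Statement": []}))  # False: no TLS guard present
```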
Hands-on: S3 Lifecycle Rule for Logs
Goal: Move old logs to cheaper storage and delete incomplete multipart uploads.
- Create a prefix named `logs/` by uploading a small file such as `logs/test.log`.
- Open Management > Create lifecycle rule.
- Name the rule `archive-logs`.
- Scope it to prefix `logs/`.
- Add transitions: after 30 days to Standard-IA, after 90 days to Glacier Flexible Retrieval.
- Add expiration after 365 days if this is acceptable for the lab.
- Delete incomplete multipart uploads after 7 days.
- Save the rule.
- Review the rule summary and confirm it applies only to `logs/`.
Hands-on: Query S3 Logs with Athena
Goal: Query CSV or log data in S3 without loading it into a database.
- Create or choose an S3 bucket for Athena query results.
- Open Athena and set the query result location.
- Create a database:
CREATE DATABASE cloudops_logs;
- Create an external table for a simple CSV log prefix:
CREATE EXTERNAL TABLE cloudops_logs.web_logs (
request_time string,
client_ip string,
method string,
path string,
status int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 's3://your-log-bucket/logs/';
- Run a query:
SELECT status, count(*) AS requests
FROM cloudops_logs.web_logs
GROUP BY status
ORDER BY requests DESC;
- Add partitions or convert to Parquet for real workloads to reduce scan cost.
Common SOA-C03 Exam Questions
Q: An S3 bucket must not be publicly accessible even if a developer accidentally sets a public ACL. How? Enable Block Public Access at the account level — it overrides any bucket policy or ACL that grants public access.
Q: Objects deleted from a versioned bucket are not gone — how do you permanently delete them? Deleting a versioned object creates a delete marker. To permanently delete, you must specify the version ID in the delete request, removing the specific version. To fully remove all versions, list all versions and delete each by version ID.
Q: How do you query VPC Flow Logs stored in S3 without loading them into a database? Use Amazon Athena — create an external table pointing to the S3 prefix where flow logs are stored, then run SQL queries. Partition the table by date to reduce scanned data and cost.
Q: A lifecycle rule isn't transitioning objects on time. What do you check? Check the minimum storage duration — Standard-IA has a 30-day minimum, Glacier has 90 days. Objects smaller than 128 KB are not transitioned to IA classes (not cost-effective).
What to Learn Next
- AWS Advanced Storage (FSx & Storage Gateway) — hybrid and high-performance storage
- AWS Security & Compliance — S3 encryption with KMS, Macie for data discovery
- AWS CloudWatch Monitoring — monitor S3 with CloudWatch metrics and Athena log analysis
