Amazon S3 for CloudOps — Storage, Security & Data Management
S3 operations and governance for CloudOps. Versioning, replication, lifecycle policies, security, event notifications, Athena queries, and storage classes — all tested on SOA-C03.
What you'll learn
- Manage S3 versioning, MFA delete, and replication
- Design lifecycle policies to automate storage class transitions
- Configure bucket policies, ACLs, and access controls
- Use S3 Event Notifications to trigger downstream processing
- Query S3 data with Athena
- Perform bulk operations with S3 Batch Operations
S3 Storage Classes
Choose the right storage class based on access frequency and retrieval requirements:
| Class | Use Case | Retrieval | Min Duration | Cost |
|---|---|---|---|---|
| S3 Standard | Frequently accessed | Milliseconds | None | Highest |
| S3 Standard-IA | Infrequent access, rapid retrieval | Milliseconds | 30 days | Lower storage, retrieval fee |
| S3 One Zone-IA | Infrequent, re-creatable data | Milliseconds | 30 days | 20% cheaper than IA |
| S3 Intelligent-Tiering | Unknown or changing access patterns | Milliseconds | None | Monitoring fee per object |
| S3 Glacier Instant | Archive, occasional access | Milliseconds | 90 days | Low |
| S3 Glacier Flexible | Archive, minutes-to-hours retrieval | 1–12 hours | 90 days | Very low |
| S3 Glacier Deep Archive | Long-term archive, annual access | 12–48 hours | 180 days | Lowest |
Intelligent-Tiering
Intelligent-Tiering automatically moves objects between access tiers based on usage, with no retrieval fees. Optionally activate the Deep Archive Access tier to move objects not accessed for 180+ days into Deep Archive-level pricing. Ideal for unpredictable access patterns.
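The storage-class table above can be read as a small decision procedure. The sketch below is a toy helper, not an AWS API: the access-frequency thresholds are illustrative assumptions, while the retrieval-time bands match the table.

```python
# Toy helper mirroring the storage-class table above.
# The access-frequency thresholds are illustrative assumptions, not AWS rules.
def suggest_storage_class(accesses_per_month: float, max_retrieval_hours: float) -> str:
    """Suggest an S3 storage class from rough access frequency and retrieval needs."""
    if accesses_per_month >= 1:
        return "STANDARD" if accesses_per_month > 4 else "STANDARD_IA"
    # Archive tiers: pick by how long a restore is allowed to take
    if max_retrieval_hours < 1:
        return "GLACIER_IR"      # Glacier Instant Retrieval: milliseconds
    if max_retrieval_hours <= 12:
        return "GLACIER"         # Glacier Flexible Retrieval: 1-12 hours
    return "DEEP_ARCHIVE"        # Glacier Deep Archive: 12-48 hours

print(suggest_storage_class(30, 0))    # hot data -> STANDARD
print(suggest_storage_class(0.1, 48))  # cold archive -> DEEP_ARCHIVE
```

For unknown or shifting access patterns, skip the guesswork entirely and use Intelligent-Tiering, as described above.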
S3 Versioning
Versioning keeps all versions of an object in a bucket — protecting against accidental deletion and overwrites.
Enable versioning on a bucket → all uploads create a new version
Delete an object → creates a delete marker (object not actually removed)
Restore deleted object → delete the delete marker
Permanently delete → specify version ID in delete request
Key facts
- Once versioning is enabled, it cannot be disabled — only suspended
- Suspended versioning still keeps existing versions; new objects get version ID `null`
- Costs: you're billed for every version stored
# Enable versioning
aws s3api put-bucket-versioning \
--bucket my-bucket \
--versioning-configuration Status=Enabled
# List all versions
aws s3api list-object-versions --bucket my-bucket
# Restore a deleted object (remove the delete marker)
aws s3api delete-object \
--bucket my-bucket \
--key myfile.txt \
--version-id <delete-marker-version-id>
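The delete-marker flow above can be modelled in a few lines. This is a toy in-memory sketch for intuition only, not an AWS client: a plain delete pushes a marker on top of the version stack, and deleting the marker by version ID restores the object.

```python
# Toy in-memory model of S3 versioning: deletes add a delete marker,
# and removing the marker restores the object. Not an AWS API.
import itertools

class VersionedBucket:
    def __init__(self):
        self._versions = {}        # key -> list of (version_id, body or "DELETE_MARKER")
        self._ids = itertools.count(1)

    def put(self, key, body):
        vid = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((vid, body))
        return vid

    def delete(self, key, version_id=None):
        stack = self._versions.get(key, [])
        if version_id is None:     # simple delete -> add a delete marker on top
            vid = f"v{next(self._ids)}"
            stack.append((vid, "DELETE_MARKER"))
            self._versions[key] = stack
            return vid
        # delete with a version ID -> permanently remove that version (or marker)
        self._versions[key] = [(v, b) for v, b in stack if v != version_id]

    def get(self, key):
        stack = self._versions.get(key, [])
        if not stack or stack[-1][1] == "DELETE_MARKER":
            return None            # latest version is a delete marker: appears deleted
        return stack[-1][1]

b = VersionedBucket()
b.put("myfile.txt", "hello")
marker = b.delete("myfile.txt")            # object now appears deleted
assert b.get("myfile.txt") is None
b.delete("myfile.txt", version_id=marker)  # remove the marker -> restored
assert b.get("myfile.txt") == "hello"
```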
MFA Delete
MFA Delete adds an extra protection layer — it requires MFA authentication to:
- Permanently delete a versioned object
- Suspend versioning
# Enable MFA Delete (requires root account + MFA device)
aws s3api put-bucket-versioning \
--bucket my-bucket \
--versioning-configuration Status=Enabled,MFADelete=Enabled \
--mfa "arn:aws:iam::123456789:mfa/root-account-mfa-device 123456"
Warning
Only the root account can enable/disable MFA Delete. It cannot be set via IAM users — even admins.
S3 Replication
Cross-Region Replication (CRR) and Same-Region Replication (SRR) automatically copy objects between buckets.
Requirements
- Versioning must be enabled on both source and destination buckets
- IAM role with permissions to read source and write destination
- Replication only applies to new objects after replication is enabled (not existing objects)
Use cases
| Type | Use case |
|---|---|
| CRR | Disaster recovery, latency reduction for global users, compliance (data residency) |
| SRR | Log aggregation, test/prod sync in same region, audit copies |
# Enable replication
aws s3api put-bucket-replication \
  --bucket source-bucket \
  --replication-configuration '{
    "Role": "arn:aws:iam::123456789:role/s3-replication-role",
    "Rules": [{
      "Status": "Enabled",
      "Priority": 1,
      "Filter": { "Prefix": "logs/" },
      "DeleteMarkerReplication": { "Status": "Disabled" },
      "Destination": {
        "Bucket": "arn:aws:s3:::destination-bucket",
        "StorageClass": "STANDARD_IA"
      }
    }]
  }'
Replication advanced features
| Feature | Description |
|---|---|
| Replication Time Control (RTC) | 99.99% of objects replicated within 15 minutes — with SLA |
| Cross-account replication | Requires bucket policy on destination to allow source account |
| Bidirectional replication | Configure replication in both directions; be careful of loops |
| Replica modification sync | Sync metadata changes after replication |
| Delete marker replication | Optionally replicate delete markers (disabled by default) |
S3 Lifecycle Rules
Lifecycle rules automate transitioning objects to cheaper storage classes and deleting old objects.
Transition actions
S3 Standard
→ after 30 days → S3 Standard-IA
→ after 90 days → S3 Glacier Instant
→ after 180 days → S3 Glacier Deep Archive
Expiration actions
→ after 365 days → DELETE
# Example lifecycle configuration
Rules:
- ID: "move-logs-to-archive"
Status: Enabled
Filter:
Prefix: "logs/"
Transitions:
- Days: 30
StorageClass: STANDARD_IA
- Days: 90
StorageClass: GLACIER
Expiration:
Days: 365
- ID: "clean-incomplete-multipart"
Status: Enabled
AbortIncompleteMultipartUpload:
DaysAfterInitiation: 7
- ID: "delete-old-versions"
Status: Enabled
NoncurrentVersionExpiration:
NoncurrentDays: 30
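A common failure mode (see the exam question at the end of this page) is a transition that violates a minimum storage duration or is out of order. The sketch below is a hypothetical pre-flight check, not an AWS validator; it encodes only the two constraints discussed on this page (30-day minimum for the IA classes, and transitions must occur in increasing order).

```python
# Sanity-check sketch for lifecycle transitions, covering the constraints noted
# on this page: IA classes need Days >= 30, and steps must be in increasing order.
MIN_TRANSITION_DAYS = {"STANDARD_IA": 30, "ONEZONE_IA": 30}

def validate_transitions(transitions):
    """transitions: list of {'Days': int, 'StorageClass': str}. Returns problems found."""
    problems = []
    last_days = -1
    for t in transitions:
        days, cls = t["Days"], t["StorageClass"]
        if days <= last_days:
            problems.append(f"{cls} at day {days}: transitions must be in increasing order")
        if days < MIN_TRANSITION_DAYS.get(cls, 0):
            problems.append(f"{cls} at day {days}: below the {MIN_TRANSITION_DAYS[cls]}-day minimum")
        last_days = days
    return problems

rules = [{"Days": 30, "StorageClass": "STANDARD_IA"},
         {"Days": 90, "StorageClass": "GLACIER"}]
print(validate_transitions(rules))                                 # [] -> valid
print(validate_transitions([{"Days": 7, "StorageClass": "STANDARD_IA"}]))  # 1 problem
```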
S3 Analytics
Use S3 Analytics to analyse access patterns before creating lifecycle rules — it suggests optimal transition points based on actual usage data. Takes 24–48 hours to populate.
S3 Event Notifications
Trigger downstream processing when objects are created, deleted, or restored.
Event types
| Event | Trigger |
|---|---|
| s3:ObjectCreated:* | Any upload (PutObject, PostObject, Copy, CompleteMultipartUpload) |
| s3:ObjectRemoved:* | Delete or delete marker creation |
| s3:ObjectRestore:* | Glacier restore initiated/completed |
| s3:Replication:* | Replication failure events |
Targets
- SNS — fan-out to multiple subscribers
- SQS — queue for async processing
- Lambda — serverless processing triggered by uploads
{
"LambdaFunctionConfigurations": [{
"LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789:function:process-upload",
"Events": ["s3:ObjectCreated:*"],
"Filter": {
"Key": {
"FilterRules": [{"Name": "suffix", "Value": ".jpg"}]
}
}
}]
}
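On the receiving side, a Lambda handler pulls the bucket and key out of the notification payload. One detail worth knowing: object keys arrive URL-encoded (spaces become `+`), so decode them before calling S3. This is a minimal sketch; the event below is trimmed to only the fields the handler reads.

```python
# Minimal sketch of the Lambda side of the notification above: extract the
# bucket and key from each record. Keys arrive URL-encoded, hence unquote_plus.
from urllib.parse import unquote_plus

def handler(event, context=None):
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # "my+photo.jpg" -> "my photo.jpg"
        processed.append((bucket, key))
    return processed

# Shape of a real S3 notification event, trimmed to the fields used above
event = {"Records": [{"s3": {"bucket": {"name": "my-bucket"},
                             "object": {"key": "uploads/my+photo.jpg"}}}]}
print(handler(event))   # [('my-bucket', 'uploads/my photo.jpg')]
```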
EventBridge integration
Alternatively, route S3 events through Amazon EventBridge for more flexible routing, filtering, and targeting. EventBridge supports 20+ target types vs the 3 native S3 notification targets.
S3 Security
Bucket Policies
Bucket policies are JSON-based resource policies attached to a bucket — they control access for any AWS principal:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowPublicRead",
"Effect": "Allow",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::my-website-bucket/*"
},
{
"Sid": "DenyNonSSL",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"],
"Condition": {
"Bool": {"aws:SecureTransport": "false"}
}
}
]
}
Block Public Access
The Block Public Access setting (at account or bucket level) is the guardrail that prevents any public access regardless of bucket policies:
# Block all public access at account level
aws s3control put-public-access-block \
--account-id 123456789 \
--public-access-block-configuration \
BlockPublicAcls=true,IgnorePublicAcls=true,\
BlockPublicPolicy=true,RestrictPublicBuckets=true
S3 Access Logs
Enable server access logging to capture all requests to a bucket:
aws s3api put-bucket-logging \
--bucket my-bucket \
--bucket-logging-status '{
"LoggingEnabled": {
"TargetBucket": "my-access-logs-bucket",
"TargetPrefix": "my-bucket-logs/"
}
}'
Warning
Don't enable access logs on the same bucket you're logging to — it will create an infinite logging loop.
S3 Object Lock & Glacier Vault Lock
| Feature | Description | Use case |
|---|---|---|
| Object Lock — Governance mode | Prevent deletion unless user has special IAM permission | Controlled protection |
| Object Lock — Compliance mode | No one (including root) can delete before retention period | Regulatory compliance (SEC, FINRA) |
| S3 Glacier Vault Lock | Lock a Glacier vault policy permanently | WORM (Write Once Read Many) archiving |
IAM Access Analyzer for S3
Automatically reviews bucket policies and ACLs to identify buckets shared publicly or cross-account — surfaced in the S3 console as findings.
Amazon Athena
Athena is a serverless interactive query service that lets you run SQL directly on data stored in S3.
Data in S3 (CSV, JSON, Parquet, ORC, Avro)
→ Define schema in Glue Data Catalog
→ Query with standard SQL in Athena
→ Pay per TB scanned
Common CloudOps use cases
- Query CloudTrail logs stored in S3
- Query VPC Flow Logs for network analysis
- Query CloudWatch Logs exports
- Query S3 Inventory reports
Example: Query CloudTrail logs
-- Find who deleted an S3 bucket
SELECT eventtime, useridentity.username, sourceipaddress
FROM cloudtrail_logs
WHERE eventsource = 's3.amazonaws.com'
AND eventname = 'DeleteBucket'
AND eventtime > '2026-04-01'
ORDER BY eventtime DESC;
Performance optimisation
- Partitioning: partition by `year/month/day` to reduce data scanned
- Columnar formats: use Parquet or ORC for 10x less data scanned vs CSV
- Compression: GZIP, Snappy for reducing storage and scan costs
- Workgroup limits: cap maximum bytes scanned to prevent expensive queries
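These optimisations translate directly into dollars. The back-of-envelope sketch below assumes Athena's published $5 per TB scanned; the 10x compression ratio for Parquet is the illustrative figure from the list above, and actual ratios depend on your data.

```python
# Back-of-envelope Athena cost sketch. Assumes $5 per TB scanned (published
# Athena pricing); the 10x Parquet reduction is illustrative, not guaranteed.
PRICE_PER_TB = 5.00

def scan_cost(bytes_scanned: int) -> float:
    """Cost in USD for a query that scans the given number of bytes."""
    return round(bytes_scanned / (1024 ** 4) * PRICE_PER_TB, 4)

csv_bytes = 1 * 1024 ** 4        # 1 TB of raw CSV
parquet_bytes = csv_bytes // 10  # columnar + compressed, ~10x smaller scan

print(scan_cost(csv_bytes))      # 5.0
print(scan_cost(parquet_bytes))  # 0.5
```

Partition pruning compounds this: a table partitioned by day that is queried for one day out of a year scans ~1/365 of the data on top of the format savings.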
S3 Batch Operations
Run bulk operations on millions of S3 objects at once:
| Operation | Description |
|---|---|
| Copy | Copy objects to another bucket or storage class |
| Replace ACL | Apply new ACL to many objects |
| Restore from Glacier | Bulk restore |
| Invoke Lambda | Call a Lambda function for each object |
| Apply Object Lock | Bulk apply retention settings |
| Replicate | Replicate existing objects (replication only covers new objects by default) |
# Create a batch operations job
aws s3control create-job \
  --account-id 123456789 \
  --operation '{"S3PutObjectCopy": {"TargetResource": "arn:aws:s3:::dest-bucket"}}' \
  --manifest '{"Spec": {"Format": "S3InventoryReport_CSV_20161130"},
    "Location": {"ObjectArn": "arn:aws:s3:::my-bucket/inventory/manifest.json",
                 "ETag": "<manifest-etag>"}}' \
  --report '{"Bucket": "arn:aws:s3:::my-reports-bucket", "Enabled": true,
    "Format": "Report_CSV_20180820", "ReportScope": "AllTasks"}' \
  --priority 10 \
  --role-arn arn:aws:iam::123456789:role/BatchOpsRole
S3 Inventory
S3 Inventory generates CSV/ORC reports of all objects in a bucket — useful for auditing, lifecycle management planning, and Batch Operations manifests.
Schedule: daily or weekly
Destination: another S3 bucket
Format: CSV, ORC, or Parquet
Optional fields: size, last modified, storage class, replication status, encryption status
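Each inventory delivery includes a `manifest.json` that lists the actual data files, which is what Batch Operations consumes. The sketch below parses the documented manifest shape; the bucket and file names are made-up examples.

```python
# Sketch of reading an S3 Inventory manifest.json to find the report data files.
# The structure follows the documented manifest shape; names here are made up.
import json

manifest_json = """{
  "sourceBucket": "my-bucket",
  "destinationBucket": "arn:aws:s3:::my-inventory-bucket",
  "fileFormat": "CSV",
  "files": [
    {"key": "my-bucket/config-id/data/example-1.csv.gz",
     "size": 2147483647,
     "MD5checksum": "f11166069f1990abeb9c97ace9cdfabc"}
  ]
}"""

manifest = json.loads(manifest_json)
data_keys = [f["key"] for f in manifest["files"]]   # the gzipped CSV chunks to read
print(manifest["fileFormat"], data_keys)
```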
Multi-part Upload
For objects larger than 100 MB, multi-part upload is recommended:
- Upload parts in parallel → faster
- Resume failed uploads (retry individual parts)
- Required for objects > 5 GB
# Single AWS CLI command handles multi-part automatically
aws s3 cp large-file.zip s3://my-bucket/ \
--storage-class INTELLIGENT_TIERING
# Check for incomplete multi-part uploads (these cost money!)
aws s3api list-multipart-uploads --bucket my-bucket
# Lifecycle rule to auto-abort incomplete uploads after 7 days
# (see lifecycle rules section above)
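The CLI handles part sizing for you, but the limits behind it are worth knowing: at most 10,000 parts per upload, each between 5 MiB and 5 GiB (the last part may be smaller). The sketch below shows the arithmetic an uploader must do; the 8 MiB default mirrors the AWS CLI's default chunk size, taken here as an assumption.

```python
# Part-size math behind multipart upload: at most 10,000 parts per upload,
# each at least 5 MiB (except the last). The 8 MiB default is an assumption
# mirroring the AWS CLI's default chunk size.
import math

MAX_PARTS = 10_000
MIN_PART = 5 * 1024 ** 2          # 5 MiB minimum part size

def plan_parts(object_size: int, part_size: int = 8 * 1024 ** 2):
    """Return (part_size, part_count), growing the part size if needed."""
    part_size = max(part_size, MIN_PART)
    if math.ceil(object_size / part_size) > MAX_PARTS:
        part_size = math.ceil(object_size / MAX_PARTS)  # grow parts to fit the limit
    return part_size, math.ceil(object_size / part_size)

print(plan_parts(100 * 1024 ** 2))        # 100 MiB file: default-size parts
size, count = plan_parts(5 * 1024 ** 4)   # 5 TiB object (the S3 maximum size)
assert count <= MAX_PARTS                 # part size was grown to stay in bounds
```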
Hands-on: S3 Cross-Region Replication for DR
Goal: Replicate prod bucket in us-east-1 to eu-west-1 for disaster recovery
1. Create source bucket: prod-data-us-east-1 (versioning: enabled)
2. Create destination bucket: prod-data-eu-west-1 (versioning: enabled)
3. Create IAM role: s3-replication-role
Policy:
- Allow: s3:GetObject, s3:GetObjectVersion, s3:GetObjectVersionAcl on source
- Allow: s3:ReplicateObject, s3:ReplicateDelete on destination
4. Configure replication rule on source bucket:
- Destination: prod-data-eu-west-1
- IAM role: s3-replication-role
- Enable Replication Time Control (RTC) for 15-min SLA
5. Enable Delete marker replication: Yes (for full sync)
6. For existing objects: use S3 Batch Operations with Copy to sync retroactively
7. Verify: upload a test file to source → confirm it appears in destination within 15 min
Hands-on: Create a Secure S3 Bucket
Goal: Create a private bucket with encryption, versioning, and public-access guardrails.
- Open S3 > Create bucket.
- Enter a globally unique bucket name such as `cloudops-lab-<account-id>-<region>`.
- Choose the Region closest to your lab resources.
- Keep Block all public access enabled.
- Enable Bucket Versioning.
- Set default encryption to SSE-S3 for a simple lab, or SSE-KMS if you need key-level audit and control.
- Add tags such as `Environment = lab` and `Owner = cloudops`.
- Create the bucket.
- Upload a test file.
- Open the object and confirm it is not public.
- Delete the object, then use Show versions to see the delete marker.
- Remove the delete marker to restore the object.
Hands-on: Add a Bucket Policy That Requires HTTPS
Goal: Deny any request that does not use TLS.
- Open the bucket.
- Go to Permissions > Bucket policy.
- Add a deny statement like this, replacing the bucket name:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "DenyInsecureTransport",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::cloudops-lab-bucket",
"arn:aws:s3:::cloudops-lab-bucket/*"
],
"Condition": {
"Bool": {
"aws:SecureTransport": "false"
}
}
}
]
}
- Save the policy.
- Test normal HTTPS access with the AWS CLI.
- Keep Block Public Access enabled unless you are intentionally building a public static website.
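As a quick audit step, you can check a policy document for the HTTPS-only deny pattern before attaching it. The sketch below is a hypothetical helper that looks only for the exact shape used above (a `Deny` with `aws:SecureTransport: false`), not a general policy analyser.

```python
# Quick sketch that checks a bucket policy document for the HTTPS-only deny
# pattern used above. It only matches the specific shape shown, nothing more.
import json

def enforces_tls(policy: dict) -> bool:
    """True if any statement denies requests where aws:SecureTransport is false."""
    for stmt in policy.get("Statement", []):
        cond = stmt.get("Condition", {}).get("Bool", {})
        if stmt.get("Effect") == "Deny" and cond.get("aws:SecureTransport") == "false":
            return True
    return False

policy = json.loads("""{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyInsecureTransport",
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": ["arn:aws:s3:::cloudops-lab-bucket/*"],
    "Condition": {"Bool": {"aws:SecureTransport": "false"}}
  }]
}""")

print(enforces_tls(policy))           # True
print(enforces_tls({"Statement": []}))  # False: no TLS guard present
```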
Hands-on: S3 Lifecycle Rule for Logs
Goal: Move old logs to cheaper storage and delete incomplete multipart uploads.
- Create a prefix named `logs/` by uploading a small file such as `logs/test.log`.
- Open Management > Create lifecycle rule.
- Name the rule `archive-logs`.
- Scope it to prefix `logs/`.
- Add transitions: after 30 days to Standard-IA, after 90 days to Glacier Flexible Retrieval.
- Add expiration after 365 days if this is acceptable for the lab.
- Delete incomplete multipart uploads after 7 days.
- Save the rule.
- Review the rule summary and confirm it applies only to `logs/`.
Hands-on: Query S3 Logs with Athena
Goal: Query CSV or log data in S3 without loading it into a database.
- Create or choose an S3 bucket for Athena query results.
- Open Athena and set the query result location.
- Create a database:
CREATE DATABASE cloudops_logs;
- Create an external table for a simple CSV log prefix:
CREATE EXTERNAL TABLE cloudops_logs.web_logs (
request_time string,
client_ip string,
method string,
path string,
status int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 's3://your-log-bucket/logs/';
- Run a query:
SELECT status, count(*) AS requests
FROM cloudops_logs.web_logs
GROUP BY status
ORDER BY requests DESC;
- Add partitions or convert to Parquet for real workloads to reduce scan cost.
Common SOA-C03 Exam Questions
Q: An S3 bucket must not be publicly accessible even if a developer accidentally sets a public ACL. How? Enable Block Public Access at the account level — it overrides any bucket policy or ACL that grants public access.
Q: Objects deleted from a versioned bucket are not gone — how do you permanently delete them? Deleting a versioned object creates a delete marker. To permanently delete, you must specify the version ID in the delete request, removing the specific version. To fully remove all versions, list all versions and delete each by version ID.
Q: How do you query VPC Flow Logs stored in S3 without loading them into a database? Use Amazon Athena — create an external table pointing to the S3 prefix where flow logs are stored, then run SQL queries. Partition the table by date to reduce scanned data and cost.
Q: A lifecycle rule isn't transitioning objects on time. What do you check? Check the minimum storage duration — Standard-IA has a 30-day minimum, Glacier has 90 days. Objects smaller than 128 KB are not transitioned to IA classes (not cost-effective).
What to Learn Next
- AWS Advanced Storage (FSx & Storage Gateway) — hybrid and high-performance storage
- AWS Security & Compliance — S3 encryption with KMS, Macie for data discovery
- AWS CloudWatch Monitoring — monitor S3 with CloudWatch metrics and Athena log analysis
