Monitor and Maintain Azure Resources — Azure Monitor (AZ-104)

IntermediateTopic50 min12 min read26 Apr 2026Azure

AZ-104 — Monitor and Maintain Azure Resources. Azure Monitor, Log Analytics, alerts, metrics, Application Insights, Azure Backup, and Site Recovery with step-by-step labs.

What you'll learn

Configure Azure Monitor metrics and diagnostic settings
Create and query a Log Analytics workspace
Set up metric and log-based alerts with action groups
Monitor VM performance with VM Insights
Configure Azure Backup for VMs and Azure Files
Set up Azure Site Recovery for disaster recovery

Prerequisites

azure/fundamentals azure/creating-and-managing-virtual-machines-part-1

Relevant for certifications

AZ-104

#azure #az-104 #azure-monitor #log-analytics #alerts #metrics #backup #site-recovery

Azure Monitor Overview

Azure Monitor is the unified observability platform for Azure — it collects metrics, logs, and traces from Azure resources, VMs, applications, and hybrid infrastructure.

Azure Resources
  → Platform metrics (automatic, no config)
  → Diagnostic logs (enable per resource)
  → Activity Log (control plane operations)
  → VM guest metrics (requires Azure Monitor Agent)

All data flows into:
  → Metrics database (real-time, 93-day retention)
  → Log Analytics workspace (long-term, queryable with KQL)

Azure Monitor Metrics

Metrics are numerical time-series data collected from Azure resources automatically.

Key metric sources

Source	Examples	Retention
Platform metrics	VM CPU, Storage IOPS, Network bytes	93 days
Guest OS metrics	Memory, disk, process count (requires Azure Monitor Agent)	93 days
Custom metrics	Application-specific (via SDK or REST)	93 days

Metrics Explorer

Access via Azure Monitor → Metrics or resource blade → Metrics.

1. Select scope: Resource group or specific resource
2. Select metric namespace: Virtual Machine Host
3. Select metric: Percentage CPU
4. Add filter: Virtual Machine = myVM01
5. Split by: None (or split to compare multiple VMs)
6. Set aggregation: Average
7. Pin to dashboard

Multi-dimensional metrics

Most metrics support dimensions for filtering and splitting:

Storage Account metric: Transactions
Dimensions:
  - ApiName (GetBlob, PutBlob, etc.)
  - ResponseType (Success, ServerError, etc.)
  - GeoType (Primary, Secondary)

Filter: ApiName = "GetBlob" AND ResponseType = "ServerError"
→ Get only failed GetBlob operations

Diagnostic Settings

Diagnostic settings export resource logs and metrics to a storage destination.

Enable diagnostic settings (Portal)

Resource → Monitoring → Diagnostic settings → Add diagnostic setting

Name: myvm-diagnostics
Logs:
  ✓ Administrative
  ✓ Security
  ✓ ServiceHealth
Metrics:
  ✓ AllMetrics

Send to:
  ✓ Log Analytics workspace: my-workspace
  ✓ Storage account: mylogstorage (archive)

Enable via Azure CLI

# Get the resource ID
VM_ID=$(az vm show -g myRG -n myVM --query id -o tsv)
WORKSPACE_ID=$(az monitor log-analytics workspace show -g myRG -n myWorkspace --query id -o tsv)

# Enable diagnostic settings
az monitor diagnostic-settings create \
  --name "vm-diagnostics" \
  --resource $VM_ID \
  --workspace $WORKSPACE_ID \
  --metrics '[{"category":"AllMetrics","enabled":true}]' \
  --logs '[{"category":"Administrative","enabled":true}]'

Log Analytics Workspace

Log Analytics stores and queries logs using Kusto Query Language (KQL).

Create a workspace

az monitor log-analytics workspace create \
  --resource-group myRG \
  --workspace-name myWorkspace \
  --location eastus \
  --sku PerGB2018     # Pay-per-GB (recommended)

Essential KQL syntax

// Basic query: last 1 hour of heartbeats
Heartbeat
| where TimeGenerated > ago(1h)
| summarize count() by Computer

// Find failed logins on a VM (Windows Security events)
SecurityEvent
| where EventID == 4625
| where TimeGenerated > ago(24h)
| summarize FailedLogins = count() by Account, Computer
| where FailedLogins > 5
| order by FailedLogins desc

// VM performance: CPU over 90%
Perf
| where ObjectName == "Processor"
| where CounterName == "% Processor Time"
| where CounterValue > 90
| where TimeGenerated > ago(1h)
| project TimeGenerated, Computer, CounterValue
| order by CounterValue desc

// Available memory less than 500 MB
Perf
| where ObjectName == "Memory"
| where CounterName == "Available MBytes"
| where CounterValue < 500
| summarize avg(CounterValue) by Computer, bin(TimeGenerated, 5m)

// Azure Activity Log: who deleted a resource?
AzureActivity
| where OperationNameValue contains "DELETE"
| where ActivityStatusValue == "Succeeded"
| where TimeGenerated > ago(7d)
| project TimeGenerated, Caller, OperationNameValue, ResourceGroup

Log Analytics data retention

# Set retention to 90 days (default is 30)
az monitor log-analytics workspace update \
  --resource-group myRG \
  --workspace-name myWorkspace \
  --retention-time 90

Azure Monitor Agent & VM Insights

Azure Monitor Agent (AMA)

Replaces the legacy MMA (Microsoft Monitoring Agent) and OMS agent. Collects guest OS metrics and logs.

# Install AMA extension on a VM
az vm extension set \
  --resource-group myRG \
  --vm-name myVM \
  --name AzureMonitorLinuxAgent \
  --publisher Microsoft.Azure.Monitor \
  --version 1.0

Data Collection Rules (DCR)

DCRs define what data the AMA collects and where to send it:

# Create a DCR to collect Syslog from Linux VMs
az monitor data-collection rule create \
  --resource-group myRG \
  --name my-linux-dcr \
  --location eastus \
  --data-flows '[{
    "streams": ["Microsoft-Syslog"],
    "destinations": ["myWorkspace"]
  }]' \
  --data-sources '{
    "syslog": [{
      "name": "syslogDataSource",
      "streams": ["Microsoft-Syslog"],
      "facilityNames": ["auth","authpriv","cron","daemon","kern"],
      "logLevels": ["Warning","Error","Critical","Alert","Emergency"]
    }]
  }' \
  --destinations '{
    "logAnalytics": [{
      "name": "myWorkspace",
      "workspaceResourceId": "/subscriptions/.../workspaces/myWorkspace"
    }]
  }'

VM Insights

Enable VM Insights for pre-built performance and dependency monitoring:

Azure Monitor → Virtual Machines → Enable
→ Installs AMA automatically
→ Pre-built workbooks: Performance, Map (network connections)
→ Available in: Azure Monitor → Workbooks → VM Insights Performance

Alerts

Alert types

Type	Based on	Example
Metric alert	Metric threshold	CPU > 85% for 5 min
Log alert	KQL query result	Heartbeat missed for 5 min
Activity Log alert	Control plane events	VM stopped, NSG modified
Service Health alert	Azure service events	Storage outage in East US
Smart detection	ML anomaly detection	Response time spike

Action Groups

Action groups define what happens when an alert fires:

# Create an action group
az monitor action-group create \
  --resource-group myRG \
  --name ops-alerts \
  --short-name opsalerts \
  --action email ops-team ops@company.com \
  --action sms +447700900000

# Create a metric alert using the action group
az monitor metrics alert create \
  --resource-group myRG \
  --name "High CPU Alert" \
  --scopes $(az vm show -g myRG -n myVM --query id -o tsv) \
  --condition "avg Percentage CPU > 85" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --action ops-alerts

Log alert (KQL-based)

az monitor scheduled-query create \
  --resource-group myRG \
  --name "VM Heartbeat Lost" \
  --scopes /subscriptions/.../resourceGroups/myRG/providers/Microsoft.OperationalInsights/workspaces/myWorkspace \
  --condition-query "Heartbeat | where Computer == 'myVM' | summarize LastHB = max(TimeGenerated) | where LastHB < ago(5m)" \
  --condition-threshold 0 \
  --condition-operator GreaterThan \
  --evaluation-frequency 5m \
  --window-size 5m \
  --action-groups /subscriptions/.../actionGroups/ops-alerts \
  --severity 1

Activity Log alert — detect NSG changes

az monitor activity-log alert create \
  --resource-group myRG \
  --name "NSG Modified" \
  --scopes /subscriptions/xxx \
  --condition category=Administrative and operationName=Microsoft.Network/networkSecurityGroups/write \
  --action-group ops-alerts

Application Insights

Application Insights provides APM (Application Performance Monitoring) for web apps:

Request rates, response times, failure rates
Dependency tracking (calls to SQL, Redis, external APIs)
User behaviour analytics
Custom telemetry
Live Metrics stream (real-time)
Smart Detection (ML-based anomaly detection)

# Create Application Insights resource
az monitor app-insights component create \
  --resource-group myRG \
  --app myAppInsights \
  --location eastus \
  --workspace /subscriptions/.../workspaces/myWorkspace   # workspace-based (recommended)

Connect to your app using the Instrumentation Key or Connection String (preferred):

# Python Flask example
from applicationinsights.flask.ext import AppInsights
app.config['APPINSIGHTS_INSTRUMENTATIONKEY'] = 'your-key'
appinsights = AppInsights(app)

Azure Backup

Azure Backup is a cloud-native backup service supporting VMs, SQL in VMs, Azure Files, SAP HANA, and more.

Recovery Services Vault

The central container for backup data:

# Create Recovery Services Vault
az backup vault create \
  --resource-group myRG \
  --name myBackupVault \
  --location eastus

Backup a VM

# Enable backup for a VM (default policy = daily backup, 30-day retention)
az backup protection enable-for-vm \
  --resource-group myRG \
  --vault-name myBackupVault \
  --vm myVM \
  --policy-name DefaultPolicy

# Trigger an on-demand backup
az backup protection backup-now \
  --resource-group myRG \
  --vault-name myBackupVault \
  --container-name myVM \
  --item-name myVM \
  --backup-management-type AzureIaasVM \
  --retain-until 2026-05-26

Backup policies

# Create a custom policy
az backup policy create \
  --resource-group myRG \
  --vault-name myBackupVault \
  --name "30DayRetention" \
  --policy '{
    "schedulePolicy": {
      "schedulePolicyType": "SimpleSchedulePolicy",
      "scheduleRunFrequency": "Daily",
      "scheduleRunTimes": ["2026-01-01T02:00:00.000Z"]
    },
    "retentionPolicy": {
      "retentionPolicyType": "LongTermRetentionPolicy",
      "dailySchedule": {
        "retentionTimes": ["2026-01-01T02:00:00.000Z"],
        "retentionDuration": {"count": 30, "durationType": "Days"}
      }
    },
    "backupManagementType": "AzureIaasVM"
  }'

Restore a VM

# List recovery points
az backup recoverypoint list \
  --resource-group myRG \
  --vault-name myBackupVault \
  --container-name myVM \
  --item-name myVM \
  --backup-management-type AzureIaasVM

# Restore to a new VM
az backup restore restore-azurevm \
  --vault-name myBackupVault \
  --resource-group myRG \
  --container-name myVM \
  --item-name myVM \
  --rp-name <recovery-point-id> \
  --restore-mode CreateNewConfig \
  --target-resource-group restore-rg \
  --target-virtual-machine-name myVM-restored

Soft delete

Azure Backup has soft delete enabled by default — deleted backup data is retained for 14 days, preventing accidental data loss. Disable only if absolutely required.

Azure Site Recovery (ASR)

Azure Site Recovery provides business continuity and disaster recovery by replicating VMs to a secondary Azure region.

Enable replication for a VM

Portal: VM → Disaster Recovery → Target region → Enable replication

This:
1. Creates a Recovery Services Vault in target region
2. Installs the Mobility Service on the VM
3. Begins replication (initial sync: full copy)
4. Ongoing replication: continuous (RPO typically < 30 sec)

# Enable replication via CLI
az site-recovery protected-item create \
  --resource-group myRG \
  --vault-name myASRVault \
  --fabric-name "Azure" \
  --protection-container-name myContainer \
  --name myVM \
  --policy-id /subscriptions/.../replicationPolicies/24-hour-retention

Failover and failback

Test failover (non-disruptive):
  → Spins up a test VM in target region in isolated network
  → Verify app works
  → Clean up test failover

Planned failover (zero data loss):
  → Gracefully shut down primary VM
  → Wait for all data to replicate
  → Bring up VM in secondary region
  → Update DNS (Azure Traffic Manager or manual)

Unplanned failover (disaster scenario):
  → Primary region unavailable
  → Failover to secondary with minimal RPO
  → Data loss possible (uncommitted transactions)

Failback:
  → After primary region recovers
  → Replicate back from secondary to primary
  → Test failback → commit

Recovery Plans

Recovery plans orchestrate the failover order of multiple VMs, including pre/post scripts:

Recovery Plan: "Ecommerce-DR-Plan"
  Group 1: Database VMs (fail over first)
  Group 2: App VMs (wait 5 min after group 1)
  Group 3: Web VMs
  Pre-script: notify on-call team
  Post-script: run health check URL

Hands-on: End-to-End Monitoring Setup

Goal: Full observability for a production VM.

1. Create Log Analytics Workspace:
   az monitor log-analytics workspace create -g myRG -n myWorkspace --sku PerGB2018

2. Enable VM Insights:
   Azure Monitor → Virtual Machines → myVM → Enable
   (installs AMA, enables performance + map)

3. Configure diagnostic settings:
   VM → Monitoring → Diagnostic settings → Send to Log Analytics

4. Create action group:
   Azure Monitor → Alerts → Action groups → Create
   Add: email (ops@company.com), SMS, Azure Function (auto-remediate)

5. Create alerts:
   a. CPU > 85% for 5 min → action group
   b. Available Memory < 512 MB → action group
   c. Heartbeat missed 5 min → action group (VM down)
   d. Activity Log: VM deallocated → action group

6. Create dashboard:
   Azure Monitor → Dashboards → New
   Pin: CPU chart, Memory chart, Disk I/O chart, Alert count

7. Set up backup:
   VM → Backup → Create vault → DefaultPolicy
   Trigger initial backup: az backup protection backup-now ...

8. Enable Site Recovery for DR:
   VM → Disaster Recovery → Target: West Europe → Enable replication
   Run a test failover quarterly to validate DR

Hands-on: Activity Log Alert for a Deleted Resource

Goal: Alert when someone deletes a resource group or critical resource.

Open Azure Monitor > Alerts > Create > Alert rule.
Scope the alert to the subscription or a specific resource group.
Choose signal type Activity Log.
Filter for operation name such as Delete Resource Group or a specific provider delete operation.
Create or select an action group with your email.
Name the alert az104-delete-alert.
Save the rule.
Delete a non-critical test resource.
Confirm the alert fires and notification is received.

Hands-on: Query VM Heartbeat in Log Analytics

Open your Log Analytics workspace.
Go to Logs.
Run:

Heartbeat
| summarize LastHeartbeat=max(TimeGenerated) by Computer
| order by LastHeartbeat desc

Stop a test VM or disconnect its agent.
Run the query again after several minutes.
Create a log search alert for missing heartbeat if this is a production pattern.

Q: What KQL query finds VMs that haven't sent a heartbeat in the last 10 minutes?

Heartbeat
| summarize LastHB = max(TimeGenerated) by Computer
| where LastHB < ago(10m)

Q: You need to be alerted when someone deletes a Key Vault. What alert type? An Activity Log alert — filter on category = Administrative and operationName = Microsoft.KeyVault/vaults/delete.

Q: How does Azure Site Recovery achieve its RPO? ASR continuously replicates data to the target region using the Mobility Service (installed on the source VM). Data is replicated asynchronously with crash-consistent recovery points every 5 minutes and app-consistent points every 1-12 hours (configurable). Typical RPO is under 30 seconds for crash-consistent and under 1 hour for app-consistent.

Q: A VM was accidentally deleted. What do you need to restore it? An Azure Backup recovery point in a Recovery Services Vault. If soft delete is enabled, the recovery point is retained for 14 days after deletion. Restore using az backup restore restore-azurevm or via the portal.

What to Learn Next

Azure Identity & Governance (AZ-104) — RBAC, policies, and compliance
Creating and Managing Virtual Machines — VM management fundamentals
Azure Networking Cheatsheet — network architecture reference

More in Microsoft Azure

AZ-104 Hands-on Lab Guide

5 hours

AZ-305: Azure Solutions Architect Expert — Study Guide

10-12 weeks

AZ-400: Azure DevOps Engineer Expert — Study Guide

8-10 weeks

Back to Microsoft Azure