Troubleshooting Azure Load Balancers


AZ-104 notes: Troubleshooting Azure Load Balancers. Covers key concepts for the Azure Administrator Associate exam.

  • This lesson focuses on how to systematically troubleshoot issues in Azure Load Balancer.

Azure Load Balancer troubleshooting usually falls into 3 main categories:

  • Configuration Issues
  • Connectivity Issues
  • Performance Issues

Because Load Balancer is a Layer 4 service, troubleshooting is mostly about validating configuration, network flow, and health probes.

1️⃣ Configuration Issues

These are the most common problems.

A. Frontend IP Configuration

Check:

  • Is it Public or Internal?
  • Is the public IP assigned?
  • Is the IP associated with the correct load balancer?
  • Correct region?

Common issues:

  • Missing or incorrect frontend IP
  • Wrong port exposed

B. Backend Pool Configuration

Verify:

  • Correct VMs or NICs added?
  • Correct VNet?
  • Instances healthy?

If a backend VM is not in the pool, no traffic will reach it.

C. Load Balancing Rules

Check:

  • Protocol (TCP/UDP)
  • Frontend port
  • Backend port
  • Associated health probe

If the rule's ports do not match the port the application listens on, traffic fails.

Example from transcript:

  • SSH service runs on port 22
  • NAT rule backend port was incorrectly set to 1000
  • Result: SSH connection failed
  • Fix: Change backend port to 22

This is a classic misconfiguration scenario.
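
The port mismatch in this example can be caught by comparing each rule's backend port with the ports the application actually listens on. The sketch below is illustrative only: the `Rule` type, its field names, the frontend port 50001, and the sample data are hypothetical and do not come from any Azure SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical, simplified view of a load-balancing or inbound NAT rule."""
    name: str
    protocol: str       # "Tcp" or "Udp"
    frontend_port: int
    backend_port: int


def find_port_mismatches(rules, listening_ports):
    """Return the rules whose backend port has no matching listener.

    listening_ports is a set of (protocol, port) pairs collected from the VM,
    for example by reading the output of `ss -tln`.
    """
    return [
        rule for rule in rules
        if (rule.protocol, rule.backend_port) not in listening_ports
    ]


# Sample data mirroring the example: SSH listens on 22, the rule targets 1000.
rules = [Rule("ssh-nat", "Tcp", frontend_port=50001, backend_port=1000)]
listening = {("Tcp", 22)}

for bad in find_port_mismatches(rules, listening):
    print(f"{bad.name}: backend port {bad.backend_port} has no listener")
```

Once the rule's backend port is changed to 22, the function returns an empty list.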

D. Health Probes

  • Health probes determine if backend instances receive traffic.

If probe fails:

  • Instance removed from rotation
  • Load balancer sends traffic to other healthy instances

Check:

  • Correct protocol (TCP/HTTP/HTTPS)
  • Correct port
  • Application listening on that port
  • NSG allows probe traffic

A failed probe means the affected instance receives no new flows until it passes the probe again.
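
To confirm that the application is listening on the probe port, a TCP connection attempt is often enough. The following is a minimal sketch of such a check, assuming a TCP probe; the function name `tcp_probe` and the port 80 in the usage line are illustrative choices, not values from the lesson.

```python
import socket


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Hypothetical check run on the backend VM against its own probe port.
    print("listening" if tcp_probe("127.0.0.1", 80) else "not listening")
```

A successful local check only shows that the application is listening. The NSG must still allow the probe itself, which Azure sends from 168.63.129.16 (covered by the AzureLoadBalancer service tag).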

2️⃣ Connectivity Issues

  • These issues occur when traffic cannot reach the backend VMs.

Connectivity troubleshooting should follow this path:

  • Client → Frontend IP → Load Balancer → Backend VM → Application

A. Network Security Groups (NSGs)

Check:

  • Is port allowed inbound?
  • Is port allowed outbound?
  • Are health probe ports allowed?

NSG misconfiguration is one of the most common causes of failure.
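
NSG rules are evaluated in priority order (lowest number first), and the first matching rule decides whether traffic is allowed. The sketch below models that behaviour in a deliberately simplified form: the `NsgRule` type is hypothetical, it matches on protocol and port only, and it ignores source and destination addresses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NsgRule:
    """Hypothetical, simplified inbound NSG rule (addresses omitted)."""
    name: str
    priority: int        # lower number is evaluated first
    protocol: str        # "Tcp", "Udp", or "*"
    port: object         # an int port, or "*" for any port
    access: str          # "Allow" or "Deny"


def evaluate_inbound(rules, protocol, port):
    """Return the first rule, by priority, that matches protocol and port."""
    for rule in sorted(rules, key=lambda r: r.priority):
        protocol_ok = rule.protocol in ("*", protocol)
        port_ok = rule.port in ("*", port)
        if protocol_ok and port_ok:
            return rule
    return None


rules = [
    NsgRule("allow-ssh", 300, "Tcp", 22, "Allow"),
    NsgRule("DenyAllInBound", 65500, "*", "*", "Deny"),
]

for port in (22, 80):
    hit = evaluate_inbound(rules, "Tcp", port)
    print(port, hit.access, "via", hit.name)
```

In this sample, port 22 matches the explicit allow rule, while port 80 falls through to the default deny rule. That is the typical shape of an NSG-related failure.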

B. Firewall or NVA

If traffic is chained through a firewall or NVA (for example, behind Azure Gateway Load Balancer), any of the following may block it:

  • Firewall rules
  • Deep packet inspection
  • Route tables

C. Outbound Connectivity Problems

Example from transcript:

VM attempted:

  • sudo apt install nginx

It failed because:

  • No public IP
  • No NAT Gateway
  • No outbound rules
  • Default outbound access deprecated

Azure provides 4 outbound options:

  • Public IP on VM
  • Load Balancer outbound rules
  • NAT Gateway (recommended)
  • Default outbound (deprecated)

Best Practice for Production

Use Azure NAT Gateway, because it:

  • Greatly reduces the risk of SNAT port exhaustion
  • Scales better
  • Separates inbound and outbound traffic

In the demo, a public IP was added to the VM, after which outbound connectivity worked.
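
The outbound failure in the example comes down to one question: does the VM have any explicit outbound method? The sketch below expresses that check. The function and parameter names are hypothetical, and the booleans stand in for facts you would look up in the portal or CLI.

```python
def explicit_outbound_methods(has_instance_public_ip, has_nat_gateway,
                              has_lb_outbound_rules):
    """List the explicit outbound methods configured for a VM."""
    methods = []
    if has_nat_gateway:
        methods.append("NAT Gateway")
    if has_instance_public_ip:
        methods.append("Public IP on VM")
    if has_lb_outbound_rules:
        methods.append("Load Balancer outbound rules")
    return methods


# The VM from the example: no explicit method is configured.
before = explicit_outbound_methods(False, False, False)
print(before or "no explicit outbound path; expect apt to fail")

# After the fix used in the demo: a public IP is attached to the VM.
after = explicit_outbound_methods(True, False, False)
print(after)
```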

3️⃣ Performance Issues

  • Performance troubleshooting involves monitoring.
  • Azure Load Balancer itself rarely becomes the bottleneck; the backend pool usually does.

Common performance issues:

  • Too few backend instances
  • CPU/memory exhaustion on VMs
  • SNAT port exhaustion
  • Uneven flow distribution
  • Service health issues

Monitoring & Diagnostics Tools

Under Load Balancer → Monitoring:

A. Metrics

Important metrics:

  • Byte count
  • Data path availability
  • SNAT connections
  • Used SNAT ports
  • Flows count

Use metrics to detect:

  • Throughput bottlenecks
  • High SNAT usage
  • Traffic spikes
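
High SNAT usage is easiest to spot as the ratio of used to allocated SNAT ports per backend instance. The sketch below assumes you have already exported those two numbers from the metrics view. The 80% threshold, the function names, and the sample values are illustrative assumptions, not Azure defaults.

```python
def snat_port_usage(used_ports, allocated_ports):
    """Return used SNAT ports as a fraction of allocated SNAT ports."""
    if allocated_ports <= 0:
        raise ValueError("allocated_ports must be positive")
    return used_ports / allocated_ports


def is_snat_at_risk(used_ports, allocated_ports, threshold=0.8):
    """Flag an instance whose usage meets or exceeds the chosen threshold."""
    return snat_port_usage(used_ports, allocated_ports) >= threshold


# Hypothetical per-instance samples: (used ports, allocated ports).
samples = {"vm-a": (120, 1024), "vm-b": (950, 1024)}

for name, (used, allocated) in samples.items():
    flag = "AT RISK" if is_snat_at_risk(used, allocated) else "ok"
    print(f"{name}: {snat_port_usage(used, allocated):.0%} {flag}")
```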

B. Diagnostic Settings

Send logs to:

  • Log Analytics
  • Storage Account
  • Event Hub

Enables:

  • Long-term analysis
  • Advanced queries

C. Log Analytics Queries

You can query:

  • LoadBalancerAlertEvent
  • LoadBalancerProbeHealthStatus
  • LoadBalancerRuleCounter

These queries require a Log Analytics workspace.

D. Load Balancer Insights

Shows:

  • Backend pool health
  • Rule mapping
  • Flow distribution
  • VM availability
  • Data throughput trends

Useful for:

  • Identifying traffic imbalance
  • Checking health probe failure rates
  • Understanding backend saturation

Deep Understanding: How Azure Load Balancer Works Internally

Azure Load Balancer uses:

5-tuple hash:

  • Source IP
  • Source Port
  • Destination IP
  • Destination Port
  • Protocol

This hash determines which backend instance receives each flow.
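
The idea of hash-based distribution can be illustrated with a few lines of code. This is a sketch of the general technique only: Azure's actual hash function is not described in the lesson, so SHA-256 is used here as a stand-in, and the function name and sample addresses are hypothetical.

```python
import hashlib


def pick_backend(src_ip, src_port, dst_ip, dst_port, protocol, backends):
    """Map a flow's 5-tuple to one backend; equal tuples map identically."""
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer, then pick a slot.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


backends = ["vm-0", "vm-1", "vm-2"]

# Hypothetical flows: same client and destination, different source ports.
for src_port in (50000, 50001, 50002):
    chosen = pick_backend("203.0.113.10", src_port, "198.51.100.5", 80,
                          "tcp", backends)
    print(src_port, "->", chosen)
```

Because the source port is part of the key, separate connections from one client can land on different backends, while all packets of a single flow stay on the same one.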

It does NOT:

  • Inspect HTTP headers
  • Perform SSL termination
  • Do URL routing

Those are Layer 7 functions.

SNAT & Port Exhaustion Explained

When backend VMs make outbound connections:

  • Load Balancer performs SNAT
  • Each outbound connection consumes a port
  • Only a limited number of SNAT ports is available

If exhausted:

  • New outbound connections fail
  • Apps hang or timeout

Solution:

  • Use NAT Gateway
  • Increase backend instances
  • Optimize connection reuse
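
The size of the port budget can be worked out with simple arithmetic. With outbound rules and manual port allocation, each frontend IP contributes 64,000 SNAT ports, shared across the backend instances in multiples of 8. The sketch below assumes that manual-allocation model (not default allocation); the function name is hypothetical.

```python
PORTS_PER_FRONTEND_IP = 64_000


def max_ports_per_instance(frontend_ips, backend_instances):
    """Largest per-instance SNAT port count, rounded down to a multiple of 8."""
    if frontend_ips < 1 or backend_instances < 1:
        raise ValueError("both counts must be at least 1")
    raw = frontend_ips * PORTS_PER_FRONTEND_IP // backend_instances
    return raw - raw % 8


for ips, instances in ((1, 10), (1, 3), (2, 100)):
    ports = max_ports_per_instance(ips, instances)
    print(f"{ips} frontend IP(s), {instances} instances -> {ports} ports each")
```

The total stays fixed at 64,000 ports per frontend IP, so each instance's share shrinks as the pool grows unless more frontend IPs are added.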

Step-by-Step Troubleshooting Checklist

Step 1: Validate Configuration

  • Frontend IP exists
  • Backend pool correct
  • Rule ports match app ports
  • Health probe configured correctly

Step 2: Check Health Probes

  • Healthy?
  • Port correct?
  • App listening?
  • NSG allows probe?

Step 3: Validate Connectivity

  • NSG rules
  • UDRs
  • Firewall/NVA
  • Outbound configuration

Step 4: Check Metrics

  • SNAT usage
  • Flow distribution
  • Byte count
  • Backend health

Step 5: Scale if Needed

  • Add more backend VMs
  • Use VM Scale Sets
  • Add NAT Gateway
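
Steps 1 to 4 can be treated as an ordered sequence of checks that stops at the first failure, which tells you where to look before scaling. The sketch below shows that control flow only: the `state` dictionary and its keys are hypothetical stand-ins for observations gathered by hand or with other tooling.

```python
def run_checklist(checks):
    """Run (name, check) pairs in order; return the first failing name."""
    for name, check in checks:
        if not check():
            return name
    return None


# Hypothetical observations; here the health probe is the failing item.
state = {
    "frontend_ip_exists": True,
    "rule_ports_match_app": True,
    "probe_healthy": False,
    "nsg_allows_traffic": True,
    "snat_usage_ok": True,
}

checks = [
    ("Step 1: configuration",
     lambda: state["frontend_ip_exists"] and state["rule_ports_match_app"]),
    ("Step 2: health probes", lambda: state["probe_healthy"]),
    ("Step 3: connectivity", lambda: state["nsg_allows_traffic"]),
    ("Step 4: metrics", lambda: state["snat_usage_ok"]),
]

failed = run_checklist(checks)
print(failed or "all checks passed")
```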

Interview-Ready Key Points

  • Q: Most common Load Balancer issue? → Port mismatch or NSG blocking traffic.
  • Q: Why does the health probe matter? → If the probe fails, the backend instance receives no new traffic.
  • Q: Why avoid default outbound access? → Deprecated and unreliable.
  • Q: Best outbound solution for production? → NAT Gateway.
  • Q: How to detect SNAT exhaustion? → Monitor the Used SNAT Ports metric.

Key Takeaways

  • Troubleshooting is mainly configuration validation
  • Always verify ports match actual application ports
  • Health probe is critical
  • NSGs and outbound configuration are common blockers
  • Use monitoring + insights for performance diagnostics
  • NAT Gateway is best practice for outbound traffic
