Troubleshooting Azure Load Balancers
AZ-104 notes: Troubleshooting Azure Load Balancers. Covers key concepts for the Azure Administrator Associate exam.
- This lesson focuses on how to systematically troubleshoot issues in Azure Load Balancer.
Azure Load Balancer troubleshooting usually falls into 3 main categories:
- Configuration Issues
- Connectivity Issues
- Performance Issues
- Because Load Balancer is a Layer 4 service, troubleshooting is mostly about validating configuration, network flow, and health probes.
1️⃣ Configuration Issues
These are the most common problems.
A. Frontend IP Configuration
Check:
- Is it Public or Internal?
- Is the public IP assigned?
- Is the IP associated with the correct load balancer?
- Correct region?
Common issues:
- Missing or incorrect frontend IP
- Wrong port exposed
B. Backend Pool Configuration
Verify:
- Correct VMs or NICs added?
- Correct VNet?
- Instances healthy?
- If backend VM is not in the pool → no traffic will reach it.
C. Load Balancing Rules
Check:
- Protocol (TCP/UDP)
- Frontend port
- Backend port
- Associated health probe
- If the rule's backend port doesn't match the port the application listens on → traffic fails.
Example from transcript:
- SSH service runs on port 22
- NAT rule backend port was incorrectly set to 1000
- Result: SSH connection failed
- Fix: Change backend port to 22
- This is a classic misconfiguration scenario.
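A minimal Python sketch of the port check described above. The `PortRule` class, the `find_port_mismatches` helper, and the frontend port 50001 are hypothetical names and values chosen for illustration; they are not part of any Azure SDK.

```python
from dataclasses import dataclass


@dataclass
class PortRule:
    """Simplified stand-in for a load balancing or inbound NAT rule."""
    name: str
    protocol: str        # "Tcp" or "Udp"
    frontend_port: int
    backend_port: int


def find_port_mismatches(rules, app_ports):
    """Return the names of rules whose backend port has no listening application.

    app_ports maps a protocol name to the set of ports the application listens on.
    """
    mismatched = []
    for rule in rules:
        listening = app_ports.get(rule.protocol, set())
        if rule.backend_port not in listening:
            mismatched.append(rule.name)
    return mismatched


# Transcript scenario: SSH listens on 22, but the NAT rule targets port 1000.
# The frontend port (50001) is an assumed value for illustration.
rules = [PortRule("ssh-nat", "Tcp", 50001, 1000)]
app_ports = {"Tcp": {22}}
print(find_port_mismatches(rules, app_ports))  # ['ssh-nat']
```

Changing the rule's backend port to 22 makes the function return an empty list, which mirrors the fix from the transcript.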
D. Health Probes
- Health probes determine if backend instances receive traffic.
If probe fails:
- Instance removed from rotation
- Load balancer sends traffic to other healthy instances
Check:
- Correct protocol (TCP/HTTP/HTTPS)
- Correct port
- Application listening on that port
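- NSG allows probe traffic (probes originate from 168.63.129.16, which the AzureLoadBalancer service tag covers)
- Probe failure on an instance = no new flows are sent to that instance.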
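To confirm that the application is actually listening on the probe port, a TCP connection attempt can be imitated from the backend VM itself. This is only a sketch: `tcp_probe` is a hypothetical helper, and a local test like this says nothing about whether an NSG lets the real probe through.

```python
import socket


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout.

    This mimics what a TCP health probe checks: that something accepts
    connections on the port. It does not validate any application response.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, running `tcp_probe("127.0.0.1", 22)` on the backend VM checks whether SSH is accepting connections locally. If it returns `False`, the probe would fail regardless of how the load balancer is configured.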
2️⃣ Connectivity Issues
- These issues occur when traffic cannot reach the backend VMs.
Connectivity troubleshooting should follow this path:
- Client → Frontend IP → Load Balancer → Backend VM → Application
A. Network Security Groups (NSGs)
Check:
- Is port allowed inbound?
- Is port allowed outbound?
- Are health probe ports allowed?
- NSG misconfiguration is a top cause of failure.
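NSG rules are evaluated in priority order (lowest number first), and the first matching rule decides. The sketch below models that behavior in a heavily simplified form: sources are plain strings compared for equality, and the `NsgRule` class, `evaluate_inbound` function, and the custom rule names are assumptions for illustration. It shows how a custom deny-all rule can shadow the default rule that allows health probes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NsgRule:
    """Simplified inbound NSG rule (real rules also match protocol, destination, ranges)."""
    name: str
    priority: int           # lower number = evaluated first
    port: Optional[int]     # None matches any port
    source: str             # "*" matches any source
    allow: bool


def evaluate_inbound(rules, source, port):
    """Return (allowed, rule_name) for the first rule that matches, by priority."""
    for rule in sorted(rules, key=lambda r: r.priority):
        port_ok = rule.port is None or rule.port == port
        source_ok = rule.source == "*" or rule.source == source
        if port_ok and source_ok:
            return rule.allow, rule.name
    return False, "no-match"


rules = [
    NsgRule("allow-http", 100, 80, "Internet", True),             # hypothetical custom rule
    NsgRule("deny-all-custom", 4000, None, "*", False),           # hypothetical custom rule
    NsgRule("AllowAzureLoadBalancerInBound", 65001, None, "AzureLoadBalancer", True),
    NsgRule("DenyAllInBound", 65500, None, "*", False),
]

print(evaluate_inbound(rules, "Internet", 80))            # (True, 'allow-http')
print(evaluate_inbound(rules, "AzureLoadBalancer", 80))   # (False, 'deny-all-custom')
```

In this model client traffic on port 80 is allowed, but the probe is denied by the custom rule at priority 4000 before the default allow rule at 65001 is reached, so the backend would be marked unhealthy even though the application is fine.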
B. Firewall or NVA
If traffic is chained through an Azure Gateway Load Balancer to a firewall or NVA, any of these may block it:
- Firewall rules
- Deep packet inspection
- Route tables
C. Outbound Connectivity Problems
Example from transcript:
VM attempted:
- sudo apt install nginx
It failed because:
- No public IP
- No NAT Gateway
- No outbound rules
- Default outbound access deprecated
Azure provides 4 outbound options:
- Public IP on VM
- Load Balancer outbound rules
- NAT Gateway (recommended)
- Default outbound (deprecated)
Best Practice for Production
- Use: Azure NAT Gateway
- Why?
- Greatly reduces the risk of SNAT port exhaustion
- Scales better
- Separates inbound and outbound traffic
- In demo: Public IP was added to VM → outbound worked.
3️⃣ Performance Issues
- Performance troubleshooting involves monitoring.
- Azure Load Balancer itself rarely becomes the bottleneck; the backend pool usually does.
Common performance issues:
- Too few backend instances
- CPU/memory exhaustion on VMs
- SNAT port exhaustion
- Uneven flow distribution
- Service health issues
Monitoring & Diagnostics Tools
Under Load Balancer → Monitoring:
A. Metrics
Important metrics:
- Byte count
- Data path availability
- SNAT connection count
- Used SNAT ports
- Flows count
Use metrics to detect:
- Throughput bottlenecks
- High SNAT usage
- Traffic spikes
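As a rough illustration of detecting high SNAT usage, the sketch below compares used ports against allocated ports per backend instance. The input format, the `high_snat_usage` helper, and the 80% threshold are all assumptions; in practice the numbers would come from the load balancer's SNAT port metrics.

```python
def high_snat_usage(samples, threshold=0.8):
    """Return {instance: utilization} for instances at or above the threshold.

    samples maps an instance name to a (used_ports, allocated_ports) tuple.
    The default threshold of 0.8 is an arbitrary illustrative value.
    """
    flagged = {}
    for instance, (used, allocated) in samples.items():
        if allocated <= 0:
            continue  # no ports allocated, nothing to compare against
        ratio = used / allocated
        if ratio >= threshold:
            flagged[instance] = round(ratio, 2)
    return flagged


# Hypothetical metric samples for two backend instances.
samples = {"vm-0": (300, 1024), "vm-1": (950, 1024)}
print(high_snat_usage(samples))  # {'vm-1': 0.93}
```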
B. Diagnostic Settings
Send logs to:
- Log Analytics
- Storage Account
- Event Hub
Enables:
- Long-term analysis
- Advanced queries
C. Log Analytics Queries
You can query:
- LoadBalancerAlertEvent
- LoadBalancerProbeHealthStatus
- LoadBalancerRuleCounter
- Requires Log Analytics workspace.
D. Load Balancer Insights
Shows:
- Backend pool health
- Rule mapping
- Flow distribution
- VM availability
- Data throughput trends
Useful for:
- Identifying traffic imbalance
- Checking health probe failure rates
- Understanding backend saturation
Deep Understanding: How Azure Load Balancer Works Internally
Azure Load Balancer uses:
5-tuple hash:
- Source IP
- Source Port
- Destination IP
- Destination Port
- Protocol
- This determines traffic distribution.
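The idea of hash-based distribution can be sketched in a few lines of Python. Azure's actual hash function is not part of these notes, so SHA-256 is used here purely as a stand-in, and `pick_backend` and the sample addresses are hypothetical.

```python
import hashlib


def pick_backend(src_ip, src_port, dst_ip, dst_port, protocol, backends):
    """Map a 5-tuple to one backend using a deterministic hash.

    SHA-256 is an assumed stand-in; it only illustrates that the same
    5-tuple always maps to the same backend.
    """
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


backends = ["vm-0", "vm-1", "vm-2"]

# The same flow always lands on the same backend.
first = pick_backend("203.0.113.10", 50000, "20.0.0.4", 443, "tcp", backends)
second = pick_backend("203.0.113.10", 50000, "20.0.0.4", 443, "tcp", backends)
print(first == second)  # True

# A new connection from the same client uses a new source port,
# so it is hashed independently and may land on a different backend.
print(pick_backend("203.0.113.10", 50001, "20.0.0.4", 443, "tcp", backends))
```

Because the source port is part of the hash, repeated connections from one client are not pinned to one backend unless a different distribution mode (such as source IP affinity) is configured.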
It does NOT:
- Inspect HTTP headers
- Perform SSL termination
- Do URL routing
- That’s Layer 7 functionality.
SNAT & Port Exhaustion Explained
When backend VMs make outbound connections:
- Load Balancer performs SNAT
- Each outbound connection consumes a port
- Limited ephemeral port range
If exhausted:
- New outbound connections fail
- Apps hang or timeout
Solutions:
- Use NAT Gateway
- Increase backend instances
- Optimize connection reuse
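A worked example of why the port range is limited, assuming Load Balancer outbound rules with manual port allocation: each frontend public IP contributes 64,000 SNAT ports, which are divided across the backend instances, and the per-instance value is rounded down to a multiple of 8. The function name is hypothetical, and other allocation modes behave differently.

```python
PORTS_PER_FRONTEND_IP = 64_000  # SNAT ports contributed by each frontend public IP


def snat_ports_per_instance(frontend_ips, backend_instances):
    """Estimate SNAT ports available to each backend instance.

    Assumes outbound rules with manual port allocation, where the pool is
    split evenly and the per-instance value must be a multiple of 8.
    """
    if frontend_ips < 1 or backend_instances < 1:
        raise ValueError("need at least one frontend IP and one backend instance")
    per_instance = (frontend_ips * PORTS_PER_FRONTEND_IP) // backend_instances
    return per_instance - (per_instance % 8)


print(snat_ports_per_instance(1, 10))  # 6400
print(snat_ports_per_instance(1, 3))   # 21328
```

An application that opens many short-lived connections to the same destination can use up a budget like this quickly, which is why connection reuse matters.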
Step-by-Step Troubleshooting Checklist
Step 1: Validate Configuration
- Frontend IP exists
- Backend pool correct
- Rule ports match app ports
- Health probe configured correctly
Step 2: Check Health Probes
- Healthy?
- Port correct?
- App listening?
- NSG allows probe?
Step 3: Validate Connectivity
- NSG rules
- UDRs
- Firewall/NVA
- Outbound configuration
Step 4: Check Metrics
- SNAT usage
- Flow distribution
- Byte count
- Backend health
Step 5: Scale if Needed
- Add more backend VMs
- Use VM Scale Sets
- Add NAT Gateway
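The ordering of the checklist matters: there is little point in reading metrics before the configuration is known to be valid. A minimal sketch of that "stop at the first failing step" flow follows; `run_checklist` and the `state` dictionary are hypothetical, and each check would be replaced by a real inspection of the environment.

```python
def run_checklist(checks):
    """Run checks in order and return the name of the first failing step.

    checks is a list of (step_name, callable) pairs; each callable returns
    True when the step passes. Returns None if every step passes.
    """
    for name, check in checks:
        if not check():
            return name
    return None


# Hypothetical findings gathered during an investigation.
state = {
    "config_valid": True,
    "probes_healthy": False,
    "connectivity_ok": True,
    "metrics_normal": True,
}

checks = [
    ("Validate configuration", lambda: state["config_valid"]),
    ("Check health probes", lambda: state["probes_healthy"]),
    ("Validate connectivity", lambda: state["connectivity_ok"]),
    ("Check metrics", lambda: state["metrics_normal"]),
]

print(run_checklist(checks))  # Check health probes
```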
Interview-Ready Key Points
- Q: Most common Load Balancer issue? → Port mismatch or NSG blocking traffic.
- Q: Why does the health probe matter? → If the probe fails, that backend instance receives no new traffic.
- Q: Why avoid default outbound access? → It is deprecated, and the outbound IP is not under your control and can change.
- Q: Best outbound solution for production? → NAT Gateway.
- Q: How to detect SNAT exhaustion? → Monitor the Used SNAT Ports metric against the allocated SNAT ports.
Key Takeaways
- ✔ Troubleshooting is mainly configuration validation
- ✔ Always verify ports match actual application ports
- ✔ Health probe is critical
- ✔ NSGs and outbound configuration are common blockers
- ✔ Use monitoring + insights for performance diagnostics
- ✔ NAT Gateway is best practice for outbound traffic
