Fixing Azure WAF False Positives: A 7-Phase Diagnostic Guide
If users are suddenly getting 403 errors that weren't there yesterday, or a deployment pipeline that was green last week is now failing health checks, Azure WAF is the first place to look. A WAF false positive — where the firewall blocks a legitimate request because it pattern-matches a security rule — is one of the most common causes of unexplained production incidents in Azure environments.
The mistake most teams make is toggling WAF to Detection mode to stop the bleeding, then never fixing the root cause. This guide gives you a repeatable 7-phase process to diagnose exactly which rule fired, reproduce the block in a test environment, scope the minimum exclusion required, and validate that you haven't inadvertently opened a gap.
Symptoms This Guide Covers
- Users receiving HTTP 403 with the body `The request was blocked by the web application firewall` after a code deployment
- New API endpoints returning 403 immediately after launch
- File uploads failing with no visible client-side error (the WAF rejects the request, but the client UI never surfaces the 403)
- Specific HTTP methods or content-types blocked inconsistently across environments
- WAF in Detection mode generates alerts but no blocks — after switching to Prevention mode, things break
- A third-party integration (webhook, payment gateway, CI runner) receiving 403s for requests that succeed when sent to the backend directly, bypassing the WAF
Understanding Azure WAF Architecture
Azure WAF runs in two deployment models: attached to Application Gateway (regional, OSI layer 7) or attached to Azure Front Door (global edge). Both evaluate managed rule sets derived from the OWASP Core Rule Set (CRS): Application Gateway offers OWASP CRS 3.x and Microsoft's Default Rule Set (DRS), while Front Door uses DRS. The log schemas, exclusion options, and diagnostic settings differ between the two.
Key structural points to understand before diagnosing:
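- Under CRS 3.x and DRS 2.x, WAF uses anomaly scoring rather than stopping at the first match. Each matched rule adds to the request's anomaly score according to its severity (Critical 5, Error 4, Warning 3, Notice 2), and the request is blocked once the total reaches the threshold of 5. A single Critical match is enough to block; a single Warning match is not.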
- In Detection mode, matched requests are logged but not blocked. Use this for initial observation but never as a permanent state.
- In Prevention mode, matched requests are blocked with a 403. This is the production state you need to return to after exclusions are in place.
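- A single request can match multiple rules, and the logs record each one. On Application Gateway, the contributing rules are logged with the action `Matched` and the block itself is logged under rule 949110 (Inbound Anomaly Score Exceeded); on Front Door, contributing rules appear with the action `AnomalyScoring`. Tune the contributing rules: 949110 is a mandatory rule and can't be disabled.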
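Under CRS 3.x and DRS 2.x, the block decision comes from anomaly scoring: each matched rule contributes a severity weight (Critical 5, Error 4, Warning 3, Notice 2) and, in Prevention mode, the request is blocked when the total reaches 5. The minimal sketch below reproduces only that arithmetic, so you can reason about why one request is blocked and a similar one isn't. The function names are hypothetical, and the weights and threshold are hard-coded to Azure's documented defaults.

```sh
# severity_weight SEVERITY : print the anomaly score contribution of one match
severity_weight() {
  case "$1" in
    Critical) echo 5 ;;
    Error)    echo 4 ;;
    Warning)  echo 3 ;;
    Notice)   echo 2 ;;
    *)        echo 0 ;;  # unknown severity: contributes nothing in this sketch
  esac
}

# anomaly_score SEVERITY... : print the summed anomaly score for one request
anomaly_score() (
  total=0
  for sev in "$@"; do
    total=$((total + $(severity_weight "$sev")))
  done
  echo "$total"
)

# waf_verdict SEVERITY... : print the Prevention-mode outcome for one request
waf_verdict() (
  threshold=5  # blocking threshold
  score=$(anomaly_score "$@")
  if [ "$score" -ge "$threshold" ]; then
    echo "Blocked (score $score)"
  else
    echo "Allowed (score $score)"
  fi
)
```

For example, `waf_verdict Warning Notice` prints `Blocked (score 5)`: two low-severity matches on the same request can produce a 403 that neither would cause alone, which is why you may need to tune more than one rule.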
OWASP CRS Rule ID Reference
Before querying logs, knowing which rule ID range covers which attack category helps you interpret results faster.
| Rule ID range | Category | Common false positive trigger |
|---|---|---|
| 910100–910999 | IP reputation (client IP) | NAT/proxy exit IPs flagged by GeoIP lists |
| 911100–911999 | Method enforcement | Uncommon HTTP methods (PATCH, PROPFIND) in REST APIs |
| 912000–912999 | DoS protection | Burst traffic from CI load tests, batch jobs |
| 913100–913999 | Scanner detection | Security scanner headers in pen test tools |
| 920100–920999 | Protocol enforcement | Non-standard Content-Type, chunked encoding, large cookies |
| 921100–921999 | Protocol attack | HTTP request smuggling patterns in custom headers |
| 930100–930999 | Local file inclusion (LFI) | File path parameters like ../, Windows paths |
| 931100–931999 | Remote file inclusion (RFI) | URL parameters containing full http:// URIs |
| 932100–932999 | Remote code execution (RCE) | Shell characters in form fields, search queries |
| 933100–933999 | PHP injection | PHP function names in content (base64_decode, eval) |
| 941100–941999 | Cross-site scripting (XSS) | HTML in rich text editors, Markdown APIs, SVG uploads |
| 942100–942999 | SQL injection | SQL-like syntax in search fields, JWT payloads, base64-encoded data |
| 943100–943999 | Session fixation | Cookie names or parameters resembling session identifiers |
| 944100–944999 | Java attacks | Java class names in payloads, JNDI-like patterns |
Rule set versions change. The ranges above are based on OWASP CRS 3.2. If you're on DRS 2.1 (Default Rule Set, Microsoft-managed), rule IDs may differ. Always cross-reference with the actual rule ID from your logs.
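When you're triaging many log rows, a small lookup from rule ID to category saves trips back to the table. This hypothetical helper encodes the CRS 3.2 ranges above. It accepts either a bare rule ID (as in App Gateway's `ruleId_s`) or a name ending in `-<ruleId>`, which is an assumption about the shape of Front Door `ruleName_s` values. Anything outside the listed ranges prints `unknown`, your cue to look the rule up in the documentation.

```sh
# crs_category RULE : print the attack category for a CRS rule ID such as 942100,
# or for a rule name assumed to end in "-<ruleId>"
crs_category() (
  id=${1##*-}                                # keep the text after the last "-", if any
  case "$id" in
    ''|*[!0-9]*) echo "unknown"; exit 0 ;;   # not a numeric rule ID
  esac
  case $((id / 1000)) in
    910) echo "IP reputation" ;;
    911) echo "Method enforcement" ;;
    912) echo "DoS protection" ;;
    913) echo "Scanner detection" ;;
    920) echo "Protocol enforcement" ;;
    921) echo "Protocol attack" ;;
    930) echo "Local file inclusion (LFI)" ;;
    931) echo "Remote file inclusion (RFI)" ;;
    932) echo "Remote code execution (RCE)" ;;
    933) echo "PHP injection" ;;
    941) echo "Cross-site scripting (XSS)" ;;
    942) echo "SQL injection" ;;
    943) echo "Session fixation" ;;
    944) echo "Java attacks" ;;
    949) echo "Inbound anomaly score (blocking decision)" ;;  # e.g. 949110; not in the table
    *)   echo "unknown" ;;
  esac
)
```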
Phase 1 — Collect Impact Data
Before touching any WAF configuration, establish the scope of the problem.
Questions to answer before proceeding:
- Which endpoint is being blocked? (method + URI)
- Which client IPs or user agents are affected?
- When did it start? (correlate with deployments, rule set updates, or config changes)
- Is it 100% of requests to this endpoint, or intermittent?
- Does it happen in DEV/UAT too, or only PROD?
Check for recent WAF policy changes:
# List recent WAF policy operations from the Activity Log (last 24 hours).
# --offset avoids GNU-only `date -d` arithmetic; the JMESPath filter keeps
# only operations on WAF policy resources.
az monitor activity-log list \
  --resource-group rg-prod \
  --offset 1d \
  --max-events 500 \
  --query "[?contains(operationName.value, 'ApplicationGatewayWebApplicationFirewallPolicies')].{time: eventTimestamp, op: operationName.value, caller: caller, status: status.value}" \
  --output table
If the timeline shows a WAF policy update or a rule set version upgrade coinciding with when the 403s started, that's your culprit. Skip to Phase 3.
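To make the timeline check repeatable, here is a minimal sketch that tests whether a policy change happened shortly before the first 403. The function name and the 60-minute default window are hypothetical choices, and it assumes GNU `date` (as in Azure Cloud Shell and most Linux CI agents) to parse ISO 8601 timestamps such as the `eventTimestamp` values returned by the Activity Log query.

```sh
# changed_before_incident CHANGE_TS FIRST_403_TS [WINDOW_MINUTES]
# Exit status 0 when the change happened at or before the first 403 and no
# more than WINDOW_MINUTES (default 60) earlier. Requires GNU date.
changed_before_incident() (
  change=$(date -u -d "$1" +%s) || exit 2
  incident=$(date -u -d "$2" +%s) || exit 2
  window=$(( ${3:-60} * 60 ))
  gap=$(( incident - change ))
  [ "$gap" -ge 0 ] && [ "$gap" -le "$window" ]
)
```

Pass it the policy operation's `eventTimestamp` and the `TimeGenerated` of the earliest block you find in the WAF logs; a zero exit status means the change is a strong suspect.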
Phase 2 — Query WAF Logs
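WAF logs reach Log Analytics only if the Application Gateway or Front Door profile has a diagnostic setting that sends them to a workspace. The queries below read the AzureDiagnostics table; the category and field names differ between Application Gateway and Front Door.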
Application Gateway WAF Logs
AzureDiagnostics
| where ResourceType == "APPLICATIONGATEWAYS"
| where Category == "ApplicationGatewayFirewallLog"
| where TimeGenerated > ago(2h)
// With anomaly scoring, contributing rules log "Matched" and the final
// decision (rule 949110) logs "Blocked"; keep both to see the full picture
| where action_s in ("Blocked", "Matched")
| project
    TimeGenerated,
    action = action_s,
    clientIp = clientIp_s,
    requestUri = requestUri_s,
    ruleId = ruleId_s,
    ruleGroup = ruleGroup_s,
    ruleMessage = Message,
    details = details_message_s,
    matchedData = details_data_s,
    ruleFile = details_file_s
| order by TimeGenerated desc
Aggregate the contributing rules to see which ones drive the most blocks:
// Rule 949110 only records that the anomaly threshold was exceeded,
// so aggregate the "Matched" entries to find the rules worth tuning
AzureDiagnostics
| where ResourceType == "APPLICATIONGATEWAYS"
| where Category == "ApplicationGatewayFirewallLog"
| where TimeGenerated > ago(24h)
| where action_s == "Matched"
| summarize
    hits = count(),
    sampleUri = take_any(requestUri_s),
    sampleData = take_any(details_data_s)
    by ruleId_s, details_message_s
| order by hits desc
Front Door WAF Logs
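// =~ matches both "FrontdoorWebApplicationFirewallLog" (Front Door classic)
// and "FrontDoorWebApplicationFirewallLog" (Standard/Premium)
AzureDiagnostics
| where Category =~ "FrontDoorWebApplicationFirewallLog"
| where TimeGenerated > ago(2h)
// DRS 2.x logs contributing rules as "AnomalyScoring" and the decision as "Block"
| where action_s in ("Block", "AnomalyScoring")
| project
    TimeGenerated,
    clientIp = clientIP_s,
    requestUri = requestUri_s,
    ruleName = ruleName_s,
    action = action_s,
    matchedData = details_matches_s,
    policyMode = policyMode_s
| order by TimeGenerated desc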
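What to look for in the output:
- `ruleId_s` (App Gateway) / `ruleName_s` (Front Door): the rule that fired. Entries for rule 949110 only record that the anomaly threshold was exceeded; the IDs on the `Matched` (or `AnomalyScoring`) entries for the same request are what you'll need in Phase 3.
- `details_data_s` / `details_matches_s`: the payload fragment that matched, which shows exactly what the WAF found suspicious. On Application Gateway this string usually also names the inspected field, as in `found within ARGS:q`.
- `details_file_s`: the CRS rule file that contains the rule (for example `REQUEST-942-APPLICATION-ATTACK-SQLI.conf`). It identifies the attack category, not the request field.
- `requestUri_s`: the path. If the blocks always hit the same endpoint, note it; you'll use it when scoping the exclusion in Phase 6.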
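Reading `details_data_s` by eye gets slow when there are hundreds of rows. The sketch below assumes the ModSecurity-style shape `Matched Data: <fragment> found within <COLLECTION>:<name>: <value>`, so check a few of your own log rows before relying on it. It extracts the inspected field and maps the common collections to the Application Gateway match variables used for exclusions. Both function names are hypothetical.

```sh
# inspected_field DETAILS_DATA : print the field after "found within",
# e.g. ARGS:q; print nothing if the marker is absent
inspected_field() (
  data=$1
  rest=${data#* found within }     # drop everything up to the first marker
  [ "$rest" = "$data" ] && exit 0  # marker not found
  printf '%s\n' "${rest%%: *}"     # the field ends at the first ": "
)

# suggest_exclusion FIELD : map a CRS collection to an App Gateway match variable
suggest_exclusion() (
  collection=${1%%:*}              # e.g. ARGS
  selector=${1#*:}                 # e.g. q
  case "$collection" in
    ARGS)            var=RequestArgNames ;;
    REQUEST_HEADERS) var=RequestHeaderNames ;;
    REQUEST_COOKIES) var=RequestCookieNames ;;
    *) echo "no direct exclusion for $collection"; exit 0 ;;
  esac
  echo "$var Equals $selector"
)
```

For example, a row whose data reads `Matched Data: union select found within ARGS:q: cats union select dogs` yields `ARGS:q`, and `suggest_exclusion ARGS:q` prints `RequestArgNames Equals q`. Treat the output as input to the tune-or-accept decision, not as an exclusion to apply blindly.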
Phase 3 — Identify the Rule
Once you have the rule ID, look it up in the Microsoft documentation or the OWASP CRS GitHub repository to understand what it's designed to detect.
# Show the managed rule sets (and any rule overrides) configured on the policy
az network application-gateway waf-policy managed-rule rule-set list \
  --resource-group rg-prod \
  --policy-name waf-policy-prod \
  --output table
# List the rules in one rule group of the OWASP 3.2 rule set (read-only).
# Note: `managed-rule rule-set add` modifies the policy; don't use it to inspect rules.
az network application-gateway waf-config list-rule-sets \
  --type OWASP \
  --version 3.2 \
  --group REQUEST-942-APPLICATION-ATTACK-SQLI
The matched data from the log tells you exactly what triggered it. Common patterns you'll see:
| Matched data fragment | Triggering rule | Typical cause |
|---|---|---|
| `select`, `union`, `from` | 942xxx (SQLi) | Search field with SQL-like keywords |
| `<script`, `onerror=`, `javascript:` | 941xxx (XSS) | Rich-text editor HTML output |
| `../`, `..\`, `/etc/passwd` | 930xxx (LFI) | File path in URL parameter |
| `Authorization: Bearer eyJ...` | 942xxx (SQLi) | Base64url-encoded JWT containing character runs such as `--` |
| multipart/form-data body with binary | 920xxx (Protocol) | File upload with unusual content type |
| Class names: `com.sun.`, `java.lang.` | 944xxx (Java) | Java serialization in API payload |
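The JWT false positive is especially common. The WAF pattern-matches the raw, base64url-encoded header value, and the base64url alphabet includes `-` and `_`. A long token can therefore contain runs such as `--` that SQLi rules (for example 942440, SQL comment sequence detection) treat as suspicious, regardless of what the decoded claims contain.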
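Because the base64url alphabet includes `-`, a JWT can contain `--`, which SQLi rules treat as a SQL comment sequence. A quick way to check a token taken from the logs is to split it on `.` and look for `--` in each segment. This is a simplified illustration with a hypothetical function name: it tests for one character sequence, whereas the real SQLi rules use much broader patterns, so a clean result doesn't prove the token is safe from false positives.

```sh
# jwt_comment_runs TOKEN : print which dot-separated segments of a JWT
# (optionally prefixed with "Bearer ") contain "--"
jwt_comment_runs() (
  token=${1#Bearer }
  set -f   # disable globbing while splitting
  IFS=.    # split on the JWT segment separator
  n=0
  for seg in $token; do
    n=$((n + 1))
    case "$seg" in
      *--*) echo "segment $n contains --" ;;
    esac
  done
)
```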
Phase 4 — Reproduce the False Positive
Before writing any exclusion, reproduce the block in a non-production environment. This confirms your diagnosis and gives you a test case to validate against after the exclusion is applied.
# Reproduce using curl — send the request that's failing
# Replace with your actual endpoint and headers
curl -v \
-X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..." \
-d '{"query": "SELECT name FROM products WHERE category = ?", "params": ["electronics"]}' \
"https://api-dev.contoso.com/search"
# Expected: 403 with WAF block message
# HTTP/1.1 403 Forbidden
# Content-Type: text/html
# "The request was blocked by the web application firewall"
If DEV has WAF in Detection mode (logs but doesn't block), temporarily switch it to Prevention to reproduce:
az network application-gateway waf-policy policy-setting update \
  --resource-group rg-dev \
  --policy-name waf-policy-dev \
  --state Enabled \
  --mode Prevention
Remember to return it to Detection after testing.
If you cannot reproduce in DEV: the block may depend on a payload that's hard to construct, on a client IP that appears in an IP reputation list, or on a difference in WAF policy between environments. In that case, compare the DEV and PROD policies, starting with whether they use the same rule set version.
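When you script the reproduction, it helps to tell a WAF block apart from an ordinary 403 raised by the application. The heuristic below is a hypothetical helper: it treats a response as a WAF block only when the status is 403 and the body contains the block message quoted in the reproduction step. A custom block page or custom block status code on your policy would need a different check.

```sh
# is_waf_block STATUS BODY_FILE
# Exit status 0 when STATUS is 403 and BODY_FILE contains the WAF block
# message; non-zero otherwise. Hypothetical helper for repro scripts.
is_waf_block() {
  [ "$1" = "403" ] && grep -qi "blocked by the web application firewall" "$2"
}

# Example use (not run here):
#   status=$(curl -s -o /tmp/waf-repro.txt -w '%{http_code}' \
#     <same method, headers, and body as the failing request> \
#     "https://api-dev.contoso.com/search")
#   is_waf_block "$status" /tmp/waf-repro.txt && echo "WAF block reproduced"
```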
Phase 5 — Decide: Tune or Accept
Not every WAF block warrants an exclusion. Before writing one, answer these:
Is the matched pattern actually dangerous in this field? A `<script>` tag in a JSON API body field that your application treats as a plain string (never rendered as HTML) is a real false positive: the payload looks like XSS, but your application never renders it. An exclusion is warranted.
Is the matched pattern dangerous in this field, but your application neutralizes it? If your application encodes output, uses parameterized queries, and sends a restrictive Content-Security-Policy header, the WAF block is defense in depth. You can exclude it, but document why your application-layer controls are sufficient.
Is the pattern actually dangerous and your application is vulnerable? If so, the WAF is doing its job. Fix the application, not the WAF rule.
Is this a developer convenience (Postman, CI runner) that shouldn't hit production endpoints? Don't tune WAF for developer tooling. Adjust your tooling instead.
Phase 6 — Configure the Exclusion
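The minimum exclusion principle: scope your exclusion to the narrowest combination of rule, field, and path that resolves the false positive. Broad changes like "disable rule 942100 globally" weaken your posture unnecessarily. Exclusions themselves aren't scoped by URI; on Application Gateway, limiting one to a single path means attaching a separate per-URI WAF policy to that path in a path-based routing rule.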
Three-Axis Exclusion Model
Every exclusion has three axes:
| Axis | What it scopes | Examples |
|---|---|---|
| Match variable | Which part of the request to exclude from inspection | RequestHeaderNames, RequestArgNames, RequestCookieNames (Application Gateway); Front Door uses RequestBodyPostArgNames and RequestBodyJsonArgNames for body fields |
| Selector operator | How to match the field name | Equals, StartsWith, EndsWith, Contains, EqualsAny |
| Selector value | The specific field name to exclude | authorization, search, content |
An exclusion of RequestHeaderNames Equals authorization tells WAF: "do not inspect the authorization header against any rule." This is appropriate if the Authorization JWT triggers SQLi rules, because:
- The header is signed and tamper-evident
- Your backend validates the JWT before using it
- The token value is never rendered or executed
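To see how narrowly each operator scopes an exclusion, the sketch below evaluates whether a given operator and selector would cover a field name. It's a model for reasoning, not Azure's implementation: the function name is hypothetical, and comparing names case-insensitively is an assumption you should verify against your WAF's behavior.

```sh
# selector_matches OPERATOR SELECTOR FIELD_NAME
# Exit status 0 if an exclusion with this operator and selector would cover
# FIELD_NAME. Case-insensitive comparison is an assumption of this sketch.
selector_matches() (
  op=$1
  sel=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  name=$(printf '%s' "$3" | tr '[:upper:]' '[:lower:]')
  case "$op" in
    Equals)     [ "$name" = "$sel" ] ;;
    StartsWith) case "$name" in "$sel"*) ;; *) exit 1 ;; esac ;;
    EndsWith)   case "$name" in *"$sel") ;; *) exit 1 ;; esac ;;
    Contains)   case "$name" in *"$sel"*) ;; *) exit 1 ;; esac ;;
    EqualsAny)  ;;          # covers every field name
    *)          exit 2 ;;   # unknown operator
  esac
)
```

For example, `Equals q` covers only `q`, while `StartsWith user` also covers `username` and `user_id`. That difference is why Equals is the safer default.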
Azure CLI — Add an Exclusion
# Exclude the Authorization header from all managed-rule inspection (App Gateway WAF policy)
az network application-gateway waf-policy managed-rule exclusion add \
  --resource-group rg-prod \
  --policy-name waf-policy-prod \
  --match-variable "RequestHeaderNames" \
  --selector "authorization" \
  --selector-match-operator "Equals"
# Verify the exclusion was applied
az network application-gateway waf-policy show \
  --resource-group rg-prod \
  --name waf-policy-prod \
  --query "managedRules.exclusions" \
  --output table
For a search parameter that triggers SQLi rules:
# Exclude the 'q' argument from all managed-rule inspection. This is global:
# limiting it to the SQLi group requires a per-rule exclusion (CRS 3.2 and later)
az network application-gateway waf-policy managed-rule exclusion add \
  --resource-group rg-prod \
  --policy-name waf-policy-prod \
  --match-variable "RequestArgNames" \
  --selector "q" \
  --selector-match-operator "Equals"
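Typos in match variables or operators are easy to make, and a Front Door match variable such as `RequestBodyPostArgNames` isn't among the values Application Gateway lists. The hypothetical helper below validates both inputs against the values the Azure CLI documents for Application Gateway WAF policy exclusions and prints the resulting command for review instead of running it. Confirm the allowed values with `az network application-gateway waf-policy managed-rule exclusion add --help` on your CLI version.

```sh
# build_exclusion_cmd RG POLICY MATCH_VARIABLE OPERATOR SELECTOR
# Validates the inputs and prints (does not run) the az command.
build_exclusion_cmd() (
  rg=$1 policy=$2 var=$3 op=$4 sel=$5
  case "$var" in
    RequestHeaderNames|RequestHeaderKeys|RequestHeaderValues) ;;
    RequestCookieNames|RequestCookieKeys|RequestCookieValues) ;;
    RequestArgNames|RequestArgKeys|RequestArgValues) ;;
    *) echo "unsupported match variable for Application Gateway: $var" >&2; exit 1 ;;
  esac
  case "$op" in
    Equals|EqualsAny|StartsWith|EndsWith|Contains) ;;
    *) echo "unsupported selector operator: $op" >&2; exit 1 ;;
  esac
  printf 'az network application-gateway waf-policy managed-rule exclusion add --resource-group %s --policy-name %s --match-variable %s --selector-match-operator %s --selector %s\n' \
    "$rg" "$policy" "$var" "$op" "$sel"
)
```

Printing rather than executing keeps a human in the loop: review the command, then run it or, better, copy the same values into your IaC.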
Bicep/ARM — Exclusion in IaC
Always codify exclusions in infrastructure-as-code. A WAF exclusion applied only through the portal will be overwritten the next time Terraform or Bicep runs against the same policy.
param location string = resourceGroup().location
resource wafPolicy 'Microsoft.Network/ApplicationGatewayWebApplicationFirewallPolicies@2023-11-01' = {
  name: 'waf-policy-prod'
  location: location
  properties: {
    managedRules: {
      managedRuleSets: [
        {
          ruleSetType: 'OWASP'
          ruleSetVersion: '3.2'
        }
      ]
      exclusions: [
        {
          // Exclude Authorization header from all rule inspection
          // Justification: JWTs are base64url-encoded and can contain character
          // runs (such as --) that match SQLi rules 942xxx. JWTs are validated
          // server-side before use and never rendered as HTML.
          matchVariable: 'RequestHeaderNames'
          selectorMatchOperator: 'Equals'
          selector: 'authorization'
        }
        {
          // Exclude 'q' search parameter from SQLi inspection only
          // Justification: full-text search field accepts arbitrary user text.
          // SQL injection is prevented by parameterized queries in the API layer.
          matchVariable: 'RequestArgNames'
          selectorMatchOperator: 'Equals'
          selector: 'q'
          // Per-rule exclusion: limits this exclusion to the SQLi rule group
          // (omitting `rules` excludes every rule in the group)
          exclusionManagedRuleSets: [
            {
              ruleSetType: 'OWASP'
              ruleSetVersion: '3.2'
              ruleGroups: [
                {
                  ruleGroupName: 'REQUEST-942-APPLICATION-ATTACK-SQLI'
                }
              ]
            }
          ]
        }
      ]
    }
    policySettings: {
      state: 'Enabled'
      mode: 'Prevention'
      requestBodyCheck: true
      maxRequestBodySizeInKb: 128
      fileUploadLimitInMb: 100
    }
  }
}
Common False Positive Patterns and Fixes
| Scenario | Triggering rule(s) | Root cause | Exclusion (Application Gateway) |
|---|---|---|---|
| JWT in Authorization header | 942100, 942200 (SQLi) | Base64url-encoded token contains SQL-like character runs | RequestHeaderNames Equals authorization |
| Rich text / Markdown editor content | 941100, 941150 (XSS) | HTML tags in POST body | RequestArgNames Equals content |
| File path parameter | 930100, 930110 (LFI) | ../ in file system traversal field | RequestArgNames Equals filepath |
| Webhook payload from third-party | 920xxx, 942xxx | Webhook body contains encoded data | RequestArgNames Equals payload |
| Full-text search parameter | 942100, 942200 (SQLi) | SQL keywords in natural language search | RequestArgNames Equals q |
| GraphQL query body | 932xxx, 942xxx | GraphQL DSL resembles code injection | RequestArgNames Equals query |
| OpenAPI/Swagger JSON payload | 942xxx | Schema contains SQL type names | RequestArgNames Equals schema |
On Front Door, the equivalents are QueryStringArgNames for query string arguments, RequestBodyPostArgNames for form-encoded body fields, and RequestBodyJsonArgNames for JSON body fields.
Phase 7 — Validate and Monitor
After applying the exclusion, validate in this order:
1. Re-run the reproduction request:
# The same curl command from Phase 4 should now return 200
curl -v \
-X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..." \
-d '{"query": "SELECT name FROM products WHERE category = ?", "params": ["electronics"]}' \
"https://api-dev.contoso.com/search"
# Expected: 200 OK
2. Verify the rule no longer fires in logs:
AzureDiagnostics
| where ResourceType == "APPLICATIONGATEWAYS"
| where Category == "ApplicationGatewayFirewallLog"
| where TimeGenerated > ago(30m)
| where requestUri_s contains "/search"
| where action_s == "Blocked"
| project TimeGenerated, ruleId_s, requestUri_s, details_data_s
If the exclusion is working, this query returns no results for requests to /search.
3. Confirm WAF still blocks attacks in fields you didn't exclude:
Exclusions scope by field name, not by field value. An exclusion on RequestArgNames Equals q stops WAF from inspecting the q parameter, but WAF still inspects every other parameter. Verify this by sending a genuine attack pattern in a different parameter:
# This should still be blocked: WAF still inspects 'category'
# --get with --data-urlencode appends URL-encoded query parameters
curl -v --get \
  --data-urlencode "q=normal" \
  --data-urlencode "category=<script>alert(1)</script>" \
  "https://api-dev.contoso.com/search"
# Expected: 403 Blocked
4. Switch back to Prevention mode and run your full test suite:
az network application-gateway waf-policy policy-setting update \
  --resource-group rg-prod \
  --policy-name waf-policy-prod \
  --state Enabled \
  --mode Prevention
Run your integration tests or Playwright/k6 suite against the environment. Any remaining false positives will surface as 403 failures.
5. Set an alert for new WAF blocks after the change:
# Alert when more than 50 blocked requests appear in a 5-minute window
az monitor scheduled-query create \
  --resource-group rg-prod \
  --name "WAF-NewBlocksAfterTuning" \
  --scopes "/subscriptions/.../resourceGroups/rg-prod/providers/microsoft.operationalinsights/workspaces/log-prod" \
  --condition "count 'WafBlocks' > 50" \
  --condition-query WafBlocks="AzureDiagnostics | where ResourceType == 'APPLICATIONGATEWAYS' | where Category == 'ApplicationGatewayFirewallLog' | where action_s == 'Blocked'" \
  --evaluation-frequency 5m \
  --window-size 5m \
  --severity 2 \
  --action-groups "/subscriptions/.../resourceGroups/rg-prod/providers/microsoft.insights/actionGroups/ag-network"
A spike in blocks after WAF tuning indicates either a new attack pattern, a deployment that introduced a new endpoint WAF hasn't been tuned for, or an exclusion that wasn't broad enough.
Deployment Practices That Prevent WAF-Induced Outages
Most WAF false positive incidents are deployment failures in disguise. These practices make them preventable:
Deploy WAF policy changes separately from application code. A combined deployment that changes both the application and the WAF policy makes it impossible to know which change caused the 403. Deploy the WAF policy first, observe logs for 30 minutes, then deploy the application.
Run WAF in Detection mode for 48 hours after a rule set version upgrade. The difference between OWASP CRS 3.1 and 3.2, or between DRS 2.0 and DRS 2.1, can introduce new rules that your application has never been tested against. Detection mode gives you visibility before Prevention mode blocks real users.
Include WAF log queries in your incident runbook. Teams that haven't diagnosed a WAF false positive before lose 30+ minutes trying to find the right Log Analytics table. Pre-write the queries and link them from your on-call runbook.
Gate deployments on WAF log checks. After deploying to a staging environment, add a build step that queries WAF logs for new blocks and fails the pipeline if any appear:
# Post-deployment WAF check: fail if new blocks appear after deploy.
# WORKSPACE_ID: customer ID (a GUID) of the log-staging workspace, set by your
# pipeline; --workspace expects this ID, not the workspace's resource ID.
# Log ingestion can lag by several minutes, so run this after a short wait.
BLOCK_COUNT=$(az monitor log-analytics query \
  --workspace "$WORKSPACE_ID" \
  --analytics-query "AzureDiagnostics | where Category == 'ApplicationGatewayFirewallLog' | where TimeGenerated > ago(5m) | where action_s == 'Blocked' | count" \
  --query "[0].Count" \
  --output tsv)
if [ "${BLOCK_COUNT:-0}" -gt 0 ]; then
  echo "WAF blocked $BLOCK_COUNT requests after deployment; investigate before promoting to production"
  exit 1
fi
Governance: Auditing Exclusions Over Time
WAF exclusions accumulate. An exclusion added in 2023 for a now-deprecated endpoint may still be active, silently widening the attack surface. Build a quarterly review process:
// List the endpoints that received traffic in the last 90 days and when each was last seen.
// Uses the access log: excluded fields are no longer inspected, so they never
// appear in the firewall log and can't be audited from it.
AzureDiagnostics
| where ResourceType == "APPLICATIONGATEWAYS"
| where Category == "ApplicationGatewayAccessLog"
| where TimeGenerated > ago(90d)
| summarize lastSeen = max(TimeGenerated), requests = count() by requestUri_s
| order by lastSeen asc
Cross-reference the output against your current exclusion list and any per-URI policies. An exclusion created for an endpoint that has received no traffic in the past 90 days is a candidate for removal. The query can only look back as far as your Log Analytics retention allows.
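Once you've exported the endpoints your exclusions were created for and the endpoints seen in traffic (one path per line in each file), finding the stale ones is a set difference. This sketch uses `sort` and `comm`; the function and file names are hypothetical.

```sh
# stale_entries EXCLUDED_FILE SEEN_FILE : print entries in EXCLUDED_FILE
# that never appear in SEEN_FILE (one value per line in both files)
stale_entries() (
  export LC_ALL=C                   # sort and comm must agree on collation
  excluded=$(mktemp) seen=$(mktemp)
  sort -u "$1" > "$excluded"
  sort -u "$2" > "$seen"
  comm -23 "$excluded" "$seen"      # lines that appear only in the first file
  rm -f "$excluded" "$seen"
)
```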
When documenting an exclusion in Bicep or Terraform, always include:
- Justification: why the false positive occurs (e.g., "JWT contains base64 payload matching SQLi pattern")
- Compensating control: what other security control makes this safe (e.g., "JWT validated server-side, never rendered as HTML")
- Review date: when this exclusion should be reviewed (e.g., "review when upgrading from OWASP CRS 3.2 to 3.3")
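If you also record those three fields in a simple register next to the IaC, a scheduled job can flag overdue reviews. The sketch assumes a hypothetical pipe-separated file with the format `name|justification|compensating control|review date`, where the review date is a calendar date in `YYYY-MM-DD` form; an event-based trigger such as a CRS upgrade would need a different check. ISO dates sort lexicographically, so a plain string comparison in `awk` is enough.

```sh
# due_for_review REGISTER_FILE [TODAY] : print the names of exclusions whose
# review date (4th |-separated field, YYYY-MM-DD) is on or before TODAY
due_for_review() (
  today=${2:-$(date -u +%Y-%m-%d)}
  awk -F'|' -v today="$today" '
    /^#/ || NF < 4 { next }   # skip comments and malformed lines
    $4 <= today { print $1 }  # string comparison works for ISO dates
  ' "$1"
)
```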
Key Takeaways
Never disable WAF entirely to fix a false positive. Switching to Detection mode is acceptable for diagnosis, but Prevention must be restored. Disabling WAF to fix a false positive trades a user-facing 403 for unchecked inbound attacks.
The minimum exclusion principle. Scope every exclusion to the narrowest combination of match variable, selector, and path. `RequestArgNames Equals q` is better than a catch-all such as `RequestArgNames EqualsAny`.
Reproduce before excluding. If you can't reproduce the false positive in a test environment with the same request, you don't fully understand what WAF is reacting to. An exclusion based on guesswork may miss the actual trigger or exclude too broadly.
Codify exclusions in IaC immediately. A WAF exclusion applied only through the portal will be lost on the next Terraform/Bicep apply. The exclusion exists to allow legitimate traffic — losing it means users get blocked again unexpectedly.
Rule set upgrades need their own deployment window. Upgrading from CRS 3.1 to 3.2 or from DRS 2.0 to 2.1 is not a zero-risk change. Treat it as a deployment that needs Detection mode observation and a rollback plan.
Log retention matters. WAF logs are critical for post-incident diagnosis. Set Log Analytics retention to at least 30 days. If you're in a regulated industry, extend to 90+ days and consider exporting to storage for long-term archival.