fix: handle multi-byte UTF-8 chars in SQL special char detection #4458
Need to try using just the include plus a prefix and suffix. The only thing that should change is the number of chars in the suffix 🤷
e5a106b to a5ea2fd
Worked like a charm! You need to ❤️ the crs-toolchain devs! 🎸
Pull request overview
Updates CRS SQLi “special character anomaly” detection to correctly handle multi-byte UTF-8 quote-like characters, preventing false positives on non‑Latin scripts while keeping behavior consistent across supported engines.
Changes:
- Refactors 5 SQLi anomaly regexes (942420/942421/942430/942431/942432) to match UTF-8 multi-byte characters via alternation (byte sequences) rather than inside character classes.
- Introduces a shared regex-assembly include (`sql-special-chars-anomaly.ra`) plus composable `.ra` sources for each of the 5 rules.
- Expands regression coverage with new positive/negative cases (including non‑Latin text) for all affected rules.
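The difference between the two regex shapes can be illustrated with a small sketch. It assumes an engine that matches byte by byte, emulated here with Python `bytes` patterns; the patterns are simplified stand-ins, not the actual CRS regexes, and `count_hits` is a hypothetical helper name.

```python
import re

# The three multi-byte characters named in this PR: U+00B4, U+2018, U+2019.
QUOTES = ["\u00b4", "\u2018", "\u2019"]
ENCODED = [q.encode("utf-8") for q in QUOTES]

# Old shape (simplified): the raw bytes sit inside a character class, so each
# byte (0xC2, 0xB4, 0xE2, 0x80, 0x98, 0x99) becomes a class member on its own.
BYTE_CLASS = re.compile(b"[" + b"".join(ENCODED) + b"]")

# New shape (simplified): each character is one alternative, matched only as
# its complete byte sequence.
ALTERNATION = re.compile(b"(?:" + b"|".join(re.escape(e) for e in ENCODED) + b")")


def count_hits(pattern: re.Pattern, text: str) -> int:
    """Count matches of a bytes pattern against the UTF-8 encoding of text."""
    return len(pattern.findall(text.encode("utf-8")))


if __name__ == "__main__":
    # "it" + U+2019 + "s", and the CJK character U+4E00 (bytes E4 B8 80).
    for sample in ["it\u2019s", "\u4e00"]:
        print(repr(sample),
              count_hits(BYTE_CLASS, sample),
              count_hits(ALTERNATION, sample))
```

In this sketch the class counts the single curly quote three times and also fires on the lone `0x80` byte inside the CJK character, while the alternation counts the quote once and ignores the CJK character.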
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/regression/tests/REQUEST-942-APPLICATION-ATTACK-SQLI/942420.yaml | Adds/updates cookie-focused regression tests, including UTF‑8 quotes and non‑Latin negative cases. |
| tests/regression/tests/REQUEST-942-APPLICATION-ATTACK-SQLI/942421.yaml | Adds/updates PL4 cookie anomaly regression tests with UTF‑8 quote and non‑Latin negatives. |
| tests/regression/tests/REQUEST-942-APPLICATION-ATTACK-SQLI/942430.yaml | Adds extensive args anomaly regression tests, including UTF‑8 quotes/acute accent and multiple non‑Latin negatives. |
| tests/regression/tests/REQUEST-942-APPLICATION-ATTACK-SQLI/942431.yaml | Adds args anomaly regression tests (incl. UTF‑8 quote) and improves existing negative array-name cases. |
| tests/regression/tests/REQUEST-942-APPLICATION-ATTACK-SQLI/942432.yaml | Adds args anomaly regression tests for UTF‑8 quotes and multiple negatives to validate FP reductions. |
| rules/REQUEST-942-APPLICATION-ATTACK-SQLI.conf | Updates the 5 rule regexes to avoid byte-by-byte matching of multi-byte UTF‑8 chars. |
| regex-assembly/include/sql-special-chars-anomaly.ra | New shared include defining ASCII and UTF‑8 special-char matching as safe alternations. |
| regex-assembly/942420.ra | New regex-assembly source for rule 942420 using the shared include. |
| regex-assembly/942421.ra | New regex-assembly source for rule 942421 using the shared include. |
| regex-assembly/942430.ra | New regex-assembly source for rule 942430 using the shared include. |
| regex-assembly/942431.ra | New regex-assembly source for rule 942431 using the shared include. |
| regex-assembly/942432.ra | New regex-assembly source for rule 942432 using the shared include. |
Extract multi-byte UTF-8 characters (´ U+00B4, ‘ U+2018, ’ U+2019) from regex character classes into alternations to prevent byte-by-byte matching that caused false positives with non-Latin scripts (Chinese, Japanese, Arabic, Korean, Hebrew).
Affects rules: 942420, 942421, 942430, 942431, 942432.
Creates shared include file sql-special-chars-anomaly.ra and composable .ra files for all 5 rules using named assemblies.
Closes #3325
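As a rough illustration of the "shared include plus per-rule suffix" idea, the sketch below builds one special-character fragment and wraps it with a repetition count. Everything here is an assumption for illustration: the character list is far shorter than the real include, `build_rule` is a hypothetical helper, and the counts are not the thresholds used by the actual rules.

```python
import re

# Shared fragment (hypothetical and much smaller than the real include):
# ASCII specials stay in a character class, multi-byte characters become
# whole-sequence alternatives.
ASCII_SPECIAL = b"[" + re.escape(b"~!@#$%^&*()-+={}[]|:;\"'`<>") + b"]"
UTF8_SPECIAL = b"|".join(
    re.escape(c.encode("utf-8")) for c in ("\u00b4", "\u2018", "\u2019")
)
SPECIAL = b"(?:" + ASCII_SPECIAL + b"|" + UTF8_SPECIAL + b")"


def build_rule(min_specials: int) -> re.Pattern:
    """Wrap the shared fragment so it must occur min_specials times.

    Only the repetition count (the suffix) differs between rules in this
    sketch; the count passed in is illustrative, not a real CRS threshold.
    """
    suffix = b"{" + str(min_specials).encode("ascii") + b"}"
    return re.compile(b"(?:" + SPECIAL + rb"[\s\S]*?)" + suffix)


if __name__ == "__main__":
    rule = build_rule(3)
    print(bool(rule.search(b"a=1;b=2")))                       # three ASCII specials
    print(bool(rule.search("\u4e00\u4e8c\u4e09".encode("utf-8"))))  # CJK text only
```

Keeping the fragment in one place means a fix to the special-character list reaches every rule that includes it, and each rule file only has to state its own count.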
a5ea2fd to bd801f3
what
`regex-assembly/include/sql-special-chars-anomaly.ra` include file and composable `.ra` files for all 5 rules using named assemblies
Test plan
`crs-toolchain regex compare` confirms generated regex matches for all 5 rules
refs