fix(942410): cleaning of duplicates with 942151#4336
Conversation
for more information, see https://pre-commit.ci
|
📊 Quantitative test results for language: |
|
I wonder why we don't use the same regexp on this one 🤔 |
|
You’re right, there are duplicate sequences - they should probably be cleaned up. I’ll take a look at this as soon as possible. This rule contains more high false-positive risk sequences (including “mod”) https://www.diffchecker.com/fr/1ZnlZDrj/ |
for more information, see https://pre-commit.ci
|
I think there’s another typo in “cr32”; it should be “crc32” in MySQL. But maybe “cr32” exists in a database system I’m not thinking of (like Oracle or PostgreSQL). |
for more information, see https://pre-commit.ci
|
@fzipi i clean up duplicate - if you want check : i fix another typo on last_insert_id in #4337 - reason why you will not find it in this PR. |
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
|
I had to reorganize the unit tests. Sorry in advance for the headache it’ll give to whoever does the review… |
|
@touchweb-vincent Can you fix conflicts? |
|
@fzipi done |
fzipi
left a comment
There was a problem hiding this comment.
This PR cleans up duplicate SQL function names between rules 942151 (PL1) and 942410 (PL2), removing 174 duplicate functions from 942410 while maintaining all detection capabilities.
Rule Architecture
Rule 942151 (Paranoia Level 1)
- Formula:
include/sql-injection-function-names.raMINUSexclude/sql-injection-function-names-fps-pl1.ra - Contains: 268 SQL functions
- Excludes: 10 high-FP functions (
convert,left,lower,position,likelihood,unlikely, etc.)
Rule 942410 (Paranoia Level 2)
- Purpose: Detect high-FP SQL functions that are too risky for PL1
- Before: 239 functions (171 duplicates + 68 unique)
- After: 67 unique functions
Changes Made
✓ Removes Duplicates (174 functions)
- Removed 171 SQL functions already covered by rule 942151
- Removed 3 obsolete entries (typos like
cr32instead ofcrc32)
✓ Adds Missing Functions (2 functions)
likelihoodandunlikely(were in exclude list but missing from old 942410)
✓ Test Redistribution (50 tests)
- Moved 50 tests from
942410.yaml→942151.yaml - Tests for functions that now belong to 942151
- Perfect match: 50 removed = 50 added
✓ Documentation Improvements
- Added comment to
942410.raexplaining it's an extended version with high-FP sequences - Added comment to
exclude/sql-injection-function-names-fps-pl1.ranoting additions should also go to942410.ra
Final State: 942410 Now Contains 67 Functions
Category 1: From Exclude List (10 functions)
High FP at PL1, checked at PL2:
convert, degrees, elt, left, likelihood, lower, position, quarter, space, unlikely
Category 2: Other Unique Functions (57 functions)
Very common words that would cause massive FPs at PL1:
abs, acos, avg, bin, cast, char, charset, chr, count, date, day, default,
field, floor, format, hour, if, in, is, last, length, ln, local, log, max,
min, minute, mod, month, now, password, pi, power, rawtonhex, rawtonhextoraw,
repeat, replace, reverse, right, round, second, sign, sleep, stddev, sum,
tan, time, to_char, to_days, to_nchar, to_seconds, upper, user, values,
version, week, year
Rationale: These are legitimate SQL functions but also extremely common English words/programming terms.
Validation Results
| Check | Status | Details |
|---|---|---|
| Logic | ✅ | No detection capabilities lost |
| Duplicates | ✅ | All duplicates properly removed |
| PL1/PL2 separation | ✅ | Maintains correct paranoia level distinction |
| Tests | ✅ | 50 tests cleanly migrated, none lost |
| Documentation | ✅ | Clear explanations added |
| False positives | ✅ | Quantitative testing passed (no new FPs) |
Benefits
- Eliminates 172 lines of duplicate regex patterns - smaller compiled rules
- Improves maintainability - clear separation of concerns between PL1/PL2
- Better documentation - explains relationship between files
- Prevents future duplication - guidance for contributors
- Maintains functionality - all existing detections still work
Co-authored-by: Felipe Zipitría <3012076+fzipi@users.noreply.github.com>
Hello,
This PR initially started as a simple typo fix, but then @fzipi pointed out a duplication that didn’t make much sense either.
As a result, the sequences from regex-assembly/942410.ra that were present in regex-assembly/include/sql-injection-function-names.ra but missing from regex-assembly/exclude/sql-injection-function-names-fps-pl1.ra have been removed.
The unit tests for rule 94210, which were no longer relevant, have been redistributed under the unit tests for 942151.
Two sequences (likelihood and unlikely) that were present in regex-assembly/exclude/sql-injection-function-names-fps-pl1.ra but missing from regex-assembly/942410.ra have been added back to the set.
Comments have also been added at the top of the unit test files to explain how to handle false positives across the different rules.