JS-1706 Fix S5868 false positive for regex fragment boundaries#7124
Conversation
Tests cover dynamically concatenated regex character classes where Unicode adjacency is created across fragment boundaries. They verify that seam-crossing combining characters, surrogate pairs, emoji modifiers, regional indicators, and ZWJ sequences are accepted while same-fragment grapheme clusters still raise issues. Relates to JS-1706
Track concatenation fragment boundaries while extracting regex patterns and use them in S5868 to suppress Unicode grapheme reports created only across dynamic RegExp string joins. Reports for suspicious Unicode sequences that remain within a single source fragment are still raised, following the approved Jira implementation guidelines. Relates to JS-1706.
ESLint 9 flat-config verification no longer calls the custom pragma patch hooks, so inline configs and disable comments using ESLint rule aliases were not mapped to sonarjs/Sxxxx rule ids before linting. Patch directive comments before verification so alias-based inline configs keep Sonar defaults and disable directives suppress the intended wrapped rules. Update the linter expectation for prefer-const severity-only inline config to match the preserved Sonar defaults.
Remove stale imports from the split linter test files so the bridge compile no longer fails with TS6133 unused-import errors. No SonarQube issues or security hotspots were present in the local scan inputs, and no NOSONAR suppressions were added.
Rule Profile
"cc̈d̈d".replace(/[c̈d̈]/g, "X"); // result is "XXXXXX" and not expected "cXXd"False Positive PatternThe false positives happen when a var nonASCIIidentifierStart = new RegExp("[" + nonASCIIidentifierStartChars + "]");
var nonASCIIidentifier = new RegExp("[" + nonASCIIidentifierStartChars + nonASCIIidentifierChars + "]");False Negative RiskThe risk is bounded because suppression applies only when the suspicious adjacent const c = "👍";
const p = "[" + c + "]";
new RegExp(p);Fix SummaryRegex pattern extraction now carries seam metadata alongside the flattened pattern and flags, plus an
|
Ruling Report✅ No changes to ruling expected issues in this PR |
Replace the ES2023 Array.prototype.findLast usage in S5868 with an explicit reverse ancestor scan so the rule compiles under the ESLint Plugin TypeScript target while preserving the same RegExp constructor lookup behavior.
|
The new seam suppression is broader than the intended fix. For example: const part1 = '\\uD83';
const part2 = 'D\\uDC4D';
new RegExp('[' + part1 + part2 + ']');Here the concatenation happens inside the escape Could you narrow the seam check to the exact boundary between the adjacent It would also be good to add an |
Comment: The new seam suppression is broader than the intended fix. `spansSeam()` currently treats any concatenation seam between `charA.start` and `charB.start` as if it were a boundary between the two parsed regex characters, but that also matches seams that land inside the raw text of `charA`.
For example:
```js
const part1 = '\\uD83';
const part2 = 'D\\uDC4D';
new RegExp('[' + part1 + part2 + ']');
```
Here the concatenation happens inside the escape `\uD83D`, not between `\uD83D` and `\uDC4D`. The final regex is still `[\uD83D\uDC4D]`, so S5868 should still report it, but the current predicate suppresses it.
Could you narrow the seam check to the exact boundary between the adjacent `regexpp` characters instead of any seam before `charB.start`? Since seams are tracked in pattern space and `regexpp` offsets include the opening slash, the aligned case is `seam + 1 === charB.start` (equivalently `seam === charA.end - 1`, with `charA.end === charB.start`).
It would also be good to add an `invalid` regression test for the escape-split example above, so we keep suppressing only true fragment-boundary cases.
The S5868 unit test for a RegExp constructor built from split string parts was missing the expected unicode-flag suggestion. ESLint RuleTester requires test errors to declare suggestions when the rule emits them, so add the expected suggestion output for that case.
|
Improvement suggestion: the seam metadata should not treat every static Literal-only concatenation is still a fully static regex source. For example, A concrete fix would be to track whether an extracted pattern came from a pure inline literal expression: type ExtractedRegexPattern = {
pattern: string;
flags: string;
seams: number[];
isPureLiteral: boolean;
};Then literals, static templates, and const splitSeams = left.isPureLiteral && right.isPureLiteral ? [] : [splitOffset];
return {
pattern: left.pattern + right.pattern,
flags: '',
seams: [...left.seams, ...splitSeams, ...right.seams.map(s => s + splitOffset)],
isPureLiteral: left.isPureLiteral && right.isPureLiteral,
};This keeps the false-positive fix for variable/fragment boundaries, while preserving true positives for hard-coded regexes split only for readability. Regression tests should cover both sides: keep the variable-fragment cases valid, and add invalid cases for |
Comment: Improvement suggestion: the seam metadata should not treat every static `+` boundary as a boundary where S5868 must suppress adjacency checks. Literal-only concatenation is still a fully static regex source. For example, `new RegExp('[A' + '\\u0301]')` is equivalent to `new RegExp('[A\\u0301]')`, so S5868 should still report the Unicode grapheme cluster inside the character class. The same applies to split surrogate literals such as `new RegExp('[\\uD83D' + '\\uDC4D]')`. A concrete fix would be to track whether an extracted pattern came from a pure inline literal expression: ```ts type ExtractedRegexPattern = { pattern: string; flags: string; seams: number[]; isPureLiteral: boolean; }; ``` Then literals, static templates, and `String.raw` templates return `isPureLiteral: true`; identifier resolution returns the resolved pattern with `isPureLiteral: false`; and binary `+` only records the current split offset when at least one side is not pure literal: ```ts const splitSeams = left.isPureLiteral && right.isPureLiteral ? [] : [splitOffset]; return { pattern: left.pattern + right.pattern, flags: '', seams: [...left.seams, ...splitSeams, ...right.seams.map(s => s + splitOffset)], isPureLiteral: left.isPureLiteral && right.isPureLiteral, }; ``` This keeps the false-positive fix for variable/fragment boundaries, while preserving true positives for hard-coded regexes split only for readability. Regression tests should cover both sides: keep the variable-fragment cases valid, and add invalid cases for `new RegExp('[A' + '\\u0301]')` and `new RegExp('[\\uD83D' + '\\uDC4D]')`.
|




Relates to JS-1706
S5868 now avoids reporting Unicode grapheme sequences that are created only by concatenating separately sourced regex fragments into a dynamic character class. The fix tracks fragment seams in reconstructed
RegExpconstructor patterns and suppresses only reports that cross a real non-literal fragment boundary.regexppcharacter boundaries so split escapes are still analyzed correctly.