JS-1706 Fix S5868 false positive for regex fragment boundaries by sonar-nigel[bot] · Pull Request #7124 · SonarSource/SonarJS

sonar-nigel · 2026-05-29T10:24:12Z

Relates to JS-1706

S5868 now avoids reporting Unicode grapheme sequences that are created only by concatenating separately sourced regex fragments into a dynamic character class. The fix tracks fragment seams in reconstructed RegExp constructor patterns and suppresses only reports that cross a real non-literal fragment boundary.

Keeps reporting grapheme-like sequences that are fully contained in one fragment, including identifier-resolved fragments.
Keeps reporting fully static regex sources split across literal-only concatenation.
Narrows seam matching to exact regexpp character boundaries so split escapes are still analyzed correctly.

Tests cover dynamically concatenated regex character classes where Unicode adjacency is created across fragment boundaries. They verify that seam-crossing combining characters, surrogate pairs, emoji modifiers, regional indicators, and ZWJ sequences are accepted while same-fragment grapheme clusters still raise issues. Relates to JS-1706

Track concatenation fragment boundaries while extracting regex patterns and use them in S5868 to suppress Unicode grapheme reports created only across dynamic RegExp string joins. Reports for suspicious Unicode sequences that remain within a single source fragment are still raised, following the approved Jira implementation guidelines. Relates to JS-1706.

ESLint 9 flat-config verification no longer calls the custom pragma patch hooks, so inline configs and disable comments using ESLint rule aliases were not mapped to sonarjs/Sxxxx rule ids before linting. Patch directive comments before verification so alias-based inline configs keep Sonar defaults and disable directives suppress the intended wrapped rules. Update the linter expectation for prefer-const severity-only inline config to match the preserved Sonar defaults.

Remove stale imports from the split linter test files so the bridge compile no longer fails with TS6133 unused-import errors. No SonarQube issues or security hotspots were present in the local scan inputs, and no NOSONAR suppressions were added.

sonar-nigel · 2026-05-29T10:24:14Z

Rule Profile

Field	Value
Rule	`S5868` — Unicode Grapheme Clusters should be avoided inside regex character classes
Severity / type	Major BUG
Sonar Way	Yes
Scope	Main

"cc̈d̈d".replace(/[c̈d̈]/g, "X"); // result is "XXXXXX" and not expected "cXXd"

False Positive Pattern

The false positives happen when a RegExp constructor builds a character class through concatenation, for example by adding [ and ] around variables that contain Unicode ranges or escapes. The flattened final pattern can place adjacent code points across source-fragment boundaries, but that adjacency is an artifact of dynamic regex construction rather than a grapheme cluster written inside one character-class fragment.

var nonASCIIidentifierStart = new RegExp("[" + nonASCIIidentifierStartChars + "]");
var nonASCIIidentifier = new RegExp("[" + nonASCIIidentifierStartChars + nonASCIIidentifierChars + "]");

False Negative Risk

The risk is bounded because suppression applies only when the suspicious adjacent regexpp characters straddle an exact recorded seam from a non-literal fragment boundary. Same-fragment Unicode sequences are still reported, including when the fragment comes through an identifier, and literal-only concatenations remain fully static so they are still checked.

const c = "👍";
const p = "[" + c + "]";
new RegExp(p);

Fix Summary

Regex pattern extraction now carries seam metadata alongside the flattened pattern and flags, plus an isPureLiteral marker to distinguish readability-only literal concatenation from dynamic fragment construction. S5868 consults those seams before reporting combined characters, surrogate pairs, modified emoji, regional indicators, and ZWJ sequences.

Records concatenation seams only when at least one side is not a pure inline literal.
Suppresses only reports crossing the exact boundary between adjacent parsed regex characters.
Adds regression coverage for valid dynamic-fragment cases and invalid same-fragment, split-escape, and literal-only concatenation cases.

github-actions · 2026-05-29T10:35:23Z

Ruling Report

✅ No changes to ruling expected issues in this PR

Replace the ES2023 Array.prototype.findLast usage in S5868 with an explicit reverse ancestor scan so the rule compiles under the ESLint Plugin TypeScript target while preserving the same RegExp constructor lookup behavior.

francois-mora-sonarsource · 2026-05-29T15:08:36Z

The new seam suppression is broader than the intended fix. spansSeam() currently treats any concatenation seam between charA.start and charB.start as if it were a boundary between the two parsed regex characters, but that also matches seams that land inside the raw text of charA.

For example:

const part1 = '\\uD83';
const part2 = 'D\\uDC4D';
new RegExp('[' + part1 + part2 + ']');

Here the concatenation happens inside the escape \uD83D, not between \uD83D and \uDC4D. The final regex is still [\uD83D\uDC4D], so S5868 should still report it, but the current predicate suppresses it.

Could you narrow the seam check to the exact boundary between the adjacent regexpp characters instead of any seam before charB.start? Since seams are tracked in pattern space and regexpp offsets include the opening slash, the aligned case is seam + 1 === charB.start (equivalently seam === charA.end - 1, with charA.end === charB.start).

It would also be good to add an invalid regression test for the escape-split example above, so we keep suppressing only true fragment-boundary cases.

Comment: The new seam suppression is broader than the intended fix. `spansSeam()` currently treats any concatenation seam between `charA.start` and `charB.start` as if it were a boundary between the two parsed regex characters, but that also matches seams that land inside the raw text of `charA`. For example: ```js const part1 = '\\uD83'; const part2 = 'D\\uDC4D'; new RegExp('[' + part1 + part2 + ']'); ``` Here the concatenation happens inside the escape `\uD83D`, not between `\uD83D` and `\uDC4D`. The final regex is still `[\uD83D\uDC4D]`, so S5868 should still report it, but the current predicate suppresses it. Could you narrow the seam check to the exact boundary between the adjacent `regexpp` characters instead of any seam before `charB.start`? Since seams are tracked in pattern space and `regexpp` offsets include the opening slash, the aligned case is `seam + 1 === charB.start` (equivalently `seam === charA.end - 1`, with `charA.end === charB.start`). It would also be good to add an `invalid` regression test for the escape-split example above, so we keep suppressing only true fragment-boundary cases.

The S5868 unit test for a RegExp constructor built from split string parts was missing the expected unicode-flag suggestion. ESLint RuleTester requires test errors to declare suggestions when the rule emits them, so add the expected suggestion output for that case.

francois-mora-sonarsource · 2026-05-29T16:33:54Z

Improvement suggestion: the seam metadata should not treat every static + boundary as a boundary where S5868 must suppress adjacency checks.

Literal-only concatenation is still a fully static regex source. For example, new RegExp('[A' + '\\u0301]') is equivalent to new RegExp('[A\\u0301]'), so S5868 should still report the Unicode grapheme cluster inside the character class. The same applies to split surrogate literals such as new RegExp('[\\uD83D' + '\\uDC4D]').

A concrete fix would be to track whether an extracted pattern came from a pure inline literal expression:

type ExtractedRegexPattern = {
  pattern: string;
  flags: string;
  seams: number[];
  isPureLiteral: boolean;
};

Then literals, static templates, and String.raw templates return isPureLiteral: true; identifier resolution returns the resolved pattern with isPureLiteral: false; and binary + only records the current split offset when at least one side is not pure literal:

const splitSeams = left.isPureLiteral && right.isPureLiteral ? [] : [splitOffset];

return {
  pattern: left.pattern + right.pattern,
  flags: '',
  seams: [...left.seams, ...splitSeams, ...right.seams.map(s => s + splitOffset)],
  isPureLiteral: left.isPureLiteral && right.isPureLiteral,
};

This keeps the false-positive fix for variable/fragment boundaries, while preserving true positives for hard-coded regexes split only for readability. Regression tests should cover both sides: keep the variable-fragment cases valid, and add invalid cases for new RegExp('[A' + '\\u0301]') and new RegExp('[\\uD83D' + '\\uDC4D]').

Comment: Improvement suggestion: the seam metadata should not treat every static `+` boundary as a boundary where S5868 must suppress adjacency checks. Literal-only concatenation is still a fully static regex source. For example, `new RegExp('[A' + '\\u0301]')` is equivalent to `new RegExp('[A\\u0301]')`, so S5868 should still report the Unicode grapheme cluster inside the character class. The same applies to split surrogate literals such as `new RegExp('[\\uD83D' + '\\uDC4D]')`. A concrete fix would be to track whether an extracted pattern came from a pure inline literal expression: ```ts type ExtractedRegexPattern = { pattern: string; flags: string; seams: number[]; isPureLiteral: boolean; }; ``` Then literals, static templates, and `String.raw` templates return `isPureLiteral: true`; identifier resolution returns the resolved pattern with `isPureLiteral: false`; and binary `+` only records the current split offset when at least one side is not pure literal: ```ts const splitSeams = left.isPureLiteral && right.isPureLiteral ? [] : [splitOffset]; return { pattern: left.pattern + right.pattern, flags: '', seams: [...left.seams, ...splitSeams, ...right.seams.map(s => s + splitOffset)], isPureLiteral: left.isPureLiteral && right.isPureLiteral, }; ``` This keeps the false-positive fix for variable/fragment boundaries, while preserving true positives for hard-coded regexes split only for readability. Regression tests should cover both sides: keep the variable-fragment cases valid, and add invalid cases for `new RegExp('[A' + '\\u0301]')` and `new RegExp('[\\uD83D' + '\\uDC4D]')`.

sonarqube-next · 2026-05-29T16:56:53Z

Quality Gate passed

Issues
0 New issues
10 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
0 Dependency risks
94.6% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube

Vibe Bot added 4 commits May 29, 2026 08:59

sonar-nigel Bot temporarily deployed to sca-checking May 29, 2026 10:24 Inactive

Fix ESLint Plugin build issues

5433692

Replace the ES2023 Array.prototype.findLast usage in S5868 with an explicit reverse ancestor scan so the rule compiles under the ESLint Plugin TypeScript target while preserving the same RegExp constructor lookup behavior.

sonar-nigel Bot temporarily deployed to sca-checking May 29, 2026 11:41 Inactive

sonar-nigel Bot marked this pull request as ready for review May 29, 2026 12:20

sonar-nigel Bot requested a review from francois-mora-sonarsource May 29, 2026 12:20

JS-1706 Remove unrelated linter changes

aee78f0

francois-mora-sonarsource temporarily deployed to sca-checking May 29, 2026 13:16 — with GitHub Actions Inactive

Vibe Bot added 2 commits May 29, 2026 15:37

sonar-nigel Bot temporarily deployed to sca-checking May 29, 2026 15:40 Inactive

sonar-nigel Bot temporarily deployed to sca-checking May 29, 2026 16:43 Inactive

francois-mora-sonarsource requested review from a team and removed request for francois-mora-sonarsource May 29, 2026 17:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JS-1706 Fix S5868 false positive for regex fragment boundaries#7124

JS-1706 Fix S5868 false positive for regex fragment boundaries#7124
sonar-nigel[bot] wants to merge 9 commits into
masterfrom
fix/JS-1706-fix-fp-on-s5868-unicode-in-variables-concatenated-into-regex-character-classes-gpt-5.5

sonar-nigel Bot commented May 29, 2026 •

edited by atlassian Bot

Loading

Uh oh!

sonar-nigel Bot commented May 29, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 29, 2026 •

edited

Loading

Uh oh!

francois-mora-sonarsource commented May 29, 2026

Uh oh!

francois-mora-sonarsource commented May 29, 2026

Uh oh!

sonarqube-next Bot commented May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

sonar-nigel Bot commented May 29, 2026 • edited by atlassian Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sonar-nigel Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rule Profile

False Positive Pattern

False Negative Risk

Fix Summary

Uh oh!

github-actions Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Ruling Report

Uh oh!

francois-mora-sonarsource commented May 29, 2026

Uh oh!

francois-mora-sonarsource commented May 29, 2026

Uh oh!

sonarqube-next Bot commented May 29, 2026

Quality Gate passed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

sonar-nigel Bot commented May 29, 2026 •

edited by atlassian Bot

Loading

sonar-nigel Bot commented May 29, 2026 •

edited

Loading

github-actions Bot commented May 29, 2026 •

edited

Loading