feat(931131): removing off domain check #4379

Merged
EsadCetiner merged 8 commits into coreruleset:main from touchweb-vincent:patch-25
Jan 12, 2026

Conversation

@touchweb-vincent
Contributor

Hello,

I don’t think we need to check for off-domain references in this rule.

Have we ever seen any false positives on this rule? A scheme appearing inside REQUEST_FILENAME (which, by design, does not include query parameters) is highly suspicious and, in my opinion, should be blocked by default.
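To illustrate the distinction between the path and the query string, here is a minimal Python sketch. It assumes REQUEST_FILENAME corresponds roughly to the path component of the URL. The regular expression, the function name `path_contains_scheme`, and the `shop.example` / `other.example` hosts are all hypothetical and are not taken from the actual rule.

```python
import re
from urllib.parse import urlsplit

# Hypothetical, simplified pattern: a URL scheme followed by "://".
# This is NOT the actual expression used by rule 931131.
SCHEME = re.compile(r"(?i)\b(?:file|ftps?|https?)://")


def path_contains_scheme(url: str) -> bool:
    """Return True if the path part of `url` (query string excluded) embeds a scheme."""
    path = urlsplit(url).path
    return SCHEME.search(path) is not None


# Scheme only in the query string: the path is "/redirect", so nothing is reported.
print(path_contains_scheme("https://shop.example/redirect?to=https://other.example/"))

# Scheme inside the path itself: this is the case the rule is concerned with.
print(path_contains_scheme("https://shop.example/https://other.example/"))
```

In this sketch a URL passed legitimately as a query parameter never reaches the check, which may be why a scheme in the path is rare in normal traffic.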

Based on our experience, we could also move this rule to PL1: we have not seen a false positive in years.

What do you think?

Vincent

@github-actions
Contributor

github-actions Bot commented Dec 9, 2025

📊 Quantitative test results for language: eng, year: 2023, size: 10K, paranoia level: 1:
🚀 Quantitative testing did not detect new false positives

Member

@EsadCetiner EsadCetiner left a comment

I can't really see how a same-origin URL within REQUEST_FILENAME could be abused, and at the same time I don't see this change increasing or reducing false positives, except maybe in a specific edge case.

Is there a specific attack you're observing that this change will block?

@touchweb-vincent
Contributor Author

Not attacks, but it does block quite a bit of parasitic traffic coming from LLMs, which often crawl in a very uncontrolled way.

IMO, this change is completely harmless and can only help reduce server load caused by corrupted traffic.

@EsadCetiner
Member

@touchweb-vincent

> Not attacks, but it does block quite a bit of parasitic traffic coming from LLMs, which often crawl in a very uncontrolled way.

Not all users will want to block LLMs; in some cases it may be desirable for LLMs to crawl their site, and others may not care. What do you mean by "parasitic traffic coming from LLMs"?

> IMO, this change is completely harmless and can only help reduce server load caused by corrupted traffic.

Makes sense, but what kind of traffic are we talking about? Do you mean LLMs?

@touchweb-vincent
Contributor Author

touchweb-vincent commented Dec 31, 2025

Esad, I’m not talking about blocking LLM traffic. Here, we provide certain LLMs with the same network highways on our infrastructure as those granted to SERPs, given their contribution to traffic.

I’m only referring to corrupted requests issued by some LLMs, which incorrectly parse HTML and attempt to extract or crawl nonsensical elements, sometimes resulting in an unnecessary increase in server load.

For example, they encounter <a href="proxy.php?url=https%3A%2F%2Fwww.test2.fr"> in the HTML source of https://www.test.fr, and then incorrectly attempt to crawl https://www.test.fr/https://www.test2.fr

Yes, I'm speaking about corrupted LLM traffic, mostly from Facebook’s LLM, which appears to be severely bugged.
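As a rough illustration of what dropping the off-domain check changes, the Python sketch below contrasts the two behaviours on the URLs from the example above. It is only an approximation under stated assumptions: the pattern, the function name `is_flagged`, and the exact host comparison are hypothetical and do not reproduce the rule's real logic.

```python
import re

# Hypothetical, simplified pattern; not the real expression from rule 931131.
# Group 1 captures the host of the URL embedded in the path.
EMBEDDED_URL = re.compile(r"(?i)\b(?:ftps?|https?)://([^/?#]+)")


def is_flagged(path: str, host: str, off_domain_only: bool) -> bool:
    """Decide whether a request path embedding a URL would be flagged.

    off_domain_only=True approximates the behaviour before this PR
    (an embedded URL pointing at the request's own host is ignored).
    off_domain_only=False approximates the behaviour after it
    (any embedded URL is flagged). Both are assumptions for illustration.
    """
    match = EMBEDDED_URL.search(path)
    if match is None:
        return False
    if not off_domain_only:
        return True
    # Assumed comparison: plain case-insensitive equality with the Host header.
    return match.group(1).lower() != host.lower()


host = "www.test.fr"
# Off-domain URL in the path: flagged both before and after.
print(is_flagged("/https://www.test2.fr", host, off_domain_only=True))
print(is_flagged("/https://www.test2.fr", host, off_domain_only=False))
# Same-origin URL in the path: only flagged once the off-domain check is removed.
print(is_flagged("/https://www.test.fr/page", host, off_domain_only=True))
print(is_flagged("/https://www.test.fr/page", host, off_domain_only=False))
```

Under these assumptions, the only requests whose outcome changes are those embedding a same-origin URL in the path.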

@EsadCetiner
Member

@touchweb-vincent hm... ok. Can you add a test to make sure same-origin domains are blocked?

@touchweb-vincent
Contributor Author

@EsadCetiner done

Comment thread tests/regression/tests/REQUEST-931-APPLICATION-ATTACK-RFI/931131.yaml Outdated
…31.yaml

Co-authored-by: Esad Cetiner <104706115+EsadCetiner@users.noreply.github.com>
@EsadCetiner EsadCetiner added this pull request to the merge queue Jan 12, 2026
Merged via the queue into coreruleset:main with commit 4507ef8 Jan 12, 2026
8 checks passed
@touchweb-vincent touchweb-vincent deleted the patch-25 branch January 13, 2026 04:09

3 participants