feat(931131): removing off domain check #4379
EsadCetiner
left a comment
I can't really see how a same-origin domain URL within REQUEST_FILENAME could be abused. At the same time, I don't see this change increasing or reducing false positives, except maybe for a specific edge case.
Is there a specific attack you're observing that this change will block?
Not attacks, but it does block quite a bit of parasitic traffic coming from LLMs, which often crawl in a very uncontrolled way. IMO, this change is completely harmless and can only help reduce the server load caused by corrupted traffic.
Not all users will want to block LLMs. In some cases it may be desirable for LLMs to crawl their site, and others may not care. What do you mean by "parasitic traffic coming from LLMs"?
Makes sense, but what kind of traffic are we talking about? Do you mean LLMs?
Esad, I’m not talking about blocking LLM traffic. Here, we provide certain LLMs with the same network highways on our infrastructure as those granted to SERPs, given their contribution to traffic. I’m only referring to corrupted requests issued by some LLMs, which parse HTML incorrectly and attempt to extract or crawl nonsensical elements, sometimes causing an unnecessary increase in server load. For example, they encounter …
Yes, I’m speaking about corrupted LLM traffic, mostly from Facebook’s LLM, which appears to be severely bugged.
@touchweb-vincent hm... ok. Can you add a test to make sure same-origin domains are blocked?
@EsadCetiner done
…31.yaml Co-authored-by: Esad Cetiner <104706115+EsadCetiner@users.noreply.github.com>
Hello,
I don’t think we need to check for off-domain references in this rule.
Have we ever seen any false positives on this rule? A scheme appearing inside REQUEST_FILENAME (which, by design, does not include query parameters) is highly suspicious and, in my opinion, should be blocked by default.
Based on our feedback, we could also move this rule to PL1; we have not seen a false positive in years.
What do you think?
Vincent
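
For readers following the thread, here is a minimal Python sketch of the behavior being proposed. It is an illustration only, not the rule's actual SecLang or regular expression: the scheme list, the `request_filename` helper, and the example paths are all assumptions made for this sketch. It shows the two points from the discussion: a scheme in the path is flagged regardless of which host it names, and a URL carried in the query string is not inspected.

```python
import re

# Hypothetical, simplified scheme list for illustration only.
# The real rule matches a much longer list of schemes.
SCHEME_IN_PATH = re.compile(r"(?i)\b(?:https?|ftps?|file)://")


def request_filename(request_uri: str) -> str:
    """Approximate REQUEST_FILENAME: the URI path without the query string."""
    return request_uri.split("?", 1)[0]


def has_scheme_in_path(request_uri: str) -> bool:
    """Return True when a URL scheme appears inside the path component.

    No comparison against the Host header is made, which mirrors the idea
    of dropping the off-domain check: same-origin URLs are flagged too.
    """
    return SCHEME_IN_PATH.search(request_filename(request_uri)) is not None


# A URL embedded in the path is flagged, whatever host it names.
print(has_scheme_in_path("/page/https://example.com/image.png"))
# A URL passed as a query parameter is outside the path, so it is ignored.
print(has_scheme_in_path("/redirect?url=https://example.com/"))
```

With this simplified check, a malformed crawler request such as `/page/https://example.com/image.png` is matched even when `example.com` is the site's own host, which is the case the requested test is meant to cover.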