Skip to content

feat: update restricted files and file extensions#4299

Merged
EsadCetiner merged 14 commits into
coreruleset:mainfrom
EsadCetiner:feat-update-restricted-files
Jan 26, 2026
Merged

feat: update restricted files and file extensions#4299
EsadCetiner merged 14 commits into
coreruleset:mainfrom
EsadCetiner:feat-update-restricted-files

Conversation

@EsadCetiner
Copy link
Copy Markdown
Member

@EsadCetiner EsadCetiner commented Oct 21, 2025

This PR was compiled from multiple sources:

I've removed entries that were not relevant for web applications to keep false positives to a minimum, and commented out ones that cause false positives.

These rules should be overhauled in the medium to long term, but this should be fine for now. I've opened an issue in #4300 to discuss further improvements to these rules.

closes #4248

@EsadCetiner EsadCetiner added the release:new-detection In this PR we introduce a new detection label Oct 21, 2025
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Oct 21, 2025

📊 Quantitative test results for language: eng, year: 2023, size: 10K, paranoia level: 1:
🚀 Quantitative testing did not detect new false positives

Comment thread rules/lfi-os-files.data Outdated
Comment thread rules/lfi-os-files.data Outdated
azurit
azurit previously approved these changes Oct 24, 2025
Copy link
Copy Markdown
Member

@azurit azurit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good but it is a bigger PR so i suggest also someone else looks at it.

@touchweb-vincent
Copy link
Copy Markdown
Contributor

A heartfelt thought for all the Ms. and Mr. PACs who will no longer be able to register on sites using email addresses like XX.pac@

This file is one of those I plan to propose splitting, if you agree with the approach detailed in the other PR.

Believe it or not, we’ve actually seen some “Mister Boto” accounts on the E-commerce sites we manage.

Indeed, you can say that OWASP seniors will create proper exclusions, since the scope of these rules is far from arguments such as email addresses.

However, the vast majority of hosting professionals will simply disable the rule in a hurry after receiving a telling-off from their e-commerce clients - and never re-enable it again

@EsadCetiner
Copy link
Copy Markdown
Member Author

@touchweb-vincent Thanks for picking that up, turns out .pac doesn't have any security impact for web apps so I've removed it.

@touchweb-vincent
Copy link
Copy Markdown
Contributor

touchweb-vincent commented Oct 25, 2025

Thanks.

However, the issue remains the same with all sequences of fewer than 5 characters that match the pattern

^[\.\w]{1,4}$

And this dataset is also unusable as-is in e-commerce contexts because of base64/high entropy inputs (bank signature / marketing cookie etc), where we need to isolate

^[\w\/]{1,6}$

That's why it would be great if you allowed me to split theses datasets.

I understand that this may be an unpleasant moment for the long-time members - ourselves included - but we’re preparing for the future by making things easier for newcomers.

@EsadCetiner
Copy link
Copy Markdown
Member Author

@touchweb-vincent That's because mainly (In my opinion at least) we're reaching the limit as to what can be done using the @PmFromFile operator. I opened an issue at #4248 to discuss moving these rules to using a regex (Among other things) which should allow us to build more complex rules and maybe create a suffix we can add to all entries to reduce false positives to a minimum.

I'm happy to make a small change to this PR to reduce false positives, but bigger changes will take more effort.

@fzipi
Copy link
Copy Markdown
Member

fzipi commented Oct 25, 2025

So, .pac files are for proxy autoconfiguration. I've used them in the past and they might disclose information about internal services. Just my 2 cents.

@azurit
Copy link
Copy Markdown
Member

azurit commented Oct 26, 2025

Should be readded.

@touchweb-vincent
Copy link
Copy Markdown
Contributor

I agree that switching from PmFromFile to regex assembly can help make the rules more precise and therefore reduce false positives.

However, splitting becomes much harder for third parties - especially for newcomers. It’s far easier to split and manage a simple file than to break apart a complex regex assembly.

@dune73
Copy link
Copy Markdown
Member

dune73 commented Oct 27, 2025

Do you see users splitting rules so they can exclude some of the patterns, but not all, as part of their local integration?

I would argue that only PROs with an advanced understanding of CRS are doing that and for them making the step to regex assembly is doable. Also based on the comments provided with the rule.

@touchweb-vincent
Copy link
Copy Markdown
Contributor

After talking extensively with many of them while running the diagnoser, the overwhelming majority simply disable the rule because of false positives. But for the small percentage who try to do it properly, they take the time to rationalize the rules based on the risk of false positives.

For those users, moving from PmFromFile to regex assembly makes things more complicated than it actually solves.

At least in its current state, and without splitting those datasets by false positive risk - typically against base64 or high-entropy inputs.

@dune73
Copy link
Copy Markdown
Member

dune73 commented Oct 27, 2025

These insights are very interesting. Thank you.

@azurit
Copy link
Copy Markdown
Member

azurit commented Oct 27, 2025

I still don't get why writing a proper exclusion is not a solution here.

@touchweb-vincent
Copy link
Copy Markdown
Contributor

Azurit - please don’t take this as criticism or anything unpleasant; that’s really not my intention. And if it comes across that way, it’s only because of my limited command of English.

Have you ever deployed these rules across more than a hundred PHP e-commerce sites, each generating over €1M per year, and therefore relying on a very large volume of third-party integrations, including ad networks?

I can assure you, it’s an abject chaos. The people trying to deploy OWASP rules don’t have the time to do it properly unless we help them at least a little by rationalizing these datasets by false positive risk.

@azurit
Copy link
Copy Markdown
Member

azurit commented Oct 27, 2025

I'm running CRS globally, without option for users to disable it, on my own webhosting services which i'm providing for ~10 thousands of domains (all sort of web software - WordPress, Joomla, Drupal, OpenCart, Magento, Prestashop, ...).

@touchweb-vincent
Copy link
Copy Markdown
Contributor

How many purely e-commerce sites generating over one million per year do you manage - sites that therefore very likely have dozens of dependencies with third-party services such as logistics providers, mailing companies, or retargeting/marketing platforms?

Have you enabled all the rules up to at least PL2? If so, at what score do you start blocking? 30? 60?

I’d be curious to run our WAF diagnoser on your infrastructure to better understand your tolerance levels.

@azurit
Copy link
Copy Markdown
Member

azurit commented Oct 27, 2025

I'm running on PL1 which is recommended if you don't want to mess up with FPs too much. Blocking from 5.

@azurit
Copy link
Copy Markdown
Member

azurit commented Oct 27, 2025

fw

@touchweb-vincent
Copy link
Copy Markdown
Contributor

If you are who you claim to be, Azurit, then please understand that you’re either among those who have shown total dedication for years to reach that level.

We’re trying to be that way as well. But we also see the difficulties our peers are facing - and many of them would really appreciate a bit of help.

If you have a SIEM that alerts you on every single block, then you know that high-entropy inputs are everywhere on ecommerce context - and that means thousands of exclusions to create.

If we rationalized this based on false positive risk, our peers would have significantly fewer issues, since they could more broadly disable the rule sets with a high false positive risk on high-entropy inputs.

I completely understand that it’s frustrating to think we’re adding more work for ourselves with additional exclusions - since we separate these datasets, we’ll also have extra work here, simply because third parties don’t have the time to handle things properly and end up disabling everything hastily at the first false positive.

Now, there’s a deeper reflection to be had about whether OWASP should acknowledge that the vast majority of people simply don’t have the time to manage this cleanly - and, as such, try to approach the problem differently.

Disabling a fragmented rule on a more limited scope is still less harmful than disabling the entire rule altogether.

This approach helps our peers take ownership of the project, and as a result, we help open-source solutions better protect themselves.

@touchweb-vincent
Copy link
Copy Markdown
Contributor

We don’t have the statistical data here to quickly isolate the exclusions related only to PL1 - it’s true that this volume is mainly due to the fact that we strive to work at PL2, and partly at PL3.

We currently have more than 4,956 iterations of the keyword ctl:ruleRemove on our shared segment, and 5,003 on client-specific segments.

Please understand that when most third parties see dozens or even hundreds of exclusions to configure, they completely panic.

@EsadCetiner
Copy link
Copy Markdown
Member Author

@azurit If you have 740 rule exclusions at PL-1 then I can see rule exclusions being in the thousands if your running at PL-2, anecdotally the false positives roughly double at PL-2. I don't see why @touchweb-vincent would exaggerate that claim, I can easily see that happening.

@touchweb-vincent @azurit Can you please at least move this discussion about splitting rules into an issue instead of discussing this in PRs that have absolutely no relevance to the subject.

@azurit
Copy link
Copy Markdown
Member

azurit commented Oct 27, 2025

Running on higher PL than 1 comes with FPs and exclusions. That's how it is (official from us, developers).

@fzipi
Copy link
Copy Markdown
Member

fzipi commented Oct 27, 2025

@touchweb-vincent @azurit The discussions on experience, usage and other rules in different PLs are relevant but not here. Please create a different issue to discuss 🙏 🙏 🙏 .

@fzipi
Copy link
Copy Markdown
Member

fzipi commented Oct 27, 2025

@touchweb-vincent The restricted extensions apply on rule 920440 only, and the rule applies to REQUEST_BASENAME only, not on any ARGS. When you mentioned about Mr.pac, is this a common finding? E.g. applications are sending POST /register/surname/Mr.pac as REST parameters?

@touchweb-vincent
Copy link
Copy Markdown
Contributor

@fzipi .pac (lfi-os-files.data) was in lfi-os-files.data which has a broader scope :

SecRule REQUEST_COOKIES|REQUEST_COOKIES_NAMES|ARGS_NAMES|ARGS|XML:/* "@pmFromFile lfi-os-files.data" \

@fzipi
Copy link
Copy Markdown
Member

fzipi commented Oct 27, 2025

Perfect, now it makes more sense.

@EsadCetiner
Copy link
Copy Markdown
Member Author

I updated this PR to include files for GitLab omnibus.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the restricted files and file extensions lists used by the OWASP Core Rule Set (CRS) to detect and block access to sensitive files. The updates are compiled from multiple security sources including Nikto Scan, CrowdSec's security lists, GitHub's gitignore templates, and GitLab omnibus configuration files.

Changes:

  • Added numerous sensitive configuration files, credentials files, cache directories, and system files to the restricted lists
  • Added new restricted file extensions (.jks, .swap, .swo, .tfstate) to prevent access to Java keystores, swap files, and Terraform state files
  • Consolidated some directory paths (e.g., replaced specific var/www subdirectories with the parent var/www)

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
rules/restricted-upload.data Added sensitive files and directories to upload restrictions; contains duplicate entries and missing cross-file consistency
rules/restricted-files.data Added sensitive files, GitLab configs, Python cache, Windows system files, MySQL configs, log files, and directory roots; has alphabetical ordering issues
rules/lfi-os-files.data Added sensitive files and /etc directories for various services (GitLab, Docker, mail servers, etc.); has alphabetical ordering issue and a typo
rules/REQUEST-901-INITIALIZATION.conf Updated default restricted extensions list to include .jks, .swap, .swo, and .tfstate
crs-setup.conf.example Updated documentation to reflect new restricted extensions

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread rules/restricted-upload.data Outdated
Comment thread rules/restricted-upload.data Outdated
Comment thread rules/restricted-upload.data
Comment thread rules/restricted-files.data Outdated
Comment thread rules/lfi-os-files.data Outdated
Comment thread rules/lfi-os-files.data Outdated
Comment thread rules/restricted-upload.data Outdated
Comment thread rules/restricted-upload.data Outdated
Copy link
Copy Markdown
Member

@fzipi fzipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would go on and approve this one, as it is compliant on how we have managed things up to now.

There is context in this PR that should go to an issue for further discussion, as I think there is value in trying to figure it out what is next for the project.

Who will take the lead and follow up?

@EsadCetiner
Copy link
Copy Markdown
Member Author

@fzipi The discussion that happened in this was completely unrelated to this PR, it was about splitting CRS rules based on their character length to different rules. There originally was this PR which proposed to do such a thing for one rule, but then this PR was hijacked to discuss that topic #4304.

I did open another issue though that was generally about improving the local file inclusion rules to improve the detection of true positives: #4300

@EsadCetiner EsadCetiner added this pull request to the merge queue Jan 26, 2026
Merged via the queue into coreruleset:main with commit 3f83126 Jan 26, 2026
8 checks passed
@EsadCetiner EsadCetiner deleted the feat-update-restricted-files branch January 26, 2026 01:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release:new-detection In this PR we introduce a new detection

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Review gitignore templates for adding to restrictes-files or lfi data files

6 participants