feat: update restricted files and file extensions#4299
Conversation
|
📊 Quantitative test results for language: |
azurit
left a comment
There was a problem hiding this comment.
Looks good but it is a bigger PR so i suggest also someone else looks at it.
|
A heartfelt thought for all the Ms. and Mr. PACs who will no longer be able to register on sites using email addresses like XX.pac@ This file is one of those I plan to propose splitting, if you agree with the approach detailed in the other PR. Believe it or not, we’ve actually seen some “Mister Boto” accounts on the E-commerce sites we manage. Indeed, you can say that OWASP seniors will create proper exclusions, since the scope of these rules is far from arguments such as email addresses. However, the vast majority of hosting professionals will simply disable the rule in a hurry after receiving a telling-off from their e-commerce clients - and never re-enable it again |
|
@touchweb-vincent Thanks for picking that up, turns out |
|
Thanks. However, the issue remains the same with all sequences of fewer than 5 characters that match the pattern ^[\.\w]{1,4}$And this dataset is also unusable as-is in e-commerce contexts because of base64/high entropy inputs (bank signature / marketing cookie etc), where we need to isolate ^[\w\/]{1,6}$That's why it would be great if you allowed me to split theses datasets. I understand that this may be an unpleasant moment for the long-time members - ourselves included - but we’re preparing for the future by making things easier for newcomers. |
|
@touchweb-vincent That's because mainly (In my opinion at least) we're reaching the limit as to what can be done using the I'm happy to make a small change to this PR to reduce false positives, but bigger changes will take more effort. |
|
So, |
|
Should be readded. |
|
I agree that switching from PmFromFile to regex assembly can help make the rules more precise and therefore reduce false positives. However, splitting becomes much harder for third parties - especially for newcomers. It’s far easier to split and manage a simple file than to break apart a complex regex assembly. |
|
Do you see users splitting rules so they can exclude some of the patterns, but not all, as part of their local integration? I would argue that only PROs with an advanced understanding of CRS are doing that and for them making the step to regex assembly is doable. Also based on the comments provided with the rule. |
|
After talking extensively with many of them while running the diagnoser, the overwhelming majority simply disable the rule because of false positives. But for the small percentage who try to do it properly, they take the time to rationalize the rules based on the risk of false positives. For those users, moving from PmFromFile to regex assembly makes things more complicated than it actually solves. At least in its current state, and without splitting those datasets by false positive risk - typically against base64 or high-entropy inputs. |
|
These insights are very interesting. Thank you. |
|
I still don't get why writing a proper exclusion is not a solution here. |
|
Azurit - please don’t take this as criticism or anything unpleasant; that’s really not my intention. And if it comes across that way, it’s only because of my limited command of English. Have you ever deployed these rules across more than a hundred PHP e-commerce sites, each generating over €1M per year, and therefore relying on a very large volume of third-party integrations, including ad networks? I can assure you, it’s an abject chaos. The people trying to deploy OWASP rules don’t have the time to do it properly unless we help them at least a little by rationalizing these datasets by false positive risk. |
|
I'm running CRS globally, without option for users to disable it, on my own webhosting services which i'm providing for ~10 thousands of domains (all sort of web software - WordPress, Joomla, Drupal, OpenCart, Magento, Prestashop, ...). |
|
How many purely e-commerce sites generating over one million per year do you manage - sites that therefore very likely have dozens of dependencies with third-party services such as logistics providers, mailing companies, or retargeting/marketing platforms? Have you enabled all the rules up to at least PL2? If so, at what score do you start blocking? 30? 60? I’d be curious to run our WAF diagnoser on your infrastructure to better understand your tolerance levels. |
|
I'm running on PL1 which is recommended if you don't want to mess up with FPs too much. Blocking from 5. |
|
If you are who you claim to be, Azurit, then please understand that you’re either among those who have shown total dedication for years to reach that level. We’re trying to be that way as well. But we also see the difficulties our peers are facing - and many of them would really appreciate a bit of help. If you have a SIEM that alerts you on every single block, then you know that high-entropy inputs are everywhere on ecommerce context - and that means thousands of exclusions to create. If we rationalized this based on false positive risk, our peers would have significantly fewer issues, since they could more broadly disable the rule sets with a high false positive risk on high-entropy inputs. I completely understand that it’s frustrating to think we’re adding more work for ourselves with additional exclusions - since we separate these datasets, we’ll also have extra work here, simply because third parties don’t have the time to handle things properly and end up disabling everything hastily at the first false positive. Now, there’s a deeper reflection to be had about whether OWASP should acknowledge that the vast majority of people simply don’t have the time to manage this cleanly - and, as such, try to approach the problem differently. Disabling a fragmented rule on a more limited scope is still less harmful than disabling the entire rule altogether. This approach helps our peers take ownership of the project, and as a result, we help open-source solutions better protect themselves. |
|
We don’t have the statistical data here to quickly isolate the exclusions related only to PL1 - it’s true that this volume is mainly due to the fact that we strive to work at PL2, and partly at PL3. We currently have more than 4,956 iterations of the keyword ctl:ruleRemove on our shared segment, and 5,003 on client-specific segments. Please understand that when most third parties see dozens or even hundreds of exclusions to configure, they completely panic. |
|
@azurit If you have 740 rule exclusions at PL-1 then I can see rule exclusions being in the thousands if your running at PL-2, anecdotally the false positives roughly double at PL-2. I don't see why @touchweb-vincent would exaggerate that claim, I can easily see that happening. @touchweb-vincent @azurit Can you please at least move this discussion about splitting rules into an issue instead of discussing this in PRs that have absolutely no relevance to the subject. |
|
Running on higher PL than 1 comes with FPs and exclusions. That's how it is (official from us, developers). |
|
@touchweb-vincent @azurit The discussions on experience, usage and other rules in different PLs are relevant but not here. Please create a different issue to discuss 🙏 🙏 🙏 . |
|
@touchweb-vincent The restricted extensions apply on rule 920440 only, and the rule applies to REQUEST_BASENAME only, not on any ARGS. When you mentioned about Mr.pac, is this a common finding? E.g. applications are sending |
|
@fzipi .pac (lfi-os-files.data) was in lfi-os-files.data which has a broader scope : |
|
Perfect, now it makes more sense. |
|
I updated this PR to include files for GitLab omnibus. |
There was a problem hiding this comment.
Pull request overview
This PR updates the restricted files and file extensions lists used by the OWASP Core Rule Set (CRS) to detect and block access to sensitive files. The updates are compiled from multiple security sources including Nikto Scan, CrowdSec's security lists, GitHub's gitignore templates, and GitLab omnibus configuration files.
Changes:
- Added numerous sensitive configuration files, credentials files, cache directories, and system files to the restricted lists
- Added new restricted file extensions (.jks, .swap, .swo, .tfstate) to prevent access to Java keystores, swap files, and Terraform state files
- Consolidated some directory paths (e.g., replaced specific var/www subdirectories with the parent var/www)
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| rules/restricted-upload.data | Added sensitive files and directories to upload restrictions; contains duplicate entries and missing cross-file consistency |
| rules/restricted-files.data | Added sensitive files, GitLab configs, Python cache, Windows system files, MySQL configs, log files, and directory roots; has alphabetical ordering issues |
| rules/lfi-os-files.data | Added sensitive files and /etc directories for various services (GitLab, Docker, mail servers, etc.); has alphabetical ordering issue and a typo |
| rules/REQUEST-901-INITIALIZATION.conf | Updated default restricted extensions list to include .jks, .swap, .swo, and .tfstate |
| crs-setup.conf.example | Updated documentation to reflect new restricted extensions |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
fzipi
left a comment
There was a problem hiding this comment.
I would go on and approve this one, as it is compliant on how we have managed things up to now.
There is context in this PR that should go to an issue for further discussion, as I think there is value in trying to figure it out what is next for the project.
Who will take the lead and follow up?
|
@fzipi The discussion that happened in this was completely unrelated to this PR, it was about splitting CRS rules based on their character length to different rules. There originally was this PR which proposed to do such a thing for one rule, but then this PR was hijacked to discuss that topic #4304. I did open another issue though that was generally about improving the local file inclusion rules to improve the detection of true positives: #4300 |

This PR was compiled from multiple sources:
I've removed entries that were not relevant for web applications to keep false positives to a minimum, and commented out ones that cause false positives.
These rules should be overhauled in the medium to long term, but this should be fine for now. I've opened an issue in #4300 to discuss further improvements to these rules.
closes #4248