Fix three problems with --ignore-multiline-regex #3832
Merged
larsoner merged 4 commits into codespell-project:main on Nov 24, 2025
Consider the following text:
```
$ cat -n test.txt
1 codespell:ignore-begin
2 Thsi 1
3 codespell:ignore-end
4 thsi 2
```
When checking the file, as expected line 2 is ignored:
```
$ codespell \
--ignore-multiline-regex 'codespell:ignore-begin.*codespell:ignore-end' \
test.txt
test.txt:4: thsi ==> this
$
```
However, if we do the same using stdin, line 2 is not ignored:
```
$ cat test.txt \
| codespell \
--ignore-multiline-regex \
'codespell:ignore-begin.*codespell:ignore-end' \
-
2: Thsi 1
Thsi ==> This
4: thsi 2
thsi ==> this
```
Fix this in the filename == "-" handling in parse_file, by using
FileOpener.get_lines instead of io.IOBase.readlines.
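The reason a line-by-line read cannot honour the option can be illustrated outside codespell. The following is a minimal sketch, not codespell's code: it assumes the pattern is compiled with re.DOTALL so that `.` also matches newlines, and the helpers matches_per_line and matches_whole_text are hypothetical names introduced for illustration. A pattern that spans lines can only match when it is applied to the whole text.

```python
import io
import re

# Assumption: '.' must match newlines for the pattern to span several lines.
PATTERN = re.compile(r"codespell:ignore-begin.*codespell:ignore-end", re.DOTALL)

TEXT = "codespell:ignore-begin\nThsi 1\ncodespell:ignore-end\nthsi 2\n"


def matches_per_line(stream):
    """Hypothetical helper: test the pattern against each line separately."""
    return [bool(PATTERN.search(line)) for line in stream.readlines()]


def matches_whole_text(stream):
    """Hypothetical helper: test the pattern against the full text at once."""
    return [m.group() for m in PATTERN.finditer(stream.read())]


per_line = matches_per_line(io.StringIO(TEXT))
whole = matches_whole_text(io.StringIO(TEXT))

print(per_line)  # no single line contains both markers
print(whole)     # the match covers lines 1 to 3
```

No individual line contains both markers, so the per-line variant never matches, while the whole-text variant finds the block covering lines 1 to 3.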
Consider the following file:
```
$ cat test.txt
Thsi line contains a typo
While this line is correct
$
```
As expected, using --write-changes fixes the typo in the file:
```
$ codespell --write-changes test.txt
FIXED: test.txt
$ cat test.txt
This line contains a typo
While this line is correct
$
```
However, if we use --ignore-multiline-regex as well, we get instead:
```
$ codespell \
--ignore-multiline-regex 'codespell:ignore-begin.*codespell:ignore-end' \
-w \
test.txt
FIXED: test.txt
$ cat test.txt
This line contains a typoWhile this line is correct$
```
Fix this in the self.ignore_multiline_regex != None case in
FileOpener.get_lines, by using str.splitlines instead of str.split.
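The difference between the two string methods can be seen in isolation. This is a small sketch rather than the patched code: it assumes the lines are later written back by plain concatenation and that splitlines is called with keepends=True.

```python
text = "Thsi line contains a typo\nWhile this line is correct\n"

# str.split("\n") drops the separators and leaves a trailing empty string.
split_lines = text.split("\n")

# str.splitlines(keepends=True) keeps each line's terminator.
kept_lines = text.splitlines(keepends=True)

# Assumption: the file is written back by concatenating the lines.
print(repr("".join(split_lines)))  # newlines are lost
print(repr("".join(kept_lines)))   # identical to the original text
```

Concatenating the output of split loses every newline, which matches the run-together output shown above, whereas the splitlines result round-trips to the original text.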
Fixes:
- codespell-project#3642
No functional changes. This makes parse_file a bit more readable, and makes the following patch smaller.
Consider the following file:
```
$ cat test.txt
codespell:ignore-begin
Thsi line contains a typo
codespell:ignore-end
Thsi line also contains a typo
While this line is correct
```
If we use codespell to fix the second typo:
```
$ codespell \
--ignore-multiline-regex 'codespell:ignore-begin.*codespell:ignore-end' \
--write-changes \
test.txt
FIXED: test.txt
```
indeed that typo is fixed, but the text matching the multiline regexp is gone:
```
$ cat test.txt
This line also contains a typo
While this line is correct
$
```
The problem is that FileOpener.get_lines (returning a list of strings)
implements --ignore-multiline-regex by blanking out the text matching the
regexp.
Fix this by changing FileOpener.get_lines to return a list of fragments, each
modeled as a tuple (ignored: bool, line_number: int, lines: list[str]), and
handling this new format elsewhere.
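The fragment model can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: only the tuple shape (ignored, line_number, lines) comes from the description above, while split_into_fragments and fix_typos are hypothetical helpers, re.DOTALL is assumed, and the "fix" is a toy string replacement. The point is that ignored text is carried along unchanged instead of being blanked out, so it survives a write-back.

```python
import re
from typing import List, Tuple

# Shape taken from the description above: (ignored, line_number, lines).
Fragment = Tuple[bool, int, List[str]]


def split_into_fragments(text: str, pattern: "re.Pattern[str]") -> List[Fragment]:
    """Hypothetical helper: cut text into ignored and checked fragments.

    line_number is the 1-based line on which the fragment starts.
    """
    fragments: List[Fragment] = []
    pos = 0
    line_number = 1
    for match in pattern.finditer(text):
        pieces = ((False, text[pos:match.start()]), (True, match.group()))
        for ignored, chunk in pieces:
            if chunk:
                lines = chunk.splitlines(keepends=True)
                fragments.append((ignored, line_number, lines))
                line_number += chunk.count("\n")
        pos = match.end()
    if text[pos:]:
        fragments.append((False, line_number, text[pos:].splitlines(keepends=True)))
    return fragments


def fix_typos(fragments: List[Fragment]) -> str:
    """Toy correction applied only to fragments that are not ignored."""
    out = []
    for ignored, _line_number, lines in fragments:
        for line in lines:
            out.append(line if ignored else line.replace("Thsi", "This"))
    return "".join(out)


PATTERN = re.compile(r"codespell:ignore-begin.*codespell:ignore-end", re.DOTALL)
TEXT = (
    "codespell:ignore-begin\n"
    "Thsi line contains a typo\n"
    "codespell:ignore-end\n"
    "Thsi line also contains a typo\n"
    "While this line is correct\n"
)

fragments = split_into_fragments(TEXT, PATTERN)
fixed = fix_typos(fragments)
print(fixed)
```

In this sketch the ignored block is written back verbatim, typo included, and only the fourth line is corrected.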
Collaborator
Looks good to me, I think the coverage issue predates these patches. @larsoner Could you have a look too?
Member
Thanks @vries!
This series consists of four patches.
Patches 1, 2 and 4 each fix a problem with --ignore-multiline-regex (described in the commit messages), and add a corresponding unit test.
Patch 3 is a refactoring patch, making patch 4 smaller.
Patch 2 fixes issue #3642.
Closes #3642