Documentation
The claim at cpython/Doc/library/re.rst, lines 253 to 255 (in d0c6ba9):

    * Special characters lose their special meaning inside sets. For example,
      ``[(+*)]`` will match any of the literal characters ``'('``, ``'+'``,
      ``'*'``, or ``')'``.

seems wrong, at least for the backslash (\).
Consider the following example:
>>> import re
>>> bool(re.search(string=b"a\\b", pattern=b"[\\\n\r]"))
False
My expectation would be that, after backslash-unescaping the b"…" string, pattern is assigned the sequence:
a literal \, the line-feed "character", and the carriage-return "character".
If it were true that "Special characters lose their special meaning inside sets.", then the resolved \ in the unescaped pattern should match the one in my test string b"a\\b"; however, it does not.
I guess what Python actually "sees" is:
a backslash-escaped line-feed "character", followed by the carriage-return "character",
which probably effectively yields:
the line-feed "character" and the carriage-return "character".
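A minimal sketch to check this reading (the variable names are illustrative): it prints the bytes the pattern literal actually contains after Python's own unescaping, then shows that the set matches a line feed but not the backslash. For contrast, it uses a raw bytes literal, which hands "\\" through to the regex engine so that the engine sees an escaped, literal backslash.

```python
import re

pattern = b"[\\\n\r]"          # the pattern from the example above
subject = b"a\\b"              # three bytes: a, backslash, b

# After Python unescapes the literal, the pattern holds 5 bytes:
# '[', a single backslash, LF, CR, ']'.
print(list(pattern))           # [91, 92, 10, 13, 93]

# Inside the set, the regex engine treats that single backslash as an
# escape for the following LF, so the set is just {LF, CR}.
print(bool(re.search(pattern, subject)))     # False
print(bool(re.search(pattern, b"a\nb")))     # True

# A raw bytes literal keeps "\\" for the regex engine, which reads it
# as an escaped (literal) backslash inside the set.
raw_pattern = rb"[\\\n\r]"
print(bool(re.search(raw_pattern, subject)))  # True
```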
Now, one could argue that \ is not considered a special character in terms of the regular expression syntax, but it is, if only because of cpython/Doc/library/re.rst, lines 504 to 507 (in d0c6ba9):

    The special sequences consist of ``'\'`` and a character from the list below.
    If the ordinary character is not an ASCII digit or an ASCII letter, then the
    resulting RE will match the second character. For example, ``\$`` matches the
    character ``'$'``.

and the text that follows it.
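A short sketch of the quoted rule, checked both outside and inside a set: an escaped non-alphanumeric character such as "\$" matches just that character, and inside brackets the backslash still acts as an escape rather than becoming a set member of its own. This reflects current CPython behavior, not a guarantee stated by the quoted passage.

```python
import re

# Outside a set: "\$" matches a literal dollar sign, not end-of-string.
print(bool(re.fullmatch(r"\$", "$")))      # True

# Inside a set the backslash still acts as an escape: "[\$]" matches "$"
# only; it does not also match a literal backslash.
print(bool(re.fullmatch(r"[\$]", "$")))    # True
print(bool(re.fullmatch(r"[\$]", "\\")))   # False

# "$" is not special inside a set anyway, so "[$]" behaves the same.
print(bool(re.fullmatch(r"[$]", "$")))     # True
```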
Also, even the section that explains […] mentions its escaping functionality (cpython/Doc/library/re.rst, lines 249 to 250 in d0c6ba9):

    ``[0-9A-Fa-f]`` will match any hexadecimal digit. If ``-`` is escaped (e.g.
    ``[a\-z]``) or if it's placed as the first or last character …

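The hyphen behavior described in that excerpt can be sketched as follows (the variable name `escaped` is illustrative). It contrasts a range with an escaped hyphen and a trailing hyphen, and shows that the backslash in "[a\-z]" only escapes the "-" and is not itself a member of the set.

```python
import re

# "[a-z]" is a range: it matches "b" but not "-".
print(bool(re.fullmatch(r"[a-z]", "b")))    # True
print(bool(re.fullmatch(r"[a-z]", "-")))    # False

# "[a\-z]" escapes the hyphen, so the set is just {"a", "-", "z"};
# the backslash itself is not a member.
escaped = [ch for ch in "ab-z\\" if re.fullmatch(r"[a\-z]", ch)]
print(escaped)                              # ['a', '-', 'z']

# Placing "-" last also makes it literal.
print(bool(re.fullmatch(r"[az-]", "-")))    # True
print(bool(re.fullmatch(r"[az-]", "b")))    # False
```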
I think that the original passage (cpython/Doc/library/re.rst, lines 253 to 255 in d0c6ba9):

    * Special characters lose their special meaning inside sets. For example,
      ``[(+*)]`` will match any of the literal characters ``'('``, ``'+'``,
      ``'*'``, or ``')'``.

should be improved to document that \ is exempt from this, and to clarify:
- whether the escape only applies to characters that are actually special within an RE bracket expression. For example, [0\-9] is 0, - and 9, because the - would have been special in that position. But what about [\-9]? Here, the - would not have been special, so is the result \, - and 9, or just - and 9?
- or whether it simply applies to any character following the \: ones that are special outside an RE bracket expression, like \$, \D, \w or \number, and/or ones that are never special, like \ü.
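The questions above can be probed empirically. The following is a small sketch (the helper `members` and the `probe` list are illustrative names, not part of `re`) that reports which single characters each bracket expression matches. The commented results reflect what recent CPython 3 releases do, which is implementation behavior rather than something the quoted documentation promises.

```python
import re

def members(pattern, candidates):
    """Return the candidates that the one-character pattern fully matches."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

probe = ["0", "5", "9", "-", "\\", "a", "$", "ü"]

# Ordinary metacharacters are literal inside a set.
print(members(r"[(+*)]", ["(", "+", "*", ")", "a"]))   # ['(', '+', '*', ')']

# "\-" yields a plain "-" whether or not "-" would have been special there;
# the backslash itself is never a member of the set.
print(members(r"[0\-9]", probe))    # ['0', '9', '-']
print(members(r"[\-9]", probe))     # ['9', '-']

# An escaped non-alphanumeric character is that character: "\$" -> "$".
print(members(r"[\$]", probe))      # ['$']

# Class escapes such as \D keep their meaning inside a set.
print(members(r"[\D]", probe))      # ['-', '\\', 'a', '$', 'ü']

# A non-ASCII letter after "\" is taken literally.
print(members(r"[\ü]", probe))      # ['ü']

# A numeric escape inside a set is read as octal (chr(1)),
# not as a group reference.
print(members(r"[\1]", ["\x01", "1"]))   # ['\x01']

# An unknown escape of an ASCII letter is rejected.
try:
    re.compile(r"[\q]")
except re.error as exc:
    print("error:", exc)
```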
Thanks,
Chris.
Linked PRs