[3.12] gh-82045: Correct and deduplicate "isprintable" docs; add test. #130125

Merged: encukou merged 2 commits into python:3.12 from StanFromIreland:backport-3402e13-3.12 on Feb 17, 2025
@StanFromIreland StanFromIreland commented Feb 14, 2025

We had the definition of what makes a character "printable" documented in three places, giving two different definitions. The definition in the comment on _PyUnicode_IsPrintable was inverted; correct that.

With that correction, the two definitions turn out to be equivalent -- but to confirm that, you have to go look up, or happen to know, that those are the only five "Other" categories and only three "Separator" categories in the Unicode character database. That makes it hard for the reader to tell whether they really are the same, or if there's some subtle difference in the intended semantics.
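The category counts mentioned above can be checked directly. The following is a minimal sketch, not part of the PR: it assumes the standard `unicodedata` module reflects the same Unicode character database that `str.isprintable` uses, and the helper name `general_categories` is made up for illustration.

```python
import sys
import unicodedata


def general_categories(prefix):
    """Collect every two-letter general category that starts with `prefix`.

    Hypothetical helper for illustration; scans all code points.
    """
    found = set()
    for codepoint in range(sys.maxunicode + 1):
        category = unicodedata.category(chr(codepoint))
        if category.startswith(prefix):
            found.add(category)
    return found


# "Other" categories start with C, "Separator" categories start with Z.
other = general_categories("C")
separator = general_categories("Z")
print(sorted(other))
print(sorted(separator))
```

If the sets come out as five "Other" categories (Cc, Cf, Cn, Co, Cs) and three "Separator" categories (Zl, Zp, Zs), then listing those eight categories and saying "Other or Separator" describe the same set of characters.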

Fix that by cutting the C API docs' and the C comment's copies of the subtle details, in favor of referring to the Python-level docs. That ensures it's explicit that these are all meant to agree, and also lets us concentrate improvements to the wording in one place.

Speaking of which, borrow some ideas from the C comment, along with other tweaks, to hopefully add a bit more clarity to that one newly-centralized copy in the docs.

Also add a thorough test that the implementation agrees with this definition.
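As a rough illustration of what such a check can look like (a sketch only, not the test added by this PR), the snippet below compares `str.isprintable` with the documented definition for every code point. It assumes the definition is "not in an Other or Separator category, except the ASCII space"; the names `is_printable_by_definition` and `find_mismatches` are hypothetical.

```python
import sys
import unicodedata


def is_printable_by_definition(char):
    """Assumed definition: nonprintable means general category C* (Other)
    or Z* (Separator), with the ASCII space treated as printable."""
    return char == " " or unicodedata.category(char)[0] not in ("C", "Z")


def find_mismatches():
    """Return the code points where str.isprintable disagrees with the definition."""
    return [
        codepoint
        for codepoint in range(sys.maxunicode + 1)
        if chr(codepoint).isprintable() != is_printable_by_definition(chr(codepoint))
    ]


mismatches = find_mismatches()
print(f"{len(mismatches)} mismatching code points")
```

An empty `mismatches` list would indicate that the implementation and the written definition agree for the Unicode database version bundled with the running interpreter.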

Author: Stan Ulbrych stanulbrych@gmail.com

Co-authored-by: Greg Price gnprice@gmail.com
(cherry picked from commit 3402e13)


📚 Documentation preview 📚: https://cpython-previews--130125.org.readthedocs.build/

…pythonGH-130118)

@encukou encukou merged commit 8a598fb into python:3.12 Feb 17, 2025