Skip to content

bpo-41894: Fix UTF-8 errors loading native module#22466

Merged
methane merged 2 commits into
python:masterfrom
kadler:unicode-dynload-error
Oct 15, 2020
Merged

bpo-41894: Fix UTF-8 errors loading native module#22466
methane merged 2 commits into
python:masterfrom
kadler:unicode-dynload-error

Conversation

@kadler

@kadler kadler commented Sep 30, 2020

Copy link
Copy Markdown
Contributor

When running in a non-UTF-8 locale, if an error occurs while importing a
native Python module (say because a dependent share library is missing),
the error message string returned may contain non-ASCII code points
causing a UnicodeDecodeError.

While this is most likely to occur on AIX, which still uses non-UTF-8
locales by default, it could occur on any Unix-like system.

https://bugs.python.org/issue41894

Comment thread Python/dynload_shlib.c Outdated
@taleinat

taleinat commented Oct 2, 2020

Copy link
Copy Markdown
Contributor

@aeros, why the "skip news" label? This is clearly a bug fix and those usually entail NEWS entries.

@taleinat taleinat added the type-bug An unexpected behavior, bug, or error label Oct 2, 2020
@kadler kadler force-pushed the unicode-dynload-error branch from a061b90 to 159f723 Compare October 2, 2020 16:48
@kadler

kadler commented Oct 2, 2020

Copy link
Copy Markdown
Contributor Author

I added a NEWS entry.

Comment thread Python/dynload_aix.c Outdated
Comment thread Python/dynload_aix.c Outdated
Comment thread Python/dynload_aix.c Outdated
@kadler kadler force-pushed the unicode-dynload-error branch from 398b987 to 46b331b Compare October 7, 2020 22:11

@serhiy-storchaka serhiy-storchaka left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread Misc/NEWS.d/next/Core and Builtins/2020-10-02-11-35-33.bpo-41894.ffmtOt.rst Outdated
Comment thread Python/dynload_aix.c Outdated
kadler and others added 2 commits October 13, 2020 18:36
When running in a non-UTF-8 locale, if an error occurs while importing a
native Python module (say because a dependent share library is missing),
the error message string returned may contain non-ASCII code points
causing a UnicodeDecodeError.

PyUnicode_DecodeFSDefault is used for buffers which may contain
filesystem  paths. For consistency with os.strerror(),
PyUnicode_DecodeLocale is used for buffers which contain system error
messages. While the shortname parameter is always encoded in ASCII
according to PEP 489, it is left decoded using PyUnicode_FromString to
minimize the changes and since it should not affect the decoding (albeit
_potentially_ slower).

In dynload_hpux, since the error buffer contains a message generated
from a static ASCII string and the module filesystem path,
PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as
is used elsewhere.
For both dynload_aix and dynload_hpux, properly handle the possibility
that decoding strings may return NULL and when such an error happens,
properly decrement any previously decoded strings and return early.

In addition, in dynload_aix, ensure that we pass the decoded string
*object* pathname_ob to PyErr_SetImportError instead of the original
pathname buffer.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@kadler kadler force-pushed the unicode-dynload-error branch from fdb8782 to 9230f48 Compare October 13, 2020 23:43
@kadler

kadler commented Oct 13, 2020

Copy link
Copy Markdown
Contributor Author

Ok, I believe all suggestions have been addressed. Wherever a filesystem path has been used, I have switched it to use PyUnicode_DecodeFSDefault and for any system error messages I have used PyUnicode_DecodeLocale. While PEP 489 dictates that the function name will always be ASCII, I have not switched to using PyUnicode_DecodeASCII since it does not solve any problem (other than being potentially less efficient), thus reducing the size of the changes.

@serhiy-storchaka serhiy-storchaka left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@methane methane merged commit 2d2af32 into python:master Oct 15, 2020
@miss-islington

Copy link
Copy Markdown
Contributor

Thanks @kadler for the PR, and @methane for merging it 🌮🎉.. I'm working now to backport this PR to: 3.8, 3.9.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Oct 15, 2020
…GH-22466)

When running in a non-UTF-8 locale, if an error occurs while importing a
native Python module (say because a dependent share library is missing),
the error message string returned may contain non-ASCII code points
causing a UnicodeDecodeError.

PyUnicode_DecodeFSDefault is used for buffers which may contain
filesystem  paths. For consistency with os.strerror(),
PyUnicode_DecodeLocale is used for buffers which contain system error
messages. While the shortname parameter is always encoded in ASCII
according to PEP 489, it is left decoded using PyUnicode_FromString to
minimize the changes and since it should not affect the decoding (albeit
_potentially_ slower).

In dynload_hpux, since the error buffer contains a message generated
from a static ASCII string and the module filesystem path,
PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as
is used elsewhere.

* bpo-41894: Fix bugs in dynload error msg handling

For both dynload_aix and dynload_hpux, properly handle the possibility
that decoding strings may return NULL and when such an error happens,
properly decrement any previously decoded strings and return early.

In addition, in dynload_aix, ensure that we pass the decoded string
*object* pathname_ob to PyErr_SetImportError instead of the original
pathname buffer.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
(cherry picked from commit 2d2af32)

Co-authored-by: Kevin Adler <kadler@us.ibm.com>
@bedevere-bot

Copy link
Copy Markdown

GH-22704 is a backport of this pull request to the 3.9 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Oct 15, 2020
…GH-22466)

When running in a non-UTF-8 locale, if an error occurs while importing a
native Python module (say because a dependent share library is missing),
the error message string returned may contain non-ASCII code points
causing a UnicodeDecodeError.

PyUnicode_DecodeFSDefault is used for buffers which may contain
filesystem  paths. For consistency with os.strerror(),
PyUnicode_DecodeLocale is used for buffers which contain system error
messages. While the shortname parameter is always encoded in ASCII
according to PEP 489, it is left decoded using PyUnicode_FromString to
minimize the changes and since it should not affect the decoding (albeit
_potentially_ slower).

In dynload_hpux, since the error buffer contains a message generated
from a static ASCII string and the module filesystem path,
PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as
is used elsewhere.

* bpo-41894: Fix bugs in dynload error msg handling

For both dynload_aix and dynload_hpux, properly handle the possibility
that decoding strings may return NULL and when such an error happens,
properly decrement any previously decoded strings and return early.

In addition, in dynload_aix, ensure that we pass the decoded string
*object* pathname_ob to PyErr_SetImportError instead of the original
pathname buffer.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
(cherry picked from commit 2d2af32)

Co-authored-by: Kevin Adler <kadler@us.ibm.com>
@bedevere-bot

Copy link
Copy Markdown

GH-22705 is a backport of this pull request to the 3.8 branch.

miss-islington added a commit that referenced this pull request Oct 15, 2020
When running in a non-UTF-8 locale, if an error occurs while importing a
native Python module (say because a dependent share library is missing),
the error message string returned may contain non-ASCII code points
causing a UnicodeDecodeError.

PyUnicode_DecodeFSDefault is used for buffers which may contain
filesystem  paths. For consistency with os.strerror(),
PyUnicode_DecodeLocale is used for buffers which contain system error
messages. While the shortname parameter is always encoded in ASCII
according to PEP 489, it is left decoded using PyUnicode_FromString to
minimize the changes and since it should not affect the decoding (albeit
_potentially_ slower).

In dynload_hpux, since the error buffer contains a message generated
from a static ASCII string and the module filesystem path,
PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as
is used elsewhere.

* bpo-41894: Fix bugs in dynload error msg handling

For both dynload_aix and dynload_hpux, properly handle the possibility
that decoding strings may return NULL and when such an error happens,
properly decrement any previously decoded strings and return early.

In addition, in dynload_aix, ensure that we pass the decoded string
*object* pathname_ob to PyErr_SetImportError instead of the original
pathname buffer.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
(cherry picked from commit 2d2af32)

Co-authored-by: Kevin Adler <kadler@us.ibm.com>
miss-islington added a commit that referenced this pull request Oct 15, 2020
When running in a non-UTF-8 locale, if an error occurs while importing a
native Python module (say because a dependent share library is missing),
the error message string returned may contain non-ASCII code points
causing a UnicodeDecodeError.

PyUnicode_DecodeFSDefault is used for buffers which may contain
filesystem  paths. For consistency with os.strerror(),
PyUnicode_DecodeLocale is used for buffers which contain system error
messages. While the shortname parameter is always encoded in ASCII
according to PEP 489, it is left decoded using PyUnicode_FromString to
minimize the changes and since it should not affect the decoding (albeit
_potentially_ slower).

In dynload_hpux, since the error buffer contains a message generated
from a static ASCII string and the module filesystem path,
PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as
is used elsewhere.

* bpo-41894: Fix bugs in dynload error msg handling

For both dynload_aix and dynload_hpux, properly handle the possibility
that decoding strings may return NULL and when such an error happens,
properly decrement any previously decoded strings and return early.

In addition, in dynload_aix, ensure that we pass the decoded string
*object* pathname_ob to PyErr_SetImportError instead of the original
pathname buffer.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
(cherry picked from commit 2d2af32)

Co-authored-by: Kevin Adler <kadler@us.ibm.com>
@kadler kadler deleted the unicode-dynload-error branch October 16, 2020 15:25
xzy3 pushed a commit to xzy3/cpython that referenced this pull request Oct 18, 2020
…GH-22466)

When running in a non-UTF-8 locale, if an error occurs while importing a
native Python module (say because a dependent share library is missing),
the error message string returned may contain non-ASCII code points
causing a UnicodeDecodeError.

PyUnicode_DecodeFSDefault is used for buffers which may contain
filesystem  paths. For consistency with os.strerror(),
PyUnicode_DecodeLocale is used for buffers which contain system error
messages. While the shortname parameter is always encoded in ASCII
according to PEP 489, it is left decoded using PyUnicode_FromString to
minimize the changes and since it should not affect the decoding (albeit
_potentially_ slower).

In dynload_hpux, since the error buffer contains a message generated
from a static ASCII string and the module filesystem path,
PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as
is used elsewhere.

* bpo-41894: Fix bugs in dynload error msg handling

For both dynload_aix and dynload_hpux, properly handle the possibility
that decoding strings may return NULL and when such an error happens,
properly decrement any previously decoded strings and return early.

In addition, in dynload_aix, ensure that we pass the decoded string
*object* pathname_ob to PyErr_SetImportError instead of the original
pathname buffer.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
adorilson pushed a commit to adorilson/cpython that referenced this pull request Mar 13, 2021
…GH-22466)

When running in a non-UTF-8 locale, if an error occurs while importing a
native Python module (say because a dependent share library is missing),
the error message string returned may contain non-ASCII code points
causing a UnicodeDecodeError.

PyUnicode_DecodeFSDefault is used for buffers which may contain
filesystem  paths. For consistency with os.strerror(),
PyUnicode_DecodeLocale is used for buffers which contain system error
messages. While the shortname parameter is always encoded in ASCII
according to PEP 489, it is left decoded using PyUnicode_FromString to
minimize the changes and since it should not affect the decoding (albeit
_potentially_ slower).

In dynload_hpux, since the error buffer contains a message generated
from a static ASCII string and the module filesystem path,
PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as
is used elsewhere.

* bpo-41894: Fix bugs in dynload error msg handling

For both dynload_aix and dynload_hpux, properly handle the possibility
that decoding strings may return NULL and when such an error happens,
properly decrement any previously decoded strings and return early.

In addition, in dynload_aix, ensure that we pass the decoded string
*object* pathname_ob to PyErr_SetImportError instead of the original
pathname buffer.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type-bug An unexpected behavior, bug, or error

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants