Skip to content

gh-148798: Fix crash in _interpreters.create() on non-UTF-8 string in config#148799

Open
rajkripal wants to merge 1 commit intopython:mainfrom
rajkripal:fix/interpconfig-surrogate-crash
Open

gh-148798: Fix crash in _interpreters.create() on non-UTF-8 string in config#148799
rajkripal wants to merge 1 commit intopython:mainfrom
rajkripal:fix/interpconfig-surrogate-crash

Conversation

@rajkripal
Copy link
Copy Markdown

_config_dict_copy_str (Python/interpconfig.c) passed the result of
PyUnicode_AsUTF8() straight to strncpy(). When the string can't be
UTF-8 encoded — e.g. it contains an unpaired surrogate — PyUnicode_AsUTF8
returns NULL and sets UnicodeEncodeError, and strncpy then
dereferences NULL, segfaulting the interpreter.

Lone surrogates are reachable from pure Python ('\udc80', chr(0xDC80))
and also show up via surrogateescape when non-UTF-8 filenames / env
vars / argv get forwarded into a config dict.

Fix: check the return of PyUnicode_AsUTF8 and propagate the error,
mirroring the pattern already used at Modules/_interpretersmodule.c:425.

Regression test lives next to the existing gh-126221 case in
CreateTests.test_in_main.

…n config

_config_dict_copy_str passed the result of PyUnicode_AsUTF8 straight to
strncpy. When the string contained an unpaired surrogate, PyUnicode_AsUTF8
returned NULL and set UnicodeEncodeError, and strncpy dereferenced NULL.

Check the return value and propagate the error, mirroring the pattern used
at Modules/_interpretersmodule.c:425.

Add a regression test alongside the existing pythongh-126221 case.
@@ -0,0 +1,5 @@
Fix a crash in :func:`!_interpreters.create` when a config object passes a
string with an unpaired surrogate as a value (for example ``gil``). The
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be confused with gil being the "bad" word. Instead, just mention that the crash was fixed if the value had unpaires surrogates (but do not mention which keys they could be associated with)

@@ -0,0 +1,5 @@
Fix a crash in :func:`!_interpreters.create` when a config object passes a
string with an unpaired surrogate as a value (for example ``gil``). The
internal helper ``_config_dict_copy_str`` now checks the return of
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think it is necessary to mention the internals. Just mentioning interpreter's possible failures should be enough.

# GH-126221: Passing an invalid Unicode character used to cause a SystemError
self.assertRaises(UnicodeEncodeError, _interpreters.create, '\udc80')

# A config object with a surrogate in a string field must raise, not crash.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make it a separate test.

self.assertRaises(UnicodeEncodeError, _interpreters.create, '\udc80')

# A config object with a surrogate in a string field must raise, not crash.
class BadConfig:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need to set all other fields?

Comment thread Python/interpconfig.c
strncpy(buf, PyUnicode_AsUTF8(item), bufsize-1);
const char *utf8 = PyUnicode_AsUTF8(item);
if (utf8 == NULL) {
Py_DECREF(item);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should maybe add a note to the just-raised exception to indicate for which key it failed. But this can be done as a follow-up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants