fix: decode exiftool JSON output as UTF-8 instead of locale encoding #1994

Open
hanhan761 wants to merge 1 commit into microsoft:main from hanhan761:fix/exiftool-utf8-encoding-1972

Summary

ExifTool always outputs JSON in UTF-8 (per RFC 8259), but the exiftool_metadata() function decoded that output using locale.getpreferredencoding(). On systems with a non-UTF-8 locale encoding (e.g. Windows with the Chinese locale, cp936), this caused a UnicodeDecodeError whenever ExifTool returned non-ASCII metadata.
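To illustrate the failure mode, here is a minimal sketch that decodes the same UTF-8 bytes with two encodings. The payload, field name, and the helper `try_decode` are made up for the example and are not taken from the project; cp936 stands in for a non-UTF-8 locale encoding.

```python
import json

# Hypothetical ExifTool-style payload; the field name and value are made up.
original = [{"Title": "中"}]
raw = json.dumps(original, ensure_ascii=False).encode("utf-8")


def try_decode(data: bytes, encoding: str):
    """Return the parsed JSON, or None if decoding or parsing fails."""
    try:
        return json.loads(data.decode(encoding))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None


utf8_result = try_decode(raw, "utf-8")
cp936_result = try_decode(raw, "cp936")

print(utf8_result == original)   # True: the bytes round-trip as UTF-8
print(cp936_result == original)  # False: cp936 either fails or garbles the text
```

The sketch catches the error instead of letting it propagate so that it runs on any machine, whatever the local locale is.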

Changes

  • _exiftool.py: Replace text=True with encoding="utf-8" in the version-check subprocess call, so stdout is decoded as UTF-8.
  • _exiftool.py: Decode JSON output bytes with utf-8 instead of locale.getpreferredencoding().
  • _exiftool.py: Remove unused import locale.
  • test_module_misc.py: Add test_exiftool_metadata_decodes_utf8 - a mocked unit test that verifies non-ASCII characters are correctly decoded from UTF-8.
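The decoding change and the mocked test can be sketched roughly as follows. This is an assumption-laden illustration, not the code in _exiftool.py: the function `exiftool_metadata_sketch`, its parameters, and the sample metadata are hypothetical, and the real function may invoke ExifTool differently.

```python
import json
import subprocess
from unittest import mock


def exiftool_metadata_sketch(path: str, exiftool_path: str = "exiftool") -> dict:
    """Hypothetical stand-in for exiftool_metadata(); not the project's code."""
    # Capture raw bytes and decode them explicitly as UTF-8,
    # so the result does not depend on the locale encoding.
    result = subprocess.run(
        [exiftool_path, "-json", path],
        capture_output=True,
        check=True,
    )
    records = json.loads(result.stdout.decode("utf-8"))
    return records[0] if records else {}


def test_decodes_utf8() -> None:
    # Fake ExifTool output containing non-ASCII metadata (made-up value).
    fake_stdout = json.dumps(
        [{"Title": "中文标题"}], ensure_ascii=False
    ).encode("utf-8")
    fake_result = subprocess.CompletedProcess(
        args=[], returncode=0, stdout=fake_stdout, stderr=b""
    )
    # Patch subprocess.run so no real exiftool binary is needed.
    with mock.patch("subprocess.run", return_value=fake_result):
        metadata = exiftool_metadata_sketch("photo.jpg")
    assert metadata["Title"] == "中文标题"


test_decodes_utf8()
```

Mocking subprocess.run keeps the test independent of whether ExifTool is installed and of the locale of the machine running it.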

Verification

  • python -m pytest tests/test_module_misc.py::test_exiftool_metadata_decodes_utf8 -v - PASSED
  • python -m pytest tests/test_module_vectors.py -v -x - all 109 PASSED

Backward Compatibility

  • No behavior change for UTF-8 locales (the common case)
  • Fixes decoding on non-UTF-8 locales (e.g. Windows cp936)
  • No new dependencies
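As a rough illustration of why pinning the encoding is safe, the sketch below runs a child Python process that always writes UTF-8 bytes and reads its stdout with encoding="utf-8". The child command and the helper `run_child_utf8` are hypothetical stand-ins for the version-check call; on a UTF-8 locale, text=True would be expected to yield the same string.

```python
import subprocess
import sys

# Child process that writes UTF-8 bytes to stdout regardless of its locale,
# standing in for a tool whose output encoding is fixed. The escape keeps
# the command line itself ASCII-only.
CHILD = "import sys; sys.stdout.buffer.write('caf\\u00e9'.encode('utf-8'))"


def run_child_utf8() -> str:
    """Run the child and decode its stdout as UTF-8, ignoring the locale."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        encoding="utf-8",  # explicit, instead of text=True
        check=True,
    )
    return result.stdout


print(run_child_utf8())
```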

Fixes #1972
