gh-133767: Fix use-after-free in the unicode-escape decoder with an error handler #129648

serhiy-storchaka · 2025-02-04T14:12:43Z

If an error handler is used, a new bytes object is created and set as the object attribute of UnicodeDecodeError, and that bytes object then replaces the original data. A pointer into the decoded data becomes invalid once that temporary bytes object is destroyed. So we need another way to return the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().

_PyBytes_DecodeEscape() does not have this issue, because it does not use the error handler registry, but it should be changed too for compatibility with _PyUnicode_DecodeUnicodeEscapeInternal().
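The lifetime hazard described above can be sketched in plain C, independent of CPython. Everything in this example is hypothetical (the function name, the set of recognized escapes, and the use of `malloc` in place of a bytes object); it only illustrates why a pointer into a temporary buffer must not outlive that buffer, and why copying the offending byte out by value is safe.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not CPython code. Finds the first backslash that is
 * followed by a byte outside a small set of recognized escapes.
 *
 * The scan runs over a temporary heap copy of the input, standing in for
 * the bytes object that replaces the original data once an error handler
 * has run. The copy is freed before returning, so a pointer into it must
 * not escape; the offending byte is copied out by value instead. */
static bool
find_invalid_escape(const char *data, size_t len, unsigned char *invalid_out)
{
    char *tmp = malloc(len + 1);   /* +1 so a zero-length input is fine */
    if (tmp == NULL) {
        return false;              /* sketch only: treat OOM as "not found" */
    }
    memcpy(tmp, data, len);

    bool found = false;
    for (size_t i = 0; i + 1 < len; i++) {
        if (tmp[i] != '\\') {
            continue;
        }
        /* memchr over 4 bytes, not strchr: strchr would also match NUL. */
        if (memchr("\\nrt", tmp[i + 1], 4) == NULL) {
            *invalid_out = (unsigned char)tmp[i + 1];  /* copy the value */
            found = true;
            break;
        }
        i++;  /* skip the escaped character */
    }

    free(tmp);  /* from here on, any pointer into tmp would dangle */
    return found;
}
```

Had the function returned `&tmp[i + 1]` instead of filling `invalid_out`, the caller would be reading freed memory, which is the shape of the bug this PR addresses.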

Issue: Use-after-free in unicode_escape decoder with error handler #133767


gpshead · 2025-02-04T21:55:53Z

Nice! This is similar to, but clearly far more polished than, what I quickly whipped up while trying to understand the problem and linked to on the PSRT mailing list, so I won't bother posting my own draft PR.

I don't have a good feel for whether we need to retain the older internal-use-only C APIs. Making this change via new functions with a suffix, as you seem to be proposing, and leaving the old ones in place (though now unused by our own internals) in case something else references them makes sense to me.

serhiy-storchaka · 2025-02-04T22:47:45Z

I experimented with several different solutions. One of them was similar to yours, except that I copied all three bytes. It was also necessary to distinguish "no invalid escape" from "escaped null byte". In the end, the currently proposed solution is the simplest.
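The "no invalid escape" versus "escaped null byte" distinction mentioned here comes down to the result type. As a hypothetical sketch (the names and the set of recognized escapes are illustrative, not taken from the PR): reporting the byte as an `int` leaves room for a `-1` sentinel outside the 0..255 range, whereas a `char` result of 0 would be ambiguous.

```c
#include <stddef.h>
#include <string.h>

#define NO_INVALID_ESCAPE (-1)  /* hypothetical sentinel, outside 0..255 */

/* Returns the byte following the first unrecognized backslash escape as a
 * value in 0..255, or NO_INVALID_ESCAPE if every escape is recognized.
 * A backslash followed by a NUL byte therefore yields 0, not the sentinel. */
static int
first_invalid_escape_char(const char *data, size_t len)
{
    for (size_t i = 0; i + 1 < len; i++) {
        if (data[i] != '\\') {
            continue;
        }
        unsigned char next = (unsigned char)data[i + 1];
        if (memchr("\\nrt", next, 4) == NULL) {
            return next;
        }
        i++;  /* skip the escaped character */
    }
    return NO_INVALID_ESCAPE;
}
```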

This PR does not keep the old C API; I do not think it is needed. The functions are renamed because an error at link time is preferable to undefined behavior at run time.


serhiy-storchaka · 2025-05-09T17:29:47Z

After adding a NEWS entry it is ready for review.

The code is now more complex: the decoding functions return both the invalid character and its position. This is because the new code in the Python parser needs the position. The position can be returned if no decoding errors were handled by the error handler, and the Python parser does not use the error handler.
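One way to picture the combined result described here, as a hypothetical sketch rather than the PR's actual signatures: the character is always reported by value, while the position is reported only when no error handler replaced the input. The struct, its field names, and the `buffer_replaced` flag are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type; the field names are illustrative only. */
typedef struct {
    int ch;         /* first invalid escaped byte (0..255), or -1 if none */
    ptrdiff_t pos;  /* offset of that byte in the caller's input, or -1 if
                       unknown because the input buffer was replaced */
} invalid_escape;

/* `buffer_replaced` stands in for "an error handler ran and the decoder
 * switched to a different buffer". Offsets into the caller's original
 * input are then no longer meaningful, so only the character is reported. */
static invalid_escape
scan_escapes(const char *data, size_t len, bool buffer_replaced)
{
    invalid_escape result = { -1, -1 };
    for (size_t i = 0; i + 1 < len; i++) {
        if (data[i] != '\\') {
            continue;
        }
        if (memchr("\\nrt", data[i + 1], 4) == NULL) {
            result.ch = (unsigned char)data[i + 1];
            if (!buffer_replaced) {
                result.pos = (ptrdiff_t)(i + 1);
            }
            break;
        }
        i++;  /* skip the escaped character */
    }
    return result;
}
```

A caller that never installs an error handler, as the comment says of the Python parser, would always pass `false` here and could rely on `pos` whenever `ch` is not `-1`.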

bedevere-bot · 2025-05-10T05:32:35Z

🤖 New build scheduled with the buildbot fleet by @gpshead for commit 7194b4d 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F129648%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

serhiy-storchaka requested review from Yhg1s, ericvsmith and sethmlarson February 4, 2025 14:12

gpshead reviewed Feb 4, 2025

Parser/string_parser.c (outdated, resolved)

serhiy-storchaka added 2 commits May 9, 2025 19:56

Merge branch 'main' into unicode-escape-decode-errors

be5d80c

Add a NEWS entry.

7194b4d

serhiy-storchaka changed the title ~~Fix use-after-free in the unicode-escape decoder with error handler~~ May 9, 2025

bedevere-app bot mentioned this pull request May 9, 2025

Use-after-free in unicode_escape decoder with error handler #133767

Open

serhiy-storchaka marked this pull request as ready for review May 9, 2025 17:24

serhiy-storchaka requested review from pablogsal and lysnikolaou as code owners May 9, 2025 17:24

bedevere-app bot added the awaiting core review label May 9, 2025

gpshead approved these changes May 10, 2025


bedevere-app bot added awaiting merge and removed awaiting core review labels May 10, 2025

gpshead added the needs backport to 3.13, needs backport to 3.14, and 🔨 test-with-buildbots labels May 10, 2025

bedevere-bot removed the 🔨 test-with-buildbots label May 10, 2025

serhiy-storchaka added the type-security, needs backport to 3.9, needs backport to 3.10, needs backport to 3.11, and needs backport to 3.12 labels May 10, 2025
