GH-133711: Enable UTF-8 mode by default (PEP 686) #133712
Investigating regular test failures first.
test_python_legacy_windows_stdio tests pipe encoding, but it should test console I/O encoding.
assert_python_ok uses PIPE for stdin/stdout/stderr.
(I haven't tested it yet because I don't use Windows daily.)
If the UTF-8 mode is disabled, the interpreter defaults to using
the current locale settings, *unless* the current locale is identified
as a legacy ASCII-based locale (as described for :envvar:`PYTHONCOERCECLOCALE`),
and locale coercion is either disabled or fails.
Is PEP 538: Coercing the legacy C locale to a UTF-8 based locale still relevant if UTF-8 mode is enabled by default? It may make disabling the UTF-8 mode more complicated. It's just an open question, I don't have the answer.
When UTF-8 mode is disabled:
- If the locale is not C or POSIX, the locale encoding is used.
- If the locale is C or POSIX and PYTHONCOERCECLOCALE is not set, the locale is changed to C.UTF-8.
  - Although UTF-8 mode is disabled, the locale encoding is UTF-8.
- If the locale is C or POSIX and PYTHONCOERCECLOCALE is set to 0 (coercion disabled), the locale encoding will be ASCII.
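The rules in this comment can be sketched as a small decision function. This is only a model of the list above, not CPython's actual startup logic: the name `effective_encoding` and its parameters are hypothetical, and it assumes that locale coercion succeeds whenever it is attempted.

```python
def effective_encoding(utf8_mode: bool, locale_name: str,
                       locale_encoding: str, coercion_disabled: bool) -> str:
    """Model the default text encoding described in the comment above.

    Hypothetical helper: it mirrors the reviewer's bullet list and assumes
    that coercion to a UTF-8 based locale succeeds whenever it is attempted.
    """
    if utf8_mode:
        # UTF-8 mode ignores the locale encoding entirely.
        return "utf-8"
    if locale_name not in ("C", "POSIX"):
        # Any other locale: its own encoding is used.
        return locale_encoding
    if coercion_disabled:
        # Legacy C/POSIX locale left untouched (PYTHONCOERCECLOCALE=0).
        return "ascii"
    # Legacy locale coerced to a UTF-8 based locale such as C.UTF-8.
    return "utf-8"


if __name__ == "__main__":
    for args in [
        (False, "ja_JP.eucJP", "euc_jp", False),
        (False, "C", "ascii", False),
        (False, "POSIX", "ascii", True),
    ]:
        print(args, "->", effective_encoding(*args))
```

The case where coercion is attempted but no target locale is available is deliberately left out of this sketch.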
Doc/whatsnew/3.15.rst
Outdated
* Python UTF-8 mode is now enabled by default.
  It may be disabled by setting :envvar:`PYTHONUTF8=0 <PYTHONUTF8>` as
  an environment variable or by using the :option:`-X utf8=0 <-X>` flag.
  See :pep:`686` for further details.
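A minimal sketch of how the two opt-out switches named in this entry could be checked from a script. The helper name `child_utf8_mode` is hypothetical; it starts a child interpreter and reads back `sys.flags.utf8_mode`, assuming `sys.executable` points at a usable Python.

```python
import os
import subprocess
import sys


def child_utf8_mode(extra_args=(), extra_env=None):
    """Start a child interpreter and return its sys.flags.utf8_mode value."""
    env = dict(os.environ)
    # Drop any inherited setting so only the options under test apply.
    env.pop("PYTHONUTF8", None)
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable, *extra_args, "-c",
         "import sys; print(sys.flags.utf8_mode)"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    print("-X utf8=0    ->", child_utf8_mode(["-X", "utf8=0"]))
    print("PYTHONUTF8=0 ->", child_utf8_mode(extra_env={"PYTHONUTF8": "0"}))
    print("-X utf8=1    ->", child_utf8_mode(["-X", "utf8=1"]))
```

Calling the helper with no arguments reports whatever default the running interpreter version applies.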
I feel like we can probably put some more explanation in here, such as that it affects TextIOWrapper and hence open(). The current description doesn't sound as scary as it needs to, in my opinion.
Along the lines of: "Python UTF-8 mode is now enabled by default. This means that (files/console/etc.) will now use UTF-8 regardless of system settings, unless specifically overridden in code (typically with an encoding= argument)."
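To make the `encoding=` override concrete, here is a small sketch (the file name and sample text are made up for illustration). It writes and reads a file with an explicit encoding, which behaves the same whether or not UTF-8 mode is active, and then shows that a legacy encoding has to be requested by name.

```python
import tempfile
from pathlib import Path

text = "naïve café ☕"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.txt"  # hypothetical file name

    # Explicit encoding: independent of UTF-8 mode and of the locale.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

    # The bytes on disk are UTF-8.
    raw = path.read_bytes()

    # A legacy encoding must be named explicitly; decoding the same
    # bytes as Latin-1 succeeds but yields different (garbled) text.
    with open(path, encoding="latin-1") as f:
        mojibake = f.read()

print(round_trip == text)
print(mojibake == text)
```

Code that always passes `encoding=` is unaffected by the change of default; only calls that relied on the implicit locale encoding see different behavior.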
To be clear, it's nothing new. But we shouldn't assume that everyone already knows what UTF-8 mode implies. There are many more people out there who haven't ever thought about it than those who are waiting for it to be the default.
Another effect of the UTF-8 Mode is that Python ignores the locale encoding.
I've expanded the What's New entry, please see https://cpython-previews--133712.org.readthedocs.build/en/133712/whatsnew/3.15.html#other-language-changes
@@ -952,6 +952,9 @@ UTF-8 mode
==========

.. versionadded:: 3.7
.. versionchanged:: next

   Python UTF-8 mode is now enabled by default (:pep:`686`).
You can add a link to the UTF-8 Mode section (utf8-mode).
I wasn't sure what was best because the main text links to UTF-8 mode (para 3), and I think there have been recent attempts to avoid over-linking. Happy either way though.
@@ -75,7 +75,30 @@ New features
Other language changes
======================

* Python now uses UTF-8_ as the default encoding, independent of the system's
You might mention the UTF-8 Mode earlier since it has other side effects documented in the UTF-8 Mode section, such as changing sys.stdout error handler and ignoring the locale encoding.
📚 Documentation preview 📚: https://cpython-previews--133712.org.readthedocs.build/