GH-133711: Enable UTF-8 mode by default (PEP 686) #133712

Open
wants to merge 6 commits into
base: main

AA-Turner
Member

@AA-Turner AA-Turner commented May 8, 2025

@StanFromIreland
Contributor

Can someone please (I don't have permissions to order it) run this buildbot to verify it clears up #133677 for 3.15?

@AA-Turner
Member Author

Investigating regular test failures first.

@methane
Member

methane commented May 9, 2025

test_python_legacy_windows_stdio tests pipe encoding, but it should test console I/O encoding.
cc: @zooba

@methane
Member

methane commented May 9, 2025

assert_python_ok uses PIPE for stdin/stdout/stderr.
spawn_python doesn't use PIPE for stderr. So this test can be rewritten like this.
But this test still requires that it is run in a console.

diff --git a/Lib/test/test_cmd_line.py b/Lib/test/test_cmd_line.py
index 1b40e0d05fe..243069aeb18 100644
--- a/Lib/test/test_cmd_line.py
+++ b/Lib/test/test_cmd_line.py
@@ -972,10 +972,12 @@ def test_python_legacy_windows_fs_encoding(self):

     @unittest.skipUnless(support.MS_WINDOWS, 'Test only applicable on Windows')
     def test_python_legacy_windows_stdio(self):
-        code = "import sys; print(sys.stdin.encoding, sys.stdout.encoding)"
+        # stdin and stdout are PIPE. So we check stderr encoding for Console I/O.
+        code = "import sys; print(sys.stderr.encoding)"
         expected = 'cp'
-        rc, out, err = assert_python_ok('-c', code, PYTHONLEGACYWINDOWSSTDIO='1')
-        self.assertIn(expected.encode(), out)
+        p = spawn_python('-c', code, env={"PYTHONLEGACYWINDOWSSTDIO": "1"})
+        out, rc = _kill_python_and_exit_code(p)
+        self.assertRegex(out.strip(), rb'\Acp\d+\Z')

(I haven't tested it yet because I don't use Windows daily.)

If the UTF-8 mode is disabled, the interpreter defaults to using
the current locale settings, *unless* the current locale is identified
as a legacy ASCII-based locale (as described for :envvar:`PYTHONCOERCECLOCALE`),
and locale coercion is either disabled or fails.

Member

Is PEP 538: Coercing the legacy C locale to a UTF-8 based locale still relevant if UTF-8 mode is enabled by default? It may make disabling the UTF-8 mode more complicated. It's just an open question, I don't have the answer.

Member

When UTF-8 mode is disabled:

  • If locale is not C or POSIX: locale encoding is used.
  • If locale is C or POSIX and PYTHONCOERCECLOCALE is not set, locale is changed to C.UTF-8.
    • Although UTF-8 mode is disabled, locale encoding is UTF-8.
  • If locale is C or POSIX and PYTHONCOERCECLOCALE is set to 0, locale encoding will be ASCII.
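
For anyone who wants to check these cases locally, here is a minimal sketch that runs a child interpreter with UTF-8 mode switched on or off and reports what it sees. The helper name `probe` and the `C_LOCALE` dictionary are made up for illustration. The sketch sets `LANG` and unsets `LC_ALL`, because the documentation for `PYTHONCOERCECLOCALE` says coercion is skipped when `LC_ALL` is set. The encodings reported depend on the platform and installed locales, so no expected output is given.

```python
import json
import os
import subprocess
import sys

# Child program: report the UTF-8 mode flag and the encodings it ended up with.
CHILD = """
import json, os, sys
print(json.dumps({
    "utf8_mode": sys.flags.utf8_mode,
    "filesystem_encoding": sys.getfilesystemencoding(),
    "lc_ctype_env": os.environ.get("LC_CTYPE"),
}))
"""

# Hypothetical helper data: emulate a plain C locale via LANG only.
# A value of None means "remove this variable from the child's environment".
C_LOCALE = {"LC_ALL": None, "LC_CTYPE": None, "LANG": "C"}


def probe(utf8, overrides=None):
    """Run a child interpreter with -X utf8=<utf8> and return its report."""
    env = os.environ.copy()
    env.pop("PYTHONUTF8", None)  # avoid mixing the env var with the -X switch
    for name, value in (overrides or {}).items():
        if value is None:
            env.pop(name, None)
        else:
            env[name] = value
    result = subprocess.run(
        [sys.executable, "-X", f"utf8={utf8}", "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # C locale, coercion allowed (the second bullet above).
    print(probe(0, C_LOCALE))
    # C locale, coercion disabled (the last bullet above).
    print(probe(0, {**C_LOCALE, "PYTHONCOERCECLOCALE": "0"}))
```

Locale coercion (PEP 538) does not apply on Windows, so the two calls are expected to differ only on POSIX platforms.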
* Python UTF-8 mode is now enabled by default.
It may be disabled by setting :envvar:`PYTHONUTF8=0 <PYTHONUTF8>` as
an environment variable or by using the :option:`-X utf8=0 <-X>` flag.
See :pep:`686` for further details.

Member

I feel like we can probably put some more explanation in here, such as that it affects TextIOWrapper and hence open(). The current description doesn't sound as scary as it needs to, in my opinion.

Along the lines of: "Python UTF-8 mode is now enabled by default. This means that (files/console/etc.) will now use UTF-8 regardless of system settings, unless specifically overridden in code (typically with an encoding= argument)."
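
To make the `open()` point concrete, a small sketch: a child interpreter is started with `-X utf8=1` and writes the single character U+00E9 to a file, once without an `encoding=` argument and once with one. The helper `bytes_written` and the two child programs are hypothetical names for this example only.

```python
import os
import subprocess
import sys
import tempfile

# Child program: open() without encoding=, so the default encoding applies.
IMPLICIT = """
import sys
with open(sys.argv[1], "w") as f:
    f.write("\\u00e9")
"""

# Child program: the encoding is pinned in code, so UTF-8 mode has no say.
EXPLICIT = """
import sys
with open(sys.argv[1], "w", encoding="latin-1") as f:
    f.write("\\u00e9")
"""


def bytes_written(child_code):
    """Run child_code under -X utf8=1 and return the raw bytes it wrote."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        subprocess.run(
            [sys.executable, "-X", "utf8=1", "-c", child_code, path],
            check=True,
        )
        with open(path, "rb") as f:
            return f.read()


print(bytes_written(IMPLICIT))  # b'\xc3\xa9' (UTF-8, regardless of locale)
print(bytes_written(EXPLICIT))  # b'\xe9' (Latin-1, as requested in code)
```

Running the implicit variant with `-X utf8=0` instead would produce whatever the locale encoding dictates, which is exactly the system dependence the new default removes.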

Member

To be clear, it's nothing new. But we shouldn't assume that everyone already knows what UTF-8 mode implies. There are many more people out there who haven't ever thought about it than those who are waiting for it to be the default.

Member

Another effect of the UTF-8 Mode is that Python ignores the locale encoding.
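
A rough way to see this, assuming a child interpreter started with `-X utf8=1`: `locale.getpreferredencoding(False)` reports UTF-8, while `locale.getencoding()` (available from Python 3.11) is documented to ignore UTF-8 mode and return the real locale encoding. The helper name `locale_view` is invented for this sketch.

```python
import json
import subprocess
import sys

# Child program: compare the two locale queries inside the child interpreter.
CHILD = """
import json, locale, sys
print(json.dumps({
    "utf8_mode": sys.flags.utf8_mode,
    "getpreferredencoding": locale.getpreferredencoding(False),
    # locale.getencoding() only exists on Python 3.11 and later.
    "getencoding": locale.getencoding() if hasattr(locale, "getencoding") else None,
}))
"""


def locale_view(utf8):
    """Return the child's view of the locale encoding under -X utf8=<utf8>."""
    result = subprocess.run(
        [sys.executable, "-X", f"utf8={utf8}", "-c", CHILD],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    print(locale_view(1))
    print(locale_view(0))
```

With `utf8=1` the `getpreferredencoding` entry is UTF-8 (spelled `UTF-8` or `utf-8` depending on the Python version); the `getencoding` entry depends on the system.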


@FFY00 FFY00 removed their request for review May 10, 2025 02:28
@@ -952,6 +952,9 @@ UTF-8 mode
==========

.. versionadded:: 3.7
.. versionchanged:: next

Python UTF-8 mode is now enabled by default (:pep:`686`).

Member

You can add a link to the UTF-8 Mode section (utf8-mode).

Member Author

@AA-Turner AA-Turner May 10, 2025

I wasn't sure what was best because the main text links to UTF-8 mode (para 3), and I think there have been recent attempts to avoid over-linking. Happy either way though.

@@ -75,7 +75,30 @@ New features
Other language changes
======================


* Python now uses UTF-8_ as the default encoding, independent of the system's

Member

You might mention the UTF-8 Mode earlier since it has other side effects documented in the UTF-8 Mode section, such as changing sys.stdout error handler and ignoring the locale encoding.
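
As an illustration of the standard-stream side effect, a sketch that asks a child interpreter (started with `-X utf8=1`, with its streams redirected to pipes) which encoding and error handler each standard stream uses. The helper name `stdio_settings` is hypothetical, and `PYTHONIOENCODING` is removed from the child's environment because it would override the result. The behaviour shown is the one described in the existing UTF-8 Mode documentation; it is an assumption that it stays the same once this PR lands.

```python
import json
import os
import subprocess
import sys

# Child program: report encoding and error handler of each standard stream.
CHILD = """
import json, sys
print(json.dumps({
    name: {"encoding": getattr(sys, name).encoding,
           "errors": getattr(sys, name).errors}
    for name in ("stdin", "stdout", "stderr")
}))
"""


def stdio_settings(utf8):
    """Return the child's stdio encodings/error handlers under -X utf8=<utf8>."""
    env = os.environ.copy()
    env.pop("PYTHONIOENCODING", None)  # would otherwise override the defaults
    result = subprocess.run(
        [sys.executable, "-X", f"utf8={utf8}", "-c", CHILD],
        env=env, stdin=subprocess.DEVNULL,
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    for name, settings in stdio_settings(1).items():
        print(name, settings)
```

In UTF-8 mode the documented handlers are `surrogateescape` for `sys.stdin` and `sys.stdout` and `backslashreplace` for `sys.stderr`.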
