Skip to content

gh-121284: Fix email address header folding with parsed encoded-word #122754

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 18, 2025

Conversation

medmunds
Copy link
Contributor

@medmunds medmunds commented Aug 6, 2024

Fixes #121284.

[This fixes a #security-issue. PSRT instructed me to handle the fix publicly.]

Email generators using email.policy.default may convert an RFC 2047 encoded-word to unencoded form during header refolding. In a structured header, this could allow 'specials' chars outside a quoted-string, leading to invalid address headers and enabling spoofing. This change ensures a parsed encoded-word that contains specials is kept as an encoded-word while the header is refolded.

The issue is very similar to PR #122753 (and has the same security implications), but this PR involves refolding an encoded-word; the other PR involves refolding a quoted-string. The fixes required are different.

…-word

Email generators using email.policy.default may convert an RFC 2047
encoded-word to unencoded form during header refolding. In a structured
header, this could allow 'specials' chars outside a quoted-string,
leading to invalid address headers and enabling spoofing. This change
ensures a parsed encoded-word that contains specials is kept as an
encoded-word while the header is refolded.
@medmunds
Copy link
Contributor Author

medmunds commented Sep 9, 2024

Requesting a review from @serhiy-storchaka, who recently reviewed a similar security fix around encoded words in email headers (#122233).

And pinging @bitdancer, who probably knows the most about this section of the code.

(Also hoping both of you might be able to review PR #122753, which tries to fix a related security issue first reported 5 years ago.)

@medmunds
Copy link
Contributor Author

(A nicer fix would be to decide separately whether each refolded segment needs rfc2047 encoding, quoted-string handling, or no special treatment. But that would require giving _refold_parse_tree() info about whether it's working in a structured or unstructured header, which seems too involved for a security patch.)

@medmunds
Copy link
Contributor Author

medmunds commented Jan 19, 2025

Btw, although this may seem like it's too obscure to matter much, it's actually pretty easy to stumble into vulnerable code. E.g., calling email.utils.formataddr() (correctly!) with user-supplied content can generate exactly the sort of (valid) encoded-word that gets mishandled by email.policy.default.

Copy link
Member

@encukou encukou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By my reading of the code, this is correct.
I'd appreciate an additional review from an email expert, but, if there are no objections I plat to merge this next week.

@bitdancer
Copy link
Member

Please give me another day to look at this before you merge. I think it's fine, but there are a couple things I want to check.

@bitdancer
Copy link
Member

OK, there is actually a pretty straightforward solution to this problem, and the functionality is better:

diff --git a/Lib/email/_header_value_parser.py b/Lib/email/_header_value_parser.py
index c0e856306d3..9a51b943733 100644
--- a/Lib/email/_header_value_parser.py
+++ b/Lib/email/_header_value_parser.py
@@ -1053,7 +1053,7 @@ def get_fws(value):
     fws = WhiteSpaceTerminal(value[:len(value)-len(newvalue)], 'fws')
     return fws, newvalue
 
-def get_encoded_word(value):
+def get_encoded_word(value, terminal_type='vtext'):
     """ encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
 
     """
@@ -1092,7 +1092,7 @@ def get_encoded_word(value):
             ew.append(token)
             continue
         chars, *remainder = _wsp_splitter(text, 1)
-        vtext = ValueTerminal(chars, 'vtext')
+        vtext = ValueTerminal(chars, terminal_type)
         _validate_xtext(vtext)
         ew.append(vtext)
         text = ''.join(remainder)
@@ -1134,7 +1134,7 @@ def get_unstructured(value):
         valid_ew = True
         if value.startswith('=?'):
             try:
-                token, value = get_encoded_word(value)
+                token, value = get_encoded_word(value, 'utext')
             except _InvalidEwError:
                 valid_ew = False
             except errors.HeaderParseError:
@@ -1163,7 +1163,7 @@ def get_unstructured(value):
         # the parser to go in an infinite loop.
         if valid_ew and rfc2047_matcher.search(tok):
             tok, *remainder = value.partition('=?')
-        vtext = ValueTerminal(tok, 'vtext')
+        vtext = ValueTerminal(tok, 'utext')
         _validate_xtext(vtext)
         unstructured.append(vtext)
         value = ''.join(remainder)
@@ -2813,7 +2813,7 @@ def _refold_parse_tree(parse_tree, *, policy):
             continue
         tstr = str(part)
         if not want_encoding:
-            if part.token_type == 'ptext':
+            if part.token_type in ('ptext', 'vtext'):
                 # Encode if tstr contains special characters.
                 want_encoding = not SPECIALSNL.isdisjoint(tstr)
             else:

At this point I no longer remember what 'vtext' was supposed to stand for, but 'utext' is obviously 'unstructured text token' ;) The documentation of these bits of the code could use some improvement (not to mention the code itself!), but this fixes the problem pretty much in the way the original code was intended to work, if we imagine that the failure to check whether or not we were dealing with structured text was a bug as opposed to me forgetting about the distinction ;)

medmunds and others added 2 commits March 5, 2025 10:32
… encoded-word

[Better fix from @bitdancer.]

Co-authored-by: R David Murray <rdmurray@bitdance.com>
@medmunds
Copy link
Contributor Author

medmunds commented Mar 5, 2025

Nice! That feels much better, and allows unencoding encoded-words in refolding when it's safe. I've updated the PR with your change.

Copy link
Member

@bitdancer bitdancer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bitdancer bitdancer added needs backport to 3.9 only security fixes needs backport to 3.10 only security fixes needs backport to 3.11 only security fixes needs backport to 3.12 only security fixes needs backport to 3.13 bugs and security fixes labels Mar 17, 2025
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Mar 18, 2025
…-word (pythonGH-122754)

Email generators using email.policy.default may convert an RFC 2047
encoded-word to unencoded form during header refolding. In a structured
header, this could allow 'specials' chars outside a quoted-string,
leading to invalid address headers and enabling spoofing. This change
ensures a parsed encoded-word that contains specials is kept as an
encoded-word while the header is refolded.

[Better fix from @bitdancer.]

---------
(cherry picked from commit 295b53d)

Co-authored-by: Mike Edmunds <medmunds@gmail.com>
Co-authored-by: R David Murray <rdmurray@bitdance.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
@bedevere-app
Copy link

bedevere-app bot commented Mar 18, 2025

GH-131403 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Mar 18, 2025
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Mar 18, 2025
…-word (pythonGH-122754)

Email generators using email.policy.default may convert an RFC 2047
encoded-word to unencoded form during header refolding. In a structured
header, this could allow 'specials' chars outside a quoted-string,
leading to invalid address headers and enabling spoofing. This change
ensures a parsed encoded-word that contains specials is kept as an
encoded-word while the header is refolded.

[Better fix from @bitdancer.]

---------
(cherry picked from commit 295b53d)

Co-authored-by: Mike Edmunds <medmunds@gmail.com>
Co-authored-by: R David Murray <rdmurray@bitdance.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
@bedevere-app
Copy link

bedevere-app bot commented Mar 18, 2025

GH-131404 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.12 only security fixes label Mar 18, 2025
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Mar 18, 2025
…-word (pythonGH-122754)

Email generators using email.policy.default may convert an RFC 2047
encoded-word to unencoded form during header refolding. In a structured
header, this could allow 'specials' chars outside a quoted-string,
leading to invalid address headers and enabling spoofing. This change
ensures a parsed encoded-word that contains specials is kept as an
encoded-word while the header is refolded.

[Better fix from @bitdancer.]

---------
(cherry picked from commit 295b53d)

Co-authored-by: Mike Edmunds <medmunds@gmail.com>
Co-authored-by: R David Murray <rdmurray@bitdance.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
@miss-islington-app
Copy link

Sorry, @medmunds and @encukou, I could not cleanly backport this to 3.10 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 295b53df2aa18deb625a7da41f7e4babfe6ef34b 3.10
@bedevere-app
Copy link

bedevere-app bot commented Mar 18, 2025

GH-131405 is a backport of this pull request to the 3.11 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.11 only security fixes label Mar 18, 2025
@miss-islington-app
Copy link

Sorry, @medmunds and @encukou, I could not cleanly backport this to 3.9 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 295b53df2aa18deb625a7da41f7e4babfe6ef34b 3.9
encukou added a commit to encukou/cpython that referenced this pull request Mar 18, 2025
…encoded-word (pythonGH-122754)

Email generators using email.policy.default may convert an RFC 2047
encoded-word to unencoded form during header refolding. In a structured
header, this could allow 'specials' chars outside a quoted-string,
leading to invalid address headers and enabling spoofing. This change
ensures a parsed encoded-word that contains specials is kept as an
encoded-word while the header is refolded.

[Better fix from @bitdancer.]

---------
(cherry picked from commit 295b53d)

Co-authored-by: Mike Edmunds <medmunds@gmail.com>
Co-authored-by: R David Murray <rdmurray@bitdance.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
encukou added a commit to encukou/cpython that referenced this pull request Mar 18, 2025
…encoded-word (pythonGH-122754)

Email generators using email.policy.default may convert an RFC 2047
encoded-word to unencoded form during header refolding. In a structured
header, this could allow 'specials' chars outside a quoted-string,
leading to invalid address headers and enabling spoofing. This change
ensures a parsed encoded-word that contains specials is kept as an
encoded-word while the header is refolded.

[Better fix from @bitdancer.]

---------
(cherry picked from commit 295b53d)

Co-authored-by: Mike Edmunds <medmunds@gmail.com>
Co-authored-by: R David Murray <rdmurray@bitdance.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
@bedevere-app
Copy link

bedevere-app bot commented Mar 18, 2025

GH-131411 is a backport of this pull request to the 3.10 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.10 only security fixes label Mar 18, 2025
encukou added a commit to encukou/cpython that referenced this pull request Mar 18, 2025
…ncoded-word (pythonGH-122754)

Email generators using email.policy.default may convert an RFC 2047
encoded-word to unencoded form during header refolding. In a structured
header, this could allow 'specials' chars outside a quoted-string,
leading to invalid address headers and enabling spoofing. This change
ensures a parsed encoded-word that contains specials is kept as an
encoded-word while the header is refolded.

[Better fix from @bitdancer.]

---------
(cherry picked from commit 295b53d)

Co-authored-by: Mike Edmunds <medmunds@gmail.com>
Co-authored-by: R David Murray <rdmurray@bitdance.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
encukou added a commit to encukou/cpython that referenced this pull request Mar 18, 2025
…ncoded-word (pythonGH-122754)

Email generators using email.policy.default may convert an RFC 2047
encoded-word to unencoded form during header refolding. In a structured
header, this could allow 'specials' chars outside a quoted-string,
leading to invalid address headers and enabling spoofing. This change
ensures a parsed encoded-word that contains specials is kept as an
encoded-word while the header is refolded.

[Better fix from @bitdancer.]

---------
(cherry picked from commit 295b53d)

Co-authored-by: Mike Edmunds <medmunds@gmail.com>
Co-authored-by: R David Murray <rdmurray@bitdance.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
encukou added a commit to encukou/cpython that referenced this pull request Mar 18, 2025
…ncoded-word (pythonGH-122754)

Email generators using email.policy.default may convert an RFC 2047
encoded-word to unencoded form during header refolding. In a structured
header, this could allow 'specials' chars outside a quoted-string,
leading to invalid address headers and enabling spoofing. This change
ensures a parsed encoded-word that contains specials is kept as an
encoded-word while the header is refolded.

[Better fix from @bitdancer.]

---------
(cherry picked from commit 295b53d)

Co-authored-by: Mike Edmunds <medmunds@gmail.com>
Co-authored-by: R David Murray <rdmurray@bitdance.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
@bedevere-app
Copy link

bedevere-app bot commented Mar 18, 2025

GH-131412 is a backport of this pull request to the 3.9 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.9 only security fixes label Mar 18, 2025
bitdancer added a commit that referenced this pull request Mar 18, 2025
…d-word (GH-122754) (#131403)

gh-121284: Fix email address header folding with parsed encoded-word (GH-122754)

Email generators using email.policy.default may convert an RFC 2047
encoded-word to unencoded form during header refolding. In a structured
header, this could allow 'specials' chars outside a quoted-string,
leading to invalid address headers and enabling spoofing. This change
ensures a parsed encoded-word that contains specials is kept as an
encoded-word while the header is refolded.

[Better fix from @bitdancer.]

---------
(cherry picked from commit 295b53d)

Co-authored-by: Mike Edmunds <medmunds@gmail.com>
Co-authored-by: R David Murray <rdmurray@bitdance.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
bitdancer added a commit that referenced this pull request Mar 18, 2025
…d-word (GH-122754) (#131404)

gh-121284: Fix email address header folding with parsed encoded-word (GH-122754)

Email generators using email.policy.default may convert an RFC 2047
encoded-word to unencoded form during header refolding. In a structured
header, this could allow 'specials' chars outside a quoted-string,
leading to invalid address headers and enabling spoofing. This change
ensures a parsed encoded-word that contains specials is kept as an
encoded-word while the header is refolded.

[Better fix from @bitdancer.]

---------
(cherry picked from commit 295b53d)

Co-authored-by: Mike Edmunds <medmunds@gmail.com>
Co-authored-by: R David Murray <rdmurray@bitdance.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
colesbury pushed a commit to colesbury/cpython that referenced this pull request Mar 20, 2025
…-word (pythonGH-122754)

Email generators using email.policy.default may convert an RFC 2047
encoded-word to unencoded form during header refolding. In a structured
header, this could allow 'specials' chars outside a quoted-string,
leading to invalid address headers and enabling spoofing. This change
ensures a parsed encoded-word that contains specials is kept as an
encoded-word while the header is refolded.

[Better fix from @bitdancer.]

---------

Co-authored-by: R David Murray <rdmurray@bitdance.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
ambv pushed a commit that referenced this pull request Apr 3, 2025
…d-word (GH-122754) (GH-131405)

Email generators using email.policy.default may convert an RFC 2047
encoded-word to unencoded form during header refolding. In a structured
header, this could allow 'specials' chars outside a quoted-string,
leading to invalid address headers and enabling spoofing. This change
ensures a parsed encoded-word that contains specials is kept as an
encoded-word while the header is refolded.

[Better fix from @bitdancer.]

(cherry picked from commit 295b53d)

Co-authored-by: Mike Edmunds <medmunds@gmail.com>
Co-authored-by: R David Murray <rdmurray@bitdance.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
ambv pushed a commit that referenced this pull request Apr 3, 2025
…d-word (GH-122754) (GH-131411)

Email generators using email.policy.default may convert an RFC 2047
encoded-word to unencoded form during header refolding. In a structured
header, this could allow 'specials' chars outside a quoted-string,
leading to invalid address headers and enabling spoofing. This change
ensures a parsed encoded-word that contains specials is kept as an
encoded-word while the header is refolded.

[Better fix from @bitdancer.]

(cherry picked from commit 295b53d)

Co-authored-by: Mike Edmunds <medmunds@gmail.com>
Co-authored-by: R David Murray <rdmurray@bitdance.com>
ambv pushed a commit that referenced this pull request Apr 3, 2025
…-word (GH-122754) (GH-131412)

Email generators using email.policy.default may convert an RFC 2047
encoded-word to unencoded form during header refolding. In a structured
header, this could allow 'specials' chars outside a quoted-string,
leading to invalid address headers and enabling spoofing. This change
ensures a parsed encoded-word that contains specials is kept as an
encoded-word while the header is refolded.

[Better fix from @bitdancer.]

(cherry picked from commit 295b53d)

Co-authored-by: Mike Edmunds <medmunds@gmail.com>
Co-authored-by: R David Murray <rdmurray@bitdance.com>
gentoo-bot pushed a commit to gentoo/cpython that referenced this pull request Apr 9, 2025
…ncoded-word (pythonGH-122754) (pythonGH-131412)

Email generators using email.policy.default may convert an RFC 2047
encoded-word to unencoded form during header refolding. In a structured
header, this could allow 'specials' chars outside a quoted-string,
leading to invalid address headers and enabling spoofing. This change
ensures a parsed encoded-word that contains specials is kept as an
encoded-word while the header is refolded.

[Better fix from @bitdancer.]

(cherry picked from commit 295b53d)

Co-authored-by: Mike Edmunds <medmunds@gmail.com>
Co-authored-by: R David Murray <rdmurray@bitdance.com>
seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025
…-word (pythonGH-122754)

Email generators using email.policy.default may convert an RFC 2047
encoded-word to unencoded form during header refolding. In a structured
header, this could allow 'specials' chars outside a quoted-string,
leading to invalid address headers and enabling spoofing. This change
ensures a parsed encoded-word that contains specials is kept as an
encoded-word while the header is refolded.

[Better fix from @bitdancer.]

---------

Co-authored-by: R David Murray <rdmurray@bitdance.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 participants