Skip to content

Commit

Permalink
00435: pythongh-121650: Encode newlines in headers, and verify
Browse files Browse the repository at this point in the history
 headers are sound (pythonGH-122233)

Per RFC 2047:

> [...] these encoding schemes allow the
> encoding of arbitrary octet values, mail readers that implement this
> decoding should also ensure that display of the decoded data on the
> recipient's terminal will not cause unwanted side-effects

It seems that the "quoted-word" scheme is a valid way to include
a newline character in a header value, just like we already allow
undecodable bytes or control characters.
They do need to be properly quoted when serialized to text, though.

This should fail for custom fold() implementations that aren't careful
about newlines.

(cherry picked from commit 0976339)

This patch also contains modified commit cherry picked from
c5bba85.

This commit was backported to simplify the backport of the other commit
fixing CVE. The only modification is a removal of one test case which
tests multiple changes in Python 3.7 and it wasn't working properly
with Python 3.6 where we backported only one change.

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Bas Bloemsaat <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
Co-authored-by: bsiem <[email protected]>
  • Loading branch information
5 people committed Aug 16, 2024
1 parent a03f753 commit 1ee66de
Show file tree
Hide file tree
Showing 11 changed files with 170 additions and 1 deletion.
6 changes: 6 additions & 0 deletions Doc/library/email.errors.rst
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,12 @@ The following exception classes are defined in the :mod:`email.errors` module:
:class:`~email.mime.image.MIMEImage`).


.. exception:: HeaderWriteError()

Raised when an error occurs when the :mod:`~email.generator` outputs
headers.


Here is the list of the defects that the :class:`~email.parser.FeedParser`
can find while parsing messages. Note that the defects are added to the message
where the problem was found, so for example, if a message nested inside a
Expand Down
18 changes: 18 additions & 0 deletions Doc/library/email.policy.rst
Original file line number Diff line number Diff line change
Expand Up @@ -229,6 +229,24 @@ added matters. To illustrate::

.. versionadded:: 3.6


.. attribute:: verify_generated_headers

If ``True`` (the default), the generator will raise
:exc:`~email.errors.HeaderWriteError` instead of writing a header
that is improperly folded or delimited, such that it would
be parsed as multiple headers or joined with adjacent data.
Such headers can be generated by custom header classes or bugs
in the ``email`` module.

As it's a security feature, this defaults to ``True`` even in the
:class:`~email.policy.Compat32` policy.
For backwards compatible, but unsafe, behavior, it must be set to
``False`` explicitly.

.. versionadded:: 3.8.20


The following :class:`Policy` method is intended to be called by code using
the email library to create policy instances with custom settings:

Expand Down
9 changes: 9 additions & 0 deletions Lib/email/_header_value_parser.py
Original file line number Diff line number Diff line change
Expand Up @@ -92,6 +92,8 @@
ASPECIALS = TSPECIALS | set("*'%")
ATTRIBUTE_ENDS = ASPECIALS | WSP
EXTENDED_ATTRIBUTE_ENDS = ATTRIBUTE_ENDS - set('%')
NLSET = {'\n', '\r'}
SPECIALSNL = SPECIALS | NLSET

def quote_string(value):
return '"'+str(value).replace('\\', '\\\\').replace('"', r'\"')+'"'
Expand Down Expand Up @@ -2611,6 +2613,13 @@ def _refold_parse_tree(parse_tree, *, policy):
wrap_as_ew_blocked -= 1
continue
tstr = str(part)
if not want_encoding:
if part.token_type == 'ptext':
# Encode if tstr contains special characters.
want_encoding = not SPECIALSNL.isdisjoint(tstr)
else:
# Encode if tstr contains newlines.
want_encoding = not NLSET.isdisjoint(tstr)
try:
tstr.encode(encoding)
charset = encoding
Expand Down
8 changes: 8 additions & 0 deletions Lib/email/_policybase.py
Original file line number Diff line number Diff line change
Expand Up @@ -157,6 +157,13 @@ class Policy(_PolicyBase, metaclass=abc.ABCMeta):
message_factory -- the class to use to create new message objects.
If the value is None, the default is Message.
verify_generated_headers
-- if true, the generator verifies that each header
they are properly folded, so that a parser won't
treat it as multiple headers, start-of-body, or
part of another header.
This is a check against custom Header & fold()
implementations.
"""

raise_on_defect = False
Expand All @@ -165,6 +172,7 @@ class Policy(_PolicyBase, metaclass=abc.ABCMeta):
max_line_length = 78
mangle_from_ = False
message_factory = None
verify_generated_headers = True

def handle_defect(self, obj, defect):
"""Based on policy, either raise defect or call register_defect.
Expand Down
4 changes: 4 additions & 0 deletions Lib/email/errors.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,10 @@ class CharsetError(MessageError):
"""An illegal charset was given."""


class HeaderWriteError(MessageError):
"""Error while writing headers."""


# These are parsing defects which the parser was able to work around.
class MessageDefect(ValueError):
"""Base class for a message defect."""
Expand Down
16 changes: 15 additions & 1 deletion Lib/email/generator.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,12 +14,14 @@
from copy import deepcopy
from io import StringIO, BytesIO
from email.utils import _has_surrogates
from email.errors import HeaderWriteError

UNDERSCORE = '_'
NL = '\n' # XXX: no longer used by the code below.

NLCRE = re.compile(r'\r\n|\r|\n')
fcre = re.compile(r'^From ', re.MULTILINE)
NEWLINE_WITHOUT_FWSP = re.compile(r'\r\n[^ \t]|\r[^ \n\t]|\n[^ \t]')



Expand Down Expand Up @@ -219,7 +221,19 @@ def _dispatch(self, msg):

def _write_headers(self, msg):
for h, v in msg.raw_items():
self.write(self.policy.fold(h, v))
folded = self.policy.fold(h, v)
if self.policy.verify_generated_headers:
linesep = self.policy.linesep
if not folded.endswith(self.policy.linesep):
raise HeaderWriteError(
f'folded header does not end with {linesep!r}: {folded!r}')
folded_no_linesep = folded
if folded.endswith(linesep):
folded_no_linesep = folded[:-len(linesep)]
if NEWLINE_WITHOUT_FWSP.search(folded_no_linesep):
raise HeaderWriteError(
f'folded header contains newline: {folded!r}')
self.write(folded)
# A blank line always separates headers from body
self.write(self._NL)

Expand Down
62 changes: 62 additions & 0 deletions Lib/test/test_email/test_generator.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
from email.message import EmailMessage
from email.generator import Generator, BytesGenerator
from email import policy
import email.errors
from test.test_email import TestEmailBase, parameterize


Expand Down Expand Up @@ -215,6 +216,44 @@ def test_rfc2231_wrapping_switches_to_default_len_if_too_narrow(self):
g.flatten(msg)
self.assertEqual(s.getvalue(), self.typ(expected))

def test_keep_encoded_newlines(self):
msg = self.msgmaker(self.typ(textwrap.dedent("""\
To: nobody
Subject: Bad subject=?UTF-8?Q?=0A?=Bcc: [email protected]
None
""")))
expected = textwrap.dedent("""\
To: nobody
Subject: Bad subject=?UTF-8?Q?=0A?=Bcc: [email protected]
None
""")
s = self.ioclass()
g = self.genclass(s, policy=self.policy.clone(max_line_length=80))
g.flatten(msg)
self.assertEqual(s.getvalue(), self.typ(expected))

def test_keep_long_encoded_newlines(self):
msg = self.msgmaker(self.typ(textwrap.dedent("""\
To: nobody
Subject: Bad subject =?UTF-8?Q?=0A?=Bcc: [email protected]
None
""")))
expected = textwrap.dedent("""\
To: nobody
Subject: Bad subject \n\
=?utf-8?q?=0A?=Bcc:
[email protected]
None
""")
s = self.ioclass()
g = self.genclass(s, policy=self.policy.clone(max_line_length=30))
g.flatten(msg)
self.assertEqual(s.getvalue(), self.typ(expected))


class TestGenerator(TestGeneratorBase, TestEmailBase):

Expand All @@ -223,6 +262,29 @@ class TestGenerator(TestGeneratorBase, TestEmailBase):
ioclass = io.StringIO
typ = str

def test_verify_generated_headers(self):
"""gh-121650: by default the generator prevents header injection"""
class LiteralHeader(str):
name = 'Header'
def fold(self, **kwargs):
return self

for text in (
'Value\r\nBad Injection\r\n',
'NoNewLine'
):
with self.subTest(text=text):
message = message_from_string(
"Header: Value\r\n\r\nBody",
policy=self.policy,
)

del message['Header']
message['Header'] = LiteralHeader(text)

with self.assertRaises(email.errors.HeaderWriteError):
message.as_string()


class TestBytesGenerator(TestGeneratorBase, TestEmailBase):

Expand Down
16 changes: 16 additions & 0 deletions Lib/test/test_email/test_headerregistry.py
Original file line number Diff line number Diff line change
Expand Up @@ -1546,6 +1546,22 @@ def test_set_message_header_from_group(self):

class TestFolding(TestHeaderBase):

def test_address_display_names(self):
"""Test the folding and encoding of address headers."""
for name, result in (
('Foo Bar, France', '"Foo Bar, France"'),
('Foo Bar (France)', '"Foo Bar (France)"'),
('Foo Bar, España', 'Foo =?utf-8?q?Bar=2C_Espa=C3=B1a?='),
('Foo Bar (España)', 'Foo Bar =?utf-8?b?KEVzcGHDsWEp?='),
('Foo, Bar España', '=?utf-8?q?Foo=2C_Bar_Espa=C3=B1a?='),
('Foo, Bar [España]', '=?utf-8?q?Foo=2C_Bar_=5BEspa=C3=B1a=5D?='),
('Foo Bär, France', 'Foo =?utf-8?q?B=C3=A4r=2C?= France'),
('Foo Bär <France>', 'Foo =?utf-8?q?B=C3=A4r_=3CFrance=3E?='),
):
h = self.make_header('To', Address(name, addr_spec='[email protected]'))
self.assertEqual(h.fold(policy=policy.default),
'To: %s <[email protected]>\n' % result)

def test_short_unstructured(self):
h = self.make_header('subject', 'this is a test')
self.assertEqual(h.fold(policy=policy.default),
Expand Down
26 changes: 26 additions & 0 deletions Lib/test/test_email/test_policy.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ class PolicyAPITests(unittest.TestCase):
'raise_on_defect': False,
'mangle_from_': True,
'message_factory': None,
'verify_generated_headers': True,
}
# These default values are the ones set on email.policy.default.
# If any of these defaults change, the docs must be updated.
Expand Down Expand Up @@ -265,6 +266,31 @@ def test_short_maxlen_error(self):
with self.assertRaises(email.errors.HeaderParseError):
policy.fold("Subject", subject)

def test_verify_generated_headers(self):
"""Turning protection off allows header injection"""
policy = email.policy.default.clone(verify_generated_headers=False)
for text in (
'Header: Value\r\nBad: Injection\r\n',
'Header: NoNewLine'
):
with self.subTest(text=text):
message = email.message_from_string(
"Header: Value\r\n\r\nBody",
policy=policy,
)
class LiteralHeader(str):
name = 'Header'
def fold(self, **kwargs):
return self

del message['Header']
message['Header'] = LiteralHeader(text)

self.assertEqual(
message.as_string(),
f"{text}\nBody",
)

# XXX: Need subclassing tests.
# For adding subclassed objects, make sure the usual rules apply (subclass
# wins), but that the order still works (right overrides left).
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Fix serialization of display name in originator or destination address fields with both encoded words and special chars.
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
:mod:`email` headers with embedded newlines are now quoted on output. The
:mod:`~email.generator` will now refuse to serialize (write) headers that
are unsafely folded or delimited; see
:attr:`~email.policy.Policy.verify_generated_headers`. (Contributed by Bas
Bloemsaat and Petr Viktorin in :gh:`121650`.)

0 comments on commit 1ee66de

Please sign in to comment.