I ran into an error today in my app that uses mailchimp3, and tracked it down to the get_subscriber_hash() function calling member_email.lower().encode(). This use of encode() is problematic for three reasons:
1. Leaving out the encoding argument causes encode() to use the default encoding, which is probably ascii (that is the Python 2 default; Python 3 defaults to utf-8). But it might be something else, and you definitely don't want this function to perform differently on different systems.
2. Encoding the email address to ascii will throw an exception for perfectly legal email addresses. The "Note" section on this page of the MailChimp docs shows that MailChimp accepts email addresses with non-ascii characters in the domain name.
3. If member_email is a raw Python string (a byte string), rather than a Unicode string, calling encode() on it for this purpose is pointless and potentially error-prone (this is only likely in Python 2, since strings are Unicode by default in PY3). A raw Python string is, by definition, already encoded. So unless you want to re-encode it to something completely different, like base64, calling encode() has no effect except to throw an exception for potentially valid addresses.
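To make the second point concrete, here is a minimal sketch of the failure. The address is made up for illustration and is not taken from the MailChimp docs; the explicit 'ascii' argument stands in for whatever an ascii default would do.

```python
# Made-up address with a non-ascii character in the domain name.
email = u"user@ex\u00e4mple.com"

# Encoding to ascii fails as soon as a non-ascii character appears.
try:
    email.lower().encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# An explicit utf-8 encoding handles the same address without error.
utf8_bytes = email.lower().encode("utf-8")
```

Here ascii_ok ends up False, while the utf-8 call returns a byte string that can be passed to hashlib.md5().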
If I knew a really "appropriate" solution to this, I'd have made this a PR instead of an issue. My best guess for a good solution would be something like this:
    def get_subscriber_hash(member_email):
        """
        The MD5 hash of the lowercase version of the list member's email.
        Used as subscriber_hash

        :param member_email: The member's email address
        :type member_email: :py:class:`str`
        :returns: The MD5 hash in hex
        :rtype: :py:class:`str`
        """
        check_email(member_email)
        member_email = member_email.lower().encode('utf8')
        m = hashlib.md5(member_email)
        return m.hexdigest()
The addition of the 'utf8' argument solves issues 1 and 2, but doesn't really solve 3. I'm not entirely certain if that's a problem, though, since the most likely encoding for a non-Unicode string is ascii. An ascii string is already valid utf8, so the encode('utf8') call should do nothing and not throw an exception. But if it's, say, a cp1251-encoded (windows encoding) string, I don't know what encode('utf8') will do.
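One possible way to deal with point 3 is to decode byte strings explicitly before hashing. The sketch below is only an illustration: subscriber_hash_sketch and its input_encoding parameter are hypothetical names, not part of mailchimp3, the check_email() call is left out to keep it self-contained, and it assumes the caller actually knows which encoding the bytes are in.

```python
import hashlib


def subscriber_hash_sketch(member_email, input_encoding="utf-8"):
    """Hypothetical variant of get_subscriber_hash(); not the library's code."""
    # Byte strings (str on Python 2, bytes on Python 3) are decoded to text
    # first. input_encoding is an assumption the caller has to get right.
    if isinstance(member_email, bytes):
        member_email = member_email.decode(input_encoding)
    # Text is lowercased and then always encoded the same way before hashing.
    return hashlib.md5(member_email.lower().encode("utf-8")).hexdigest()


# A cp1251 byte string and the equivalent text produce the same hash.
text_email = u"\u043f\u043e\u0447\u0442\u0430@example.com"
raw_email = text_email.encode("cp1251")
same_hash = (
    subscriber_hash_sketch(raw_email, "cp1251")
    == subscriber_hash_sketch(text_email)
)
```

The trade-off is that the function can no longer guess: if the bytes are not in the stated encoding, decode() raises UnicodeDecodeError or silently yields the wrong text, so this only helps when the encoding is known at the call site.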
coredumperror changed the title from "get_subscriber_hash() should not call str.encode()" to "get_subscriber_hash() should not call str.encode() with no args" on Apr 27, 2017
@coredumperror, @stephenross maybe we can set member_email.lower().encode('utf8') and add from __future__ import unicode_literals at the top of the file to fix the third point?
Unfortunately, from __future__ import unicode_literals won't do anything to help point 3. All that does is make it so that string literals defined in the current file, like foo = "bar", will be unicode objects in Python 2, instead of str objects. Strings passed in from calling code are not affected, and the import doesn't do anything in PY3.