get_subscriber_hash() should not call str.encode() with no args #126

coredumperror · 2017-04-27T18:35:45Z

I ran into an error today in my app that uses mailchimp3, and tracked it down to the get_subscriber_hash() function calling member_email.lower().encode(). This use of encode() is problematic for three reasons:

1. Leaving out the encoding argument causes encode() to use the default encoding, which is ascii on a stock Python 2 (though it can be changed) and utf-8 on Python 3. You definitely don't want this function to perform differently on different systems.
2. Encoding the email address to ascii will throw an exception for perfectly legal email addresses. The "Note" section on this page of the MailChimp docs shows that MailChimp accepts email addresses with non-ascii characters in the domain name.
3. If member_email is a raw Python string (a byte string) rather than a Unicode string, calling encode() on it for this purpose is pointless and potentially error-prone. This is only likely in Python 2, since strings are Unicode by default in PY3. A raw Python string is, by definition, already encoded. So unless you want to re-encode it to something completely different, like base64, calling encode() has no effect except to throw an exception for potentially valid addresses.
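
Points 1 and 2 can be reproduced with a short Python 3 sketch. The address below is a made-up example with a non-ascii character in the domain; it is not taken from the MailChimp docs.

```python
import hashlib

# Hypothetical address with a non-ASCII character in the domain.
email = "Someone@bücher.example"
lowered = email.lower()

# An explicit ascii encode fails on this address.
try:
    lowered.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# An explicit utf-8 encode works and gives the same bytes on every system.
utf8_bytes = lowered.encode("utf-8")
subscriber_hash = hashlib.md5(utf8_bytes).hexdigest()

print(ascii_ok)              # False
print(len(subscriber_hash))  # 32
```

Naming the codec explicitly is what removes the dependence on the interpreter's default.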

If I knew a really "appropriate" solution to this, I'd have made this a PR instead of an issue. My best guess for a good solution would be something like this:

def get_subscriber_hash(member_email):
    """
    The MD5 hash of the lowercase version of the list member's email.
    Used as subscriber_hash

    :param member_email: The member's email address
    :type member_email: :py:class:`str`
    :returns: The MD5 hash in hex
    :rtype: :py:class:`str`
    """
    check_email(member_email)
    member_email = member_email.lower().encode('utf8')
    m = hashlib.md5(member_email)
    return m.hexdigest()

The addition of the 'utf8' argument solves issues 1 and 2, but doesn't really solve 3. I'm not entirely certain if that's a problem, though, since the most likely encoding for a non-Unicode string is ascii. An ascii string is already valid utf8, so the encode('utf8') call should do nothing and not throw an exception. But if it's, say, a cp1251-encoded string (a Windows encoding), I don't know what encode('utf8') will do.
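
The cp1251 case can be illustrated in Python 3, where byte strings and text are separate types. This is only a sketch with a made-up Cyrillic address: it shows that the same address encoded two ways hashes differently, so the bytes fed into md5 matter. (In Python 2, calling encode('utf8') on a byte string first decodes it implicitly as ascii, so non-ascii cp1251 bytes would raise UnicodeDecodeError.)

```python
import hashlib

# Hypothetical address with Cyrillic characters, for illustration only.
email = "почта@пример.example"

utf8_bytes = email.encode("utf-8")
cp1251_bytes = email.encode("cp1251")

# Same text, different bytes, so different MD5 digests.
utf8_hash = hashlib.md5(utf8_bytes).hexdigest()
cp1251_hash = hashlib.md5(cp1251_bytes).hexdigest()
print(utf8_hash == cp1251_hash)  # False

# Python 3 byte strings have no encode() method at all.
print(hasattr(cp1251_bytes, "encode"))  # False

# Pure-ascii text encodes to identical bytes under both codecs.
print("user@example.com".encode("utf-8") == "user@example.com".encode("cp1251"))  # True
```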


charlesthk · 2017-08-18T08:15:34Z

@coredumperror, @stephenross maybe we can set member_email.lower().encode('utf8') and add from __future__ import unicode_literals at the top of the file to fix the third point?

coredumperror · 2017-08-21T17:40:19Z

Unfortunately, from __future__ import unicode_literals won't do anything to help point 3. All that does is make it so that string literals defined in the current file, like foo = "bar", will be unicode objects in Python 2, instead of str objects. It has no effect on strings passed in from other code, and it doesn't do anything in PY3.
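
One way to approach point 3 without relying on unicode_literals is to check the type at runtime and decode byte strings explicitly before hashing. The following is a minimal Python 3 sketch, not the library's implementation: the helper name `to_text` and the `assumed_encoding` parameter are hypothetical, and the check_email() validation step is left out to keep the example self-contained.

```python
import hashlib


def to_text(value, assumed_encoding="utf-8"):
    """Return value as text, decoding byte strings with assumed_encoding."""
    if isinstance(value, bytes):
        # The caller has to know the real encoding; utf-8 is only a guess.
        return value.decode(assumed_encoding)
    return value


def get_subscriber_hash(member_email, assumed_encoding="utf-8"):
    """MD5 hex digest of the lowercased address, always hashed as utf-8."""
    text = to_text(member_email, assumed_encoding).lower()
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Text and correctly labeled bytes produce the same hash.
as_text = get_subscriber_hash("Someone@Bücher.example")
as_cp1252 = get_subscriber_hash("Someone@Bücher.example".encode("cp1252"), "cp1252")
print(as_text == as_cp1252)  # True
```

The trade-off is that byte-string callers must state their encoding; a wrong guess either raises on decode or silently produces a different hash.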

coredumperror changed the title ~~get_subscriber_hash() should not call str.encode()~~ get_subscriber_hash() should not call str.encode() Apr 27, 2017

coredumperror changed the title ~~get_subscriber_hash() should not call str.encode()~~ get_subscriber_hash() should not call str.encode() with no args Apr 27, 2017

stephenross added the bug label Apr 27, 2017
