Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_subscriber_hash() should not call str.encode() with no args #126

Open
coredumperror opened this issue Apr 27, 2017 · 2 comments
Open
Labels

Comments

@coredumperror
Copy link
Contributor

I ran into an error today in my app that uses mailchimp3, and tracked it down to the get_subscriber_hash() function calling member_email.lower().encode(). This use of encode() is problematic for three reasons:

  1. Leaving out the encoding argument causes encode() to use the default encoding, which is probably ascii. But it might be something else, and you definitely dont want this function to perform differently on different systems.
  2. Encoding the email address to ascii will throw an exception for perfectly legal email addresses. in the "Note" section on this page of the MailChimp docs shows that MailChimp accepts email addresses with non-ascii characters in the domain name.
  3. If member_email is a raw python string, rather than a Unicode string, calling encode() on it for this purpose is pointless and potentially error-prone (this is only likely in Python 2, since strings are Unicode by default in PY3). A raw python string is, by definition, already encoded. So unless you want to re-encode it to something completely different, like base64, calling encode() has no effect except to throw an exception for potentially valid addresses.

If I knew a really "appropriate" solution to this, I'd have made this a PR instead of an issue. My best guess for a good solution would be something like this:

def get_subscriber_hash(member_email):
    """
    The MD5 hash of the lowercase version of the list member's email.
    Used as subscriber_hash

    :param member_email: The member's email address
    :type member_email: :py:class:`str`
    :returns: The MD5 hash in hex
    :rtype: :py:class:`str`
    """
    check_email(member_email)
    member_email = member_email.lower().encode('utf8')
    m = hashlib.md5(member_email)
    return m.hexdigest()

The addition of the 'utf8' argument solves issues 1 and 2, but doesn't really solve 3. I'm not entirely certain if that's a problem, though, since the most likely encoding for a non-Unicode string is ascii. An ascii string is already valid utf8, so the encode('utf8') call should do nothing and not throw an exception. But if it's, say, a cp1251-encoded (windows encoding) string, I don't know what encode('utf8') will do.

@coredumperror coredumperror changed the title get_subscriber_hash() should not call str.encode() get_subscriber_hash() should not call str.encode() Apr 27, 2017
@coredumperror coredumperror changed the title get_subscriber_hash() should not call str.encode() get_subscriber_hash() should not call str.encode() with no args Apr 27, 2017
@charlesthk
Copy link
Member

@coredumperror, @stephenross maybe we can set member_email.lower().encode('utf8') and add from __future__ import unicode_literals add the top of the file to fix the third point ?

@coredumperror
Copy link
Contributor Author

Unfortunately, from __future__ import unicode_literals won't do anything to help point 3. All that does is make it so that string literals defined in the current file, like foo = "bar" will be unicode objects in Python 2, instead of str objects. And it doesn't do anything in PY3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants