mailto with plus sign incorrectly marked as invalid #182

Open
theory opened this issue Dec 30, 2021 · 4 comments

theory commented Dec 30, 2021

Describe the bug

On this page, I have a mailto: link like this:

<a href="mailto:[email protected]">subscribe by email</a>

Running htmltest (just installed via go install) reports:

  invalid email address (invalid format): 'sqitch-users [email protected]' --- 2013/06/sqitch-list/index.html --> mailto:[email protected]

I think this is incorrect: isn't the plus sign valid there, rather than representing a space? I tried pasting the address into mailtolinkgenerator.com and it also output it with a plus. Looking at RFC 6068, there's this grammar:

      mailtoURI    = "mailto:" [ to ] [ hfields ]
      to           = addr-spec *("," addr-spec )
      hfields      = "?" hfield *( "&" hfield )
      hfield       = hfname "=" hfvalue
      hfname       = *qchar
      hfvalue      = *qchar
      addr-spec    = local-part "@" domain
      local-part   = dot-atom-text / quoted-string
      domain       = dot-atom-text / "[" *dtext-no-obs "]"
      dtext-no-obs = %d33-90 / ; Printable US-ASCII
                     %d94-126  ; characters not including
                               ; "[", "]", or "\"
      qchar        = unreserved / pct-encoded / some-delims
      some-delims  = "!" / "$" / "'" / "(" / ")" / "*"
                   / "+" / "," / ";" / ":" / "@"

If I'm reading it right, the dot-atom-text rule (documented in RFC 5322) appears to allow + signs in the local-part:

   atext           =   ALPHA / DIGIT /    ; Printable US-ASCII
                       "!" / "#" /        ;  characters not including
                       "$" / "%" /        ;  specials.  Used for atoms.
                       "&" / "'" /
                       "*" / "+" /
                       "-" / "/" /
                       "=" / "?" /
                       "^" / "_" /
                       "`" / "{" /
                       "|" / "}" /
                       "~"

   atom            =   [CFWS] 1*atext [CFWS]

   dot-atom-text   =   1*atext *("." 1*atext)
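
The grammar quoted above can be checked mechanically. The following is a minimal sketch, not htmltest's code: it transcribes the `atext` and `dot-atom-text` productions into a Go regular expression. The helper name `isDotAtomText` and the sample local parts are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// atext per RFC 5322: ALPHA / DIGIT plus the printable characters listed
// in the grammar above. The hyphen is placed last so it is taken literally.
const atext = "[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"

// dotAtomText mirrors: dot-atom-text = 1*atext *("." 1*atext)
var dotAtomText = regexp.MustCompile("^" + atext + "+(?:\\." + atext + "+)*$")

// isDotAtomText (hypothetical helper) reports whether s matches the
// dot-atom-text production.
func isDotAtomText(s string) bool {
	return dotAtomText.MatchString(s)
}

func main() {
	// Sample local parts chosen for illustration only.
	for _, local := range []string{"user+tag", "user tag", "first.last", ".leading"} {
		fmt.Printf("%q -> %v\n", local, isDotAtomText(local))
	}
}
```

Under this transcription a local part containing `+` matches, while the same string with a space in place of the `+` does not, which is consistent with the error message above.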

To Reproduce

Steps to reproduce the behaviour:

  1. Create a file with a mailto: anchor with a + sign in the local part
  2. Scan it with htmltest
  3. See error

.htmltest.yml

DirectoryPath: public

Source files

https://justatheory.com/2013/06/sqitch-list/

Expected behaviour

A mailto: address with a + in the local part should be valid.

Actual behaviour

htmltest finds it invalid with this message:

  invalid email address (invalid format): 'sqitch-users [email protected]' --- 2013/06/sqitch-list/index.html --> mailto:[email protected]

Versions

  • OS: macOS 12.1

  • htmltest: [e.g. 0.10.1, run htmltest -v]

    $ htmltest -v
    htmltest 
    
    

Additional context

Thanks!

@theory theory added the bug label Dec 30, 2021
theory (Author) commented Dec 31, 2021

Also added

IgnoreURLs:
  - mailto:[email protected]

to my config, and htmltest still reports it. HTTP URLs on the list are properly ignored.

wjdp (Owner) commented Jan 28, 2023

This is a problem because I'm currently using github.com/badoux/checkmail to validate emails, and its regex is failing the above. I likely need to remove it and replace it with a much more forgiving one.

Qup42 commented Apr 7, 2024

There is actually another problem at play here:
The mail address containing the + is first URL-decoded with net/url.QueryUnescape, which converts the + to a space. Notice that the + has become a space in the address at the beginning of the error message. This is also the reason why your IgnoreURLs entry does not work. A workaround is to encode the + as %2B. A better solution would be to use PathUnescape instead (at least for checking mailto links): it does not unescape the + to a space, a conversion that is common but controversial [1] and strongly discouraged for the mailto scheme [2].
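
To make the difference concrete, here is a small sketch comparing the two standard-library functions on the same input. The address `user+tag@example.com` and the helper `decodeBoth` are placeholders invented for this example; this is not htmltest's actual code path.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodeBoth (hypothetical helper) decodes s with both net/url functions
// so the results can be compared side by side.
func decodeBoth(s string) (query, path string, err error) {
	if query, err = url.QueryUnescape(s); err != nil {
		return "", "", err
	}
	if path, err = url.PathUnescape(s); err != nil {
		return "", "", err
	}
	return query, path, nil
}

func main() {
	// Placeholder addresses: one with a raw "+", one with the %2B workaround.
	for _, addr := range []string{"user+tag@example.com", "user%2Btag@example.com"} {
		q, p, err := decodeBoth(addr)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Printf("%-24s QueryUnescape=%q PathUnescape=%q\n", addr, q, p)
	}
}
```

With a raw `+`, QueryUnescape yields a space while PathUnescape leaves the `+` alone; with `%2B`, both functions yield a literal `+`, which is why the workaround works.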

Footnotes

  1. https://stackoverflow.com/a/47188851/5031386

  2. https://datatracker.ietf.org/doc/html/rfc6068#section-5

Qup42 commented Apr 8, 2024

I dug a bit deeper. The actual mail validation is not the problem: addresses containing + are accepted just fine. Use %2B as a workaround. The decoding of + to a space by QueryUnescape is really the only problem here.

This would only affect a small part of the mailto URI handling. RFC 6068 states that spaces should be percent-encoded and advises against encoding spaces as +. But I also wouldn't want to break something just to move closer to the standard.
What are your thoughts on switching from QueryUnescape to PathUnescape?
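
To gauge what such a switch might change, the sketch below decodes both the address part and the header fields of a mailto URI with PathUnescape. It is only an illustration under stated assumptions: `splitMailto` is a hypothetical helper, the URIs are made up, and htmltest's real handling may be structured differently. Decoding the header fields as one string is a simplification; a fuller version would split on `&` and `=` before unescaping.

```go
package main

import (
	"fmt"
	"net/url"
)

// splitMailto (hypothetical helper) returns the decoded address part and the
// decoded header fields of a mailto URI, leaving any literal "+" untouched.
func splitMailto(raw string) (to, hfields string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "mailto" {
		return "", "", fmt.Errorf("not a mailto URI: %q", raw)
	}
	// For mailto URIs, url.Parse stores the undecoded address in Opaque
	// and the undecoded header fields in RawQuery.
	if to, err = url.PathUnescape(u.Opaque); err != nil {
		return "", "", err
	}
	if hfields, err = url.PathUnescape(u.RawQuery); err != nil {
		return "", "", err
	}
	return to, hfields, nil
}

func main() {
	// Made-up URIs: percent-encoded space versus "+" used as a space.
	for _, raw := range []string{
		"mailto:user+tag@example.com?subject=Hello%20World",
		"mailto:user@example.com?subject=Hello+World",
	} {
		to, hfields, err := splitMailto(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("to=%q hfields=%q\n", to, hfields)
	}
}
```

In this sketch the only inputs that decode differently from QueryUnescape are those containing a literal `+`: percent-encoded spaces (`%20`) still become spaces, while a link that relied on `+` meaning a space in a header field would now keep the `+`.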
