Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cast isdigit argument as unsigned char #63

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

afresh1
Copy link

@afresh1 afresh1 commented Dec 25, 2022

Theo noticed that these isdigit calls were not obviously correct. He suggested reporting this upstream. The only portable safe way is to always cast to (unsigned char).

http://man.openbsd.org/isdigit#CAVEATS

The argument c must be EOF or representable as an unsigned char;
otherwise, the result is undefined.

Originally filed as Perl/perl5#20649

The only portable safe way is to always cast to (unsigned char).

http://man.openbsd.org/isdigit#CAVEATS
> The argument c must be EOF or representable as an unsigned char;
> otherwise, the result is undefined.
@khwilliamson
Copy link
Member

I suspect that isdigit itself should not be used here. I'll quote Perl/perl5#20649 (comment) here:

My question is why is the code using isdigit() in the first place? To quote perllocale

By default, Perl itself (outside the POSIX module)
ignores the current locale. The "use locale"
pragma tells Perl to use the current locale for some operations.

People are not generally expecting locale-dependent behaviour unless it is explicitly asked for, or documented to always be there, as just a few POSIX module operations do.

Also note the contradiction in the OpenBSD man page that this ticket links to

The complete list of decimal digits is 0 and 1–9, in any locale

and later

On systems supporting non-ASCII single-byte character encodings, different c arguments may correspond to the digits, and the results of isdigit() may depend on the LC_CTYPE locale

ISO 8859-11 contains the Thai characters for 0-9 at positions F0-F9. (Unicode U+0E50 - U+0E59). This is the only single-byte locale I know of that has a second set of digits in it.

Perl furnishes the macros isDIGIT for when not in use locale and isDIGIT_LC for when in it. I have been considering writing a set of macros that XS code can call to DTRT without having to worry about being in use locale scope or not. I haven't come up with a good name paradigm. Suggestions welcome.

Also note that OpenBSD is probably not the best place to find accurate locale-related documentation. This commit message gives a decent explanation, in my opinion:

commit be552ced7556db8582622a745067e1b266e8ca91
Author: Karl Williamson [email protected]
Date: Wed Aug 24 20:47:55 2022 -0600

 run/locale.t: skip illegal locale test when all are legal
  • OpenBSD believes that locales are so inherently insecure that it only
    internally has two locales: C and C.UTF-8, which it knows how to treat
    securely.

    However, to preserve at least the illusion of portability, the locale
    setting operations do not fail whatever garbage is passed as the locale
    name. (There may be unacceptable strings, but any reasonable locale
    name and a lot of unreasonable ones are accepted.) If the input name
    looks like it was meant to have "UTF-8" in its name, the locale is set
    to C.UTF-8. Otherwise, it is set to C. Early releases had bugs in
    switching between the two.

@afresh1
Copy link
Author

afresh1 commented Sep 18, 2023

Whether isdigit() is the correct hammer for this nail is a separate question. As discussed over in the related Bzip PR:

As it happens, upstream already has a ticket for this issue -- see https://sourceware.org/bugzilla/show_bug.cgi?id=28283. It was created in August 2021. I'll add a comment to the ticket.

Also - found a good reference for making the change -- https://wiki.sei.cmu.edu/confluence/display/c/STR37-C.+Arguments+to+character-handling+functions+must+be+representable+as+an+unsigned+char

It seems fairly clear that using isdigit() without the cast is wrong everywhere, not just in OpenBSD.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants