Proposal: extend locale api to better reflect posix #2

Closed
375gnu opened this issue Sep 15, 2013 · 16 comments

@375gnu

375gnu commented Sep 15, 2013

It looks to me like the current locale gem tries to be the GCD of all supported platforms, so it behaves like the simplest platform (CGI). But other platforms may have a richer API: with CGI we have only one value for all locale categories, but in the POSIX world we may have different values for different categories such as LC_MESSAGES, LC_CTYPE, etc.

I didn't think of this when I proposed this pull request to you (mutoh#5), especially commit mutoh@0ef254f, which fixes issues with encoding handling (LC_CTYPE) but breaks handling of LC_MESSAGES (for example, http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=690572).

So currently I see three possibilities:

  1. revert the LC_CTYPE commits and break charset handling again (simple but bad);
  2. like 1, but change Locale::Driver::Env.charset to check the environment variables LC_ALL and LC_CTYPE (a bit harder and a bit better);
  3. extend the API to better reflect POSIX (the hardest but the best; may change the API in an incompatible way).

Case 2 may be enough for the short term, but for the long term case 3 looks better, although it may require refactoring or rearchitecting the whole gem.

@375gnu
Author

375gnu commented Sep 15, 2013

Case 2 might look something like this: 375gnu/locale@master...test

@kou
Member

kou commented Sep 16, 2013

I'm sorry but I don't understand yet.

Is the following OK?

  • #charset should check LC_ALL, LC_CTYPE and LANG.
  • #locale should check LC_ALL, LC_MESSAGES and LANG.
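
The lookup order in the two bullets above can be sketched in plain Ruby. This is only an illustration, not code from the locale gem: `resolve_category` and `codeset_of` are hypothetical helper names, and the fallback to "C" when nothing is set is an assumption (POSIX leaves that default implementation-defined).

```ruby
# Hypothetical helpers, not part of the locale gem.

# Resolve one locale category (e.g. "LC_CTYPE") from an environment-like
# object, checking LC_ALL first, then the category itself, then LANG.
# Unset and empty variables are skipped.
def resolve_category(category, env = ENV)
  ["LC_ALL", category, "LANG"].each do |name|
    value = env[name]
    return value unless value.nil? || value.empty?
  end
  "C" # assumed default when nothing is set
end

# Extract the codeset part of a name such as "ca_ES.UTF-8@valencia".
# Returns nil when the name carries no codeset (for example "C").
def codeset_of(locale_name)
  locale_name[/\.([^@]+)/, 1]
end

env = { "LANG" => "ja_JP.UTF-8", "LC_CTYPE" => "C" }
p resolve_category("LC_MESSAGES", env)           # source for #locale
p codeset_of(resolve_category("LC_CTYPE", env))  # source for #charset
```

With this environment the two categories resolve to different values, which is the situation a single shared locale value cannot represent.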

@375gnu
Author

375gnu commented Sep 16, 2013

On 9/16/13, Kouhei Sutou wrote:

I'm sorry but I don't understand yet.
Is the following OK?

  • #charset should check LC_ALL, LC_CTYPE and LANG.
  • #locale should check LC_ALL, LC_MESSAGES and LANG.

Yes, but only in their simplest forms. The real situation requires a
somewhat more complex approach. For example, Locale.current returns a
Locale::Tag, which has its own #charset, so
Locale::Driver::Env#locale should redefine the charset value using the
values of LC_CTYPE or LANG (I'm talking about the current state and my
diff above).

@kou
Member

kou commented Sep 17, 2013

Thanks for confirming it. But I'm still confused...

Do you know a reference for the desired (expected) locale-related behavior? locale(7)? I'd like a specification for locale behavior, like the W3C's XML specification for XML.

Does http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=690572 show the behavior expected by the specification? Or does it just show the behavior the bug reporter wants?

Once I understand the ideal behavior, I can fix the current behavior.

@375gnu
Author

375gnu commented Sep 17, 2013

On 9/17/13, Kouhei Sutou wrote:

Do you know a reference for the desired (expected) locale related behavior?
locale(7)? I want the specification about locale like the XML specification
from W3C for XML.

POSIX:
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html
GNU Libc:
https://www.gnu.org/software/libc/manual/html_node/Locale-Categories.html
GNU Gettext (because ruby-locale handles LANGUAGE):
https://www.gnu.org/software/gettext/manual/html_node/Locale-Environment-Variables.html
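
As a rough illustration of what the gettext reference adds on top of POSIX: LANGUAGE is a colon-separated priority list that applies to message lookup only and is ignored when the messages locale is "C". The sketch below assumes that reading of the manual; `message_languages` is a hypothetical name and this is not how the locale gem implements it.

```ruby
# Hypothetical helper, not part of the locale gem or of gettext.
# Returns the ordered list of locale names to try for message lookup.
def message_languages(env = ENV)
  # Messages locale by the usual precedence: LC_ALL, LC_MESSAGES, LANG.
  base = %w[LC_ALL LC_MESSAGES LANG]
         .map { |name| env[name] }
         .find { |value| value && !value.empty? } || "C" # assumed default

  # LANGUAGE is ignored when the messages locale is the C/POSIX locale.
  return [base] if %w[C POSIX].include?(base)

  preferred = env["LANGUAGE"].to_s.split(":").reject(&:empty?)
  preferred.empty? ? [base] : preferred
end

p message_languages("LANG" => "de_DE.UTF-8", "LANGUAGE" => "sv:de")
p message_languages("LANG" => "C", "LANGUAGE" => "sv:de")
```

Note that LANGUAGE only affects which message catalogs are tried; it says nothing about the charset, which still comes from LC_CTYPE.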

Does http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=690572 show the
behavior expected by the specification? Or does it just show a behavior by
the bug reporter?

It is the behavior expected by the specification.

@kou
Member

kou commented Sep 17, 2013

Thanks!
I'll read them!

@kou closed this as completed in 6916573 Sep 19, 2013
@kou
Member

kou commented Sep 19, 2013

I've pushed a fix for the problem. Could you try it?

@375gnu
Author

375gnu commented Sep 19, 2013

On 9/19/13, Kouhei Sutou wrote:

I've pushed fix of the problem. Could you try it?

Good:
ruby -Ilib -rlocale -e 'puts Locale.current.charset'
UTF-8

Bad:
LC_CTYPE=C ruby -Ilib -rlocale -e 'puts Locale.current.charset'
UTF-8

Locale::Driver::Env.charset is not called every time you call
something.charset, because that something may have its own
charset method, as Locale::Tag does.

Please see my branch test
(375gnu/locale@master...test). It contains
the code that redefines charset (using values from LC_CTYPE or
LANG).
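
The idea of redefining the tag's charset can be sketched without the gem. Everything here is hypothetical: `SimpleTag` stands in for Locale::Tag, the helper names are invented, and returning nil for a locale without a codeset (such as "C") is an assumption, not what the test branch or the eventual fix does.

```ruby
# Hypothetical stand-in for Locale::Tag: a tag that carries its own charset.
SimpleTag = Struct.new(:language, :region, :charset)

# First non-empty value among the named variables, or "C" (assumed default).
def first_set(env, *names)
  names.map { |name| env[name] }.find { |v| v && !v.empty? } || "C"
end

# Split "en_US.UTF-8@modifier" into language, region and charset.
def parse_tag(name)
  m = name.match(/\A([A-Za-z]+)(?:_([A-Za-z0-9]+))?(?:\.([^@]+))?/)
  return SimpleTag.new(name, nil, nil) unless m
  SimpleTag.new(m[1], m[2], m[3])
end

# Language and region come from the messages category, but the charset is
# overwritten with the one from the LC_CTYPE category.
def current_tag(env = ENV)
  tag = parse_tag(first_set(env, "LC_ALL", "LC_MESSAGES", "LANG"))
  ctype = parse_tag(first_set(env, "LC_ALL", "LC_CTYPE", "LANG"))
  tag.charset = ctype.charset
  tag
end

env = { "LANG" => "en_US.UTF-8", "LC_CTYPE" => "C" }
p parse_tag(env["LANG"]).charset # "UTF-8": charset taken from the tag's own name
p current_tag(env).charset       # nil: LC_CTYPE=C carries no codeset
```

Without the override in `current_tag`, the tag keeps the charset parsed from its own name, which matches the "Bad" output above.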

@kou
Member

kou commented Sep 19, 2013

For the bad case, why is Locale.current.charset used instead of Locale.charset?
gettext gem uses Locale.charset for output encoding: https://github.com/ruby-gettext/gettext/blob/master/lib/gettext/text_domain.rb#L166

What use case do you assume?

@kou
Member

kou commented Sep 19, 2013

I forgot to mention...
Thanks for confirming it!

@375gnu
Author

375gnu commented Sep 19, 2013

On 9/19/13, Kouhei Sutou wrote:

For bad case, why is Locale.current.charset used instead of
Locale.charset?
gettext gem uses Locale.charset for output encoding:
https://github.com/ruby-gettext/gettext/blob/master/lib/gettext/text_domain.rb#L166

What use case do you assume?

Maybe I (and not only I ;)) have misunderstood the ruby-locale API.

If in most cases the correct way of using it is just
Locale.charset instead of Locale.current.charset, then your patch works
well.

@375gnu
Author

375gnu commented Sep 19, 2013

On 9/19/13, Hleb Valoshka wrote:

Maybe I (and not only I ;)) have misunderstood the ruby-locale API.

As an example, sup-mail uses Locale.current.charset to guess the current
charset: https://github.com/sup-heliotrope/sup/blob/develop/lib/sup.rb#L318

kou added a commit that referenced this issue Sep 20, 2013
Because existing libraries such as sup-mail use the API.

GitHub: fix #2

Debian Bug: #690572

Reported by Stefano Zacchiroli. Thanks!!!
Reported by Hleb Valoshka. Thanks!!!
@kou
Member

kou commented Sep 20, 2013

As an example, sup-mail uses Locale.current.charset to guess the current charset: https://github.com/sup-heliotrope/sup/blob/develop/lib/sup.rb#L318

Oh...
OK. Now, Locale.current.charset uses LC_CTYPE. Please try again.

@375gnu
Author

375gnu commented Sep 24, 2013

On 9/20/13, Kouhei Sutou wrote:

OK. Now, Locale.current.charset uses LC_CTYPE. Please try again.

I've tried. It works as expected. Thanks!

@kou
Member

kou commented Sep 25, 2013

Thanks for confirming it!
I'll release a new version in a few weeks.

@kou
Member

kou commented Sep 29, 2013

Ah, sorry.
The feature was already included in the current release (2.0.9).
I don't need to release a new version. :D
