Better i18n with gettext: use class-based API #2706

Open
carmenbianca opened this issue Apr 16, 2024 · 3 comments

Comments

@carmenbianca

carmenbianca commented Apr 16, 2024

Hi lovely Click maintainers,

Currently, Click implements gettext using the classic GNU gettext API. That looks like this:

from gettext import gettext as _

print(_("Hello, world!"))

This API depends on a global state in the gettext module. By calling gettext.textdomain(), the active translation domain is changed for all Python modules that use the classic GNU gettext API.

This side effect is usually desirable, except when your module is imported by another module as a library. So you usually don't want to call gettext.textdomain() without putting it behind some sort of function call. With argparse, this is easy: put it in your main function before you even create the ArgumentParser object. With Click, I'm not sure this is possible:

  • Your main group/command is decorated with all manner of Click magic and its contents may not actually be run (e.g. --help).
  • There is no way to pre-hook a group/command with a function call that I am aware of.

So you end up having to call gettext.textdomain() on import of your module containing your Click groups/commands.
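A minimal sketch of that side effect, using only the standard library. The module name `your_module_cli` and the host domain `host_application` are hypothetical, and executing the source into a fresh module object stands in for a real `import` so that the example fits in one file:

```python
import gettext
import types

# Hypothetical source of a consumer's CLI module that configures gettext
# at import time, as described above.
CLI_MODULE_SOURCE = """
import gettext

gettext.bindtextdomain("your_module", "path/to/translations")
gettext.textdomain("your_module")
"""


def import_cli_module() -> types.ModuleType:
    """Simulate `import your_module_cli` by running its source in a new module."""
    module = types.ModuleType("your_module_cli")
    exec(CLI_MODULE_SOURCE, module.__dict__)
    return module


# The host application selects its own domain first...
gettext.textdomain("host_application")
domain_before = gettext.textdomain()  # called without arguments, it only reads

# ...then merely importing the CLI module replaces it for the whole process.
import_cli_module()
domain_after = gettext.textdomain()

print(domain_before, "->", domain_after)  # host_application -> your_module
```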

We can fix that by switching to the class-based API. Because Click will still need to support the old API for backwards compatibility, my proposal looks roughly like this. Create a module click.i18n with the following contents (simplified):

import gettext as _gettext_module

TRANSLATIONS = None

def gettext(message):
    if TRANSLATIONS is None:
        return _gettext_module.gettext(message)
    return TRANSLATIONS.gettext(message)

# alias
_ = gettext

Now, elsewhere in Click, you replace every from gettext import gettext as _ with from .i18n import _.

Subsequently, we can create a function install_translations(translations) in i18n.py that replaces the TRANSLATIONS global variable with an instantiated GNUTranslations object. This function would still need to be called before the consumer's main function, but it would not change the global state of the gettext module, only Click's. As far as perfectionism goes, that is probably tolerable. A pre-hook would be better still, but this is fine.

Furthermore, the consumer could use different domains for Click's TRANSLATIONS object and their own, allowing them to separate their own translations from Click's, and hypothetically reuse the Click translations in other projects.

In fact, having done this plumbing, Click could even ship its own translation strings, eliminating the duplicated effort of translating the same Click strings in every downstream project. Click's own translations could then be activated using e.g. install_click_translations() without any arguments.

In summary, the problems solved by this:

  • Importing the Click consumer's module no longer changes the global state of the gettext module.
  • Users can separate their own module's translations from Click's.
  • Click would be able to provide its own translations, reducing duplicated effort.

I am not aware of other ways to achieve the above that do not require changes to Click. Adding a pre-hook to groups/commands might partially address the problem.

I am willing to make a PR if this issue is validated.


I wrote a blog post here that provides more context on how I use gettext + Click (+ some other components). It has more context than is necessary to understand this issue.

@carmenbianca
Author

Hi Click maintainers, I am still ready to help with this issue.

@davidism
Member

I'm having some trouble following this, although I think the general idea is "use new gettext local provider instead of global provider"? Is using a "library global" TRANSLATIONS variable and falling back to "gettext global" if it's not set a standard pattern for translations?

I am not aware of other ways to achieve the above that do not require changes to Click.

If we changed Click in some way, would that make the implementation easier or better? I'm open to hearing what changes might be needed.

@carmenbianca
Author

carmenbianca commented Oct 22, 2024

Hi @davidism! I will explain the full context. It's a long answer; a summary and answers to your questions are at the end.

The classic GNU gettext API depends on a global state in the Python gettext library. If you call gettext.gettext("Hello, world!") (equivalent to _("Hello, world!") with the usual alias), then gettext has no idea where to get a translation for that string. So before you ever run gettext.gettext() in code, you have to tell the library where to find the translations for your strings. You do this by running this snippet (slightly simplified, but entirely correct):

import gettext

# The translations located at 'path/to/translations' now have the domain
# (read: alias) 'your_module'.
gettext.bindtextdomain("your_module", "path/to/translations")
# Activate 'your_module' as the currently used domain. Henceforth, when
# `gettext.gettext()` is called, it tries to find the translation in
# this domain. It picks the language from the user's environment
# (the LANGUAGE, LC_ALL, LC_MESSAGES and LANG variables).
gettext.textdomain("your_module")

(As an aside: You can have multiple domains sourced from different paths, BUT you have to make very sure to constantly call gettext.textdomain() to switch context at the right times.)

Now for Click in particular, the tricky bit is to call gettext.textdomain() at the right time. Important context: I have included all of Click's strings and their translations in my 'path/to/translations'.

So let's say I have this code:

import gettext
from gettext import gettext as _

import click

# Can't wrap docstrings in `_()`, so do this here.
_HELP = _("...")

@click.group(name="your_module", help=_HELP)
def main():
    gettext.bindtextdomain("your_module", "path/to/translations")
    gettext.textdomain("your_module")

If I now run your_module --help, three things (don't) happen:

  • _HELP is not translated, because gettext.gettext() was called BEFORE gettext.textdomain().
  • Click's strings, such as "Show this message and exit." next to --help in the output, are not translated, because Click does its work BEFORE running the main function.
  • In fact, main is not even run here: --help is an eager option, so Click prints the help text and exits while it is still parsing the arguments.
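The last point can be checked with Click's own test runner. This sketch assumes Click 8 is installed; `calls` is a hypothetical list used only to record whether `main` ran:

```python
import click
from click.testing import CliRunner

calls = []  # records whether the group callback ever runs


@click.group(name="your_module", help="...")
def main() -> None:
    # The gettext setup from the example above would live here.
    calls.append("main")


result = CliRunner().invoke(main, ["--help"])

print(result.exit_code)  # 0
print(calls)  # [] because --help exits during argument parsing
```

Any gettext setup placed inside `main` therefore never runs before Click renders its own `--help` output.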

So we are forced to move the gettext.textdomain() call before all of that. This is fine, kind of, but also unfortunate: importing the module that contains the main function now changes the global state of the gettext module. Imagine someone importing your_module after configuring gettext.textdomain() for their own application; the import silently replaces their domain, and their global gettext state is now wrong.

If we keep the classic API, then the following pseudocode might help to alleviate those problems:

def setup_gettext():
    gettext.bindtextdomain("your_module", "path/to/translations")
    gettext.textdomain("your_module")

@click.group(
    name="your_module",
    # We assume that evaluating this lambda is delayed until AFTER
    # the prehook is run.
    help=lambda: _("..."),
    prehook=setup_gettext,  # proposed parameter; Click has no such option today
)
def main():
    pass

Here, prehook is run before everything else in Click. This means that the Click strings will be correctly translated, and if help is adapted to accept a callable, our help string will also be correctly translated.

Implementing this is more effort than the alternative, though.

The class-based Python gettext API does not store any global state. Instead, all of the necessary state is placed in a GNUTranslations object. It looks like this:

import gettext
from gettext import GNUTranslations

# Put the state in the object. "your_module" is the domain: it names the
# catalog to load, i.e. path/to/translations/<language>/LC_MESSAGES/your_module.mo.
TRANSLATIONS: GNUTranslations = gettext.translation("your_module", "path/to/translations")

# Instead of globally activating "your_module" as the gettext domain, just
# ask the object to translate stuff.
print(TRANSLATIONS.gettext("Hello, world!"))

Now obviously, this GNUTranslations object needs to be instantiated somewhere. If we instantiate it in the Click library itself, then we have a problem: which directory does Click get its translation strings from? Click ships no translations. Users of Click also already have their own translations of the Click strings that they probably want to keep using, and they already rely on the gettext.textdomain() call, which would stop working if Click switched wholesale to the class-based API.

To keep compatibility, and to offload the need to translate strings downstream, I proposed the following code in click.i18n:

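from __future__ import annotations  # allows `X | None` on Python < 3.10

import gettext as _gettext_module

TRANSLATIONS: _gettext_module.GNUTranslations | None = None

def gettext(message: str) -> str:
    # Fall back to the classic, global GNU gettext API if nothing was set.
    if TRANSLATIONS is None:
        return _gettext_module.gettext(message)
    return TRANSLATIONS.gettext(message)

# alias
_ = gettext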

If the rest of the Click library then does from .i18n import _ instead of from gettext import gettext as _, the following happens:

  • If the user does nothing, then the translations will simply use the classic GNU gettext API, and the user will have to run gettext.textdomain() to get any use out of it.
  • If the user populates click.i18n.TRANSLATIONS with some object, then the classic GNU gettext API is ignored, and all translations are sourced from that object.

So using the prior example, that looks like this:

import gettext

import click.i18n  # the proposed module; it does not exist in Click today

click.i18n.TRANSLATIONS = gettext.translation("click", "path/to/click/translations")
MY_TRANSLATIONS = gettext.translation("your_module", "path/to/my/translations")

_HELP = MY_TRANSLATIONS.gettext("...")

@click.group(name="your_module", help=_HELP)
def main():
    pass

Click gets its translations from its own object, your_module has its own separate translations, and everything is great and Just Works.
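To make this concrete without any files on disk, the sketch below builds two tiny in-memory catalogs and loads each into its own `GNUTranslations` object. `build_mo()` is a hypothetical helper that writes the minimal little-endian `.mo` layout (a seven-integer header, two offset tables, NUL-terminated strings); real projects would compile `.po` files with `msgfmt` instead. The German strings are illustrative only:

```python
from __future__ import annotations

import gettext
import struct
from io import BytesIO


def build_mo(catalog: dict[str, str]) -> bytes:
    """Hypothetical helper: encode {msgid: msgstr} as a minimal .mo file."""
    # The empty msgid carries the header that declares the charset.
    entries = {"": "Content-Type: text/plain; charset=UTF-8\n", **catalog}
    keys = sorted(entries)
    ids = [k.encode("utf-8") for k in keys]
    strs = [entries[k].encode("utf-8") for k in keys]
    n = len(keys)
    orig_table = 7 * 4  # the header is seven 32-bit integers
    trans_table = orig_table + n * 8  # each table entry is (length, offset)
    data_start = trans_table + n * 8
    tables, data = b"", b""
    for s in ids + strs:
        tables += struct.pack("<2I", len(s), data_start + len(data))
        data += s + b"\0"
    header = struct.pack("<7I", 0x950412DE, 0, n, orig_table, trans_table, 0, 0)
    return header + tables + data


# Separate objects for Click's strings and the application's own strings.
click_translations = gettext.GNUTranslations(
    BytesIO(build_mo({"Show this message and exit.": "Diese Nachricht anzeigen und beenden."}))
)
my_translations = gettext.GNUTranslations(
    BytesIO(build_mo({"Hello, world!": "Hallo, Welt!"}))
)

gettext.textdomain("host_application")  # some unrelated global setting

print(click_translations.gettext("Show this message and exit."))
print(my_translations.gettext("Hello, world!"))  # Hallo, Welt!
# Not in this catalog, so it is returned unchanged.
print(my_translations.gettext("Show this message and exit."))
print(gettext.textdomain())  # host_application: global state untouched
```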

Manually assigning an object to click.i18n.TRANSLATIONS isn't super amazing, though, so you could envision creating a convenience function click.i18n.install_translations(obj: GNUTranslations) that does this for you. TRANSLATIONS could then become a private global 'constant'.

In fact, once this is set up, Click could even begin shipping its own translations, to reduce the duplicated efforts downstream. Because the API is class-based, you don't constantly have to call gettext.textdomain() to swap between active domains. This might look a little like this in click.i18n:

from __future__ import annotations

import gettext
import os
from gettext import GNUTranslations

_TRANSLATIONS: GNUTranslations | None = None

def install_translations(translations: GNUTranslations | None = None) -> GNUTranslations:
    # Without `global`, the assignment below would only create a local variable.
    global _TRANSLATIONS
    if translations is None:
        translations = gettext.translation(
            "click",
            # resolves to `click/locale`, wherever `click` is installed.
            # There would need to be valid translations in this directory,
            # obviously.
            os.path.join(os.path.dirname(__file__), "locale"),
        )
    _TRANSLATIONS = translations
    return translations
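One detail worth checking in any version of this: `gettext.translation()` raises `FileNotFoundError` when it finds no catalog for the user's language, so a bundled-translations installer might prefer `fallback=True`, which returns a `NullTranslations` object that hands messages back unchanged. A short standard-library sketch, using an empty temporary directory as a stand-in for `click/locale`:

```python
import gettext
import tempfile

with tempfile.TemporaryDirectory() as empty_locale_dir:
    # Strict lookup: there is no click.mo for any language in this directory.
    try:
        gettext.translation("click", empty_locale_dir)
        strict_outcome = "loaded"
    except FileNotFoundError:
        strict_outcome = "FileNotFoundError"

    # Lenient lookup: falls back to an identity "translation".
    lenient = gettext.translation("click", empty_locale_dir, fallback=True)

print(strict_outcome)  # FileNotFoundError
print(type(lenient).__name__)  # NullTranslations
print(lenient.gettext("Show this message and exit."))  # returned unchanged
```

Whether a missing catalog should fail loudly or fall back silently is a design choice the proposal leaves open; the fallback variant keeps `install_translations()` safe to call for users whose language has no translation yet.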

In summary:

  • Using the class-based API gets rid of unwanted global state that may cause bugs.
  • There must still be compatibility with the classic API, however.
  • Using the class-based API allows Click to ship its own translations.

To answer your questions precisely:

I think the general idea is "use new gettext local provider instead of global provider"?

Both, for backwards-compatibility reasons. The global provider is the fallback if nothing is done by the user, which matches the status quo.

Is using a "library global" TRANSLATIONS variable and falling back to "gettext global" if it's not set a standard pattern for translations?

No. Common practice is this:

  • If a library provides no translations itself, it uses the classic API, and depends on downstream to provide translations somehow. (current Click)
  • If a library provides its own translations, it uses the "library-global" TRANSLATIONS object via the class-based API. This will Just Work™ for users of the library, without them needing to do anything.

Click is unique here because backwards compatibility is desirable (I think; maybe I'm mistaken), and because downstream may want to ship their own translations.

If we changed Click in some way, would that make the implementation easier or better?

I described the prehook idea above, but that workaround is not needed if Click uses the class-based API.


I hope this helps! Thanks for your maintainer work.
