Merge pull request #103 from janezd/update-multilingual-doc
Add documentation for multilingual mode
janezd authored Dec 26, 2024
2 parents fa857ae + 4bd9fdf commit 167e6db
Showing 9 changed files with 146 additions and 22 deletions.
6 changes: 3 additions & 3 deletions docs/command-line.md
@@ -10,7 +10,7 @@ Trubar is invoked by
: Prints help and exits.

`--conf <conf-file>`
: Specifies the [configuration file](../configuration). If not given, Trubar searches for `.trubarconfig.yaml` and `trubar-config.yaml` in current directory, directory with messages, and in source directory (for `collect` and `translate`).
: Specifies the [configuration file](configuration.md). If not given, Trubar searches for `.trubarconfig.yaml` and `trubar-config.yaml` in current directory, directory with messages, and in source directory (for `collect` and `translate`).

Action must be one of the following:

@@ -29,7 +29,7 @@ trubar collect [-h] [-p pattern] [-r removed-translations] [-q] [-n]
-s source-dir messages
```

Collects strings from the specified source tree, skipping files that don't end with `.py` or whose path includes `tests/test_`. (The latter can be changed in [configuration file](../configuration).) Strings with no effect are ignored; this is aimed at docstrings, but will also skip any other unused strings.
Collects strings from the specified source tree, skipping files that don't end with `.py` or whose path includes `tests/test_`. (The latter can be changed in [configuration file](configuration.md).) Strings with no effect are ignored; this is aimed at docstrings, but will also skip any other unused strings.

If the output file already exists, it is updated: new messages are merged into it, existing translations are kept, and obsolete messages are removed. The latter can be recorded using the option `-r`.

@@ -81,7 +81,7 @@ Translates files with extension .py and writes them to destination directories,
: A pattern that the file path must include to be considered.

`--static <static-files-path>`
: Copies the file from the given path into destination tree; essentially `cp -R <static-files-path> <dest-path>/<static-file-path>`. This is used, for instance, for [adding modules with target-language related features](../localization/#plural-forms), like those for plural forms. This option can be given multiple times. If given, this argument overrides `static-files` from config file.
: Copies the file from the given path into destination tree; essentially `cp -R <static-files-path> <dest-path>/<static-file-path>`. This is used, for instance, for [adding modules with target-language related features](localization.md/#plural-forms), like those for plural forms. This option can be given multiple times. If given, this argument overrides `static-files` from config file.

`-q`, `--quiet`
: Suppresses output messages, except for critical ones. Overrides option `-v`.
42 changes: 41 additions & 1 deletion docs/configuration.md
@@ -1,11 +1,13 @@
## Configuration file

Trubar can be configured to replace messages with translations (single-language setup) or with lookups into tables of messages (multilingual setup).

Configuration file is a yaml file with options-value pairs, for instance

```
smart-quotes: false
auto-prefix: true
auto-import: "from orangecanvas.utils.localization.si import plsi, plsi_sz"
auto-import: "from orangecanvas.localization.si import plsi, plsi_sz"
```

If configuration is not specified, Trubar looks for `.trubarconfig.yaml` and `trubar-config.yaml`, respectively, first in the current working directory, then in the directory with the message file, and then in the source directory, as specified by the `-s` argument (only for `collect` and `translate`).
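
The search order described above can be sketched roughly as follows. This is not Trubar's actual code: the function `find_config` and its arguments are hypothetical, and the sketch assumes that both file names are tried in one directory before moving on to the next.

```python
import os

CONFIG_NAMES = (".trubarconfig.yaml", "trubar-config.yaml")


def find_config(messages_file, source_dir=None, exists=os.path.exists):
    """Return the first configuration file found, or None (illustrative only)."""
    directories = [os.getcwd(),
                   os.path.dirname(os.path.abspath(messages_file))]
    if source_dir is not None:  # only `collect` and `translate` take -s
        directories.append(source_dir)
    for directory in directories:
        for name in CONFIG_NAMES:
            candidate = os.path.join(directory, name)
            if exists(candidate):
                return candidate
    return None
```

The `exists` argument is there only so that the lookup can be tried out without touching the file system.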
@@ -29,3 +31,41 @@ The available options are

`encoding` (default: `"utf-8"`)
: Character encoding for .jaml files, such as `"utf-8"` or `"cp-1252"`.

### Multilingual setup

In a multilingual setup, the configuration file includes a `languages` section. Each language is specified by a key, which is the language code, and a dictionary of options. The options are:

`name` (required)
: The native name of the language. Put into the table of messages at index 0.

`international-name` (required)
: The international name of the language. Put into the table of messages at index 1.

`auto-import` (default: none)
: Same as `auto-import` in single-language setup, but for the specific language. This text (if any) is added to other auto imports (if any).

`original` (default: false)
: If set to `true`, the language is considered the original language of the source code.
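
How language-specific imports are added to the global one can be illustrated with a small sketch. It works on an already parsed configuration (a plain dictionary) and is not Trubar's implementation; `collect_auto_imports` is a hypothetical helper and the global `import grain` is just a placeholder.

```python
def collect_auto_imports(config):
    """Gather the global and all language-specific auto-import texts."""
    imports = []
    if config.get("auto-import"):
        imports.append(config["auto-import"])
    for options in config.get("languages", {}).values():
        if options.get("auto-import"):
            imports.append(options["auto-import"])
    return imports


config = {
    "auto-import": "import grain",
    "languages": {
        "en": {"name": "English", "original": True},
        "si": {
            "name": "Slovenščina",
            "international-name": "Slovenian",
            "auto-import": "from orangecanvas.localization.si import plsi",
        },
    },
}
print(collect_auto_imports(config))
```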

### Example

This is a multilingual setup for two languages that is used in Orange at the time of writing this document.

```yaml
languages:
  en:
    name: English
    original: true
  si:
    name: Slovenščina
    international-name: Slovenian
    auto-import: from orangecanvas.localization.si import plsi, plsi_sz, z_besedo # pylint: disable=wrong-import-order
auto-import: |2
  from orangecanvas.localization import Translator # pylint: disable=wrong-import-order
  _tr = Translator("Orange", "biolab.si", "Orange")
  del Translator
encoding: "utf-8"
```
For more on auto-imports, see the section on [multilingual use](multilingual.md).
12 changes: 6 additions & 6 deletions docs/getting-started.md
@@ -52,7 +52,7 @@ Note that, unlike in the gettext framework, messages are not "marked" for transl

### Collecting messages

To collect all strings in the project, use [collect](../command-line/#collect).
To collect all strings in the project, use [collect](command-line.md/#collect).

```
trubar collect -s code/sample sample.jaml
@@ -76,7 +76,7 @@ farm/pigs.py:
{self.n} little pigs went for a walk: null
```

See the section about [Message files](../message-files) for details about the file format.
See the section about [Message files](message-files.md) for details about the file format.

### Translating messages

@@ -97,16 +97,16 @@ farm/pigs.py:
{self.n} little pigs went for a walk: {self.n} prašičkov se je šlo sprehajat.
```

We translated `__main__` as `false`, which indicates that this string must not be translated. Other options are explained [later](../message-files/#translations).
We translated `__main__` as `false`, which indicates that this string must not be translated. Other options are explained [later](message-files.md#translations).

### Applying translations

In most scenarios, we first need to prepare a copy of the entire project, because Trubar will only copy the files within its scan range. Suppose that `../project_copy` contains such a copy.
In the simplest scenario, we want to produce new sources in which the original strings are replaced by translations. We first prepare a copy of the entire project because Trubar will only copy the files within the scanned directories. Suppose that `../project_copy` contains such a copy.

Now run [translate](../command-line/#translate).
Now run [translate](command-line.md#translate).

```
trubar translate -s code/sample -d ../project_copy/code/sample sample.jaml
```

That's it.
In most cases, we want to also add a [configuration file](configuration.md).
4 changes: 2 additions & 2 deletions docs/index.md
@@ -1,10 +1,10 @@
# Trubar

A tool for translation and localization of Python programs via modification of source files.
A tool for translation and localization of Python programs via modification of source files. It replaces the original strings either with translations, to produce sources in a different language, or with lookups into a table of translations.

Trubar supports f-strings and does not require any changes to the original source code, such as marking strings for translation.

See [Getting Started](getting-started) for a simple introduction.
See [Getting Started](getting-started.md) for a simple introduction.

## Installation

8 changes: 4 additions & 4 deletions docs/localization.md
@@ -11,7 +11,7 @@ If the original source contains an f-string, Trubar will keep the f-prefix in tr
If the original string is not an f-string but the translation contains braces and prefixing this string with f- makes it a syntactically valid f-string, Trubar will add an f-prefix unless:

- the original string already included braces (so this may be a pattern for `str.format`)
- or this behaviour is explicitly disabled in [configuration](../configuration) by setting `auto-prefix: false`.
- or this behaviour is explicitly disabled in [configuration](configuration.md) by setting `auto-prefix: false`.
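
The rule above can be approximated with the following sketch. It only mirrors the conditions listed here and is not Trubar's code; the function `would_add_f_prefix` is hypothetical, and checking validity with `ast.parse` is an assumption about how "syntactically valid" could be tested.

```python
import ast


def would_add_f_prefix(original, translation, auto_prefix=True):
    """Decide whether a non-f-string original gets an f-prefixed translation."""
    if not auto_prefix:
        return False  # disabled with `auto-prefix: false`
    if "{" in original or "}" in original:
        return False  # may be a pattern for str.format
    if "{" not in translation:
        return False  # nothing to interpolate
    try:
        # Build the source of an f-string literal and check that it parses
        ast.parse("f" + repr(translation), mode="eval")
    except SyntaxError:
        return False
    return True


print(would_add_f_prefix("Three little pigs", "{n} prašičkov"))
print(would_add_f_prefix("{} little pigs", "{n} prašičkov"))
```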


### Plural forms
@@ -117,12 +117,12 @@ The same mechanism can be used for other language quirks.

The above examples require importing the localization functions, such as `plsi` and `plsi_sz`.

First, the translated sources must include the necessary module, which does not exist in the original source. To this end, we need to prepare a directory with static files. In our case, we can have a directory named, for instance `si-local`, containing `si-local/utils/localization/__init__.py`. When translating, we instruct Trubar to copy this into translated source tree by adding an option `--static si-local` to the [`translate` action](../command-line/#translate).
First, the translated sources must include the necessary module, which does not exist in the original source. To this end, we need to prepare a directory with static files. In our case, we can have a directory named, for instance `si-local`, containing `si-local/utils/localization/__init__.py`. When translating, we instruct Trubar to copy this into translated source tree by adding an option `--static si-local` to the [`translate` action](command-line.md#translate).

Second, all translated source files must include the necessary import. We do this using a directive in [configuration file](../configuration):
Second, all translated source files must include the necessary import. We do this using a directive in [configuration file](configuration.md):

```
auto-import: "from orangecanvas.utils.localization.si import plsi, plsi_sz"
auto-import: "from orangecanvas.localization.si import plsi, plsi_sz"
```

Trubar will prepend this line to the beginning of all files with any translations.
4 changes: 2 additions & 2 deletions docs/message-files.md
@@ -11,7 +11,7 @@ The first-level keys are file names, with their paths relative to the root passe
Lower levels have keys that

- start with `def` or `class`, and are followed by a subtree that starts in the next line,
- or represents a potentially translatable string, followed by the translation in that same line (except when using [blocks](#blocks)).
- or represents a potentially translatable string, followed by the translation in that same line.

There is no indication about whether a string is an f-string or not, neither does it show what kind of quotes are used in the source, because none of this matters.

@@ -26,7 +26,7 @@ Translator can treat a string in approximately three ways.
- Mark it with `true` if the string could be translated but doesn't need to be for this particular language or culture. A common example would be symbols like `"©️"`.
- Leave it `null` until (s)he figures out what to do with it.

The difference between `true` and `false` is important only when using this translation to [prepare templates](../scenarios/#preparing-templates) for translations into other languages.
The difference between `true` and `false` is important only when using this translation to [prepare templates](scenarios.md#preparing-templates) for translations into other languages.

### Comments

82 changes: 82 additions & 0 deletions docs/multilingual.md
@@ -0,0 +1,82 @@
## Setup for multilingual use

Implementing a multilingual setup requires understanding of the code produced by Trubar.

In single-language setup, strings are replaced by translated strings and f-strings pose no problem. Multilingual setup uses string tables for different languages. F-strings cannot be stored in such tables because they are syntactic elements and not Python objects. Instead, Trubar stores a string that contains an f-string. When the string needs to be used, it compiles and evaluates it in the local context.
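
The mechanism can be tried out in isolation. The snippet below is a minimal sketch, not code produced by Trubar; the names `messages` and `describe` are made up for illustration.

```python
# A message table holding the *source text* of an f-string, not an f-string.
messages = ['f" ({perc:.1f} % missing data)"']


def describe(perc):
    code = compile(messages[0], "<string>", "eval")
    # Without explicit namespaces, eval uses the globals and locals of the
    # code that calls it, so `perc` is taken from this function's arguments.
    return eval(code)


print(describe(12.345))
```

Because the stored text is compiled only when needed, the same table entry can be swapped for its translation without touching the calling code.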

### A slightly simplified example

The following example is based on the Orange code base. A similar setup would be used in other projects.

We first need to understand how Trubar modifies the sources in multilingual mode.

- A string `"Data Table"` is replaced by `_tr.m[1651]`. Neither the original string nor any of its translations are f-strings, so the string is replaced by a lookup; the element at index 1651 in the English message table is `"Data Table"` and in the Slovenian table it is `"Tabela s podatki"`. We will say more later about where `_tr` comes from and what it contains.
- A string `f" ({perc:.1f} % missing data)"` is replaced by `_tr.e(_tr.c(1717))`. The string at index 1717 in the English message table is `"f\" ({perc:.1f} % missing data)\""` and the Slovenian translation is `"f\" ({perc:.1f} % manjkajočih podatkov)\""`. Note that this is not an f-string but a string that contains an f-string.

For this to work, the `_tr` must be an object with the following attributes:

- `m` is a list of strings, where the index corresponds to the index in the message table.
- `e` is a function that evaluates a string; in short, `e` is `eval`.
- `c` is a function that compiles a string at the given index; in short, `c` is `compile`.

Trubar provides neither `_tr` nor its methods, and it doesn't import it because this is application specific. Orange's configuration for Trubar has an auto-import directive that inserts the following lines into each source file:

```python
from orangecanvas.localization import Translator # pylint: disable=wrong-import-order
_tr = Translator()
del Translator
```
Other applications would import a similar class from another location and use different arguments for its constructor. The end result must be an object named `_tr` with the required methods.

The `Translator` class looks roughly like this:

```python
import json

class Translator:
    def __init__(self):
        path = "i18n/slovenian.jaml" # Replace this with the actual path
        with open(path) as handle:
            # Note that the actual code is somewhat more complex; see below
            self.m = json.load(handle)

    e = eval

    def c(self, idx, *_):
        return compile(self.m[idx], '<string>', 'eval')
```

In Orange, the `Translator`'s constructor requires Qt-related arguments, so the code from auto-import is actually

```python
_tr = Translator("Orange", "biolab.si", "Orange")
```

and the constructor uses these arguments to retrieve the current language from the settings, locate the appropriate file, and read it into `self.m`.

### The actual code

The above description is simplified for clarity. Trubar doesn't replace `"Data Table"` by `_tr.m[1651]` but by `_tr.m[1651, "Data Table"]`; similarly for f-strings. The second index, `"Data Table"`, is not used and is there only as a comment for any developers checking the translated sources. `Translator` doesn't load the message table with

```python
self.m = json.load(handle)
```

but wraps the list into a class `_list`:

```python
self.m = _list(json.load(handle))
```

where `_list` is

```python
class _list(list):
    # Accept extra argument to allow for the original string
    def __getitem__(self, item):
        if isinstance(item, tuple):
            item = item[0]
        return super().__getitem__(item)
```
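
A short check of how such a list behaves, with the class repeated so that the snippet runs on its own; the two-element table is made up for illustration.

```python
class _list(list):
    # Accept extra argument to allow for the original string
    def __getitem__(self, item):
        if isinstance(item, tuple):
            item = item[0]
        return super().__getitem__(item)


m = _list(["Tabela", "Tabela s podatki"])
print(m[1])                # ordinary indexing still works
print(m[1, "Data Table"])  # the second element of the index is ignored
```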

Note again that Trubar doesn't provide this code, but your application would probably use similar code. Find the complete example at [Orange Canvas Core's GitHub](https://github.com/biolab/orange-canvas-core/blob/master/orangecanvas/localization/__init__.py).
2 changes: 2 additions & 0 deletions mkdocs.yml
@@ -5,9 +5,11 @@ markdown_extensions:
- markdown.extensions.def_list

nav:
- Home: index.md
- Getting Started: getting-started.md
- Message Files: message-files.md
- Common Scenarios: scenarios.md
- Multilingual Use: multilingual.md
- Command Line: command-line.md
- Configuration: configuration.md
- Localization Issues: localization.md
8 changes: 4 additions & 4 deletions trubar/tests/test_config.py
@@ -100,15 +100,15 @@ def test_languages(self):
si:
name: Slovenščina
international-name: Slovenian
auto-import: from orangecanvas.utils.localization.si import plsi
auto-import: from orangecanvas.localization.si import plsi
en:
name: English
original: true
ua:
international-name: Ukrainian
auto-import: import grain
name: Українська
auto-import: from orangecanvas.utils.localization import pl
auto-import: from orangecanvas.localization import pl
""")
config = Configuration()
with patch("os.path.exists",
@@ -131,9 +131,9 @@ def test_languages(self):
# Auto-imports are correct
self.assertEqual(
set(config.auto_import),
{'from orangecanvas.utils.localization.si import plsi',
{'from orangecanvas.localization.si import plsi',
'import grain',
'from orangecanvas.utils.localization import pl'})
'from orangecanvas.localization import pl'})
# Base dir is set correctly
base_dir, _ = os.path.split(self.fn)
self.assertEqual(config.base_dir, base_dir)