Align UI locales with Weblate locales #9707

Open
bozana opened this issue Feb 7, 2024 · 10 comments
@bozana
Collaborator

bozana commented Feb 7, 2024

As part of the CRAFT OA project, issue #9425 separates the submission locales from the UI locales.
The decision was made to use the Weblate locales (see https://github.com/WeblateOrg/language-data/blob/main/languages.csv) for the submission locales and to align the UI locales with them as well.

The CRAFT OA project has identified the following mapping between the current UI locales and the Weblate locales:
'be@cyrillic' => 'be',
'bs' => 'bs_Latn',
'fr_FR' => 'fr',
'ja_JP' => 'ja',
'nb' => 'nb_NO',
'sr@cyrillic' => 'sr_Cyrl',
'sr@latin' => 'sr_Latn',
'uz@cyrillic' => 'uz',
'uz@latin' => 'uz_Latn',
'zh_CN' => 'zh_Hans',
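
As a rough illustration of how the mapping above could be applied, here is a minimal sketch in Python (not the actual pkp-lib code). The function names `to_weblate_locale` and `migrate_locales` are hypothetical, and passing unmapped codes through unchanged is an assumption, not something decided in this issue.

```python
# Mapping copied from the list above: current UI locale -> Weblate locale.
UI_TO_WEBLATE = {
    "be@cyrillic": "be",
    "bs": "bs_Latn",
    "fr_FR": "fr",
    "ja_JP": "ja",
    "nb": "nb_NO",
    "sr@cyrillic": "sr_Cyrl",
    "sr@latin": "sr_Latn",
    "uz@cyrillic": "uz",
    "uz@latin": "uz_Latn",
    "zh_CN": "zh_Hans",
}


def to_weblate_locale(locale: str) -> str:
    """Return the Weblate code for a current UI locale.

    Assumption: codes that are not in the mapping are already aligned
    and are returned unchanged.
    """
    return UI_TO_WEBLATE.get(locale, locale)


def migrate_locales(locales):
    """Convert a list of locale codes, dropping duplicates but keeping order."""
    seen = set()
    result = []
    for locale in locales:
        converted = to_weblate_locale(locale)
        if converted not in seen:
            seen.add(converted)
            result.append(converted)
    return result
```

For example, `migrate_locales(["fr_FR", "fr", "sr@latin"])` would collapse `fr_FR` and `fr` into a single `fr` entry, which is the kind of collision a real migration would need to handle explicitly.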

PRs:

@bozana bozana added this to the 3.5.0 LTS milestone Feb 7, 2024
@jonasraoni
Contributor

jonasraoni commented Feb 7, 2024

@bozana I think the languages.csv might be the same that you can extract from ResourceBundle::getLocales('').

I think it makes sense, I raised this concern when the locales were merged, because we would lose the "country" of the submission (@marcbria).

@ajnyga
Collaborator

ajnyga commented Feb 9, 2024

@jonasraoni I do not think it is the same, see this comparison: https://docs.google.com/spreadsheets/d/1EFs2cr7Tw2lwR_JVIQHcnqXg91tVdJna8NMSBdLVW_Q/edit?usp=sharing

For example, Belarusian is listed with script variants in Weblate, while the list from ResourceBundle::getLocales('') has no script variants for it. On the other hand, the Weblate list is missing things like es_ES (only es is listed), which the ResourceBundle::getLocales('') list does include.

@jonasraoni
Contributor

The spreadsheet is a good basis for making a decision! :)
I personally support using an official variant, even if it's missing one thing or another, as it's more likely to fit external systems.

@ajnyga
Collaborator

ajnyga commented Mar 1, 2024

Here is a comparison of ResourceBundle::getLocales('') and Weblate languages.csv. The differences were far bigger than I expected. https://craft-test.online/languageComparison/

Also, in case this is easier to read: https://www.diffchecker.com/0HTZe7UH/

@ajnyga
Collaborator

ajnyga commented Mar 2, 2024

I think this comparison https://craft-test.online/languageComparison/comparison3.php gives a fairly good idea of the differences between ResourceBundle::getLocales('') and Weblate languages.csv:

  • The locales they have in common (277) are probably the ones that are most used
  • ResourceBundle::getLocales('') is missing a lot of locales (462) which do not even have a close alternative. This applies especially to three letter locales.
  • languages.csv is missing even more locales (528), BUT most of these have an alternative.
    • The main reason for the missing locales is that languages.csv does not have that many country specific locales so most of the smaller languages only have a 2-letter or a 3-letter code. For example fi exists but fi_FI does not.
    • There are only a handful of locales missing totally, like dav, agq, sbp, yav
    • Some bigger languages like es and fr lack the country-specific locales es_ES and fr_FR, and these are maybe cases where it would make sense to be able to specify a country variant.
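
The kind of comparison summarized in the list above can be sketched with plain set operations. This is only an illustration in Python, not the script behind the linked comparison page; the function names are hypothetical, and "has an alternative" is approximated here as "the same base language code exists in the other list", which is an assumption about how alternatives were counted.

```python
def base_language(locale: str) -> str:
    """Return the language part of a code, e.g. 'fi' for 'fi_FI' or 'sr' for 'sr@latin'."""
    return locale.replace("-", "_").split("_", 1)[0].split("@", 1)[0]


def compare_locale_lists(icu_locales, weblate_locales):
    """Compare two locale lists and group the differences.

    icu_locales stands in for the output of ResourceBundle::getLocales('')
    and weblate_locales for the codes in languages.csv.
    """
    icu = set(icu_locales)
    weblate = set(weblate_locales)
    only_icu = icu - weblate
    weblate_languages = {base_language(code) for code in weblate}
    # Assumption: a locale "has an alternative" if Weblate lists its base language.
    with_alternative = {
        code for code in only_icu if base_language(code) in weblate_languages
    }
    return {
        "common": icu & weblate,
        "missing_from_icu": weblate - icu,
        "missing_from_weblate": only_icu,
        "missing_from_weblate_with_alternative": with_alternative,
        "missing_from_weblate_entirely": only_icu - with_alternative,
    }


# Toy data for illustration only; these are not the real lists.
report = compare_locale_lists(
    ["fi", "fi_FI", "es", "es_ES", "dav"],
    ["fi", "es", "be_Latn"],
)
```

With this toy input, `fi_FI` and `es_ES` land in the "with alternative" group because `fi` and `es` exist in the second list, while `dav` is missing entirely.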

For me this is a clear indication that the Weblate list would work better here, although I do understand @jonasraoni's comment about using an official variant. The important thing here is that the Weblate list locales are formed according to a standard.

Ideally we could try to include the missing country-specific locales OR consider hosting our own languages.csv list.

@marcbria
Collaborator

marcbria commented Mar 2, 2024

First, thank you AJ for taking the time to go through all this and give us easy to digest summaries.
Your patience and generosity with your time is commendable.

We've discussed it in various places, but I'll put my position on record in this thread. In short, I am convinced that whatever standard is chosen, we must guarantee three things:

  • The encoding must allow a specific code for each dialect (existing or potential).
  • The encoding must NOT force us to define hierarchies between languages ("es vs es_MX").
  • We can modify this list as and when we need to (without relying on third parties).

The reasons? To promote equality between the different languages, to avoid representing them from a colonialist point of view, to encourage and facilitate the task of translators and to have total autonomy to decide, as a project, on a topic as relevant as the localisation of PKP applications.

That said, we could use the weblate list as a starting point and create our own with the changes we consider appropriate?

In this sense, I suggest eliminating any reference to codes without region and (at least in the interface) I would always use the regionalised code (es -> es_ES) instead.

The proposal I am making should be accompanied by developments in line with this:

  • That this code is set at installation time (allowing, for example, the administrator to make an informed decision to choose "Spanish from Spain" if the other Spanish translations are not sufficiently complete), but that end users are shown only the language ("Spanish"), not the dialect.
  • That the default translations plugin is enabled by default and lets the user define the "fallbacks" used to complete the translations (for example es_ES > es_MX > es_US).
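
To make the fallback idea concrete, here is a minimal sketch in Python of a lookup that walks a chain such as es_ES > es_MX > es_US. It does not describe how the default translations plugin actually works; the function name, the catalog structure, and the translation keys are all hypothetical, and returning the key itself when nothing is found is an assumption.

```python
def translate(key, fallback_chain, catalogs):
    """Return the first non-empty translation of key, trying locales in order.

    catalogs maps a locale code to a dict of {translation key: text}.
    """
    for locale in fallback_chain:
        text = catalogs.get(locale, {}).get(key)
        if text:
            return text
    # Assumption: when no locale in the chain has the key, show the key itself.
    return key


# Hypothetical catalogs: es_ES is incomplete, es_MX fills the gap.
catalogs = {
    "es_ES": {"common.save": "Guardar"},
    "es_MX": {"common.save": "Guardar", "common.cancel": "Cancelar"},
}
chain = ["es_ES", "es_MX", "es_US"]

saved = translate("common.save", chain, catalogs)
cancelled = translate("common.cancel", chain, catalogs)
unknown = translate("common.missing", chain, catalogs)
```

Because the chain is an ordered list rather than a fixed parent/child relation, no dialect is structurally "above" another; the administrator decides the order.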

@ajnyga
Collaborator

ajnyga commented Mar 4, 2024

Just to underline:

  • My comparison is there to answer the question of which list/source we should use to provide the options for the new Submission language/locale selection. Here I think it is important that we let journals choose whether they want to use just a two-letter language code like "fr" or to specify a dialect like "fr_CA" for their metadata. In any case, most of the places where the metadata ends up do not support the dialect, but of course they might in the future.
  • What UI languages we provide and how we define them is another question which can of course be discussed here.
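
Since most downstream targets only understand the language part, a journal that stores "fr_CA" would probably need the code reduced on export. The following Python sketch shows one way that could look; the function names are hypothetical, and keeping the script part (as in zh_Hans) while dropping only the region is an assumption, not something decided in this thread.

```python
def split_locale(locale):
    """Split a code like 'sr_Cyrl_RS' into (language, script, region).

    A four-letter part is treated as a script, a two-letter or
    three-digit part as a region. Missing parts are returned as None.
    """
    parts = locale.replace("-", "_").split("_")
    language, script, region = parts[0], None, None
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            region = part
    return language, script, region


def without_region(locale):
    """Drop the region but keep the script, e.g. 'fr_CA' -> 'fr'."""
    language, script, _ = split_locale(locale)
    return language if script is None else f"{language}_{script}"
```

A journal could then keep "fr_CA" internally and send `without_region("fr_CA")` to services that do not support dialects.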

@marcbria
Collaborator

marcbria commented Mar 4, 2024

Apologies. I was catching up on this thread (whose title talks about UI) and I forgot to go into the metadata issue.

Although I don't really have a clear opinion on this part.
Short answer: In metadata we should allow both?

I am reasoning out loud here, so if I say something stupid, let me know.

As most upstreams do not take regionality into account, I suspect that for metadata it is not so important to define it and, if the admin so wishes (I think it is something the Editor should not be able to change), we should allow languages (i.e. "fr" without region code).

But I understand that if some admin considers that it is relevant for the journal to indicate the region, from a perspective respectful of linguistic diversity, the tool should allow the region to be indicated?

In any case, I wouldn't ask about this with every submission and it should be a global parameter, to be defined once during the installation (or to be modified later by the admin... but VERY carefully).

In this sense, the code-lang selector demo you made some months ago (accompanied by a short explanation of the real impact of the decision they are making) sounds like a great solution to me, as long as it lets you stop at the level of detail you require.

Does it make sense to you?

@bozana
Collaborator Author

bozana commented Sep 6, 2024

I am now starting to work on this issue.
We decided to use Weblate locales for the submission and metadata locales (see issue ...) as well as for the rest of the system.
As far as I can see, we have used the sokil library to get the translated locale display names, as well as for conversion between different ISO codes. I think that we can now use the PHP intl functions (e.g. locale_get_display_name) to get the translated locale display names, so there is no need to use sokil for this any more. However, we will still use the sokil library to convert locales into different ISO codes (mostly used in third-party services).
Tagging @jonasraoni here for his opinion, because he worked on the current Locale* implementation and may see or know something I haven't seen yet :-)

@bozana
Collaborator Author

bozana commented Oct 29, 2024

@jonasraoni, could you please review the pkp-lib and ojs/omp/ops PRs above? The other PRs are mostly just the renaming of the locale folders. I am not sure about the changes in the ui-library -- I adapted the code, but I am not sure where those parts of the code are used, or whether some of them are needed at all (maybe I can double-check it with Jarda when he is back).
Thanks a lot!

bozana added a commit to bozana/omp that referenced this issue Oct 29, 2024
@bozana bozana added the Enhancement:2:Moderate A new feature or improvement that can be implemented in less than 4 weeks. label Oct 29, 2024
bozana added a commit to bozana/ojs that referenced this issue Oct 29, 2024
bozana added a commit to bozana/ops that referenced this issue Oct 29, 2024