Add PLURAL support to TranslateWiki integration #77

Open
mzeinstra opened this issue May 6, 2022 · 26 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@mzeinstra
Collaborator

This idea for further developing this EDTF library is based on the question of #76

Currently, the library does not support PLURAL messages from translatewiki, whereas this would enhance the readability of humanised EDTF strings.

Translatewiki does support PLURAL messages; see: https://translatewiki.net/wiki/Plural

Internationalisation was built from scratch; see: https://github.com/ProfessionalWiki/EDTF/tree/master/src/PackagePrivate/Humanizer/Internationalization

@mzeinstra mzeinstra added enhancement New feature or request help wanted Extra attention is needed labels May 6, 2022
@mzeinstra
Collaborator Author

@verdy-p if you have time, we would love your help with refactoring the internationalisation so that it supports the PLURAL possibilities of TranslateWiki.

@JeroenDeDauw JeroenDeDauw changed the title Refactor TranslateWiki integration to support PLURAL Add PLURAL support to TranslateWiki integration May 6, 2022
@JeroenDeDauw
Member

JeroenDeDauw commented Dec 12, 2022

@mzeinstra are there any specific messages for which PLURAL is needed?

One example:

"edtf-day-and-year": "$1e{{PLURAL:$1|r|}} jour d’un mois inconnu de $2",
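
To make the intended effect of this message concrete, here is a minimal Python sketch of how such a PLURAL marker could be expanded. It is an illustration only, not the library's implementation: the names `expand` and `french_plural_index` are hypothetical, and the sketch assumes the French rule that 0 and 1 select the first form.

```python
import re

# Matches {{PLURAL:<count>|<form>|<form>...}} once parameters have been substituted.
PLURAL_RE = re.compile(r"\{\{PLURAL:(\d+)\|([^{}]*)\}\}")


def french_plural_index(n):
    """Assumed French rule: 0 and 1 use the first form, everything else the second."""
    return 0 if n in (0, 1) else 1


def expand(message, *params):
    """Hypothetical helper: substitute $1, $2, ... and then resolve PLURAL markers."""
    # Replace the highest index first so "$1" does not clobber "$10".
    for i in range(len(params), 0, -1):
        message = message.replace(f"${i}", params[i - 1])

    def pick(match):
        forms = match.group(2).split("|")
        # Fall back to the last form if fewer forms are given than the rule expects.
        index = min(french_plural_index(int(match.group(1))), len(forms) - 1)
        return forms[index]

    return PLURAL_RE.sub(pick, message)


MESSAGE = "$1e{{PLURAL:$1|r|}} jour d’un mois inconnu de $2"
print(expand(MESSAGE, "1", "2004"))  # 1er jour d’un mois inconnu de 2004
print(expand(MESSAGE, "2", "2004"))  # 2e jour d’un mois inconnu de 2004
```

Here PLURAL is used to pick an ordinal suffix ("1er" versus "2e") rather than a grammatical plural.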

@mzeinstra
Collaborator Author

That is a good question.

I presume that the strings that handle days have priority, as they are presented as numbers (first, second of the month).

I'll start a conversation with our French and German speaking project members to see if they think that some of them need to have pluralisation enabled.

@verdy-p

verdy-p commented Dec 13, 2022 via email

@mzeinstra
Collaborator Author

@verdy-p It would be helpful to see which of these need to be plural:

Or are you only talking about the numbering of the months that need to differentiate between 1st, 2nd, etc.?

Autour du $1 (incertain),
Autour du $1,
$1 (incertain),
Printemps,
Été,
Automne,
Hiver,
Printemps (hémisphère nord),
Été (hémisphère nord),
Automne (hémisphère nord),
Hiver (hémisphère nord),
Printemps (hémisphère sud),
Été (hémisphère sud),
Automne (hémisphère sud),
Hiver (hémisphère sud),
Premier trimestre,
Deuxième trimestre,
Troisième trimestre,
Quatrième trimestre,
Premier quadrimestre,
Deuxième quatrimestre,
Troisième quadrimestre,
Premier semestre,
Second semestre,
$3 $2 $1,
$1 $2,
jour $1 d’un mois inconnu de $2,
$1 avant J.-C.,
Année $1 avant J.-C.,
Année $1,
janvier,
février,
mars,
avril,
mai,
juin,
juillet,
août,
septembre,
octobre,
novembre,
décembre,
De $1 à $2,
Depuis $1 (fin indéterminée),
Jusqu’à $1,
Depuis $1 jusqu’à une fin inconnue,
Depuis un début inconnu jusqu’à $1,
heure locale,
Ensemble vide,
Tous celles-ci :,
Une de celles-ci :,
L’année $1 et toutes les précédentes,
L’année $1 ou la précédente,
L’année $1 et toutes les suivantes,
L’année $1 ou la suivante,
$1 et tous les mois précédents,
$1 et tous les mois suivants,
$1 ou le mois précédent,
$1 ou le mois suivant,
$1 et toutes les dates précédentes,
$1 et toutes les dates suivantes,
$1 ou une date précédente,
$1 ou une date suivante,
$1 ou une saison précédente,
$1 ou une saison suivante,
$1 et toutes les saisons précédentes,
$1 et toutes les saisons suivantes,
$1 et $2,
$1 ou $2,
Tous celles-ci : $1,
Une de celles-ci : $1,
Toutes les années de $1 à $2,
Tous les mois de $1 à $2,
Tous les jours du $1 au $2,
$1, $2 ou une année entre les deux,
$1, $2 ou un mois entre les deux,
$1, $2 ou un jour entre les deux

@verdy-p

verdy-p commented Dec 14, 2022 via email

@verdy-p

verdy-p commented Dec 14, 2022 via email

@verdy-p

verdy-p commented Dec 14, 2022 via email

@mzeinstra
Collaborator Author

Thank you @verdy-p for providing additional context.

To make this as concrete as possible for the developers, would you mind going to https://edtf.wikibase.wiki/wiki/Property:P1 (switch the language to French in the upper right corner) to see which of the humanisations need pluralisation for French?

Really concrete examples help the developers to focus on those parts of the tool to increase the humanisation of the strings.

For example, is this a good humanisation with pluralisation?

2004-06-~01/2004-06-~20
(De autour du 1er juin 2004 à autour du 20 juin 2004)

@JeroenDeDauw
Member

@mzeinstra translators are now able to use PLURAL in their translations (code change). Is there anything else that needs to happen before we can close this task?

@mzeinstra
Collaborator Author

mzeinstra commented Jan 6, 2023

@JeroenDeDauw is this also available on Translatewiki? If so we'll ask our team-members to have a look at it for French.

Is it also possible to document the new feature in the main and on the relevant page on Translatewiki?

@JeroenDeDauw
Member

Yes, translators on TW can now use PLURAL in their translations. Note, though, that for various reasons the original ordinal suffixing (i.e. 1st, 2nd, 3rd, etc.) is still used, so translators should not use PLURAL for these values.

What is the relevant page on TW?

@mzeinstra
Collaborator Author

I assumed there was a project page on TW, like https://translatewiki.net/wiki/Translating:OpenStreetMap, but there is none.

Would that be https://translatewiki.net/wiki/Translating:EDTF then?

@kghbln
Member

kghbln commented Jan 6, 2023

We are already getting translations for this lib via twn. All messages available here are also available on twn: https://translatewiki.net/w/i.php?title=Special:Translate&language=de&group=mwgithub-edtf&filter=&action=translate

Not sure / cannot tell, if twn likes to create an individual project for this lib.

@mzeinstra
Collaborator Author

@kghbln Yes, we have a lot of translations already. However, if I understand the change correctly, we now allow all parameters to have pluralisation, right? That means we have to communicate to that community that this is possible.

@JeroenDeDauw
Member

That is correct @mzeinstra

@verdy-p

verdy-p commented Jan 15, 2023

For example, is this a good humanisation with pluralisation?
2004-06-~01/2004-06-~20
(De autour du 1er juin 2004 à autour du 20 juin 2004)

No, it is not correct because of the required contraction of "de autour" into "d’autour". This means that "de $1 à $2" mutates into "d’$1 à $2"

But note that "autour du" (meaning "circa") has also changed the leading preposition. When we have a simple date in $1 (without "circa"), the date takes an article "le" (that MUST be contracted with the previous "de" into "du"). This also means that there's a mutation from "de $1 à $2" into "du $1 au $2" or "d’$1 à $2", depending on the value of $1. But there's also a variation of the second preposition "à" (or "jusqu’à", preferred when $1 contains a time, because "à" translates both "to" for the end of the interval and "at" for setting a time) into "au" (or "jusqu’au", when $2 starts with a date and not a time).

For all these reasons the English format "from $1 to $2" does not have a single translation: the translation depends on the precision of the values in $1 and $2. In CLDR data, there are separate items for translating date/time intervals depending on the precision of the values. Look at Unicode CLDR data: this is much more accurate than what EDTF provides, which still cannot translate correctly.
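
The contraction rules described above can be sketched roughly as follows. This is a simplified Python illustration, not how the EDTF library or CLDR implements it: the `Operand` type and `humanise_interval_fr` function are hypothetical, and the sketch only distinguishes plain dates from "autour du" (circa) dates, ignoring the time-of-day cases ("jusqu’à" / "jusqu’au").

```python
from dataclasses import dataclass


@dataclass
class Operand:
    text: str          # already-humanised French text, e.g. "1er juin 2004"
    approximate: bool  # True when the text starts with "autour du" (circa)


def humanise_interval_fr(start: Operand, end: Operand) -> str:
    """Hypothetical helper: choose prepositions based on the form of each operand."""
    # "de" elides to "d’" before "autour"; before a plain date, "de" + "le" contracts to "du".
    left = f"d’{start.text}" if start.approximate else f"du {start.text}"
    # "à" stays as-is before "autour du"; before a plain date, "à" + "le" contracts to "au".
    right = f"à {end.text}" if end.approximate else f"au {end.text}"
    return f"{left} {right}"


print(humanise_interval_fr(
    Operand("autour du 1er juin 2004", True),
    Operand("autour du 20 juin 2004", True),
))  # d’autour du 1er juin 2004 à autour du 20 juin 2004

print(humanise_interval_fr(
    Operand("1er juin 2004", False),
    Operand("20 juin 2004", False),
))  # du 1er juin 2004 au 20 juin 2004
```

The point of the sketch is that a single "de $1 à $2" message cannot produce both outputs; the pattern has to be selected from what $1 and $2 contain.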

In fact I really think that EDTF is absolutely not needed at all: everything it does (and partially documents) is FULLY covered and documented in CLDR, with MANY examples already translated in many more languages.


So I strongly suggest deprecating EDTF, or reimplementing it based on CLDR (e.g. with ICU4C, ICU4J or the newest ICU4X, all documented and supported by the Unicode Consortium and open-sourced; note that ICU4X is now fully supported and offers many more bindings than ICU4C and ICU4J, to support more languages, it is easier to integrate into MediaWiki, and it offers significant security advantages and its code coverage is much more tested, even if some earlier bugs in ICU4C and ICU4J were fixed by backporting coverage tests and code fixes discovered in ICU4X).

But if you cannot reimplement EDTF based on CLDR data (or do not want to integrate ICU4X), make sure you look at the data already covered: what I described above about French also applies to other languages using common contractions for prepositions and articles, including for example Italian or Spanish. Add the necessary translatable items to fix the initial bad assumptions made on intervals. You may still maintain "compatibility items" in EDTF, but mark them as deprecated in favor of more precise items (where the precision of datetime variables is explicitly specified).

Also, I request again that you avoid forcing a leading capital in sources (e.g. "from $1 to $2" and not "From $1 to $2", "yesterday" and not "Yesterday"...) and that all translations use uncapitalized terms (unless these terms are always capitalized, like "Monday" or "March" in English), i.e. like entries in dictionaries. The capitalization at the start of a sentence or title can be inferred. CLDR does not force capitalisation in any of these translatable terms.
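
As a rough illustration of inferring capitalisation at display time rather than storing it in the message, here is a small Python sketch. The `humanise` function and its `sentence_start` flag are hypothetical names used only for this example.

```python
def capitalise_first(text: str) -> str:
    """Uppercase only the first character, leaving the rest untouched."""
    return text[:1].upper() + text[1:]


def humanise(pattern: str, *params: str, sentence_start: bool = False) -> str:
    """Hypothetical helper: fill an uncapitalised pattern, capitalising on request."""
    # Replace the highest index first so "$1" does not clobber "$10".
    for i in range(len(params), 0, -1):
        pattern = pattern.replace(f"${i}", params[i - 1])
    return capitalise_first(pattern) if sentence_start else pattern


print(humanise("from $1 to $2", "March 2004", "May 2004"))
# from March 2004 to May 2004
print(humanise("from $1 to $2", "March 2004", "May 2004", sentence_start=True))
# From March 2004 to May 2004
```

The sketch deliberately avoids `str.capitalize()`, which would also lowercase the rest of the string and turn "March" into "march".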

@JeroenDeDauw
Member

CLDR supports EDTF? Don't think so. This is what EDTF stands for https://www.loc.gov/standards/datetime/

Looks like CLDR is a MediaWiki extension that thus cannot be used standalone.

@mzeinstra
Collaborator Author

@verdy-p Thanks for the feedback. I see your argument on capitalisation, I will spin that off into its own issue.

We also know that generic humanisation of dates and times into different natural languages has its limitations. We are probably not able to use generic humanisation when different parts of the date have an impact on the other parts of the date. We should accept the limitations of the current iteration of this tool and of its improvements.

Please remember that as @JeroenDeDauw already said, this is an open source repository for the standalone library for EDTF, which is subsequently used in the extension Wikibase-EDTF. It can also be used in other systems that want to adopt or humanise EDTF strings.

This is not the place for discussions on whether EDTF is fit for purpose at Wikidata, please address those comments to the appropriate platform.

@verdy-p

verdy-p commented Jan 15, 2023

My comment was not about whether it is fine or not in Wikidata or Wikibase (in fact, EDTF is also questionable for its indirect use in Wikibase-EDTF). It was about whether we want to maintain translation of EDTF formats as an integral part of EDTF, developed separately, or whether we should think about refactoring EDTF itself based on CLDR, which already performs (with ICU) and translates (with CLDR data) absolutely EVERYTHING that EDTF wants to support.

What I mean is that it is independent of the choice of EDTF as an interface used in Wikidata or Wikibase: Wikidata/Wikibase are themselves based on MediaWiki, which is ALREADY integrating CLDR data and the ICU library for many things, and I just don't understand the need to deviate from CLDR data, which is already vetted and already has a much broader coverage, where all existing issues above have already been extensively discussed and are ALREADY solved.

I therefore view EDTF as a "poor man's" implementation that falls far short and wants to reinvent things that have already been solved in CLDR, which is already widely used. The only interest of EDTF is then just to allow integrating overrides as a workaround for some translations that CLDR still has not been able to vet and release (because CLDR vetting is extremely slow, and for many minority languages it takes considerable time to have them supported, whereas Wikimedia can support them faster in a community effort; but as soon as CLDR data is available, it should become the standard and EDTF data would progressively be deprecated, allowing wikis to make a transition if they need some temporary stability, e.g. if EDTF-formatted dates are used in page names). We had the same issue with the translation of language names: CLDR data is impressive, but MediaWiki includes its own limited set of overrides (to avoid using fallbacks to other supported languages), and wikis themselves have their own local overrides to what MediaWiki features. In conclusion, EDTF can remain as a useful transition library, but in the long term it should allow wikis to converge on the international CLDR standard (which is also used in many other non-Wikimedia projects, including various other i18n libraries like the standard libraries in C/C++ or PHP, that all wikis are also using, and many system libraries, components and protocols).

A transition scheme is useful, but you should already think about rebasing EDTF to solve many existing issues. CLDR is the way to go (and ICU4X allows EDTF to do that quite simply): translating date and time values is a very common task that all development frameworks need to integrate for their i18n support. And here we need convergence. EDTF just fills a small niche but is far behind what modern apps need and already use today. This is not just about translation: because date and time values have legal implications and are security-sensitive, we cannot translate them as we want and must avoid all ambiguities, so we cannot do that alone with a very tiny team of developers and a few translators who for now can't properly do their work as expected.

@JeroenDeDauw
Member

I am still not sure what exactly you mean. By "EDTF", do you mean "the i18n code part of the EDTF library"?

@JeroenDeDauw
Member

As far as I know, the answer to both of these questions is NO:

  • Does CLDR support parsing and representing EDTF dates?
  • Can PHP libraries use CLDR? (i.e., does it NOT depend on MediaWiki)

@verdy-p

verdy-p commented Jan 15, 2023

@JeroenDeDauw
Member

So with "CLDR" are you not talking about https://www.mediawiki.org/wiki/Extension:CLDR?

From the links you provided, it is not at all clear to me there is support for the Extended Date Time Format that we can simply use instead of this library.

@verdy-p

verdy-p commented Jan 16, 2023

Please note that this thread was started about translatability. We don't care here about the custom syntax used to represent dates with EDTF in a locale-neutral way, although that EDTF form may be considered a specific locale, just like the "root" locale in CLDR or the "POSIX" locale: EDTF just defines its own "language", just like POSIX does in legacy C libraries.

But beside that, it should parse formatted dates (including those in EDTF form) into some Datetime object, DatetimeRange object, or DatetimeSet object, and then format these objects into human-readable text (or back to EDTF form). For this it can perfectly well use CLDR data, which contains almost all the needed formats and translations (except maybe a couple of additional qualifiers for uncertainty or approximation, but these should rapidly be supported in CLDR data as well; existing translations made in TWN should then only be needed to increase the coverage for more languages or locales that CLDR still does not provide, or if there's a need for overrides). What will remain to translate for EDTF will be dramatically smaller (and most existing translations may be deprecated as no longer needed).

@JeroenDeDauw
Member

A pull request demonstrating translation via CLDR is definitely welcome
