Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ESLint rule to cap the length of translation strings #3725

Merged
merged 8 commits into from
Mar 26, 2024

Conversation

sarayourfriend
Copy link
Collaborator

@sarayourfriend sarayourfriend commented Jan 31, 2024

Fixes

Fixes #3378 by @Presskopp

Description

This PR adds a new ESLint rule to cap the length of translation strings to 40 words.

It is a draft because there are 20 strings that need to be broken up for this rule to pass, and I want to make sure this is an acceptable approach before spending the time to do that (as it is quite tedious and careful work). I also need to add the documentation page for the new rule.

I also added a new utility, utilities/generate_test_locales, which generates strings for the test locales. It keeps existing strings and does not overwrite them, which means we can make manual changes to the automatically generated strings as we see fit, and they will be preserved. I have documented that and more in the i18n documentation page.

Testing Instructions

Check out the new ESLint rule. Run it locally with pnpm run eslint and confirm you see 20 strings failing to pass this in en.json5. Try to break the rule by adding new strings or testing any edge cases you can think of. If you come up with any edge cases, please note them in your review so that I can add them to the unit tests.

Please try the new utilitis/generate_test_locales script locally, following the instructions for use in the i18n documentation page.

Checklist

  • My pull request has a descriptive title (not a vague title likeUpdate index.md).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.
  • [N/A] I ran the DAG documentation generator (if applicable).

Developer Certificate of Origin

Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@openverse-bot openverse-bot added 🟨 priority: medium Not blocking but should be addressed soon ✨ goal: improvement Improvement to an existing user-facing feature 📄 aspect: text Concerns the textual material in the repository 🚦 status: awaiting triage Has not been triaged & therefore, not ready for work labels Jan 31, 2024
@github-actions github-actions bot added the 🧱 stack: frontend Related to the Nuxt frontend label Jan 31, 2024
@sarayourfriend sarayourfriend marked this pull request as ready for review February 1, 2024 02:26
@sarayourfriend sarayourfriend requested a review from a team as a code owner February 1, 2024 02:26
@sarayourfriend
Copy link
Collaborator Author

Looking at the snapshot failures, and it looks like I've messed up a few converstions on the broken-up paragraphs. Redrafting and I'll address those failures tomorrow when I return to work.

@sarayourfriend sarayourfriend marked this pull request as draft February 1, 2024 02:49
@obulat obulat removed the 🚦 status: awaiting triage Has not been triaged & therefore, not ready for work label Feb 4, 2024
@sarayourfriend
Copy link
Collaborator Author

@WordPress/openverse-frontend What is the process for updating the RTL testing locale? Do contributors need to manually enter strings into automated translation services or is there a process to generate them as needed (e.g., from a script)?

@sarayourfriend sarayourfriend force-pushed the add/no-long-translation-strings branch from e8e45c2 to 50577a0 Compare February 5, 2024 04:10
@sarayourfriend sarayourfriend marked this pull request as ready for review February 5, 2024 22:39
Copy link
Member

@dhruvkb dhruvkb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code-wise, LGTM! Personally I don't feel so good about the idea of breaking down translations strings into names like a, b, c because they do not convey any-particular meaning. But I understand it may be the best approach we have.

Is it possible to wrap $t (something like a new util, say $ti) to automatically iterate over a, b, c etc? That would make make this diff...

-   <p>{{ $t("sensitive.faq.two.answerPt2", { openverse: "Openverse" }) }}</p>
+    <p>
+      {{ $t("sensitive.faq.two.answerPt2.a") }}
+      {{ $t("sensitive.faq.two.answerPt2.b") }}
+      {{ $t("sensitive.faq.two.answerPt2.c", { openverse: "Openverse" }) }}
+    </p>

...into something much simpler.

- $t
- $ti

@@ -124,7 +124,6 @@
"chokidar": "^3.5.3",
"comment-json": "^4.2.3",
"css-loader": "^5.2.7",
"eslint-plugin-jsonc": "^2.8.0",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@sarayourfriend
Copy link
Collaborator Author

Is it possible to wrap $t (something like a new util, say $ti) to automatically iterate over a, b, c etc? That would make make this diff...

That's an interesting idea. I think the complexity there is that we don't know in the code how many parts there without reading the translation blob. Is it possible to do that? It sounds like a nice improvement but I'd prefer it in a separate PR, considering how significant this one already is.

@obulat
Copy link
Contributor

obulat commented Feb 8, 2024

@WordPress/openverse-frontend What is the process for updating the RTL testing locale? Do contributors need to manually enter strings into automated translation services or is there a process to generate them as needed (e.g., from a script)?

The process has usually been the former, I didn't spent time on automating it.

Copy link
Contributor

@obulat obulat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works great, thank you!

@obulat
Copy link
Contributor

obulat commented Feb 26, 2024

@sarayourfriend, is there something blocking this PR from being merged, aside from the git conflicts?

@sarayourfriend
Copy link
Collaborator Author

The translations. It's honestly so tedious, I spent a couple hours last week trying to automate the translations locally using https://github.com/argosopentech/argos-translate, but of course the template strings make automated translations, even approximations impossible.

I was going to ping you today to ask how you deal with variables in strings when you run them through a translator? I don't know Arabic or Russian, so there's no way I can reliably guess where the template words should go.

If you don't worry about it, then I can push the code I wrote to automate it using Argos. There are a lot of outdated/non-existent strings in the testing translations, and hand maintaining these test translations is so tedious, I'm surprised we actually have them, particularly as the process and approach are undocumented.

@sarayourfriend sarayourfriend requested a review from obulat March 20, 2024 00:43
@sarayourfriend sarayourfriend force-pushed the add/no-long-translation-strings branch from 50577a0 to a254c21 Compare March 20, 2024 00:44
@sarayourfriend sarayourfriend requested a review from a team as a code owner March 20, 2024 00:44
@sarayourfriend
Copy link
Collaborator Author

@obulat I've re-requested a review because I added in the test locale generation script I mentioned to you during our chat earlier this week.

There is new documentation for the utility, and I will update the PR description now to include it in the review instructions. Let me know what you think.

@sarayourfriend sarayourfriend force-pushed the add/no-long-translation-strings branch from a254c21 to 5be4324 Compare March 20, 2024 00:53
@sarayourfriend
Copy link
Collaborator Author

I'm lost on why the translation banner tests fail... when I run them on my computer in the container, they fail there too. But when I run them locally to try to debug why the page isn't loading, it works fine 🤔 It feels like a configuration issue, maybe? I need to look into how the other test locales are handled, if there is anything special that makes them work correctly inside the containers.

@obulat
Copy link
Contributor

obulat commented Mar 20, 2024

I'm lost on why the translation banner tests fail... when I run them on my computer in the container, they fail there too. But when I run them locally to try to debug why the page isn't loading, it works fine 🤔 It feels like a configuration issue, maybe? I need to look into how the other test locales are handled, if there is anything special that makes them work correctly inside the containers.

We only use the translated value in the valid-locales.json to determine whether we should show the translation banner or not, we don't look into the locale's json file at all. So, for testing purposes, we can set the translated value to a number below 90 to force the banner to appear, even if the locale is fully-translated.

@obulat obulat force-pushed the add/no-long-translation-strings branch from 13326d6 to 2f01d0e Compare March 20, 2024 13:54
Copy link
Contributor

@obulat obulat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reversed the translation banner test changes because the banner will show up for any language that has translated set to below 90 in the valid-locales.json file.

There is a problem with the current translation generation code: it leaves PLACEHOLDER text instead of converting it back to the original key. This means that when a text has a link inside, this link won't be shown. So, for the translation banner in Russian in this PR didn't have a link, so we wouldn't be able to test that link. That could be a problem in many e2e and VR tests for RTL in the future, if the link is missing.

I tried replacing the PLACEHOLDER texts with the key values. I got stuck trying to figure out why some Arabic texts were not translated (I tried the GUI version of argos and it did translate the untranslated values); some had PLACEHOLDER in them, and some had the original keys (e.g., {openverse}). I guess you only added the translations for keys that didn't exist, and the existing values stayed the same, therefore keeping the original keys, @sarayourfriend ?

I don't want to block this PR because it has so many template changes and I don't want to force you to do any more merge-conflict resolutions.
I think we can remove all locales but ar, which is essential for RTL testing; what do you think?

Copy link

Full-stack documentation: https://docs.openverse.org/_preview/3725

Please note that GitHub pages takes a little time to deploy newly pushed code, if the links above don't work or you see old versions, wait 5 minutes and try again.

You can check the GitHub pages deployment action list to see the current status of the deployments.

Changed files 🔄:

@sarayourfriend
Copy link
Collaborator Author

sarayourfriend commented Mar 21, 2024

Ah! That's an interesting problem.

For those particular keys, we can put a manual translation there and the translation script will keep the previous value. We could, for example, keep the English language value for the translation banner on the Russian one. That should be an easy enough fix!

There is a problem with the current translation generation code: it leaves PLACEHOLDER text instead of converting it back to the original key

This is intention, by the way, because for some languages or strings, Argos will decide to translate the string "PLACEHOLDER" or "{openverse}" (for example) or other placeholders like "###" I tried. I originally had the script keeping track of each of these placeholders and then substituting them back into the original string in order. This actually did work for a great many strings. However, whenever Argos would decide it wanted to translate a particular placeholder (no matter what I tried) it would result in the script not finding enough placeholder values to pop the removed ones back in! There's also no guarantee that the order of the placeholders in the translated text matches that of the original English, for example adjective/noun ordering being opposite in Romance languages from Germanic ones.

So there were lots of issues with trying to maintain the placeholders while using an automated translation tool. I can put back the "kinda working" version, and then for keys that cause an issue in any language, we could output a log file of those and just require manual translations for that subset of keys (with the suggested mangled translation included). Because it always preserves the existing translation, we'd only have to fix it once for each problematic key, while still getting fully working automatic translations with "approximate" placeholders for the vast majority of instances.

I'll work on that tomorrow so I can get this merged ASAP. Thanks very much for the review and the fix on the translation banner tests!

@obulat
Copy link
Contributor

obulat commented Mar 21, 2024

I understand the problem with the placeholders, they are really fiddly :)

For those particular keys, we can put a manual translation there and the translation script will keep the previous value. We could, for example, keep the English language value for the translation banner on the Russian one. That should be an easy enough fix!

It's great that we have an escape hatch, I think this is a good enough solution for testing layouts in VR tests.

What do you think about removing the ru and es locales, and only leaving ar for all locale testing, @sarayourfriend ?

@sarayourfriend
Copy link
Collaborator Author

I think having as many different scripts as possible is a good idea, if we are exercising them, particularly non-latin scripts, but also locales that have accents and such, so that we exercise that (English won't, most of the time)...

I don't see those test locales as being much of a maintenance overhead at this point, if we have automatic generation. What would be the benefit to removing them?

@sarayourfriend
Copy link
Collaborator Author

@obulat if you have time to re-review this changes and merge them, please do. I would like a second set of eyes on this at least, before merging. @dhruvkb if you're able to do so as well, I'd appreciate it.

@sarayourfriend sarayourfriend force-pushed the add/no-long-translation-strings branch from 6a88d85 to 21410aa Compare March 22, 2024 05:20
Copy link
Member

@dhruvkb dhruvkb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! The additional of automatically generated translations is very welcome...

...although I would've liked the all scripting pertaining to translations to be centralised (not necessarily inside frontend/ but together) and co-located.

I didn't go through each modified snapshot (as there are way too many of those) but spot-checked 5-6 and they seem okay 👍.

Copy link
Contributor

@obulat obulat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good ✨

@sarayourfriend
Copy link
Collaborator Author

...although I would've liked the all scripting pertaining to translations to be centralised (not necessarily inside frontend/ but together) and co-located.

I was going to do that, but because it's a Python thing, I didn't want to muddy the waters too much.

@sarayourfriend sarayourfriend force-pushed the add/no-long-translation-strings branch from 21410aa to 7a6fa56 Compare March 26, 2024 03:40
@sarayourfriend
Copy link
Collaborator Author

Rebasing just to make sure VR tests pass on main.

@sarayourfriend sarayourfriend merged commit 6b49eed into main Mar 26, 2024
39 checks passed
@sarayourfriend sarayourfriend deleted the add/no-long-translation-strings branch March 26, 2024 04:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
📄 aspect: text Concerns the textual material in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: frontend Related to the Nuxt frontend
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Don't use very long strings to be translated
4 participants