[Tool] Auto translate #49457

Merged: alvin-celerdata merged 5 commits into main from auto-translate on Aug 7, 2024


@DanRoscigno (Contributor) commented Aug 6, 2024

This PR adds a workflow that translates the docs from English to Chinese. I can provide the OpenAI (GPT) and W&B secrets.

Note:

This requires a GitHub personal access token (PAT) and two API keys, stored as three repository secrets. The details are below.

The workflow does three things:

  1. Detect changed Markdown or MDX files using tj-actions/changed-files
  2. Translate Docusaurus Markdown from English to Chinese using tcapelle/gpt_translate
  3. Automatically open a PR with the Chinese docs using peter-evans/create-pull-request
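Step 1 is mostly path filtering. The sketch below shows one way to turn a list of changed files into (English source, Chinese target) pairs. It is a minimal illustration, not the workflow's actual logic: the docs/en and docs/zh layout and the function name files_to_translate are hypothetical, and it assumes the changed-file list arrives as a whitespace-separated string, as a changed-files step typically emits.

```python
from pathlib import PurePosixPath

# Hypothetical layout: English sources under docs/en, Chinese output under docs/zh.
SOURCE_ROOT = PurePosixPath("docs/en")
TARGET_ROOT = PurePosixPath("docs/zh")
DOC_SUFFIXES = {".md", ".mdx"}


def files_to_translate(changed_files: str) -> list[tuple[str, str]]:
    """Return (source, target) pairs for changed English Markdown/MDX files.

    `changed_files` is assumed to be a whitespace-separated list of paths.
    """
    pairs = []
    for raw in changed_files.split():
        path = PurePosixPath(raw)
        # Skip anything that is not Markdown/MDX.
        if path.suffix not in DOC_SUFFIXES:
            continue
        # Skip files that do not live under the English docs tree.
        if SOURCE_ROOT not in path.parents:
            continue
        # Mirror the relative path into the Chinese docs tree.
        target = TARGET_ROOT / path.relative_to(SOURCE_ROOT)
        pairs.append((str(path), str(target)))
    return pairs
```

For example, given "docs/en/intro.md README.md docs/zh/intro.md", only docs/en/intro.md is kept, paired with docs/zh/intro.md. Filtering before translation keeps API calls (and cost) limited to English doc pages that actually changed.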

Setup

GitHub Personal Access Token

I used a fine-grained token limited to this repo. Here are the permissions I granted:

  • Read and Write access to code (contents)
  • Read and Write access to pull requests

Additionally, GitHub automatically assigned:

  • Read access to metadata

Repository secrets

I created three repository secrets:

TRANSLATE_PAT

This contains the PAT created above.

OPENAI_API_KEY

This contains an OpenAI API key with GPT-4 access.

WANDB_API_KEY

This contains the W&B API key shown on the W&B authorize page.
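In GitHub Actions, a repository secret is only visible to a step if the workflow maps it into that step, for example through an env: entry such as OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}. A missing mapping usually surfaces later as a confusing authentication error. The hypothetical preflight helper below (missing_secrets is an illustrative name, not part of gpt_translate) assumes each secret is exposed as an environment variable with the same name and reports which ones are absent or empty.

```python
import os

# The three repository secrets described above; this sketch assumes each is
# mapped into the step's environment under the same name.
REQUIRED_SECRETS = ("TRANSLATE_PAT", "OPENAI_API_KEY", "WANDB_API_KEY")


def missing_secrets(env=None) -> list[str]:
    """Return the names of required secrets that are unset or empty."""
    # Default to the real process environment; tests can pass a plain dict.
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name)]
```

A first workflow step could call missing_secrets() and exit with a non-zero status listing the names it returns, so the run fails before any translation work starts.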

Workflow file notes

  • The workflow runs when a pull request targeting main is closed by being merged. It does not run on every commit pushed to a PR, because translating intermediate revisions would be wasteful.
  • The paths and filters in the workflow are specific to the StarRocks docs; if you reuse this workflow, you will need to change them.
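A common way to express the trigger rule in workflow YAML is on: pull_request with types: [closed] and branches: [main], plus a job-level if: github.event.pull_request.merged == true, since closing a PR without merging also fires the closed event. The sketch below models the same decision in Python against the pull_request event payload (the JSON file at $GITHUB_EVENT_PATH). The function names are hypothetical; the payload fields used (action, pull_request.merged, pull_request.base.ref) are standard parts of that event.

```python
import json


def should_translate(event: dict) -> bool:
    """Decide whether a pull_request event should trigger translation.

    Run only when a PR targeting main is closed and merged; PRs closed
    without merging, or new commits pushed to an open PR, are ignored.
    """
    pr = event.get("pull_request", {})
    return (
        event.get("action") == "closed"
        and pr.get("merged") is True
        and pr.get("base", {}).get("ref") == "main"
    )


def should_translate_from_file(path: str) -> bool:
    """Load an event payload from disk (e.g. $GITHUB_EVENT_PATH) and check it."""
    with open(path, encoding="utf-8") as fh:
        return should_translate(json.load(fh))
```

Checking merged explicitly matters: without it, abandoning a PR would still translate its unmerged English changes.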

gpt_translate configuration

The configuration is in the configs directory of this repo. At StarRocks we use a modified prompt and temperature, plus a custom glossary in configs/language_dicts/zh.yaml. You can compare our prompt with the default prompt in tcapelle/gpt_translate.

About Weights & Biases

You should read the Weights & Biases documentation; I am no expert. I do think the Weave feature will be useful as I tune the prompt and glossary, because Weave tracks each change and its impact. W&B also checks the translations automatically and scores them. Seriously, I am no AI expert; check out their site.

What type of PR is this?

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This PR needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport PR

Bugfix cherry-pick branch check:

  • I have checked the version labels for the target branches this PR will be auto-backported to
    • 3.3
    • 3.2
    • 3.1
    • 3.0
    • 2.5

Signed-off-by: DanRoscigno <[email protected]>
@github-actions bot added the "title needs [type]" and "documentation" labels Aug 6, 2024

github-actions bot commented Aug 6, 2024

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)


github-actions bot commented Aug 6, 2024

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

@DanRoscigno changed the title from "Auto translate" to "[Tool] Auto translate" Aug 6, 2024
@kevincai requested a review from andyziye August 6, 2024 21:24
@alvin-celerdata merged commit 0912509 into main Aug 7, 2024
50 of 51 checks passed
@alvin-celerdata deleted the auto-translate branch August 7, 2024 02:56