Glossary support for Google Cloud Translation Advanced #12777

Merged
6 changes: 5 additions & 1 deletion docs/admin/install.rst
@@ -230,12 +230,16 @@ Django REST Framework
- :ref:`mt-google-translate-api-v3`


* -
- `google-cloud-storage <https://pypi.org/project/google-cloud-storage>`_
- Optional glossary support for :ref:`mt-google-translate-api-v3`


* - ``ldap``
- `django-auth-ldap <https://pypi.org/project/django-auth-ldap>`_
- :ref:`ldap-auth`



* - ``mercurial``
- `mercurial <https://pypi.org/project/mercurial>`_
- :ref:`vcs-mercurial`
14 changes: 14 additions & 0 deletions docs/admin/machine.rst
@@ -187,7 +187,7 @@

.. seealso::

`DeepL translator <https://www.deepl.com/translator>`_,

Check warning on line 190 in docs/admin/machine.rst (GitHub Actions / Linkcheck): https://www.deepl.com/translator to https://www.deepl.com/en/translator
Check warning on line 190 in docs/admin/machine.rst (GitHub Actions / Linkcheck): https://www.deepl.com/pro to https://www.deepl.com/en/pro
`DeepL pricing <https://www.deepl.com/pro>`_,
`DeepL API documentation <https://developers.deepl.com/docs>`_

@@ -241,6 +241,8 @@
+-----------------+---------------------------------------+----------------------------------------------------------------------------------------------------------+
| ``location`` | Google Translate location | Choose a Google Cloud Translation region that is used for the Google Cloud project or is closest to you. |
+-----------------+---------------------------------------+----------------------------------------------------------------------------------------------------------+
| ``bucket_name`` | Google Storage Bucket name | Enter the name of the Google Cloud Storage bucket that is used to store the Glossary files. |
+-----------------+---------------------------------------+----------------------------------------------------------------------------------------------------------+

Machine translation service provided by the Google Cloud services.

@@ -258,6 +260,18 @@
.. _Enable the Cloud Translation.: https://cloud.google.com/translate/docs/
.. _Setup Authentication.: https://googleapis.dev/python/google-api-core/latest/auth.html


Optionally, you can configure the service to use :ref:`glossary` by setting up a Bucket:

1. `Create a Google Cloud bucket.`_
2. `Set bucket location to "us-central1".`_
3. `Grant 'Storage Admin' permission to the Service Account.`_

.. _Create a Google Cloud bucket.: https://cloud.google.com/storage/docs/creating-buckets
.. _Set bucket location to "us-central1".: https://cloud.google.com/translate/docs/migrate-to-v3#resources_projects_and_locations
.. _Grant 'Storage Admin' permission to the Service Account.: https://cloud.google.com/translate/docs/access-control


.. seealso::

`Google translate documentation <https://cloud.google.com/translate/docs>`_,
@@ -575,7 +589,7 @@

.. seealso::

* `SAP Translation Hub <https://www.sap.com/products/artificial-intelligence/translation-hub.html>`_

Check failure on line 592 in docs/admin/machine.rst (GitHub Actions / Linkcheck): https://www.sap.com/products/artificial-intelligence/translation-hub.html: 403 Client Error: Forbidden for url: https://www.sap.com/products/artificial-intelligence/translation-hub.html
* `SAP Translation Hub API <https://api.sap.com/api/translationhub/overview>`_

.. _mt-systran:
1 change: 1 addition & 0 deletions docs/changes.rst
@@ -7,6 +7,7 @@ Not yet released.

**Improvements**

* :ref:`mt-google-translate-api-v3` now supports :ref:`glossary-mt` (optional).
* A shortcut to duplicate a component is now available directly in the menu (:guilabel:`Manage` → :guilabel:`Duplicate Component`)
* Included username when generating :ref:`credits`.
* :ref:`bulk-edit` shows a preview of matched strings.
1 change: 1 addition & 0 deletions docs/user/glossary.rst
@@ -129,6 +129,7 @@ Following automatic suggestion services utilize glossaries during the translatio
* :ref:`mt-microsoft-translator`
* :ref:`mt-modernmt`
* :ref:`mt-aws`
* :ref:`mt-google-translate-api-v3`

The glossary is processed before exposed to the service:

3 changes: 2 additions & 1 deletion pyproject.toml
@@ -167,7 +167,8 @@ gerrit = [
   "git-review>=2.4.0,<2.5.0"
 ]
 google = [
-  "google-cloud-translate>=3.13.0,<4.0"
+  "google-cloud-translate>=3.13.0,<4.0",
+  "google-cloud-storage>=2.18.2,<3.0"
 ]
 ldap = [
   "django-auth-ldap>=4.6.0,<6.0.0"
18 changes: 10 additions & 8 deletions scripts/show-extras
@@ -26,11 +26,13 @@ with open("pyproject.toml", "rb") as handle:
 for section, data in toml_dict["project"]["optional-dependencies"].items():
     if section == "all":
         continue
-    dependency = re.split("[;<>=[]", data[0])[0].strip()
-    print(
-        f"""
-* - ``{section}``
-- `{dependency} <https://pypi.org/project/{dependency}>`_
--
-"""
-    )
+    for index, dependency in enumerate(data):
+        dependency = re.split("[;<>=[]", dependency)[0].strip()
+        section = f"``{section}``" if index == 0 else ""
+        print(
+            f"""
+* - {section}
+- `{dependency} <https://pypi.org/project/{dependency}>`_
+-
+"""
+        )
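A minimal sketch of what the reworked loop in scripts/show-extras is meant to produce: one row per package in an extra, with the extra's name shown only on the first row. The helper name `extras_rows` and the `sample` data are hypothetical, and the sketch keeps the label in a separate variable instead of reassigning `section` as the script does. The bracket in the regular expression is escaped here, which matches the same characters as the script's pattern.

```python
import re


def extras_rows(optional_dependencies):
    """Yield (label, package) pairs; only the first row of an extra is labelled."""
    for section, requirements in optional_dependencies.items():
        if section == "all":
            continue
        for index, requirement in enumerate(requirements):
            # Cut the requirement at the first version/marker/extras character.
            package = re.split(r"[;<>=\[]", requirement)[0].strip()
            label = f"``{section}``" if index == 0 else ""
            yield label, package


# Hypothetical input shaped like the optional-dependencies table in pyproject.toml.
sample = {
    "all": ["weblate[google,ldap]"],
    "google": [
        "google-cloud-translate>=3.13.0,<4.0",
        "google-cloud-storage>=2.18.2,<3.0",
    ],
    "ldap": ["django-auth-ldap>=4.6.0,<6.0.0"],
}

rows = list(extras_rows(sample))
for label, package in rows:
    print(f"{label!r:14} {package}")
```

With the `google` extra now listing two packages, the second row carries an empty label, which corresponds to the `* -` row added to docs/admin/install.rst.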
50 changes: 50 additions & 0 deletions uv.lock

1 change: 1 addition & 0 deletions weblate/machinery/base.py
@@ -77,6 +77,7 @@
     persona: str
     style: str
     custom_model: str
+    bucket_name: str


 class TranslationResultDict(TypedDict):
@@ -305,7 +306,7 @@
             "mt",
             self.mtid,
             scope,
             calculate_dict_hash(self.settings),

Check failure on line 309 in weblate/machinery/base.py (GitHub Actions / mypy): Argument 1 to "calculate_dict_hash" has incompatible type "SettingsDict"; expected "dict[Any, Any]"
             *parts,
         ]
         if text is not None:
@@ -667,7 +668,7 @@
         """Disable rate limiting."""
         return False

     def get_language_possibilities(self, language: Language) -> Iterator[Language]:

Check failure on line 671 in weblate/machinery/base.py (GitHub Actions / mypy): Return type "Iterator[Language]" of "get_language_possibilities" incompatible with return type "Iterator[str]" in supertype "BatchMachineTranslation"
         yield get_machinery_language(language)


10 changes: 10 additions & 0 deletions weblate/machinery/forms.py
@@ -215,6 +215,16 @@ class GoogleV3MachineryForm(BaseMachineryForm):
             )
         ),
     )
+    bucket_name = forms.CharField(
+        label=pgettext_lazy(
+            "Automatic suggestion service configuration", "Google Storage Bucket name"
+        ),
+        help_text=pgettext_lazy(
+            "Google Cloud Translation configuration",
+            "Enter the name of the Google Cloud Storage bucket that is used to store the Glossary files.",
+        ),
+        required=False,
+    )

     def clean_credentials(self):
         try:
111 changes: 107 additions & 4 deletions weblate/machinery/googlev3.py
@@ -5,27 +5,49 @@
 from __future__ import annotations

 import json
+import operator
 from typing import TYPE_CHECKING

 from django.utils.functional import cached_property
-from google.cloud.translate import TranslationServiceClient
+from google.cloud import storage

Check failure on line 12 in weblate/machinery/googlev3.py (GitHub Actions / mypy): Module "google.cloud" has no attribute "storage"
+from google.cloud.translate_v3 import (
+    GcsSource,
+    Glossary,
+    GlossaryInputConfig,
+    TranslateTextGlossaryConfig,
+    TranslationServiceClient,
+)
 from google.oauth2 import service_account

-from .base import DownloadTranslations, XMLMachineTranslationMixin
+from .base import (
+    DownloadTranslations,
+    GlossaryMachineTranslationMixin,
+    XMLMachineTranslationMixin,
+)
 from .forms import GoogleV3MachineryForm
 from .google import GoogleBaseTranslation

 if TYPE_CHECKING:
     from weblate.trans.models import Unit


-class GoogleV3Translation(XMLMachineTranslationMixin, GoogleBaseTranslation):
+class GoogleV3Translation(
+    XMLMachineTranslationMixin, GoogleBaseTranslation, GlossaryMachineTranslationMixin
+):
     """Google Translate API v3 machine translation support."""

     name = "Google Cloud Translation Advanced"
     max_score = 90
     settings_form = GoogleV3MachineryForm

+    # estimation, actual limit is 10.4 million (10,485,760) UTF-8 bytes
+    glossary_count_limit = 1000
+
+    # Identifier must contain only lowercase letters, digits, or hyphens.
+    glossary_name_format = (
+        "weblate__{project}__{source_language}__{target_language}__{checksum}"
+    )
+
     @classmethod
     def get_identifier(cls) -> str:
         return "google-translate-api-v3"
@@ -44,6 +66,17 @@
             credentials=credentials, client_options={"api_endpoint": api_endpoint}
         )

+    @cached_property
+    def storage_client(self):
+        credentials = service_account.Credentials.from_service_account_info(
+            json.loads(self.settings["credentials"])
+        )
+        return storage.Client(credentials=credentials)
+
+    @cached_property
+    def storage_bucket(self):
+        return self.storage_client.get_bucket(self.settings["bucket_name"])

     @cached_property
     def parent(self) -> str:
         project = self.settings["project"]
@@ -72,10 +105,21 @@
             "source_language_code": source,
             "mime_type": "text/html",
         }
+        glossary_path: str | None = None
+        if self.settings.get("bucket_name"):
+            glossary_path = self.get_glossary_id(source, language, unit)

Check failure on line 110 in weblate/machinery/googlev3.py (GitHub Actions / mypy): Incompatible types in assignment (expression has type "int | str | None", variable has type "str | None")
+            request["glossary_config"] = TranslateTextGlossaryConfig(
+                glossary=glossary_path
+            )
+
         response = self.client.translate_text(request)

+        response_translations = (
+            response.glossary_translations if glossary_path else response.translations
+        )
+
         yield {
-            "text": response.translations[0].translated_text,
+            "text": response_translations[0].translated_text,
             "quality": self.max_score,
             "service": self.name,
             "source": text,
@@ -95,3 +139,62 @@
         replacements[replacement] = "\n"

         return text.replace("\n", replacement), replacements
+
+    def list_glossaries(self) -> dict[str, str]:
+        """Return dictionary with the name/id of the glossary as the key and value."""
+        return {
+            glossary.display_name: glossary.display_name
+            for glossary in self.client.list_glossaries(parent=self.parent)
+        }
+
+    def create_glossary(
+        self, source_language: str, target_language: str, name: str, tsv: str
+    ) -> None:
+        """
+        Create glossary in the service.
+
+        - Uploads the TSV file to gcs bucket
+        - Creates the glossary in the service
+        """
+        # upload tsv to storage bucket
+        glossary_bucket_file = self.storage_bucket.blob(f"{name}.tsv")
+        glossary_bucket_file.upload_from_string(
+            tsv, content_type="text/tab-separated-values"
+        )
+        # create glossary
+        bucket_name = self.settings["bucket_name"]
+        gcs_source = GcsSource(input_uri=f"gs://{bucket_name}/{name}.tsv")
+        input_config = GlossaryInputConfig(gcs_source=gcs_source)
+
+        glossary = Glossary(
+            name=self.get_glossary_resource_path(name),
+            language_pair=Glossary.LanguageCodePair(
+                source_language_code=source_language,
+                target_language_code=target_language,
+            ),
+            input_config=input_config,
+        )
+        self.client.create_glossary(parent=self.parent, glossary=glossary)
+
+    def delete_glossary(self, glossary_name: str) -> None:
+        """Delete the glossary in service and storage bucket."""
+        self.client.delete_glossary(name=self.get_glossary_resource_path(glossary_name))
+
+        # delete tsv from storage bucket
+        glossary_bucket_file = self.storage_bucket.blob(f"{glossary_name}.tsv")
+        glossary_bucket_file.delete()
+
+    def delete_oldest_glossary(self) -> None:
+        """Delete the oldest glossary if any."""
+        glossaries = sorted(
+            self.client.list_glossaries(parent=self.parent),
+            key=operator.attrgetter("submit_time"),
+        )
+        if glossaries:
+            self.delete_glossary(glossaries[0].display_name)
+
+    def get_glossary_resource_path(self, glossary_name: str):
+        """Return the resource path used by the Translation API."""
+        return self.client.glossary_path(
+            self.settings["project"], self.settings["location"], glossary_name
+        )
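The glossary lifecycle added to googlev3.py (upload a TSV to the bucket, register a glossary that points at it, and evict the oldest glossary by `submit_time`) can be followed without Google credentials using an in-memory stand-in. This is only a sketch under stated assumptions: `FakeGlossary`, `FakeGlossaryStore` and the integer `clock` are hypothetical and replace the real storage and Translation clients; only the `gs://{bucket_name}/{name}.tsv` URI shape, the `{name}.tsv` blob naming and the `operator.attrgetter("submit_time")` ordering are taken from the diff.

```python
import operator
from dataclasses import dataclass, field


@dataclass
class FakeGlossary:
    display_name: str
    submit_time: int  # hypothetical stand-in for the API's submit timestamp


@dataclass
class FakeGlossaryStore:
    """Hypothetical in-memory replacement for the bucket and the Translation API."""

    bucket_name: str
    blobs: dict = field(default_factory=dict)
    glossaries: list = field(default_factory=list)
    clock: int = 0

    def create_glossary(self, name: str, tsv: str) -> str:
        # Step 1: store the TSV under "<name>.tsv", as create_glossary does.
        self.blobs[f"{name}.tsv"] = tsv
        # Step 2: register a glossary whose input points at that file.
        self.clock += 1
        self.glossaries.append(FakeGlossary(name, self.clock))
        return f"gs://{self.bucket_name}/{name}.tsv"

    def delete_glossary(self, name: str) -> None:
        # Remove the glossary first, then its TSV, mirroring delete_glossary.
        self.glossaries = [g for g in self.glossaries if g.display_name != name]
        del self.blobs[f"{name}.tsv"]

    def delete_oldest_glossary(self) -> None:
        # Same ordering as the diff: sort by submit_time, drop the first entry.
        ordered = sorted(self.glossaries, key=operator.attrgetter("submit_time"))
        if ordered:
            self.delete_glossary(ordered[0].display_name)


store = FakeGlossaryStore("example-bucket")
first_uri = store.create_glossary("weblate-a", "hello\tahoj\n")
store.create_glossary("weblate-b", "world\tsvět\n")
store.delete_oldest_glossary()
remaining = [g.display_name for g in store.glossaries]
print(first_uri, remaining, sorted(store.blobs))
```

The stand-in keeps glossary and blob deletion together, so evicting the oldest glossary also removes its TSV from the bucket, which is the behaviour `delete_glossary` in the diff aims for.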