Add more BCP-47 default language codes #237

Open
wants to merge 1 commit into
base: master

NeonDaniel
Member

@NeonDaniel NeonDaniel commented Aug 2, 2022

Description

Add more default BCP-47 language codes. Defaults are based on each language's region of origin or, approximately, its most populous region (open to suggestions for changes here).

ISO 639-1 language codes are specified by ISO, with a publicly available reference on Wikipedia.
ISO 3166-1 country codes are specified [by ISO](https://www.iso.org/iso-3166-country-codes.html), with a publicly available reference on Wikipedia.

BCP-47 originated as RFC 1766 from the IETF; it is currently defined by RFC 5646 and RFC 4647.

Examples of other lists:
https://cloud.google.com/speech-to-text/docs/languages
https://web.archive.org/web/20160705080418/https://msdn.microsoft.com/en-us/library/ee825488%28v=cs.20%29.aspx?f=255&MSPPError=-2147217396

Type of PR

If your PR fits more than one category, there is a high chance you should submit more than one PR. Please consider this carefully before opening the PR.
Either delete those that do not apply, or add an x between the square brackets like so: - [x]

  • Bugfix
  • Feature implementation
  • Refactor of code (without functional changes)
  • Documentation improvements
  • Test improvements

Testing

get_full_lang_code will now resolve more BCP-47 language codes

Documentation

No functional changes.

CLA

To protect you, the project, and those who choose to use Mycroft technologies in systems they build, we ask all contributors to sign a Contributor License Agreement.

This agreement clarifies that you are granting a license to the Mycroft Project to freely use your work. Additionally, it establishes that you retain the ownership of your contributed code and intellectual property. As the owner, you are free to use your code in other work, obtain patents, or do anything else you choose with it.

If you haven't already signed the agreement and been added to our public Contributors repo then please head to https://mycroft.ai/cla to initiate the signing process.

@devs-mycroft devs-mycroft added the CLA: Yes Contributor License Agreement exists (see https://github.com/MycroftAI/contributors) label Aug 3, 2022
@krisgesling
Contributor

krisgesling commented Aug 8, 2022

Hey Daniel,

This won't break anything, but I'm curious what purpose it serves to have a big list of language codes?

The one thing that concerns me is that these have been sourced from the Google Cloud site and I don't expect that they have positive copyright provisions covering that content. Even though they're part of the BCP-47 spec, it still falls into a legal grey area:
https://mycroft.ai/blog/licensing-a-language-how-the-copyright-system-abuses-fundamental-human-rights/

Edit: is there an upstream source or Wikipedia article we can source these from that has more explicit copyright info?

Longer term we should probably move to the current standard, ISO 639-2, but that's clearly well outside the scope of this change.

@NeonDaniel
Member Author

NeonDaniel commented Aug 8, 2022

> This won't break anything, but I'm curious what purpose it serves to have a big list of language codes?

The get_full_lang_code method is generally useful for getting a default BCP-47 code from an ISO 639 code, since ISO 639 codes are what core uses for resources. I use the method in a few places to make sure Message objects always get the BCP-47 code if some language translation was involved.
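To illustrate the kind of lookup described above, here is a minimal sketch. It is not the project's implementation: the function name `resolve_full_lang_code`, the table `DEFAULT_FULL_LANG_CODES`, the lowercase tag style, and the specific defaults shown are all assumptions made for this example.

```python
# Hypothetical table for illustration: ISO 639-1 code -> assumed default tag.
# The entries and the lowercase style are assumptions, not the PR's actual list.
DEFAULT_FULL_LANG_CODES = {
    "en": "en-us",
    "de": "de-de",
    "fr": "fr-fr",
    "it": "it-it",
    "ja": "ja-jp",
}


def resolve_full_lang_code(lang: str) -> str:
    """Return a language-region tag for an ISO 639-1 code or an existing tag."""
    if not isinstance(lang, str) or not lang.strip():
        raise ValueError(f"Expected a language code, got {lang!r}")
    # Normalise case and separator so "EN_us" and "en-US" compare equal.
    normalized = lang.strip().lower().replace("_", "-")
    # A tag that already carries a further subtag is passed through unchanged.
    if "-" in normalized:
        return normalized
    try:
        return DEFAULT_FULL_LANG_CODES[normalized]
    except KeyError:
        raise ValueError(f"No default region known for {normalized!r}") from None
```

Raising on an unknown code, rather than guessing a region, is a design choice of this sketch; a caller that prefers a fallback could catch the `ValueError` and substitute its own default.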

> The one thing that concerns me is that these have been sourced from the Google Cloud site and I don't expect that they have positive copyright provisions covering that content. Even though they're part of the BCP-47 spec, it still falls into a legal grey area: https://mycroft.ai/blog/licensing-a-language-how-the-copyright-system-abuses-fundamental-human-rights/

Fair enough. This was just a source I had referenced in code previously, but I'll find the upstream spec from ISO or some open source reference. I have updated my PR comment to more precisely specify what the codes are and how they are derived from public sources.

> Longer term we should probably move to the current standard, ISO 639-2, but that's clearly well outside the scope of this change.

I agree on both points. Hopefully there is already some implementation we could use to translate between the two specs, but that would be a bigger transition to plan.
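As a rough idea of what translating between the two specs could look like, here is a small sketch, assuming the two specs in question are ISO 639-1 (two-letter) and ISO 639-2 (three-letter). The table is an illustrative subset, and the names `ISO_639_2_BY_639_1`, `to_iso_639_2`, and `to_iso_639_1` are hypothetical; a real transition would more likely rely on a maintained dataset or library.

```python
# Illustrative subset only. Each value is (terminology code, bibliographic code);
# ISO 639-2 defines two variants for some languages.
ISO_639_2_BY_639_1 = {
    "en": ("eng", "eng"),
    "es": ("spa", "spa"),
    "it": ("ita", "ita"),
    "de": ("deu", "ger"),
    "fr": ("fra", "fre"),
    "nl": ("nld", "dut"),
}

# Reverse lookup that accepts either three-letter variant.
ISO_639_1_BY_639_2 = {
    three: two
    for two, pair in ISO_639_2_BY_639_1.items()
    for three in pair
}


def to_iso_639_2(code: str, bibliographic: bool = False) -> str:
    """Map an ISO 639-1 code (or a tag such as "de-de") to ISO 639-2."""
    # Only the primary language subtag matters; any region subtag is dropped.
    primary = code.strip().lower().replace("_", "-").split("-")[0]
    terminology, biblio = ISO_639_2_BY_639_1[primary]  # KeyError if unknown
    return biblio if bibliographic else terminology


def to_iso_639_1(code: str) -> str:
    """Map an ISO 639-2 code (either variant) back to ISO 639-1."""
    return ISO_639_1_BY_639_2[code.strip().lower()]  # KeyError if unknown
```

For example, `to_iso_639_2("de-de")` looks up the primary subtag `de`, and `to_iso_639_1` accepts both `deu` and `ger`.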
