Administrative area keys for China #38

Closed
patilpankaj212 opened this issue Jun 5, 2024 · 3 comments

patilpankaj212 commented Jun 5, 2024

Hi,

I wanted to inquire about some observations I've made in generated data:

  • Administrative Area Keys: The keys used to identify administrative areas in China are numeric in the generated data. However, the ISO 3166-2 codes are not numeric. Please refer to https://en.wikipedia.org/wiki/ISO_3166-2:CN.

  • Locality IDs and Language: The locality IDs for China's administrative areas are presented in Han characters for the "en" (English) language. Shouldn't these IDs be in English?

The current format causes inconsistencies when interacting with the library for Chinese addresses. Also, using Han characters for locality IDs in the "en" language could confuse users who expect English.

Would it be possible to adopt the ISO 3166-2 standard for administrative area keys in China and to display locality IDs in English for the "en" language?

F21 (Member) commented Jun 5, 2024

Hey, thanks for opening this issue.
Unfortunately, we are limited by Google's data, as that is the source used to generate the data set, and it might not be consistent with ISO 3166-2 and other standards.

  • For the Administrative Area Keys, we use the isoid from Google's data set. For China, they used some numbers (not sure how they are assigned), but for other countries, such as Australia, it seems to be in line with ISO 3166-2.

  • For the locality IDs, there doesn't seem to be an internationally recognized database of IDs for each locality, so we need to derive a surrogate. The best option is to use the name of the locality in the original language. For example, the locality IDs for South Korea are in Hangul. The reason is that English translations of locality names sometimes contain mistakes, and there is often no official consensus on which English translation of a locality is the official one (for some localities, there might be no English translation at all). For these reasons, we use the original name in the official language of the country as the ID: these names are unlikely to be incorrect, and the IDs will not churn if incorrect English translations are corrected at a later stage.

  • If you want the English translation for a locality, you should use the "Name" property. These are almost always in English, unless Google's dataset does not include the English translation, in which case we fall back to the official language of the country.
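
To make the ID and "Name" rules above concrete, here is a minimal sketch in Python. It is not the library's actual code or API: `RawLocality`, `Locality`, and `build_locality` are hypothetical names, and the sample values are only illustrative. It assumes each upstream record carries a native-language name and, optionally, an English name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RawLocality:
    """Hypothetical upstream record; field names are assumptions."""
    native_name: str                     # name in the country's official language
    english_name: Optional[str] = None   # may be missing upstream


@dataclass
class Locality:
    id: str    # surrogate ID, derived from the native-language name
    name: str  # display name for the "en" language


def build_locality(raw: RawLocality) -> Locality:
    # The ID is always the native-language name, so it does not change
    # when an English translation is later corrected.
    # The name is the English translation when one exists; otherwise it
    # falls back to the native-language name.
    return Locality(id=raw.native_name, name=raw.english_name or raw.native_name)


# Illustrative sample values, not copied from the generated data set.
with_english = build_locality(RawLocality("东城区", "Dongcheng Qu"))
without_english = build_locality(RawLocality("东城区"))
```

With this shape, callers that need English text read `name`, while `id` stays a stable lookup key regardless of translation fixes.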

I would definitely love to use ISO 3166-2 for the administrative area keys, but for consistency with libaddressinput, we need to follow Google's data set.

For corrections to the data set, especially the ISO 3166-2 codes, I encourage you to open an issue in the https://github.com/google/libaddressinput repo. It might take a while for Google to update their data set, but as soon as that happens, GitHub Actions will automatically create a PR here to update the data set.

F21 closed this as completed Jun 5, 2024

patilpankaj212 (Author) commented Jun 5, 2024

Thanks for the quick response, @F21. There is already an issue open for correcting the dataset for China: google/libaddressinput#195

F21 (Member) commented Jun 5, 2024

Thanks, will follow that issue as well. It seems Google is dragging their feet on that one.
