List the Chinese characters in Unicode? #605

Open
xfq opened this issue Feb 13, 2024 · 5 comments
Labels
i:encoding Characters & encoding

Comments

@xfq
Member

xfq commented Feb 13, 2024

It might be useful to list the Chinese characters in Unicode, like klreq and alreq:

  • The basic set (U+4E00-U+9FA5), i.e., ISO/IEC 10646:1993
  • CJK Unified Ideographs Extension A, i.e., U+3400-U+4DB5 in ISO/IEC 10646:1999
  • U+3400-U+9FFF (BMP Chinese characters)
  • U+20000-U+2FFFF, i.e., CJK Unified Ideographs Extension B to Extension F (plus Extension I, added in September 2023), commonly known as the Supplementary Ideographic Plane (SIP)
  • U+30000-U+3FFFF, i.e., CJK Unified Ideographs Extension G to Extension H, commonly known as the Tertiary Ideographic Plane (TIP)
  • CJK Compatibility Ideographs in the Basic Multilingual Plane (U+F900-U+FAFF)
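
As a rough illustration of how such a list could be used, the sketch below reports which of the ranges above contain a given character. The range boundaries are copied as written in this comment and may not match the latest Unicode version; the short labels and the `listed_ranges` helper are made up for this example.

```python
from typing import List

# Ranges as written in the list above (inclusive); labels are illustrative.
RANGES = [
    (0x4E00, 0x9FA5, "basic set"),
    (0x3400, 0x4DB5, "Extension A"),
    (0x3400, 0x9FFF, "BMP Chinese characters"),
    (0x20000, 0x2FFFF, "SIP"),
    (0x30000, 0x3FFFF, "TIP"),
    (0xF900, 0xFAFF, "CJK Compatibility Ideographs"),
]


def listed_ranges(ch: str) -> List[str]:
    """Return the labels of every listed range that contains ch."""
    cp = ord(ch)
    return [label for lo, hi, label in RANGES if lo <= cp <= hi]


if __name__ == "__main__":
    for ch in ["中", "\u3007", "\U00020000", "\uF900"]:
        print(f"U+{ord(ch):04X}", listed_ranges(ch))
```

Because some of the listed ranges overlap (the BMP range covers both the basic set and Extension A), the helper returns a list rather than a single label.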
@yisibl

yisibl commented Apr 18, 2024

Should CJK Compatibility Ideographs be abandoned?

@xfq
Member Author

xfq commented Apr 21, 2024

> Should CJK Compatibility Ideographs be abandoned?

There seem to be some standard Chinese characters in CJK Compatibility Ideographs. @eisoch?
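
One way to look into this is to check which assigned characters in U+F900-U+FAFF have no canonical decomposition, since those are not replaced by Unicode normalization. The sketch below uses Python's standard `unicodedata` module; the helper name is made up for this example, and the result depends on the Unicode version bundled with the Python build.

```python
import unicodedata


def undecomposed_compat_ideographs():
    """Assigned code points in U+F900..U+FAFF with no canonical decomposition."""
    result = []
    for cp in range(0xF900, 0xFB00):
        ch = chr(cp)
        if unicodedata.category(ch) == "Cn":  # skip unassigned code points
            continue
        if unicodedata.decomposition(ch) == "":
            result.append(f"U+{cp:04X}")
    return result


# A compatibility ideograph that has a canonical mapping is replaced by
# its unified counterpart under NFC normalization.
normalized = unicodedata.normalize("NFC", "\uF900")

if __name__ == "__main__":
    print(undecomposed_compat_ideographs())
    print(f"U+F900 -> U+{ord(normalized):04X}")
```

The characters this reports (for example U+FA0E and U+FA0F) sit in the compatibility block but have no mapping to another ideograph, which may be what the "standard Chinese characters" remark refers to.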

xfq added the i:encoding (Characters & encoding) label Apr 22, 2024
@AmeroHan
Contributor

AmeroHan commented Aug 31, 2024

U+3007 (〇) IDEOGRAPHIC NUMBER ZERO in CJK Symbols and Punctuation (U+3000..U+303F) is also considered a hanzi by standards, dictionaries, and the UCS, according to 「〇」算不算汉字? - 知乎 (Is “〇” a hanzi? - Zhihu).

Additionally, outside the list @xfq provided above, there are some other characters with script property “Han” in UCD, such as U+3005 (々) IDEOGRAPHIC ITERATION MARK and Suzhou numerals (U+3021..U+3029). Should they be listed?
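
For reference, the characters mentioned here can be inspected with Python's standard `unicodedata` module, as in the sketch below. Note that the standard library does not expose the Script property, so this only shows names and general categories and does not by itself confirm the “Han” script value; the `EXTRA` list and `describe` helper are made up for this example.

```python
import unicodedata

# Characters mentioned in this comment: U+3005, U+3007, and U+3021..U+3029.
EXTRA = ["\u3005", "\u3007"] + [chr(cp) for cp in range(0x3021, 0x302A)]


def describe(ch):
    """Return (code point, Unicode name, general category) for ch."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


if __name__ == "__main__":
    for ch in EXTRA:
        print(*describe(ch))
```

None of these have the general category `Lo` used by the unified ideographs: U+3005 is a modifier letter (`Lm`), while U+3007 and the numerals are letter numbers (`Nl`). The Unicode names for U+3021..U+3029 use “HANGZHOU NUMERAL” rather than “Suzhou”.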

@xfq
Member Author

xfq commented Oct 17, 2024

We should probably list the various character sets defined by each region too, like https://www.w3.org/TR/hani-lreq/#h_script_overview

@r12a
Contributor

r12a commented Oct 17, 2024

You probably need to revisit this list now that Unicode 16.0 has been released.

Also, if the compatibility block is mentioned (which I suggest should be done separately, if at all), there are other Unicode blocks that may need mentioning too, such as CJK radicals, CJK strokes, kanbun, etc. See the list at https://www.unicode.org/charts/ under East Asian Scripts.

It's probably best to clearly define what types of character should go in the list, and to do that we first need to be clear about why we're listing characters (i.e., who will use the list, and for what).


4 participants