
Explicitly express that any core-strings are in UTF-8 #1788

Open
375gnu opened this issue Jul 21, 2023 · 2 comments
Labels
question Further information is requested

Comments

@375gnu
Collaborator

375gnu commented Jul 21, 2023

Celestia runs on different platforms with different locales. While most desktop GNU/Linux installations prefer UTF-8 locales, some other systems, e.g. older versions of Windows, prefer legacy 8-bit encodings such as CP1251. Our core code, on the other hand, always assumes that all strings are in UTF-8, so the win32 frontend has conversion functions here and there.
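The issue doesn't show the win32 frontend's actual conversion code. To illustrate what such a conversion involves, here is a minimal sketch of a hypothetical `utf8ToCp1251` helper (an assumption for illustration, not Celestia code). It maps only ASCII, the basic Russian alphabet (U+0410..U+044F) and Ё/ё, and turns everything else into '?', including characters CP1251 does have, such as €. A complete version needs the full CP1251 table or a platform conversion API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical helper: converts UTF-8 to Windows-1251 (CP1251).
// Only ASCII, U+0410..U+044F and U+0401/U+0451 (Ё/ё) are mapped; any other
// code point, bad lead/continuation byte or truncated sequence becomes '?'.
// Overlong encodings are NOT rejected in this sketch.
std::string utf8ToCp1251(std::string_view in)
{
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size())
    {
        auto b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else { out += '?'; ++i; continue; } // stray continuation or invalid byte

        if (i + len > in.size()) { out += '?'; break; } // truncated sequence

        bool ok = true;
        for (std::size_t k = 1; k < len; ++k)
        {
            auto b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (b & 0x3F);
        }
        if (!ok) { out += '?'; ++i; continue; }
        i += len;

        if (cp < 0x80)                         out += static_cast<char>(cp);
        else if (cp >= 0x0410 && cp <= 0x044F) out += static_cast<char>(0xC0 + (cp - 0x0410)); // А..я
        else if (cp == 0x0401)                 out += static_cast<char>(0xA8); // Ё
        else if (cp == 0x0451)                 out += static_cast<char>(0xB8); // ё
        else                                   out += '?';
    }
    return out;
}
```

For example, the UTF-8 bytes of "При" (`D0 9F D1 80 D0 B8`) come out as the three CP1251 bytes `CF F0 E8`.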

C++20 introduced a new primitive type, char8_t ("type for UTF-8 character representation, required to be large enough to represent any UTF-8 code unit (8 bits). It has the same size, signedness, and alignment as unsigned char (and therefore, the same size and alignment as char and signed char), but is a distinct type"), along with new std::basic_string/std::basic_string_view specializations using char8_t: std::u8string and std::u8string_view.
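A related practical difference is that a u8"..." literal is an array of char in C++17 but an array of char8_t in C++20 (P0482). The sketch below checks this at compile time using only the standard feature-test macros `__cpp_char8_t` and `__cpp_lib_char8_t`. The alias `U8LiteralUnit` and the function `u8LiteralsAreChar8` are hypothetical names chosen for illustration.

```cpp
#include <string>
#include <type_traits>

// Element type of a u8"..." literal: decltype gives const X(&)[2], so strip
// the reference, the array extent and the const qualifier to get X.
using U8LiteralUnit =
    std::remove_cv_t<std::remove_extent_t<std::remove_reference_t<decltype(u8"A")>>>;

#if defined(__cpp_char8_t)
// C++20 (or a compiler flag enabling char8_t): u8 literals are char8_t,
// which has the size and signedness of unsigned char but is a distinct type.
static_assert(std::is_same_v<U8LiteralUnit, char8_t>);
static_assert(!std::is_same_v<char8_t, unsigned char>);
static_assert(sizeof(char8_t) == sizeof(unsigned char));
static_assert(std::is_unsigned_v<char8_t>);
#if defined(__cpp_lib_char8_t)
static_assert(std::is_same_v<std::u8string::value_type, char8_t>);
#endif
#else
// C++17: u8 literals are plain char, whose signedness is implementation-defined.
static_assert(std::is_same_v<U8LiteralUnit, char>);
#endif

// True when u8 literals produce char8_t rather than char.
constexpr bool u8LiteralsAreChar8() { return !std::is_same_v<U8LiteralUnit, char>; }
```

So code written against C++17 that stores u8 literals in std::string stops compiling when the same code is built as C++20.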

So my proposal is to use these types in our core routines and to use std::string and std::string_view only in frontends, i.e. where non-UTF-8 characters can appear. Of course, as we target C++17, we need to implement the required types ourselves.
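With such a split, text arriving from a frontend as std::string would have to be checked (or converted) before it reaches UTF-8-only core code. A minimal sketch of a hypothetical `isValidUtf8` boundary check follows. It applies the RFC 3629 well-formedness rules (no overlong forms, no surrogates U+D800..U+DFFF, nothing above U+10FFFF). The function name and where it would be called are assumptions, not existing Celestia code.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical boundary check: returns true if `s` is well-formed UTF-8.
bool isValidUtf8(std::string_view s)
{
    std::size_t i = 0;
    while (i < s.size())
    {
        auto b0 = static_cast<unsigned char>(s[i]);
        if (b0 < 0x80) { ++i; continue; } // ASCII fast path

        std::size_t len = 0;
        char32_t minCp = 0; // smallest code point allowed for this length (rejects overlongs)
        char32_t cp = 0;
        if ((b0 & 0xE0) == 0xC0)      { len = 2; minCp = 0x80;    cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; minCp = 0x800;   cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; minCp = 0x10000; cp = b0 & 0x07; }
        else return false; // continuation byte without a lead byte, or 0xF8..0xFF

        if (s.size() - i < len) return false; // truncated sequence
        for (std::size_t k = 1; k < len; ++k)
        {
            auto b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false; // expected 10xxxxxx
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}
```

A CP1251-encoded string such as the bytes `CF F0 E8` fails this check, which is exactly the case where the frontend would need to convert rather than pass the bytes through.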

@ajtribick @levinli303 what do you think?

375gnu added the "question" (Further information is requested) label on Jul 21, 2023
@ajtribick
Collaborator

ajtribick commented Jul 21, 2023

I think the usual convention pre-C++20 is to use std::basic_string<unsigned char, CustomTraitsType> for this purpose, which makes sense to me, then ensure we use the various workarounds detailed in P2513R4 Compatibility and Portability Fix.
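For reference, a minimal sketch of what such a CustomTraitsType could look like. The standard only guarantees std::char_traits for char, wchar_t, char16_t and char32_t (plus char8_t in C++20), so std::char_traits<unsigned char> can't be relied on portably; recent libc++ versions no longer provide it. `Utf8Traits`, `U8String`, `U8StringView`, `toU8String` and `toStdString` are hypothetical names, not existing Celestia code.

```cpp
#include <cstddef>
#include <cstdio>   // EOF
#include <cstring>  // memcmp, memmove, memcpy, memset
#include <cwchar>   // std::mbstate_t
#include <ios>      // std::streamoff, std::streampos
#include <string>
#include <string_view>

// Hypothetical traits for strings of UTF-8 code units stored as unsigned char.
struct Utf8Traits
{
    using char_type  = unsigned char;
    using int_type   = unsigned int; // holds every char_type value plus eof()
    using off_type   = std::streamoff;
    using pos_type   = std::streampos;
    using state_type = std::mbstate_t;

    static constexpr void assign(char_type& r, const char_type& a) noexcept { r = a; }
    static constexpr bool eq(char_type a, char_type b) noexcept { return a == b; }
    static constexpr bool lt(char_type a, char_type b) noexcept { return a < b; }

    // memcmp compares bytes as unsigned char, which matches lt() above.
    static int compare(const char_type* a, const char_type* b, std::size_t n)
    {
        return n == 0 ? 0 : std::memcmp(a, b, n);
    }
    static std::size_t length(const char_type* s)
    {
        std::size_t n = 0;
        while (s[n] != 0) ++n;
        return n;
    }
    static const char_type* find(const char_type* s, std::size_t n, const char_type& c)
    {
        for (std::size_t i = 0; i < n; ++i)
            if (s[i] == c) return s + i;
        return nullptr;
    }
    static char_type* move(char_type* dst, const char_type* src, std::size_t n)
    {
        return n == 0 ? dst : static_cast<char_type*>(std::memmove(dst, src, n));
    }
    static char_type* copy(char_type* dst, const char_type* src, std::size_t n)
    {
        return n == 0 ? dst : static_cast<char_type*>(std::memcpy(dst, src, n));
    }
    static char_type* assign(char_type* s, std::size_t n, char_type c)
    {
        return n == 0 ? s : static_cast<char_type*>(std::memset(s, c, n));
    }

    static constexpr int_type  not_eof(int_type c) noexcept { return c == eof() ? 0 : c; }
    static constexpr char_type to_char_type(int_type c) noexcept { return static_cast<char_type>(c); }
    static constexpr int_type  to_int_type(char_type c) noexcept { return c; }
    static constexpr bool      eq_int_type(int_type a, int_type b) noexcept { return a == b; }
    static constexpr int_type  eof() noexcept { return static_cast<int_type>(EOF); }
};

using U8String     = std::basic_string<unsigned char, Utf8Traits>;
using U8StringView = std::basic_string_view<unsigned char, Utf8Traits>;

// Frontend/core boundary helpers (the bytes are copied as-is, no validation).
// Accessing char data through unsigned char* (and vice versa) is allowed by the aliasing rules.
U8String toU8String(std::string_view s)
{
    return U8String(reinterpret_cast<const unsigned char*>(s.data()), s.size());
}
std::string toStdString(const U8String& s)
{
    return std::string(reinterpret_cast<const char*>(s.data()), s.size());
}
```

Library facilities such as std::hash or stream insertion aren't provided for a string type like this and would need their own overloads.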

Is it too early to bump the required standard to C++20?

@375gnu
Collaborator Author

375gnu commented Jul 21, 2023

> Is it too early to bump the required standard to C++20?

Definitely. I suppose we could switch (if needed) no earlier than 2025. Personally, the only features I want are designated initializers, maybe concepts (though they are sometimes too ugly, e.g. the infamous requires requires), and maybe modules, but GCC doesn't support them yet.

> I think the usual convention pre-C++20 is to use std::basic_string<unsigned char, CustomTraitsType> for this purpose, which makes sense to me, then ensure we use the various workarounds detailed in P2513R4 Compatibility and Portability Fix.

I hoped that such a std::basic_string<unsigned char, CustomTraitsType> could be compatible with char8_t, but it seems it is not. Maybe C++26 will fix all the issues, or make everything so bad that rewriting in Rust/Zig/Carbon becomes the best solution.

But anyway, it makes sense to evaluate std::basic_string<unsigned char, CustomTraitsType>, especially since char is signed on some platforms and unsigned on others, which may lead to weird bugs.
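A small illustration of the kind of signedness bug meant here, as a sketch with hypothetical function names. A naive UTF-8 continuation-byte test that shifts a plain char gives different answers depending on whether char is signed (e.g. signed on x86-64, unsigned on AArch64 Linux). Casting to unsigned char first is portable.

```cpp
#include <cstddef>
#include <string_view>
#include <type_traits>

// Whether plain char is signed is implementation-defined.
constexpr bool charIsSigned = std::is_signed_v<char>;

// Naive check for a UTF-8 continuation byte (10xxxxxx). Where char is signed,
// '\x80' is -128 and (on common compilers) -128 >> 6 is -2, so this returns false.
bool isContinuationNaive(char c) { return (c >> 6) == 2; }

// Portable version: inspect the byte value through unsigned char.
bool isContinuation(char c) { return (static_cast<unsigned char>(c) >> 6) == 2; }

// Counts code points in well-formed UTF-8 by counting non-continuation bytes.
std::size_t countCodePoints(std::string_view s)
{
    std::size_t n = 0;
    for (char c : s)
        if (!isContinuation(c)) ++n;
    return n;
}
```

With unsigned char as the element type, as in std::basic_string<unsigned char, CustomTraitsType>, this class of bug can't occur, because every byte value is already non-negative.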
