What to do with Unicode strings #1

Open
Ashvin-Ranjan opened this issue Jan 2, 2022 · 0 comments
Labels
help wanted Extra attention is needed question Further information is requested

Comments

@Ashvin-Ranjan
Owner

The issue

Currently EMP removes all non-ASCII characters from its strings using this line of code. That could make conversion harder, because both JSON and NBT support some form of Unicode.

JSON

In JSON:

A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes.

From json.org

There are some useful things to note from this definition. Including UTF-8 or UTF-16 might not work, because strings over length 16 cannot contain the byte 00000100; if they do, the string terminates prematurely and the rest of the decoding is messed up. We can still include \b, \f, \n, \t, and \r, as none of those conflict with the current system.
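To make the byte conflict concrete, here is a small Python sketch. It is not EMP code: `TERMINATOR` and `contains_terminator` are hypothetical names, and the only thing it assumes about EMP is that the byte 00000100 (0x04) must not appear inside a long string. It checks that the five escapes are safe, and shows that UTF-8 only ever produces 0x04 for the character U+0004 itself, while UTF-16 can produce it for ordinary letters.

```python
# Hypothetical name: the byte that must not appear in long strings.
TERMINATOR = 0b00000100  # 0x04


def contains_terminator(data: bytes) -> bool:
    """Return True if the encoded bytes include the 0x04 byte."""
    return TERMINATOR in data


# The five JSON escapes map to 0x08, 0x0C, 0x0A, 0x09 and 0x0D, never 0x04.
assert not contains_terminator("\b\f\n\t\r".encode("utf-8"))

# In UTF-8 every byte of a multi-byte sequence is >= 0x80,
# so non-ASCII text cannot introduce a 0x04 byte.
assert not contains_terminator("héllo ✓ 日本語".encode("utf-8"))

# The only UTF-8 input that yields 0x04 is the control character U+0004.
assert contains_terminator("\u0004".encode("utf-8"))

# UTF-16 is different: U+0104 ("Ą") encodes as the bytes 04 01 (little-endian).
assert contains_terminator("Ą".encode("utf-16-le"))
```

If this holds for EMP's format, UTF-8 looks like the safer candidate of the two, since only a single control character would need special handling.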

NBT

According to wiki.vg, NBT uses Modified UTF-8. This could perhaps be modified further to disallow the byte 00000100.
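One possible way to do that, sketched in Python: Modified UTF-8 already avoids the byte 00 by encoding U+0000 as the overlong pair C0 80, so the same trick could encode U+0004 as C0 84. The function name `encode_modified_utf8` and the `avoid` parameter are hypothetical, and this is only an illustration of the idea, not something EMP or NBT defines.

```python
def encode_modified_utf8(text: str, avoid: frozenset = frozenset({0x00, 0x04})) -> bytes:
    """Encode text as Modified UTF-8, with code points in `avoid`
    written in overlong two-byte form (an assumed extension)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp in avoid:
            # Overlong two-byte form, e.g. U+0000 -> C0 80, U+0004 -> C0 84.
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += bytes([0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        else:
            # Modified UTF-8 writes supplementary characters as a
            # UTF-16 surrogate pair, each half encoded in three bytes.
            cp -= 0x10000
            for unit in (0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF)):
                out += bytes([0xE0 | (unit >> 12),
                              0x80 | ((unit >> 6) & 0x3F),
                              0x80 | (unit & 0x3F)])
    return bytes(out)


encoded = encode_modified_utf8("a\u0004b")
assert 0x04 not in encoded
```

A matching decoder would have to accept the overlong C0 84 sequence, which strict UTF-8 decoders reject, so the output would not be readable by standard UTF-8 tools.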

@Ashvin-Ranjan Ashvin-Ranjan added help wanted Extra attention is needed question Further information is requested labels Jan 2, 2022