Error when saving in a Windows path localised region #474

Open
ie-rosie opened this issue Oct 15, 2023 · 11 comments

Comments

@ie-rosie

ie-rosie commented Oct 15, 2023

Hello,

I stumbled across this error recently and I tried to search for a solution and I think that I might have found one, even though I don't know what exactly I did.
Firstly, when I try to save a text file on Windows in a path that contains Romanian characters (ă, î, â, ș, ț) such as C:\Users\user\ASE\Controlul statistic al calității, the app will give an invalid encoding error message.
I tried to edit the config files, but I quit shortly after, because I don't know Lua programming.
After that, I found this issue on GitHub, where the error in question was detailed. I followed the advice there (#367 (comment)) and tried changing the format and region of the OS to Romania and even enabling UTF-8, but it didn't help.
image
After that, I searched the manual and found this disclaimer in the Windows Note section: The editor can only open files whose names contain characters in the system’s encoding (e.g. CP1252 for English and most European languages).
So I changed line 71, in the Unserialize buffers function of the session.lua config file, to
not_found[#not_found + 1] = buf.filename:iconv('ASCII', _CHARSET)
And after this tweak I can edit and save files in paths that contain special characters.
The only downside is that Textadept displays an invalid encoding(s) warning, which doesn't really affect the workflow in the app.
image
Maybe the message can disappear after modifying the Serialize buffers function in the session.lua config file, but I won't play in the settings anymore as I don't have the expertise needed for this.

PS: This is the first issue I have created, so the text might not work as intended.
Edit: Actually, the text works as intended. :)

@orbitalquark
Owner

Sorry that you're experiencing this issue :( Your workaround works because I think that your Romanian characters are in the extended ASCII encoding, which Lua can handle when it comes to I/O (as mentioned in your referenced issue). However, when it comes to display (UTF-8), iconv/Textadept does not know how to make the conversion, hence your screenshot.

If you revert your workaround, do things change if you disable that beta UTF-8 option in the Windows Region Settings? Also, please open the command entry (Tools > Command Entry) and type _CHARSET both with the UTF-8 option enabled and then disabled (please restart Textadept after making changes). I'd like to know what the results are. Perhaps we can identify a better workaround.

Thanks for your patience despite not knowing Lua!

@georgeraraujo

Hi @orbitalquark ,
Would lua-unicode be a possible avenue to overcome this limitation? As per the Wireshark documentation,

Wireshark for Windows uses a modified Lua runtime (lua-unicode) to support Unicode (UTF-8) filesystem paths. This brings consistency with other platforms (for example, Linux and macOS).

@orbitalquark
Owner

Thanks for the link. It may be possible to use that or something similar.

@ie-rosie
Author

Hello! Sorry for the late response!

With UTF-8 locale option enabled I have this as a result from _CHARSET CP65001 and without UTF-8 enabled I have CP1250 (using both ASCII and UTF-8 in line 71 of Unserialize buffers function of session.lua module).

I forgot to mention that the version I am using is 12.1.0.0 Stable installed using the Scoop manager.

I also ran the _CHARSET command in the Nightly version of the app from 18th October. Without UTF-8 enabled, I got CP1250 (with either ASCII or UTF-8 in line 71 of the Unserialize buffers function of the session.lua module). With it enabled, I got an Initialization Error when I first opened the app: there are invalid encodings at line 71, where I have the function convert to ASCII. Running the _CHARSET command then gave CP65001. With UTF-8 in line 71 of the module instead, I also got CP65001 from _CHARSET.
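
For readers following along, here is a small sketch of what the code pages mentioned in this thread can and cannot represent. It uses Python's built-in codecs as a stand-in for the Windows code pages, which is an assumption (Windows has its own conversion routines), and the names `representable` and `coverage` are made up for this illustration. UTF-8 is listed under its Python name; CP65001 is the Windows identifier for it.

```python
# The five Romanian characters from the original report, written as escapes
# so the exact code points are unambiguous:
# U+0103 (a breve), U+00EE (i circumflex), U+00E2 (a circumflex),
# U+0219 (s comma below), U+021B (t comma below).
ROMANIAN_CHARS = "\u0103\u00ee\u00e2\u0219\u021b"

# Python codec names used as stand-ins for the Windows code pages (assumption).
CODE_PAGES = ["cp1252", "cp1250", "utf-8"]


def representable(ch: str, codec: str) -> bool:
    """Return True if `ch` can be encoded in `codec` without loss."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def coverage(chars: str = ROMANIAN_CHARS) -> dict:
    """Map each codec name to the list of characters it cannot represent."""
    return {
        codec: [ch for ch in chars if not representable(ch, codec)]
        for codec in CODE_PAGES
    }


if __name__ == "__main__":
    for codec, missing in coverage().items():
        print(codec, "cannot encode:", missing or "nothing")
```

Under these tables, CP1250 has the cedilla forms of s and t but not the comma-below forms, which may be why switching the region to Romania alone did not help.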

@georgeraraujo

> Thanks for the link. It may be possible to use that or something similar.

Got it to work (I guess).

Here is こんにちは世界.lua:

print("Hello World")

With lua-5.4.2_Win64_bin.zip from LuaBinaries:

C:\TEMP\lua-5.4.2_Win64_bin>lua54 こんにちは世界.lua
lua54: cannot open ???????.lua: Invalid argument

With lua-unicode:

C:\TEMP\lua-unicode-master-5.4\lua-5.4.6\build64\lua-5.4.6-unicode-win64-vc14>lua54 こんにちは世界.lua
Hello World

@orbitalquark
Owner

> Hello! Sorry for the late response!

No problem.

> With UTF-8 locale option enabled I have this as a result from _CHARSET CP65001 and without UTF-8 enabled I have CP1250 (using both ASCII and UTF-8 in line 71 of Unserialize buffers function of session.lua module).

When you disable the UTF-8 locale option, and without your Unserialize buffers change, do you still get an invalid encoding error message when you try to save a text file with Romanian characters in its filename?

@ie-rosie
Author

Yes, I do. Below are the messages. In the first one, ț.txt is changed to ?.txt. The same thing happens when saving a file in a path with Romanian characters. Line 194 is in a function of the file_io.lua module (the one documented in core/.buffer.luadoc).

_C:\Users\user\scoop\apps\textadept\current/core/file_io.lua:194: C:\Users\user\Desktop\?.txt: Invalid argument_

_C:\Users\user\scoop\apps\textadept\current/core/file_io.lua:194: C:\Users\user\MEGA\Controlul statistic al calit�?ii\Proiect\test.txt: Invalid argument_
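
The garbled folder name in the second message can be reproduced with a lossy round trip, sketched below. This is only a guess at the mechanism: it assumes the name is first forced into CP1250 (unencodable characters become `?`) and the resulting bytes are then shown as if they were UTF-8. Python's codecs stand in for whatever Windows and Textadept actually do, and `lossy_round_trip` is a hypothetical helper, not part of either.

```python
def lossy_round_trip(name: str, code_page: str = "cp1250") -> str:
    """Encode `name` into a legacy code page, then misread the bytes as UTF-8."""
    # Step 1: characters missing from the code page are replaced with "?".
    raw = name.encode(code_page, errors="replace")
    # Step 2: bytes that are not valid UTF-8 are replaced with U+FFFD.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # "calit" + U+0103 + U+021B + "ii", written with escapes for clarity.
    print(lossy_round_trip("calit\u0103\u021bii"))
    # U+021B followed by ".txt".
    print(lossy_round_trip("\u021b.txt"))
```

If this guess is right, the replacement character is the CP1250 byte for ă being misread, and the `?` is ț having no CP1250 byte at all.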

@orbitalquark
Owner

Okay, thanks for the extra information. I'll put it on my TODO list, but it's low priority since I've looked into this before with Greek and came up empty (the issue you originally linked to). I'm glad you have a workaround in the meantime, despite how ugly it might be. I do appreciate your report though!

@georgeraraujo

FWIW, Gerald Combs, creator and lead developer of Wireshark, committed an update that brings lua-unicode for Windows to Lua 5.4 (and adds ARM64 support).

@orbitalquark
Owner

Interesting, thanks! I can compile Textadept with it and it passes the automated unit test suite. I'll just have to test it manually when I have some time.

@orbitalquark
Owner

orbitalquark commented Oct 6, 2024

I looked into lua-unicode, and unfortunately it's more than just Lua's I/O that needs to be reworked; LuaFileSystem (lfs) needs to be updated with wide-character support too. Process spawning too, perhaps.

There's also the issue of how filenames should be handled coming from and going to the Open/Save File dialogs. Right now they're converted to the local code page (_CHARSET). If they were to be converted to UTF-8, I don't know if or how Windows handles translation from UTF-8 to the local code page, particularly when saving files. For example, consider a UTF-8 character that has a representation in the local code page. Which version should you pass to Lua for I/O? Or does Windows handle it in such a way that it doesn't matter? I don't have the expertise to answer and test this.

Simple tests with pure UTF-8 filenames seemed to be okay (ignoring lfs), but I'm unable to confidently say that filenames in local code pages are handled correctly.
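
To make the "which version should you pass" question concrete, here is a minimal sketch of the two byte forms a single filename can take. It does not answer how the Windows APIs behave; it only shows, with Python's codecs as a stand-in (an assumption) and hypothetical helper names, that the local-code-page form and the UTF-8 form are different byte strings and that one is not valid as the other.

```python
def byte_forms(name: str, local: str = "cp1250") -> dict:
    """Return the bytes of `name` in the local code page and in UTF-8."""
    return {"local": name.encode(local), "utf-8": name.encode("utf-8")}


def reads_as_utf8(raw: bytes) -> bool:
    """Return True if `raw` is a valid UTF-8 byte sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # U+0103 (a breve) exists in CP1250, so both forms are available.
    forms = byte_forms("\u0103.txt")
    for label, raw in forms.items():
        print(label, raw, "valid UTF-8:", reads_as_utf8(raw))
```

Any code that receives such a byte string has to know which form it was given, since a name made of plain ASCII is identical in both forms and anything else is not.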
