feat: add utf8 support #24

JarKz · 2023-07-13T05:57:22Z

That solves #14.

For utf8 support was used library from luarocks.org: luautf8. It contains several string methods and utf8 specific methods implemented by the lua specification.

Thanks to this, the code has not more complexity and was simplified from col indexing to offset indexing. With it methods logic was simplified. The reason why I refactored some code was byte width in utf8. In first particular range utf8 contains one byte, in second particular range utf8 contains two bytes and so it goes on. In this case indexing by col more complex than indexing by offset.

Also I changed in README.md in installation section, adding luarock dependency. packer.nvim have native rocks installation, but for lazy.nvim need add nvim_rocks dependency that installs luarock dependency.

chrisgrieser · 2023-07-14T11:01:18Z

Hi,

thanks for the effort of finding a solution here. However, I hesitate to merge this for two reasons:

I am somewhat afraid that supporting these edge cases might in turn break some other edge cases as a side effect, so I'd like to add some tests before merging.
I don't know how I feel about adding two dependencies to cover an edge case, since this would mean that a lot of people who do not need this dedicated utf8 support would still need to install the dependencies, which feels somewhat contradictory to the otherwise lightweight nature of this plugin. So I'd at least make this these dependencies somehow optional.

JarKz · 2023-07-14T11:30:31Z

I understand you.

For the first reason, I will wait before adding tests and I will use my fork at the same time.
For second reason, I think it's possible to add the other branch named utf8-support. That way people using default lightweight spider won't get update about UTF-8 because they don't need it. And if peoples want to add UTF-8 support then they can switch branch and add some dependencies. I don't know about other best solutions because doubling code with two different situations is not good for me.

chrisgrieser · 2023-07-14T17:56:58Z

I think it's possible to add the other branch named utf8-support. That way people using default lightweight spider won't get update about UTF-8 because they don't need it. And if peoples want to add UTF-8 support then they can switch branch and add some dependencies. I don't know about other best solutions because doubling code with two different situations is not good for me.

Hmm, maintaining two branches also does not look ideal to me. Ideally, there is some niche nvim API which deals with strings which might help. Otherwise, a solution might be to just take out parts of the utf8 library which are relevant for this plugin? Not sure. 🤔

JarKz · 2023-07-17T08:41:28Z

All right. Today I spent some time looking for find other "native" solutions. The results are not very encouraging. In Vim and Neovim regexs don't contain special "unicode classes" that starting with \p{C} and instead of C letter we can insert Lu or Ll (Uppercase_Letter or Lowercase_Letter). But it's not necessary, because some external tools can automatically check UTF-8 with basic like [:upper:] or similar sets. This problem is related to the fact that Vim or Lua regexes are unique. It has also been discussed here.

What's left? There are two ways that I found:

to figure out how to merge two solutions: basic and utf8support. Here are some ways to merge: merge code and check accessibility of lua plugin luautf8 using the pcall method or add another branch, or other ways which I didn't find but might be more effective.
use "built-in" support via shell command grep or rg (use second command only if it exists). Also need call the pcall method to check the accessibility of the rg command and use it if it is exists, otherwise use grep. But, here you need to rewrite the regex patterns.

P.S. For the first way, I have thought that we could use metatable with needed string functions: "find", "gmatch", "reverse", "next_chat", "prev_char". And lua will call function that override the metatable if lua plugin luautf8 is present. But in this case need nicest solution with checking and picking between "offset" and "col" 🤔

For utf8 support was used library from luarocks.org: luautf8. It contains several string methods and utf8 specific methods implemented by the lua specification. Thanks to this, the code has not more complexity and was simplified from `col` indexing to `offset` indexing. With it methods logic was simplified. The reason why I refactored some code was byte width in utf8. In first particular range utf8 contains one byte, in second particular range utf8 contains two bytes and so it goes on. In this case indexing by `col` more complex than indexing by `offset`. Also I changed in README.md in installation section, adding luarock dependency. packer.nvim have native rocks installation, but for lazy.nvim need add `nvim_rocks` dependency that installs luarock dependency.

There solve the problem: optional `libutf8` lua library. Users who don't need UTF-8 support, because they only work with latin1 symbols, can continuously use this plugin without any problem. And they can add UTF-8 support if needed in an easy way: just add the dependency in nvim plugin manager. There is solution: adding metatable with universal string functions like "reverse", "find", "gmatch" and others. And use `pcall` method for detection of the lua accessibility plugin `luautf8`. Override string functions in metatable if plugin is exists. So there is no dependency confusion. I've also changed the README.md by restoring the "Installation" section and adding the "UTF-8 support" section.

JarKz · 2023-12-21T07:54:31Z

Hi again!

As you see, I did update this fork to upstream and added 'optionality' of UTF-8 plugin. Before telling my opinion, here's changes:

the fork of this nvim plugin works same as main (but test needs);
if lua detects that lua-utf8 available, then string functions 'switches' to UTF-8 based functions;
moved from col-based computation to offset-based computation.

I'll begin from last clause:

Firstly, it's more understandable than using columns.
Secondly, UTF-8 char may contain at least one byte and "column" is incorrect for string, because any UTF-8 string may contain more bytes than characters.
No need hard thinking about adding or subtracting indices for 'right' position.

Second clause. This detection allow us to use lightweight plugin, when UTF-8 is no need and turn on if needed. As you see in code, I implemented str_func, which contains all string functions. And one simple function pcall, which detect availability of 'lua-utf8' plugin. If detected, then starts remapping table str_func.

First clause is derived from second. But, need tests, because I'm not sure, that I moved right from 'col-based' indices to 'offset-based'.

I don't see tests. Where is they? Also I don't see your issue about test.

Well, there is my opinion. I used my modification (with UTF-8) of this plugin and I don't see any errors or exceptions. What do you think about this?

chrisgrieser · 2023-12-21T11:41:22Z

Hi! Sorry for also putting this off for so long myself. Yeah, I didn't get around to add tests, partly because I maintain too many plugins (🙈), and also because there were no bug reports for any new edge cases, meaning I didn't have any reason to change the apparently stable motion mechanics.

Second clause. This detection allow us to use lightweight plugin, when UTF-8 is no need and turn on if needed. As you see in code, I implemented str_func, which contains all string functions. And one simple function pcall, which detect availability of 'lua-utf8' plugin. If detected, then starts remapping table str_func.

This is a smart solution, neat! Yeah, I think with this, we can safely merge. Thanks for the work!

chrisgrieser · 2023-12-21T12:26:54Z

@JarKz Just a minor remark: You had feat! in one of your commit messages, indicating a breaking change, even though the nice automatic switch to utf-8 actually is fully backwards compatible, meaning you should leave out the !.

Normally no big deal, but in lazy.nvim, commit messages with a ! are emphasized, so I reworded the commit message to avoid confusing users.

JarKz · 2023-12-21T12:34:36Z

Oh, I missed this! Thank you for remark!

Also, please, see your refactored code. It has a problem with the table stringFuncs. At the moment, when lua detect availability of 'lua-utf8' and the cycle will stop by cause "pair function got nil", because initially the table stringFuncs is nil.

chrisgrieser · 2023-12-21T12:45:39Z

oops, you are right, fixed

chrisgrieser force-pushed the main branch from 71cc534 to 846c556 Compare December 21, 2023 00:23

JarKz added 3 commits December 21, 2023 10:31

chore(README.md): removed duplication of text

dd9f775

chrisgrieser merged commit b44e256 into chrisgrieser:main Dec 21, 2023
1 check failed

JarKz changed the title ~~feat!: add utf8 support~~ feat: add utf8 support Dec 21, 2023

JarKz deleted the utf8support branch December 21, 2023 12:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add utf8 support #24

feat: add utf8 support #24

JarKz commented Jul 13, 2023

chrisgrieser commented Jul 14, 2023 •

edited

Loading

JarKz commented Jul 14, 2023

chrisgrieser commented Jul 14, 2023

JarKz commented Jul 17, 2023

JarKz commented Dec 21, 2023 •

edited

Loading

chrisgrieser commented Dec 21, 2023 •

edited

Loading

chrisgrieser commented Dec 21, 2023

JarKz commented Dec 21, 2023 •

edited

Loading

chrisgrieser commented Dec 21, 2023

feat: add utf8 support #24

feat: add utf8 support #24

Conversation

JarKz commented Jul 13, 2023

chrisgrieser commented Jul 14, 2023 • edited Loading

JarKz commented Jul 14, 2023

chrisgrieser commented Jul 14, 2023

JarKz commented Jul 17, 2023

JarKz commented Dec 21, 2023 • edited Loading

chrisgrieser commented Dec 21, 2023 • edited Loading

chrisgrieser commented Dec 21, 2023

JarKz commented Dec 21, 2023 • edited Loading

chrisgrieser commented Dec 21, 2023

chrisgrieser commented Jul 14, 2023 •

edited

Loading

JarKz commented Dec 21, 2023 •

edited

Loading

chrisgrieser commented Dec 21, 2023 •

edited

Loading

JarKz commented Dec 21, 2023 •

edited

Loading