Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add utf8 support #24

Merged
merged 3 commits into from
Dec 21, 2023
Merged

feat: add utf8 support #24

merged 3 commits into from
Dec 21, 2023

Conversation

JarKz
Copy link
Contributor

@JarKz JarKz commented Jul 13, 2023

That solves #14.

For utf8 support was used library from luarocks.org: luautf8. It contains several string methods and utf8 specific methods implemented by the lua specification.

Thanks to this, the code has not more complexity and was simplified from col indexing to offset indexing. With it methods logic was simplified. The reason why I refactored some code was byte width in utf8. In first particular range utf8 contains one byte, in second particular range utf8 contains two bytes and so it goes on. In this case indexing by col more complex than indexing by offset.

Also I changed in README.md in installation section, adding luarock dependency. packer.nvim have native rocks installation, but for lazy.nvim need add nvim_rocks dependency that installs luarock dependency.

@chrisgrieser
Copy link
Owner

chrisgrieser commented Jul 14, 2023

Hi,

thanks for the effort of finding a solution here. However, I hesitate to merge this for two reasons:

  1. I am somewhat afraid that supporting these edge cases might in turn break some other edge cases as a side effect, so I'd like to add some tests before merging.
  2. I don't know how I feel about adding two dependencies to cover an edge case, since this would mean that a lot of people who do not need this dedicated utf8 support would still need to install the dependencies, which feels somewhat contradictory to the otherwise lightweight nature of this plugin. So I'd at least make this these dependencies somehow optional.

@JarKz
Copy link
Contributor Author

JarKz commented Jul 14, 2023

I understand you.

  • For the first reason, I will wait before adding tests and I will use my fork at the same time.
  • For second reason, I think it's possible to add the other branch named utf8-support. That way people using default lightweight spider won't get update about UTF-8 because they don't need it. And if peoples want to add UTF-8 support then they can switch branch and add some dependencies. I don't know about other best solutions because doubling code with two different situations is not good for me.

@chrisgrieser
Copy link
Owner

I think it's possible to add the other branch named utf8-support. That way people using default lightweight spider won't get update about UTF-8 because they don't need it. And if peoples want to add UTF-8 support then they can switch branch and add some dependencies. I don't know about other best solutions because doubling code with two different situations is not good for me.

Hmm, maintaining two branches also does not look ideal to me. Ideally, there is some niche nvim API which deals with strings which might help. Otherwise, a solution might be to just take out parts of the utf8 library which are relevant for this plugin? Not sure. 🤔

@JarKz
Copy link
Contributor Author

JarKz commented Jul 17, 2023

All right. Today I spent some time looking for find other "native" solutions. The results are not very encouraging. In Vim and Neovim regexs don't contain special "unicode classes" that starting with \p{C} and instead of C letter we can insert Lu or Ll (Uppercase_Letter or Lowercase_Letter). But it's not necessary, because some external tools can automatically check UTF-8 with basic like [:upper:] or similar sets. This problem is related to the fact that Vim or Lua regexes are unique. It has also been discussed here.

What's left? There are two ways that I found:

  1. to figure out how to merge two solutions: basic and utf8support. Here are some ways to merge: merge code and check accessibility of lua plugin luautf8 using the pcall method or add another branch, or other ways which I didn't find but might be more effective.
  2. use "built-in" support via shell command grep or rg (use second command only if it exists). Also need call the pcall method to check the accessibility of the rg command and use it if it is exists, otherwise use grep. But, here you need to rewrite the regex patterns.

P.S. For the first way, I have thought that we could use metatable with needed string functions: "find", "gmatch", "reverse", "next_chat", "prev_char". And lua will call function that override the metatable if lua plugin luautf8 is present. But in this case need nicest solution with checking and picking between "offset" and "col" 🤔

For utf8 support was used library from luarocks.org: luautf8. It
contains several string methods and utf8 specific methods implemented
by the lua specification.

Thanks to this, the code has not more complexity and was simplified from
`col` indexing to `offset` indexing. With it methods logic was
simplified. The reason why I refactored some code was byte width in
utf8. In first particular range utf8 contains one byte, in second
particular range utf8 contains two bytes and so it goes on. In this case
indexing by `col` more complex than indexing by `offset`.

Also I changed in README.md in installation section, adding luarock
dependency. packer.nvim have native rocks installation, but for
lazy.nvim need add `nvim_rocks` dependency that installs luarock
dependency.
There solve the problem: optional `libutf8` lua library.

Users who don't need UTF-8 support, because they only work with latin1
symbols, can continuously use this plugin without any problem. And they
can add UTF-8 support if needed in an easy way: just add the dependency
in nvim plugin manager.

There is solution: adding metatable with universal string functions like
"reverse", "find", "gmatch" and others. And use `pcall` method for
detection of the lua accessibility plugin `luautf8`. Override string
functions in metatable if plugin is exists. So there is no dependency
confusion.

I've also changed the README.md by restoring the "Installation" section
and adding the "UTF-8 support" section.
@JarKz
Copy link
Contributor Author

JarKz commented Dec 21, 2023

Hi again!

As you see, I did update this fork to upstream and added 'optionality' of UTF-8 plugin. Before telling my opinion, here's changes:

  1. the fork of this nvim plugin works same as main (but test needs);
  2. if lua detects that lua-utf8 available, then string functions 'switches' to UTF-8 based functions;
  3. moved from col-based computation to offset-based computation.

I'll begin from last clause:

  • Firstly, it's more understandable than using columns.
  • Secondly, UTF-8 char may contain at least one byte and "column" is incorrect for string, because any UTF-8 string may contain more bytes than characters.
  • No need hard thinking about adding or subtracting indices for 'right' position.

Second clause. This detection allow us to use lightweight plugin, when UTF-8 is no need and turn on if needed. As you see in code, I implemented str_func, which contains all string functions. And one simple function pcall, which detect availability of 'lua-utf8' plugin. If detected, then starts remapping table str_func.

First clause is derived from second. But, need tests, because I'm not sure, that I moved right from 'col-based' indices to 'offset-based'.

I don't see tests. Where is they? Also I don't see your issue about test.

Well, there is my opinion. I used my modification (with UTF-8) of this plugin and I don't see any errors or exceptions. What do you think about this?

@chrisgrieser
Copy link
Owner

chrisgrieser commented Dec 21, 2023

Hi! Sorry for also putting this off for so long myself. Yeah, I didn't get around to add tests, partly because I maintain too many plugins (🙈), and also because there were no bug reports for any new edge cases, meaning I didn't have any reason to change the apparently stable motion mechanics.

Second clause. This detection allow us to use lightweight plugin, when UTF-8 is no need and turn on if needed. As you see in code, I implemented str_func, which contains all string functions. And one simple function pcall, which detect availability of 'lua-utf8' plugin. If detected, then starts remapping table str_func.

This is a smart solution, neat! Yeah, I think with this, we can safely merge. Thanks for the work!

@chrisgrieser chrisgrieser merged commit b44e256 into chrisgrieser:main Dec 21, 2023
1 check failed
@chrisgrieser
Copy link
Owner

@JarKz Just a minor remark: You had feat! in one of your commit messages, indicating a breaking change, even though the nice automatic switch to utf-8 actually is fully backwards compatible, meaning you should leave out the !.

Normally no big deal, but in lazy.nvim, commit messages with a ! are emphasized, so I reworded the commit message to avoid confusing users.

@JarKz
Copy link
Contributor Author

JarKz commented Dec 21, 2023

Oh, I missed this! Thank you for remark!

Also, please, see your refactored code. It has a problem with the table stringFuncs. At the moment, when lua detect availability of 'lua-utf8' and the cycle will stop by cause "pair function got nil", because initially the table stringFuncs is nil.
image

@JarKz JarKz changed the title feat!: add utf8 support feat: add utf8 support Dec 21, 2023
@chrisgrieser
Copy link
Owner

oops, you are right, fixed

@JarKz JarKz deleted the utf8support branch December 21, 2023 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants