Grapheme clusters and emoji sequences #117
Yes. It depends on the terminal and its setting. Also, I have several questions:
$ bleopt emoji_@ char_width_mode
$ declare -p _ble_util_c2w_auto_width
$ ble/util/s2chars 🎃
$ echo "${ret[*]}"
$ for c in "${ret[@]}"; do ble/util/c2w "$c"; echo "w=$ret"; done
|
I'm testing in Gnome-terminal, Konsole, Terminator, Alacritty and Kitty
I assume you're not exactly asking if the font looks as an emoji, but I have the CBDT/CBLC ttf-twemoji font which renders most emojis in terminals correctly out of the box (only Kitty prints flag emojis correctly, but maybe it's my configuration) with ble detached. Now, when ble is attached and the emoji is pasted, in Konsole, Kitty and Alacritty the character shape is correctly printed after pasting and after typing. In Gnome-Terminal and Terminator, the shape is also printed correctly if it's not preceded by a single The character shape of the
In Konsole, Alacritty and Kitty yes; in Gnome-Terminal and Terminator no.
Same output in all terminals; when doing it with
Line wrapping problems with flag emojis only occur in Konsole, but not in the other terminals because Konsole is the only terminal with the reprinting problem with flag emojis. Btw, line wrapping problems also don't occur with ♠♥♦♣♟, they only appear along with the reprinting problem. |
OK! Thank you for your answers!
Does it mean an emoji occupies one cell in GNOME Terminal and Terminator? If so, you need to set Edit: I've tried GNOME Terminal and Terminator, but they also behave as
Ah, yes. I actually wanted to confirm that
The outputs are expected ones except for Optionally, you may set
Hmm, OK. I think it is also related to the terminal behavior.
But some terminals may behave as |
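A terminal-side way to check how many cells a character actually occupies, independent of any ble.sh setting, is to print it at column 1 and ask the terminal for a cursor position report (DSR, `ESC [ 6 n`). The following is only a rough sketch under the assumption of an interactive terminal reachable through `/dev/tty`; the function names `cpr_column` and `measure_cells` are made up for this illustration and are not part of ble.sh.

```bash
# Extract the column from a cursor position report body such as $'\e[12;5'
# (the terminating "R" is consumed by `read -d R` below).
cpr_column() {
  local reply=$1
  printf '%s\n' "${reply##*;}"
}

# Print a string at column 1 and report how many cells the cursor advanced.
# Requires an interactive terminal; not meant for use in pipelines.
measure_cells() {
  local text=$1 reply col
  # Go to column 1, print the probe text, then request the cursor position.
  printf '\r%s\e[6n' "$text" > /dev/tty
  IFS= read -r -s -d R -t 1 reply < /dev/tty || return 1
  printf '\r\e[K' > /dev/tty   # erase the probe text again
  col=$(cpr_column "$reply")
  # The reported column is 1-based, so the advance is col - 1.
  printf '%s\n' "$(( col - 1 ))"
}
```

Running `measure_cells 🎃` in each terminal and comparing the numbers shows whether the terminals agree on the width of a given character or sequence.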
I rechecked, and actually most emojis like 🎃 are 2 cells long in all terminals, I was trying with the
Yeah!! That solved the reprinting problem with most emojis, thanks!! I forgot about it in
Oh thanks for the suggestion, but changing the version didn't seem to solve anything itself. I'll keep it up to date in any case.
Oh, maybe I was referring to line wrapping of the autocompletion. When an emoji has the reprinting problem and the autosuggestion exceeds the last column, it reprints it below the current line and messes up the cursor position as well, something like
As
Thanks again |
Yeah, the treatment of grapheme clusters and their components is the area where the behavior of terminals and applications differs the most. The different levels of conformance to the Unicode standard come from the technical difficulty of implementing the full Unicode specification.
Well, they are all related to the grapheme clusters that
Hmm, I think that is kitty's glitch. Maybe I can support grapheme clusters someday, but I will never support kitty's behavior... I also checked the behavior of other shells' line editors. It seems that readline recognizes the grapheme clusters and works well in GNOME Terminal (but not in kitty). Zsh avoids handling the grapheme clusters directly but instead shows an ASCII representation of variation selector as |
Even all emojis inside quotes disappearing? I also found that when that happens, if an autosuggestion appears inside those quotes, the emoji reappears. But well, if it's as you say, there's not much to do.
I did notice that |
OK. Actually, I cannot reproduce this behavior in my GNOME Terminal. What is the version of your GNOME Terminal? Maybe I will also try Terminator later.
Hm, OK. Zsh is clever enough to switch the behavior depending on the terminal. My naive guess is that Konsole, Alacritty and Terminator implement their own width determination of emoji characters and sequences, but GNOME terminal uses the system
OK, thanks for the information. Yeah, this is one of the messiest areas in terminals. I remember the discussion at Terminal WG
I currently have two different approaches in mind. (a) One approach is to treat clusters as one character in text editing. For example, pressing delete after a grapheme cluster deletes the entire cluster. (b) Another approach is that we don't change the text editing but just change how they are laid out in terminals. In this case, pressing delete after e.g.
Also, I need to support grapheme clusters and emoji sequences in prompts separately. The layout of prompts is treated in different logic because they are static texts, unlike the command line strings. |
ble version: 0.4.0-devel3+301d40f
Bash version: 5.1.8(1)-release (x86_64-pc-linux-gnu)
Emoji font: ttf-twemoji 13.0.1-1
This issue is a bit different depending on the terminal and font patches, but I'll try to explain it; unfortunately I couldn't get my recordings attached to this post 🙁
After an emoji is typed (or pasted), typing more characters (or backspacing) causes the first character of the current word to be reprinted and the typed character to be swapped with the previous one. For instance, if one types `hello world 🎃`, moves the cursor to the `r` and types `a`, the `w` will be reprinted and it will show `hello wwoarld 🎃`. Typing at the beginning of a word treats the previous space-separated field as its word, meaning that if one types `,` before the `w` in `hello   world 🎃` (←3 spaces between hello and world), it will show `hhello , world 🎃`.

This reprinting is not an editable character, it's just a printing. As said before, typing or backspacing while the emoji is around will continue to cause the behavior, but once the emoji is deleted it will no longer cause the issue. If the statement with the emoji is executed and is now in ble's history, the issue will appear again when the autocompletion for that statement shows up.
Also, as soon as another word is detected after the word with the emoji, the issue disappears. So if one types `🎃 bye`, the issue disappears after typing the `b`: one would see `🎃🎃🎃 bye` and keep typing normally. A single quote after the emoji (`🎃'`) will not make spacing detect a new word. A double quote after the emoji (`🎃"`) will make the problem stop until the closing double quote is typed. If the emoji is preceded by an opening double quote, however, moving the cursor will also cause the reprinting, so if one types `echo "🎃` and then moves the cursor backwards, one would see `echo """"""""🎃`, where the `echo ""` is just a printing and the actual characters were overwritten with `"""""`.
Now, this is the behavior with most emojis, but with some like ♠️ ♥️ ♦️ ♣️ ♟️, the typed character gets moved forward and the preceding rest of the word gets printed behind it (and the cursor gets moved too). So from `hello world ♟️`, typing `a` after the `r` would result in `helloworlad ♟️`, and so on. This doesn't happen with their ♠♥♦♣♟ counterparts.

Finally, flag emojis (e.g. 🇧🇬) don't have this problem in most terminals I tested, which is interesting since the 2 emojis that compose a flag emoji (e.g. 🇧 🇬) do have the problem individually.
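According to the dumps further down in this post, the ♠️-style characters differ from their plain ♠ counterparts only by a trailing U+FE0F (VARIATION SELECTOR-16). A small sketch for checking whether a string carries that selector, and for stripping it to compare behavior, could look like this; the helper names `has_vs16` and `strip_vs16` are hypothetical and not part of ble.sh:

```bash
# U+FE0F (VARIATION SELECTOR-16) is encoded in UTF-8 as the bytes EF B8 8F.
vs16=$'\xef\xb8\x8f'

# Succeed if the argument contains at least one U+FE0F.
has_vs16() {
  [[ $1 == *"$vs16"* ]]
}

# Print the argument with every U+FE0F removed.
strip_vs16() {
  printf '%s\n' "${1//"$vs16"/}"
}
```

For example, `has_vs16 "$line" && strip_vs16 "$line"` gives the text-presentation form of a line, which can then be pasted to see whether the layout problem goes away.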
I'll leave the following outputs from `cat -A <<< "[EMOJI]"` and `bat -A <<< "[EMOJI]"`:
🎃
M-pM-^_M-^NM-^C$
\u{1f383}␊
🙄
M-pM-^_M-^YM-^D$
\u{1f644}␊
😱
M-pM-^_M-^XM-1$
\u{1f631}␊
👻
M-pM-^_M-^QM-;$
\u{1f47b}␊
♠️
M-bM-^YM- M-oM-8M-^O$
\u{2660}\u{fe0f}␊
♥️
M-bM-^YM-%M-oM-8M-^O$
\u{2665}\u{fe0f}␊
♦️
M-bM-^YM-&M-oM-8M-^O$
\u{2666}\u{fe0f}␊
♣️
M-bM-^YM-#M-oM-8M-^O$
\u{2663}\u{fe0f}␊
♟️
M-bM-^YM-^_M-oM-8M-^O$
\u{265f}\u{fe0f}␊
♠
M-bM-^YM- $
\u{2660}␊
♥
M-bM-^YM-%$
\u{2665}␊
♦
M-bM-^YM-&$
\u{2666}␊
♣
M-bM-^YM-#$
\u{2663}␊
♟
M-bM-^YM-^_$
\u{265f}␊
🇧🇬
M-pM-^_M-^GM-'M-pM-^_M-^GM-,$
\u{1f1e7}\u{1f1ec}␊
🇧
M-pM-^_M-^GM-'$
\u{1f1e7}␊
🇬
M-pM-^_M-^GM-,$
\u{1f1ec}␊
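For readers unfamiliar with the `cat -A` notation used above: `M-x` stands for the byte of `x` with the high bit set, `^X` stands for a control character, and the trailing `$` marks the end of the line. The following sketch converts such a dump (without the trailing `$`) back to hex bytes so it can be compared with the code points from `bat -A`. The function name `cat_a_to_hex` is made up for this illustration, and the sketch assumes the input contains no literal `^` or `M-` text, which the notation cannot distinguish from the escapes.

```bash
# Convert `cat -A` notation (without the trailing "$") to a hex byte string.
#   M-x  -> byte of x with the high bit set (x + 0x80)
#   ^X   -> control character (X xor 0x40); ^? is DEL (0x7f)
cat_a_to_hex() {
  local s=$1 out= ch code high
  while [[ -n $s ]]; do
    high=0
    if [[ ${s:0:2} == 'M-' ]]; then
      high=128
      s=${s:2}
    fi
    if [[ ${s:0:1} == '^' && ${#s} -ge 2 ]]; then
      ch=${s:1:1}
      s=${s:2}
      printf -v code '%d' "'$ch"   # ASCII code of the character
      code=$(( code ^ 64 ))
    else
      ch=${s:0:1}
      s=${s:1}
      printf -v code '%d' "'$ch"
    fi
    printf -v out '%s%02x' "$out" "$(( code + high ))"
  done
  printf '%s\n' "$out"
}
```

For instance, `cat_a_to_hex 'M-bM-^YM-^_M-oM-8M-^O'` prints `e2999fefb88f`, the UTF-8 encoding of U+265F followed by U+FE0F.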