Keybindings (take two) #129

Open
TTWNO opened this issue Feb 29, 2024 · 7 comments
@TTWNO
Member

TTWNO commented Feb 29, 2024

Oh the wonderful works of binding keys in a Wayland world! Especially those pesky activation keys that have no equivalent outside of screen readers (see that thread and its links for further fun rabbit holes!)

Let's ignore the proper way to do things for a moment and reconsider odilia-input: a daemon which listens for keys on evdev, then sends OdiliaCommands over a socket, each of which Odilia then executes an action for.

Here's the idea, it could not possibly go wrong! (LOL!)

  1. When the activation key (Capslock/Ins/KP_Ins) is pressed, start listening for keys.
  2. Do not pass through ANY keys while activation key is held.
  3. Add any pressed keys to a vec.
  4. If the pressed-keys vec matches a full binding (like Capslock+h), run that command.
  5. Remove any released keys from the vec (even if Capslock has already been released).
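The five steps above can be sketched roughly as follows. This is only an illustration, not odilia-input's actual code: `Key`, `Command`, `Outcome`, and `Capture` are hypothetical names, the key set is a placeholder for real evdev codes, and it assumes a binding matches only when keys are pressed in the listed order.

```rust
/// Placeholder key codes; a real daemon would use evdev key codes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Key {
    CapsLock,
    H,
    J,
}

/// Placeholder for the commands sent over the socket.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Command {
    NextHeading,
}

/// What happens to a single key event.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// Forward the event to applications untouched.
    PassThrough,
    /// Hide the event from applications.
    Swallow,
    /// Hide the event and run the bound command.
    Run(Command),
}

pub struct Capture {
    activation: Key,
    /// Keys whose press was swallowed and not yet released (step 3).
    pressed: Vec<Key>,
    /// Full bindings, activation key included, in press order (assumption).
    bindings: Vec<(Vec<Key>, Command)>,
}

impl Capture {
    pub fn new(activation: Key, bindings: Vec<(Vec<Key>, Command)>) -> Self {
        Self { activation, pressed: Vec::new(), bindings }
    }

    pub fn press(&mut self, key: Key) -> Outcome {
        // Steps 1 and 2: capture only while the activation key is held.
        let capturing = key == self.activation || self.pressed.contains(&self.activation);
        if !capturing {
            return Outcome::PassThrough;
        }
        // Step 3: remember the key (key repeats are not added twice).
        if !self.pressed.contains(&key) {
            self.pressed.push(key);
        }
        // Step 4: run the command if the held keys form a full binding.
        match self.bindings.iter().find(|(keys, _)| *keys == self.pressed) {
            Some((_, command)) => Outcome::Run(*command),
            None => Outcome::Swallow,
        }
    }

    pub fn release(&mut self, key: Key) -> Outcome {
        // Step 5: a key whose press was swallowed also has its release
        // swallowed, even if the activation key is already up.
        if let Some(index) = self.pressed.iter().position(|k| *k == key) {
            self.pressed.remove(index);
            Outcome::Swallow
        } else {
            Outcome::PassThrough
        }
    }
}
```

In this sketch an application never sees a release without the matching press: a key pressed before Capslock keeps its pass-through release, and a key pressed during capture has both events swallowed.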

Can anyone show me a case where this would cause any of these bad things:

  1. An application sees a key as unexpectedly held (applications only see what was not swallowed),
  2. Pressing the keys in the wrong order activates the shortcut but causes weird state problems, or
  3. It has issues working on Wayland, X11, or future display tech that restructures our thinking around keybindings again?

cc @albertotirla

@TTWNO TTWNO added this to the 0.2.0 milestone Feb 29, 2024
@albertotirla
Member

The keyboard capturing seems sound. However, I have a few concerns regarding the viability of sending commands over a socket the way we did before. They mainly revolve around the fact that screen readers are extremely versatile and adaptive applications, so modes are somewhat fluid and depend on context from the accessibility tree, which a static configuration for a daemon, as we currently do it, can't account for. In the name of completeness and transparency, I sent you a Matrix message, which I shall quote below:

Essentially, the daemon simply can't send the screen reader only commands, because its configuration is static while these are very much dynamic, and furthermore only the screen reader knows best what to do in a certain mode. So I do believe that the key shortcut has to be passed to the sr after all, and the sr then consults its own ways of doing things. Sure, we can do it the keyboard-daemon-with-commands way, but then we have to handle seemingly nonsensical commands for some modes. For example, "go to next heading" makes no sense in focus mode, but we would have to handle it anyway in order to retype the character h. And even so, the daemon has to know whether to consume the key altogether, or let it pass through to the app but still send the action to the sr, etc. Plus, the daemon can't know all the intricacies of what to do in which modes, including the fact that the modes can be slightly fluid, i.e. the meanings of certain things can change with context. For example, the arrow keys, even in browse mode, should be passed through if automatic mode switching is enabled and I'm on an edit box. Hell, even automatic mode switching and its implementation, which in some cases depends on the current control and the neighboring ones, simply can't be done in a static configuration with a daemon like hkd. So sending the keys through to odilia, via that socket or odiliactl, would be the best action to take for now, at least in my opinion.
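To make the point in the quoted message concrete, here is a minimal sketch of what resolving a raw key inside the screen reader could look like. Everything in it is hypothetical (`Mode`, `Role`, `Context`, `Decision`, `Action`, and `resolve` are illustrative names, and `NextLine` is an invented action), and a real implementation would read its context from the accessibility tree.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Browse,
    Focus,
}

/// Role of the currently focused control (heavily simplified).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    EditBox,
    Other,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Key {
    H,
    ArrowDown,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    NextHeading,
    NextLine,
}

/// State that only the screen reader knows at the moment of the key press.
pub struct Context {
    pub mode: Mode,
    pub focused_role: Role,
    pub auto_mode_switch: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// The screen reader acts on the key.
    Handle(Action),
    /// The key belongs to the application.
    PassThrough,
}

/// The same key resolves differently depending on live context,
/// which a static daemon configuration cannot express.
pub fn resolve(key: Key, ctx: &Context) -> Decision {
    match (ctx.mode, key) {
        // In focus mode, "h" is just the letter h.
        (Mode::Focus, _) => Decision::PassThrough,
        // In browse mode, arrows still go to an edit box when
        // automatic mode switching is enabled.
        (Mode::Browse, Key::ArrowDown)
            if ctx.auto_mode_switch && ctx.focused_role == Role::EditBox =>
        {
            Decision::PassThrough
        }
        (Mode::Browse, Key::H) => Decision::Handle(Action::NextHeading),
        (Mode::Browse, Key::ArrowDown) => Decision::Handle(Action::NextLine),
    }
}
```

With this shape the daemon only has to forward the key, and no "retype the character h" command is needed for focus mode.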

In addition to the message quoted above, I might add that certain keys, such as keystrokes made without the designated odilia modifier, the modifiers by themselves, enter by itself, tab, etc., should always be passed through, even if a copy of the keybinding is sent to odilia. For example, at least in focus mode, the arrow keys should always be passed to the application. Games and other non-standard applications needing this is not the only reason, but it's certainly a factor. The way nvda does keyboard hooking may be of use here, because jaws certainly does it wrong.
Furthermore, a sleep mode is also required, in which odilia otherwise functions as normal but every key is passed through and triggers no odilia function. To avoid additional latency, this should be built into the daemon itself, so that nothing is transmitted over the socket, reducing the context switches and all the added cost of monitoring and writing to a socket.
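A rough sketch of the daemon-side rule described here, under the assumption that the daemon only tracks a sleep flag and whether the odilia modifier is held. `Daemon`, `Routing`, and `route` are hypothetical names, and how sleep mode gets toggled is left out.

```rust
/// Where a key event goes. Both flags can be true at once:
/// the application gets the key and odilia gets a copy.
#[derive(Debug, PartialEq, Eq)]
pub struct Routing {
    pub pass_through: bool,
    pub notify_odilia: bool,
}

pub struct Daemon {
    /// While true, the daemon is a no-op and never touches the socket.
    pub sleeping: bool,
}

impl Daemon {
    pub fn route(&self, odilia_modifier_held: bool) -> Routing {
        if self.sleeping {
            // Sleep mode: everything reaches the application and nothing
            // is written to the socket.
            return Routing { pass_through: true, notify_odilia: false };
        }
        if !odilia_modifier_held {
            // Plain keys (enter, tab, arrows, lone modifiers) always reach
            // the application; odilia still receives a copy.
            return Routing { pass_through: true, notify_odilia: true };
        }
        // Keys pressed with the odilia modifier are swallowed and forwarded.
        Routing { pass_through: false, notify_odilia: true }
    }
}
```

Checking the sleep flag first keeps the sleeping path free of any socket work, which is the latency argument made above.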

@albertotirla
Member

Oh, and one more thing: do we plan on supporting key chords? For example, think of emacs key combos, c-e, c-s and similar. If so, we would have a huge advantage, because conflicting key combos would rarely be a thing: add-ons would be able to have their own prefix, activated by part of the chord, after which every other key can be used without fear of interfering with the screen reader's main set of keyboard shortcuts, or with those of other add-ons. If the answer is yes, we might have to rethink a bit the way we capture and treat key combos, and that's why I'm asking.

@TTWNO
Member Author

TTWNO commented Mar 4, 2024

do we plan on supporting key chords?

I'd like to have the option. This would make the algorithm for capturing/releasing keys more complex. But yes, I'm open to it being optional.

@albertotirla
Member

albertotirla commented Mar 4, 2024

OK, here's my proposal in that case:

  • a chord consists of one or more keybindings captured as described in your first comment, followed by or mixed with other keys, including alphanumeric ones, function keys, and other special keys
  • the way you recognise a chord is as follows:
    • you have a list of all the chords defined at a certain point in time. This can be combined from different sources, for example static odilia configuration, chords reported dynamically by add-ons, overlays defined by the user for specific applications, etc.
    • when the user successfully types a binding, add it to the chord array, let's call it init_chord
    • search the list of chords for anything beginning with what the user typed. This could be done by, for example, stringifying each chord and applying a string distance algorithm to it, or with a simple x.starts_with(init_chord)
    • add every match found to a hash set of results, call it the result set
    • while there is no timeout and the result set is not empty:
      • if the user pressed any other key, including a binding, add the captured binding or key to init_chord. The exception is the quick escape-hatch key, for example escape: if the hatch is pressed, all key events captured up to that point are passed on to the system
      • for each chord in the result set:
        • optionally, tell the user which options remain in the result set (possibly only while the size of the set isn't greater than a specified value), as well as the next key or binding that should be pressed to get closer to a match. For example, if you have "[c-s, c-e, v], [c-s, c-p, e]" and the user has already pressed c-s, the speech might say: options are edit -> c-e, part from cursor -> c-p. If the user then pressed c-e, the reader might say: single match, evaluate -> v. This is meant to facilitate exploration while learning how the key combos work, but it's entirely optional and not required for this implementation to work
        • compare it to init_chord. If it doesn't match, remove the chord from the result set
      • if the result set has a single element, you found it. Terminate and return the found chord, or perhaps the action associated with it
    • if you got to this point, there are no chords matching what the user pressed, or the user took too long to do so. Speak "no action matching <chord>", where <chord> is the stringified, user-friendly representation of the chord

I hope that's a relatively good algorithm. As always, awaiting feedback.
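The prefix search in this proposal can be sketched with a plain slice `starts_with` instead of stringifying. This is a simplified reading, not a settled design: it reports a match only once a chord has been typed in full, it leaves the timeout to the caller, and `Chord`, `Progress`, `ChordMatcher`, and the action names are hypothetical.

```rust
/// One step of a chord: a binding such as "c-s" or a plain key such as "v".
pub type Step = &'static str;

pub struct Chord {
    pub steps: Vec<Step>,
    /// Placeholder name for whatever the chord triggers.
    pub action: &'static str,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Progress {
    /// Still ambiguous or incomplete; lists the remaining candidate actions.
    Pending(Vec<&'static str>),
    /// A chord was typed in full.
    Matched(&'static str),
    /// Nothing begins with what was typed.
    NoMatch,
    /// Escape hatch pressed; the captured steps should be replayed to the system.
    Cancelled(Vec<Step>),
}

pub struct ChordMatcher {
    chords: Vec<Chord>,
    /// What the user has typed so far (the proposal's init_chord).
    typed: Vec<Step>,
}

impl ChordMatcher {
    pub fn new(chords: Vec<Chord>) -> Self {
        Self { chords, typed: Vec::new() }
    }

    pub fn feed(&mut self, step: Step) -> Progress {
        if step == "escape" {
            // Hand back everything captured so far and start over.
            return Progress::Cancelled(std::mem::take(&mut self.typed));
        }
        self.typed.push(step);
        // The result set: every chord that begins with the typed steps.
        let candidates: Vec<&Chord> = self
            .chords
            .iter()
            .filter(|chord| chord.steps.starts_with(&self.typed))
            .collect();
        // A candidate as long as the typed prefix has been typed in full.
        if let Some(hit) = candidates.iter().find(|c| c.steps.len() == self.typed.len()) {
            let action = hit.action;
            self.typed.clear();
            return Progress::Matched(action);
        }
        if candidates.is_empty() {
            self.typed.clear();
            return Progress::NoMatch;
        }
        Progress::Pending(candidates.iter().map(|c| c.action).collect())
    }
}
```

The `Pending` list is what the optional spoken hints could be built from. Terminating as soon as a single candidate remains, as the proposal words it, would be a small change to the `find` condition.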

@TTWNO
Member Author

TTWNO commented Mar 7, 2024

the daemon has to know whether to consume the key altogether, or let it pass through to the app but then still send the action to the sr,

Do you know any case where this would be needed?

@albertotirla
Member

A good example is arrow keys. We want to be notified of them, but in edit boxes and webpages we may have to process them, while in games and native UI apps we don't. Since we probably can't make that decision dynamically every time, passthrough is better.

@TTWNO
Member Author

TTWNO commented Apr 17, 2024

Fair enough! Checks out for me.

2 participants