Improve ssh responsiveness on slow Wi-Fi networks #222

Merged May 2, 2024 (5 commits)

@sshane sshane commented May 2, 2024

Problem

Our traffic control routing consists of a parent mq qdisc that encompasses 5 hardware tx queues with a pfifo_fast attached to each. pfifo_fast is supposed to prioritize interactive traffic using the ToS/DSCP field, but it is unclear why it does not seem to prevent serious ssh stability issues when the device is uploading a file on a slow network.

I am guessing that we are saturating the one hardware queue we are currently using, making the pfifo_fast useless as it can't stuff any more interactive ssh packets into the single hardware queue as it waits for the file packets to send.

Running a speed test, both kinds of traffic go out on one hardware queue (parent :3):

comma@comma-d7e84c3:/data/openpilot$ sudo tc -s qdisc ls dev wlan0
qdisc mq 0: root 
 Sent 158247322 bytes 154641 pkt (dropped 0, overlimits 0 requeues 1) 
 backlog 0b 0p requeues 1
!!!Deficit -4, rta_len=48
qdisc pfifo_fast 0: parent :5 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 1240 bytes 10 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
!!!Deficit -4, rta_len=48
qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
!!!Deficit -4, rta_len=48
qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 158243956 bytes 154622 pkt (dropped 0, overlimits 0 requeues 1)   # everything is being sent on this hardware queue
 backlog 0b 0p requeues 1
!!!Deficit -4, rta_len=48
qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
!!!Deficit -4, rta_len=48
qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 2126 bytes 9 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
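
To make the imbalance above easier to see, here is a minimal Python sketch that totals the "Sent" byte counters per hardware queue. The function names `bytes_per_queue` and `busiest_queue` are made up for illustration, and the parsing assumes the line layout seen in the output above (a `qdisc ... parent :N` line followed by a `Sent ... bytes` line).

```python
import re

# Header and "Sent" lines copied from the `tc -s qdisc ls dev wlan0` output above.
TC_OUTPUT = """\
qdisc mq 0: root
 Sent 158247322 bytes 154641 pkt (dropped 0, overlimits 0 requeues 1)
qdisc pfifo_fast 0: parent :5 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 1240 bytes 10 pkt (dropped 0, overlimits 0 requeues 0)
qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 158243956 bytes 154622 pkt (dropped 0, overlimits 0 requeues 1)
qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 2126 bytes 9 pkt (dropped 0, overlimits 0 requeues 0)
"""


def bytes_per_queue(tc_output):
    """Map each child qdisc's parent handle (e.g. ':3') to its Sent byte counter."""
    sent = {}
    parent = None
    for line in tc_output.splitlines():
        header = re.match(r"qdisc \S+ \S+ parent (\S+)", line)
        if header:
            parent = header.group(1)
            continue
        counter = re.match(r"\s*Sent (\d+) bytes", line)
        # The root mq qdisc has no "parent" in its header, so its total is skipped.
        if counter and parent is not None:
            sent[parent] = int(counter.group(1))
            parent = None
    return sent


def busiest_queue(sent):
    """Return (parent handle, share of all bytes) for the busiest queue."""
    total = sum(sent.values())
    parent = max(sent, key=sent.get)
    return parent, (sent[parent] / total if total else 0.0)


sent = bytes_per_queue(TC_OUTPUT)
parent, share = busiest_queue(sent)
print(f"busiest queue: parent {parent}, {share:.3%} of bytes")
```

With the numbers above, virtually all bytes land on parent :3, which matches the observation that a single hardware queue carries both the bulk and the interactive traffic.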

Attempted solution

I tried adding a custom prio qdisc with 5 bands to replace the default mq one, but with just the priomap alone, it still kept everything on the same band. This is when I learned that the WiFi driver was resetting every Linux packet priority to 0 before it even got to the prio qdisc because it assumes you won't mess with traffic control. It wants to keep all of the queue prioritization inside the driver:

https://github.com/commaai/agnos-kernel-sdm845/blob/737a024adaf5aedf2a50500939a13c6a2e7283d2/drivers/staging/qcacld-3.0/core/hdd/src/wlan_hdd_wmm.c#L1592

BTW the DSCP map is literally just:

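WLAN_HDD_MAX_DSCP = 0x3f  # assumed here: DSCP is a 6-bit field
dscp_map = {}
for dscp in range(WLAN_HDD_MAX_DSCP + 1):
  dscp_map[dscp] = dscp >> 3  # the top 3 bits of the DSCP pick the band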

Since the ssh default value is 0x10 as a full TOS octet (DSCP 4 once the low two bits are shifted out), the map puts it in band 0, which lumps it in with normal high-bandwidth traffic at 0x0:

https://github.com/commaai/agnos-kernel-sdm845/blob/737a024adaf5aedf2a50500939a13c6a2e7283d2/drivers/staging/qcacld-3.0/core/hdd/src/wlan_hdd_wmm.c#L1512-L1513
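
As a quick check of that claim, the sketch below applies the map to a few TOS octet values. It assumes the driver derives the DSCP as the top six bits of the TOS octet and that the DSCP range is 0..63; `band_for_tos` is a hypothetical helper name, not a function from the driver.

```python
MAX_DSCP = 0x3F  # assumption: DSCP is a 6-bit field, so values run 0..63

# Same shape as the map described above: the band is the top 3 bits of the DSCP.
dscp_map = {dscp: dscp >> 3 for dscp in range(MAX_DSCP + 1)}


def band_for_tos(tos):
    """Band index for a full TOS octet, assuming DSCP = top 6 bits of the octet."""
    dscp = (tos >> 2) & 0x3F
    return dscp_map[dscp]


for tos in (0x00, 0x10, 0x80):
    print(hex(tos), "-> band", band_for_tos(tos))
```

Under these assumptions 0x00 and 0x10 share band 0, while a value such as 0x80 lands in band 4.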

Solution

After moving ssh to a different hardware queue by raising its QoS field, the ssh connection stays much more responsive under heavily degraded AP conditions while uploading (200-300 Kbps).
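
For context on what "raising the QoS field" means at the socket level, here is a small Python sketch that sets the TOS octet on a TCP socket with the standard `IP_TOS` socket option. This PR changes sshd's configuration rather than any Python code, and the value `0x90` is only an illustrative choice, not necessarily the value merged here; `open_socket_with_tos` is a hypothetical helper.

```python
import socket

EXAMPLE_TOS = 0x90  # illustrative value only (assumption, not taken from the merged change)


def open_socket_with_tos(tos):
    """Create an unconnected TCP socket whose outgoing packets carry the given TOS octet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return sock


with open_socket_with_tos(EXAMPLE_TOS) as sock:
    # Read the option back to confirm the kernel accepted it.
    print(hex(sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)))
```

Whether a given value actually changes the hardware queue depends on how the driver maps the octet, as discussed in this PR.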

…re at 4 Mbps and below. previously 300 Kbps meant no ssh packets from the device indefinitely
@sshane sshane marked this pull request as draft May 2, 2024 12:44

sshane commented May 2, 2024

Confusingly, the WiFi driver treats the TOS packet octet as a DSCP (DiffServ), and pfifo_fast treats it as the original standard TOS (pre-DiffServ). Context: https://en.wikipedia.org/wiki/Differentiated_services

To go from the full TOS packet octet to DSCP you do: (tos >> 2) & 0x3f. To get the original standard TOS field, which is used by pfifo_fast, you do (tos >> 1) & 0b1111. If we want to pick the first access category hardware queue that is neither best effort nor bulk, then we should use a DSCP of 32 (TOS octet 128, 0x80), which puts us in this video band (voice is higher priority).
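
The two readings of the same octet can be sketched side by side. The helper names below are hypothetical; the bit operations are the ones given in the paragraph above, and `MIN_DELAY` is the 0b1000 "minimize delay" flag of the pre-DiffServ TOS field.

```python
MIN_DELAY = 0b1000  # minimize-delay flag within the 4-bit pre-DiffServ TOS field


def tos_to_dscp(tos):
    """DiffServ reading: the top 6 bits of the TOS octet."""
    return (tos >> 2) & 0x3F


def tos_to_legacy_tos(tos):
    """Pre-DiffServ reading: the 4-bit TOS field in bits 1..4 of the octet."""
    return (tos >> 1) & 0b1111


for octet in (0x10, 0x80, 0x90):
    legacy = tos_to_legacy_tos(octet)
    print(hex(octet), "dscp", tos_to_dscp(octet),
          "legacy", bin(legacy), "min_delay", bool(legacy & MIN_DELAY))
```

The same byte therefore carries two different meanings: 0x80 and 0x90 differ by only 4 in DSCP terms, yet only 0x90 has the legacy minimize-delay flag set.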

Just in case the pre-DSCP TOS bits matter to pfifo_fast (we are sending live video data), I then went up to find the first value in that band that had the minimize delay flag, which was DSCP 36 (TOS octet 144, 0x90). Btw, the "band" below here is just the index to the above array in the WiFi driver, which then maps to an access category/hardware tx queue:

tos (dec)  tos (hex)  dscp  band  0b1000 bit flag (min delay)
...
120 0x78 30 3 has_min_delay=True
122 0x7a 30 3 has_min_delay=True
124 0x7c 31 3 has_min_delay=True
126 0x7e 31 3 has_min_delay=True
128 0x80 32 4 has_min_delay=False
130 0x82 32 4 has_min_delay=False
132 0x84 33 4 has_min_delay=False
134 0x86 33 4 has_min_delay=False
136 0x88 34 4 has_min_delay=False
138 0x8a 34 4 has_min_delay=False
140 0x8c 35 4 has_min_delay=False
142 0x8e 35 4 has_min_delay=False
144 0x90 36 4 has_min_delay=True
146 0x92 36 4 has_min_delay=True
148 0x94 37 4 has_min_delay=True
150 0x96 37 4 has_min_delay=True
152 0x98 38 4 has_min_delay=True
154 0x9a 38 4 has_min_delay=True
156 0x9c 39 4 has_min_delay=True
158 0x9e 39 4 has_min_delay=True
160 0xa0 40 5 has_min_delay=False
162 0xa2 40 5 has_min_delay=False
164 0xa4 41 5 has_min_delay=False
...
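Code to generate this:

dscp_map = {dscp: dscp >> 3 for dscp in range(64)}  # same DSCP map as above
for i in range(128):
  tos = i << 1  # step through the even TOS octet values
  has_min_delay = bool(tos & (1 << 4))  # legacy TOS minimize-delay bit
  dscp = (tos >> 2) & 0x3f
  print(tos, hex(tos), dscp, dscp_map[dscp], f'{has_min_delay=}')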
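
The search described above (first value in the target band, then first value in that band with the minimize-delay flag) can also be done programmatically rather than by reading the table. This is a sketch under the same assumptions as the table: band = DSCP >> 3, and band 4 is the band the table shows for DSCP 32. `first_tos` is a hypothetical helper.

```python
TARGET_BAND = 4  # band shown in the table for DSCP 32 and up


def first_tos(min_band, need_min_delay):
    """Lowest even TOS octet whose band is at least min_band, optionally with min-delay set."""
    for tos in range(0, 256, 2):
        dscp = (tos >> 2) & 0x3F
        band = dscp >> 3
        has_min_delay = bool(tos & (1 << 4))
        if band >= min_band and (has_min_delay or not need_min_delay):
            return tos
    return None


plain = first_tos(TARGET_BAND, need_min_delay=False)
with_flag = first_tos(TARGET_BAND, need_min_delay=True)
print(plain, hex(plain), "dscp", plain >> 2)
print(with_flag, hex(with_flag), "dscp", with_flag >> 2)
```

This agrees with the table rows for 128 (0x80, DSCP 32) and 144 (0x90, DSCP 36).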

Review thread on userspace/files/sshd_config (outdated, resolved)
@sshane sshane marked this pull request as ready for review May 2, 2024 22:38
@sshane sshane merged commit 283c7a1 into master May 2, 2024
2 checks passed
@sshane sshane deleted the inter-ssh branch May 2, 2024 22:42

sshane commented May 2, 2024

Nice way to plot custom data live. This one plots queued ssh packets:

while true; do ss -tm | grep "192.168.61.32:ssh" | awk '{print $3}'; sleep 0.25; done | pipeplot --direction left
