body-controller-state: Error: failed to find a BLE device: .... #142

Open
iainbullock opened this issue Jan 13, 2025 · 12 comments

Comments

@iainbullock
Contributor

iainbullock commented Jan 13, 2025

This might be a problem using body-controller-state for presence.

I have driven my car away from home for the first time today since I installed v0.4.1b-dev several days ago. When I returned, presence was no longer working (neither the original presence detection method nor the new one based on body-controller-state, Presence BC). Looking at the logs, Presence BC stopped reporting approximately 1 hour 50 mins after I left.

When I returned I got a shell inside the container and issued the following command (having set $VIN):

tesla-control -ble -vin $VIN -command-timeout 5s -connect-timeout 10s body-controller-state

It timed out and resulted in this response:

body-controller-state: Error: failed to find a BLE device: can't init hci: no devices available: (hci0: can't down device: device or resource busy)

It looks like the bluetooth adapter at hci0 has 'got stuck'.

No commands other than the regular body-controller-state commands were sent at this time.

The only way out of it is to reboot the host.

Things I'm going to try in order:

  • Disable original presence detection
  • Increase loop delay in function poll_state_loop() from 29 secs to 59 secs
  • Explore whether it's a consequence of having an RPi3+ / BT4.2 (I previously experienced no issues with v0.3.1); see Specify minimum BT version #125 (comment)
@iainbullock
Contributor Author

iainbullock commented Jan 13, 2025

Updates:

  • Disable original presence detection. The same issue occurred 57 mins after the car had been moved away.
  • Increase loop delay in poll_state_loop(). This seems to have worked. No issues after the car was away for 7 hours
  • Explore whether it's a feature of having a RPI3+ / BT4.2 - Not tried.

Maybe what is happening:

  • When the car is away, the tesla-control body-controller-state call times out (takes 10+ secs)
  • When it is present, the call is much quicker, maybe 1 sec
  • Even with a 29 sec loop delay in function poll_state_loop(), body-controller-state calls should not build up quicker than they are executed (with one car)
  • Even though the body-controller-state call times out after 10s, maybe it can't take the next call for some time later
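One way to test this hypothesis is to log how long each call really takes, so slow or overlapping calls show up in the container log. A minimal sketch, assuming a POSIX or BusyBox shell with `date +%s`; `timed_run` is a hypothetical helper name, not part of the project:

```shell
#!/bin/sh
# Hypothetical helper: run a command, then log its wall-clock duration
# (whole seconds) and exit status to stderr.
timed_run() {
  start=$(date +%s)
  rc=0
  "$@" || rc=$?            # capture the status without tripping set -e
  end=$(date +%s)
  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $1 took $((end - start))s, exit $rc" >&2
  return "$rc"
}
```

For example, `timed_run /usr/bin/tesla-control -ble -vin "$VIN" -command-timeout 5s -connect-timeout 10s body-controller-state` would show whether a timed-out call really returns after about 10 s or lingers longer.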

Next steps:

  • Provide environment variables for tesla-control -command-timeout -connect-timeout, and loop delay in poll_state_loop() parameters so users can adjust to match their requirements
  • Look for places where successive calls to tesla-control can occur with no delay in between, and fix them. See if that makes any difference. Update: I didn't find any, but I will rewrite teslaCtrlSendCommand() sometime to make it easier to follow and more robust (e.g. use body-controller-state to test for presence)

@iainbullock
Contributor Author

Action: Provide environment variables for tesla-control -command-timeout -connect-timeout, and loop delay in poll_state_loop() parameters so users can adjust to match their requirements

I have implemented this, and settled on the following values, which seem stable for my setup (RPi3b+, car 1.5 metres away):

  • BLE_CMD_RETRY_DELAY=1.5
  • TC_CMD_TIMEOUT=5
  • TC_CON_TIMEOUT=10
  • PS_LOOP_DELAY=60
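A minimal sketch of how these tunables might be read with defaults in a POSIX/BusyBox shell script. The defaults mirror the values above; `bcs_query` is a hypothetical helper, not the project's actual code:

```shell
#!/bin/sh
# Use the environment value if set, otherwise fall back to the values
# reported as stable above (RPi3b+, car about 1.5 m away).
: "${BLE_CMD_RETRY_DELAY:=1.5}"  # seconds to wait before retrying a BLE command
: "${TC_CMD_TIMEOUT:=5}"         # tesla-control -command-timeout, in seconds
: "${TC_CON_TIMEOUT:=10}"        # tesla-control -connect-timeout, in seconds
: "${PS_LOOP_DELAY:=60}"         # sleep between iterations of poll_state_loop()

# Hypothetical helper: query body-controller-state with the configured timeouts.
bcs_query() {
  /usr/bin/tesla-control -ble -vin "$1" \
    -command-timeout "${TC_CMD_TIMEOUT}s" \
    -connect-timeout "${TC_CON_TIMEOUT}s" \
    body-controller-state
}
```

Users could then override any value at container start, e.g. by passing `-e PS_LOOP_DELAY=90` to `docker run`.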

@iainbullock
Contributor Author

iainbullock commented Jan 15, 2025

I may have spoken too soon. Just had a lock up after car being away for 4 hours. Investigating...
I modded the code slightly to echo the standard error from the tesla-control call and return the exit value. Here's what I got:

2025-01-15T14:09:53.004584517Z Car is not responding to bluetooth, assuming it's away. VIN:LRW3F7FS5RC036403
2025-01-15T14:10:53.016887687Z Error: context deadline exceeded
2025-01-15T14:10:53.019161573Z EXITVAL 1
2025-01-15T14:10:53.019306676Z Car is not responding to bluetooth, assuming it's away. VIN:LRW3F7FS5RC036403
2025-01-15T14:12:03.203511105Z 2025/01/15 14:12:03 can't accept: listner timed out

Everything is going well with a successful call every 60 secs. The last successful call is at 14:10:53; the next message on the standard error is '2025/01/15 14:12:03 can't accept: listner timed out', after which nothing further happens in the poll_state_loop. Either it's crashed or stuck.

Running a ps aux inside the container gives:

PID   USER     TIME  COMMAND
    1 root      0:00 {run.sh} /bin/ash -e /app/run.sh
  585 root      0:00 {run.sh} /bin/ash -e /app/run.sh
  586 root      0:00 {run.sh} /bin/ash -e /app/run.sh
  587 root      0:00 {run.sh} /bin/ash -e /app/run.sh
  588 root      0:00 {run.sh} /bin/ash -e /app/run.sh
  589 root      0:00 {run.sh} /bin/ash -e /app/run.sh
  590 root      0:00 mosquitto_sub -h 192.168.X.X -p XXX -u ***** -P *** --nodelay -t homeassistant/status -F %t %p
  591 root      0:00 sleep 86400
  593 root      0:00 {run.sh} /bin/ash -e /app/run.sh
  595 root      0:00 {run.sh} /bin/ash -e /app/run.sh
  596 root      0:00 mosquitto_sub -h 192.168.X.X -p 1883 -u *****-P ****--nodelay --disable-clean-session --qos 1 --topic tesla_ble/+/+ -F %t %p --id tesla_ble_mqtt
  857 root      0:00 /usr/bin/tesla-control -ble -vin LRW3xxxxxxxxxx03 -command-timeout 5s -connect-timeout 10s body-controller-state
 1046 root      0:00 /bin/sh
 1065 root      0:00 sleep 60
 1066 root      0:00 ps aux

So there is a stuck tesla-control process PID 857. When I issue kill -9 857, it kills the stuck process, and the poll_state_loop continues from where it left off. Maybe it is possible for the code to detect this and kill the stuck process.

Not sure why there are so many {run.sh} /bin/ash -e /app/run.sh processes.

All is not lost. Not yet.
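Detecting and killing the stuck process could be sketched like this: start the command in the background, check once a second whether it is still alive, and send SIGKILL (the scripted equivalent of the manual `kill -9 857`) once a deadline has passed. `run_with_watchdog` is a hypothetical helper; the standard `timeout` command (e.g. `timeout -s KILL 20 ...`) does the same job with less code.

```shell
#!/bin/sh
# Hypothetical helper: run "$@" in the background and SIGKILL it if it is
# still running after $1 seconds. Returns the command's exit status.
run_with_watchdog() {
  deadline=$1; shift
  "$@" &
  pid=$!
  elapsed=0
  # Poll once a second until the child is gone or the deadline passes.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$elapsed" -ge "$deadline" ]; then
      echo "watchdog: killing stuck PID $pid after ${deadline}s" >&2
      kill -9 "$pid" 2>/dev/null
      break
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  # Reap the child; a SIGKILLed child reports 137 (128 + 9).
  wait "$pid"
}
```

Usage would look like `run_with_watchdog 20 /usr/bin/tesla-control -ble -vin "$vin" -command-timeout 5s -connect-timeout 10s body-controller-state`.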

@BogdanDIA

Ah, I initially replied in #141, but now I see this topic, so I'm copying it here:

Hi, great job!
I just wanted to add that body-controller-state returns also when the car is asleep, which is good because the output can be used to prevent execution of other commands while the car is asleep.
However, in my view, using body-controller-state for the car's presence is not a good idea, because it will try connecting to the car and will rely on the retries, timeouts and failed connections that tesla-control handles internally. If the timeouts passed to tesla-control are set high, this becomes too heavy for the controller. The best solution is still the passive scan.

Bogdan

@chokethenet

Hi, I'm copying my response in this other issue that is the same: tesla-local-control/tesla-local-control-addon#125

Hi,
I've been struggling with this issue for many days with the integrated bluetooth chip in the PC and several Home Assistant recommended USB sticks, and the issue is as follows:
With this addon stopped, the bluetooth receiver (integrated or USB) is always on hci0, for days/weeks/months. Once this addon is enabled, it causes the bluetooth receiver to error out (within a couple of minutes or several hours) and it is reloaded as hci1 (or hci2, or the first available if you have more than one bluetooth receiver configured).
Once that reload as hci1 occurs, the addon can no longer send commands, as it expects the device where it found it on start. Only a restart of the machine works, as it reloads the receiver as hci0.
In my case it's HAOS on an Intel mini PC.

To summarize, this addon causes the bluetooth receiver to crash, and it reloads on a different hci#, which leaves the addon unable to send more commands (all my other bluetooth integrations/devices work within seconds after the hci change; they adapt to the available one).

The bluetooth receiver crashes with this addon and reloads on a different hci number, but the addon does not adapt to the available devices; it always expects the receiver on the hci number it found on start.
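A script could at least detect that the adapter has moved instead of assuming hci0. A minimal sketch, assuming Linux sysfs, where each adapter appears as /sys/class/bluetooth/hciN and active connections appear as entries like hci0:12. `list_hci_adapters`, `first_hci_adapter` and the `BT_SYSFS_DIR` override are hypothetical names:

```shell
#!/bin/sh
# List the Bluetooth adapters the kernel currently exposes (hci0, hci1, ...).
# BT_SYSFS_DIR is a hypothetical override, used here only for testing.
list_hci_adapters() {
  dir=${BT_SYSFS_DIR:-/sys/class/bluetooth}
  for entry in "$dir"/hci*; do
    [ -e "$entry" ] || continue    # glob matched nothing
    name=${entry##*/}
    case $name in
      *:*) continue ;;             # skip connection entries such as hci0:12
      *) echo "$name" ;;
    esac
  done
}

# First adapter in glob order, or empty output if none is present.
first_hci_adapter() {
  list_hci_adapters | head -n 1
}
```

Comparing `first_hci_adapter` against the adapter found at startup would let the addon log the jump rather than fail silently.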

@iainbullock
Contributor Author

@BogdanDIA I agree, I think the passive method of determining presence is the way to go, as it is not likely to overload the bluetooth. I think body-controller-state has its place in confirming whether the car is awake (and therefore present) before sending commands. I believe many people have automations sending commands when the car is away, which eventually crashes the bluetooth.

I will therefore revert to checking body-controller-state for the car being awake only when it is time to poll the car, which will drastically reduce the number of calls to tesla-control and improve robustness. We'll use the passive method for presence detection.

I will progress the investigation in this Issue to conclusion, as I have learned something which I believe will allow recovery after a tesla-control call has 'stuck', see next comment (when I've written it!)

@chokethenet I think it's best to wait for the next update, when I believe robustness will improve considerably. In the meantime, if you are sending commands with your automations when the car is away, you could improve robustness by not doing so.

Thanks both for your comments

@iainbullock
Contributor Author

I now know that the tesla-control process gets stuck when the bluetooth is getting overloaded, see my earlier post. I suspect that another call whilst the stuck process is still alive causes the hci to jump. So today I used the timeout command to see if I can automatically terminate the stuck process. I modded the line in read-state.sh which calls tesla-control to this:

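# Allow a non-zero exit without aborting the script under set -e
set +e
# timeout terminates tesla-control if it is still running after 20 s
bcs_json=$(timeout 20 /usr/bin/tesla-control -ble -vin "$vin" -command-timeout "${TC_CMD_TIMEOUT}s" -connect-timeout "${TC_CON_TIMEOUT}s" body-controller-state)
EXIT_VALUE=$?
set -e
echo "ret: $EXIT_VALUE"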

The timeout command runs the tesla-control command, but forcibly terminates it after 20 secs if it is still running. With the car away all morning and then returning, here are the relevant parts of the logs:

2025-01-16T11:01:53.856564397Z Car is not responding to bluetooth, assuming it's away. VIN:LRW3F7FS5RC036403
2025-01-16T11:02:23.882793290Z Error: context deadline exceeded
2025-01-16T11:02:23.886846438Z ret: 1
2025-01-16T11:02:23.887026541Z Car is not responding to bluetooth, assuming it's away. VIN:LRW3F7FS5RC036403
2025-01-16T11:03:08.893538002Z Terminated
2025-01-16T11:03:08.893775084Z ret: 143
2025-01-16T11:03:08.893825969Z Car is not responding to bluetooth, assuming it's away. VIN:LRW3F7FS5RC036403
2025-01-16T11:03:23.941256355Z Error: context deadline exceeded
2025-01-16T11:03:23.945738926Z ret: 1
.
.
.

2025-01-16T12:42:29.166910806Z Car is not responding to bluetooth, assuming it's away. VIN:LRW3F7FS5RC036403
2025-01-16T12:42:59.138788846Z ret: 0
2025-01-16T12:42:59.194547317Z Car is present and awake VIN:LRW3F7FS5RC036403
2025-01-16T12:43:25.348033518Z ret: 0

It works and the system recovered not causing the bluetooth to 'jump'.

Even though it works, as per my earlier comment I'm going to change presence detection to use the existing passive process. This will dramatically reduce the load on the bluetooth. I will however use body-controller-state to check the car is awake before sending commands. I will also use the timeout command to launch tesla-control whenever it is called. I believe doing these things will fix most of the issues people are having with bluetooth locking up / jumping etc.
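The exit statuses in these logs are worth handling explicitly. A sketch, assuming this mapping: 1 is what tesla-control returned on "context deadline exceeded", 143 (128 + SIGTERM) is what the container's `timeout` produced when it terminated the call, 124 is GNU coreutils `timeout`'s documented status when it terminates the command, and 137 (128 + SIGKILL) covers a forced kill. `classify_tc_status` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: map the exit status of
#   timeout 20 /usr/bin/tesla-control ... body-controller-state
# to a coarse outcome. The mapping is an assumption based on the logs above.
classify_tc_status() {
  case $1 in
    0) echo ok ;;                    # tesla-control answered
    124|137|143) echo killed ;;      # terminated by timeout or by a signal
    *) echo failed ;;                # tesla-control's own error, e.g. exit 1
  esac
}
```

A caller could then branch with `case $(classify_tc_status "$EXIT_VALUE") in killed) ... ;; esac` and log a terminated call separately from an ordinary BLE failure.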

@BogdanDIA

BogdanDIA commented Jan 16, 2025

Hi @iainbullock, checking body-controller-state should be done only when the car is available. In my implementation, I check the car's presence with bluetoothctl (I always reset the controller with "hciconfig hciX reset" before checking presence) and after that I check body-controller-state. If the car is asleep, I wait 3 min and execute again in a loop.

But there were two things that made my implementation bullet proof:

  1. Never execute commands in parallel. tesla-control is not made to execute commands in parallel. The impl should prevent this.
  2. More important - make sure that every command leaves the system clean upon exit, with no dangling tesla-control processes (subshells) still waiting for some timeouts to finish. I implemented each command so that no matter what happens, after an overall timeout, it kills all the processes and waits for children/subshells to finish. And only after that do I consider the command finished and send the next command. This eliminated all the locks and situations like the one you had with body-controller-state.

For no. 2 I used the timeout command; maybe there is a better way, but it works very well with no locks. Check one of the files, e.g.:

https://github.com/BogdanDIA/tesla-ble/blob/ha_addon/charging_get_bcontroller.sh
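Both rules above can be sketched in a few lines of shell. This is a minimal sketch, not the implementation in the linked file: it assumes `timeout` is available and uses `mkdir` as a lock because creating a directory is atomic; `tc_exclusive` and `TC_LOCK_DIR` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical lock location; mkdir either creates it or fails, atomically.
TC_LOCK_DIR=${TC_LOCK_DIR:-/tmp/tesla-control.lock}

# Run one command under an overall deadline, never two at once.
tc_exclusive() {
  limit=$1; shift
  # Rule 1: wait until no other call holds the lock.
  until mkdir "$TC_LOCK_DIR" 2>/dev/null; do
    sleep 1
  done
  # Rule 2: timeout runs in the foreground and only returns once the
  # command has exited or been terminated, so it is no longer running
  # when the lock is released.
  rc=0
  timeout "$limit" "$@" || rc=$?
  rmdir "$TC_LOCK_DIR"
  return "$rc"
}
```

A lock left behind by a script that is killed mid-call would block every later call; a `trap` that removes `TC_LOCK_DIR` on exit is one way to guard against that.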

@iainbullock
Contributor Author

  2. More important - make sure that every command leaves the system clean upon exit, with no dangling tesla-control processes (subshells) still waiting for some timeouts to finish. I implemented each command so that no matter what happens, after an overall timeout, it kills all the processes and waits for children/subshells to finish. And only after that do I consider the command finished and send the next command. This eliminated all the locks and situations like the one you had with body-controller-state.

I like the idea of using the wait command to wait for any sub processes to finish. Do you find you can go straight into sending the next command if you do this, i.e. no need for a sleep command first?

@BogdanDIA

I like the idea of using the wait command to wait for any sub processes to finish. Do you find you can go straight into sending the next command if you do this, i.e. no need for a sleep command first?

I did run a loop script for more than 4 hours setting the current between 1 and 2 Amps continuously.
The commands I am running now do not have sleep delays. But probably depends also on the controller.

@iainbullock
Contributor Author

I've just tested it and I can send commands one after another with no time delay in between. No issues. Thanks a lot for the idea; I think this will improve the robustness of the project.

@iainbullock
Contributor Author

iainbullock commented Jan 19, 2025

I have implemented these changes into v0.4.3b-dev which is on DockerHub. Main changes are robustness improvements, see the CHANGELOG.
https://github.com/tesla-local-control/tesla_ble_mqtt_docker/blob/iain-dev/CHANGELOG.md
