I am new to using Balena and have hit an odd problem I’m hoping others may have solved. I have set up a single-container application on Balena. The core of my application is written in Python 2.7. So far I have been able to build, deploy and run it with minimal changes to the core code (which previously ran bare metal on an RPi3).

The application runs fine for a period of time, but eventually the container stops getting connections to Bluetooth devices and throws connection exceptions. If I restart the app container, the problem persists. If I restart the host, the problem clears. It seems to appear only after the container has been running for a while; perhaps something is exhausting underlying connections at the OS level? My application is designed to retry BT connections (connections to some devices are inherently unreliable, so catching a failure, closing the connection and retrying is standard practice). I am using the latest version of the BluePy library (https://github.com/IanHarvey/bluepy). In a bare-metal configuration this library has not had connection problems; they only started with the deployment to Balena.

One thing that may be unique to my configuration is that I use Python multiprocessing to spawn subprocesses that can run in parallel. I use process forking instead of threading because one of the libraries I’m using has a bad habit of leaving abandoned sockets open. By using subprocesses, the orphaned sockets are destroyed along with the subprocess on exit. That approach works fine on bare metal; I’m not sure what balena is doing under the covers here, or whether it could be causing a problem with BT connections. Any help figuring out what is happening would be appreciated.
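A minimal sketch of the retry-in-a-subprocess pattern described above, assuming Python 3 on Linux (it asks multiprocessing for the "fork" start method explicitly, matching the process-forking approach). `run_with_retries` and `_child` are hypothetical names, and `target` stands in for whatever function does the actual bluepy connect and read. Each attempt runs in a fresh forked child, so file descriptors opened during that attempt are closed when the child exits.

```python
import multiprocessing
import time
from queue import Empty


def _child(target, args, results):
    """Run one attempt in the forked child and report the outcome to the parent."""
    try:
        results.put(("ok", target(*args)))
    except Exception as exc:  # report the failure instead of dying silently
        results.put(("error", repr(exc)))


def run_with_retries(target, args=(), attempts=3, delay=1.0, timeout=30.0):
    """Call target(*args) in a fresh forked process, retrying on failure."""
    ctx = multiprocessing.get_context("fork")  # fork explicitly, as on bare metal
    last_error = None
    for attempt in range(1, attempts + 1):
        results = ctx.Queue()
        proc = ctx.Process(target=_child, args=(target, args, results))
        proc.start()
        try:
            # Read before join() so a large result cannot deadlock the child.
            status, payload = results.get(timeout=timeout)
        except Empty:
            status, payload = "error", "timed out after %.1fs" % timeout
        proc.join(timeout=5)
        if proc.is_alive():  # hung child: terminate it so its descriptors are closed
            proc.terminate()
            proc.join()
        if status == "ok":
            return payload
        last_error = payload
        if attempt < attempts:
            time.sleep(delay)  # back off before the next connection attempt
    raise RuntimeError("all %d attempts failed; last error: %s" % (attempts, last_error))
```

Whatever `target` returns must be picklable, since it travels back to the parent through the queue.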
Have you tried the exact same method for BT under Raspbian? I am a little confused about what you meant: I couldn’t tell whether you meant that subprocesses in general work fine on Raspbian, or subprocesses for BT-related work specifically.
If that’s the case, this could be an interesting issue with how balena works.
Yes, the exact same code runs under Raspbian without issue. What I mean by subprocesses is that the module of code that interacts with Bluetooth runs as a subprocess instead of a thread. See more about subprocesses vs. threads here: https://docs.python.org/2/library/multiprocessing.html
I’m still new to Balena and not sure what you mean. Below is the device configuration I see after the app is deployed to the container. Are there additional parameters I should set in the Dockerfile or compose file? Keep in mind, this works fine for a period of time, then the host stops allocating connections.
Thanks for sending that over. That project uses a different BT library, and its connection model is very different (incoming audio connections to the RPi vs. outbound connections in my app). The only real difference I see is the base image it starts from vs. the Python image I am using. I have tried both the 32-bit and 64-bit Balena Python images and experience the same issue. I’ll try the alternative base image; any other ideas out there?
Hm, I was thinking more of how your container is set up to talk to the Bluetooth subsystem on the host OS. I had a quick look at bluepy and saw that it uses bluez (the bluetooth dbus interface) too.
Your container will specifically need access to the host’s dbus; that’s what is happening here and here.
From your description, it sounds like something about the configuration prevents the container from being notified when the BT device goes away and comes back. This kind of problem usually points to an issue with how udev is set up, since docker doesn’t really handle dynamically appearing devices when you’re not running the container as privileged.
Maybe you can share the host-level bluetoothd logs with us from when the device is in the fail-state, that might tell us a bit more about what’s going on…
On a side note: you might want to look at moving your project over to Python 3, since we will be deprecating Python 2 support in our official base images in the coming months.
Still new to how Balena works - where is the log file on the host OS? My host is in a failure loop right now, perfect timing.
I have the label io.balena.features.dbus: "1" set in my docker-compose, and privileged: true set in the service definition. I do not have the ENV variable set; I will try that next.
You can log in to the host via the dashboard terminal or the balena-cli. From there it works just like on Raspbian: journalctl --no-pager -u bluetooth.service
no log files?
root@6c9ca41:~# journalctl -u bluetooth.service
-- Logs begin at Mon 2020-04-13 17:58:06 UTC, end at Tue 2020-04-14 13:45:31 UTC. --
-- No entries --
root@6c9ca41:~#
Right now only with the dbus label set. I am creating a test harness to replicate the error condition. Once I have the harness and a proven way to trigger the failure, I will start introducing new variables, or start by setting the env var.
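A minimal sketch of such a harness, assuming the BT work is wrapped in a single no-argument callable (`attempt` and `run_regression` are hypothetical names). It records the iteration and wall-clock time of the first failure, so that moment can be lined up against the host logs from journalctl.

```python
import time


def run_regression(attempt, iterations=100, pause=0.0):
    """Call attempt() repeatedly and report when failures start.

    Returns (first_failure, failure_count), where first_failure is None or a
    tuple (iteration, timestamp, error repr).
    """
    first_failure = None
    failures = 0
    for i in range(1, iterations + 1):
        try:
            attempt()
        except Exception as exc:
            failures += 1
            if first_failure is None:
                # Roughly the journal's "Apr 15 12:09:45" format, in container local time.
                stamp = time.strftime("%b %d %H:%M:%S")
                first_failure = (i, stamp, repr(exc))
        if pause:
            time.sleep(pause)  # optional gap between connection cycles
    return first_failure, failures
```

Counting all failures, not just the first, shows whether the container recovers on its own or stays broken until the host is restarted.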
Okay that sounds good. What I can tell you is that if your software requires dbus to talk to the bluetooth service then for sure you will need both (label and env var) for it to work. Keep us posted on your findings!
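A quick sanity check could run at container start-up. This sketch assumes the convention from balena's documentation: the io.balena.features.dbus label bind-mounts the host's dbus into the container under /host/run/dbus, and the env var is DBUS_SYSTEM_BUS_ADDRESS=unix:path=/host/run/dbus/system_bus_socket. `check_host_dbus` is a hypothetical helper that only verifies the variable and the socket path; it does not open a dbus connection.

```python
import os
import stat

EXPECTED = "unix:path=/host/run/dbus/system_bus_socket"  # value per balena's docs


def check_host_dbus(environ=None):
    """Return a list of problems with the container's host-dbus setup (empty = OK)."""
    environ = os.environ if environ is None else environ
    address = environ.get("DBUS_SYSTEM_BUS_ADDRESS")
    if address is None:
        return ["DBUS_SYSTEM_BUS_ADDRESS is not set (expected %s)" % EXPECTED]
    first = address.split(";")[0]  # simplification: only the first listed address
    if not first.startswith("unix:path="):
        return ["unexpected DBUS_SYSTEM_BUS_ADDRESS: %s" % address]
    path = first[len("unix:path="):].split(",")[0]  # drop any extra key=value pairs
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return ["%s does not exist (is the io.balena.features.dbus label set?)" % path]
    if not stat.S_ISSOCK(mode):
        return ["%s exists but is not a socket" % path]
    return []


if __name__ == "__main__":
    for problem in check_host_dbus():
        print("dbus check:", problem)
```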
I’m seeing this error in the logs after running the regression for fewer than 100 iterations. This is without the env setting above; I will add that next. This error looks like an open issue with some kernels. Have you seen it before?
The host is in failure mode now. There is a new error in the host logs that coincides with the container’s requests for a BT connection. Here is the host exception being thrown. Interestingly, the application code connects to 2 different devices, but sequentially (connect, get characteristics, disconnect, try the next device). The Pi cannot connect to one of the 2 devices: every time it tries, the error below appears in the host log and the application’s BT connection code fails. When the code attempts to connect to the second device, it works as expected and no error is thrown in the host log.
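The sequential flow described above (connect, get characteristics, disconnect, next device) might look like the sketch below. It assumes bluepy's `btle.Peripheral(addr)` constructor plus its `getCharacteristics()` and `disconnect()` methods; `poll_devices` is a hypothetical name, and `connect` is passed in so the loop can be exercised without hardware. What it illustrates is that every successful connection is disconnected in a `finally` block, and a failure on one device is recorded without blocking the next.

```python
def poll_devices(addresses, connect):
    """Connect to each address in turn; a failure on one must not block the next."""
    results = {}
    for addr in addresses:
        peripheral = None
        try:
            peripheral = connect(addr)  # e.g. bluepy's btle.Peripheral(addr)
            results[addr] = [str(c.uuid) for c in peripheral.getCharacteristics()]
        except Exception as exc:  # record the failure and move on
            results[addr] = exc
        finally:
            if peripheral is not None:
                try:
                    peripheral.disconnect()  # always release the connection
                except Exception:
                    pass  # a failed disconnect should not abort the loop
    return results
```

On the Pi this would be called as `poll_devices(addresses, btle.Peripheral)` after `from bluepy import btle`; with bluepy, narrowing the `except` to `btle.BTLEException` would avoid hiding unrelated bugs.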
Apr 15 12:09:45 a9901ab kernel[722]: [42199.013931] [<8013c114>] (process_one_work) from [<8013d20c>] (worker_thread+0x60/0x5b8)
Apr 15 12:09:45 a9901ab kernel[722]: [42199.022141] [<8013d20c>] (worker_thread) from [<80142cec>] (kthread+0x16c/0x174)
Apr 15 12:09:45 a9901ab kernel[722]: [42199.029645] [<80142cec>] (kthread) from [<801010ac>] (ret_from_fork+0x14/0x28)
Apr 15 12:09:45 a9901ab kernel[722]: [42199.036967] Exception stack(0xb9505fb0 to 0xb9505ff8)
Apr 15 12:09:45 a9901ab kernel[722]: [42199.042087] 5fa0: 00000000 00000000 00000000 00000000
Apr 15 12:09:45 a9901ab kernel[722]: [42199.050381] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Apr 15 12:09:45 a9901ab kernel[722]: [42199.058673] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
Apr 15 12:09:45 a9901ab kernel[722]: [42199.065434] kobject_add_internal failed for hci0:64 with -EEXIST, don’t try to register things with the same name in the same directory.
Apr 15 12:09:45 a9901ab kernel: Bluetooth: hci0: failed to register connection device
Apr 15 12:09:45 a9901ab kernel[722]: [42199.077908] Bluetooth: hci0: failed to register connection device
Apr 15 12:09:47 a9901ab 1a3e92352b25[814]: [api] GET /v1/healthy 200 - 15.794 ms
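To spot this failure mode in the host journal without reading full stack traces, a small filter could pick out the two kernel messages shown above. This is a sketch: `bluetooth_errors` is a hypothetical helper, it assumes journalctl's default short output format (as in the excerpt), and the match strings are taken directly from the log lines above. The same kernel message appears twice (with and without the `[pid]` and uptime prefix), so duplicates are collapsed.

```python
import re

# "<Mon> <day> <HH:MM:SS> <host> <ident>[<pid>]: <message>" (pid is optional)
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+[\d:]+)\s+(\S+)\s+([^:\[\s]+)(?:\[\d+\])?:\s+(.*)$")
# Kernel uptime stamp at the start of a message, e.g. "[42199.065434] "
KTIME_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# Substrings copied from the host log excerpt above.
SUSPECT = ("failed to register connection device", "-EEXIST")


def bluetooth_errors(lines):
    """Return unique (timestamp, message) pairs for the suspect kernel errors."""
    seen = set()
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        when, _host, ident, message = m.groups()
        if ident != "kernel":
            continue  # skip supervisor and other service lines
        message = KTIME_RE.sub("", message)
        if not any(s in message for s in SUSPECT):
            continue
        key = (when, message)
        if key not in seen:  # collapse the duplicated kernel line
            seen.add(key)
            hits.append(key)
    return hits
```

On the host, the input could come from `journalctl -k --no-pager` (kernel messages only), which makes it easy to count how often the registration error fires during a regression run.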