Moving a device to another fleet produces failures in one of the containers

I’m quite puzzled by this issue. I have several devices working fine in one fleet (the container runs TensorFlow on the NVIDIA GPU). However, when I move any of these devices to another fleet (with the same amd64 architecture), the container keeps restarting, complaining about:

Traceback (most recent call last):
    import tensorflow as tf
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/__init__.py", line 37, in <module>
    from tensorflow.python.tools import module_util as _module_util

The release pushed to the devices is built from the exact same code (just pushed to the different fleets), yet the problem does not occur on the fleet where the devices were originally deployed.
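
For anyone hitting the same thing: a minimal import check along these lines (just a diagnostic sketch, not part of our release; the script name is made up) shows whether the import itself fails or whether TensorFlow loads but simply can’t see the GPU:

# sanity_check.py -- hypothetical diagnostic script, not part of the actual release.
# Shows whether "import tensorflow" itself fails, or whether TensorFlow
# imports cleanly but cannot see the NVIDIA GPU.
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))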

Hello, note that for devices running balenaOS version 2.12.0 and above, data in persistent storage (named volumes) is automatically purged when a device is moved to a new fleet. Is it possible this is affecting your container? If not, is there any further text in the error message beyond what you’ve posted here, so we can try to troubleshoot further?

Our devices are currently running balenaOS 2.95.12+rev1. Persistent storage isn’t the problem in our case, because we can easily recreate whatever data needs to be added to the volume(s).

If I try one of the latest balenaOS releases, such as 2.113.15, I hit a different issue (complaints about the CUDA kernel module drivers), even within the same working fleet:

CUDA Error: no CUDA-capable device is detected
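
To dig into that one, a low-level probe like the following (only a sketch, calling the CUDA driver API through ctypes; the script name is made up) can distinguish a missing driver library from a driver that loads but detects no device:

# cuda_probe.py -- hypothetical diagnostic; assumes libcuda.so.1 is on the library path.
import ctypes

try:
    cuda = ctypes.CDLL("libcuda.so.1")  # fails here if the driver library is absent
except OSError as exc:
    raise SystemExit(f"CUDA driver library not found: {exc}")

rc = cuda.cuInit(0)  # returns 0 (CUDA_SUCCESS) when the driver initializes
if rc != 0:
    # 100 is CUDA_ERROR_NO_DEVICE, which matches the error above
    raise SystemExit(f"cuInit failed with driver error code {rc}")

count = ctypes.c_int()
cuda.cuDeviceGetCount(ctypes.byref(count))
print(f"CUDA devices visible: {count.value}")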

Do you think the balena labels added to the docker-compose file have anything to do with this behavior?

If possible, could I schedule a “Private Support” session with a Balena Engineer (I think our Pilot plan supports that)?

Hey there @cjaramillo, yes, being on the Pilot plan does indeed cover private support. It’s accessible via the “Need Help?” tab at the bottom right-hand side of your dashboard when you’re signed in. When you open the support ticket, please include a link to this forum thread and as much context as possible so we can continue helping.


Thanks for your help. It turns out that, for some reason, the other fleet was installing a different version of NumPy in the container. We assumed the installed dependencies would be the same in both releases (same code pushed to different fleets), but that wasn’t the case. The solution (for me, at least) was to explicitly pin the numpy version installed in the containers, e.g. pip3 install numpy==1.23.4
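
In case it helps others, a small start-up guard like this (just a sketch based on the version we pinned; the script name is made up) makes that kind of dependency drift fail fast instead of surfacing as an obscure import error:

# version_guard.py -- hypothetical start-up check, not part of the original code.
import numpy as np

EXPECTED = "1.23.4"  # the version pinned with: pip3 install numpy==1.23.4
if np.__version__ != EXPECTED:
    raise RuntimeError(
        f"numpy {np.__version__} installed, expected {EXPECTED}; "
        "check that the image pins the dependency explicitly"
    )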