Service not starting, Device state not settled

I have a development OS image, version 2.48+rev1 (raspberrypi4). When I push my container to my local device using balena push <hostname> --nocache, everything seemingly goes fine. Here are my ‘deploy logs’:

[Live] Waiting for device state to settle…
[Info] Streaming device logs…
[Live] Watching for file changes…
[Warn] Windows-format line endings were detected in some files. Consider using the --convert-eol option.
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Logs] [2020-10-5 11:01:00 PM] Killing service ‘shipr.hub.serialreader sha256:189231e8415eb8e0224f3de40269dd8633c6d9e494cb6e10967e3325f8f6f22b’
[Logs] [2020-10-5 11:01:00 PM] Service is already stopped, removing container ‘shipr.hub.serialreader sha256:189231e8415eb8e0224f3de40269dd8633c6d9e494cb6e10967e3325f8f6f22b’
[Logs] [2020-10-5 11:01:00 PM] Killed service ‘shipr.hub.serialreader sha256:189231e8415eb8e0224f3de40269dd8633c6d9e494cb6e10967e3325f8f6f22b’
[Debug] Device state not settled, retrying in 1000ms
[Logs] [2020-10-5 11:01:00 PM] Installing service ‘shipr.hub.serialreader sha256:98f54fbb3a0aea45f4221ce43fedb24233d7c52db54cc06f9362f404ed2aed50’
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Logs] [2020-10-5 11:01:04 PM] Installed service ‘shipr.hub.serialreader sha256:98f54fbb3a0aea45f4221ce43fedb24233d7c52db54cc06f9362f404ed2aed50’
[Logs] [2020-10-5 11:01:04 PM] Starting service ‘shipr.hub.serialreader sha256:98f54fbb3a0aea45f4221ce43fedb24233d7c52db54cc06f9362f404ed2aed50’
[Debug] Device state not settled, retrying in 1000ms
[Logs] [2020-10-5 11:01:07 PM] Starting service ‘shipr.hub.serialreader sha256:98f54fbb3a0aea45f4221ce43fedb24233d7c52db54cc06f9362f404ed2aed50’
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Logs] [2020-10-5 11:01:10 PM] Starting service ‘shipr.hub.serialreader sha256:98f54fbb3a0aea45f4221ce43fedb24233d7c52db54cc06f9362f404ed2aed50’
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Logs] [2020-10-5 11:01:16 PM] Starting service ‘shipr.hub.serialreader sha256:98f54fbb3a0aea45f4221ce43fedb24233d7c52db54cc06f9362f404ed2aed50’
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
[Debug] Device state not settled, retrying in 1000ms
… and so on till the end of days …

But a few things are not really clear to me:

  1. Why does it start with about 30 attempts saying ‘Device state not settled’, and then all of a sudden start trying to start the service? Why isn’t it trying to start the service from the beginning?

  2. The container it’s trying to start works fine on my development machine. Apparently something goes wrong on the device; how can I figure out what the error is? I can see the container on the device with balena ps -a, with status created, but balena logs <containerid> shows nothing, not a single line.

  3. The application I am deploying is a multi-container one, so I am using a docker-compose file with multiple services. How can you restart those services with the configuration specified in the docker-compose file? For example: my docker-compose file has some devices parameters forwarding a /dev entry into my container. The only way to ‘retry’ my failed service is manually running it with docker balena run -it -v /dev/mydevice/ <imageid>. To me that’s not the same; now I’m running that container with manually specified device parameters instead of the ones specified in the docker-compose file.

    This might be relevant when, let’s say, you have a production device. You ssh into it and manually stop a container. If you then want to start that service/container again, you wouldn’t want to re-specify all the parameters from the docker-compose.yml, would you?

    I am merely asking if there is a way to do this. I understand that this is Docker’s way of doing things and that you guys probably don’t have any control over this.

Hi there and thanks for reporting!

First, I notice that the CLI mentions Windows line endings. Have you already tried letting the CLI fix that with --convert-eol?

Why does it start with about 30 attempts saying ‘Device state not settled’, and then all of a sudden start trying to start the service? Why isn’t it trying to start the service from the beginning?

On the development device, you should grab the supervisor logs (journalctl -au resin-supervisor --no-pager) so we can see what might be happening.
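To narrow the supervisor journal down to the lines that matter, something like the following can help. It is only a sketch: the function name supervisor_errors and the two patterns it matches are assumptions rather than an official tool, so adjust them to whatever your logs actually contain.

```shell
# Hypothetical helper (not part of balenaOS): read journal lines on stdin and
# keep only those that look like supervisor failures.
# Both patterns are assumptions; extend them as needed.
supervisor_errors() {
    grep -E 'Device state apply error|\[error\]'
}

# Usage on the device's host OS:
#   journalctl -au resin-supervisor --no-pager | supervisor_errors
```

Because the function only filters stdin, it can also be run against a saved copy of the journal on your development machine.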

The application I am deploying is a multi-container one, so I am using a docker-compose file with multiple services. How can you restart those services with the configuration specified in the docker-compose file? For example: my docker-compose file has some devices parameters forwarding a /dev entry into my container. The only way to ‘retry’ my failed service is manually running it with docker balena run -it -v /dev/mydevice/ <imageid>. To me that’s not the same; now I’m running that container with manually specified device parameters instead of the ones specified in the docker-compose file… This might be relevant when, let’s say, you have a production device. You ssh into it and manually stop a container. If you then want to start that service/container again, you wouldn’t want to re-specify all the parameters from the docker-compose.yml, would you?

Typically, accessing the host OS directly is a bit of an anti-pattern. The balena-supervisor exposes endpoints that make this sort of interaction cleaner and more predictable (see https://www.balena.io/docs/reference/supervisor/supervisor-api/ for the full documentation of that API). Generally speaking, docker-compose.yml lets you define containers together with their configuration (based on an image), whereas balena run creates a new container from an image and runs it with only the options you pass on the command line. For more on the difference between images and containers, I recommend taking a look at our masterclass on the subject: https://www.balena.io/docs/learn/more/masterclasses/docker-masterclass/#6-docker-containers.
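As a rough illustration of that API, the sketch below restarts a single service through the supervisor's v2 restart-service endpoint from inside a container. It assumes the calling service has the label io.balena.features.supervisor-api: '1' in docker-compose.yml, so that the BALENA_SUPERVISOR_ADDRESS, BALENA_SUPERVISOR_API_KEY and BALENA_APP_ID variables are set. The function names and the service name in the usage comment are hypothetical, and behaviour in local mode may differ, so check the API documentation linked above before relying on it.

```shell
# Sketch only; function names are hypothetical.
# Assumes the supervisor has injected BALENA_SUPERVISOR_ADDRESS,
# BALENA_SUPERVISOR_API_KEY and BALENA_APP_ID into this container.

# Build the URL of the supervisor's v2 restart-service endpoint.
restart_service_url() {
    printf '%s/v2/applications/%s/restart-service?apikey=%s' \
        "$BALENA_SUPERVISOR_ADDRESS" "$BALENA_APP_ID" "$BALENA_SUPERVISOR_API_KEY"
}

# Ask the supervisor to restart one service, identified by the name it has
# in docker-compose.yml. The supervisor, not this script, starts the container.
restart_service() {
    curl --silent --show-error -X POST \
        --header 'Content-Type: application/json' \
        --data "{\"serviceName\": \"$1\"}" \
        "$(restart_service_url)"
}

# Usage (hypothetical service name):
#   restart_service serialreader
```

Since the supervisor performs the restart, the service comes back with the settings from docker-compose.yml rather than with hand-typed run arguments.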

Please let us know what other questions you may have!

Hi, thanks for the response!

I didn’t use --convert-eol before (too lazy I guess) but will from now on.
Also thanks for the tip to check supervisor logs, that indeed gave me the answer I was looking for.

However… now I have a question about my error. I have two udev-rules configured in my config.json, those work fine. But those two devices aren’t always connected. Sometimes I have device A, sometimes device B. My service did not start because the supervisor logs said the following:

Device state apply error Error: Failed to apply state transition steps. (HTTP code 500) server error - linux runtime spec devices: error gathering device information while adding custom device "/dev/sygo": no such file or directory Steps:["start"]
Oct 06 20:48:57 dev-shipr resin-supervisor[1591]: [error] at /usr/src/app/dist/app.js:614:16375

Which makes sense to me, because the device /dev/sygo isn’t connected. Inside my docker-compose.yml I specify my devices as follows:

    devices:
      - "/dev/gps:/dev/gps"
      - "/dev/sygo:/dev/sygo"

How do I tackle this issue? I want my container to know about those devices and I want to be able to hotplug devices but I don’t want my service to be dependent on those devices when starting.

These are my udev rules inside the config.json, in case you are wondering:

"udevRules": {
      "sygo": "SUBSYSTEM==\"tty\", KERNELS==\"1-1.3\", SYMLINK+=\"sygo\"",
      "gps": "SUBSYSTEM==\"tty\", KERNELS==\"1-1.4\", SYMLINK+=\"gps\""
}

Hi, yeah, you’re running into a bit of a weakness of containers here :)

When you use the devices directive, Docker only attempts to bring those devices into the container at start time. If they appear on the host later, they will not get picked up, as you noticed.
The solution for that is to make the container privileged (allowing it broad access to the host system) and then run udev inside the container to take care of dynamically updating the /dev tree.

You should be able to adapt the info here to your needs: https://www.balena.io/docs/learn/develop/runtime/#mounting-external-storage-media
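Once udev is running inside a privileged container, the application still has to cope with a device being absent at start-up. The following is a minimal sketch of such a check, not something taken from the linked docs: present_devices is a hypothetical helper name, and the device paths in the usage comment are the ones from the docker-compose.yml earlier in this thread.

```shell
# Hypothetical helper: print each given path that currently exists, one per line.
# [ -e ] follows symlinks, so a udev symlink such as /dev/gps only counts while
# the device node it points to is actually there.
present_devices() {
    for dev in "$@"; do
        if [ -e "$dev" ]; then
            printf '%s\n' "$dev"
        fi
    done
}

# Usage: poll instead of failing when nothing is plugged in yet.
#   while [ -z "$(present_devices /dev/gps /dev/sygo)" ]; do sleep 1; done
```

Polling like this keeps the service running with neither device attached, which a devices entry in docker-compose.yml cannot do.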

Thanks, I think the original question is answered. But I could really use some help here, as Linux, mounts and devices aren’t my strongest subjects.

I added privileged: true to my service inside the docker-compose and also added ENV UDEV=ON to the Dockerfile. To my understanding all the /dev/ devices should be accessible/visible in my container. When I /bash into the container and check the devices I don’t see my ‘symlinks’ which the host does have.

So I thought maybe symlinks aren’t accessible inside containers and you need to add the udev-rules inside the container instead of the host. Inside my Dockerfile I copy the udev-rules to /etc/udev/rules.d/. Still no results.

I also noticed that when I plug and unplug my usb devices, the /dev/ directory does not change inside the container.

The example you guys provided is about external storage media, and to my understanding mount is used for storage devices or filesystems. In my case my USB devices aren’t storage devices. They output serial sensor data over RS232, which is converted to USB and plugged into the Raspberry Pi.

I don’t really know where to go from here. I felt like I was pretty close to the finish line because I was able to read the data. Now, when using privileged: true instead of just device forwarding, I feel like I need to go down a rabbit hole. I just want to hotplug a USB device and give the device a name based on the USB port it uses.

Hi @gerb0n,
There is a sample repo that demonstrates how to automatically mount/unmount external devices in a multi-container app. Can you try this setup?

Yes, I tried that setup. It worked! I took bits here and there to make it work in my own project. I found that the problem was the ENTRYPOINT in my Dockerfile. I used ENTRYPOINT ["dotnet", "myproject.dll"] before, and once I started using CMD ["dotnet", "myproject.dll"] it finally picked up the mounts and unmounts.

I must add that the line RUN install_packages findmnt util-linux grep from the sample-repo does not work for me with the balenalib/aarch64-debian-dotnet:3.1-run image. It says it cannot find the package findmnt. But I found that findmnt is part of the util-linux package anyway so I just removed findmnt from that command.

Thanks for pointing me in the right direction guys!

Hi there – glad to hear it worked out for you!

All the best,
Hugh