Container doesn't start on host OS

Hello,

For the past few days, the container has failed to start on my UP board. I didn’t change anything on the host OS during that time. The dashboard shows odd behavior: the container status loops through downloaded --> installing --> starting --> running --> stopped.
The host OS version is Resin OS 2.12.5+rev2 and the supervisor version is 7.1.18.

Does anyone have an idea?

Thanks in advance!

Hello. I’m sorry about this situation. Can you share the device URL with us and enable support access on it? That would help us take a look at the misbehaving device and see what is wrong with it.

If I push a new release, the device seems to download it (I see the progress bar and the status switches to update), but when the download finishes, the container doesn’t start.

When I check the release on the device, it has not changed; the version is still the same as 15 days ago.

The release tag on the device does not match the release tag on the “service”.

Thanks

Hello @quentinb, as @agherzan mentioned in the previous message, could you share the URL of the device and grant support access so that we can investigate the issue?

Thank you!

Hello @dansku, support access is granted for 12 hours. The URL of the device is https://3542b665c0639d9ce60ddbbd7c2ba676.balena-devices.com.

Thank you!

Hi, thanks for reporting. We are investigating this issue and will get back to you as soon as we identify the problem.

Hi @karaxuna,

Do you have any news or an answer to my problem?

Thanks,

The device seems to be in a weird state, though I can’t make out whether the cause is the host OS or your application. Systemd seems to be throwing errors all over the place. Can you try pushing the last known working commit again?

@dfunckt, I tried to push the last known working commit, but it doesn’t work.

I have also tried replacing the resin base image with an image from the balenalib Docker Hub repo, but that didn’t work either.

Hi, did you enable UDEV with the appropriate env variable? If not, you can find here how to

Adding to my colleagues’ replies, I was looking at the device now and it seems to be in a different state than earlier today. I see that the device is attempting to start an app release that was pushed “9 hours ago” (ceedce4), but it fails to start because of errors in the application container. If you open a command prompt to the host OS using the web dashboard, then run the command "journalctl -au balena", you’ll see some of the error messages printed by the application container. For example, the following is printed in a loop:

$ journalctl -au balena
...
Aug 08 21:05:53 yocto balenad[807]: [2019-08-08T21:05:53.500Z] Event: Service started {"service":{"appId":1392974,"serviceId":221197,"serviceName":"main","releaseId":1023046,"image":"sha256:a0c486d2
Aug 08 21:05:53 yocto balenad[807]: server send: 1
Aug 08 21:05:53 yocto balenad[807]: W: [pulseaudio] main.c: This program is not intended to be run as root (unless --system is specified).
Aug 08 21:05:53 yocto balenad[807]: [2019-08-08T21:05:53.826Z] Event: Service kill {"service":{"appId":1392974,"serviceId":221197,"serviceName":"main","releaseId":1023046,"image":"sha256:a0c486d29f2
Aug 08 21:05:53 yocto balenad[807]: kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
Aug 08 21:06:03 yocto healthdog[807]: time="2019-08-08T21:06:03.833856032Z" level=info msg="Container 58bdaeedb5c243e01bb6d6e1a84c2cffdd6de701038c69a8a4f8065024ddc288 failed to exit within 10 second
Aug 08 21:06:04 yocto balenad[807]: [2019-08-08T21:06:04.175Z] Event: Service exit {"service":{"appId":1392974,"serviceId":221197,"serviceName":"main","releaseId":1023046,"image":"sha256:a0c486d29f2 

It looks as if the “pulseaudio main.c” error is key to the issue, though there is additional log output that I didn’t paste above. I suggest you have a look at the full output of that command, in case some of the log messages are meaningful to your application.
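To make the start/kill/exit loop easier to spot in the full output, here is a minimal Python sketch that picks out the supervisor's "Event: Service ..." lines and reports the time between consecutive events. It assumes the line format matches the lines pasted above; the sample lines are shortened copies of those lines, and the function names are my own, not part of any balena tool.

```python
import re
from datetime import datetime

# Matches the supervisor part of a journal line, e.g.
# "[2019-08-08T21:05:53.500Z] Event: Service started {..."
EVENT_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)\] "
    r"Event: Service (?P<event>\w+)"
)


def supervisor_events(lines):
    """Return (timestamp, event) pairs for 'Event: Service ...' lines only."""
    events = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%fZ")
            events.append((ts, match.group("event")))
    return events


def gaps_in_seconds(events):
    """Seconds elapsed between each event and the one before it."""
    return [
        (later[1], (later[0] - earlier[0]).total_seconds())
        for earlier, later in zip(events, events[1:])
    ]


# Shortened copies of the log lines quoted above (JSON payloads cut off).
SAMPLE = [
    'Aug 08 21:05:53 yocto balenad[807]: [2019-08-08T21:05:53.500Z] Event: Service started {"service":{"appId":1392974,',
    'Aug 08 21:05:53 yocto balenad[807]: W: [pulseaudio] main.c: This program is not intended to be run as root (unless --system is specified).',
    'Aug 08 21:05:53 yocto balenad[807]: [2019-08-08T21:05:53.826Z] Event: Service kill {"service":{"appId":1392974,',
    'Aug 08 21:06:04 yocto balenad[807]: [2019-08-08T21:06:04.175Z] Event: Service exit {"service":{"appId":1392974,',
]

events = supervisor_events(SAMPLE)
for name, seconds in gaps_in_seconds(events):
    print(f"{name}: {seconds:.3f}s after the previous event")
```

On these sample lines the kill follows the start within a fraction of a second, and the exit comes roughly ten seconds later, which fits the "failed to exit within 10 second" message. The function accepts any iterable of lines, so it could also be fed the saved output of the journalctl command.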

“I tried to push the last known working commit, but it doesn’t work”

Is that release ceedce4? It might be worth investigating the last known working commit further, to find out why it stopped working.

Also, one of the engineers who was investigating the device earlier today left an internal thread note, that I might as well quote:

“systemd-udev needs privileged mode. If they use udev, they should use balenalib base images, no systemd in there, privileged, UDEV=on, that should stop in-container systemd-udev to interfere.”

But if you had a working release, I think a good approach would be to get that release to work again. Changing from the old resin images to the new balenalib images is not always straightforward. Some major breaking changes are listed in the following doc: https://www.balena.io/docs/reference/base-images/base-images/#major-changes

@pdcastro

Thanks for your reply.
Today I tried to fix the last working release, but it didn’t work. The behaviour is the same: the release is downloaded to the device and the host OS tries to start it, but it stops after receiving a SIGTERM signal. See the following screenshot of the dmesg output on the host.

[Screenshot: dmesg output on the host]

I tried deploying a small example application with Electron and resin that displays a unicorn on the screen, and it works!

When I tried to deploy a minimal Dockerfile containing only FROM resin/up-board-buildpack-deps:stretch, ENV INITSYSTEM on and CMD ["echo", "balena"], it didn’t work. But the small app above uses the same FROM image and also sets INITSYSTEM.

So, I don’t understand what I missed.

I have granted support access, and the device’s URL is https://9abde87abc010c2ab533c87fbc0013cf.balena-devices.com/

Thank you.

Hey, I took a look at your device, and it seems the supervisor is killing the user container for some reason. I also noticed that the supervisor version is quite old, and subsequent versions of the supervisor will report exactly why they are killing a service. Is it possible for you to update the host OS version?

@CameronDiver

Thank you for your reply.
I will try to update the host OS version by replacing meta-resin with meta-balena and updating the other required elements.

Is there any additional configuration needed for a balenaOS build compared with resinOS?

Thanks.

Hi,

Is there a particular reason you are building your own OS?
How will you update the OS? Using our host OS upgrade? Or reflashing your board?

Quite a few things have changed between resinOS and balenaOS. If you are comfortable with Yocto, then yes, replacing meta-resin with meta-balena will work.

Otherwise, I’d recommend a clean start with balenaOS.

Regards
ZubairLK

Hi @zubairlk,

We are building our own host OS because we need some drivers, a custom splash screen, etc.

I rebuilt a host OS from source with Yocto to integrate our meta layer.
I have now pinned all meta layers to the sumo branch of the Yocto Project. I managed to build our host OS without errors and flashed our board with it.
But on the balena dashboard, the device appears with an inactive status, and the LED flashes in the repeated pattern described here: https://www.balena.io/docs/faq/troubleshooting/troubleshooting/#unable-to-connect-to-the-internet.

I checked the configuration in config.json and it seems to be OK, and there are no particular restrictions on the local network.
Do you have any idea what’s going on?

I noticed a warning during the build: WARNING: Your build configuration uses RESIN_CONNECTABLE* variables. These variables are no longer used. There is only one type of resinOS image type which is unconnected by default. The os-config tool is used to configure the resinOS image for connectivity to a resin instance.
Is there any side effect when this variable is set to 1, or is it just ignored?

Thank you

Hi @quentinb,

The splash screen is customizable via balenaOS. And if the drivers you need are something generic that others can benefit from, we can try to enable them in our OS so that it’s easier for you.

Regarding the inactive status, I have a feeling your device hasn’t really connected to balenaCloud.
I can’t remember in which version, but we moved to a separate service that provisions/connects the device to the cloud. If you check systemctl status resin-supervisor, it should show unconnected or something similar. You need to run os-config join 'contents_of_your_app_config.json' (note the single quotes wrapped around the config.json contents; they are required).
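The single quotes matter because config.json is full of double quotes, which the shell would otherwise strip. As a rough illustration only, the Python sketch below builds that command line from a parsed config and lets shlex.quote do the quoting. The config keys and values here are placeholders rather than a real config.json, and build_join_command is a name made up for this example.

```python
import json
import shlex


def build_join_command(config):
    """Build an `os-config join` command line from a parsed config.json."""
    # Serialise compactly so the whole config is passed as one argument.
    payload = json.dumps(config, separators=(",", ":"))
    # shlex.quote wraps the JSON in single quotes, so the shell leaves
    # the double quotes inside it untouched.
    return "os-config join " + shlex.quote(payload)


# Placeholder values for illustration; use the contents of your own config.json.
example_config = {"applicationId": 1392974, "deviceType": "up-board"}

command = build_join_command(example_config)
print(command)
```

Splitting the resulting string the way a POSIX shell would (shlex.split) yields exactly three arguments, with the third being the unmodified JSON.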

Regards
ZubairLK

Hi @zubairlk,

When a device has an inactive status, I can’t access it through the balena dashboard or the balena CLI because the device is offline, so I can’t run commands on the device.
I did some tests and found a balenaOS build that works fine (touchscreen and sound don’t work, but that’s another issue).
But I see weird behavior. I flashed the board with another balenaOS build to try to fix the touchscreen problem, and the device stayed in inactive status as before. I put back the old version that worked and got the same problem (inactive status)… (and in one case, balena downloaded the latest release and started it!!)

I don’t know what to do to fix this problem, which appears randomly…

Thanks

Hi @quentinb,
with an old, custom-built version of balenaOS, supporting you is a rather complex task, especially if your device is not online.
You would be best off trying to reproduce these problems on a device that you can access, and then providing access or logs to us.
We would also need to know what kind of modifications you have made to the OS.
As zubairlk suggested further up, you would be better off reverting to a stock balenaOS and asking support whether it is possible to integrate the drivers you need.
Regards
Thomas