Hi,
We reflashed an iMX8MM device this morning with the v2.50.1+rev1 release. It seemed to work fine briefly, but after a while it stopped working completely: the dashboard shows its status as “Online (VPN only)”, none of the containers appear to be running, and the console log output is no longer updating.
We are able to balena ssh into the host OS (and we have physical console access), but trying to SSH into any of the containers outputs:
BalenaRequestError: Request error: tunneling socket could not be established, statusCode=500
Clicking application restart for the device in the dashboard returns the same error message. We also tried changing the device's target release, but it stays on the previous version.
Rebooting the device didn’t change anything. On the serial console, we’re seeing what looks like a series of device resets coming from the supervisor every few minutes:
br-340bd66480a1: port 4(vethdf2d783) entered disabled state
vethb00b1ad: renamed from eth0
br-340bd66480a1: port 4(vethdf2d783) entered disabled state
device vethdf2d783 left promiscuous mode
br-340bd66480a1: port 4(vethdf2d783) entered disabled state
supervisor0: port 2(veth4a1fd6f) entered disabled state
veth75e526e: renamed from eth1
supervisor0: port 2(veth4a1fd6f) entered disabled state
device veth4a1fd6f left promiscuous mode
supervisor0: port 2(veth4a1fd6f) entered disabled state
br-340bd66480a1: port 4(vethee4a5f1) entered blocking state
br-340bd66480a1: port 4(vethee4a5f1) entered disabled state
device vethee4a5f1 entered promiscuous mode
IPv6: ADDRCONF(NETDEV_UP): vethee4a5f1: link is not ready
br-340bd66480a1: port 4(vethee4a5f1) entered blocking state
br-340bd66480a1: port 4(vethee4a5f1) entered forwarding state
supervisor0: port 2(vetha772b59) entered blocking state
supervisor0: port 2(vetha772b59) entered disabled state
device vetha772b59 entered promiscuous mode
IPv6: ADDRCONF(NETDEV_UP): vetha772b59: link is not ready
supervisor0: port 2(vetha772b59) entered blocking state
supervisor0: port 2(vetha772b59) entered forwarding state
br-340bd66480a1: port 4(vethee4a5f1) entered disabled state
supervisor0: port 2(vetha772b59) entered disabled state
eth0: renamed from veth1213745
IPv6: ADDRCONF(NETDEV_CHANGE): vethee4a5f1: link becomes ready
br-340bd66480a1: port 4(vethee4a5f1) entered blocking state
br-340bd66480a1: port 4(vethee4a5f1) entered forwarding state
eth1: renamed from veth7c6ae65
IPv6: ADDRCONF(NETDEV_CHANGE): vetha772b59: link becomes ready
supervisor0: port 2(vetha772b59) entered blocking state
supervisor0: port 2(vetha772b59) entered forwarding state
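In case it helps: the pattern above, where a new veth name appears on the same bridge port (e.g. port 4 of br-340bd66480a1 going from vethdf2d783 to vethee4a5f1, and port 2 of supervisor0 going from veth4a1fd6f to vetha772b59), appears to be a container's network endpoint being torn down and recreated. A rough sketch like the one below could count how often that happens across a longer console capture. It assumes the serial console output has been saved to a plain-text file; the script and function names are placeholders of our own, not part of any balena tooling.

```python
import re
import sys

# Matches kernel bridge messages such as
#   "br-340bd66480a1: port 4(vethee4a5f1) entered forwarding state"
# search() is used rather than match() so an optional "[  12.345678]"
# kernel timestamp prefix is tolerated.
PORT_RE = re.compile(r"(\S+): port (\d+)\((veth\w+)\) entered (\w+) state")


def veths_per_port(lines):
    """Map (bridge, port) -> veth names seen on that port, in first-seen order.

    A new veth name on the same bridge port suggests the container endpoint
    behind that port was torn down and recreated.
    """
    ports = {}
    for line in lines:
        match = PORT_RE.search(line)
        if match is None:
            continue  # not a bridge port state message
        bridge, port, veth, _state = match.groups()
        seen = ports.setdefault((bridge, int(port)), [])
        if veth not in seen:
            seen.append(veth)
    return ports


def main(path):
    with open(path, encoding="utf-8", errors="replace") as log:
        ports = veths_per_port(log)
    for (bridge, port), veths in sorted(ports.items()):
        # Every veth after the first on a port is one replacement of the endpoint.
        print(f"{bridge} port {port}: {len(veths) - 1} recreation(s) {veths}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage would be something like python3 count_veth_churn.py console.log (both file names hypothetical). Run against the excerpt above, it would report one recreation each for br-340bd66480a1 port 4 and supervisor0 port 2.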
We tried restarting the supervisor as described in “Services are in a constant restart loop!”, and as soon as we did, the device showed up as “Online” (i.e., not VPN only) in the dashboard, but nothing else changed. We rebooted the device again, and now we no longer see the supervisor failure messages on the console after the balenaOS banner is printed, but everything is still broken.
We could reflash the device and see if it happens again, but we’re not sure how it got into this state in the first place, and we’re concerned that this could happen to a customer, who would not be able to reflash the device. Is there a way to remotely restore or reflash a device using the infrastructure for host OS updates? Is there any sort of backup partition?
We also don’t want to reflash the device and lose any chance of actually debugging the issue, so we’ll hold off on doing so until we hear back, but we would appreciate any help we can get as soon as possible.
It’s worth noting that this is our first use of 2.50.1. Until now we’ve been using a custom build of 2.47.1+rev2 (the current OS version when we started working with Balena), because at the time the official release did not include the Variscite BSP updates needed for the iMX8MM. That build has worked fine so far, but we’d like to move to the mainline releases, and the BSP was updated in the 2.50.1 release.
Thanks in advance,
Adam