Device Stuck in Inactive State

Hello!

I have a device that is stuck in an ‘inactive’ state, similar to this thread.

The logs are coming through fine, but most other options are greyed out; there is no access to the terminal and I can’t use `balena ssh <UUID>`, which just prints `Device is not online`.

As far as I know, the necessary ports are unrestricted and not blocking the devices. Unfortunately Ethernet is not an option for these devices; they are Wi-Fi only.

Possibly related: we also have a handful of devices that are offline, while other devices are online and working normally.

Thanks for your help

Hi, one way to debug this problem is to access the inactive device through another device on the same network and look at the logs. Is there an online device on the same network?

I’ve got a couple of devices that are nominally on the same network but they’re at different sites, if that makes sense? I’m not sure if those logs will help, but happy to do it. Are there any commands in particular you’d want me to run?
`journalctl -a -u resin-supervisor`?

It should be shown as online if it successfully connects to our VPN service, so it looks like that is not happening. Could you send us the UUID of the device so that we can investigate further?

Logs for resin-supervisor and openvpn would be useful as well

No problem, I can’t get logs for the inactive device but here are the logs for another device on the same network that’s online:

resin-supervisor logs: https://pastebin.com/3Px3BrNp

openvpn logs:

`root@6019f7e:~# journalctl -a -u openvpn-resin`
`-- Logs begin at Tue 2019-12-10 22:02:15 UTC, end at Tue 2019-12-10 23:50:06 UTC. --`
`-- No entries --`

UUID sent over pm to you @Ereski

Hey, just to check, did you change the config.json on the device? Or do an `os-config join` or `os-config leave`? If so, the VPN would have to be restarted so that it starts connecting with the correct UUID/API key pairing.

No, none of that has been touched unfortunately

Hi @ade are there any other online devices on the same network with the “inactive” device?

We have the same company-wide network set up across 3 different sites (as far as I know, it’s the same network set-up, same permissions, etc.).

At the site with the “inactive” device, the other device is ‘offline’, but on the same network at a different site we have a couple that are “online”. I’ll send the UUIDs if you want to have a look.

Hi there, just wanted to quickly follow up to see if you had made any progress on this issue. Another method you could try is to connect to the inactive device via a standalone SSH client, if you happen to have added SSH keys to the config.json before deploying the devices to the field. More information on that is located here: https://www.balena.io/docs/learn/manage/ssh-access/#using-a-standalone-ssh-client

Otherwise, power cycling the device may be necessary at this point.

Hope that helps, thanks!

Hi @dtischler, thanks for your input.

Unfortunately power cycling doesn’t resolve the issue: the device boots up and runs fine, but remains in the same ‘inactive’ state. The app running on the device connects to the internet and I can access its browser interface, receive its logs, etc., but nothing works on the balena end.

In terms of a standalone SSH client, running
`ssh -p 22222 root@<device_ip_address>`
returns:
`ssh: connect to host <device ip address> port 22222: Operation timed out`

I should also mention that I’ve tried different SD cards and different Raspberry Pis, but that hasn’t resolved the issue either.

If it helps, I can get the inactive device (or one of the offline ones) shipped to me as SSH isn’t working at the moment?

Hi. Just to clarify, did you run ssh while connected to the same local network as the device? And also, is there some firewall that might be blocking traffic from the device? It would be very useful if you could confirm that the requirements we have for firewall rules are met for the specific network the device is in: https://www.balena.io/docs/faq/troubleshooting/faq/#what-network-ports-are-required
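For anyone wanting to check this from a laptop on the same network as the device, here is a minimal sketch of an outbound TCP reachability test. The hostnames and port in `TARGETS` are assumptions based on the `*.balena-cloud.com` domain and the requirements page linked above, so please verify them against that page; `can_connect` is just an illustrative helper name.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


# Assumed endpoints: check the linked requirements page for the real list.
TARGETS = [
    ("api.balena-cloud.com", 443),
    ("vpn.balena-cloud.com", 443),
]

if __name__ == "__main__":
    for host, port in TARGETS:
        status = "reachable" if can_connect(host, port) else "blocked or unreachable"
        print(f"{host}:{port} -> {status}")
```

A successful TCP connection only shows that the port is open towards that host. It does not prove that VPN traffic itself gets through, since a firewall that inspects traffic could still drop it, and it does not cover any UDP requirements such as DNS or NTP.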

Hi Ereski,
In terms of the firewall rules, from my first post -

As far as I know, the necessary ports are unrestricted and not blocking the devices.

Is there more detail I could provide? I don’t run the network management, but I’ve shared that link with our IT team a few times while troubleshooting this issue and they’re confident that everything is set correctly to allow the balena traffic. We might be running OpenVPN as well though; could there be a conflict there?

I’m in a different location from the offline devices but on the same company-wide network; I’ve SSHed from here, but I’m not able to do it from the same location as the devices. I can get the devices sent to me and SSH that way though?

One more thing to add: I ran `balena push <Application Name>` and all the device apps have updated (even those that are inactive or offline), but they’re still in the same state in balenaCloud (some online, some offline, 1 inactive).

Hey there,

I think a great next step is to get the devices sent to you, connect them to the same network you are on, and SSH into them. From there, it should be easy to confirm whether certain traffic is getting blocked.

The devices are updating, which is good news, so there is definitely something wrong with the OpenVPN connection. I don’t think running another OpenVPN connection in the company would be a problem, but it’s worth confirming. My guess is that you have an OpenVPN server in the network and everyone in the company is allowed to connect to it, whereas for balenaCloud the devices need to connect to an OpenVPN server that we provide under *.balena-cloud.com.

Also, is there a way you can get the IT team to share the actual firewall rules with you, so we can help troubleshoot?

Thanks @jviotti, no worries. I’ve managed to get one device sent to me so far; before I plug it in, are there any logs etc. I can provide?

In terms of the firewall rules, I’m still waiting to hear back from IT. I’ve asked them to double check that *.balena-cloud.com is set up too. I’ll update once I hear anything.

Hi,
Once you have SSHed into the device, I think the most interesting logs would be the VPN logs, which you could get by running `journalctl -a -u openvpn-resin`. Let us know how it goes.

OK, it looks like this issue’s been resolved. Our IT guys spent some time tweaking the firewall settings to ensure that the Raspberry Pi traffic could flow unobstructed, and now the devices have come online.

Many thanks for all the help everyone! As had been suggested, this was an issue caused by network settings, not something app or container related. I don’t have any specifics about what was changed (sorry), but I believe the relevant ports were always open; it’s possible that the domains needed to be whitelisted though.

Oh, and FWIW, the logs aren’t very illuminating unfortunately. I managed to get 3 devices sent down and I SSHed into them; this is what I got (again, in case this helps anyone in the future):

=============================================================
Device #1
=============================================================
journalctl -a -u openvpn-resin
-- Logs begin at Fri 2020-01-03 13:47:33 UTC, end at Fri 2020-01-24 04:04:50 UTC. --
-- No entries --

=============================================================
Device #2
=============================================================
-- Logs begin at Fri 2020-01-03 13:47:33 UTC, end at Fri 2020-01-24 04:06:05 UTC. --
-- No entries --

=============================================================
Device #3
=============================================================
-- Logs begin at Fri 2020-01-24 01:59:20 UTC, end at Fri 2020-01-24 04:07:00 UTC. --
-- No entries --

Hi,

Great to hear that it works now.
Sorry, I provided the wrong command. It should have been `journalctl -a -u openvpn`.
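Since `journalctl -u` prints `-- No entries --` when nothing matches the unit name, an empty result like the ones above can simply mean the unit name was wrong. A rough sketch of trying several candidate unit names and flagging the empty ones follows. It assumes Python is available wherever you run it, which may not be true on the device's host OS, and the helper names `has_entries` and `read_unit_log` are made up for illustration. The unit names are the ones mentioned in this thread.

```python
import shutil
import subprocess

NO_ENTRIES_MARKER = "-- No entries --"


def has_entries(journal_output: str) -> bool:
    """Return True if journalctl output contains at least one log entry."""
    lines = [ln.strip() for ln in journal_output.splitlines() if ln.strip()]
    if NO_ENTRIES_MARKER in lines:
        return False
    # Lines starting with "-- " are journalctl banners, not log entries.
    return any(not ln.startswith("-- ") for ln in lines)


def read_unit_log(unit: str) -> str:
    """Run journalctl for one unit and return its standard output."""
    result = subprocess.run(
        ["journalctl", "-a", "--no-pager", "-u", unit],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    if shutil.which("journalctl") is None:
        print("journalctl not found on this machine")
    else:
        # Unit names mentioned in this thread.
        for unit in ("openvpn", "openvpn-resin", "resin-supervisor"):
            state = "has entries" if has_entries(read_unit_log(unit)) else "no entries"
            print(f"{unit}: {state}")
```

If Python isn't available on the device, the same check works on journalctl output saved to a file and passed to `has_entries` on another machine.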