Raspberrypi3 stuck in "Online (VPN only)"

Hello,
My raspberrypi3 is stuck in “Online (VPN only)”, and /mnt/data is also full.


Please help.

I was able to resolve it thanks to these commands

Yes, those commands will recover the device, but will also delete any data written inside the app containers.

The /mnt/data partition is shared between the application (for any use, in the form of named volumes) and balenaEngine, which uses it to store application images and any data that applications write inside the container filesystem (e.g. data written to the /home folder inside app containers). If /mnt/data gets completely full, it can unfortunately prevent balenaEngine from operating correctly or even from starting, which in turn can prevent applications and the balena supervisor from operating correctly or even starting.

So to recover the device, the first thing to do is free up some space in the /mnt/data partition. If your application uses named volumes (e.g. if it writes files to the /data folder as visible inside the app container), you will be able to see individual files (for selective deletion) in subfolders of /mnt/data/docker/volumes/. For inspection, some other commands you can run are:

cd /mnt/data/docker
find volumes
du -hs *
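
To see at a glance how full the partition is and which folders take up the most space, something like the following could be used. This is only a rough sketch and not an official balena tool: the function names `usage_percent` and `largest_entries` are made up for illustration, and it assumes the usual `df`, `du`, `awk`, `sort` and `head` utilities are available in the shell you are using.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a nearly full partition.
# The function names are illustrative, not part of balenaOS.

# Print the used percentage (integer, without the % sign) of the
# filesystem that holds the path given as $1.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# List the entries directly under the directory $1 with their size
# in KiB, largest first. $2 limits the number of lines (default 10).
largest_entries() {
    du -sk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

For example, `usage_percent /mnt/data` prints how full the partition is, and `largest_entries /mnt/data/docker/volumes 5` lists the five largest named volumes, which helps in deciding which files to inspect for selective deletion.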

Thanks @pdcastro.
I am currently on balenaOS 2.43.0+rev1 and will do an update shortly. Is this fixed in the latest version?
Thanks again

Hi, yes. This issue is known to be fixed in version 2.50.4+rev1

THANKS!!

You’re welcome

I think I have a similar issue as the one discussed in the post.
I have installed balena-sense on a Raspberry Pi Zero W. It all works fine.
I removed power to relocate the sensor, and when I powered it on again it would not work. I thought this was just due to me removing power. I performed a new installation (removed the SD card, formatted it, used balenaEtcher) and it worked perfectly. This time I used the shutdown functionality offered, but after removing power again the status was once more Online (VPN only).
How can I avoid this Status: Online (VPN only) issue altogether? What other information do you need to better understand the issue?
Ah yes, other big difference in this instance: This is my first project with Balena, PI…
Thanks,

Installed version 2.50.4+rev1 and now it works fine.

Hello,

I’ve been using Balena for a few months and am currently running three RPI4’s. Two of them are currently stuck in “Online (VPN only)” and I want to follow the advice given above, but I’m unable to start a terminal session for either of the two that are stuck (using the balena dashboard).

And they are several hours drive away, so is there another way to fix this problem?

Thanks

Hi Hein,

You can try to Access a Device using a Gateway Device. For this purpose you would need another working device on the same network.
Alternatively, if the above method does not work, you might try to reboot the device. The device should come back online if the only problem is /mnt/data being full.

Best regards,
Genadi

Thanks - unfortunately I have nothing on that network; it’s connected to a 3G router since it’s a remote installation.

When trying to reboot from the dashboard I get “Request error: tunneling socket could not be established, statusCode=500”

I had the user do a complete power down cycle, no change.

This seems to be a rather serious bug, or am I supposed to change something so this doesn’t happen again?

And of course, any other idea on how to fix this remotely without flashing the SD card again.

Does the error code 500 have anything to do with the RPI being connected through a 3G router? I know with mobile internet the unit sits behind NAT.

Unfortunately the only way to recover in this case is to gain physical access to the device or ship a pre-flashed SD card (if that is easier).

That’s too bad.

Any way to keep this from happening again? These units are trials for a commercial reporting product, can’t really afford to have devices quietly go down without a remote way of sorting it out. Well ideally not at all. I’m only running a simple node.js script that relays ID strings from a sensor to my server, not sure how and why this should be happening. And this is 2 out of only 3 units, not a very good ratio so far.

Hi Hein, if the device shows as VPN only, that means it is connected to our VPN. Since you cannot access the device through the VPN tunnel, this tells us that the device might be misbehaving in some way. Unfortunately there is no way to tell for certain what the reasons for this are. For Raspberry Pi Zero devices, a common suspect is the device being resource constrained and unable to allocate connections, but there might be other issues.

One thing that might help is trying to replicate the conditions on a lab device (use same 3G network, same OS, app release, etc) to see if you can get a device into the same state, and use local connection to try to diagnose. Let us know if this helps.

Thanks @pipex, these are RPI4’s with 4GB RAM and, like I said, they run a tiny script, so they are hugely overspec’d. One addition I need to make is that one of them is on a 3G router and the other is on the client’s wifi, with completely different service providers. It’s also weird that they have both been running smoothly for several months and decided to act up within 3 days of each other (it could be less than three days, but their last log entries were 3 days apart. Incoming data may be several days apart.) These two devices are hundreds of kms apart.

This makes me think that it’s not so much a hardware issue. Anyway, it seems like the only way to fix the situation right now is to mail new SD cards up to the clients, but this is a scenario which I don’t think we can afford to have for future clients. The whole idea with testing Balena was to be able to remotely manage the devices, and it seems like that is not an option for me right now. We intend to cover an entire region once our product rolls out and I think we will need a more reliable solution than what I’ve experienced so far.

You say you can’t tell for certain what the issue is, but do you have any guesses apart from constrained resources?

Hi again.

Sorry I got confused by the previous user mentioning Pi Zero. You are right, I wouldn’t expect memory to be an issue on a PI4, although SD card corruption or file descriptor abuse could also interfere with the ability of the device to communicate. What is your script doing? How long have these devices been running, can you find any commonalities between the devices? Are they running the same release? These answers might give us an idea, but unfortunately, without access to the device we can only hypothesize the possible causes of the issue.

For the device on Wi-Fi, is there another device on the same network that can be used as gateway? Are your customers able to ping the device through the local network? If they can access the device locally using the CLI they might be able to retrieve logs and get more information about the issue (supervisor logs and engine logs usually provide useful information)

If you can get your customer’s old SD card and plug it into a PI4 on your network that might also help with diagnosing the problem. Otherwise, as I mentioned before, trying to replicate the problem on a lab device might be the only solution left :frowning:

The script receives 24 character strings from a sensor, compiles them into a json using memory only, and sends the json on to my server. The script doesn’t touch the SD card ever.

The first one has been running since Dec 2020, the second since March this year. They run the same script; the only difference is each one’s unique identifier to connect to my server. The last update was a couple of months ago and contained one minor change.

My customer who has wifi is not technically savvy enough to connect to it or to set up port forwarding so I can access it from here.

Do you come across similar issues often from other users?

When I get the SD card, what would I be looking for?

So my third and last RPI4 decided to give the same error today, stroke of luck I guess. This is one with me on wifi so hopefully I can figure out what the deal is.