I’ve got my app set up to connect via WiFi and it works fine until (due to unavoidable power problems) my WiFi router reboots. At that point, my device stays offline until I manually reboot it.
I’m considering switching to Ethernet because of this, but is this a known issue, and is there a way to resolve it?
This is not supposed to happen. Any chance you could use a development image and try to use the serial console when that happens so we can diagnose it? Or just hard-reboot the router to simulate the same condition and try to debug it over the serial debug console?
You could download a .dev image from the resin dashboard and flash it to a new SD card to boot your RPi with.
The development image allows you to use the serial debug console on the RPi. Do you need instructions on how to connect to the serial debug console on your device?
We have also experienced this issue, though rarely. It is a no-go for us because our devices run at partner locations, and we have to call the partners to manually reboot the devices when their WiFi restarts. For us it is not reproducible on every WiFi restart: most of the time the device reconnects just fine, but sometimes it seems to hang forever and never reconnects.
I was running the production image on my development device, so I could not debug the issue. I will install the development version soon, but this should really not be happening.
We’ve experienced the issue on at least two different devices, so there is definitely something going on. We are using the latest version of resinOS and the supervisor:
@peterjuras how many restarts does it take to reproduce? Could you have a go at reproducing with a very minimal container to rule out your application?
It would be really helpful if you could get dmesg and network manager logs next time this happens.
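For anyone wanting to grab those logs in one go, here is a minimal sketch, assuming you have a shell on the host OS of a development image, that Python is available there, and that NetworkManager runs as a systemd unit named NetworkManager (all of these are assumptions, not confirmed for every resinOS version). The file names and the collect_logs helper are made up for illustration.

```python
import subprocess
from pathlib import Path

# Diagnostic commands and the (hypothetical) file each one is saved to.
# The unit name "NetworkManager" is an assumption about the host OS.
LOG_COMMANDS = {
    "dmesg.txt": ["dmesg"],
    "networkmanager.txt": ["journalctl", "-u", "NetworkManager", "--no-pager"],
}


def collect_logs(out_dir, run=subprocess.run):
    """Run each diagnostic command and save its output under out_dir.

    `run` is injectable so the function can be exercised without
    actually invoking dmesg or journalctl.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, cmd in LOG_COMMANDS.items():
        result = run(cmd, capture_output=True, text=True)
        target = out / filename
        # Keep stderr too, since a failing command is itself useful evidence.
        target.write_text(result.stdout + result.stderr)
        written.append(target)
    return written

# Example (on the device): collect_logs("/tmp/wifi-debug")
```

Running it right after the device fails to reconnect, before rebooting, should give the most useful output.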
I am seeing the same problem. A device on site occasionally loses all connectivity if the site’s WiFi is interrupted, and it can only be solved by a device reboot. Its revision:
Raspberry Pi 3, Resin OS 2.0.3+rev1 (prod), supervisor 4.2.2
The device is mounted inside a tamper-proof box, many miles away, so I’m not going to be able to debug it directly.
Is there a way of forcing the device to automatically do a full hardware reboot if it loses connectivity for more than a given period of time (e.g., a few hours)? It’s not an ideal solution, but at least it means that if I do lose my connection to it, I can rely on getting it back within a few hours.
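As a stopgap, a watchdog along these lines could run inside the application container. This is only a sketch under assumptions: connectivity is judged by opening a TCP connection to 8.8.8.8:53 (an arbitrary choice), and the reboot callback is a placeholder, since how to actually trigger a hardware reboot depends on the environment and is not confirmed anywhere in this thread. The names is_online and watch are hypothetical.

```python
import socket
import time


def is_online(host="8.8.8.8", port=53, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds.

    The default target is an assumption; any reliably reachable
    host and port would do.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch(reboot, max_offline_s=3 * 3600, interval_s=60,
          check=is_online, clock=time.monotonic, sleep=time.sleep):
    """Poll connectivity and call reboot() once the link has been
    down continuously for at least max_offline_s seconds."""
    offline_since = None
    while True:
        if check():
            offline_since = None  # link is back, reset the timer
        else:
            now = clock()
            if offline_since is None:
                offline_since = now  # first failed check of this outage
            elif now - offline_since >= max_offline_s:
                reboot()
                return
        sleep(interval_s)

# Example: watch(reboot=my_reboot_function)  # my_reboot_function is hypothetical
```

Using a monotonic clock avoids false triggers if the system time jumps while the device is offline, which can happen when no time server is reachable.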
@joe I’m not sure how many restarts it takes, and I also can’t allocate time right now to debug this issue. The whole point of resin for us is to worry about our application while running in a serviced host environment.
I’ll see whether I can reproduce it in the future and debug it, but I don’t have the equipment here to deal with a serial connection, so I’m not sure whether I can even access the mentioned logs.
I have now put the developer build of resinOS on my developer device. I hope this will make it easier to diagnose the issue should it occur again.
But in general I have seen this issue on multiple devices by now and it has been confirmed by @dwtowner so I hope that this will be found and fixed asap.
Hey @peterjuras, I have just set up a test rig for this. I have a resinOS 2.3.0 build connected to a mobile hotspot and I am systematically bringing the hotspot up and down over and over. I have a serial cable hooked up so I can monitor the NetworkManager logs and hopefully I will be able to recreate this issue so we can get to the bottom of it. A few questions about your case.
Does the router just drop out for a minute or two or does it go down for an hour or more?
What kind of application code is running in the app, and does it ever interact with networking, etc.? My current test app is just a simple node.js server which doesn’t do much at all, so it would be good to get a more representative use case.
How close to the router is the RPi3? I plan to do a number of range tests as well to rule out the antenna being flaky, so any range estimates will help greatly.
Hopefully I can catch something with these tests.
@dwtowner I just had a look through the resinOS changelog, and it seems that in resinOS 2.0.7 we changed the default NetworkManager automatic retry setting to infinity. So in your case it may be that the 2.0.3 version still has the default autoretry setting for NetworkManager, which I believe is 4 retries per connection. I think there is a setting that can be added to the connection file to force infinite retries, but I will need to try to dig up that info.
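For reference, NetworkManager documents a connection property called autoconnect-retries, where 0 means retry forever and -1 means use the global default. Whether that is the exact setting meant above is an assumption. Here is a rough sketch of adding it to a keyfile-style connection file; the helper name and the example path are hypothetical, and note that rewriting the file this way drops any comment lines in it.

```python
import configparser


def set_infinite_retries(path):
    """Set autoconnect-retries=0 in the [connection] section of a
    NetworkManager keyfile (0 means "retry forever")."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep key names exactly as written
    parser.read(path)
    if not parser.has_section("connection"):
        parser.add_section("connection")
    parser["connection"]["autoconnect-retries"] = "0"
    with open(path, "w") as fh:
        # Keyfiles use key=value without spaces around the delimiter.
        parser.write(fh, space_around_delimiters=False)

# Example (path is an assumption):
# set_infinite_retries("/mnt/boot/system-connections/resin-wifi")
```

The change would only take effect once NetworkManager reloads the connection, for example after a reboot.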
Hi @shaunmulligan, thanks for having a look at this issue:
I’m not sure I know the real answer to this. In one case the router was down for a few hours; the last time I encountered the problem it was offline for less than 3 minutes (I was doing network interrupt tests for our application, which was running on the Pi).
It is a networked application: some parts use MQTT to talk to AWS IoT, other parts make plain HTTPS calls. But it is not only our code that loses its network connection; I also no longer see the device in resin or on the local network.
We don’t expect or work with anything Wifi related inside the application, we only expect a working internet connection, therefore I don’t think our application is in any way touching the network manager configuration.
It depends again, but I don’t think this is a big issue. For one of our partner locations I think the Pi is around 3-4 meters from the router, my developer device is less than a meter away (pretty much just next to it).
Great, thanks for the info @peterjuras. I will try to do a barrage of tests and see if I can get the device into this state. I will update the thread as and when I find interesting stuff. Hopefully we can get to the bottom of it soon.
Yeah, self-service updates are coming (working on it as we speak). In the meantime, we can manually update your device to the latest version to see if it works better for your use case (see the announcement of that beta update program).