orange pi zero doesn't recover from wifi outage

PI ZERO logs showing wifi DOWN and wifi UP. this is what i'd expect to see on ORANGE PI

Apr 16 16:30:05 localhost user.info kernel: [   74.182586] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Apr 16 16:30:06 localhost user.info kernel: [   74.296660] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
Apr 16 16:30:06 localhost user.info kernel: [   74.310999] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Apr 16 16:30:06 localhost user.info kernel: [   75.040142] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
Apr 16 16:30:06 localhost user.info kernel: [   75.041665] ieee80211 phy0: rt2x00lib_request_firmware: Info - Loading firmware file 'rt73.bin'
Apr 16 16:30:06 localhost user.info kernel: [   75.080948] ieee80211 phy0: rt2x00lib_request_firmware: Info - Firmware detected - version: 1.7
Apr 16 16:30:06 localhost user.info kernel: [   75.187624] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
Apr 16 16:30:07 localhost user.info kernel: [   75.423698] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
Apr 16 16:30:07 localhost daemon.info dbus-daemon[724]: [system] Activating via systemd: service name='fi.w1.wpa_supplicant1' unit='wpa_supplicant.service' requested by ':1.6' (uid=0 pid=808 comm="/usr/sbin/NetworkManager --no-daemon ")
Apr 16 16:30:08 localhost daemon.info dbus-daemon[724]: [system] Successfully activated service 'fi.w1.wpa_supplicant1'
Apr 16 16:30:09 localhost user.warn wpa_supplicant: Libgcrypt warning: missing initialization - please fix the application
Apr 16 16:30:09 localhost user.info kernel: [   77.947496] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
Apr 16 16:30:15 localhost user.info kernel: [   83.963407] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
Apr 16 16:30:20 localhost user.info kernel: [   88.934909] wlan0: authenticate with e4:f4:c6:41:d3:58
Apr 16 16:30:20 localhost user.info kernel: [   88.988103] wlan0: send auth to e4:f4:c6:41:d3:58 (try 1/3)
Apr 16 16:30:20 localhost user.info kernel: [   89.004938] wlan0: authenticated
Apr 16 16:30:20 localhost user.info kernel: [   89.016267] wlan0: associate with e4:f4:c6:41:d3:58 (try 1/3)
Apr 16 16:30:20 localhost user.info kernel: [   89.019710] wlan0: RX AssocResp from e4:f4:c6:41:d3:58 (capab=0x431 status=0 aid=8)
Apr 16 16:30:20 localhost user.info kernel: [   89.036576] wlan0: associated
Apr 16 16:30:21 localhost user.info kernel: [   89.631092] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
Apr 16 16:30:30 localhost user.notice kernel: [   98.839857] Bridge firewalling registered
Apr 16 16:30:38 localhost user.info kernel: [  106.633476] IPv6: ADDRCONF(NETDEV_UP): supervisor0: link is not ready
Apr 16 18:44:12 localhost user.info kernel: [  110.212109] IPv6: ADDRCONF(NETDEV_UP): balena0: link is not ready
Apr 16 18:44:15 localhost user.info kernel: [  112.682732] IPv6: ADDRCONF(NETDEV_UP): br-2aecd679e83f: link is not ready
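
To compare this with a log from the Orange Pi, the wlan0 state changes can be pulled out of the noise. Below is a minimal sketch that only matches message texts appearing in the excerpt above; the function name and event labels are made up for illustration, and no disconnect message is matched because none appears in this excerpt.

```python
import re

# Message fragments copied from the kernel log excerpt above.
# The event labels on the left are hypothetical names chosen for this sketch.
EVENT_PATTERNS = [
    ("link_not_ready", re.compile(r"wlan0: link is not ready")),
    ("authenticated", re.compile(r"wlan0: authenticated$")),
    ("associated", re.compile(r"wlan0: associated$")),
    ("link_ready", re.compile(r"wlan0: link becomes ready")),
]


def wlan0_events(lines):
    """Return (timestamp, event) pairs for the wlan0 lines that match a known pattern."""
    events = []
    for line in lines:
        text = line.rstrip()
        for name, pattern in EVENT_PATTERNS:
            if pattern.search(text):
                # The first 15 characters hold the syslog timestamp, e.g. "Apr 16 16:30:20".
                events.append((text[:15], name))
                break
    return events
```

Feeding it the lines of a saved log file, for example `wlan0_events(open("messages.log"))` with a hypothetical file name, gives a short timeline that is easier to compare between devices than the raw log.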

Hi @osde8info, let me ask a couple of clarifying questions and reiterate a couple of messages from this thread.

First of all, I see the device as offline now. Could you please clarify your / device’s timezone? What time is it in UTC when you say midnight?

On our side, I see that the device was connected to the balena VPN around 21:00 UTC yesterday. It’d be good to know when our support team could connect to the device to check its logs.

Next, is the app running on the OPi any different from the app that the other RPi devices are running? Have you applied any specific network / connection configuration to the OPi?

Could you copy and paste the contents of the files under /etc/NetworkManager/system-connections/?
I also wonder what the NetworkManager logs look like. Could you run journalctl -a -u NetworkManager and grab the output?

Do you have another OPi with the same setup? Do you experience the same issues on another OPi?
From a quick search online, I got the impression that the OPi does not have a reliable WiFi connection in general, so it may be better for you to pick different hardware for your IoT deployment.

Finally, what do you think about my colleague’s earlier suggestion?

another workaround may be to wire up a watchdog type script to run periodically on the device, checking for network connectivity and manually reconnecting if detected offline.

It’d be good to know if you could successfully force a wifi reconnect via a script when the device goes offline (before that midnight reset).
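
For reference, the watchdog idea could look roughly like the sketch below. It is an illustration under assumptions, not a supported balena configuration: the probe target (1.1.1.1 on TCP port 53), the failure threshold, and the function names are arbitrary choices, the connection id `resin-wifi-01` is the one posted later in this thread, and it assumes `nmcli` is reachable from wherever the script runs (inside an app container you may need to talk to NetworkManager over D-Bus instead).

```python
import socket
import subprocess
import time

CONNECTION_ID = "resin-wifi-01"  # assumption: the connection id from this thread


def is_online(host="1.1.1.1", port=53, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (probe target is an assumption)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def reconnect(connection_id=CONNECTION_ID):
    """Ask NetworkManager to bring the connection up again; True if nmcli reports success."""
    try:
        result = subprocess.run(
            ["nmcli", "connection", "up", "id", connection_id],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        # nmcli is not available in this environment.
        return False
    return result.returncode == 0


def watchdog_step(failures, check=is_online, fix=reconnect, threshold=3):
    """Run one connectivity check and return the updated consecutive-failure count."""
    if check():
        return 0
    failures += 1
    if failures >= threshold:
        # Too many failed checks in a row: attempt a reconnect and start counting again.
        fix()
        return 0
    return failures


def run_forever(interval=60):
    """Check connectivity every `interval` seconds until the process is stopped."""
    failures = 0
    while True:
        failures = watchdog_step(failures)
        time.sleep(interval)
```

Calling `run_forever()` starts the loop. Requiring several consecutive failures before reconnecting avoids bouncing the interface on a single dropped probe.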

@gelbal

  1. looks like my theory about it resetting at midnight was wrong ! it is still down after 20 hrs

  2. im in UTC+1 (but PI starts up at a random time ?)

  3. yes yesterday (wed) it was running fine i created an outage (thu) at 4pm

  4. i havent touched the OPi build all other PI3 and PI4 are recovering from outage

  5. will post those config files asap ( but i will have to reboot it to get access)

  6. ok will run a journal too

  7. no other OPIs just this one on eval before i commit to buying a batch of 1000 (v unlikely atm)

  8. i think i explained why workarounds are unacceptable however i can try an edimax usb stick if that will help you debug the problem

  9. again i am not prepared to modify and create an unsupported config in any way

sorry its such a frustrating bug as soon as wifi is gone you cant see the logs until you reboot

so just to clarify i will REBOOT NOW (10:00 UTC) and start running suggested diagnostics and post requested configs here asap

but feel free to connect in the meantime

@osde8info thanks for the clarification. I have a couple more suggestions to help with troubleshooting.

Could you please enable persistent logging from the Device Configuration menu?
This will allow the device to keep its logs across reboots, so we can investigate what went wrong before a reboot potentially clears the issue.

Next, could you also run Device Health Checks and Device Diagnostics from the Diagnostics menu?
It’d be great if you could send us the Device Diagnostics output.

i have just enabled

Enable / Disable logs from being sent to balena

Enable persistent logging. Only supported by supervisor versions >= v7.15.0.

for you

Thanks @osde8info. I took a look at the logs just now, but because of the reboot there are no logs of the earlier disconnect. I don’t see anything obvious right now. The wifi network connection looks good with full signal strength. There is nothing out of the ordinary in the device’s NetworkManager configuration.

Could you reproduce the issue again now by restarting the router?
With the persistent logging enabled, we will check the logs again.

yes will crash it right now

crashed but its not coming back up

i think i read somewhere that powering off PIs corrupts SD cards so i will reburn another SD card

thinking now it might be simpler if i just send you the OPI in the post or you buy your own for $15

its flashing red LED 4 X now i will insert new SD card

just got a log

main error i see is

Apr 16 17:29:56 localhost user.warn wpa_supplicant: Libgcrypt warning: missing initialization - please fix the application

NO WIFI DOWN MSGS but logs appear to have stopped at 17:30 YESTERDAY and restarted today at 11:33

also seeing

/etc/dropbear/dropbear_*

errors
Apr 16 17:29:55 localhost daemon.info dbus-daemon[591]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.4' (uid=0 pid=646 comm="/usr/sbin/NetworkManager --no-daemon ")
Apr 16 17:29:55 localhost daemon.info dbus-daemon[591]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Apr 16 17:29:55 localhost daemon.info nm-dispatcher: req:1 'hostname': new request (1 scripts)
Apr 16 17:29:55 localhost daemon.info nm-dispatcher: req:1 'hostname': start running ordered scripts...
Apr 16 17:29:55 localhost daemon.info nm-dispatcher: req:2 'connectivity-change': new request (1 scripts)
Apr 16 17:29:55 localhost daemon.info nm-dispatcher: req:2 'connectivity-change': start running ordered scripts...
Apr 16 17:29:56 localhost daemon.info dbus-daemon[591]: [system] Activating via systemd: service name='fi.w1.wpa_supplicant1' unit='wpa_supplicant.service' requested by ':1.4' (uid=0 pid=646 comm="/usr/sbin/NetworkManager --no-daemon ")
Apr 16 17:29:56 localhost daemon.info dbus-daemon[591]: [system] Successfully activated service 'fi.w1.wpa_supplicant1'
Apr 16 17:29:56 localhost user.warn wpa_supplicant: Libgcrypt warning: missing initialization - please fix the application
Apr 16 17:29:58 localhost daemon.info nm-dispatcher: req:3 'up' [supervisor0]: new request (1 scripts)
Apr 16 17:29:58 localhost daemon.info nm-dispatcher: req:3 'up' [supervisor0]: start running ordered scripts...
Apr 16 17:29:58 localhost daemon.info nm-dispatcher: req:4 'connectivity-change': new request (1 scripts)
Apr 16 17:29:58 localhost daemon.info nm-dispatcher: req:4 'connectivity-change': start running ordered scripts...
Apr 16 17:29:59 localhost daemon.info nm-dispatcher: req:5 'up' [wlan0]: new request (1 scripts)
Apr 16 17:29:59 localhost daemon.info nm-dispatcher: req:5 'up' [wlan0]: start running ordered scripts...
Apr 16 17:29:59 localhost daemon.info nm-dispatcher: req:6 'connectivity-change': new request (1 scripts)
Apr 16 17:29:59 localhost daemon.info nm-dispatcher: req:6 'connectivity-change': start running ordered scripts...
Apr 17 11:33:52 localhost authpriv.warn dropbear[2139]: Failed loading /etc/dropbear/dropbear_dss_host_key
Apr 17 11:33:52 localhost authpriv.warn dropbear[2139]: Failed loading /etc/dropbear/dropbear_ecdsa_host_key
Apr 17 11:33:52 localhost authpriv.info dropbear[2139]: Child connection from ::ffff:52.4.252.97:60716
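
The jump from the last Apr 16 entry to the first Apr 17 entry can be flagged mechanically. This is a minimal sketch, assuming classic syslog lines that start with a 15-character `Mon DD HH:MM:SS` timestamp; the year (2020, taken from the journalctl header in this thread) has to be supplied because syslog omits it, and the function names and 30-minute threshold are arbitrary choices.

```python
from datetime import datetime, timedelta


def parse_ts(line, year=2020):
    """Parse the leading 'Apr 16 17:29:56' style timestamp; return None if the line has none."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        # Wrapped continuation lines and blank lines do not start with a timestamp.
        return None


def find_gaps(lines, threshold=timedelta(minutes=30)):
    """Return (previous, next) timestamp pairs that are more than `threshold` apart."""
    gaps = []
    prev = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts))
        prev = ts
    return gaps
```

Run over the excerpt above, it reports a single gap between 17:29:59 on Apr 16 and 11:33:52 on Apr 17, which is the window in which nothing was logged.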

burning new SD now

resin wifi is

cat /etc/NetworkManager/system-connections/resin-wifi-01

[connection]
id=resin-wifi-01
type=wifi

[wifi]
hidden=true
mode=infrastructure
ssid=NETGEAR33

[ipv4]
method=auto

[ipv6]
addr-gen-mode=stable-privacy
method=auto

[wifi-security]
auth-alg=open
key-mgmt=wpa-psk

this is proving impossible to debug as we seem to be losing lots of logs as soon as wifi disconnects

# journalctl -a -u NetworkManager
-- Logs begin at Fri 2020-04-17 10:50:17 UTC, end at Fri 2020-04-17 11:45:53 UTC. --
Apr 17 11:31:57 e19fbba NetworkManager[646]: <info>  [1587123117.7141] manager: (resin-vpn): new Tun device (/org/freedesktop/NetworkManager/Devices/9)

send me your postal address and i will send you the OPI

obviously you can keep it or throw it away, i have no further interest in it. my primary concern was a flaw in the balena wifi stack but i dont have any further resources to prove or disprove this

Hi @osde8info

Thank you for the new information that you sent. I’ve taken this up with our product team. Considering what has been discussed so far, as well as other online discussions, it seems that the WiFi chip on the Orange Pi Zero presents many problems, see e.g.

and

Since this is outside of our control, it would not be of much use for you to ship your device to us, and unfortunately there is not much more we can do to assist you in this regard. Balena on this device has also only been a community-supported OS (see here for an explanation of what this entails).

We hope that you’ll have success with the other device types, and that you’ll have a great experience with running them on balena! Please let us know if you encounter any other problems.

Kind regards
Alida

sure but i would just like to point out i have seen none of the wifi problems linked to. the ONLY problem i have had with wifi is the balena wifi stack FAILING to RECONNECT after a wifi outage

i will throw the OPI into the nearest rubbish bin

@osde8info There’s one more thing you can try, which is to download the latest balenaOS image from the staging server and test the device with that image. There’s a good chance that it won’t solve the problem, but it might be worth a shot.