All devices went offline simultaneously

Yesterday, all of our devices (across all our fleets) went offline simultaneously for 10 minutes.

We detected this by monitoring the MQTT connection from the container on our devices.

Their upstream networks are LTE connections across multiple carriers.

After recovery, I checked the journal log of one of the devices. I have attached the log file, with the hostname and the SSID replaced.

It looks like there was a problem with OpenVPN.

Could I ask the following questions?

  • Was there any trouble with Balena’s OpenVPN system at that time, even though the status indicator https://status.balena.io/ was all green?
  • Will the container or the network stop operating when such an OpenVPN problem happens?

balena_error_20231011_pub.txt (86.7 KB)

Thanks for sharing @Cota

Are you using balenaCloud? I will look into this further!

Thank you for your response.

Yes. I’m using balenaCloud with the Prototype plan.


@Cota could you please confirm which carriers you are using? Are the devices in the same geographical area?

We didn’t have any issues yesterday, and it’s odd to see errors such as

Cannot resolve host address: cloudlink.balena-cloud.com:443

Do you know if the carriers use the same network provider?

@mpous

Thank you for your clarification on the status of your system yesterday.

Our fleet is using Docomo, au and Softbank (the top three carriers in Japan).
The devices are located in Tokyo, Kanagawa, Ibaraki, Ishikawa and Hyogo (the Kanto, Hokuriku and Kansai regions of Japan).

Cannot resolve host address: cloudlink.balena-cloud.com:443

Hmm… It looks like a DNS problem. I don’t know whether the carriers use the same DNS servers.

I will look up their DNS information and their service status for yesterday.

Although the carriers are different, the SIM cards for the devices were provided by the same MVNO.

Now I suspect the DNS servers specified by the MVNO were down or under maintenance at that time.
I’m asking the MVNO.

I’ll share the result as soon as I get an answer.
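
In case it helps anyone who sees the same "Cannot resolve host address" line, here is a minimal sketch of how one might tell a DNS failure apart from a plain connectivity failure from inside a container. This is not balena tooling: the function name `classify_reachability` and its `resolve`/`connect` parameters are made up for illustration, and it only tests name resolution and a bare TCP connect, not the OpenVPN or TLS handshake.

```python
import socket


def classify_reachability(host, port, resolve=socket.getaddrinfo,
                          connect=socket.create_connection, timeout=5.0):
    """Return "dns" if the name does not resolve, "connect" if it resolves
    but a TCP connection cannot be opened, and "ok" otherwise.

    `resolve` and `connect` are hypothetical injection points so the
    function can be exercised without a network.
    """
    try:
        # Step 1: name resolution only. A failure here matches the
        # "Cannot resolve host address" symptom in the journal log.
        resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns"
    try:
        # Step 2: plain TCP connect; no TLS or VPN handshake is attempted.
        conn = connect((host, port), timeout=timeout)
    except OSError:
        return "connect"
    conn.close()
    return "ok"
```

Calling `classify_reachability("cloudlink.balena-cloud.com", 443)` periodically and logging the result next to the MQTT status would show whether an outage starts at name resolution, which is what I suspect happened here.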


Thanks @Cota, let us know if this is the case!

@mpous

The MVNO we use finally told me that the outage coincided with scheduled maintenance by the network provider that the MVNO uses, and that the maintenance included the DNS servers.

Unfortunately, the MVNO does not provide an advance notification service, so today I registered directly for the network provider's maintenance notification service.

Thank you for your kind support.

Thanks for the confirmation @Cota
