New devices not connecting - existing devices unresponsive and disconnecting

I have been trying to test some code on a device for a large part of the evening. In total, I created four new devices from the dashboard for fleet 1770574 (I am not sure whether this is useful for support).

The first device appeared on the dashboard marked as running the factory build release. Its services seemed to be available; however, the configuration and environment variables did not take effect.

The second device also appeared as running the factory build; however, it never seemed to download its services and was stuck in an initialising state.

The third and fourth devices have not shown up on the dashboard at all. One of them is blinking in a four-blink pattern; the other has an active green LED but still does not appear on the dashboard.

Is there an issue with the Balena Cloud services? Or have I managed to incorrectly flash 4 devices? Any support would be much appreciated.

I have also seen a severe deterioration in my ability to view device logs on the balena dashboard. We are still on the starter applications, as we have not yet released a product. Does this have any impact on the availability of balena Cloud services?

Last night, a number of our devices disconnected from the balena platform and then reconnected. This happened from around 8pm GMT last night (9th November) to around 4am GMT this morning (10th November). To me, this all looks like part of a wider reliability issue. However, the only reported outage was related to the git builder.

Hello Henry, we did indeed have outages last night, and we have just posted a recap here: Balena.io Status - Elevated GIT/Application Builder Errors

If you try again now, everything should be fully functional. Thanks!

Hi @dtischler, thanks for your message. I think we have chatted before, perhaps at FOSDEM, or maybe I just recognise you from the Balena live stream chats.

I did see the balena status page, but assumed the outage did not affect us. We build on our own AWS EC2 ARM instance, and the issues I was having were not related to deploying releases. In that case, do Application Builder errors also affect the day-to-day running of devices?

From the outage summary, I can see that this was an overall performance issue. I would like to be able to tell clearly from the status updates why devices might be having certain issues. If the status had read something like "System Wide Performance Issues" or "Device Release and Management Outage", I would have quickly realised that it was not a client-side error.

Thanks again for your response. We do really love the Balena service.


Glad to hear that overall, you’re enjoying balena Henry! And thanks for the reminder, I need to get my FOSDEM submission in. :yum:

We have been discussing ways to improve our communications about outages such as this one, and the point you raise is definitely valid: in this particular case, the API was also down, which in turn caused more widespread issues and did affect the "regular" functionality you described.

Thanks again and apologies for the inconvenience, we’ll work on more clearly communicating our system statuses. :slight_smile: