Thermal Operating Range of balenaFin

I’ve purchased a balenaFin v1.1 and used it to implement an IoT controller.

The Fin basically collects sensor data and then posts it to a server that is connected via the Fin's Ethernet port.

Soon after installing it on site, I started having problems with the connection to the server: the OS reports that eth0 is unknown. After a reboot, eth0 is recognized again. So I wrote a small script that checks connectivity and reboots the device if required.

After a few days of logging, I found that the failures mostly happen between 13:00 and 14:30.
Here is the temperature profile (in degC) for my city (Dubai, UAE) for reference:

The temperature definitely peaks around the failure time; however, the Fin is supposed to be rated up to 70 degC.

The Fin is mounted on a shaded control panel.

WiFi connectivity is perfect.

So, my question is: why is the Ethernet controller failing?
And, more importantly, how can I resolve this issue?

Hi @EngMoath,

Have you tried running this device in a temperature-controlled environment or just in the current location? We’d like to try to rule out temperature as an issue. You can get the CPU temperature by running cat /sys/class/thermal/thermal_zone0/temp from the HostOS. Divide that number by 1000 (cpu_temp/1000) to get the CPU temp in C.
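The conversion described above can be sketched in a few lines of shell. This is a minimal sketch: `millideg_to_c` is a hypothetical helper name, the sysfs path is the one quoted above, and the read is simply skipped if that file cannot be read (for example, on a machine without that thermal zone).

```shell
#!/bin/sh
# Hypothetical helper: convert a thermal_zone reading (millidegrees C) to degrees C.
millideg_to_c() {
    # awk keeps one decimal place; $((x / 1000)) in sh would truncate to an integer.
    awk -v m="$1" 'BEGIN { printf "%.1f\n", m / 1000 }'
}

ZONE=/sys/class/thermal/thermal_zone0/temp
# Only print a reading if the sysfs file exists and can be read.
if raw=$(cat "$ZONE" 2>/dev/null); then
    echo "CPU temperature: $(millideg_to_c "$raw") C"
fi
```

For example, `millideg_to_c 69832` prints `69.8`.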

John

Hello @jtonello,
Thanks for your reply.
I did not test the Ethernet in a controlled environment extensively, just a simple functional check.

I have the script below configured as a cron job that runs every 15 minutes:

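#!/bin/bash
# Check whether eth0 has an IPv4 address; log the result and reboot if it does not.

LOGFILE=/home/zero/bin/network.log

# "inet " (with the trailing space) matches the IPv4 line only, not "inet6".
if ifconfig eth0 2>/dev/null | grep -q "inet "; then
    echo "$(date "+%m %d %Y %T") : Ethernet OK" >> "$LOGFILE"
else
    echo "$(date "+%m %d %Y %T") : Network connection lost" >> "$LOGFILE"
    # -i (--ignore-inhibitors) reboots even if an inhibitor lock is held.
    systemctl reboot -i
fi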

and here is the result (I have removed the “Ethernet OK” lines except one before and one after each failure):

07 01 2020 14:15:01 : Ethernet OK
07 01 2020 14:30:01 : Network connection lost
07 01 2020 14:45:01 : Ethernet OK
07 01 2020 15:00:01 : Network connection lost
07 01 2020 15:15:02 : Ethernet OK

07 01 2020 16:15:01 : Ethernet OK
07 01 2020 16:30:01 : Network connection lost
07 01 2020 16:45:02 : Ethernet OK

07 02 2020 14:45:01 : Ethernet OK
07 02 2020 15:00:01 : Network connection lost
07 02 2020 15:15:01 : Network connection lost
07 02 2020 15:30:01 : Network connection lost
07 02 2020 15:45:01 : Network connection lost
07 02 2020 16:00:01 : Ethernet OK

07 03 2020 13:00:01 : Ethernet OK
07 03 2020 13:15:01 : Network connection lost
07 03 2020 13:30:01 : Network connection lost
07 03 2020 13:45:02 : Network connection lost
07 03 2020 14:00:01 : Ethernet OK
07 03 2020 14:15:01 : Network connection lost
07 03 2020 14:30:01 : Network connection lost
07 03 2020 14:45:01 : Ethernet OK
07 03 2020 15:00:01 : Network connection lost
07 03 2020 15:15:01 : Network connection lost
07 03 2020 15:30:01 : Ethernet OK

07 05 2020 14:15:01 : Ethernet OK
07 05 2020 14:30:01 : Network connection lost
07 05 2020 14:45:01 : Ethernet OK

07 06 2020 14:00:01 : Ethernet OK
07 06 2020 14:15:01 : Network connection lost
07 06 2020 14:30:01 : Network connection lost
07 06 2020 14:45:01 : Ethernet OK
07 06 2020 15:00:01 : Ethernet OK
07 06 2020 15:15:01 : Network connection lost
07 06 2020 15:30:01 : Network connection lost
07 06 2020 15:45:01 : Network connection lost
07 06 2020 16:00:01 : Network connection lost
07 06 2020 16:15:01 : Network connection lost
07 06 2020 16:30:01 : Network connection lost
07 06 2020 16:45:01 : Network connection lost
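To confirm the time-of-day pattern in a log like this, one option is to count the “Network connection lost” entries per hour of day. A minimal sketch, assuming the exact `MM DD YYYY HH:MM:SS : …` line format produced by the script above; `failures_per_hour` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count "Network connection lost" lines per hour of day.
# Expects lines like: "07 01 2020 14:30:01 : Network connection lost"
failures_per_hour() {
    # Field 4 is HH:MM:SS; keep only the hour, tally it, then sort by hour.
    grep "Network connection lost" "$1" \
        | awk '{ split($4, t, ":"); n[t[1]]++ } END { for (h in n) print h, n[h] }' \
        | sort -n
}
```

Usage: `failures_per_hour /home/zero/bin/network.log` prints one `HH count` line per hour that had at least one failure.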

I will add the CPU temperature to the logger and see if we get something interesting.

Regards,
Moath

Hello @EngMoath

Besides checking the CPU temperature and load, it would be very helpful if you could enable persistent logging and post the dmesg output from the period when the network connection is lost. That would help us narrow down the possible root causes.

Cheers,
Nico.

Hello Nico

Could you please guide me on how to enable persistent logging?

Thanks

Hi @EngMoath, you can change the persistent logging setting via the device's configuration variables in the dashboard. I have attached a screenshot for you.

Georgia

Thanks @georgiats.
I am actually running Raspbian on the Fin, so no dashboard :wink:

I have added the code for logging the temperature and dmesg (dmesg is logged only when Ethernet failure is detected), will get the logs tomorrow.
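Without the balena dashboard, a comparable way to keep logs across the reboots triggered by the check script is journald's persistent storage, assuming the Raspbian image runs systemd-journald. A minimal sketch: `enable_persistent_journal` and the drop-in file name `persistent.conf` are hypothetical choices, and the directory defaults to journald's standard drop-in location. After running it as root and restarting `systemd-journald`, `journalctl -k -b -1` should show the kernel messages from the previous boot.

```shell
#!/bin/sh
# Hypothetical helper: write a journald drop-in that enables persistent storage.
# Usage (as root): enable_persistent_journal && systemctl restart systemd-journald
enable_persistent_journal() {
    confdir="${1:-/etc/systemd/journald.conf.d}"
    mkdir -p "$confdir" || return 1
    # Storage=persistent keeps the journal under /var/log/journal across reboots.
    printf '[Journal]\nStorage=persistent\n' > "$confdir/persistent.conf"
}
```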

I visited the installation site at around 16:00; the CPU temperature was varying between 67 and 70 (in C).

The Fin datasheet specifies the “Operating Temperature” as 70 degC max. Normally this would mean “Operating Ambient Temperature”; am I right?

Regards,

Hello @EngMoath,

You are correct: the operating temperature on the datasheet is the operating ambient temperature. Keep in mind, though, that the ambient temperature is not necessarily equal to the outside air temperature. For example, if the device is inside a case, the ambient temperature will be higher. Other factors are any peripherals inside the case that generate heat, or the case sitting in direct sunlight.

Looking forward to seeing the logs.

Cheers,
Nico.

Hey @ntzovanis,

Here is a picture of the installation to make the setup clearer.

Basically, the Fin is in the developer kit’s enclosure, mounted on a DIN rail in a panel. The panel is sheltered in a small concrete room.

Network status/temperature log:

07 07 2020 17:00:01 : 69832 | Ethernet OK
07 07 2020 17:15:01 : 67142 | Ethernet OK
07 07 2020 17:30:01 : 66604 | Ethernet OK
07 07 2020 17:45:01 : 67680 | Ethernet OK
07 07 2020 18:00:01 : 66604 | Ethernet OK
07 07 2020 18:15:01 : 67142 | Ethernet OK
07 07 2020 18:30:02 : 66604 | Ethernet OK
07 07 2020 18:45:01 : 66604 | Ethernet OK
07 07 2020 19:00:01 : 66604 | Ethernet OK
07 07 2020 19:15:01 : 65528 | Ethernet OK
07 07 2020 19:30:01 : 65528 | Ethernet OK
07 07 2020 19:45:01 : 64990 | Ethernet OK
07 07 2020 20:00:01 : 64990 | Ethernet OK
07 07 2020 20:15:01 : 64990 | Ethernet OK
07 07 2020 20:30:01 : 64990 | Ethernet OK
07 07 2020 20:45:02 : 64990 | Ethernet OK
07 07 2020 21:00:01 : 64990 | Ethernet OK
07 07 2020 21:15:01 : 64452 | Ethernet OK
07 07 2020 21:30:01 : 64452 | Ethernet OK
07 07 2020 21:45:01 : 64452 | Ethernet OK
07 07 2020 22:00:01 : 64452 | Ethernet OK
07 07 2020 22:15:01 : 64452 | Ethernet OK
07 07 2020 22:30:01 : 64452 | Ethernet OK
07 07 2020 22:45:01 : 64452 | Ethernet OK
07 07 2020 23:00:01 : 63376 | Ethernet OK
07 07 2020 23:15:01 : 64452 | Ethernet OK
07 07 2020 23:30:01 : 63376 | Ethernet OK
07 07 2020 23:45:01 : 63376 | Ethernet OK
07 08 2020 00:00:01 : 63376 | Ethernet OK
07 08 2020 00:15:01 : 63376 | Ethernet OK
07 08 2020 00:30:01 : 63376 | Ethernet OK
07 08 2020 00:45:01 : 62300 | Ethernet OK
07 08 2020 01:00:02 : 62838 | Ethernet OK
07 08 2020 01:15:01 : 62838 | Ethernet OK
07 08 2020 01:30:01 : 62838 | Ethernet OK
07 08 2020 01:45:01 : 62838 | Ethernet OK
07 08 2020 02:00:01 : 62300 | Ethernet OK
07 08 2020 02:15:01 : 62300 | Ethernet OK
07 08 2020 02:30:01 : 62300 | Ethernet OK
07 08 2020 02:45:01 : 62300 | Ethernet OK
07 08 2020 03:00:01 : 62300 | Ethernet OK
07 08 2020 03:15:01 : 62300 | Ethernet OK
07 08 2020 03:30:01 : 62300 | Ethernet OK
07 08 2020 03:45:01 : 62300 | Ethernet OK
07 08 2020 04:00:01 : 61224 | Ethernet OK
07 08 2020 04:15:01 : 61762 | Ethernet OK
07 08 2020 04:30:01 : 61224 | Ethernet OK
07 08 2020 04:45:01 : 62300 | Ethernet OK
07 08 2020 05:00:01 : 61224 | Ethernet OK
07 08 2020 05:15:02 : 61224 | Ethernet OK
07 08 2020 05:30:01 : 60686 | Ethernet OK
07 08 2020 05:45:01 : 61224 | Ethernet OK
07 08 2020 06:00:01 : 61224 | Ethernet OK
07 08 2020 06:15:01 : 60686 | Ethernet OK
07 08 2020 06:30:01 : 60686 | Ethernet OK
07 08 2020 06:45:01 : 61224 | Ethernet OK
07 08 2020 07:00:01 : 61224 | Ethernet OK
07 08 2020 07:15:01 : 61224 | Ethernet OK
07 08 2020 07:30:01 : 61762 | Ethernet OK
07 08 2020 07:45:01 : 61762 | Ethernet OK
07 08 2020 08:00:01 : 62300 | Ethernet OK
07 08 2020 08:15:02 : 62300 | Ethernet OK
07 08 2020 08:30:01 : 62300 | Ethernet OK
07 08 2020 08:45:01 : 63376 | Ethernet OK
07 08 2020 09:00:01 : 63914 | Ethernet OK
07 08 2020 09:15:01 : 63914 | Ethernet OK
07 08 2020 09:30:01 : 64990 | Ethernet OK
07 08 2020 09:45:01 : 64452 | Ethernet OK
07 08 2020 10:00:01 : 65528 | Ethernet OK
07 08 2020 10:15:01 : 65528 | Ethernet OK
07 08 2020 10:30:01 : 65528 | Ethernet OK
07 08 2020 10:45:01 : 66604 | Ethernet OK
07 08 2020 11:00:01 : 66604 | Ethernet OK
07 08 2020 11:15:01 : 67142 | Ethernet OK
07 08 2020 11:30:01 : 67142 | Ethernet OK
07 08 2020 11:45:01 : 67680 | Ethernet OK
07 08 2020 12:00:01 : 68218 | Ethernet OK
07 08 2020 12:15:01 : 68756 | Ethernet OK
07 08 2020 12:30:02 : 68756 | Ethernet OK
07 08 2020 12:45:01 : Network connection lost
07 08 2020 13:00:01 : 67680 | Ethernet OK
07 08 2020 13:15:01 : Network connection lost
07 08 2020 13:30:01 : Network connection lost
07 08 2020 13:45:01 : Network connection lost
07 08 2020 14:00:01 : Network connection lost
07 08 2020 14:15:01 : Network connection lost
07 08 2020 14:30:01 : Network connection lost
07 08 2020 14:45:01 : Network connection lost
07 08 2020 15:00:01 : Network connection lost
07 08 2020 15:15:01 : Network connection lost
07 08 2020 15:30:01 : Network connection lost
07 08 2020 15:45:01 : 68218 | Ethernet OK
07 08 2020 16:00:01 : Network connection lost
07 08 2020 16:15:01 : Network connection lost
07 08 2020 16:30:01 : Network connection lost
07 08 2020 16:45:01 : 72522 | Network connection lost
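The readings above are in millidegrees C. To summarize a log like this, a minimal sketch that prints the highest logged temperature per day, assuming the `MM DD YYYY HH:MM:SS : <millidegrees> | <status>` format shown; lines without a reading (the plain “Network connection lost” entries) are skipped. `max_temp_per_day` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the highest logged CPU temperature (in C) for each day.
# Expects lines like: "07 08 2020 12:15:01 : 68756 | Ethernet OK"
max_temp_per_day() {
    awk '$7 == "|" {
        day = $3 "-" $1 "-" $2    # YYYY-MM-DD
        t = $6 + 0                # force a numeric comparison
        if (!(day in max) || t > max[day]) max[day] = t
    }
    END { for (d in max) printf "%s %.1f\n", d, max[d] / 1000 }' "$1" | sort
}
```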

I forgot to record the temperature at the time of failure; however, since the readings are only 15 minutes apart, I believe it would not be much different.

The full dmesg log is attached

dmesg.log (733.4 KB)

OK, here are the updates as of yesterday:

  • Ethernet apparently got damaged: even after a reboot, the connection does not come up.

  • I opened the top cover and kept the device off for about 30 minutes (while modifying the cabling); when it started back up, the temperature was in the 50s.
    Unfortunately, I have lost today's log files.

  • For the time being, I have installed a USB-Ethernet module as eth1 to keep connectivity.

So,

  1. How can I tell whether eth0 is dead, or whether there is any chance of recovery?

  2. I believe the developer kit’s DIN rail case needs better ventilation (and the DIN mount needs improvement as well: I was never able to mount it on the rail, I could only slide it on from the side).
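As a first step for question 1, it may help to check whether the kernel still exposes eth0 at all and, if so, whether it reports a link. A minimal sketch using the standard sysfs attributes `operstate` and `carrier` under `/sys/class/net/<iface>/`; `link_status` is a hypothetical helper name, and the optional second argument (an alternative sysfs root) exists only so the function can be tried against a test directory. A missing interface would suggest the controller or driver is no longer detected (worth cross-checking in dmesg), while a present interface without carrier would point more towards the link side.

```shell
#!/bin/sh
# Hypothetical helper: report whether an interface exists and whether it has a link.
# Usage: link_status eth0 [sysfs_net_root]   (root defaults to /sys/class/net)
link_status() {
    iface="$1"
    root="${2:-/sys/class/net}"
    dir="$root/$iface"
    if [ ! -d "$dir" ]; then
        echo "$iface: not present"
        return 1
    fi
    state=$(cat "$dir/operstate" 2>/dev/null || echo "unknown")
    # carrier reads 1 with a link, 0 without; reading it fails while the interface is down.
    carrier=$(cat "$dir/carrier" 2>/dev/null || echo "unknown")
    echo "$iface: operstate=$state carrier=$carrier"
}
```

Usage: `link_status eth0` on the Fin itself.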

Hello @EngMoath ,

Thanks for the detailed information. Is the USB-Ethernet module working fine? I’m asking because the USB and Ethernet ports on the Fin come from the same chip, so if the Ethernet port shuts down due to high temperature, so will the USB ports.

If you can confirm the above, we’ll know where to focus our efforts.

Regards

Hello @ntzovanis,

Yes, the USB-Ethernet worked fine on initial testing.

Friday and Saturday are the weekend here, so I can get the logs on Sunday to see how it performs over time.

Moath

Hello,

I have attached the dmesg log as well as the log from my temperature/network check script.

Moath

network_eth1.log (6.7 KB)
dmesg.log (150.2 KB)

Hello,
Curiously, we are having a similar issue with two Fins in Estonia (far away from Dubai). We have not been able to confirm whether it is due to temperature, but we see that the devices start losing connectivity around lunchtime and soon go offline without recovering on their own. It could be due to heat build-up (industrial use case), but as mentioned, I cannot fully confirm it yet. The devices are located at our customer's site and we have a meeting at their plant this coming Friday, so I can find out much more on site then.
I would be curious to know whether there can be issues when the ambient temperature gets above 50C, and I am following this topic with great interest. :slight_smile:
Thank you!

Best regards,
Tauno

Hey Tauno,

This is interesting: why would the ambient temperature reach 50C in Estonia? Is the device running near some machinery?

As I mentioned earlier, removing the cover helped reduce the maximum CPU temperature from around 72.5C to 67.5C.

Here is the script I am running to log the temperature and Ethernet status (a crontab entry runs it every 30 minutes):

#!/bin/bash
# Log the CPU temperature (millidegrees C) and eth1 status; on failure, also dump dmesg and reboot.

LOGFILE_eth1=/home/zero/bin/network_eth1.log
LOGDMESG=/home/zero/bin/dmesg.log

TEMP=$(cat /sys/class/thermal/thermal_zone0/temp)
STAMP=$(date "+%m %d %Y %T")

# eth1 counts as up if it holds an address on the 172.x network.
if ifconfig eth1 2>/dev/null | grep -q "inet 172."; then
	echo "$STAMP : $TEMP | eth 1 : Ethernet OK" >> "$LOGFILE_eth1"
else
	echo "$STAMP : $TEMP | eth 1 : Network connection lost" >> "$LOGFILE_eth1"
	{
		echo "-------------- START -------------------"
		echo "$STAMP : $TEMP | eth 1 : Network connection lost"
		echo "-----------------------------------------"
		dmesg
		echo "--------------- END --------------------"
	} >> "$LOGDMESG"
	systemctl reboot -i
fi

Hello @EngMoath

From the logs you shared, I can see that the USB-to-Ethernet interface gets disconnected every once in a while but, like the rest of the USB devices, reconnects shortly afterwards.
We’re investigating a similar issue reported by another customer, and your information has proven very useful. We’ll be running more tests this week in a temperature-controlled environment, so I’ll keep this thread updated with the results.
Let us know if there’s anything else we can do to help in the meantime.
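To see the disconnect/reconnect pattern described above in a saved dump, one option is to tally the kernel's `usb <port>: USB disconnect, device number N` messages per USB port. A minimal sketch, assuming a plain-text file such as the dmesg.log written by the check script; `usb_disconnects` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: list the USB ports that logged a disconnect, with a count for each.
# Kernel lines look like: "[ 1234.5678] usb 1-1.3: USB disconnect, device number 5"
usb_disconnects() {
    grep "USB disconnect" "$1" \
        | sed -n 's/.*usb \([0-9][0-9.-]*\): USB disconnect.*/\1/p' \
        | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Usage: `usb_disconnects /home/zero/bin/dmesg.log` prints one `port count` line per port.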

Cheers,
Nico.

Hello,

We have now tested the Estonian Fin “dying out” issue a bit more, and it really does look related to temperature.

We ran tests with a Fin, logging its CPU temperature while slowly raising the ambient temperature. The CPU temperature is only indicative, of course, because the device is not dying because of the CPU, but it still depends on the internal and ambient temperature, and it is the easiest thing to monitor.
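For a heating test like this, sampling more often than a 15 or 30 minute cron job makes it easier to see where connectivity starts to degrade. A minimal sketch of such a sampler, assuming the same thermal_zone0 file as earlier in the thread; `sample_temps`, its arguments and the output format are hypothetical choices, and the zone path is a parameter only so it can be tried without the hardware.

```shell
#!/bin/sh
# Hypothetical sampler: print "<epoch seconds> <temp in C>" every INTERVAL seconds, COUNT times.
# Usage: sample_temps INTERVAL COUNT [zone_file]
sample_temps() {
    interval="$1"
    count="$2"
    zone="${3:-/sys/class/thermal/thermal_zone0/temp}"
    i=0
    while [ "$i" -lt "$count" ]; do
        raw=$(cat "$zone")
        # Epoch timestamp plus millidegrees converted to degrees C with one decimal place.
        awk -v t="$(date +%s)" -v m="$raw" 'BEGIN { printf "%s %.1f\n", t, m / 1000 }'
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `sample_temps 10 360` would take a reading every 10 seconds for about an hour; its output can be redirected to a file.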

The result is that whenever the device starts to get hot (with CPU temperature readings as low as 62, although the device itself felt really hot), it no longer connects to the network and finally goes completely offline.

It may be worth mentioning that we have observed this issue on two different Fins in parallel. One of them is connected via a normal LAN socket and uses an external modem; the other connects through an internal 4G modem and a SIM card. Both Fins behave pretty much the same despite the different setups.

During today's test, we also managed to get the Fin out of this “dying soon” condition by cooling it down when the connectivity issues started and the temperature was getting too high. Once the Fin was cooler, it performed well again.

However, once it went completely offline, it did not recover without a power-cycle reboot, even after we cooled it down properly.

So in our case in Estonia, we can probably install the Fins in a cooler area, which could help solve our problems at the customer's site, but I am still happy to keep monitoring this thread, and perhaps the devices can become more temperature resistant over time. :slight_smile:

Thank you and have a nice day!

Best regards,
Tauno

Hi!

The team is still investigating the issue and is going to test a potential fix. We’ll come back to you as soon as we have any update :slight_smile: