I’ve purchased a balenaFin v1.1 and used it to implement an IoT controller.
The Fin collects sensor data and posts it to a server, which it reaches through the Fin's Ethernet port.
Soon after installation on site, I started having problems with the connection to the server: the OS reports eth0 as unknown. After a reboot, eth0 is recognized again. So I wrote a small script that checks connectivity and reboots the device if required.
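A check-and-reboot script like the one described might be sketched as follows. This is not the original poster's script. The server address, the check interval, the failure threshold and the `reboot` callback are all hypothetical. Rebooting is left as a callback because the mechanism depends on the setup (on balenaOS, a container would typically ask the Supervisor to reboot). The script also records the kernel's `operstate` for eth0 so the "unknown" state shows up in the log, but the reboot decision is based only on whether the server answers a ping.

```python
import os
import subprocess
import time
from typing import Callable

# Hypothetical settings; tune them for the actual deployment.
IFACE = "eth0"
CHECK_INTERVAL_S = 60
MAX_FAILURES = 3


def read_operstate(iface: str, sysfs_root: str = "/sys/class/net") -> str:
    """Kernel link state of iface ("up", "down", "unknown", ...), or
    "missing" if the interface is not enumerated at all."""
    path = os.path.join(sysfs_root, iface, "operstate")
    try:
        with open(path) as f:
            return f.read().strip()
    except FileNotFoundError:
        return "missing"


def server_reachable(host: str, timeout_s: int = 5) -> bool:
    """Send a single ping to host; True if a reply arrived in time."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def next_failure_count(count: int, healthy: bool) -> int:
    """Reset the counter after a successful check, otherwise increment it."""
    return 0 if healthy else count + 1


def should_reboot(count: int, max_failures: int = MAX_FAILURES) -> bool:
    """Reboot only after several consecutive failures, not after a single blip."""
    return count >= max_failures


def watchdog(host: str, reboot: Callable[[], None],
             interval_s: int = CHECK_INTERVAL_S,
             max_failures: int = MAX_FAILURES) -> None:
    """Poll the server forever; call reboot() after max_failures misses in a row."""
    failures = 0
    while True:
        state = read_operstate(IFACE)
        healthy = server_reachable(host)
        failures = next_failure_count(failures, healthy)
        print(f"{time.strftime('%H:%M:%S')} {IFACE}={state} "
              f"reachable={healthy} failures={failures}")
        if should_reboot(failures, max_failures):
            reboot()  # hypothetical hook: how to reboot depends on the setup
            return
        time.sleep(interval_s)
```

Requiring several consecutive failures avoids rebooting on one dropped ping. It would be called as `watchdog("<server address>", reboot=<your reboot function>)`.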
After a few days of readings, I found that the failures mostly happen between 13:00 and 14:30.
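A time-of-day pattern like this can be pulled out of a failure log by counting failures per 30-minute window. The sketch below assumes a hypothetical log format with one failure per line, each starting with a `YYYY-MM-DD HH:MM:SS` timestamp. Failures from all days fall into the same windows, so a recurring midday cluster stands out.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable

# Assumed (hypothetical) log format: each failure line starts with "YYYY-MM-DD HH:MM:SS".
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_failure_time(line: str) -> datetime:
    """Parse the leading 19-character timestamp of a failure log line."""
    return datetime.strptime(line[:19], TIMESTAMP_FORMAT)


def half_hour_bucket(ts: datetime) -> str:
    """Map a timestamp to the start of its 30-minute window, e.g. 13:42 -> "13:30"."""
    start_minute = 0 if ts.minute < 30 else 30
    return f"{ts.hour:02d}:{start_minute:02d}"


def failure_histogram(lines: Iterable[str]) -> Counter:
    """Count failures per half-hour window of the day, across all days in the log."""
    return Counter(
        half_hour_bucket(parse_failure_time(line)) for line in lines if line.strip()
    )
```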
For reference, here is the temperature profile (in °C) for my city (Dubai, UAE).
Have you tried running this device in a temperature-controlled environment, or only in its current location? We’d like to rule out temperature as a cause. You can get the CPU temperature by running cat /sys/class/thermal/thermal_zone0/temp from the HostOS. The value is in millidegrees, so divide it by 1000 (cpu_temp/1000) to get the CPU temperature in °C.
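For periodic logging from an application, the same reading can be done in Python. This is a minimal sketch that assumes the sysfs file is readable wherever the script runs. It does the same thing as the `cat` command plus the division by 1000.

```python
# Path from the thread; the kernel reports the value in millidegrees Celsius.
THERMAL_PATH = "/sys/class/thermal/thermal_zone0/temp"


def millidegrees_to_celsius(raw: str) -> float:
    """Convert the raw sysfs string (e.g. "67500\n") to degrees Celsius."""
    return int(raw.strip()) / 1000


def read_cpu_temp_c(path: str = THERMAL_PATH) -> float:
    """Read the CPU temperature in degrees Celsius."""
    with open(path) as f:
        return millidegrees_to_celsius(f.read())
```

On the device, `print(f"{read_cpu_temp_c():.1f} °C")` prints the current reading.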
Besides checking the CPU temperature and load, it would be very helpful if you could enable persistent logging and post the dmesg output from the period when the network connection is lost. That would help us narrow down the possible root causes.
I have added code to log the temperature and dmesg (dmesg is logged only when an Ethernet failure is detected) and will collect the logs tomorrow.
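A logger of that kind might look roughly like the sketch below. This is not the poster's code. It assumes a hypothetical log path on persistent storage and an `eth_ok` flag supplied by whatever connectivity check is already in place. It also assumes the process is allowed to run `dmesg`, which inside a container usually needs extra privileges.

```python
import subprocess
import time

THERMAL_PATH = "/sys/class/thermal/thermal_zone0/temp"
LOG_PATH = "/data/fin-health.log"  # hypothetical path on persistent storage


def format_record(epoch_s: float, temp_c: float, eth_ok: bool) -> str:
    """One timestamped line with the CPU temperature and the Ethernet status."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(epoch_s))
    return f"{stamp} temp_c={temp_c:.1f} eth0={'ok' if eth_ok else 'FAIL'}\n"


def capture_dmesg() -> str:
    """Return the current kernel ring buffer as text."""
    return subprocess.run(["dmesg"], capture_output=True, text=True).stdout


def log_sample(eth_ok: bool, log_path: str = LOG_PATH,
               thermal_path: str = THERMAL_PATH) -> None:
    """Append the temperature every time; append dmesg only on an Ethernet failure."""
    with open(thermal_path) as f:
        temp_c = int(f.read().strip()) / 1000  # millidegrees -> °C
    with open(log_path, "a") as log:
        log.write(format_record(time.time(), temp_c, eth_ok))
        if not eth_ok:
            log.write("---- dmesg at failure ----\n")
            log.write(capture_dmesg())
```

Capturing dmesg only on failure keeps the log small while still recording what the kernel reported around each outage.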
I visited the installation site at around 16:00; the CPU temperature was varying between 67 and 70 °C.
The Fin datasheet specifies a maximum “Operating Temperature” of 70 °C. Normally this would mean “Operating Ambient Temperature”; am I right?
You are correct, the operating temperature on the datasheet is the operating ambient temperature. Keep in mind that the ambient temperature is not necessarily equal to the outside air temperature. For example, if the device is inside a case, the ambient temperature will be higher. Other factors are any peripherals inside the case that generate heat, or the case sitting in direct sunlight.
Ethernet apparently got corrupted: even after a reboot, the connection does not come up.
I opened the top cover and kept the device off for about 30 minutes (while modifying the cabling); when I started it back up, the temperature was in the 50s (°C).
Unfortunately, I lost today's log files.
For the time being, I have installed a USB-Ethernet module as eth1 to keep connectivity.
So, how can I tell whether eth0 is dead, or is there any chance of recovery?
I believe the developer kit’s DIN rail case needs improvement to provide more ventilation (the DIN mount needs improvement as well: I was never able to mount it on the rail and could only slide it on from the side).
Thanks for the detailed information. Is the USB-Ethernet module working fine? I’m asking because the USB and Ethernet ports on the Fin come from the same chip, so if the Ethernet port shuts down due to high temperature, the USB ports will too.
If you can confirm the above, we’ll know where to focus our efforts.
Hello,
Curiously, we are having a similar issue with two Fins in Estonia (far away from Dubai). We have not been able to confirm whether it is due to temperature, but we see that the devices start losing connectivity around lunchtime and soon go offline without recovering on their own. It could be due to heat build-up (industrial use case), but as mentioned, I cannot fully confirm it yet. The devices are located at our customer's site, and we have a meeting at their plant this coming Friday, so I can find out much more on site then.
I would be curious to know whether issues can occur when the ambient temperature rises above 50 °C, and I am following this topic with great interest.
Thank you!
From the logs you shared, I can see that the USB-to-Ethernet interface is disconnected every once in a while but, like the rest of the USB devices, reconnects shortly afterwards.
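Disconnect/reconnect cycles like these can be counted directly from saved dmesg output. The sketch below assumes the usual wording of the Linux USB core messages ("usb 1-1.3: USB disconnect, device number N" and "usb 1-1.3: new high-speed USB device number N ..."). The exact text may differ between kernel versions, so the patterns should be checked against the real logs.

```python
import re
from collections import Counter
from typing import List, Tuple

# Assumed message formats from the Linux USB core; verify against the actual logs.
DISCONNECT_RE = re.compile(r"usb (\S+): USB disconnect, device number (\d+)")
CONNECT_RE = re.compile(r"usb (\S+): new .*USB device number (\d+)")


def usb_events(dmesg_text: str) -> List[Tuple[str, str]]:
    """Return (port, "disconnect" | "connect") events in the order they were logged."""
    events = []
    for line in dmesg_text.splitlines():
        m = DISCONNECT_RE.search(line)
        if m:
            events.append((m.group(1), "disconnect"))
            continue
        m = CONNECT_RE.search(line)
        if m:
            events.append((m.group(1), "connect"))
    return events


def disconnects_per_port(dmesg_text: str) -> Counter:
    """How many times each USB port path was disconnected."""
    return Counter(port for port, kind in usb_events(dmesg_text) if kind == "disconnect")
```

If all USB devices drop together, every port path shows the same count, which matches the "rest of the USB devices" reconnecting along with the adapter.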
We’re investigating a similar issue reported by another customer and your information has proven very useful. We’ll be running more tests this week in a controlled temperature environment so I’ll keep this thread updated with the results.
Let us know if there’s anything else we can do to help in the meantime.
We have now tested the “dying out” issue with the Estonian Fins a bit more, and it does appear to be related to temperature.
We tested a Fin by having it log its CPU temperature while we slowly raised the ambient temperature. The CPU temperature is only indicative, of course, because the device is not dying because of the CPU, but it still depends on the internal and ambient temperature, and it is the easiest thing to monitor.
The result is that whenever the device starts to get hot (our CPU readings were as low as 62 °C, but the device itself felt really hot), it stops connecting to the network and eventually goes completely offline.
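For a heating test like this, the temperature at which connectivity first drops can be read straight from the log. The sketch below assumes the log has been reduced to (CPU temperature in °C, connected) pairs in chronological order, which is a hypothetical format. It also reports the temperature at which the link first came back after a failure, for tests where the device is cooled down again.

```python
from typing import Iterable, Optional, Tuple

Sample = Tuple[float, bool]  # (cpu_temp_c, connected), in chronological order


def failure_onset_temp(samples: Iterable[Sample]) -> Optional[float]:
    """CPU temperature of the first sample where connectivity was lost, or None."""
    for temp_c, connected in samples:
        if not connected:
            return temp_c
    return None


def recovery_temp(samples: Iterable[Sample]) -> Optional[float]:
    """CPU temperature of the first connected sample after a failure, or None."""
    failed = False
    for temp_c, connected in samples:
        if not connected:
            failed = True
        elif failed:
            return temp_c
    return None
```

Because the CPU temperature is only an indirect measure of the board's temperature, these numbers are useful for comparing runs on the same setup rather than as absolute limits.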
It may be worth mentioning that we observed this issue on two different Fins in parallel. One is connected via a normal LAN socket and uses an external modem; the other connects through an internal 4G modem with a SIM card. Both Fins behave much the same despite the different setups.
During our test today, we also managed to get the Fin out of this “dying soon” condition by cooling it down when the connectivity issues started and the temperature was getting too high. Once the Fin was cooler, it performed well again.
However, once it went completely offline, it no longer recovered without a power-cycle reboot, even when we cooled it down properly.
So in our case in Estonia, we can probably install the Fins in a cooler area, which could help solve our problems at the customer's site. I am still happy to keep monitoring this thread, and perhaps the devices will become more temperature-resistant over time.