USB controller problem during higher heat inside of control box

Hello,
we are using balenaFins for an IoT project and have some issues with the network connection and the USB controller.

Description of environment:
The balenaFins are placed in control boxes (stainless steel) and work as controller and logging units for sensors. The control box also contains power supplies and electrical wiring.
The balenaFin case has an open lid and is connected to a din rail mount. It is powered with a 24V power supply.

OS:

  • Raspbian (image from the balena website)

Hardware:

  • balenaFin version 1.1
  • RPi compute module 3+ lite

Connections:

  • 3 DI inputs for power supply status
  • 8 DO relay controls
  • 2x USB serial converters
  • 1x network interface

Problem description
At higher temperatures we are losing the network connection to the device. At the moment we have no temperature probe inside the control box, but when I checked the temperature inside the box yesterday, it was about 42°C (outside temperature 29°C).

After I opened the control box and it cooled down to the outside temperature, the balenaFin started to work again. Closing the control box caused problems again. Later in the evening, the balenaFin ran fine until late the next morning (09:58 am). Since then, it has not been responding. The last reported temperature was 60°C.

Troubleshooting steps done

  • Changed the balenaFin board and the compute module -> same results

I have saved the system logs from the balenaFin. It looks like the USB controller stopped working and devices get reattached.
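
For anyone who wants to check their own logs for the same pattern, here is a minimal sketch that counts USB disconnect and re-enumeration messages per port. It assumes the kernel log has been saved as text and that the lines use the usual Linux USB core wording ("USB disconnect, device number N" and "new ... USB device number N using ..."). The function names and the threshold are made up for illustration; this is not a balena tool.

```python
import re
from collections import Counter

# Message wording printed by the Linux USB core on unplug and enumeration.
DISCONNECT = re.compile(r"usb (\S+): USB disconnect, device number (\d+)")
ATTACH = re.compile(r"usb (\S+): new (.+?) USB device number (\d+) using (\S+)")


def count_usb_events(lines):
    """Return (disconnects, attaches) as Counters keyed by USB port path."""
    disconnects, attaches = Counter(), Counter()
    for line in lines:
        match = DISCONNECT.search(line)
        if match:
            disconnects[match.group(1)] += 1
            continue
        match = ATTACH.search(line)
        if match:
            attaches[match.group(1)] += 1
    return disconnects, attaches


def flapping_ports(lines, threshold=3):
    """List ports with at least `threshold` disconnects (hypothetical cutoff)."""
    disconnects, _ = count_usb_events(lines)
    return sorted(port for port, n in disconnects.items() if n >= threshold)
```

Feeding it the lines of a saved `dmesg` or journal dump, for example `flapping_ports(open("kernel.log"))` where `kernel.log` is a placeholder file name, gives a quick list of ports that keep dropping off the bus.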

Extra Info
We have also installed a “normal” RPi 3B+ for testing. It is working without any problems and reports temperatures above 66°C. It performs the same function as the balenaFin and is located in the same control box.

Questions
Has anyone had the same problems?
Are there any recommendations for us?
Has anyone else had problems with the USB controller?

Thank you all in advance for any kind of input.

Hi @Frorh , thanks for bringing this up. I have a couple of questions that would help me understand better what’s happening on your devices:

  1. Are you using our maintained version of Raspbian for the balenaFin? ( https://www.balena.io/fin/1.1/docs/downloads/ )
  2. Are you using an external antenna, since you are enclosing the device in a metal cage? This might explain why it starts working again when you open the lid.
  3. Can you please share the system logs with us? You can also do so privately by sending an email to fin@balena.io mentioning this forum thread, if you are worried they might contain sensitive information.

Best regards,

Carlo

Hello @curcuz, thank you very much.

Please find my answers below:

  1. Yes, we are using your maintained version for the balenaFin.
  2. The balenaFin is connected to a cell router via ethernet.
  3. I will mail the system log.

I will go to the location of the control box today and install a temperature probe inside to compare the inside and outside temperatures. I hope to get more information about the behaviour.

Next, I have discovered that the balenaFin started responding to ICMP requests late yesterday afternoon. Unfortunately the device is in an in-between state and I am not able to access it via SSH (I will also collect these logs and send them to you).

Best Regards,

Frorh

Hi @Frorh , thanks, looking forward to the logs!

Hello @curcuz, I have sent the logs. This morning I installed a temperature sensor inside the control box. At the moment we have a temperature of 47°C inside the panel; the outside temperature is 27°C.
The balenaFin was able to start (I have seen the DHCP request) but after that I was not able to access it.
I brought the balenaFin inside, where it powered up and I was able to get the system logs. I will do some further testing with the balenaFins and will keep you informed.

I am looking forward to your feedback about my logs.

Best Regards,

Frorh

Hey,

We had thermal problems with USB. In our case, I can’t be sure it ever affected the controller itself.

The main problem for us was heat transferring back through the ports to some of the USB power supply chips on the other side, which would get too hot and throw an over-amp code back to the USB controller / OS, causing a sudden reset. This happened so often that the devices would take themselves offline. The annoying part was that this only occurred when I put the lids on the cases.

We were using the onboard wifi as a connection, but when I tried to give them direct Ethernet access it wasn’t able to bring them back online.

See my post here for how we resolved our issue. I am not sure it would work in your case or if we are even having the same problem.

Good luck.
-Thomas

Hello,
thank you @tacLog, your input is much appreciated. I think we might have a similar problem.

Test description
We did some further testing yesterday and I would like to share our experience. Our goal was to find the temperature at which the problem starts. To do this we installed two temperature probes, one outside and one inside the control box.

Test

  • During the test the outside temperature stayed between 27 - 28.5°C.
  • The control box temperature was between 42°C (control box lid open) and 46°C (control box lid closed).
  • The device was connected via ethernet.

With the control box open, the balenaFin was working. With the lid closed, problems started at 42-43°C, and above 44°C the balenaFin stopped responding. Once the lid was opened again, the balenaFin immediately started responding to ICMP packets and we could access it via SSH. We repeated this test multiple times and the behaviour was always the same (with and without USB devices connected).
When we left the device outside the control box for some time to let it cool down, it worked for approximately 11 minutes after we placed it back inside with the control box closed. After that, we saw the same on/off behaviour again when opening and closing the panel.

The device did not reboot itself during the tests. We only lose the connection to the USB devices and the Ethernet interface at ambient temperatures over 42°C.
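
A test like this can also be logged automatically so the threshold can be read straight from the data. The following is only a rough sketch under a few assumptions: reachability is checked with one ICMP echo via the Linux `ping` tool (`-c` for count, `-W` for timeout in seconds, as in iputils), and the readings are collected as `(temperature, reachable)` pairs. The function names are invented for this example, and how the temperature is read depends on the probe, so that part is left out.

```python
import subprocess


def is_reachable(host, timeout_s=1):
    """Send one ICMP echo request with Linux `ping`; True if it was answered."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def failure_band(samples):
    """Summarise (temp_c, reachable) pairs.

    Returns (highest temperature at which the device still answered,
             lowest temperature at which it did not answer).
    Either value is None if no sample of that kind was recorded.
    """
    ok = [temp for temp, up in samples if up]
    bad = [temp for temp, up in samples if not up]
    return (max(ok) if ok else None, min(bad) if bad else None)
```

If the two numbers returned by `failure_band` sit close together across several runs, that supports a fixed temperature threshold rather than a random fault.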

I have attached a picture of part of the control box. The Raspberry Pi 3B+ on the topmost layer took the temperature measurements during our tests.

@curcuz, have you received our logs? Have they been helpful in tracking down the root cause?

Hello @Frorh,

Thank you for the extra information. We received the logs and are currently analyzing them. We believe the problem might be related to the one Thomas shared above. I will take a closer look at the logs and get back to you with my findings.

Cheers,
Nico.

@Frorh just to keep you updated on our progress, we built a heat chamber to test a similar setup at different temperatures. I’ll update the thread as soon as I have news.

What is the status of this thermal investigation? We have fielded 6 balenaFin boards in production systems and have not been able to keep any of them online over temperature. Judging by the CPU-reported temperature, they all start to fail around 58°C and will not come back online until they cool back down below 55°C. We have modified our heatsink technology and will see if this helps, but if this is a USB over-power problem, changing the heatsink will not solve it.
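
For reference, a minimal sketch of how such monitoring could look. It assumes the CPU temperature is exposed in millidegrees Celsius at `/sys/class/thermal/thermal_zone0/temp`, as on standard Raspberry Pi Linux images, and it models the fail/recover behaviour described above as simple hysteresis. The 58°C and 55°C defaults are just the values observed in this post, not specified limits, and the class name is made up.

```python
from pathlib import Path

# Standard Linux thermal sysfs node; assumed to be the CPU sensor on the Pi.
THERMAL_ZONE = Path("/sys/class/thermal/thermal_zone0/temp")


def parse_millidegrees(raw):
    """Convert the sysfs string (millidegrees C) to degrees C."""
    return int(raw.strip()) / 1000.0


def read_cpu_temp_c(path=THERMAL_ZONE):
    """Read the current CPU temperature in degrees C."""
    return parse_millidegrees(path.read_text())


class HysteresisTracker:
    """Predict online/offline state from temperature with two thresholds."""

    def __init__(self, fail_above_c=58.0, recover_below_c=55.0):
        self.fail_above_c = fail_above_c
        self.recover_below_c = recover_below_c
        self.online = True

    def update(self, temp_c):
        """Feed one reading; return the predicted online state."""
        if self.online and temp_c >= self.fail_above_c:
            self.online = False
        elif not self.online and temp_c < self.recover_below_c:
            self.online = True
        return self.online
```

Comparing the tracker's prediction with the real online/offline record would show whether the failures follow the reported CPU temperature or something else, such as USB load.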

Any guidance would be much appreciated. I also would like to see a qualification report for the Industrial temp range and what testing was completed to prove these boards will operate across the full range of temp and loading conditions specified. Is this report available on the Balena Fin site?

Rich

Hey, I’ve pinged the team to get an update on this

Hey @MotivDev,

We solved this issue with this solution. Note we changed the small copper caps to one big one with thermal compound underneath for ease of install.

We have 14 fins in the field that have been running 24/7 with this solution. Granted they aren’t in conditions where it gets anywhere near 58C outside so they only have to disperse their own heat.

I am still trying to release design files for the lid we designed to deal with this issue.

In my testing, I think I have eliminated the possibility of a USB overpower problem by creating that problem with chained hubs and not experiencing the same set of issues. The problem wasn’t the heat the Fin was generating but the heat that was flowing in from our USB devices. However, your use case will change this, as our USB devices generate all their heat within 1cm of the Fin.

If you have any questions about how we designed around this, please let me know as I have come to hate thermal issues. 🙂

Cheers
-Thomas

Hello @MotivDev,

All the components used in the balenaFin are rated for the -20°C to +70°C range. It should be noted that the rating refers to the operating temperature (chip temperature) and not the ambient temperature, which can be lower depending on external factors such as air flow, nearby heat sources, type of case, and processing load.
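
To make the difference between chip and ambient temperature concrete, here is a small illustrative calculation. It assumes, as a simplification, that the chip runs a roughly constant number of degrees above ambient at a given load and airflow. The helper names and the example figures are hypothetical and are not measurements from our tests.

```python
def rise_over_ambient_c(chip_temp_c, ambient_temp_c):
    """Self-heating estimate from one paired steady-state reading."""
    return chip_temp_c - ambient_temp_c


def max_ambient_c(chip_limit_c, rise_c):
    """Highest ambient temperature that keeps the chip at or below its limit."""
    return chip_limit_c - rise_c


# Hypothetical example: chip reads 60 C while the enclosure air is 45 C.
rise = rise_over_ambient_c(60.0, 45.0)
print(max_ambient_c(70.0, rise))
```

With those made-up numbers, a 70°C chip rating would correspond to roughly 55°C of enclosure air, and less if the load or the heat from nearby devices increases.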

Our theory is still that the problem lies in the USB circuitry (keep in mind that the ethernet controller is in the same chip as the USB hub), due to the tests we did with @tacLog and the logs we received from @Frorh.

We are currently conducting ambient temperature tests with different load parameters to be able to provide more information on that end. It will also help us understand which tools are needed to use the balenaFin at high ambient temperatures. It is taking time because we needed to build a proper temperature chamber and acquire measurement tools and sensors.

Cheers,
Nico.

Hi @ntzovanis!

I can lend some support to the theory that the failure is USB related. We are using 2 Fin boards in each system, and the Fin with the bigger power draw on the USB never comes back online. The other Fin goes on and offline with temperature changes; it also has a USB device plugged into it, but that USB device uses less current.

I would offer log files but all of my devices have been offline for over a couple of days now and I have no way to get to them except when they are online.

I look forward to your findings and any solutions to address this problem.

Rich

@MotivDev thanks for adding the USB-related info - the more information we can gather only helps to resolve the issue so thanks again. I’ve let Nico know.