@adamshapiro0 I’ve been running the test for almost a day on the Devkit. It has transferred about 7GB and hasn’t failed yet. I’m running on an empty App, with one instance of the cpu load script:
root@00d8c8f:~# cat /etc/os-release
ID="balena-os"
NAME="balenaOS"
VERSION="2.71.3+rev1"
VERSION_ID="2.71.3+rev1"
PRETTY_NAME="balenaOS 2.71.3+rev1"
MACHINE="imx8mm-var-dart"
VARIANT="Development"
VARIANT_ID="dev"
META_BALENA_VERSION="2.71.3"
RESIN_BOARD_REV="9925f5d"
META_RESIN_REV="15117bd8"
SLUG="imx8mm-var-dart"
root@00d8c8f:~# date && uptime && ls -l /mnt/data/ttydata.bin
Tue Mar 30 07:15:02 UTC 2021
07:15:02 up 22:28, 1 user, load average: 1.23, 1.23, 1.19
-rw-r--r-- 1 root root 7234537880 Mar 30 07:15 /mnt/data/ttydata.bin
root@00d8c8f:~# date && uptime && ls -l /mnt/data/ttydata.bin
Tue Mar 30 07:15:16 UTC 2021
07:15:16 up 22:28, 1 user, load average: 1.19, 1.22, 1.19
-rw-r--r-- 1 root root 7235731688 Mar 30 07:15 /mnt/data/ttydata.bin
root@00d8c8f:~# date && uptime && ls -l /mnt/data/ttydata.bin
Tue Mar 30 07:21:34 UTC 2021
07:21:34 up 22:34, 1 user, load average: 1.10, 1.14, 1.16
-rw-r--r-- 1 root root 7270151250 Mar 30 07:21 /mnt/data/ttydata.bin
root@00d8c8f:~# stty -F /dev/ttymxc2
speed 1500000 baud; line = 0;
min = 1; time = 0;
-brkint -icrnl -imaxbel
-opost
-isig -icanon -iexten -echo
The SOM I’m using has the code 1933 and the carrier board is VAR-DT8MCustomboard V1.4. Are you using the same setup?
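For reference, the receive rate on the devkit can be estimated from the three `date` / `ls -l` samples taken around 07:15 and 07:21 above. This is only a rough sketch: the timestamps have one-second resolution, so the short 14-second interval is coarse, and the function name `rates` is just an illustrative choice.

```python
from datetime import datetime

# (wall-clock time, file size in bytes) copied from the date / ls -l output above
SAMPLES = [
    ("07:15:02", 7234537880),
    ("07:15:16", 7235731688),
    ("07:21:34", 7270151250),
]

def rates(samples):
    """Return the average receive rate in bytes/s between consecutive samples."""
    out = []
    for (t0, s0), (t1, s1) in zip(samples, samples[1:]):
        dt = (datetime.strptime(t1, "%H:%M:%S")
              - datetime.strptime(t0, "%H:%M:%S")).total_seconds()
        out.append((s1 - s0) / dt)
    return out

if __name__ == "__main__":
    for r in rates(SAMPLES):
        print(f"{r / 1000:.1f} kB/s")
```

Both intervals work out to somewhere between 85 and 92 kB/s, which is useful to know when comparing against the data rate on the failing device.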
I’ve been doing my testing on our custom device, which is based on their carrier board and uses the DART-MX8M-Mini SOM. I don’t currently have a carrier board I can test on but that is what @anathan84 was using to test Variscite’s OS image above.
Once he’s around in a bit we can image that board with the Balena image and empty application and run the same test that I ran on our device.
@acostach how are you generating your incoming UART data, and at what data rate? Is your device connected to wifi? Mine is currently connected to both Ethernet and wifi. The DART bt+wifi module uses UART4, so one possibility is that it interferes with the other UARTs during its transactions or wifi scans. I can try testing mine with the wifi disconnected.
root@00d8c8f:~# date && uptime && ls -l /mnt/data/ttydata.bin
Tue Mar 30 13:11:32 UTC 2021
13:11:32 up 1 day 4:24, 1 user, load average: 1.47, 1.21, 1.12
-rw-r--r-- 1 root root 9179786570 Mar 30 13:11 /mnt/data/ttydata.bin
root@00d8c8f:~# date && uptime && ls -l /mnt/data/ttydata.bin
Tue Mar 30 13:11:33 UTC 2021
13:11:33 up 1 day 4:24, 1 user, load average: 1.47, 1.21, 1.12
-rw-r--r-- 1 root root 9179893369 Mar 30 13:11 /mnt/data/ttydata.bin
Ok interesting. I wouldn’t think it would matter, but just for reference we’re sending arbitrary binary data so it includes non-ASCII characters. Our MCU is outputting data at roughly 33 kB/s.
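To put 33 kB/s in context, here is a small sketch of how much of the line that rate occupies at the 1500000 baud shown in the stty output. It assumes 8N1 framing (10 bits on the wire per payload byte), which the stty output does not state explicitly, so treat the result as an estimate.

```python
BAUD = 1_500_000            # from the stty output earlier in the thread
BITS_PER_BYTE_ON_WIRE = 10  # assumption: 8N1 framing (1 start + 8 data + 1 stop)

def max_bytes_per_second(baud, bits_per_byte=BITS_PER_BYTE_ON_WIRE):
    """Upper bound on payload throughput for an asynchronous serial line."""
    return baud / bits_per_byte

def utilization(payload_bytes_per_second, baud=BAUD):
    """Fraction of the line capacity used by the given payload rate."""
    return payload_bytes_per_second / max_bytes_per_second(baud)

if __name__ == "__main__":
    print(f"line capacity: {max_bytes_per_second(BAUD) / 1000:.0f} kB/s")
    print(f"utilization at 33 kB/s: {utilization(33_000):.0%}")
```

Under that assumption the line tops out at 150 kB/s, so 33 kB/s is roughly a fifth of the available bandwidth.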
Just a quick update: I did another test on my device with the empty application, this time with wifi disconnected, and it died again after 15 minutes. I still need to get the dev board set up with the Balena image to test that way.
@adamshapiro0, @acostach, what is the status of this investigation? I understand that there was a suggestion that @acostach could repeat his tests using binary (non-ASCII) data at a rate around 33 kB/s, while @adamshapiro0 mentioned getting a dev board set up with the balena image. Did you get a chance to make some progress?
I ask because this issue now appears to be blocking another, unrelated issue (a paid support thread) concerning an overlayfs bug in certain combinations of balenaEngine and kernel versions. That bug is expected to be fixed in newer balenaOS versions, but this UART issue prevents upgrading balenaOS to pick up the fix.
On our end, we still need to do the dev board + Balena image test. I have a new dev board coming but it is currently delayed in shipping. I’m hoping to have it available to test with this weekend.
This ticket is actually a blocker for a few of our issues, including the overlayfs issue and the Chrony issue, so we would definitely like to resolve it as soon as possible.
Hi Adam, I’ve run several tests ranging from 10 to 24 hours each, continuously reading data from /dev/urandom on my PC and then sending it at various rates: 35, 80 and 300 kB/s.
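For anyone wanting to reproduce a similar load, a rate-limited sender can be sketched as below. This is not the tool that was actually used for these tests; `send_paced` and its parameters are hypothetical names, and the serial port is assumed to have been configured (baud rate, raw mode) beforehand, for example with stty.

```python
import time

def send_paced(src, dst, rate_bytes_per_s, total_bytes, chunk=1024,
               clock=time.monotonic, sleep=time.sleep):
    """Copy total_bytes from src to dst, pacing to about rate_bytes_per_s."""
    start = clock()
    sent = 0
    while sent < total_bytes:
        data = src.read(min(chunk, total_bytes - sent))
        if not data:
            break  # source exhausted
        dst.write(data)
        sent += len(data)
        # Sleep until the wall clock catches up with the target schedule.
        ahead = sent / rate_bytes_per_s - (clock() - start)
        if ahead > 0:
            sleep(ahead)
    return sent

# Example use (the adapter path /dev/ttyUSB0 is an assumption):
#   with open("/dev/urandom", "rb") as src, \
#        open("/dev/ttyUSB0", "wb", buffering=0) as dst:
#       send_paced(src, dst, rate_bytes_per_s=35_000, total_bytes=10**9)
```

Pacing against a fixed start time, rather than sleeping a constant amount per chunk, keeps the long-run average rate close to the target even if individual writes are slow.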
It still hasn’t failed on the devkit. I also kept one core loaded, as you do, with no application pushed, and read the data with cat:
I finally got my new dev board but I am having trouble imaging it for some reason - it got through provisioning on the dashboard, but then after I booted it from internal flash and got to the “going to reboot” step it just never rebooted, and power cycling just seems to keep coming to the same “going to reboot” message. Need to figure out what’s going on there so I can test with it.
That aside, unfortunately the MCU UART is hard wired into the IMX8 so we can’t replace its data stream with an external one using the same IMX8 UART. We do have external hookups for other IMX8 UARTs so we can give those a shot.
My current hypothesis is that it could be a floating flow control pin or something like that on the board, though we have not found anything yet (and our design is based directly on the Variscite reference design). Even though flow control is disabled, it’s possible there is an issue in the kernel driver causing weird behavior. If that’s the case, this might be specific to the MCU UART so we might not be able to replicate it with the alternate UARTs. We’ll try them though.
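One thing that can be checked from software is whether the port really has hardware flow control turned off. The sketch below reads the termios settings and tests the CRTSCTS bit. It only reports what the tty layer has configured; it cannot tell whether a pin is electrically floating or whether the driver honours the setting. The function names are illustrative.

```python
import os
import termios

def hw_flow_control_enabled(cflag):
    """True if the CRTSCTS (RTS/CTS hardware flow control) bit is set in c_cflag."""
    return bool(cflag & termios.CRTSCTS)

def port_uses_hw_flow_control(path):
    """Open a tty without making it the controlling terminal and inspect c_cflag."""
    fd = os.open(path, os.O_RDONLY | os.O_NOCTTY | os.O_NONBLOCK)
    try:
        cflag = termios.tcgetattr(fd)[2]  # index 2 of the result is c_cflag
    finally:
        os.close(fd)
    return hw_flow_control_enabled(cflag)

# Example use on the device:
#   print(port_uses_hw_flow_control("/dev/ttymxc2"))
```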
It’s not clear why the board would need to reboot though. On the Variscite iMX8M Mini devkit we have, provisioning works as indicated in the dashboard:
Boot switch is in external position and flasher sd-card inserted
Power up the board, wait for image to get flashed. When flashing is completed the device should power off and notify post-provisioning state in the dashboard. It shouldn’t reboot. The power switch can also be toggled to off once flashing is completed.
Change boot switch back to internal, remove sd-card, power on the board.
The only reboot I can think of would be in the case where a hostOS update was performed.
I figured out what was happening with the reboots: I had left the SD card in the device when I switched to internal boot. The switch tells it to load uboot from emmc instead of SD, but apparently uboot itself ignores the switch and loads the flasher image from the SD card whenever it finds one. So even though the board was configured for internal emmc boot, it was still redoing the flasher step over and over instead of booting the actual host OS image.
Yes, u-boot checks whether there is an sd-card containing a flasher image and, if there is, boots it. Only the firmware that loads u-boot checks the boot pins to decide which storage to load u-boot from.
The jumper wires connect UART3 (ttymxc2) on the dev board directly to our MCU, which outputs data at 1.5 Mbps.
I tried two setups:
Running our software on the IMX8M mini with the 2.71.3+rev1 host image. Result: Data stoppage after about 1 hour.
Running nothing on the IMX8M except the HostOS and the Balena Supervisor. In the host, I ran:
In console 1: cat /dev/ttymxc2 > /var/lib/docker/volumes/1623312_resin-data/_data/garbage
In console 2: while : ; do : ; done
In console 3: watch -n1 ls -l /var/lib/docker/volumes/1623312_resin-data/_data/
Result: Data stoppage after ~5 hours. Total data received is 429MB.
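Rather than watching the file size by hand, the stoppage time could be captured automatically with something like the sketch below. It polls a size source and reports when the size last grew once it has been flat for a chosen period. The name `wait_for_stall`, the 10-second threshold and the one-second poll interval are all assumptions picked for illustration.

```python
import os
import time

def file_size(path):
    """Current size of the capture file in bytes."""
    return os.path.getsize(path)

def wait_for_stall(get_size, stall_after_s=10.0, poll_s=1.0,
                   clock=time.monotonic, sleep=time.sleep, max_polls=None):
    """Poll get_size() until it stops growing for stall_after_s seconds.

    Returns (seconds from start until the last observed growth, last size),
    or None if max_polls is reached without detecting a stall.
    """
    start = clock()
    last_size = get_size()
    last_growth = start
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(poll_s)
        polls += 1
        now = clock()
        size = get_size()
        if size > last_size:
            last_size, last_growth = size, now
        elif now - last_growth >= stall_after_s:
            return last_growth - start, last_size
    return None

# Example use (path taken from the console 1 command above):
#   path = "/var/lib/docker/volumes/1623312_resin-data/_data/garbage"
#   print(wait_for_stall(lambda: file_size(path)))
```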
As @adamshapiro0 mentioned, this is a total show stopper for us. Our application has a critical data link between the MCU and the IMX8 over this UART and in previous releases, this link is stable for literally days at a time.
(I also checked dmesg: it shows no events over this time except an occasional wlan0: link is not ready. The device is on the hardline, so I suppose this is expected.)
Is the issue reproducible with the latest Yocto Dunfell official release v6.7 that has kernel 5.4.85 from Variscite’s website - dart-mx8mm-recovery-sd.v67.img.gz? I recommend flashing that image to the devkit eMMC so that it resembles the original setup. If it isn’t reproducible anymore with that image, we can look into updating to that yocto release.
Quick update on our end. We haven’t yet been able to test with the latest Yocto, but I did have a follow-up question for you. In your test, were you using a 5V or 3.3V TTL adapter? Our system is 3.3V at 1.5 Mbps and I wanted to check whether that’s the same as yours.
We’ll be working in parallel on getting the Dunfell release tested this week.