Hello,
We have an application that we have deployed to several balenaFin gateways in the field. All gateways are running balenaOS 2.46.1+rev3 or 2.51.1+rev1.
The gateways are always powered, but every one of them eventually resets, typically after a few hours and at most after a few days. If we look at the fleet in balenaCloud, every online gateway has been up for only a few minutes to 1 day.
Our application is very simple. It periodically performs a scan to discover Bluetooth devices, pulls data from the devices (if needed), and sends the data to the cloud.
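I was able to capture the log output from my desk gateway when a reset occurred, using the serial UART pins on the 40-pin HAT header. Unfortunately, there is very little useful information in the log: the last kernel message is at about 18009 seconds of uptime (roughly 5 hours), and the bootloader output follows immediately, with no panic or oops in between. Here is a portion of the output from around the reset: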
[16117.615955] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[16432.622568] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[16747.630134] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[17044.981471] NET: Registered protocol family 38
[17045.014193] cryptd: max_cpu_qlen set to 1000
[17062.649126] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[17377.652784] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[17623.855779] i2c i2c-3: sendbytes: NAK bailout.
[17623.860313] leds pca963x:green: Setting an LED's brightness failed (-5)
[17691.348509] i2c i2c-3: sendbytes: NAK bailout.
[17691.353001] leds pca963x:blue: Setting an LED's brightness failed (-5)
[17692.695421] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[18008.952954] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
MMC: mmc@7e202000: 0, mmcnr@7e300000: 1
Loading Environment from FAT... WARNING at drivers/mmc/bcm2835_sdhost.c:408/bcm2835_send_command()!
WARNING at drivers/mmc/bcm2835_sdhost.c:408/bcm2835_send_command()!
*** Warning - bad CRC, using default environment
In: serial
Out: serial
Err: serial
Net: No ethernet found.
WARNING at drivers/mmc/bcm2835_sdhost.c:408/bcm2835_send_command()!
WARNING at drivers/mmc/bcm2835_sdhost.c:408/bcm2835_send_command()!
switch to partitions #0, OK
mmc0(part 0) is current device
Scanning mmc 0:1...
Found U-Boot script /boot.scr
437 bytes read in 1 ms (426.8 KiB/s)
## Executing script at 02400000
Scanning mmc usb devices 0 1 2
24 bytes read in 2 ms (11.7 KiB/s)
Found resin image on mmc 0
Loading resinOS_uEnv.txt from mmc device 0 partition 1
** Unable to read file resinOS_uEnv.txt **
Loading bootcount.env from mmc device 0 partition 1
** Unable to read file bootcount.env **
No bootcount.env file. Setting bootcount=0 in environment
9439392 bytes read in 410 ms (22 MiB/s)
Kernel image @ 0x080000 [ 0x000000 - 0x9008a0 ]
## Flattened Device Tree blob at 2eff9300
Booting using the fdt blob at 0x2eff9300
reserving fdt memory region: addr=0 size=1000
Using Device Tree in place at 2eff9300, end 2f002f3b
Starting kernel ...
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 4.19.75 (oe-user@oe-host) (gcc version 8.3.0 (GCC)) #1 SMP Thu Jun 4 14:34:24 UTC 2020
[ 0.000000] CPU: ARMv7 Processor [410fd034] revision 4 (ARMv7), cr=10c5383d
[ 0.000000] CPU: div instructions available: patching division code
[ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[ 0.000000] OF: fdt: Machine model: Raspberry Pi Compute Module 3 Plus Rev 1.0
[ 0.000000] Memory policy: Data cache writealloc
[ 0.000000] cma: Reserved 8 MiB at 0x3dc00000
[ 0.000000] random: get_random_bytes called from start_kernel+0xb0/0x4b8 with crng_init=0
[ 0.000000] percpu: Embedded 17 pages/cpu s39564 r8192 d21876 u69632
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 253242
[ 0.000000] Kernel command line: coherent_pool=1M bcm2708_fb.fbwidth=656 bcm2708_fb.fbheight=416 bcm2708_fb.fbdepth=16 bcm2708_fb.fbswap=1 smsc95xx.macaddr=B8:27:EB:2F:F0:D2 vc_mem.mem_base=0x3f000000 vc_mem.mem_size=0x3f600000 dwc_otg.lpm_enable=0 console=tty1 console=ttyAMA0,115200 rootfstype=ext4 rootwait root=UUID=ba1eadef-e193-4d7e-a9c0-2ceb396ff21e rootwait
[ 0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[ 0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[ 0.000000] Memory: 983640K/1021952K available (8192K kernel code, 655K rwdata, 2372K rodata, 7168K init, 828K bss, 30120K reserved, 8192K cma-reserved)
[ 0.000000] Virtual kernel memory layout:
[ 0.000000] vector : 0xffff0000 - 0xffff1000 ( 4 kB)
[ 0.000000] fixmap : 0xffc00000 - 0xfff00000 (3072 kB)
[ 0.000000] vmalloc : 0xbe800000 - 0xff800000 (1040 MB)
[ 0.000000] lowmem : 0x80000000 - 0xbe600000 ( 998 MB)
[ 0.000000] modules : 0x7f000000 - 0x80000000 ( 16 MB)
[ 0.000000] .text : 0x(ptrval) - 0x(ptrval) (9184 kB)
[ 0.000000] .init : 0x(ptrval) - 0x(ptrval) (7168 kB)
[ 0.000000] .data : 0x(ptrval) - 0x(ptrval) ( 656 kB)
[ 0.000000] .bss : 0x(ptrval) - 0x(ptrval) ( 829 kB)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.000000] ftrace: allocating 28894 entries in 85 pages
[ 0.000000] rcu: Hierarchical RCU implementation.
[ 0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
[ 0.000000] arch_timer: cp15 timer(s) running at 19.20MHz (phys).
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x46d987e47, max_idle_ns: 440795202767 ns
[ 0.000006] sched_clock: 56 bits at 19MHz, resolution 52ns, wraps every 4398046511078ns
[ 0.000022] Switching to timer-based delay loop, resolution 52ns
[ 0.000271] Console: colour dummy device 80x30
[ 0.000917] console [tty1] enabled
[ ...and a lot more startup messages after this... ]
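In case it helps anyone check their own captures, here is a minimal sketch of how the reset point can be located in a saved serial log. It assumes a reset shows up as the kernel timestamp jumping backwards (the uptime clock restarting at 0), which matches the capture above. The function name `find_resets` and the sample lines are only for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Kernel messages on the serial console start with "[seconds.microseconds]".
KERNEL_TS = re.compile(r"^\[\s*(\d+\.\d+)\]")


def find_resets(lines: Iterable[str]) -> List[Tuple[int, float]]:
    """Return (line_index, last_uptime_seconds) for every detected reset.

    Assumption: a reset is wherever the kernel timestamp goes backwards.
    line_index is the index of the last kernel line seen before the reset.
    """
    resets = []
    prev_ts = None
    prev_idx = None
    for idx, line in enumerate(lines):
        match = KERNEL_TS.match(line)
        if not match:
            continue  # bootloader output and other lines without a timestamp
        ts = float(match.group(1))
        if prev_ts is not None and ts < prev_ts:
            resets.append((prev_idx, prev_ts))
        prev_ts, prev_idx = ts, idx
    return resets


if __name__ == "__main__":
    # A few lines taken from the capture above.
    sample = [
        "[17692.695421] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready",
        "[18008.952954] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready",
        "MMC: mmc@7e202000: 0, mmcnr@7e300000: 1",
        "Starting kernel ...",
        "[ 0.000000] Booting Linux on physical CPU 0x0",
    ]
    for idx, uptime in find_resets(sample):
        print(f"reset after line {idx}, uptime {uptime / 3600:.2f} h")
```

Running this over a longer capture would give the uptime before each reset, which should show whether the resets cluster around a particular interval or really are random.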
Since the resets seem to happen at random, do you have any suggestions on steps I could take to figure out what’s going on? I have run “df” to check disk usage, and we seem to have plenty of free space.