Container does not restart on failure

I have a docker-compose file with a single service (built from balenalib/%%BALENA_ARCH%%-debian-node:16-buster-build) with the restart: always policy enabled.
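For reference, the relevant part of the setup looks roughly like this (service name and build path are illustrative, not the actual project layout):

```yaml
services:
  main:
    # The Dockerfile.template starts from
    # balenalib/%%BALENA_ARCH%%-debian-node:16-buster-build
    build: .
    restart: always
```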

In the latest release of my Node.js app there was a memory leak that caused the process to crash with a "JavaScript heap out of memory" error. Logs:

Jan 28 09:23:10 e1dba8e 688e4825900e[1030]: <--- Last few GCs --->
Jan 28 09:23:10 e1dba8e 688e4825900e[1030]:
Jan 28 09:23:10 e1dba8e 688e4825900e[1030]: [2994784:0x4c4e250] 257640581 ms: Mark-sweep 2014.1 (2083.8) -> 2013.3 (2084.1) MB, 728.8 / 1.1 ms  (average mu = 0.977, current mu = 0.767) allocation failure scavenge might not succeed
Jan 28 09:23:10 e1dba8e 688e4825900e[1030]: [2994784:0x4c4e250] 257643703 ms: Mark-sweep 2014.3 (2084.1) -> 2013.5 (2084.3) MB, 726.8 / 1.0 ms  (average mu = 0.956, current mu = 0.767) allocation failure scavenge might not succeed
Jan 28 09:23:10 e1dba8e 688e4825900e[1030]:
Jan 28 09:23:10 e1dba8e 688e4825900e[1030]:
Jan 28 09:23:10 e1dba8e 688e4825900e[1030]: <--- JS stacktrace --->
Jan 28 09:23:10 e1dba8e 688e4825900e[1030]:
Jan 28 09:23:10 e1dba8e 688e4825900e[1030]: FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
Jan 28 09:23:10 e1dba8e 688e4825900e[1030]:  1: 0xb08e80 node::Abort() [/usr/local/bin/node]
Jan 28 09:23:10 e1dba8e 688e4825900e[1030]:  2: 0xa1b70e  [/usr/local/bin/node]
Jan 28 09:23:10 e1dba8e 688e4825900e[1030]:  3: 0xce1890 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
Jan 28 09:23:10 e1dba8e 688e4825900e[1030]:  4: 0xce1c37 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
Jan 28 09:23:10 e1dba8e 688e4825900e[1030]:  5: 0xe992a5  [/usr/local/bin/node]
Jan 28 09:23:10 e1dba8e 688e4825900e[1030]:  6: 0xea8f6d v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/local/bin/node]
Jan 28 09:23:10 e1dba8e 688e4825900e[1030]:  7: 0xeabc6e v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
Jan 28 09:23:10 e1dba8e 688e4825900e[1030]:  8: 0xe6d1aa v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/usr/local/bin/node]
Jan 28 09:23:10 e1dba8e 688e4825900e[1030]:  9: 0x11e5f96 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [/usr/local/bin/node]

However, after the crash balena did not restart the container, even though restart: always was set. I also tried issuing balena restart from the host OS, but without success. It seemed like the Docker daemon had become unresponsive.

I had to restart the device to get things working again.

Question: What's the best way of tackling this kind of situation? I could potentially add a watchdog container that periodically checks the status of my container and, on failure, restarts the device. However, I don't like this approach very much.
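For the record, the watchdog idea could be sketched roughly like this. This is only an illustration, not a recommendation: it assumes the watchdog container has the engine socket mounted (io.balena.features.balena-socket) and the supervisor API enabled (io.balena.features.supervisor-api), and "my-service" is a placeholder container name.

```shell
#!/bin/sh
# Hypothetical watchdog sketch (assumes engine socket + supervisor API access).

# Decide whether a container state warrants intervention.
is_unhealthy() {
  case "$1" in
    exited|dead|unhealthy) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll the container state; ask the supervisor to reboot the device on failure.
watch_loop() {
  while true; do
    status="$(docker inspect --format '{{.State.Status}}' my-service 2>/dev/null || echo dead)"
    if is_unhealthy "$status"; then
      curl -s -X POST \
        "$BALENA_SUPERVISOR_ADDRESS/v1/reboot?apikey=$BALENA_SUPERVISOR_API_KEY"
    fi
    sleep 60
  done
}

# Run watch_loop as the container's entrypoint.
```

Note that this has the same weakness observed above: if the engine itself is wedged, docker inspect may hang rather than fail, which is part of why the approach is unappealing.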

More context:
Supervisor version: 14.0.6
balenaOS version: 2.89.15
Device RAM: 16 GB
Node.js memory limit: default (1.4 GB)
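Incidentally, the Node.js heap cap can be raised with V8's --max-old-space-size flag, e.g. via the environment in the compose file. The 4096 MB value below is purely illustrative:

```yaml
services:
  main:
    environment:
      # Raise the V8 old-space heap limit to ~4 GB (illustrative value)
      NODE_OPTIONS: "--max-old-space-size=4096"
```

This only delays an OOM caused by a genuine leak, of course; it doesn't address the engine becoming unresponsive.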

@AndMaz have you tried using the built-in healthcheck functionality for this kind of situation?

More info: Healthcheck implementation
Documentation: Compose specification | Docker Documentation
Example (rosetta-at-home): rosetta-at-home/docker-compose.yml at master · balena-labs-projects/rosetta-at-home · GitHub
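A minimal healthcheck along the lines of that example might look like this. The probe command, endpoint, and timings are placeholders; adapt them to whatever your app actually exposes:

```yaml
services:
  main:
    build: .
    restart: always
    healthcheck:
      # Placeholder probe: replace with a command that checks your app
      test: ["CMD", "curl", "-f", "http://localhost:3000/healthz"]
      interval: 1m
      timeout: 10s
      retries: 3
      start_period: 30s
```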