OCI runtime exec failed: exec failed: container_linux.go:348

nima · March 30, 2020, 3:35pm

Hello Team,

We use your service on our two mainstream products, NUC and Jetson TX2.

On both devices, the containers sometimes restart at random. I recently managed to capture the last stdout output before one of these restarts:

OCI runtime exec failed: exec failed: container_linux.go:345: starting container process caused "process_linux.go:91: executing setns process caused \"exit status 21\"": unknown
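The messages quoted in this thread share the same prefix but differ in the innermost cause, so it can help to pull that cause out when comparing logs. Below is a rough Python sketch of one way to do it. The function name `parse_oci_error` and the returned keys are made up for illustration, and the parsing assumes the `... caused "..."` nesting seen in the messages quoted here rather than any documented format.

```python
import re

# Matches Go source locations such as "container_linux.go:345".
_LOCATION = re.compile(r"\b([\w./-]+\.go):(\d+)\b")


def parse_oci_error(message: str) -> dict:
    """Split an 'OCI runtime exec failed' line into source locations and root cause.

    Hypothetical helper: it relies only on the text layout seen in this thread.
    """
    # Forum software may turn straight quotes into curly ones; normalise them
    # and drop the backslashes used to escape nested quotes.
    text = message.translate(str.maketrans({"\u201c": '"', "\u201d": '"'}))
    text = text.replace("\\", "")

    locations = [(name, int(line)) for name, line in _LOCATION.findall(text)]

    # The innermost cause is whatever follows the last " caused ".
    _, sep, tail = text.rpartition(" caused ")
    root = tail if sep else text

    # Remove surrounding quotes and the trailing ": unknown" status.
    root = root.strip().strip('"')
    root = re.sub(r":\s*unknown\s*$", "", root)
    root = root.strip().strip('"')

    return {"locations": locations, "root_cause": root}


if __name__ == "__main__":
    line = (
        "OCI runtime exec failed: exec failed: container_linux.go:348: "
        'starting container process caused "open /proc/self/fd: '
        'no such file or directory": unknown'
    )
    print(parse_oci_error(line))
```

Grouping captured lines by `root_cause` would show whether a device keeps hitting one failure or several different ones.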

Device info:

TYPE: Intel NUC

HOST OS VERSION: balenaOS 2.41.1+rev1 (development)

SUPERVISOR VERSION: 10.2.2

I have seen a similar post here:

Do you think this has something to do with the supervisor, or is it our Docker images?

I also found a popular post talking about this issue and the suggestion was the following:

use /bin/sh instead of /bin/bash

ref:

github.com/docker/for-linux

OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "open /proc/self/fd: no such file or directory": unknown

opened 04:30AM - 02 Mar 18 UTC

yanpeipan

machine: nvidia drive px2

```
/apollo/data/core/core_%e.%p
dev-aarch64-201709…27_1111: Pulling from apolloauto/apollo
a1c981565bcf: Already exists
4ecd22b74242: Already exists
8d4841161f63: Already exists
a93b04770247: Already exists
Digest: sha256:10fc9daa7699f30650df05b9f3aff9762a236433e089e8755dcb0f6cfb7baab0
Status: Downloaded newer image for apolloauto/apollo:dev-aarch64-20170927_1111
RTNETLINK answers: Device or resource busy
modprobe: FATAL: Module nvidia not found in directory /lib/modules/4.4.38-rt49-tegra
modprobe: FATAL: Module nvidia-uvm not found in directory /lib/modules/4.4.38-rt49-tegra
[WARNING] Failed to find device with pattern "ttyUSB*" ...
[ OK ] Found device: /dev/ttyS0.
[ OK ] Found device: /dev/ttyS3.
[ OK ] Found device: /dev/ttyS2.
[ OK ] Found device: /dev/ttyS1.
[ OK ] Found device: /dev/can3.
[ OK ] Found device: /dev/can2.
[ OK ] Found device: /dev/can1.
[ OK ] Found device: /dev/can0.
[WARNING] Failed to find device with pattern "ram*" ...
[WARNING] Failed to find device with pattern "loop*" ...
[ OK ] Found device: /dev/nvidia-uvm-tools.
[ OK ] Found device: /dev/nvidia-uvm.
[ OK ] Found device: /dev/nvidia0.
[ OK ] Found device: /dev/nvidiactl.
bb0567aef16ae92397e23fb577cabf78a3821e1bb6cef9d598c7004947618eec
OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "open /proc/self/fd: no such file or directory": unknown
```

docker info

```
Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 2
Server Version: 17.12.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.4.38-rt49-tegra
Operating System: Ubuntu 16.04 LTS
OSType: linux
Architecture: aarch64
CPUs: 6
Total Memory: 6.504GiB
Name: nvidia
ID: C4UG:ILPT:DQFX:ZUZA:R4RE:45XP:KKV3:6XMX:X4CK:D56J:C4PL:BOFZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support
WARNING: No kernel memory limit support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
```
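One way to try the `/bin/sh` suggestion without editing any Dockerfile is to probe which shell the engine can actually start inside the running container. The Python sketch below does that. `first_working_shell` is a hypothetical helper, the default engine name `docker` is an assumption (on balenaOS you would likely pass `balena-engine` instead), and the `run` parameter exists only so the logic can be exercised without a real engine. It only detects whether an exec succeeds; nothing in this thread establishes that a missing shell is what causes the setns failures.

```python
import subprocess
from typing import Callable, Optional, Sequence


def first_working_shell(
    container: str,
    shells: Sequence[str] = ("/bin/bash", "/bin/sh"),
    engine: str = "docker",  # assumption: pass "balena-engine" on balenaOS
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Optional[str]:
    """Return the first shell that `<engine> exec` can start in the container.

    Returns None if every candidate fails, which points at a problem that is
    not specific to one shell binary.
    """
    for shell in shells:
        # `-c true` makes the shell exit with status 0 as soon as it starts.
        result = run(
            [engine, "exec", container, shell, "-c", "true"],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return shell
    return None


if __name__ == "__main__":
    # "my-container" is a placeholder container name.
    print(first_working_shell("my-container"))
```

If this returns `/bin/sh` while `/bin/bash` fails, switching the exec command is probably enough. If it returns `None`, the failure lies elsewhere and changing the shell will not help.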

Before we start making changes to our Dockerfiles I wanted to have your opinion on this.

Thank you

ab77 · March 30, 2020, 6:44pm

Hi there, it’s difficult to say at this point, but these sorts of issues are typically (but not always) related to resource constraints on the device and possibly hardware issues with storage latency, etc. There could also be a problem with balena components, but we’d need to try and take a look at the device to see if we can narrow it down first. Are you able to grant support access and PM me the device guid (if you don’t want to make it public)?

jpk · March 26, 2021, 6:57pm

I am encountering a similar problem:

OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "process_linux.go:93: starting setns process caused \"fork/exec /proc/self/exe: no such file or directory\"": unknown

This happens when I try to open a terminal to the device, either through the balenaCloud UI or using the balena CLI tool (ssh).

Restarting the container sometimes seems to make it go away, but it often comes back.

Using balenaOS 2.58.6+rev1 on generic-x86-64
Supervisor version 11.14.0

jpk · March 26, 2021, 7:10pm

I requested internal support and enabled support access on the device.
