nima
March 30, 2020, 3:35pm
1
Hello Team,
We use your service on our two mainstream products, NUC and Jetson TX2.
On both devices we sometimes see random restarts of the containers. Recently I happened to capture the last stdout output before a restart:
OCI runtime exec failed: exec failed: container_linux.go:345: starting container process caused "process_linux.go:91: executing setns process caused \"exit status 21\"": unknown
Device info:
TYPE: Intel NUC
HOST OS VERSION: balenaOS 2.41.1+rev1 (development)
SUPERVISOR VERSION: 10.2.2
I have seen a similar post here:
Hi,
Thank you so much for your prompt response. We have figured out the problem and realized that it was due to bug in our supervisor part of the code. We fixed the bug and tested it. It is working good now. I have revoked the support access and we can close this issue.
Thank you so much for your time.
Best,
Do you think this has something to do with the supervisor, or is it our Docker images?
I also found a popular post talking about this issue and the suggestion was the following:
use /bin/sh instead of /bin/bash
ref:
opened 04:30AM - 02 Mar 18 UTC
machine: nvidia drive px2
```
/apollo/data/core/core_%e.%p
dev-aarch64-201709… 27_1111: Pulling from apolloauto/apollo
a1c981565bcf: Already exists
4ecd22b74242: Already exists
8d4841161f63: Already exists
a93b04770247: Already exists
Digest: sha256:10fc9daa7699f30650df05b9f3aff9762a236433e089e8755dcb0f6cfb7baab0
Status: Downloaded newer image for apolloauto/apollo:dev-aarch64-20170927_1111
RTNETLINK answers: Device or resource busy
modprobe: FATAL: Module nvidia not found in directory /lib/modules/4.4.38-rt49-tegra
modprobe: FATAL: Module nvidia-uvm not found in directory /lib/modules/4.4.38-rt49-tegra
[WARNING] Failed to find device with pattern "ttyUSB*" ...
[ OK ] Found device: /dev/ttyS0.
[ OK ] Found device: /dev/ttyS3.
[ OK ] Found device: /dev/ttyS2.
[ OK ] Found device: /dev/ttyS1.
[ OK ] Found device: /dev/can3.
[ OK ] Found device: /dev/can2.
[ OK ] Found device: /dev/can1.
[ OK ] Found device: /dev/can0.
[WARNING] Failed to find device with pattern "ram*" ...
[WARNING] Failed to find device with pattern "loop*" ...
[ OK ] Found device: /dev/nvidia-uvm-tools.
[ OK ] Found device: /dev/nvidia-uvm.
[ OK ] Found device: /dev/nvidia0.
[ OK ] Found device: /dev/nvidiactl.
bb0567aef16ae92397e23fb577cabf78a3821e1bb6cef9d598c7004947618eec
OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "open /proc/self/fd: no such file or directory": unknown
```
docker info
```
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 2
Server Version: 17.12.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.4.38-rt49-tegra
Operating System: Ubuntu 16.04 LTS
OSType: linux
Architecture: aarch64
CPUs: 6
Total Memory: 6.504GiB
Name: nvidia
ID: C4UG:ILPT:DQFX:ZUZA:R4RE:45XP:KKV3:6XMX:X4CK:D56J:C4PL:BOFZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
WARNING: No kernel memory limit support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
```
Before we start making changes to our Dockerfiles, I wanted to get your opinion on this.
Thank you
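One way to test the /bin/sh suggestion before editing any Dockerfile is to probe which shell can actually be started in the running container. The following is a minimal Python sketch, not a balena-provided tool: the function names, the container name, and the `engine` parameter are hypothetical, and it assumes the engine CLI accepts the standard `exec` subcommand (`docker` on a regular host, presumably `balena-engine` on a balenaOS host). As far as I know, a missing shell usually produces a different error than the setns failure quoted above, so this check only rules that cause in or out.

```python
import subprocess

# Shells to try, in order. /bin/sh comes last because it is present in almost
# every image, including minimal ones that ship without bash.
CANDIDATE_SHELLS = ("/bin/bash", "/bin/sh")


def build_exec_command(container, shell, engine="docker"):
    """Return the argv for an interactive shell inside `container`."""
    return [engine, "exec", "-it", container, shell]


def first_working_shell(container, engine="docker",
                        shells=CANDIDATE_SHELLS, run=subprocess.run):
    """Return the first shell that starts in the container, or None.

    Each probe runs `<shell> -c true` non-interactively. `run` is injectable
    so the logic can be exercised without a container engine installed.
    """
    for shell in shells:
        probe = [engine, "exec", container, shell, "-c", "true"]
        try:
            result = run(probe, capture_output=True, text=True, timeout=30)
        except (OSError, subprocess.TimeoutExpired):
            # Engine binary missing or the exec hung: try the next shell.
            continue
        if result.returncode == 0:
            return shell
    return None
```

If `first_working_shell("main", engine="balena-engine")` returns `/bin/sh` but not `/bin/bash`, the image lacks bash and the Dockerfile change is worth making. If it returns `None` with the same OCI runtime error, the shell choice is probably not the cause.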
ab77
March 30, 2020, 6:44pm
2
Hi there, it’s difficult to say at this point. These sorts of issues are typically (but not always) related to resource constraints on the device, and possibly to hardware issues such as storage latency. There could also be a problem with balena components, but we’d need to take a look at the device first to see if we can narrow it down. Are you able to grant support access and PM me the device GUID (if you don’t want to make it public)?
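To get a first impression of whether resource constraints are involved, a small snapshot of memory, disk, and load can be logged from inside a container or on the host around the time of a restart. This is a rough sketch using only the Python standard library on Linux. The function names and the default paths are assumptions for illustration, and it does not measure storage latency.

```python
import os
import shutil


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most entries are reported in kB; a few (such as HugePages counters)
    are plain counts.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def resource_snapshot(data_path="/", meminfo_path="/proc/meminfo"):
    """Collect a few coarse indicators of resource pressure (Linux only)."""
    with open(meminfo_path) as f:
        mem = parse_meminfo(f.read())
    disk = shutil.disk_usage(data_path)
    load_1m, load_5m, load_15m = os.getloadavg()
    return {
        "mem_total_kb": mem.get("MemTotal"),
        "mem_available_kb": mem.get("MemAvailable"),
        "disk_total_bytes": disk.total,
        "disk_free_bytes": disk.free,
        "load_1m": load_1m,
        "load_5m": load_5m,
        "cpu_count": os.cpu_count(),
    }
```

Calling `resource_snapshot()` periodically and writing the result to the service log gives something concrete to compare against the timestamps of the restarts. Very low `mem_available_kb` or a load average well above `cpu_count` at those times would support the resource-constraint explanation.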
jpk
March 26, 2021, 6:57pm
6
I am encountering a similar problem:
OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "process_linux.go:93: starting setns process caused \"fork/exec /proc/self/exe: no such file or directory\"": unknown
This happens when I try to open a terminal to the device, either through the balenaCloud UI or using the balena CLI tool (ssh).
Restarting the container sometimes seems to make it go away, but it often comes back.
Using balenaOS 2.58.6+rev1 on generic-x86-64
Supervisor version 11.14.0
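Since the three error messages in this thread share a prefix but differ in source line numbers and in their innermost cause, it can help to group captured log lines by cause rather than by the full text. Below is a hedged Python sketch that pulls the Go source locations and the innermost cause out of an "OCI runtime exec failed" line. The function name and the returned keys are hypothetical, and the parsing is based only on the message shapes quoted in this thread, not on any documented format.

```python
import re

# Matches source locations such as "container_linux.go:349: ".
LOCATION = re.compile(r"(\w+\.go):(\d+): ")

# Quote and escape characters that may wrap a nested cause, including the
# curly quotes some forum software substitutes for straight ones.
QUOTE_CHARS = '"\\“”'


def parse_oci_exec_error(message):
    """Split an OCI runtime exec error into source locations and cause."""
    locations = [(name, int(line)) for name, line in LOCATION.findall(message)]
    # The innermost cause follows the last " caused " in the chain.
    _, found, tail = message.rpartition(" caused ")
    if not found:
        return {"locations": locations, "cause": None}
    # Drop the trailing ": unknown" status, then any wrapping quotes.
    tail = re.sub(r":\s*unknown\s*$", "", tail)
    cause = tail.strip().strip(QUOTE_CHARS)
    return {"locations": locations, "cause": cause}
```

Applied to the message above, this yields the locations `container_linux.go:349` and `process_linux.go:93` and the cause `fork/exec /proc/self/exe: no such file or directory`, which can then be counted or compared across devices.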
jpk
March 26, 2021, 7:10pm
7
I requested internal support and enabled support access on the device.