Hi,
I’ve just installed openBalena for testing; I’ve even joined a balenaOS device previously used on my balenaCloud subscription and deployed a new application. But now I’m running into some strange problems…
After some time the balena CLI stops working, and I get the following error when I try to log in again:
Logging in to fleet.timelapse.in
FetchError: request to https://api.fleet.timelapse.in/login_ failed, reason: read ECONNRESET
at ClientRequest.<anonymous> (/usr/lib/node_modules/balena-cli/node_modules/node-fetch/index.js:133:11)
at ClientRequest.emit (events.js:189:13)
at ClientRequest.EventEmitter.emit (domain.js:441:20)
at TLSSocket.socketErrorListener (_http_client.js:392:9)
at TLSSocket.emit (events.js:189:13)
at TLSSocket.EventEmitter.emit (domain.js:441:20)
at emitErrorNT (internal/streams/destroy.js:82:8)
at emitErrorAndCloseNT (internal/streams/destroy.js:50:3)
at process._tickCallback (internal/process/next_tick.js:63:19)
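For what it’s worth, a quick way I check whether the endpoint is reachable at all, independently of the CLI, is to hit the API directly with curl. This is just a sketch: it assumes curl is installed on the machine and that the API answers on a /ping route (the hostname is my own deployment):
curl -vk https://api.fleet.timelapse.in/ping
When the problem is happening, I’d expect this to fail with a similar connection reset, which would point at the proxy/API side rather than the CLI itself.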
Here are some logs from the HAProxy container:
[NOTICE] 119/171236 (9) : New worker #1 (11) forked
[WARNING] 119/171236 (11) : Server backend_api/resin_api_1 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 119/171236 (11) : backend 'backend_api' has no server available!
[WARNING] 119/171240 (11) : Server backend_api/resin_api_1 is UP, reason: Layer4 check passed, check duration: 0ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
[ALERT] 119/173605 (9) : Current worker #1 (11) exited with code 139 (Segmentation fault)
[ALERT] 119/173605 (9) : exit-on-failure: killing every workers with SIGTERM
[WARNING] 119/173605 (9) : All workers exited. Exiting... (139)
Issuing a “compose stop” followed by a “compose start” fixes things, but only for a few minutes.
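Concretely, the workaround I’m using is just this, run from the open-balena directory with the bundled compose wrapper:
./scripts/compose stop
./scripts/compose start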
Any suggestion for further troubleshooting?
Hey there, this might be due to a service not running as expected, and it could be related to “Device failing to download images - Error processing tar file(exit status 1): unexpected EOF”. You can check the logs for each service as described in the link I posted, and we can continue from there once we have more information. Let us know what you find.
Thanks @sradevski!
I didn’t see that post while searching the forum, but I agree that the two problems are related!
What I see is:
[root@midgard open-balena]# ./scripts/compose exec -T api journalctl | grep haproxy
May 01 09:21:34 d6b2cbe560bb kernel: haproxy[15890]: segfault at 5639c27f7000 ip 00007f0628683b0a sp 00007fff33f5a2a8 error 4 in ld-musl-x86_64.so.1[7f062864a000+45000]
May 01 09:53:56 d6b2cbe560bb kernel: haproxy[9322]: segfault at 55b3d5e55000 ip 00007f7ad5877b0a sp 00007ffe968da248 error 4 in ld-musl-x86_64.so.1[7f7ad583e000+45000]
[root@midgard open-balena]# ./scripts/compose exec -T haproxy journalctl
OCI runtime exec failed: exec failed: container_linux.go:345: starting container process caused "exec: \"journalctl\": executable file not found in $PATH": unknown
[root@midgard open-balena]# ./scripts/compose exec -T haproxy ps -elf
PID USER TIME COMMAND
1 root 0:00 {start-haproxy} /bin/sh /start-haproxy
10 root 0:00 inotifywait -r -e create -e modify -e delete /certs
58 root 0:00 ps -elf
[root@midgard open-balena]# ./scripts/compose exec -T haproxy /bin/sh start-haproxy
Using certificate from cert-provider...
Setting up watches. Beware: since -r was given, this may take a while!
Watches established.
[NOTICE] 120/095922 (71) : New worker #1 (73) forked
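Side note: since journalctl isn’t available inside the haproxy image (see the exec error above), an alternative is to read that container’s stdout/stderr through Compose instead. A sketch, assuming the stock open-balena compose project:
./scripts/compose logs --tail=200 haproxy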
In short, HAProxy seems to die after some time. Right now the service is up again:
[root@midgard open-balena]# ./scripts/compose top haproxy
openbalena_haproxy_1
UID PID PPID C STIME TTY TIME CMD
-------------------------------------------------------------------------------------------------------
root 8520 8424 0 11:21 ? 00:00:00 /bin/sh /start-haproxy
root 9312 8520 0 11:21 ? 00:00:00 inotifywait -r -e create -e modify -e delete /certs
root 18775 8424 0 11:59 ? 00:00:00 /bin/sh start-haproxy
root 18784 18775 0 11:59 ? 00:00:00 haproxy -f /usr/local/etc/haproxy/haproxy.cfg -W
root 18785 18775 0 11:59 ? 00:00:00 inotifywait -r -e create -e modify -e delete /certs
root 18790 18784 0 11:59 ? 00:00:09 haproxy -f /usr/local/etc/haproxy/haproxy.cfg -W
As you can see, I had to use a slightly different command than the suggested one, maybe because I’m running Docker Compose 1.24.0 and Docker 18.9.5 here:
Hello, can you please inspect the server logs for any hints? It looks to me like some of the backend containers aren’t running as they should. The command you need is ./scripts/compose exec -it SERVICE_NAME journalctl -fn1000. You’ll need to run this command once for each service, replacing SERVICE_NAME with api/registry/vpn/etc.
I will come back with updates as soon as HAProxy fails again; it has been up for a couple of hours now!
Just an update on the topic: I have NOT reinstalled openBalena, NOR updated/touched it. HAProxy simply has not died again, so my openBalena install is working well right now!
Hi, same problem here… HAProxy crashes randomly with a segfault, maybe under high load…?
What kind of machine are you running openBalena on?
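For example, the output of something along these lines, run on the host, would help. This assumes a typical Linux host with systemd, so the last command may not apply everywhere:
uname -a
nproc && free -h
journalctl -k | grep -i segfault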
@daghemo we are glad that this is no longer an issue for you.
@Federicod could you share some details about your setup, along with some logs, as per the other forum thread?
https://forums.balena.io/t/device-failing-to-download-images-error-processing-tar-file-exit-status-1-unexpected-eof/4789/3
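For reference, something along these lines should gather the relevant details. This is only a sketch: it assumes the same ./scripts/compose wrapper used above, and the service names (api/registry/vpn) are just the ones mentioned earlier in this thread, so adjust them to whatever ./scripts/compose ps lists:
./scripts/compose ps
./scripts/compose exec -T api journalctl -n 1000
./scripts/compose exec -T registry journalctl -n 1000
./scripts/compose exec -T vpn journalctl -n 1000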