ECONNRESET on the API URL every few minutes

Hi,
I’ve just installed openBalena for testing, joined a balenaOS device that was previously used on my balenaCloud subscription, and deployed a new application. But now I’m running into some strange problems…

After some time the balena CLI stops working, and I get the following error when I try to log in again:

Logging in to fleet.timelapse.in
FetchError: request to https://api.fleet.timelapse.in/login_ failed, reason: read ECONNRESET
    at ClientRequest.<anonymous> (/usr/lib/node_modules/balena-cli/node_modules/node-fetch/index.js:133:11)
    at ClientRequest.emit (events.js:189:13)
    at ClientRequest.EventEmitter.emit (domain.js:441:20)
    at TLSSocket.socketErrorListener (_http_client.js:392:9)
    at TLSSocket.emit (events.js:189:13)
    at TLSSocket.EventEmitter.emit (domain.js:441:20)
    at emitErrorNT (internal/streams/destroy.js:82:8)
    at emitErrorAndCloseNT (internal/streams/destroy.js:50:3)
    at process._tickCallback (internal/process/next_tick.js:63:19)
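
To take the CLI out of the picture, the API can also be hit directly; a quick check, assuming the /ping route behaves the same here as it does on balenaCloud:

# quick reachability check against the API behind HAProxy
curl -v https://api.fleet.timelapse.in/ping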

Here are some logs from the HAProxy container:

[NOTICE] 119/171236 (9) : New worker #1 (11) forked
[WARNING] 119/171236 (11) : Server backend_api/resin_api_1 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 119/171236 (11) : backend 'backend_api' has no server available!
[WARNING] 119/171240 (11) : Server backend_api/resin_api_1 is UP, reason: Layer4 check passed, check duration: 0ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
[ALERT] 119/173605 (9) : Current worker #1 (11) exited with code 139 (Segmentation fault)
[ALERT] 119/173605 (9) : exit-on-failure: killing every workers with SIGTERM
[WARNING] 119/173605 (9) : All workers exited. Exiting... (139)

Issuing a “compose stop” followed by a “compose start” fixes things, but only for a few minutes.
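
For the record, the temporary fix is just a stop/start of the stack through the compose wrapper that ships with open-balena (restarting only the haproxy service would probably be enough):

./scripts/compose stop
./scripts/compose start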

Any suggestions for further troubleshooting?

Hey there, this might be due to a service not running as expected, and it might be related to “Device failing to download images - Error processing tar file (exit status 1): unexpected EOF”. You can check the logs for each service as described in the link I posted, and we can continue from there once we have more information. Let us know what you find.
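
If it helps, something like this sweeps all of them in one go (the service names below are the usual open-balena ones, adjust them to match your compose file):

# dump the last 200 journal lines from each backend service
for svc in api registry vpn db s3; do
  echo "===== $svc ====="
  ./scripts/compose exec -T "$svc" journalctl -n 200 --no-pager
done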

Thanks @sradevski!

I didn’t see that post while searching the forum, but I agree that the two problems are related!
What I see is:

[root@midgard open-balena]# ./scripts/compose exec -T api journalctl | grep haproxy
May 01 09:21:34 d6b2cbe560bb kernel: haproxy[15890]: segfault at 5639c27f7000 ip 00007f0628683b0a sp 00007fff33f5a2a8 error 4 in ld-musl-x86_64.so.1[7f062864a000+45000]
May 01 09:53:56 d6b2cbe560bb kernel: haproxy[9322]: segfault at 55b3d5e55000 ip 00007f7ad5877b0a sp 00007ffe968da248 error 4 in ld-musl-x86_64.so.1[7f7ad583e000+45000]
[root@midgard open-balena]# ./scripts/compose exec -T haproxy journalctl
OCI runtime exec failed: exec failed: container_linux.go:345: starting container process caused "exec: \"journalctl\": executable file not found in $PATH": unknown
[root@midgard open-balena]# ./scripts/compose exec -T haproxy ps -elf
PID   USER     TIME  COMMAND
    1 root      0:00 {start-haproxy} /bin/sh /start-haproxy
   10 root      0:00 inotifywait -r -e create -e modify -e delete /certs
   58 root      0:00 ps -elf
[root@midgard open-balena]# ./scripts/compose exec -T haproxy /bin/sh start-haproxy
Using certificate from cert-provider...
Setting up watches.  Beware: since -r was given, this may take a while!
Watches established.
[NOTICE] 120/095922 (71) : New worker #1 (73) forked

That is, HAProxy seems to die after some time. Now the service is up again:

[root@midgard open-balena]# ./scripts/compose top haproxy
openbalena_haproxy_1
UID     PID    PPID    C   STIME   TTY     TIME                             CMD
-------------------------------------------------------------------------------------------------------
root   8520    8424    0   11:21   ?     00:00:00   /bin/sh /start-haproxy
root   9312    8520    0   11:21   ?     00:00:00   inotifywait -r -e create -e modify -e delete /certs
root   18775   8424    0   11:59   ?     00:00:00   /bin/sh start-haproxy
root   18784   18775   0   11:59   ?     00:00:00   haproxy -f /usr/local/etc/haproxy/haproxy.cfg -W
root   18785   18775   0   11:59   ?     00:00:00   inotifywait -r -e create -e modify -e delete /certs
root   18790   18784   0   11:59   ?     00:00:09   haproxy -f /usr/local/etc/haproxy/haproxy.cfg -W
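
Since the container’s PID 1 (/start-haproxy) stays up even when the worker dies, a quick liveness check has to look for the haproxy worker itself; a minimal sketch, assuming the busybox pgrep in the image supports -x:

# exits non-zero when no haproxy worker is running, even though the
# container itself still shows as "Up"
./scripts/compose exec -T haproxy pgrep -x haproxy >/dev/null \
  || echo "haproxy worker has died inside the container"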

As you can see, I had to use a slightly different command than the suggested one, maybe because I’m running Docker Compose 1.24.0 and Docker 18.09.5 here:

Hello, can you please inspect the server logs for any hints? It looks to me like some of the backend containers aren’t running as they should. The command you need is ./scripts/compose exec -it SERVICE_NAME journalctl -fn1000. You’ll need to run this command once for each service, replacing SERVICE_NAME with api/registry/vpn/etc.
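
In practice, the variant that works on this Compose/Docker combination just swaps -it for -T (SERVICE_NAME stays a placeholder):

./scripts/compose exec -T SERVICE_NAME journalctl -fn1000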

I will come back with updates as soon as HAProxy fails again; it has been up for a couple of hours now!

Just an update on the topic: I have NOT reinstalled openBalena, NOR updated or touched it. HAProxy simply has not died again, so my openBalena is working well right now! :slight_smile:

Hi, same problem here… HAProxy crashes randomly with a segfault, maybe under high load…?
What kind of machine are you running openBalena on?

@daghemo we are glad that this is no longer an issue for you.
@Federicod could you share some details about your setup, along with some logs, as per the other forum thread?
https://forums.balena.io/t/device-failing-to-download-images-error-processing-tar-file-exit-status-1-unexpected-eof/4789/3