s3 is exiting with code 255 after startup with ./scripts/compose up

I really don’t know what’s happening here. The s3 container seems to die instantly; the registry container then follows with code 255 (just like s3); HAProxy won’t start because the hostnames can’t be resolved (the containers are no longer there); and the cert container can’t connect to port 80 because HAProxy isn’t serving anything at all.

s3 dies → registry dies → HAProxy won’t start → nothing

Why is s3 exiting? Where can I find the right logs?

There is no problem with the firewall; it is wide open to anyone. An httpd container was able to serve port 80 without any problems.

Running a VPS from STRATO.de with 1 vCore, 2 GB RAM and Ubuntu 18.04.
The domain is a sub-domain, openbalena.linus-h.de, with A and AAAA records. Sub-domains of that are api, registry, s3 and vpn, all with a CNAME to openbalena.linus-h.de.
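
For completeness, the records can be verified from anywhere with dig (hostnames as above):

# A/AAAA on the base sub-domain
dig +short openbalena.linus-h.de A
dig +short openbalena.linus-h.de AAAA
# the service names should come back as CNAMEs pointing at it
dig +short api.openbalena.linus-h.de CNAME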

Output from $ ~/open-balena/scripts/compose up

root@h2902577:~/open-balena# ./scripts/compose up
Starting openbalena_s3_1            ... done
Starting openbalena_cert-provider_1 ... done
Starting openbalena_db_1            ... done
Starting openbalena_redis_1         ... done
Starting openbalena_api_1           ... done
Starting openbalena_registry_1      ... done
Starting openbalena_vpn_1           ... done
Starting openbalena_haproxy_1       ... done
Attaching to openbalena_s3_1, openbalena_cert-provider_1, openbalena_db_1, openbalena_redis_1, openbalena_api_1, openbalena_registry_1, openbalena_vpn_1, openbalena_haproxy_1
s3_1             | Systemd init system enabled.
cert-provider_1  | [Info] VALIDATION not set. Using default: http-01
cert-provider_1  | [Info] Waiting for api.openbalena.linus-h.de to be available via HTTP...
cert-provider_1  | [Info] (1/3) Connecting...
cert-provider_1  | [Info] (1/3) Failed. Retrying in 5 seconds...
cert-provider_1  | [Info] (2/3) Connecting...
cert-provider_1  | [Info] (2/3) Failed. Retrying in 5 seconds...
db_1             | 2020-09-01 18:00:44.497 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
db_1             | 2020-09-01 18:00:44.497 UTC [1] LOG:  listening on IPv6 address "::", port 5432
db_1             | 2020-09-01 18:00:44.497 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
db_1             | 2020-09-01 18:00:44.533 UTC [21] LOG:  database system was shut down at 2020-09-01 17:56:47 UTC
db_1             | 2020-09-01 18:00:44.588 UTC [1] LOG:  database system is ready to accept connections
redis_1          | 1:C 01 Sep 2020 18:00:44.790 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1          | 1:C 01 Sep 2020 18:00:44.790 # Redis version=6.0.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1          | 1:C 01 Sep 2020 18:00:44.790 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
redis_1          | 1:M 01 Sep 2020 18:00:44.793 * Running mode=standalone, port=6379.
redis_1          | 1:M 01 Sep 2020 18:00:44.793 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
redis_1          | 1:M 01 Sep 2020 18:00:44.793 # Server initialized
redis_1          | 1:M 01 Sep 2020 18:00:44.797 * Loading RDB produced by version 6.0.6
redis_1          | 1:M 01 Sep 2020 18:00:44.797 * RDB age 237 seconds
redis_1          | 1:M 01 Sep 2020 18:00:44.797 * RDB memory usage when created 0.77 Mb
redis_1          | 1:M 01 Sep 2020 18:00:44.797 * DB loaded from disk: 0.004 seconds
redis_1          | 1:M 01 Sep 2020 18:00:44.797 * Ready to accept connections
api_1            | Systemd init system enabled.
registry_1       | Systemd init system enabled.
openbalena_s3_1 exited with code 255
vpn_1            | Systemd init system enabled.
haproxy_1        | Building certificate from environment variables...
openbalena_registry_1 exited with code 255
haproxy_1        | Setting up watches.  Beware: since -r was given, this may take a while!
haproxy_1        | Watches established.
haproxy_1        | [ALERT] 244/180048 (15) : parsing [/usr/local/etc/haproxy/haproxy.cfg:73] : 'server balena_registry_1' : could not resolve address 'registry'.
haproxy_1        | [ALERT] 244/180048 (15) : parsing [/usr/local/etc/haproxy/haproxy.cfg:85] : 'server balena_s3_1' : could not resolve address 's3'.
haproxy_1        | [ALERT] 244/180048 (15) : Failed to initialize server(s) addr.
cert-provider_1  | [Info] (3/3) Connecting...
cert-provider_1  | [Info] (3/3) Failed!
cert-provider_1  | [Info] Unable to access api.openbalena.linus-h.de on port 80. This is needed for certificate validation. Retrying in 30 seconds...
cert-provider_1  | [Info] Waiting for api.openbalena.linus-h.de to be available via HTTP...
cert-provider_1  | [Info] (1/3) Connecting...
cert-provider_1  | [Info] (1/3) Failed. Retrying in 5 seconds...
cert-provider_1  | [Info] (2/3) Connecting...
cert-provider_1  | [Info] (2/3) Failed. Retrying in 5 seconds...
cert-provider_1  | [Info] (3/3) Connecting...
cert-provider_1  | [Info] (3/3) Failed!
cert-provider_1  | [Info] Unable to access api.openbalena.linus-h.de on port 80. This is needed for certificate validation. Retrying in 30 seconds...

docker-compose.yml is based on
$ ~/open-balena/scripts/quickstart -U XXXXXXX@YYY.de -P 'XXXXXXXXXXXX' -d openbalena.linus-h.de -c

It’s possible the API isn’t starting. You may get more info by SSH’ing into the API container and getting logs with journalctl -fn100.
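
For example, combining both steps in one command (container name openbalena_api_1 as shown in the compose output above):

docker exec -it openbalena_api_1 journalctl -fn100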

The API container isn’t showing anything at all in journalctl

root@h2902577:~# docker exec -it 71432ac1b2d2 bash
root@71432ac1b2d2:/usr/src/app# journalctl -fn100
No journal files were found.
^C
root@71432ac1b2d2:/usr/src/app# journalctl
No journal files were found.
-- No entries --
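
Could it be that systemd (and with it journald) never actually came up inside the container? Two quick checks, assuming nothing beyond the “Systemd init system enabled” message above:

# is systemd actually PID 1 in here? (reads the kernel's view directly)
cat /proc/1/comm
# if it is, list any units that failed to start
systemctl --failed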

First of all, thank you for the fast reply! I’ve tried running the API’s entry.sh file manually and got this error:

root@71432ac1b2d2:/usr/src/app# ./entry.sh

Running node-supervisor with
  program 'index.js'
  --watch 'src'
  --extensions 'js,node,coffee,sbvr,json,sql,pegjs,ts'
  --exec 'node'

Starting child process with 'node index.js'
Watching directory '/usr/src/app/src' for changes.
Press rs for restarting the process.

/usr/src/app/src/lib/config.ts:38
                throw new Error(`Missing environment variable: ${varName}`);
        ^
Error: Missing environment variable: API_HOST
    at Object.exports.requiredVar (/usr/src/app/src/lib/config.ts:38:9)
    at Object.<anonymous> (/usr/src/app/src/lib/config.ts:61:25)
[...]

… so after I set API_HOST to api.openbalena.linus-h.de in the docker-compose.yml (the change itself is sketched after the output below), I got this error:

root@66c9d613aa8a:/usr/src/app# ./entry.sh

Running node-supervisor with
  program 'index.js'
  --watch 'src'
  --extensions 'js,node,coffee,sbvr,json,sql,pegjs,ts'
  --exec 'node'

Starting child process with 'node index.js'
Watching directory '/usr/src/app/src' for changes.
Press rs for restarting the process.
raven@2.6.4 alert: no DSN provided, error reporting disabled
Could not execute standard models { Error: connect ECONNREFUSED 127.0.0.1:5432
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1097:14)
  errno: 'ECONNREFUSED',
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 5432 }
Program node index.js exited with code 1
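
The API_HOST change amounts to one environment entry on the api service, roughly like this (a minimal sketch; the real compose file is generated by ./scripts/quickstart and its exact layout may differ):

services:
  api:
    environment:
      API_HOST: api.openbalena.linus-h.de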

I don’t really know what to do with that last error. (And there are still no entries in journalctl.)
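
The error shows the API trying to reach Postgres on 127.0.0.1 instead of the db service. A quick reachability check from inside the API container, using bash’s /dev/tcp so nothing extra needs to be installed (service name db, matching the openbalena_db_1 container above):

# does the compose-internal hostname answer on 5432?
(echo > /dev/tcp/db/5432) && echo "db:5432 reachable"
# and localhost, which is what the API actually tried?
(echo > /dev/tcp/127.0.0.1/5432) || echo "localhost:5432 refused, as in the error"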

root@66c9d613aa8a:/usr/src/app# ping api.openbalena.linus-h.de
PING api.openbalena.linus-h.de (172.18.0.7) 56(84) bytes of data.
64 bytes from open-balena_haproxy_2.openbalena_default (172.18.0.7): icmp_seq=1 ttl=64 time=0.389 ms

So ping is coming back, but from HAProxy, which, as far as I understand this whole situation, is not running. That would explain why the API is not starting.
Is the core problem still that HAProxy is not starting because s3 is exiting?
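
For what it’s worth, the exit codes can be confirmed directly on the host:

# which containers exited, and with what status?
docker ps -a --format 'table {{.Names}}\t{{.Status}}'
# exact exit code of the s3 container
docker inspect -f '{{.State.ExitCode}}' openbalena_s3_1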

Even changing the VPS provider didn’t help…
At the beginning I tried a VPS from strato.de, which uses Virtuozzo.
Now I’ve tried a VPS from onyxhosting.de, which uses OpenVZ.
Same result: s3 exits 2–3 seconds after startup.

Everything works fine in my local VM, but nothing works on two different VPSes.

Linus

What logs are coming out of journalctl on the S3 container?

Also, just to make sure: are you using the service versions defined in the latest openBalena repo commit? If you have changed them, this would introduce unknowns, so make sure you’re running “stock” openBalena.
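
A quick way to check for local modifications and to see which commit the checkout is on (repo path as used in the earlier output):

# any local changes to the openBalena checkout?
git -C ~/open-balena status --short
# which commit is this?
git -C ~/open-balena log -1 --oneline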

I also seem to recall another user who couldn’t get it working on OpenVZ due to the way it does virtualisation. Perhaps you could try a more traditional VPS solution; I have used DigitalOcean without issue in the past, for example.
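
If you’re unsure what a given host uses, systemd-detect-virt should tell you (it prints e.g. kvm or openvz):

systemd-detect-virt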

I’ve already looked into s3. It exits about 2 seconds after it starts, so there is no real chance of seeing anything. The one time it worked, nothing appeared in journalctl etc. Starting the container on its own doesn’t make much sense either, because of all the missing configuration.
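
For what it’s worth, docker logs should still show the container’s output after it exits, as long as the container hasn’t been removed:

docker logs --timestamps openbalena_s3_1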

Yes, I was running on “stock” openBalena.
(… see next post)

So I’ve FINALLY got it working by now… Just ordered my third VPS this month :joy: on contabo.de - they use KVM (as DigitalOcean does too) and it just works. Thank you very much! You should include this in the README / getting started guide: KVM seems to be the only option that really works well.

In addition, it’s much faster than OpenVZ and Virtuozzo. The worst experience really was OpenVZ. All the VPSes had roughly the same configuration.

And sorry for my not-so-great English… I hope you could understand me without too much headache :grin:

Glad to hear you got it working! I have opened up a GitHub issue on our Docs repo to see if we can get a note added that KVM-based servers work best, to help future users avoid this problem. Thanks for the report!

Could you please point me to the issue, so I can track it?

I’m facing the same issue on one of my VPS servers.

Hi, the GitHub issue is here: https://github.com/balena-io/docs/issues/1488

This is also happening with Podman from time to time.

podman-openbalena-s3.service: Main process exited, code=exited, status=255/n/a

Hi @gyeah11

Could you please elaborate on what exactly you mean by “from time to time”? Also, can you describe how you set up openBalena using Podman? Are you running in a VM? If so, is that KVM or something else?

Thanks

I run s3 like this:

ExecStart=/usr/bin/podman run --cgroups=no-conmon --name podman-openbalena-s3 --net=host --rm -e TZ=Europe/Bucharest \
--privileged \
--env-file=/etc/redacted/podman-openbalena-s3.conf \
--tmpfs /run,/sys/fs/cgroup \
--volume /var/lib/podman/podman-openbalena/s3:/export:rw \
registry.redacted.com/external/open-balena-s3:v2.10.3 \

By “from time to time” I mean:

If I systemctl stop podman-openbalena-s3 and then start it right away, it fails; if I leave it for 5 minutes and start it again, it works.

I am running podman inside a virtual machine that is running on a KVM hypervisor.

Any idea how I can debug this?

Hi, could you please advise how you integrated openBalena with Podman? Is this packaged in some third-party distribution, or did you write the systemd files yourself? I don’t think we have ever tested with Podman, so this would help us understand and possibly improve the experience.
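
In the meantime, the unit’s own journal right after a failed start, plus whether the old container has finished being removed, would be the first things to check (unit and container names from your snippet above; since your unit uses --rm and --net=host, a leftover container or still-bound port would fit the “works after 5 minutes” pattern):

# has the old container finished being torn down?
podman ps -a --filter name=podman-openbalena-s3
# the unit's log around the failed start
journalctl -u podman-openbalena-s3 -n 100 --no-pager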