"could not receive data from client: Connection reset by peer" and other DB errors

Hey.
I have an OpenBalena instance installed on a DigitalOcean droplet (1 CPU & 1GB RAM).
Everything was working fine for a while. Then, on the 7th of September, my RPi's LED started to blink, indicating a loss of connection to the server.

After investigating, I found I cannot use the CLI; I get a 500 Service Unavailable error.

Here are the database logs since the last reboot back in mid-August, retrieved using ./scripts/compose logs (there were no errors in the other containers, so I didn't copy the whole thing).
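(If the ./scripts/compose wrapper forwards its arguments to docker-compose, which I believe it does, you can also limit the output to a single service and the last N lines, e.g.:)

# only the db service, last 200 lines
./scripts/compose logs --tail=200 db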

db_1             | 2019-08-26 14:47:48.496 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
db_1             | 2019-08-26 14:47:48.496 UTC [1] LOG:  listening on IPv6 address "::", port 5432
db_1             | 2019-08-26 14:47:48.501 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
db_1             | 2019-08-26 14:47:48.533 UTC [21] LOG:  database system was shut down at 2019-08-26 14:26:17 UTC
db_1             | 2019-08-26 14:47:48.545 UTC [1] LOG:  database system is ready to accept connections
db_1             | 2019-08-26 14:48:08.325 UTC [28] ERROR:  relation "uniq_model_model_type_vocab" already exists
db_1             | 2019-08-26 14:48:08.325 UTC [28] STATEMENT:  CREATE UNIQUE INDEX "uniq_model_model_type_vocab" ON "model" ("is of-vocabulary", "model type");
db_1             | 2019-08-26 15:12:23.535 UTC [54] ERROR:  duplicate key value violates unique constraint "service_application_service name_key"
db_1             | 2019-08-26 15:12:23.535 UTC [54] DETAIL:  Key (application, "service name")=(1, main) already exists.
db_1             | 2019-08-26 15:12:23.535 UTC [54] STATEMENT:  INSERT INTO "service" ("application", "service name")
db_1             | 	VALUES ($1, $2) RETURNING "id";
db_1             | 2019-08-26 19:16:32.708 UTC [330] ERROR:  duplicate key value violates unique constraint "service_application_service name_key"
db_1             | 2019-08-26 19:16:32.708 UTC [330] DETAIL:  Key (application, "service name")=(1, main) already exists.
db_1             | 2019-08-26 19:16:32.708 UTC [330] STATEMENT:  INSERT INTO "service" ("application", "service name")
db_1             | 	VALUES ($1, $2) RETURNING "id";
db_1             | 2019-08-27 15:39:27.620 UTC [1791] LOG:  could not receive data from client: Connection reset by peer
db_1             | 2019-08-27 15:39:27.640 UTC [1791] LOG:  incomplete startup packet
db_1             | 2019-08-27 15:39:27.820 UTC [1794] LOG:  could not receive data from client: Connection reset by peer
db_1             | 2019-08-27 15:39:27.821 UTC [1794] LOG:  incomplete startup packet
db_1             | 2019-08-27 15:39:27.824 UTC [1793] LOG:  could not receive data from client: Connection reset by peer
db_1             | 2019-08-27 15:39:27.824 UTC [1793] LOG:  incomplete startup packet
db_1             | 2019-08-28 17:19:32.961 UTC [3597] LOG:  could not receive data from client: Connection reset by peer
db_1             | 2019-08-28 17:19:42.310 UTC [3598] LOG:  could not receive data from client: Connection reset by peer
db_1             | 2019-08-28 17:19:43.213 UTC [3597] LOG:  incomplete startup packet
db_1             | 2019-08-28 17:19:58.571 UTC [3598] LOG:  incomplete startup packet
db_1             | 2019-08-28 17:22:19.573 UTC [3600] ERROR:  duplicate key value violates unique constraint "service_application_service name_key"
db_1             | 2019-08-28 17:22:19.573 UTC [3600] DETAIL:  Key (application, "service name")=(1, main) already exists.
db_1             | 2019-08-28 17:22:19.573 UTC [3600] STATEMENT:  INSERT INTO "service" ("application", "service name")
db_1             | 	VALUES ($1, $2) RETURNING "id";
db_1             | 2019-08-29 16:39:48.082 UTC [5222] LOG:  could not receive data from client: Connection reset by peer
db_1             | 2019-08-29 16:39:51.939 UTC [5222] LOG:  incomplete startup packet
db_1             | 2019-08-29 16:40:52.855 UTC [1] LOG:  could not fork new process for connection: Cannot allocate memory
db_1             | 2019-08-29 16:41:27.691 UTC [5224] LOG:  could not receive data from client: Connection reset by peer
db_1             | 2019-08-29 16:41:49.363 UTC [5224] LOG:  incomplete startup packet
db_1             | 2019-08-29 16:42:17.903 UTC [5225] LOG:  could not receive data from client: Connection reset by peer
db_1             | 2019-08-29 16:42:17.943 UTC [5225] LOG:  incomplete startup packet
db_1             | 2019-08-30 14:47:32.511 UTC [6764] LOG:  could not receive data from client: Connection reset by peer
db_1             | 2019-08-30 14:48:39.390 UTC [6765] LOG:  could not receive data from client: Connection reset by peer
db_1             | 2019-08-30 14:48:46.802 UTC [6765] LOG:  incomplete startup packet
db_1             | 2019-08-30 14:48:46.796 UTC [1] LOG:  could not fork new process for connection: Cannot allocate memory
db_1             | 2019-08-30 14:48:46.875 UTC [6767] LOG:  could not receive data from client: Connection reset by peer
db_1             | 2019-08-30 14:48:46.878 UTC [6767] LOG:  incomplete startup packet
db_1             | 2019-08-31 11:19:18.927 UTC [8203] LOG:  could not receive data from client: Connection reset by peer
db_1             | 2019-08-31 11:19:31.635 UTC [8203] LOG:  incomplete startup packet
db_1             | 2019-08-31 11:21:33.084 UTC [8206] LOG:  could not receive data from client: Connection reset by peer
db_1             | 2019-08-31 11:21:33.087 UTC [8206] LOG:  incomplete startup packet
db_1             | 2019-08-31 11:21:33.212 UTC [8207] LOG:  could not receive data from client: Connection reset by peer
db_1             | 2019-08-31 11:21:33.212 UTC [8207] LOG:  incomplete startup packet
db_1             | 2019-08-31 11:21:33.255 UTC [8208] LOG:  could not receive data from client: Connection reset by peer
db_1             | 2019-08-31 11:21:33.255 UTC [8208] LOG:  incomplete startup packet
db_1             | 2019-09-01 06:30:55.579 UTC [25] WARNING:  worker took too long to start; canceled
db_1             | 2019-09-01 06:30:55.834 UTC [9543] WARNING:  autovacuum worker started without a worker entry
db_1             | 2019-09-02 00:19:04.617 UTC [10789] LOG:  could not receive data from client: Connection reset by peer
db_1             | 2019-09-02 00:19:12.338 UTC [10789] LOG:  incomplete startup packet
db_1             | 2019-09-02 00:19:18.373 UTC [10791] LOG:  could not receive data from client: Connection reset by peer
db_1             | 2019-09-02 00:19:18.390 UTC [10791] LOG:  incomplete startup packet
db_1             | 2019-09-02 00:19:18.254 UTC [10790] LOG:  could not receive data from client: Connection reset by peer
db_1             | 2019-09-02 00:19:18.790 UTC [10790] LOG:  incomplete startup packet
db_1             | 2019-09-02 00:19:59.874 UTC [10793] LOG:  could not receive data from client: Connection reset by peer

And it kept going like that until the 7th of September, then stopped. Nothing else until a manual restart today.

db_1             | 2019-09-07 15:06:06.663 UTC [20075] LOG:  incomplete startup packet
db_1             | 2019-09-07 15:06:06.666 UTC [20061] LOG:  incomplete startup packet
db_1             | 2019-09-07 15:06:06.667 UTC [20070] LOG:  could not receive data from client: Connection reset by peer
db_1             | 2019-09-07 15:06:06.670 UTC [20070] LOG:  incomplete startup packet
db_1             | 2019-09-08 16:47:36.119 UTC [1] LOG:  received smart shutdown request

At around the same time, I can see the regular “ping” traffic to the server stopping.

The same errors have popped up for the database after the restart.
As for the CLI, I now get a network timeout at: https://api.mybalena.local/login_

I see "could not fork new process for connection: Cannot allocate memory". Maybe I don't have enough RAM on the server? I can try adding more.
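(For anyone hitting the same thing, the usual way to check would be something like the following — plain Linux/Docker commands, nothing OpenBalena-specific:)

# overall host memory and swap usage
free -m

# one-shot per-container memory/CPU snapshot
docker stats --no-stream

# check whether the kernel OOM killer has been firing
dmesg | grep -i -E "out of memory|oom"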

Thanks in advance for the help. I can provide any other needed info.
Tim

After some more testing, here is what happened:

I got it back to working.

First I tried doubling the RAM (to 2GB) on my DigitalOcean droplet. That didn't change anything.

At that point, I was getting a network timeout when trying to reach any of the routes.
I then discovered a few warnings in the redis container, which you can see in this thread.

I got rid of them by adding two lines to my startup script for the host system:

#!/bin/sh -x

# for redis: disable transparent huge pages and allow memory overcommit,
# the two conditions redis warns about at startup
echo never > /sys/kernel/mm/transparent_hugepage/enabled
sysctl vm.overcommit_memory=1

cd /root/open-balena
./scripts/compose up -d
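(Side note: the overcommit setting could also be made persistent via sysctl configuration instead of the startup script, along these lines — the file name is just an example. The transparent_hugepage toggle is not a sysctl, so that one still needs the echo at boot time.)

# e.g. /etc/sysctl.d/99-redis.conf (example file name)
vm.overcommit_memory = 1

# reload all sysctl configuration without a reboot
sysctl --system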

Other than that, I didn’t do anything. And now, after restarting it a few times, it works fine. My devices are online again.

HOWEVER! The errors and warnings in the db container that I mentioned in the first post are still there.
I assume they are non-critical, but I would still like to know if there is a way to fix them, or if it is a known bug.

Thanks in advance, and I hope this can help someone else.

Good move raising the RAM to 2GB; it should work with less, but as always, the more the better. To investigate, use ./scripts/compose exec -it <service> journalctl -a -fn 100. Plain compose logs doesn't show much, as service logs are routed to journald inside the containers, so Docker doesn't have access to them. The API logs are typically the most interesting.
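For example, to follow the last 100 lines of the API service's journal (assuming the service is named api in your compose file, as it is in a default open-balena checkout):

# tail the API service logs from inside its container
./scripts/compose exec -it api journalctl -a -fn 100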

Since it doubles my price per month to get 2GB of RAM, I went back to 1GB, but I will definitely upgrade again when I get a few more devices connected, just to be sure.

I can't get any logs for the DB using the journalctl command, and there's nothing interesting in the API logs either.
So, should I worry about ERROR: duplicate key value violates unique constraint "service_application_service name_key" or not?

Thanks in advance

[screenshot: droplet memory usage graph]
The memory usage kept increasing, so I decided to be safe and get 2GB right away. Gotta keep an eye on this.

Hey @winkelmann

I am curious as to what operation you performed last before your install went bad; the key violation seems to hint that some data in the database is preventing an index creation, which would only happen during a migration step… did you attempt to upgrade the API in some way prior? If this is in fact the case, then the API will not start until these steps are performed, and since they cannot be performed until the data is corrected, you may have to open a connection to the DB container and use a PGSQL client to correct the data…
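If it comes to that, something along these lines should get you a look at the offending rows (db is the service name from the compose file; the postgres user and database names below are placeholders, take the real values from your OpenBalena configuration):

# open a psql shell inside the db container (<pg-user>/<pg-db> are placeholders)
./scripts/compose exec db psql -U <pg-user> -d <pg-db>

-- inside psql: inspect the rows behind the duplicate-key error
SELECT "id", "application", "service name" FROM "service" WHERE "application" = 1;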

I tested out using DigitalOcean's $10 a month droplet, running Ubuntu 18.04 and installing Docker, just a week ago, and it seemed fine, so I don't think there are any issues with the hypervisor implementation etc.

If this is in fact the case, then the API will not start until these steps are performed

That's weird, because while the error is there, everything is working fine.

I am curious as to what operation you performed last before your install went bad

I didn't touch the server for the whole month after everything was last installed and working.
I don't think I even pushed an update to an app.
It literally just "stopped working" on that day.

Well, it's not like I would know what to do to fix it manually, so I'll just leave it be for now.