First-time app upgrades failing: "DatabaseError: Rolling back transaction"

I have a fleet of devices connected to a different openBalena instance than the one they originally registered on (background here). Most devices have upgraded to the new releases (4 images each) that were deployed to this server and are reporting state information. The remaining devices are “online” and “have internet connectivity”, but produce a lot of logs like the following. Any advice on how to debug this?

INSERT INTO "image install" ("device", "installs-image", "install date", "download progress", "status", "is provided by-release")
SELECT "image install"."device", "image install"."installs-image", "image install"."install date", "image install"."download progress", "image install"."status", "image install"."is provided by-release"
FROM (
        SELECT "image install"."created at", "image install"."modified at", "image install"."device", "image install"."installs-image", "image install"."id", "image install"."install date", "image install"."download progress", "image install"."status", "image install"."is provided by-release"
        FROM (
                SELECT CAST(NULL AS TIMESTAMP) AS "created at", CAST(NULL AS TIMESTAMP) AS "modified at", CAST($1 AS INTEGER) AS "device", CAST($2 AS INTEGER) AS "installs-image", CAST(NULL AS INTEGER) AS "id", CAST($3 AS TIMESTAMP) AS "install date", CAST($4 AS INTEGER) AS "download progress", CAST($5 AS VARCHAR(255)) AS "status", CAST($6 AS INTEGER) AS "is provided by-release"
        ) AS "image install"
        WHERE EXISTS (
                SELECT 1
                FROM "device" AS "image install.device"
                WHERE "image install"."device" = "image install.device"."id"
                AND (("image install.device"."actor") IS NOT NULL AND ("image install.device"."actor") = ($7)
                OR EXISTS (
                        SELECT 1
                        FROM "device" AS "image install.device.is managed by-device"
                        WHERE "image install.device"."is managed by-device" = "image install.device.is managed by-device"."id"
                        AND ("image install.device.is managed by-device"."actor") IS NOT NULL AND ("image install.device.is managed by-device"."actor") = ($7)
                ))
        )
        AND EXISTS (
                SELECT 1
                FROM "image" AS "image install.installs-image"
                WHERE "image install"."installs-image" = "image install.installs-image"."id"
                AND EXISTS (
                        SELECT 1
                        FROM "image-is part of-release" AS "image install.installs-image.image-is part of-release"
                        WHERE "image install.installs-image"."id" = "image install.installs-image.image-is part of-release"."image"
                        AND EXISTS (
                                SELECT 1
                                FROM "release" AS "im install.installs-im.im-is par of-rel.is part of-release"
                                WHERE "image install.installs-image.image-is part of-release"."is part of-release" = "im install.installs-im.im-is par of-rel.is part of-release"."id"
                                AND EXISTS (
                                        SELECT 1
                                        FROM "application" AS "arxwle$l.installs-im.im-is par of-rel.is par of-rel.bel to-appl"
                                        WHERE "im install.installs-im.im-is par of-rel.is part of-release"."belongs to-application" = "arxwle$l.installs-im.im-is par of-rel.is par of-rel.bel to-appl"."id"
                                        AND (EXISTS (
                                                SELECT 1
                                                FROM "device" AS "ppae5y$lls-im.im-is par of-rel.is par of-rel.bel to-appl.ow-dev"
                                                WHERE "arxwle$l.installs-im.im-is par of-rel.is par of-rel.bel to-appl"."id" = "ppae5y$lls-im.im-is par of-rel.is par of-rel.bel to-appl.ow-dev"."belongs to-application"
                                                AND ("ppae5y$lls-im.im-is par of-rel.is par of-rel.bel to-appl.ow-dev"."actor") IS NOT NULL AND ("ppae5y$lls-im.im-is par of-rel.is par of-rel.bel to-appl.ow-dev"."actor") = ($7)
                                        )
                                        OR ("arxwle$l.installs-im.im-is par of-rel.is par of-rel.bel to-appl"."is public") IS NOT NULL AND ("arxwle$l.installs-im.im-is par of-rel.is par of-rel.bel to-appl"."is public") = ($8)
                                        OR ("arxwle$l.installs-im.im-is par of-rel.is par of-rel.bel to-appl"."is host") IS NOT NULL AND ("arxwle$l.installs-im.im-is par of-rel.is par of-rel.bel to-appl"."is host") = ($9))
                                )
                        )
                )
        )
) AS "image install" [ 61, 5, 2022-05-11T01:16:46.578Z, null, 'Downloaded', 2, 122, 1, 1 ]
[... SKIPPING IDENTICAL SQL, #2 ...]
) AS "image install" [ 61, 6, 2022-05-11T01:16:46.578Z, null, 'Downloaded', 2, 122, 1, 1 ]
[... SKIPPING IDENTICAL SQL, #3 ...]
) AS "image install" [ 61, 7, 2022-05-11T01:16:46.578Z, null, 'Downloaded', 2, 122, 1, 1 ]
[... SKIPPING IDENTICAL SQL, #4 ...]
) AS "image install" [ 61, 8, 2022-05-11T01:16:46.578Z, null, 'Downloaded', 2, 122, 1, 1 ]
[... SKIPPING IDENTICAL SQL, #5 ...]
) AS "image install" [ 61, 801, 2022-05-11T01:16:46.578Z, null, 'exited', 253, 122, 1, 1 ]
[... SKIPPING IDENTICAL SQL, #6 ...]
) AS "image install" [ 61, 802, 2022-05-11T01:16:46.578Z, null, 'Running', 253, 122, 1, 1 ]
[... SKIPPING IDENTICAL SQL, #7 ...]
) AS "image install" [ 61, 803, 2022-05-11T01:16:46.578Z, null, 'Running', 253, 122, 1, 1 ]
[... SKIPPING IDENTICAL SQL, #8 ...]
) AS "image install" [ 61, 804, 2022-05-11T01:16:46.578Z, null, 'Running', 253, 122, 1, 1 ]


Parsing GET /resin/device?$filter=(id in (61)) and (os_version eq null)&$select=id,is_of__device_type
Parsing GET /resin/device?$filter=(id in (61)) and (supervisor_version eq null)&$select=id&$expand=is_of__device_type($select=is_of__cpu_architecture,id)
Running GET /resin/device?$filter=(id in (61)) and (os_version eq null)&$select=id,is_of__device_type
Running GET /resin/device?$filter=(id in (61)) and (supervisor_version eq null)&$select=id&$expand=is_of__device_type($select=is_of__cpu_architecture,id)
SELECT "device"."id", "device"."is of-device type" AS "is_of__device_type"
FROM (
        SELECT "device"."created at", "device"."modified at", "device"."id", "device"."actor", "device"."api heartbeat state", "device"."uuid", "device"."local id", "device"."device name", "device"."note", "device"."is of-device type", "device"."belongs to-application", "device"."is online", "device"."last connectivity event", "device"."is connected to vpn", "device"."last vpn event", "device"."is locked until-date", "device"."logs channel", "device"."public address", "device"."vpn address", "device"."ip address", "device"."mac address", "device"."memory usage", "device"."memory total", "device"."storage block device", "device"."storage usage", "device"."storage total", "device"."cpu usage", "device"."cpu temp", "device"."is undervolted", "device"."cpu id", "device"."is running-release", "device"."download progress", "device"."status", "device"."os version", "device"."os variant", "device"."supervisor version", "device"."provisioning progress", "device"."provisioning state", "device"."api port", "device"."api secret", "device"."is managed by-service instance", "device"."should be running-release", "device"."should be operated by-release", "device"."is managed by-device", "device"."should be managed by-release", 0 AS "is web accessible", CASE
                WHEN ("device"."status" IN ('Ordered', 'Preparing')
                        OR "device"."is online" = 0
                        AND "device"."status" = 'Shipped') THEN LOWER("device"."status")
                WHEN 1 = 0 THEN 'inactive'
                WHEN "device"."provisioning state" = 'Post-Provisioning' THEN 'post-provisioning'
                WHEN "device"."is online" = 0
                        AND "device"."api heartbeat state" IN ('offline', 'unknown')
                        AND "device"."last connectivity event" IS NULL THEN 'configuring'
                WHEN "device"."is online" = 0
                        AND "device"."api heartbeat state" IN ('offline', 'unknown') THEN 'offline'
                WHEN "device"."download progress" IS NOT NULL
                        AND "device"."status" = 'Downloading' THEN 'updating'
                WHEN "device"."provisioning progress" IS NOT NULL THEN 'configuring'
                WHEN EXISTS (
                                SELECT 1
                                FROM "image install"
                                WHERE "image install"."device" = "device"."id"
                                AND "image install"."download progress" IS NOT NULL
                                AND "image install"."status" = 'Downloading'
                        ) THEN 'updating'
                ELSE 'idle'
        END AS "overall status", CASE
                WHEN ("device"."status" IN ('Ordered', 'Preparing')
                        OR "device"."is online" = 0
                        AND "device"."status" = 'Shipped'
                        OR 1 = 0) THEN NULL
                WHEN "device"."provisioning state" = 'Post-Provisioning' THEN "device"."provisioning progress"
                WHEN "device"."is online" = 0
                        AND "device"."api heartbeat state" IN ('offline', 'unknown')
                        AND "device"."last connectivity event" IS NULL THEN "device"."provisioning progress"
                WHEN "device"."is online" = 0
                        AND "device"."api heartbeat state" IN ('offline', 'unknown') THEN NULL
                WHEN "device"."download progress" IS NOT NULL
                        AND "device"."status" = 'Downloading' THEN "device"."download progress"
                WHEN "device"."provisioning progress" IS NOT NULL THEN "device"."provisioning progress"
                WHEN EXISTS (
                                SELECT 1
                                FROM "image install"
                                WHERE "image install"."device" = "device"."id"
                                AND "image install"."download progress" IS NOT NULL
                                AND "image install"."status" = 'Downloading'
                        ) THEN (
                        SELECT CAST(ROUND(AVG(COALESCE("image install"."download progress", 100))) AS INTEGER)
                        FROM "image install"
                        WHERE "image install"."device" = "device"."id"
                        AND "image install"."status" != 'deleted'
                        AND ("image install"."status" = 'Downloading'
                        OR "image install"."is provided by-release" = COALESCE("device"."should be running-release", (
                                SELECT "application"."should be running-release"
                                FROM "application"
                                WHERE "device"."belongs to-application" = "application"."id"
                        )))
                )
                ELSE NULL
        END AS "overall progress"
        FROM "device"
        WHERE (("device"."actor") IS NOT NULL AND ("device"."actor") = ($1)
        OR EXISTS (
                SELECT 1
                FROM "device" AS "device.is managed by-device"
                WHERE "device"."is managed by-device" = "device.is managed by-device"."id"
                AND (("device.is managed by-device"."actor") IS NOT NULL AND ("device.is managed by-device"."actor") = ($1)
                OR EXISTS (
                        SELECT 1
                        FROM "device" AS "device.is managed by-device.is managed by-device"
                        WHERE "device.is managed by-device"."is managed by-device" = "device.is managed by-device.is managed by-device"."id"
                        AND 1 = 0
                )
                OR EXISTS (
                        SELECT 1
                        FROM "application" AS "device.is managed by-device.belongs to-application"
                        WHERE "device.is managed by-device"."belongs to-application" = "device.is managed by-device.belongs to-application"."id"
                        AND EXISTS (
                                SELECT 1
                                FROM "application" AS "dev.is managed by-dev.bel to-appl.depends on-application"
                                WHERE "device.is managed by-device.belongs to-application"."depends on-application" = "dev.is managed by-dev.bel to-appl.depends on-application"."id"
                                AND EXISTS (
                                        SELECT 1
                                        FROM "device" AS "dev.is managed by-dev.bel to-appl.depends on-appl.owns-device"
                                        WHERE "dev.is managed by-dev.bel to-appl.depends on-application"."id" = "dev.is managed by-dev.bel to-appl.depends on-appl.owns-device"."belongs to-application"
                                        AND ("dev.is managed by-dev.bel to-appl.depends on-appl.owns-device"."actor") IS NOT NULL AND ("dev.is managed by-dev.bel to-appl.depends on-appl.owns-device"."actor") = ($1)
                                )
                        )
                ))
        )
        OR EXISTS (
                SELECT 1
                FROM "application" AS "device.belongs to-application"
                WHERE "device"."belongs to-application" = "device.belongs to-application"."id"
                AND EXISTS (
                        SELECT 1
                        FROM "application" AS "device.belongs to-application.depends on-application"
                        WHERE "device.belongs to-application"."depends on-application" = "device.belongs to-application.depends on-application"."id"
                        AND EXISTS (
                                SELECT 1
                                FROM "device" AS "dev.belongs to-application.depends on-application.owns-device"
                                WHERE "device.belongs to-application.depends on-application"."id" = "dev.belongs to-application.depends on-application.owns-device"."belongs to-application"
                                AND ("dev.belongs to-application.depends on-application.owns-device"."actor") IS NOT NULL AND ("dev.belongs to-application.depends on-application.owns-device"."actor") = ($1)
                        )
                )
        ))
) AS "device"
WHERE "device"."id" IN ($2)
AND "device"."os version" IS NULL [ 122, 61 ]


SELECT (
        SELECT coalesce(array_to_json(array_agg("device.is of-device type".*)), '[]') AS "is_of__device_type"
        FROM (
                SELECT "device.is of-device type"."is of-cpu architecture" AS "is_of__cpu_architecture", "device.is of-device type"."id"
                FROM (
                        SELECT "device type"."created at", "device type"."modified at", "device type"."id", "device type"."slug", "device type"."name", "device type"."is of-cpu architecture", "device type"."logo", "device type"."contract", "device type"."belongs to-device family"
                        FROM "device type"
                ) AS "device.is of-device type"
                WHERE "device"."is of-device type" = "device.is of-device type"."id"
        ) AS "device.is of-device type"
) AS "is_of__device_type", "device"."id"
FROM (
        SELECT "device"."created at", "device"."modified at", "device"."id", "device"."actor", "device"."api heartbeat state", "device"."uuid", "device"."local id", "device"."device name", "device"."note", "device"."is of-device type", "device"."belongs to-application", "device"."is online", "device"."last connectivity event", "device"."is connected to vpn", "device"."last vpn event", "device"."is locked until-date", "device"."logs channel", "device"."public address", "device"."vpn address", "device"."ip address", "device"."mac address", "device"."memory usage", "device"."memory total", "device"."storage block device", "device"."storage usage", "device"."storage total", "device"."cpu usage", "device"."cpu temp", "device"."is undervolted", "device"."cpu id", "device"."is running-release", "device"."download progress", "device"."status", "device"."os version", "device"."os variant", "device"."supervisor version", "device"."provisioning progress", "device"."provisioning state", "device"."api port", "device"."api secret", "device"."is managed by-service instance", "device"."should be running-release", "device"."should be operated by-release", "device"."is managed by-device", "device"."should be managed by-release", 0 AS "is web accessible", CASE
                WHEN ("device"."status" IN ('Ordered', 'Preparing')
                        OR "device"."is online" = 0
                        AND "device"."status" = 'Shipped') THEN LOWER("device"."status")
                WHEN 1 = 0 THEN 'inactive'
                WHEN "device"."provisioning state" = 'Post-Provisioning' THEN 'post-provisioning'
                WHEN "device"."is online" = 0
                        AND "device"."api heartbeat state" IN ('offline', 'unknown')
                        AND "device"."last connectivity event" IS NULL THEN 'configuring'
                WHEN "device"."is online" = 0
                        AND "device"."api heartbeat state" IN ('offline', 'unknown') THEN 'offline'
                WHEN "device"."download progress" IS NOT NULL
                        AND "device"."status" = 'Downloading' THEN 'updating'
                WHEN "device"."provisioning progress" IS NOT NULL THEN 'configuring'
                WHEN EXISTS (
                                SELECT 1
                                FROM "image install"
                                WHERE "image install"."device" = "device"."id"
                                AND "image install"."download progress" IS NOT NULL
                                AND "image install"."status" = 'Downloading'
                        ) THEN 'updating'
                ELSE 'idle'
        END AS "overall status", CASE
                WHEN ("device"."status" IN ('Ordered', 'Preparing')
                        OR "device"."is online" = 0
                        AND "device"."status" = 'Shipped'
                        OR 1 = 0) THEN NULL
                WHEN "device"."provisioning state" = 'Post-Provisioning' THEN "device"."provisioning progress"
                WHEN "device"."is online" = 0
                        AND "device"."api heartbeat state" IN ('offline', 'unknown')
                        AND "device"."last connectivity event" IS NULL THEN "device"."provisioning progress"
                WHEN "device"."is online" = 0
                        AND "device"."api heartbeat state" IN ('offline', 'unknown') THEN NULL
                WHEN "device"."download progress" IS NOT NULL
                        AND "device"."status" = 'Downloading' THEN "device"."download progress"
                WHEN "device"."provisioning progress" IS NOT NULL THEN "device"."provisioning progress"
                WHEN EXISTS (
                                SELECT 1
                                FROM "image install"
                                WHERE "image install"."device" = "device"."id"
                                AND "image install"."download progress" IS NOT NULL
                                AND "image install"."status" = 'Downloading'
                        ) THEN (
                        SELECT CAST(ROUND(AVG(COALESCE("image install"."download progress", 100))) AS INTEGER)
                        FROM "image install"
                        WHERE "image install"."device" = "device"."id"
                        AND "image install"."status" != 'deleted'
                        AND ("image install"."status" = 'Downloading'
                        OR "image install"."is provided by-release" = COALESCE("device"."should be running-release", (
                                SELECT "application"."should be running-release"
                                FROM "application"
                                WHERE "device"."belongs to-application" = "application"."id"
                        )))
                )
                ELSE NULL
        END AS "overall progress"
        FROM "device"
        WHERE (("device"."actor") IS NOT NULL AND ("device"."actor") = ($1)
        OR EXISTS (
                SELECT 1
                FROM "device" AS "device.is managed by-device"
                WHERE "device"."is managed by-device" = "device.is managed by-device"."id"
                AND (("device.is managed by-device"."actor") IS NOT NULL AND ("device.is managed by-device"."actor") = ($1)
                OR EXISTS (
                        SELECT 1
                        FROM "device" AS "device.is managed by-device.is managed by-device"
                        WHERE "device.is managed by-device"."is managed by-device" = "device.is managed by-device.is managed by-device"."id"
                        AND 1 = 0
                )
                OR EXISTS (
                        SELECT 1
                        FROM "application" AS "device.is managed by-device.belongs to-application"
                        WHERE "device.is managed by-device"."belongs to-application" = "device.is managed by-device.belongs to-application"."id"
                        AND EXISTS (
                                SELECT 1
                                FROM "application" AS "dev.is managed by-dev.bel to-appl.depends on-application"
                                WHERE "device.is managed by-device.belongs to-application"."depends on-application" = "dev.is managed by-dev.bel to-appl.depends on-application"."id"
                                AND EXISTS (
                                        SELECT 1
                                        FROM "device" AS "dev.is managed by-dev.bel to-appl.depends on-appl.owns-device"
                                        WHERE "dev.is managed by-dev.bel to-appl.depends on-application"."id" = "dev.is managed by-dev.bel to-appl.depends on-appl.owns-device"."belongs to-application"
                                        AND ("dev.is managed by-dev.bel to-appl.depends on-appl.owns-device"."actor") IS NOT NULL AND ("dev.is managed by-dev.bel to-appl.depends on-appl.owns-device"."actor") = ($1)
                                )
                        )
                ))
        )
        OR EXISTS (
                SELECT 1
                FROM "application" AS "device.belongs to-application"
                WHERE "device"."belongs to-application" = "device.belongs to-application"."id"
                AND EXISTS (
                        SELECT 1
                        FROM "application" AS "device.belongs to-application.depends on-application"
                        WHERE "device.belongs to-application"."depends on-application" = "device.belongs to-application.depends on-application"."id"
                        AND EXISTS (
                                SELECT 1
                                FROM "device" AS "dev.belongs to-application.depends on-application.owns-device"
                                WHERE "device.belongs to-application.depends on-application"."id" = "dev.belongs to-application.depends on-application.owns-device"."belongs to-application"
                                AND ("dev.belongs to-application.depends on-application.owns-device"."actor") IS NOT NULL AND ("dev.belongs to-application.depends on-application.owns-device"."actor") = ($1)
                        )
                )
        ))
) AS "device"
WHERE "device"."id" IN ($2)
AND "device"."supervisor version" IS NULL [ 122, 61 ]


Insert ID:  image_install 5764704
Insert ID:  image_install 5764705
Insert ID:  image_install 5764706
Insert ID:  image_install 5764707
Insert ID:  image_install 5764708
DatabaseError: Rolling back transaction
    at PostgresTx._rollback (/usr/src/app/node_modules/@_balena/pinejs/src/database-layer/db.ts:587:19)
    at PostgresTx.rollback (/usr/src/app/node_modules/@_balena/pinejs/src/database-layer/db.ts:341:25)
    at Object.transaction (/usr/src/app/node_modules/@_balena/pinejs/src/database-layer/db.ts:433:15)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async statePatchV2 (/usr/src/app/src/features/device-state/routes/state-patch-v2.ts:384:4)
[followed by 3 repetitions of the same DatabaseError stack message]

EDIT: followed immediately by a 401 error, even after setting the api secret column:

2022-05-11T20:12:41.844Z 207.81.194.15 a/85 PATCH /device/v2/964534cf6773f131fa75a370fdc99495/state 401 29.531ms -

The devices with issues are also the only ones without values in the api secret column. I did not need to enter the secret for the others, so not sure if filling this hash in manually makes any difference.
Edit: possibly helped with a few, mostly not.

After burrowing into @balena/pinejs, I found that the rollback originates in src/database-layer/db.ts, in createTransaction(), where the supplied createFunc handle is throwing:

UnauthorizedError
    at convertToHttpError (/usr/src/app/node_modules/balena/pinejs/src/sbvr-api/sbvr-utils.ts:1198:10)
    at /usr/src/app/node_modules/balena/pinejs/src/sbvr-api/sbvr-utils.ts:1082:14
    at Array.map (<anonymous>)
    at /usr/src/app/node_modules/balena/pinejs/src/sbvr-api/sbvr-utils.ts:1079:30
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async runURI (/usr/src/app/node_modules/balena/pinejs/src/sbvr-api/sbvr-utils.ts:856:21)
    at async PinejsClient._request (/usr/src/app/node_modules/balena/pinejs/src/sbvr-api/sbvr-utils.ts:792:11)
    at async PinejsClient.callWithRetry (/usr/src/app/node_modules/pinejs-client-core/src/index.ts:884:11)
    at async upsertImageInstall (/usr/src/app/src/features/device-state/state-patch-utils.ts:141:3)
    at async /usr/src/app/src/features/device-state/routes/state-patch-v2.ts:310:8
    at async Promise.all (index 5)
    at async /usr/src/app/src/features/device-state/routes/state-patch-v2.ts:308:6
    at async Promise.all (index 1)
    at async /usr/src/app/src/features/device-state/routes/state-patch-v2.ts:389:5
    at async Object.transaction (/usr/src/app/node_modules/balena/pinejs/src/database-layer/db.ts:428:20)
    at async statePatchV2 (/usr/src/app/src/features/device-state/routes/state-patch-v2.ts:384:4) {

which leads back to upsertImageInstall(). I’ll keep looking there. It would be great if PineJS was more transparent in relaying the cause of the rollback.

I tracked the UnauthorizedError back to a PermissionError thrown in PineJS's src/sbvr-api/sbvr-utils.ts. Still tracking back from there. If necessary, blocking the rollback (and overriding the lockfiles) allows devices to upgrade successfully.

Hello,
thanks for sharing so much information, here and in your other debugging thread. Our community appreciates it very much!

I’d like to understand the situation better:

  • You have created a new openBalena server (what version and what was the old version?)
  • You are using the same domain on the device
  • I couldn’t tell from your description how the devices were moved to the new server instance: did you create new config.json files from the new instance and copy them to the devices?
  • Did you only change database entries to enable the old devices, or did you also change something on the devices, such as config.json or certificates?
  • Can you give more details about the differences between the working and non-working devices? Are they running the same balenaOS and supervisor versions?
  • Have the devices been pinned to different releases before?

My current guess is that there is a missing relationship between the device ID, the device actor ID, the release and the images. I can see that the device ID is 61 and the actor for these failing SQL requests is 122. Can you please double-check the database entries in the device table and the actor table to confirm that both exist and are linked properly?
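To make that chain of relationships concrete, here is a rough sketch in Python that rebuilds a cut-down version of the tables in an in-memory SQLite database and runs each link of the logged INSERT's WHERE clause as a separate check. The table and column names are copied from the logged SQL, but the schema is heavily simplified, the "is managed by-device" branch is left out, and the demo rows are made up (they only reuse IDs from the log). Against the real PostgreSQL database, the same SELECT statements could be adapted and run by hand.

```python
import sqlite3

# Simplified stand-ins for the tables named in the logged INSERT.
# Only the columns touched by the permission filter are modelled.
SCHEMA = """
CREATE TABLE "application" ("id" INTEGER PRIMARY KEY, "is public" INTEGER, "is host" INTEGER);
CREATE TABLE "device" ("id" INTEGER PRIMARY KEY, "actor" INTEGER, "belongs to-application" INTEGER);
CREATE TABLE "release" ("id" INTEGER PRIMARY KEY, "belongs to-application" INTEGER);
CREATE TABLE "image" ("id" INTEGER PRIMARY KEY);
CREATE TABLE "image-is part of-release" ("image" INTEGER, "is part of-release" INTEGER);
"""

# Each check mirrors one EXISTS level of the logged WHERE clause.
CHECKS = [
    ("device is owned by actor",
     'SELECT 1 FROM "device" WHERE "id" = :device AND "actor" = :actor'),
    ("image exists",
     'SELECT 1 FROM "image" WHERE "id" = :image'),
    ("image is part of a release", """
        SELECT 1 FROM "image-is part of-release" AS ipr
        JOIN "release" AS r ON r."id" = ipr."is part of-release"
        WHERE ipr."image" = :image"""),
    ("release's application is visible to actor", """
        SELECT 1 FROM "image-is part of-release" AS ipr
        JOIN "release" AS r ON r."id" = ipr."is part of-release"
        JOIN "application" AS a ON a."id" = r."belongs to-application"
        WHERE ipr."image" = :image
        AND (a."is public" = 1 OR a."is host" = 1 OR EXISTS (
            SELECT 1 FROM "device" AS d
            WHERE d."belongs to-application" = a."id" AND d."actor" = :actor))"""),
]


def run_checks(conn, device, image, actor):
    """Return {check name: True/False} for one device/image/actor combination."""
    params = {"device": device, "image": image, "actor": actor}
    return {name: conn.execute(sql, params).fetchone() is not None
            for name, sql in CHECKS}


def demo_connection():
    """Build an in-memory database with made-up rows; image 802 lacks its release link."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executescript("""
        INSERT INTO "application" VALUES (1, 0, 0);
        INSERT INTO "device" VALUES (61, 122, 1);
        INSERT INTO "release" VALUES (253, 1);
        INSERT INTO "image" VALUES (801), (802);
        INSERT INTO "image-is part of-release" VALUES (801, 253);
    """)
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    for image in (801, 802):
        print(image, run_checks(conn, device=61, image=image, actor=122))
```

The first check that comes back False for a failing device/image pair points at the table where a row or foreign key is probably missing. Note that the logged filter does not appear to compare the "is provided by-release" value itself; it only requires the image to be part of some release whose application the actor can see.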

I can also see the data that should be updated into the database:


) AS "image install" [ 61, 5, 2022-05-11T01:16:46.578Z, null, 'Downloaded', 2, 122, 1, 1 ]
) AS "image install" [ 61, 6, 2022-05-11T01:16:46.578Z, null, 'Downloaded', 2, 122, 1, 1 ]
) AS "image install" [ 61, 7, 2022-05-11T01:16:46.578Z, null, 'Downloaded', 2, 122, 1, 1 ]
) AS "image install" [ 61, 8, 2022-05-11T01:16:46.578Z, null, 'Downloaded', 2, 122, 1, 1 ]
) AS "image install" [ 61, 801, 2022-05-11T01:16:46.578Z, null, 'exited', 253, 122, 1, 1 ]
) AS "image install" [ 61, 802, 2022-05-11T01:16:46.578Z, null, 'Running', 253, 122, 1, 1 ]
) AS "image install" [ 61, 803, 2022-05-11T01:16:46.578Z, null, 'Running', 253, 122, 1, 1 ]
) AS "image install" [ 61, 804, 2022-05-11T01:16:46.578Z, null, 'Running', 253, 122, 1, 1 ]

Here 61 is the device ID, the second value is the image ID, 122 is the actor ID, and the release IDs are 2 and 253. Please make sure that these combinations are valid in the database, i.e. that device 61, acting as actor 122, can be linked to these images and releases.
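When there are many of these lines to go through, a small helper can label each value. The sketch below is in Python and assumes the parameter order visible in the logged INSERT: $1 to $6 are the inserted columns, $7 is the actor used in the permission filter, and $8 and $9 are the values compared against "is public" and "is host". It also assumes that no value contains a comma, which holds for the lines shown here.

```python
import re

# Parameter order read off the logged INSERT statement ($1 .. $9).
PARAM_NAMES = [
    "device", "installs-image", "install date", "download progress",
    "status", "is provided by-release", "actor", "is public", "is host",
]


def _convert(item):
    """Turn one logged value into None, int, or str."""
    if item == "null":
        return None
    if item.startswith("'") and item.endswith("'"):
        return item[1:-1]          # quoted string such as 'Running'
    if re.fullmatch(r"-?\d+", item):
        return int(item)
    return item                    # e.g. the ISO timestamp, kept as text


def parse_bindings(line):
    """Map the bracketed bind-parameter list at the end of a log line to names."""
    match = re.search(r"\[(.*)\]", line)
    if match is None:
        raise ValueError("no bind-parameter list found")
    raw = [item.strip() for item in match.group(1).split(",")]
    if len(raw) != len(PARAM_NAMES):
        raise ValueError(f"expected {len(PARAM_NAMES)} values, got {len(raw)}")
    return dict(zip(PARAM_NAMES, (_convert(item) for item in raw)))


if __name__ == "__main__":
    line = ") AS \"image install\" [ 61, 801, 2022-05-11T01:16:46.578Z, null, 'exited', 253, 122, 1, 1 ]"
    for name, value in parse_bindings(line).items():
        print(f"{name}: {value!r}")
```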

You could also try running this command from the failing device's hostOS terminal to check connectivity to the endpoint, and to confirm that the device can identify itself as a proper device and fetch its own data:

curl -H "Authorization: Bearer $(cat /mnt/boot/config.json | jq -r .deviceApiKey)" $(cat /mnt/boot/config.json | jq -r .apiEndpoint)/device

The following command, run from the failing device's hostOS terminal, retrieves the device target state. Please compare that state with the release and image IDs on the balena instance:

curl -H "Authorization: Bearer $(cat /mnt/boot/config.json | jq -r .deviceApiKey)" $(cat /mnt/boot/config.json | jq -r .apiEndpoint)/device/v2/$(cat /mnt/boot/config.json | jq -r .uuid)/state | jq
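If it is easier to test from a workstation that holds a copy of a device's config.json, the same state request can be sketched in Python with only the standard library. This is a minimal sketch, not a replacement for the curl commands: it assumes the config.json contains the same apiEndpoint, uuid and deviceApiKey fields that the commands above read, and the function names are made up for illustration.

```python
import json
import urllib.error
import urllib.request


def load_config(path="/mnt/boot/config.json"):
    """Read a device config.json (or a copy of it) into a dict."""
    with open(path) as fh:
        return json.load(fh)


def build_state_request(config):
    """Build the same GET request as the second curl command above."""
    endpoint = config["apiEndpoint"].rstrip("/")
    url = f"{endpoint}/device/v2/{config['uuid']}/state"
    headers = {"Authorization": f"Bearer {config['deviceApiKey']}"}
    return urllib.request.Request(url, headers=headers)


def fetch_state(config, timeout=10):
    """Return (HTTP status, parsed JSON or None); HTTP errors such as 401 are not raised."""
    try:
        with urllib.request.urlopen(build_state_request(config), timeout=timeout) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        return err.code, None


if __name__ == "__main__":
    status, state = fetch_state(load_config())
    print(status)
    print(json.dumps(state, indent=2))
```

A 401 here, with the key taken straight from config.json, would suggest the server does not recognise the device API key, while a 200 with a sensible target state would point back at the state PATCH path instead.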

To proceed with this, I suggest that you create new config.json files from the new openBalena instance and, if you have remote access to the devices, let each device ‘re-provision’ to the new instance. An ‘easy’ way to do this is to change the balena API endpoint in the config.json and restart the supervisor on the device. The supervisor will detect that the API endpoint has changed, delete existing data from the device (images, volumes, …) and try to re-provision. Then you change it back to the ‘real’ endpoint and restart the supervisor again.

Do you have physical access to the devices, and could you perform a balena join as described in the balena CLI documentation?

Best Regards
Harald


Thank you Harald! Here are initial answers and I will match up the IDs next.

  • Same server, after following an ill-advised suggestion to delete some volumes and re-run the setup script, then finding out that the backups were useless. Took the opportunity to upgrade v3.4.0 → v3.6.0 and improve the backup system.
  • Same domain.
  • Same server instance and using a copy of the original config.json.
  • Database entries were recreated from scratch with open-balena-api, which took care of most associations. No changes on the devices, which are mostly deployed.
  • All devices run one of two images, balenaOS 2.47.0+rev1-dev or balenaOS 2.77.0+rev1-dev, and there is no correlation between image and connectivity. As you guessed, any difference is likely in the database.
  • All devices were previously pinned to a now-lost release of the application. All are now pinned to a new release on the application with the same name.

Difficult physical access, and this began with a server-side problem, so I am focusing on server-side changes first.

Thanks, device/actor IDs definitely match. A few devices’ updates were stuck due to missing service install records, but after filling those in and restarting the open-balena-api container again, even the log streaming is authorized, and it has been possible to upgrade everything as long as the rollback is blocked. The settings are close to perfect, and I am hoping to figure the rollbacks out at the database level, if I can find which query handles authorization and log it.

The first curl command returned a 404 error (“Cannot GET device”) on the device I tried, but at least it reached the server.

The second command returns JSON output. This fits with what I’ve seen: the devices are trying to upgrade and only failing because of server-side errors.