First-time app upgrades failing: "DatabaseError: Rolling back transaction"

I have a fleet of devices connected to a different openBalena instance than the one they originally registered on (background here). Most devices have upgraded to the new releases (4 images in each) that were deployed to this server and are reporting state information. The remaining devices are “online” and “have internet connectivity”, but produce a lot of logs like the following. Any advice on how to debug this?

INSERT INTO "image install" ("device", "installs-image", "install date", "download progress", "status", "is provided by-release")
SELECT "image install"."device", "image install"."installs-image", "image install"."install date", "image install"."download progress", "image install"."status", "image install"."is provided by-release"
FROM (
        SELECT "image install"."created at", "image install"."modified at", "image install"."device", "image install"."installs-image", "image install"."id", "image install"."install date", "image install"."download progress", "image install"."status", "image install"."is provided by-release"
        FROM (
                SELECT CAST(NULL AS TIMESTAMP) AS "created at", CAST(NULL AS TIMESTAMP) AS "modified at", CAST($1 AS INTEGER) AS "device", CAST($2 AS INTEGER) AS "installs-image", CAST(NULL AS INTEGER) AS "id", CAST($3 AS TIMESTAMP) AS "install date", CAST($4 AS INTEGER) AS "download progress", CAST($5 AS VARCHAR(255)) AS "status", CAST($6 AS INTEGER) AS "is provided by-release"
        ) AS "image install"
        WHERE EXISTS (
                SELECT 1
                FROM "device" AS "image install.device"
                WHERE "image install"."device" = "image install.device"."id"
                AND (("image install.device"."actor") IS NOT NULL AND ("image install.device"."actor") = ($7)
                OR EXISTS (
                        SELECT 1
                        FROM "device" AS "image install.device.is managed by-device"
                        WHERE "image install.device"."is managed by-device" = "image install.device.is managed by-device"."id"
                        AND ("image install.device.is managed by-device"."actor") IS NOT NULL AND ("image install.device.is managed by-device"."actor") = ($7)
                ))
        )
        AND EXISTS (
                SELECT 1
                FROM "image" AS "image install.installs-image"
                WHERE "image install"."installs-image" = "image install.installs-image"."id"
                AND EXISTS (
                        SELECT 1
                        FROM "image-is part of-release" AS "image install.installs-image.image-is part of-release"
                        WHERE "image install.installs-image"."id" = "image install.installs-image.image-is part of-release"."image"
                        AND EXISTS (
                                SELECT 1
                                FROM "release" AS "im install.installs-im.im-is par of-rel.is part of-release"
                                WHERE "image install.installs-image.image-is part of-release"."is part of-release" = "im install.installs-im.im-is par of-rel.is part of-release"."id"
                                AND EXISTS (
                                        SELECT 1
                                        FROM "application" AS "arxwle$l.installs-im.im-is par of-rel.is par of-rel.bel to-appl"
                                        WHERE "im install.installs-im.im-is par of-rel.is part of-release"."belongs to-application" = "arxwle$l.installs-im.im-is par of-rel.is par of-rel.bel to-appl"."id"
                                        AND (EXISTS (
                                                SELECT 1
                                                FROM "device" AS "ppae5y$lls-im.im-is par of-rel.is par of-rel.bel to-appl.ow-dev"
                                                WHERE "arxwle$l.installs-im.im-is par of-rel.is par of-rel.bel to-appl"."id" = "ppae5y$lls-im.im-is par of-rel.is par of-rel.bel to-appl.ow-dev"."belongs to-application"
                                                AND ("ppae5y$lls-im.im-is par of-rel.is par of-rel.bel to-appl.ow-dev"."actor") IS NOT NULL AND ("ppae5y$lls-im.im-is par of-rel.is par of-rel.bel to-appl.ow-dev"."actor") = ($7)
                                        )
                                        OR ("arxwle$l.installs-im.im-is par of-rel.is par of-rel.bel to-appl"."is public") IS NOT NULL AND ("arxwle$l.installs-im.im-is par of-rel.is par of-rel.bel to-appl"."is public") = ($8)
                                        OR ("arxwle$l.installs-im.im-is par of-rel.is par of-rel.bel to-appl"."is host") IS NOT NULL AND ("arxwle$l.installs-im.im-is par of-rel.is par of-rel.bel to-appl"."is host") = ($9))
                                )
                        )
                )
        )
) AS "image install" [ 61, 5, 2022-05-11T01:16:46.578Z, null, 'Downloaded', 2, 122, 1, 1 ]
[... SKIPPING IDENTICAL SQL, #2 ...]
) AS "image install" [ 61, 6, 2022-05-11T01:16:46.578Z, null, 'Downloaded', 2, 122, 1, 1 ]
[... SKIPPING IDENTICAL SQL, #3 ...]
) AS "image install" [ 61, 7, 2022-05-11T01:16:46.578Z, null, 'Downloaded', 2, 122, 1, 1 ]
[... SKIPPING IDENTICAL SQL, #4 ...]
) AS "image install" [ 61, 8, 2022-05-11T01:16:46.578Z, null, 'Downloaded', 2, 122, 1, 1 ]
[... SKIPPING IDENTICAL SQL, #5 ...]
) AS "image install" [ 61, 801, 2022-05-11T01:16:46.578Z, null, 'exited', 253, 122, 1, 1 ]
[... SKIPPING IDENTICAL SQL, #6 ...]
) AS "image install" [ 61, 802, 2022-05-11T01:16:46.578Z, null, 'Running', 253, 122, 1, 1 ]
[... SKIPPING IDENTICAL SQL, #7 ...]
) AS "image install" [ 61, 803, 2022-05-11T01:16:46.578Z, null, 'Running', 253, 122, 1, 1 ]
[... SKIPPING IDENTICAL SQL, #8 ...]
) AS "image install" [ 61, 804, 2022-05-11T01:16:46.578Z, null, 'Running', 253, 122, 1, 1 ]


Parsing GET /resin/device?$filter=(id in (61)) and (os_version eq null)&$select=id,is_of__device_type
Parsing GET /resin/device?$filter=(id in (61)) and (supervisor_version eq null)&$select=id&$expand=is_of__device_type($select=is_of__cpu_architecture,id)
Running GET /resin/device?$filter=(id in (61)) and (os_version eq null)&$select=id,is_of__device_type
Running GET /resin/device?$filter=(id in (61)) and (supervisor_version eq null)&$select=id&$expand=is_of__device_type($select=is_of__cpu_architecture,id)
SELECT "device"."id", "device"."is of-device type" AS "is_of__device_type"
FROM (
        SELECT "device"."created at", "device"."modified at", "device"."id", "device"."actor", "device"."api heartbeat state", "device"."uuid", "device"."local id", "device"."device name", "device"."note", "device"."is of-device type", "device"."belongs to-application", "device"."is online", "device"."last connectivity event", "device"."is connected to vpn", "device"."last vpn event", "device"."is locked until-date", "device"."logs channel", "device"."public address", "device"."vpn address", "device"."ip address", "device"."mac address", "device"."memory usage", "device"."memory total", "device"."storage block device", "device"."storage usage", "device"."storage total", "device"."cpu usage", "device"."cpu temp", "device"."is undervolted", "device"."cpu id", "device"."is running-release", "device"."download progress", "device"."status", "device"."os version", "device"."os variant", "device"."supervisor version", "device"."provisioning progress", "device"."provisioning state", "device"."api port", "device"."api secret", "device"."is managed by-service instance", "device"."should be running-release", "device"."should be operated by-release", "device"."is managed by-device", "device"."should be managed by-release", 0 AS "is web accessible", CASE
                WHEN ("device"."status" IN ('Ordered', 'Preparing')
                        OR "device"."is online" = 0
                        AND "device"."status" = 'Shipped') THEN LOWER("device"."status")
                WHEN 1 = 0 THEN 'inactive'
                WHEN "device"."provisioning state" = 'Post-Provisioning' THEN 'post-provisioning'
                WHEN "device"."is online" = 0
                        AND "device"."api heartbeat state" IN ('offline', 'unknown')
                        AND "device"."last connectivity event" IS NULL THEN 'configuring'
                WHEN "device"."is online" = 0
                        AND "device"."api heartbeat state" IN ('offline', 'unknown') THEN 'offline'
                WHEN "device"."download progress" IS NOT NULL
                        AND "device"."status" = 'Downloading' THEN 'updating'
                WHEN "device"."provisioning progress" IS NOT NULL THEN 'configuring'
                WHEN EXISTS (
                                SELECT 1
                                FROM "image install"
                                WHERE "image install"."device" = "device"."id"
                                AND "image install"."download progress" IS NOT NULL
                                AND "image install"."status" = 'Downloading'
                        ) THEN 'updating'
                ELSE 'idle'
        END AS "overall status", CASE
                WHEN ("device"."status" IN ('Ordered', 'Preparing')
                        OR "device"."is online" = 0
                        AND "device"."status" = 'Shipped'
                        OR 1 = 0) THEN NULL
                WHEN "device"."provisioning state" = 'Post-Provisioning' THEN "device"."provisioning progress"
                WHEN "device"."is online" = 0
                        AND "device"."api heartbeat state" IN ('offline', 'unknown')
                        AND "device"."last connectivity event" IS NULL THEN "device"."provisioning progress"
                WHEN "device"."is online" = 0
                        AND "device"."api heartbeat state" IN ('offline', 'unknown') THEN NULL
                WHEN "device"."download progress" IS NOT NULL
                        AND "device"."status" = 'Downloading' THEN "device"."download progress"
                WHEN "device"."provisioning progress" IS NOT NULL THEN "device"."provisioning progress"
                WHEN EXISTS (
                                SELECT 1
                                FROM "image install"
                                WHERE "image install"."device" = "device"."id"
                                AND "image install"."download progress" IS NOT NULL
                                AND "image install"."status" = 'Downloading'
                        ) THEN (
                        SELECT CAST(ROUND(AVG(COALESCE("image install"."download progress", 100))) AS INTEGER)
                        FROM "image install"
                        WHERE "image install"."device" = "device"."id"
                        AND "image install"."status" != 'deleted'
                        AND ("image install"."status" = 'Downloading'
                        OR "image install"."is provided by-release" = COALESCE("device"."should be running-release", (
                                SELECT "application"."should be running-release"
                                FROM "application"
                                WHERE "device"."belongs to-application" = "application"."id"
                        )))
                )
                ELSE NULL
        END AS "overall progress"
        FROM "device"
        WHERE (("device"."actor") IS NOT NULL AND ("device"."actor") = ($1)
        OR EXISTS (
                SELECT 1
                FROM "device" AS "device.is managed by-device"
                WHERE "device"."is managed by-device" = "device.is managed by-device"."id"
                AND (("device.is managed by-device"."actor") IS NOT NULL AND ("device.is managed by-device"."actor") = ($1)
                OR EXISTS (
                        SELECT 1
                        FROM "device" AS "device.is managed by-device.is managed by-device"
                        WHERE "device.is managed by-device"."is managed by-device" = "device.is managed by-device.is managed by-device"."id"
                        AND 1 = 0
                )
                OR EXISTS (
                        SELECT 1
                        FROM "application" AS "device.is managed by-device.belongs to-application"
                        WHERE "device.is managed by-device"."belongs to-application" = "device.is managed by-device.belongs to-application"."id"
                        AND EXISTS (
                                SELECT 1
                                FROM "application" AS "dev.is managed by-dev.bel to-appl.depends on-application"
                                WHERE "device.is managed by-device.belongs to-application"."depends on-application" = "dev.is managed by-dev.bel to-appl.depends on-application"."id"
                                AND EXISTS (
                                        SELECT 1
                                        FROM "device" AS "dev.is managed by-dev.bel to-appl.depends on-appl.owns-device"
                                        WHERE "dev.is managed by-dev.bel to-appl.depends on-application"."id" = "dev.is managed by-dev.bel to-appl.depends on-appl.owns-device"."belongs to-application"
                                        AND ("dev.is managed by-dev.bel to-appl.depends on-appl.owns-device"."actor") IS NOT NULL AND ("dev.is managed by-dev.bel to-appl.depends on-appl.owns-device"."actor") = ($1)
                                )
                        )
                ))
        )
        OR EXISTS (
                SELECT 1
                FROM "application" AS "device.belongs to-application"
                WHERE "device"."belongs to-application" = "device.belongs to-application"."id"
                AND EXISTS (
                        SELECT 1
                        FROM "application" AS "device.belongs to-application.depends on-application"
                        WHERE "device.belongs to-application"."depends on-application" = "device.belongs to-application.depends on-application"."id"
                        AND EXISTS (
                                SELECT 1
                                FROM "device" AS "dev.belongs to-application.depends on-application.owns-device"
                                WHERE "device.belongs to-application.depends on-application"."id" = "dev.belongs to-application.depends on-application.owns-device"."belongs to-application"
                                AND ("dev.belongs to-application.depends on-application.owns-device"."actor") IS NOT NULL AND ("dev.belongs to-application.depends on-application.owns-device"."actor") = ($1)
                        )
                )
        ))
) AS "device"
WHERE "device"."id" IN ($2)
AND "device"."os version" IS NULL [ 122, 61 ]


SELECT (
        SELECT coalesce(array_to_json(array_agg("device.is of-device type".*)), '[]') AS "is_of__device_type"
        FROM (
                SELECT "device.is of-device type"."is of-cpu architecture" AS "is_of__cpu_architecture", "device.is of-device type"."id"
                FROM (
                        SELECT "device type"."created at", "device type"."modified at", "device type"."id", "device type"."slug", "device type"."name", "device type"."is of-cpu architecture", "device type"."logo", "device type"."contract", "device type"."belongs to-device family"
                        FROM "device type"
                ) AS "device.is of-device type"
                WHERE "device"."is of-device type" = "device.is of-device type"."id"
        ) AS "device.is of-device type"
) AS "is_of__device_type", "device"."id"
FROM (
        SELECT "device"."created at", "device"."modified at", "device"."id", "device"."actor", "device"."api heartbeat state", "device"."uuid", "device"."local id", "device"."device name", "device"."note", "device"."is of-device type", "device"."belongs to-application", "device"."is online", "device"."last connectivity event", "device"."is connected to vpn", "device"."last vpn event", "device"."is locked until-date", "device"."logs channel", "device"."public address", "device"."vpn address", "device"."ip address", "device"."mac address", "device"."memory usage", "device"."memory total", "device"."storage block device", "device"."storage usage", "device"."storage total", "device"."cpu usage", "device"."cpu temp", "device"."is undervolted", "device"."cpu id", "device"."is running-release", "device"."download progress", "device"."status", "device"."os version", "device"."os variant", "device"."supervisor version", "device"."provisioning progress", "device"."provisioning state", "device"."api port", "device"."api secret", "device"."is managed by-service instance", "device"."should be running-release", "device"."should be operated by-release", "device"."is managed by-device", "device"."should be managed by-release", 0 AS "is web accessible", CASE
                WHEN ("device"."status" IN ('Ordered', 'Preparing')
                        OR "device"."is online" = 0
                        AND "device"."status" = 'Shipped') THEN LOWER("device"."status")
                WHEN 1 = 0 THEN 'inactive'
                WHEN "device"."provisioning state" = 'Post-Provisioning' THEN 'post-provisioning'
                WHEN "device"."is online" = 0
                        AND "device"."api heartbeat state" IN ('offline', 'unknown')
                        AND "device"."last connectivity event" IS NULL THEN 'configuring'
                WHEN "device"."is online" = 0
                        AND "device"."api heartbeat state" IN ('offline', 'unknown') THEN 'offline'
                WHEN "device"."download progress" IS NOT NULL
                        AND "device"."status" = 'Downloading' THEN 'updating'
                WHEN "device"."provisioning progress" IS NOT NULL THEN 'configuring'
                WHEN EXISTS (
                                SELECT 1
                                FROM "image install"
                                WHERE "image install"."device" = "device"."id"
                                AND "image install"."download progress" IS NOT NULL
                                AND "image install"."status" = 'Downloading'
                        ) THEN 'updating'
                ELSE 'idle'
        END AS "overall status", CASE
                WHEN ("device"."status" IN ('Ordered', 'Preparing')
                        OR "device"."is online" = 0
                        AND "device"."status" = 'Shipped'
                        OR 1 = 0) THEN NULL
                WHEN "device"."provisioning state" = 'Post-Provisioning' THEN "device"."provisioning progress"
                WHEN "device"."is online" = 0
                        AND "device"."api heartbeat state" IN ('offline', 'unknown')
                        AND "device"."last connectivity event" IS NULL THEN "device"."provisioning progress"
                WHEN "device"."is online" = 0
                        AND "device"."api heartbeat state" IN ('offline', 'unknown') THEN NULL
                WHEN "device"."download progress" IS NOT NULL
                        AND "device"."status" = 'Downloading' THEN "device"."download progress"
                WHEN "device"."provisioning progress" IS NOT NULL THEN "device"."provisioning progress"
                WHEN EXISTS (
                                SELECT 1
                                FROM "image install"
                                WHERE "image install"."device" = "device"."id"
                                AND "image install"."download progress" IS NOT NULL
                                AND "image install"."status" = 'Downloading'
                        ) THEN (
                        SELECT CAST(ROUND(AVG(COALESCE("image install"."download progress", 100))) AS INTEGER)
                        FROM "image install"
                        WHERE "image install"."device" = "device"."id"
                        AND "image install"."status" != 'deleted'
                        AND ("image install"."status" = 'Downloading'
                        OR "image install"."is provided by-release" = COALESCE("device"."should be running-release", (
                                SELECT "application"."should be running-release"
                                FROM "application"
                                WHERE "device"."belongs to-application" = "application"."id"
                        )))
                )
                ELSE NULL
        END AS "overall progress"
        FROM "device"
        WHERE (("device"."actor") IS NOT NULL AND ("device"."actor") = ($1)
        OR EXISTS (
                SELECT 1
                FROM "device" AS "device.is managed by-device"
                WHERE "device"."is managed by-device" = "device.is managed by-device"."id"
                AND (("device.is managed by-device"."actor") IS NOT NULL AND ("device.is managed by-device"."actor") = ($1)
                OR EXISTS (
                        SELECT 1
                        FROM "device" AS "device.is managed by-device.is managed by-device"
                        WHERE "device.is managed by-device"."is managed by-device" = "device.is managed by-device.is managed by-device"."id"
                        AND 1 = 0
                )
                OR EXISTS (
                        SELECT 1
                        FROM "application" AS "device.is managed by-device.belongs to-application"
                        WHERE "device.is managed by-device"."belongs to-application" = "device.is managed by-device.belongs to-application"."id"
                        AND EXISTS (
                                SELECT 1
                                FROM "application" AS "dev.is managed by-dev.bel to-appl.depends on-application"
                                WHERE "device.is managed by-device.belongs to-application"."depends on-application" = "dev.is managed by-dev.bel to-appl.depends on-application"."id"
                                AND EXISTS (
                                        SELECT 1
                                        FROM "device" AS "dev.is managed by-dev.bel to-appl.depends on-appl.owns-device"
                                        WHERE "dev.is managed by-dev.bel to-appl.depends on-application"."id" = "dev.is managed by-dev.bel to-appl.depends on-appl.owns-device"."belongs to-application"
                                        AND ("dev.is managed by-dev.bel to-appl.depends on-appl.owns-device"."actor") IS NOT NULL AND ("dev.is managed by-dev.bel to-appl.depends on-appl.owns-device"."actor") = ($1)
                                )
                        )
                ))
        )
        OR EXISTS (
                SELECT 1
                FROM "application" AS "device.belongs to-application"
                WHERE "device"."belongs to-application" = "device.belongs to-application"."id"
                AND EXISTS (
                        SELECT 1
                        FROM "application" AS "device.belongs to-application.depends on-application"
                        WHERE "device.belongs to-application"."depends on-application" = "device.belongs to-application.depends on-application"."id"
                        AND EXISTS (
                                SELECT 1
                                FROM "device" AS "dev.belongs to-application.depends on-application.owns-device"
                                WHERE "device.belongs to-application.depends on-application"."id" = "dev.belongs to-application.depends on-application.owns-device"."belongs to-application"
                                AND ("dev.belongs to-application.depends on-application.owns-device"."actor") IS NOT NULL AND ("dev.belongs to-application.depends on-application.owns-device"."actor") = ($1)
                        )
                )
        ))
) AS "device"
WHERE "device"."id" IN ($2)
AND "device"."supervisor version" IS NULL [ 122, 61 ]


Insert ID:  image_install 5764704
Insert ID:  image_install 5764705
Insert ID:  image_install 5764706
Insert ID:  image_install 5764707
Insert ID:  image_install 5764708
DatabaseError: Rolling back transaction
    at PostgresTx._rollback (/usr/src/app/node_modules/@_balena/pinejs/src/database-layer/db.ts:587:19)
    at PostgresTx.rollback (/usr/src/app/node_modules/@_balena/pinejs/src/database-layer/db.ts:341:25)
    at Object.transaction (/usr/src/app/node_modules/@_balena/pinejs/src/database-layer/db.ts:433:15)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async statePatchV2 (/usr/src/app/src/features/device-state/routes/state-patch-v2.ts:384:4)
[followed by 3 repetitions of the same DatabaseError stack message]

EDIT: followed immediately by a 401 error, even after setting the api secret column:

2022-05-11T20:12:41.844Z 207.81.194.15 a/85 PATCH /device/v2/964534cf6773f131fa75a370fdc99495/state 401 29.531ms -
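
To see which devices keep hitting this 401, the access log can be tallied per UUID. This is only a sketch: it assumes every access-log line has the same whitespace-separated shape as the one above (timestamp, IP, actor, method, path, status, duration), and `api.log` is a hypothetical file holding the API container's output.

```shell
# Count 401 responses to state PATCH requests per device UUID.
# Assumes whitespace-separated access-log lines shaped like:
#   <timestamp> <ip> <actor> <method> <path> <status> <duration> ...
count_unauthorized_state_patches() {
  awk '
    $4 == "PATCH" && $6 == "401" && $5 ~ "^/device/v2/[^/]+/state$" {
      split($5, part, "/")   # part[4] is the UUID segment of the path
      count[part[4]]++
    }
    END { for (uuid in count) print count[uuid], uuid }
  ' | sort -rn
}

# Usage (api.log is a hypothetical file name):
#   count_unauthorized_state_patches < api.log
```

The output is one line per UUID, most frequent first, which makes it easier to match failing devices against database rows.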

The devices with issues are also the only ones without values in the api secret column. I did not need to enter the secret for the others, so not sure if filling this hash in manually makes any difference.
Edit: possibly helped with a few, mostly not.
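
For anyone wanting to list the affected rows, a query along these lines should work. It is a sketch that only uses column names visible in the logged `device` SELECT above; `DATABASE_URL` is a placeholder for your own Postgres connection string, not something openBalena defines.

```shell
# Print SQL listing devices that have no value in the "api secret" column.
# Column names are taken from the logged "device" query in this thread.
devices_without_api_secret_sql() {
  cat <<'SQL'
SELECT "id", "uuid", "actor", "is online", "api heartbeat state"
FROM "device"
WHERE "api secret" IS NULL
ORDER BY "id";
SQL
}

# Usage (DATABASE_URL is a placeholder for your connection string):
#   devices_without_api_secret_sql | psql "$DATABASE_URL"
```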

After burrowing into @balena/pinejs, the rollback originates in src/database-layer/db.ts, in createTransaction(), where the supplied createFunc handle is throwing:

UnauthorizedError
    at convertToHttpError (/usr/src/app/node_modules/balena/pinejs/src/sbvr-api/sbvr-utils.ts:1198:10)
    at /usr/src/app/node_modules/balena/pinejs/src/sbvr-api/sbvr-utils.ts:1082:14
    at Array.map (<anonymous>)
    at /usr/src/app/node_modules/balena/pinejs/src/sbvr-api/sbvr-utils.ts:1079:30
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async runURI (/usr/src/app/node_modules/balena/pinejs/src/sbvr-api/sbvr-utils.ts:856:21)
    at async PinejsClient._request (/usr/src/app/node_modules/balena/pinejs/src/sbvr-api/sbvr-utils.ts:792:11)
    at async PinejsClient.callWithRetry (/usr/src/app/node_modules/pinejs-client-core/src/index.ts:884:11)
    at async upsertImageInstall (/usr/src/app/src/features/device-state/state-patch-utils.ts:141:3)
    at async /usr/src/app/src/features/device-state/routes/state-patch-v2.ts:310:8
    at async Promise.all (index 5)
    at async /usr/src/app/src/features/device-state/routes/state-patch-v2.ts:308:6
    at async Promise.all (index 1)
    at async /usr/src/app/src/features/device-state/routes/state-patch-v2.ts:389:5
    at async Object.transaction (/usr/src/app/node_modules/balena/pinejs/src/database-layer/db.ts:428:20)
    at async statePatchV2 (/usr/src/app/src/features/device-state/routes/state-patch-v2.ts:384:4) {

which leads back to upsertImageInstall(). I’ll keep looking there. It would be great if PineJS was more transparent in relaying the cause of the rollback.

I tracked the UnauthorizedError back to a PermissionError thrown in the PineJS src/sbvr-api/sbvr-utils.ts. Still tracking back from there. If necessary, blocking this (and overriding the lockfiles) allows devices to upgrade successfully.

Hello,
thanks for sharing so much information, also in your other debugging thread. Our community appreciates it very much!

I’d like to understand the situation better:

  • You have created a new openBalena server (what version and what was the old version?)
  • You are using the same domain on the device
  • I wasn’t able to work out the details of how the devices were moved to the new server instance: did you create new config.json files from the new instance and copy them to the devices?
  • Did you only change database entries to enable the old devices, or did you also change something on the devices, such as config.json or certificates?
  • Can you give more details about the differences between the working and non-working devices? Are they running the same balenaOS and supervisor versions?
  • Have the devices been pinned to different releases before?

My current guess is that there is a missing relationship between the device ID, the device actor ID, the release and the images. I can see that the device ID is 61 and the actor for these failing SQL requests is 122. Can you please double-check the database entries in the device and actor tables, to confirm that both exist and are linked properly?
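
A minimal sketch of that check, under stated assumptions: the `"device"` columns are the ones visible in the logged queries, but that the `"actor"` table has an `"id"` column is an assumption, and `DATABASE_URL` is a placeholder for your own connection string.

```shell
# Print SQL showing a device row next to the actor row it points at.
# A NULL actor_row in the result means the device's actor does not exist.
# Assumption: the "actor" table has an "id" column.
device_actor_sql() {
  device_id="$1"
  case "$device_id" in
    ''|*[!0-9]*) echo "device id must be numeric" >&2; return 1 ;;
  esac
  cat <<SQL
SELECT d."id" AS device_id, d."uuid", d."actor" AS device_actor, a."id" AS actor_row
FROM "device" AS d
LEFT JOIN "actor" AS a ON a."id" = d."actor"
WHERE d."id" = ${device_id};
SQL
}

# Usage (DATABASE_URL is a placeholder for your connection string):
#   device_actor_sql 61 | psql "$DATABASE_URL"
```

For the failing requests in this thread, the expected result would be a single row with `device_actor` and `actor_row` both equal to 122.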

I can also see the data that should be updated into the database:


) AS "image install" [ 61, 5, 2022-05-11T01:16:46.578Z, null, 'Downloaded', 2, 122, 1, 1 ]
) AS "image install" [ 61, 6, 2022-05-11T01:16:46.578Z, null, 'Downloaded', 2, 122, 1, 1 ]
) AS "image install" [ 61, 7, 2022-05-11T01:16:46.578Z, null, 'Downloaded', 2, 122, 1, 1 ]
) AS "image install" [ 61, 8, 2022-05-11T01:16:46.578Z, null, 'Downloaded', 2, 122, 1, 1 ]
) AS "image install" [ 61, 801, 2022-05-11T01:16:46.578Z, null, 'exited', 253, 122, 1, 1 ]
) AS "image install" [ 61, 802, 2022-05-11T01:16:46.578Z, null, 'Running', 253, 122, 1, 1 ]
) AS "image install" [ 61, 803, 2022-05-11T01:16:46.578Z, null, 'Running', 253, 122, 1, 1 ]
) AS "image install" [ 61, 804, 2022-05-11T01:16:46.578Z, null, 'Running', 253, 122, 1, 1 ]

Here 61 is the device ID, the second value is the image ID, 122 is the actor ID, and the release IDs are 2 and 253. Please make sure in the database that these combinations are valid, i.e. that device 61, as actor 122, can link to these images and releases.
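
One way to check those combinations is to walk the same chain the permission filter in the logged INSERT walks: image, then release, then application, then a device in that application owned by the actor. The sketch below only uses table and column names that appear in that logged query; `DATABASE_URL` is a placeholder.

```shell
# Print SQL that, for each image ID, lists the releases it is part of, the
# application owning each release, and whether that application owns a device
# whose actor is the given actor (first branch of the logged permission filter).
image_release_chain_sql() {
  actor_id="$1"
  image_ids="$2"   # comma-separated, e.g. "5,6,7,8"
  case "$actor_id" in
    ''|*[!0-9]*) echo "actor id must be numeric" >&2; return 1 ;;
  esac
  case "$image_ids" in
    ''|*[!0-9,]*) echo "image ids must be digits and commas" >&2; return 1 ;;
  esac
  cat <<SQL
SELECT i."id" AS image_id,
       ipr."is part of-release" AS release_id,
       r."belongs to-application" AS application_id,
       EXISTS (
         SELECT 1 FROM "device" AS d
         WHERE d."belongs to-application" = r."belongs to-application"
           AND d."actor" = ${actor_id}
       ) AS actor_device_in_app
FROM "image" AS i
LEFT JOIN "image-is part of-release" AS ipr ON ipr."image" = i."id"
LEFT JOIN "release" AS r ON r."id" = ipr."is part of-release"
WHERE i."id" IN (${image_ids})
ORDER BY i."id";
SQL
}

# Usage (DATABASE_URL is a placeholder for your connection string):
#   image_release_chain_sql 122 "5,6,7,8,801,802,803,804" | psql "$DATABASE_URL"
```

An image ID missing from the result does not exist at all; a NULL `release_id` means the image is not linked to any release; a false `actor_device_in_app` means this branch of the filter rejects the row (the logged filter also accepts applications flagged as public or host).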

You could also try running this command from the failing device’s hostOS terminal to check connectivity to the endpoint, and to check that the device can identify itself as a proper device and fetch its own data:

curl -H "Authorization: Bearer $(cat /mnt/boot/config.json | jq -r .deviceApiKey)" $(cat /mnt/boot/config.json | jq -r .apiEndpoint)/device

The following command, run from the failing device’s hostOS terminal, retrieves the device target state. Please compare that state with the release and image IDs that exist on the openBalena instance:

curl -H "Authorization: Bearer $(cat /mnt/boot/config.json | jq -r .deviceApiKey)" $(cat /mnt/boot/config.json | jq -r .apiEndpoint)/device/v2/$(cat /mnt/boot/config.json | jq -r .uuid)/state | jq
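
To make that comparison quicker, the IDs can be pulled out of the response. This is a rough sketch that assumes the target state JSON carries numeric `releaseId` and `imageId` fields; those field names are an assumption about the v2 state format, so adjust them to whatever the response actually contains.

```shell
# Read target-state JSON on stdin and print the distinct release and image IDs.
# Assumption: the JSON uses numeric "releaseId" and "imageId" fields.
extract_state_ids() {
  grep -oE '"(releaseId|imageId)"[[:space:]]*:[[:space:]]*[0-9]+' \
    | tr -d '" \t' \
    | sort -u
}

# Usage: pipe the state request above into the function instead of jq:
#   curl ... /state | extract_state_ids
```

Each output line has the form `imageId:<n>` or `releaseId:<n>`, which can be compared directly with the IDs in the failing `image install` parameters.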

To proceed with this, I suggest that you create new config.json files from the new openBalena instance and, if you have remote access to the devices, let each device ‘re-provision’ to the new instance. An ‘easy’ way to do this is to change the balena API endpoint in config.json and restart the supervisor on the device; the supervisor will detect that the API endpoint has changed, delete existing data from the device (images, volumes, …) and try to re-provision. Then you change it back to the ‘real’ endpoint and restart the supervisor again.

Do you have physical access to the devices, and could you perform a balena join as described in the balena CLI documentation?

Best Regards
Harald


Thank you Harald! Here are initial answers and I will match up the IDs next.

  • Same server, after following an ill-advised suggestion to delete some volumes and re-run the setup script, then finding out that the backups were useless. Took the opportunity to upgrade v3.4.0 → v3.6.0 and improve the backup system.
  • Same domain.
  • Same server instance and using a copy of the original config.json.
  • Database entries were recreated from scratch with open-balena-api, which took care of most associations. No changes on the devices, which are mostly deployed.
  • All devices are one of two images with balenaOS 2.47.0+rev1-dev or balenaOS 2.77.0+rev1-dev and there is no correlation between image and connectivity. As you guess, any difference is likely in the database.
  • All devices were previously pinned to a now-lost release of the application. All are now pinned to a new release on the application with the same name.

Difficult physical access, and this began with a server-side problem, so I am focusing on server-side changes first.

Thanks, device/actor IDs definitely match. A few devices’ updates were stuck due to missing service install records, but after filling those and restarting the open-balena-api container again, even the log streaming is authorized, and it has been possible to upgrade everything as long as the rollback is blocked. The settings are close to perfect, and I am hoping to figure the rollbacks out at the database level, if I can find which query handles authorization and log it.

404 error (“Cannot GET device”) on the device I tried, but at least it reached the server.

Gets JSON output. This fits with what I’ve seen, that the devices are trying to upgrade and only failing because of server-side errors.

Hello @MBer,

thanks for letting us know.

  1. When the query fails, the device is not able to get its target state and does not know to what state the device should update. It seems to me that the device tries to update to the old target state, which may not work as the old locally stored target state has IDs which may not fit anything from the new instance.
curl -H "Authorization: Bearer $(cat /mnt/boot/config.json | jq -r .deviceApiKey)" $(cat /mnt/boot/config.json | jq -r .apiEndpoint)/device/v2/$(cat /mnt/boot/config.json | jq -r .uuid)/state | jq
  • You may want to query the same API endpoint with your users and the device UUID that the device should have (according to your database entries).
  1. Please double check that the device can fetch itself with this query:
curl -H "Authorization: Bearer $(cat /mnt/boot/config.json | jq -r .deviceApiKey)" "$(cat /mnt/boot/config.json | jq -r .apiEndpoint)/v6/device('$(cat /mnt/boot/config.json | jq -r .deviceId)')" | jq
  • If this query fails, the deviceId the device knows for itself most likely does not exist in the server instance.
  1. Sorry for giving the ‘get any device’ request in the first response. It was misleading, as it shows ANY device this device is allowed to see, for whatever reason.

Gets JSON output. This fits with what I’ve seen, that the devices are trying to upgrade and only failing because of server-side errors.

  1. Can you please explain what you mean by:

which query handles authorization

  • Basically, each request executed in open-balena-api and pinejs handles permissions. The authorisation is the identification of who is querying, and whether that actor is known and has valid credentials. The permissions (who is allowed to see which resource), however, are handled in every request. For each request the actor is identified and it is checked what this actor is allowed to access in the database. Moreover, all of this happens inside every single SQL query to the database.

The most important thing is that the device-local deviceId and UUID exist on the server.
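To compare the device-local values with the database entries, something like the following sketch may help. The helper names `device_identity` and `state_http_status` are made up for illustration, and the config path simply defaults to the one used in the commands above. The second helper prints only the HTTP status code, so that an empty response body is easier to interpret.

```shell
# Sketch: show what the device believes about itself.
# Both helpers are hypothetical and take an optional config path.

# Print the locally stored uuid and numeric deviceId on one line,
# ready to be compared with the "device" table on the server.
device_identity() {
    local config="${1:-/mnt/boot/config.json}"
    jq -r '"uuid=\(.uuid) deviceId=\(.deviceId)"' "$config"
}

# Print only the HTTP status code of the target-state request,
# discarding the body.
state_http_status() {
    local config="${1:-/mnt/boot/config.json}"
    local key endpoint uuid
    key="$(jq -r .deviceApiKey "$config")"
    endpoint="$(jq -r .apiEndpoint "$config")"
    uuid="$(jq -r .uuid "$config")"
    curl -sS -o /dev/null -w '%{http_code}\n' \
        -H "Authorization: Bearer $key" \
        "$endpoint/device/v2/$uuid/state"
}

# Possible use on a device:
#   device_identity
#   state_http_status
```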

Best Regards
Harald

Sorry, some delay before another device showed the issue.

  1. Thanks, the values extracted from the file confirm that I have the correct UUID and API key for the device, which successfully reports VPN connectivity and status to openBalena. The response looks wrong though, so I will try for server-side logs:
~# curl -H "Authorization: Bearer $(cat /mnt/boot/config.json | jq -r .deviceApiKey)" $(cat /mnt/boot/config.json | jq -r .apiEndpoint)/device/v2/$(cat /mnt/boot/config.json | jq -r .uuid)/state | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  1. I’m pretty sure this failed too.
~# curl -H "Authorization: Bearer $(cat /mnt/boot/config.json | jq -r .deviceApiKey)" "$(cat /mnt/boot/config.json | jq -r .apiEndpoint)/v6/device('$(cat /mnt/boot/config.json | jq -r .deviceId)')" | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100     8  100     8    0     0      9      0 --:--:-- --:--:-- --:--:--     9
{
  "d": []
}

Edit: I had missed the .deviceId in this query. The local numerical ID was indeed different. After updating the config file, this query began to succeed and return a target release:

"should_be_running__release": {
        "__id": 5
      },
  1. Thanks, so it might be any of the queries involved in updates coming up empty, due to a slight difference from the others that authorize normally.
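For checking several devices in a row, the empty "d" array seen above can be detected automatically. This is a small sketch under the assumption that the `/v6/device(...)` response always has the shape shown in this thread; `device_visible` is a hypothetical helper name.

```shell
# Sketch: succeed only if the OData response on stdin lists at least
# one device in its "d" array. An empty array, a missing "d" key, or
# an empty response all give a non-zero exit status.
device_visible() {
    jq -e '(.d // []) | length > 0' > /dev/null
}

# Possible use on a device, reusing the query from earlier in the thread:
#   curl -s -H "Authorization: Bearer $(jq -r .deviceApiKey /mnt/boot/config.json)" \
#     "$(jq -r .apiEndpoint /mnt/boot/config.json)/v6/device('$(jq -r .deviceId /mnt/boot/config.json)')" \
#     | device_visible && echo "device found" || echo "deviceId not visible on server"
```

A non-zero result here would point at a deviceId mismatch like the one fixed in the edit above, rather than at a network problem.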

Hello @Mber

just to wrap this issue up. Have you been able to fix all the stated problems or do you need further assistance?
I understand that all the devices are now able to connect and retrieve a device target state properly, right?

Thanks for letting us know and best regards
Harald

Thanks Harald. Almost: there are a few more that I have not yet gotten a full connection for. Some proved easier to re-flash with an image linked to the new server than to recover all keys and IDs for. Not seeking further assistance at this time, as all the important ones are connected.

Hello @mber

thanks for letting us know that you were able to reconnect the crucial devices to the new open-balena instances.

Best Regards
Harald