Issues Using Balena Preload

Heyo. I have been using balena preload for its intended purpose for some time now, but I recently started hitting the following snag:

(base) meawoppl@monolith:~/repos/balena-preload$ balena preload balena-cloud-teletom-experiment-raspberrypi0-2w-64-2.87.16+rev1-dev-v12.11.0.img
Building Docker preloader image. [========================] 100%
| Checking that the image is a writable file
| Finding a free tcp port
| Checking if the image is an edison zip archive
| Creating preloader container
/ Starting preloader container
\ Reading image information
? Select a fleet g_matthew_goodman/teletom-experiment
? Select a release current
- Fetching application 1907066
/ Cleaning up temporary files
network timeout at: https://api.balena-cloud.com/device/register

Additional information may be available with the `--debug` flag.

For further help or support, visit:
https://www.balena.io/docs/reference/balena-cli/#support-faq-and-troubleshooting

Not sure what to make of that, and a Google/forum search does not turn up much…

Adding the --debug flag as suggested isn’t super helpful:

(base) meawoppl@monolith:~/repos/balena-preload$ balena preload --debug balena-cloud-teletom-experiment-raspberrypi0-2w-64-2.87.16+rev1-dev-v12.11.0.img
[debug] new argv=[/home/meawoppl/bin/balena-cli/balena,/snapshot/versioned-source/bin/balena,preload,balena-cloud-teletom-experiment-raspberrypi0-2w-64-2.87.16+rev1-dev-v12.11.0.img] length=4
[debug] Deprecation check: 4.54837 days since last npm registry query for next major version release date.
[debug] Will not query the registry again until at least 7 days have passed.
Building Docker preloader image. [===                     ] 12%
Step 1/7 : FROM alpine:3.12
Building Docker preloader image. [======                  ] 25%
Step 2/7 : WORKDIR /usr/src/app
 ---> Using cache
Building Docker preloader image. [=========               ] 37%
Step 3/7 : RUN apk add --no-cache curl py3-pip parted btrfs-progs util-linux sfdisk file coreutils sgdisk e2fsprogs-extra docker
 ---> Using cache
Building Docker preloader image. [============            ] 50%
Step 4/7 : COPY requirements.txt ./
 ---> Using cache
Building Docker preloader image. [===============         ] 62%
Step 5/7 : RUN pip3 install --no-cache-dir -r requirements.txt
 ---> Using cache
Building Docker preloader image. [==================      ] 75%
Step 6/7 : COPY src/ ./
 ---> Using cache
Building Docker preloader image. [=====================   ] 87%
Step 7/7 : CMD ["python3", "/usr/src/app/preload.py"]
 ---> Using cache
 ---> 3749fc82f9cb
Successfully built 3749fc82f9cb
Building Docker preloader image. [========================] 100%
| Checking that the image is a writable file
| Finding a free tcp port
| Checking if the image is an edison zip archive
| Creating preloader container
/ Starting preloader container
/ Reading image information
Waiting for Docker to start...
| Reading image information
Docker started
| Reading image information
? Select a fleet g_matthew_goodman/teletom-experiment
? Select a release current
- Fetching application 1907066
/ Cleaning up temporary files
network timeout at: https://api.balena-cloud.com/device/register

FetchError: network timeout at: https://api.balena-cloud.com/device/register
    at Timeout.<anonymous> (/snapshot/versioned-source/node_modules/node-fetch/lib/index.js:1454:13)
    at listOnTimeout (internal/timers.js:549:17)
    at processTimers (internal/timers.js:492:7)
From previous event:
    at Preloader.preload (/snapshot/versioned-source/node_modules/balena-preload/build/preload.js:720:25)
    at PreloadCmd.prepareAndPreload (/snapshot/versioned-source/build/commands/preload.js:318:25)
    at async PreloadCmd.run (/snapshot/versioned-source/build/commands/preload.js:127:13)
    at async PreloadCmd._run (/snapshot/versioned-source/node_modules/@oclif/command/lib/command.js:43:20)
    at async Config.runCommand (/snapshot/versioned-source/node_modules/@oclif/config/lib/config.js:175:24)
    at async CustomMain.run (/snapshot/versioned-source/node_modules/@oclif/command/lib/main.js:27:9)
    at async CustomMain._run (/snapshot/versioned-source/node_modules/@oclif/command/lib/command.js:43:20)
    at async /snapshot/versioned-source/build/app.js:76:13
    at async Promise.all (index 2)
    at async oclifRun (/snapshot/versioned-source/build/app.js:94:5)
    at async Object.run (/snapshot/versioned-source/build/app.js:107:9)
    at async run (/snapshot/versioned-source/bin/balena:20:2)

For further help or support, visit:
https://www.balena.io/docs/reference/balena-cli/#support-faq-and-troubleshooting

Further data. Each time I run this command, an extra “inactive” device gets added to the UI. It looks like things are working, but the .img file is not getting updated with the application.

Further info. It looks like, to configure the image, the SDK is used to register a device, get the “state”, and then remove the device. The what/why is not clear to me, but I think the call that is timing out is in that sequence.

Not going to chase this any deeper, it feels like a backend issue.

Thank you @mattyg for sharing your issue!

Could you please confirm which machine you are using to run balena preload? How is it connected to the Internet?

@mattyg, I think you are right about where the error comes from. As you have found, the _getState function temporarily creates a device and deletes it soon after. (Looking at the code, it appears to do so in order to grab a copy of the balena supervisor’s target state for the fleet, which is then stored at /mnt/data/apps.json for consumption by the supervisor when the preloaded image is first booted, so that the supervisor can create the app containers straight away even in the absence of an internet connection for retrieval of the target state.)

The fact that you found inactive devices added to the UI indicates that the /device/register API endpoint got as far as successfully creating the temporary device, but then the timeout error caused the _getState function to exit (insufficient error handling) without deleting the device. I have created a GitHub bug issue for it: Inactive (temporary) devices left behind if preloading fails between creation and deletion (Issue #270, balena-io-modules/balena-preload)
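To illustrate the failure mode described above, here is a minimal sketch in Python. It is not the balena-preload code: `FakeApi`, `get_state_leaky`, `get_state_with_cleanup` and `leftover_devices` are hypothetical names, the API is an in-memory stand-in, and the sketch assumes the device UUID is chosen on the client side so that cleanup can still target the device after a timed-out register call.

```python
import uuid


class FakeApi:
    """In-memory stand-in for a device API (hypothetical, not the balena SDK)."""

    def __init__(self, fail_register=False):
        self.devices = set()
        self.fail_register = fail_register

    def register(self, device_uuid):
        # The server records the device first...
        self.devices.add(device_uuid)
        # ...and the client may still see a timeout before the reply arrives.
        if self.fail_register:
            raise TimeoutError("network timeout at: /device/register")

    def get_state(self, device_uuid):
        return {"uuid": device_uuid, "apps": {}}

    def delete(self, device_uuid):
        self.devices.discard(device_uuid)


def get_state_leaky(api):
    """Mirrors the reported behaviour: no cleanup if any step raises."""
    device_uuid = uuid.uuid4().hex
    api.register(device_uuid)
    state = api.get_state(device_uuid)
    api.delete(device_uuid)
    return state


def get_state_with_cleanup(api):
    """Same flow, but the delete runs whether or not an earlier step failed."""
    device_uuid = uuid.uuid4().hex  # assumed client-side so cleanup can target it
    try:
        api.register(device_uuid)
        return api.get_state(device_uuid)
    finally:
        api.delete(device_uuid)


def leftover_devices(fetch):
    """Run `fetch` against an API whose register call times out; count leftovers."""
    api = FakeApi(fail_register=True)
    try:
        fetch(api)
    except TimeoutError:
        pass
    return len(api.devices)
```

In this sketch, `leftover_devices(get_state_leaky)` leaves one device behind while `leftover_devices(get_state_with_cleanup)` leaves none. A real fix would also need to decide what to do when the delete call itself fails, since an exception raised inside `finally` would replace the original timeout error.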

I have attempted to reproduce the issue just now with a Raspberry Pi Zero fleet, but there was no error. This suggests that the backend issue was transient and it may now work if you try again. Let us know how it goes and thanks again for investigating and reporting this issue.

@mpous I tried it on two different Linux machines, both running the current Ubuntu LTS (I can find the exact release if you think it matters). Both are connected over wifi, but neither seems to have any issues with other CLI endpoints.

I have been having to use this command a lot because of the other issue you have been helping me with (link). Because I am bouncing between base images to try to get the camera stack working, I prefer the preload pathway to avoid making the Pi download a ~1.5 GB image “update”; it's just much faster for testing.


@pdcastro thanks for the triage. Running the same command today succeeds without issue, so I am marking it as transient. I do think the cleanup call should probably have some try/catch logic sprinkled in there; good to see that will get scooped up at some point.

Thanks for being super responsive through all of this, btw. :)
