BUG: `balena deploy` corrupts ARM/ARM64 binaries: wrong ELF class / cannot open shared object file

Hi,

I’m having 2 similar issues that I think are linked. One of them is blocking me. Here is all the background on my setup.

Context

I recently installed a fresh OpenBalena 3.6.0 on a DigitalOcean Droplet running Ubuntu 22.04.
I’m in the process of testing everything before migrating my fleet to this new OpenBalena.
My fleet currently is managed by an old OpenBalena 1.3.0, which is running on a separate Droplet on Ubuntu 18.04.

With the new OpenBalena, I am using balena-cli 13.6.1.
With the old OpenBalena, I used to run balena-cli 11.31.26.

With the new OpenBalena, I am using BalenaOS versions such as 2.94.4 and 2.83.21+rev1.
With the old OpenBalena, I was using BalenaOS versions such as 2.32.0+rev1, 2.46.1+rev1 and 2.48.0+rev1.

With the new OpenBalena, I use the following base image in Dockerfile.template:
FROM balenalib/%%BALENA_MACHINE_NAME%%-debian-node:14-bullseye-run
With the old OpenBalena, I used the following base image in Dockerfile.template:
FROM balenalib/%%BALENA_MACHINE_NAME%%-node:14-buster-run

I am using Raspberry Pi Zero W (raspberry-pi / armv6hf) as well as Raspberry Pi Zero 2 W (raspberrypi0-2w-64 / aarch64).

As you may see, the Raspberry Pi Zero 2 W cannot run on my old infrastructure, because its minimum BalenaOS version is above the maximum BalenaOS version supported by my old OpenBalena instance.
This is the entire reason for upgrading: being able to use both the old Raspberry Pi and the new one.

I have an application that uses Node.JS. That application uses the better-sqlite3 npm module.
That module requires a binary file to be compiled for the armv6hf arch, and in theory has a prebuilt binary for the aarch64 arch, which npm/yarn can pull automatically.

This used to work perfectly on my old OpenBalena. However it does not work on the new OpenBalena, in 2 different ways, depending on the Raspberry Pi Zero W or the Raspberry Pi Zero 2 W.

Error with Raspberry Pi Zero 2 W

This is the first error I encountered.
When using balena deploy <myfleet> --emulated --build, I get a warning at each build step that reads:

[Build]   main  ---> [Warning] The requested image's platform (linux/arm64/v8) does not match the detected host platform (linux/amd64) and no specific platform was requested

This warning never appeared in my previous OpenBalena. I am indeed building on a linux/amd64 for linux/arm64/v8. What bothers me is that the warning suggests no platform was requested. I looked around for ways to explicitly specify the platform in docker, but nothing made that warning go away.
So I rolled with the warning. My build was successful.

However, once deployed, when the Node.JS script first requires the better-sqlite3 module, a fatal error occurs.

[Logs]    [2022-7-1 2:30:03] [main] /usr/src/app/node_modules/bindings/bindings.js:121
[Logs]    [2022-7-1 2:30:03] [main]         throw e;
[Logs]    [2022-7-1 2:30:03] [main]         ^
[Logs]    [2022-7-1 2:30:03] [main] 
[Logs]    [2022-7-1 2:30:03] [main] Error: /usr/src/app/node_modules/better-sqlite3/build/Release/better_sqlite3.node: cannot open shared object file: No such file or directory
[Logs]    [2022-7-1 2:30:03] [main]     at Object.Module._extensions..node (internal/modules/cjs/loader.js:1144:18)
[Logs]    [2022-7-1 2:30:04] [main]     at Module.load (internal/modules/cjs/loader.js:950:32)
[Logs]    [2022-7-1 2:30:04] [main]     at Function.Module._load (internal/modules/cjs/loader.js:790:12)
[Logs]    [2022-7-1 2:30:04] [main]     at Module.require (internal/modules/cjs/loader.js:974:19)
[Logs]    [2022-7-1 2:30:04] [main]     at require (internal/modules/cjs/helpers.js:93:18)
[Logs]    [2022-7-1 2:30:04] [main]     at bindings (/usr/src/app/node_modules/bindings/bindings.js:112:48)
[Logs]    [2022-7-1 2:30:04] [main]     at new Database (/usr/src/app/node_modules/better-sqlite3/lib/database.js:48:64)
[Logs]    [2022-7-1 2:30:04] [main]     at Socket.<anonymous> (/usr/src/app/<myscript>.js:18:14)
[Logs]    [2022-7-1 2:30:04] [main]     at Object.onceWrapper (events.js:520:26)
[Logs]    [2022-7-1 2:30:04] [main]     at Socket.emit (events.js:400:28) {
[Logs]    [2022-7-1 2:30:04] [main]   code: 'ERR_DLOPEN_FAILED'
[Logs]    [2022-7-1 2:30:04] [main] }

This means the binding binary (in theory downloaded as a prebuilt) is nowhere to be found.
I say in theory because the build step where yarn installs dependencies was very fast, whereas it used to take ~15 min on my machine when it needed to build better-sqlite3. But since this error says the file is not there, I am unsure whether it was actually downloaded.

I logged in with SSH, and the file actually was there.
I manually ran yarn to install dependencies again through SSH, and after that, the software ran fine.

So at that point I thought “alright, I’ll just move the yarn call into my init script instead of the Dockerfile, so that the dependencies are downloaded on the device directly”.

And that worked.
Then came time to deploy the same software on the Raspberry Pi Zero W…

Error with Raspberry Pi Zero W

Since this device needs to build the better-sqlite3 module from scratch every time (no prebuilt binaries), it was impractical to run yarn in the init script. This would mean a >40min build time on first boot as well as on every update from the registry.
So I decided to create a second Dockerfile specifically for the Raspberry Pi Zero W, which would install yarn dependencies in the Dockerfile, like I used to in my old OpenBalena.

Again, during build, I have the warning (this time the platform is linux/arm/v6, which is correct).

[Build]   main  ---> [Warning] The requested image's platform (linux/arm/v6) does not match the detected host platform (linux/amd64) and no specific platform was requested

But here comes the final issue, which I cannot work around. Building better-sqlite3 in the Docker build, like I used to in my previous OpenBalena, results in an error once deployed, oddly similar to the one mentioned above, yet not exactly the same. Here it is:

[Logs]    [2022-7-1 2:30:03] [main] /usr/src/app/node_modules/bindings/bindings.js:121
[Logs]    [2022-7-1 2:30:03] [main]         throw e;
[Logs]    [2022-7-1 2:30:03] [main]         ^
[Logs]    [2022-7-1 2:30:03] [main] 
[Logs]    [2022-7-1 2:30:03] [main] Error: /usr/src/app/node_modules/better-sqlite3/build/Release/better_sqlite3.node: wrong ELF class: ELFCLASS64
[Logs]    [2022-7-1 2:30:03] [main]     at Object.Module._extensions..node (internal/modules/cjs/loader.js:1144:18)
[Logs]    [2022-7-1 2:30:04] [main]     at Module.load (internal/modules/cjs/loader.js:950:32)
[Logs]    [2022-7-1 2:30:04] [main]     at Function.Module._load (internal/modules/cjs/loader.js:790:12)
[Logs]    [2022-7-1 2:30:04] [main]     at Module.require (internal/modules/cjs/loader.js:974:19)
[Logs]    [2022-7-1 2:30:04] [main]     at require (internal/modules/cjs/helpers.js:93:18)
[Logs]    [2022-7-1 2:30:04] [main]     at bindings (/usr/src/app/node_modules/bindings/bindings.js:112:48)
[Logs]    [2022-7-1 2:30:04] [main]     at new Database (/usr/src/app/node_modules/better-sqlite3/lib/database.js:48:64)
[Logs]    [2022-7-1 2:30:04] [main]     at Socket.<anonymous> (/usr/src/app/<myscript>.js:18:14)
[Logs]    [2022-7-1 2:30:04] [main]     at Object.onceWrapper (events.js:520:26)
[Logs]    [2022-7-1 2:30:04] [main]     at Socket.emit (events.js:400:28) {
[Logs]    [2022-7-1 2:30:04] [main]   code: 'ERR_DLOPEN_FAILED'
[Logs]    [2022-7-1 2:30:04] [main] }

Conclusion

As you can see, the error for the Raspberry Pi Zero W is wrong ELF class. As if it was built for amd64.
Considering this, and considering the warning, I strongly suspect that the balena deploy command does

  • something different than the old one (balena-cli 13.6.1 vs. 11.31.26).
  • something wrong with architectures (since the node module is not compiled for the correct arch, or absent)

I would very much appreciate the Balena Team helping me debug this. I’ve been on this “migration” for a few nights now, and I keep hitting roadblocks. I managed to overcome all of them, but this one I cannot go around. If I need to compile this module on every boot/update on every device, it is inconceivable.
I want to get the same experience as my old OpenBalena instance.

Thanks for reading. I am at your disposal for extra information.
Tim

Hello Tim,
If I understand correctly, the module to be compiled is the same in all the builds, right? (You don’t change this code, you just compile it to be used by your Node.js script.) If that is the case, wouldn’t it be possible to compile it once, store the prebuilt binary somewhere, and just download it as you do for the Pi Zero 2 W?

Hi,

You are correct, it is the same module.
It is simply listed as a dependency in my package.json, and deployed for both devices.

  "dependencies": {
    "better-sqlite3": "^7.5.3"
  }

The thing is that the downloading or not downloading happens automatically. Yarn/Npm calls a script that checks for prebuilt binaries from the module developers. It is not something I implemented. I just run the Yarn/Npm installer.

But I suppose compiling my own prebuilt would be a possible workaround. I’ll look into it.
However, it is far from an ideal situation, because I would have to manage these prebuilt versions, remake them every time a new version of better-sqlite3 comes out, and I also would need to “hack” into Yarn/Npm to make it use my own prebuilt binaries.

What I’m trying to say is, I should not have to do any of this. There is something wrong here, either with balena-cli, docker, qemu, or something else I’m missing, and fixing it would make my life much easier.

Have you also seen that “platform” warning when building for a different arch than your host machine? What do you think of it?

My workaround

I put in place a workflow to precompile the NodeJS module myself, then inject the precompiled binary.

But if I overwrite the binary inside Dockerfile.template, then I run into the exact same issue mentioned in my first post.
cannot open shared object file: No such file or directory on RPI Zero 2
wrong ELF class: ELFCLASS64 on RPI Zero 1.

My workaround of the workaround

To work around this, I download and patch the binary file inside node_modules in the init script (so on every boot of the container). This is a bit ugly, but not too ugly. Since the file is 1.6MB, it’s fine.

In the unlikely case that someone stumbles on this post with the exact same issue as me, also with better-sqlite3, feel free to use my prebuilt binaries.
I do not provide any guarantee or any support for them, for obvious reasons. They are just compiled on Raspberry Pi OS 11 Bullseye with NodeJS 14.

Here is how I use it inside my application.

Dockerfile.template

[...]

# Install Yarn dependencies and clean cache
RUN yarn --prod && yarn cache clean --all
# Pass the arch into the container so we can download the correct prebuilt
ENV BALENA_ARCH=%%BALENA_ARCH%%

[...]

# will run when container starts up on the device
CMD ["bash", "./init.sh"]

init.sh

echo "[init] Downloading prebuilt better-sqlite3"
mkdir -p ./node_modules/better-sqlite3/build/Release
wget "https://raw.githubusercontent.com/naito-one/better-sqlite3-prebuilt/master/bin/7.5.3/$BALENA_ARCH/better_sqlite3.node" -O ./node_modules/better-sqlite3/build/Release/better_sqlite3.node

[...]
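For anyone who prefers to do the download from Node.js rather than from init.sh, here is a minimal sketch of the URL construction only. The base URL and path layout are copied from the wget line above; the function name prebuiltUrl is hypothetical. It refuses to build a URL when BALENA_ARCH is missing, so an unset variable cannot end up overwriting the binary with a failed download.

```javascript
// Base URL taken from the wget call in init.sh.
const PREBUILT_BASE =
  'https://raw.githubusercontent.com/naito-one/better-sqlite3-prebuilt/master/bin';

// Build the download URL for a given better-sqlite3 version and balena arch.
// Throws instead of returning a broken URL when either value is missing.
function prebuiltUrl(version, arch) {
  if (!version || !arch) {
    throw new Error('version and arch are required (is BALENA_ARCH set?)');
  }
  return `${PREBUILT_BASE}/${version}/${arch}/better_sqlite3.node`;
}

// Example: prebuiltUrl('7.5.3', process.env.BALENA_ARCH)
```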

Conclusion

Regarding my conclusion in the first post, the build might be done for the correct platform. But the binary file is actually being corrupted at some point during the creation of the docker container.

I hope someone looks into this, because I have neither the knowledge nor the time to fix it, yet it is a very big inconvenience. For instance, I’m glad I do not have 10 binary modules in my application, but someone might.

Steps to reproduce: Create a Node.js application that requires better-sqlite3 and deploy it with balena-cli 13.6.1 to a Raspberry Pi Zero W or a Raspberry Pi Zero 2 W using balena deploy <fleet> --emulated --build

Sample app

Dockerfile.template

# see more about dockerfile templates here: https://www.balena.io/docs/learn/develop/dockerfile/#dockerfile-templates
# and about balena base images here: https://www.balena.io/docs/reference/base-images/base-images/
FROM balenalib/%%BALENA_MACHINE_NAME%%-debian-node:14-bullseye-run

# use `install_packages` if you need to install dependencies
RUN install_packages python3 sqlite3 libsqlite3-dev build-essential

# Install nodejs app

# Defines our working directory in container
WORKDIR /usr/src/app

# Copies the package.json first for better cache on later pushes
COPY package.json package.json
COPY yarn.lock yarn.lock

# Install Yarn dependencies and clean cache
# This will be instant for the RPI Zero 2 and will take ~15-25 min for the RPI Zero 1
# Because the RPI Zero 2 has prebuilt binaries provided by the developers
RUN yarn --prod && yarn cache clean --all

# This will copy all files in our root to the working directory in the container
COPY . ./

# will run when container starts up on the device
CMD ["bash", "./init.sh"]

package.json

{
  "name": "test",
  "dependencies": {
    "better-sqlite3": "7.5.3"
  }
}

init.sh

echo "[init] Setting up Database"
sqlite3 -init "./createDatabase.sql" /data/database.db .exit

node main.js

createDatabase.sql

CREATE TABLE IF NOT EXISTS test (test TEXT);

main.js

const Database = require('better-sqlite3')

const db = new Database('/data/database.db') // crashes here