Builds failing overnight w/ cli v12.1.1

We have a multi-container build that has been working for years, and we are now experiencing a strange builder problem. Two days ago the build succeeded, yesterday there were missing images during the build resulting in failed builds, and today the images are no longer missing but the TypeScript compilation is failing.

The containers all build locally, and each container also builds successfully on its own when pushed to a test project.

However, when we push the composed project we get strange errors when compiling the TypeScript in two of the three containers. See below.

Any help on the topic is appreciated as we have run out of ideas.

Builder Error

[base]     Step 13/30 : RUN npm run build
[base]      ---> Running in 1692b70a8832
[base]     > cloudcue-base@1.0.0 build /usr/src/app
[base]     > rm -rf dist && tsc
[base]     internal/modules/cjs/loader.js:969
[base]       throw err;
[base]       ^
[base]     Error: Cannot find module '../lib/tsc.js'
[base]     Require stack:
[base]     - /usr/src/app/node_modules/.bin/tsc
[base]         at Function.Module._resolveFilename (internal/modules/cjs/loader.js:966:15)
[base]         at Function.Module._load (internal/modules/cjs/loader.js:842:27)
[base]         at Module.require (internal/modules/cjs/loader.js:1026:19)
[base]         at require (internal/modules/cjs/helpers.js:72:18)
[base]         at Object.<anonymous> (/usr/src/app/node_modules/.bin/tsc:2:1)
[base]         at Module._compile (internal/modules/cjs/loader.js:1138:30)
[base]         at Object.Module._extensions..js (internal/modules/cjs/loader.js:1158:10)
[base]         at Module.load (internal/modules/cjs/loader.js:986:32)
[base]         at Function.Module._load (internal/modules/cjs/loader.js:879:14)
[base]         at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:71:12) {
[base]       code: 'MODULE_NOT_FOUND',
[base]       requireStack: [ '/usr/src/app/node_modules/.bin/tsc' ]
[base]     }
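
For anyone searching on this trace: node_modules/.bin/tsc is just a small shim that loads ../lib/tsc.js from the real typescript package, so a MODULE_NOT_FOUND here usually means the typescript package contents never made it into the image, or got clobbered by a later copy. A throwaway check in the build stage can confirm that; the app path below is taken from the log, everything else is purely illustrative:

    # illustrative only, not part of the real Dockerfile:
    # the shim and the package it points at should both be present
    RUN ls -l /usr/src/app/node_modules/.bin/tsc \
     && ls /usr/src/app/node_modules/typescript/lib/tsc.js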

Compose.yaml

    version: '2.1'

    volumes:
      client-data:
        # external: true

    services:
      base:
        build: ./base
        image: cloudcue-client-base
        container_name: cloudcue-base
        restart: always
        privileged: true
        volumes:
          - 'client-data:/var/lib/cloudcue'
        expose:
          - '80'
        ports:
          - '8088:80'
        labels:
          io.balena.features.supervisor-api: '1'

      ui:
        build: ./base-ui
        image: cloudcue-client-base-ui
        container_name: cloudcue-ui
        restart: always
        environment:
          - API_HOST=base
          - API_PORT=80
        depends_on:
          - base
        expose:
          - '80'
        ports:
          - '8080:80'

      wpe:
        build: ./tools/wpe
        image: cloudcue-client-wpe
        container_name: cloudcue-wpe
        restart: always
        privileged: true
        environment:
          - WPE_URL=http://ui:80
        depends_on:
          - ui
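
For context, the project layout is roughly the following (simplified; each service directory has its own Dockerfile and .dockerignore, which becomes relevant further down):

    CloudCue/
    ├── docker-compose.yml
    ├── base/
    │   ├── Dockerfile
    │   └── .dockerignore
    ├── base-ui/
    │   ├── Dockerfile
    │   └── .dockerignore
    └── tools/
        └── wpe/
            ├── Dockerfile
            └── .dockerignore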

First Container [base]

    # *** Builder Container *** ---------------------------------------------------
    FROM balenalib/raspberrypi3-alpine-node:12-build as build
    # FROM node:10.15-alpine as build

    # RUN apk --no-cache add --virtual native-deps \
    #     make g++ gcc python linux-headers udev libgcc libstdc++ wxgtk wxgtk-dev

    WORKDIR /usr/src/
    ADD ./BOSSA-1.7.0.tar.gz .
    RUN make -C BOSSA-1.7.0 bin/bossac && cp BOSSA-1.7.0/bin/* /usr/local/bin/

    WORKDIR /usr/src/app
    COPY package.json ./
    RUN npm set progress=false && npm config set depth 0

    # install npm production dependencies
    RUN npm install --only=production && npm cache verify

    # copy production node_modules aside
    RUN cp -R node_modules prod_node_modules

    # install npm development dependencies,
    # making sure to clean up the artifacts it creates in order to reduce the image size
    RUN npm install --development && npm cache verify && rm -rf /tmp/*

    # build app for production
    COPY . ./
    ENV NODE_ENV=production
    RUN npm run build

    # *** Production Container *** ------------------------------------------------
    # FROM node:10.15-alpine
    # FROM balenalib/%%BALENA_MACHINE_NAME%%-alpine
    FROM balenalib/raspberrypi3-alpine-node:12-run as release

    RUN apk --no-cache add alsa-lib

    WORKDIR /usr/app
    COPY package.json ./

    # copy pre-compiled production node_modules
    COPY --from=build /usr/src/app/prod_node_modules ./node_modules
    # COPY --from=build /usr/src/app/node_modules/epoll node_modules/epoll
    # COPY --from=build /usr/src/app/node_modules/@serialport node_modules/@serialport

    COPY --from=build /usr/src/app/config config
    COPY --from=build /usr/src/app/dist/src dist/src
    COPY --from=build /usr/src/app/firmware firmware
    COPY --from=build /usr/local/bin/bossac firmware/_arm/bossac
    RUN chmod -R 755 /usr/app/firmware/_arm

    COPY udev_pause.sh .
    RUN chmod 755 udev_pause.sh
    COPY udev.rules /etc/udev/rules.d/udev.rules

    # setup environment
    ENV UDEV=1
    ENV NODE_ENV=production

    EXPOSE 80
    CMD npm start
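
For completeness, the relevant part of the [base] package.json as it appears in the build log above (only the build script is shown, the rest is omitted):

    {
      "name": "cloudcue-base",
      "version": "1.0.0",
      "scripts": {
        "build": "rm -rf dist && tsc"
      }
    }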

Hi,
Can you clarify whether you use git push or the balena-cli’s balena push command to create a new release?
I would suggest trying to create a new release with caching disabled, just so we can be sure that all layers of the image are in sync.
If you use git push you can do that with git push balena master:balena-nocache, while if you are using the balena-cli you can do that with balena push Myapp --nocache.
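
For reference, the two invocations are (the app name is just a placeholder):

    # git-based workflow: push to the special nocache branch
    git push balena master:balena-nocache

    # balena CLI workflow: skip the layer cache for this release
    balena push MyApp --nocache
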
Let us know about the result.

Kind regards,
Thodoris

We use balena push. Over the last few days I have tried it all of these ways:

  • Brand new project CloudCue-Dev
  • balena push CloudCue --nocache
  • Local mode build on my development pi.

All three failed in composition, yet the individual containers build and work.

Could you also please clarify what you mean by the following?

yesterday there were missing images while building resulting in failed builds

Yes, there was another strange error where the intermediate containers were not found, something like a 404 error. I didn’t copy the error text.

I saw another post about it, so I assumed there were build server issues. I waited a day and that problem seems to have cleared up yesterday.

This sounds weird.
Can you clarify whether it still fails if your docker-compose only contains one or two of the images?

Yea, tell me about it. I have been racking my brain on this for days.
I will compose it with just the one [base] image and let you know the result.

Thodoris, same error composing with one container.

See: https://dashboard.balena-cloud.com/apps/1499763/releases/1425807
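
(For reference, the reduced compose file was essentially just the base service and the volume from the full file above, roughly:)

    version: '2.1'

    volumes:
      client-data:

    services:
      base:
        build: ./base
        image: cloudcue-client-base
        container_name: cloudcue-base
        restart: always
        privileged: true
        volumes:
          - 'client-data:/var/lib/cloudcue'
        ports:
          - '8088:80'
        labels:
          io.balena.features.supervisor-api: '1'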

Hi, could you share the CLI version you’re using? v12 of the CLI had some significant changes in how files are included; see: https://github.com/balena-io/balena-cli/wiki/CLI-v12-Release-Notes#breaking-changes

12.1.1

I read the readme with the .gitignore concerns, and my projects all have the proper .dockerignore files. That doesn’t explain the fact that the container builds as expected as a non-composed project; see
https://dashboard.balena-cloud.com/apps/1499763/releases/1425481

I think I’m close to the cause …

I added the --gitignore flag
balena push CloudCue --gitignore

and the build worked as expected. The weird part is that the .gitignore and .dockerignore files seem fine to me.

Any ideas, or is this a bug in the latest CLI?

.gitignore

# standard 
node_modules

# project specific
dist
storage

.dockerignore

# standard 
.vscode
.git
.gitignore
node_modules
Dockerfile

# project specific
dist
storage
README.md
LICENSE

Ok, there seems to be a bug with CLI v12.1.1

It seems like it’s not honoring the .dockerignore files in the subdirectories when packing up the source for the build.

When pushing on this version I noticed that packaging the push went from under a minute to several minutes. (I assume the composed containers’ node_modules folders were being sent, even though each subdirectory had a .dockerignore excluding that data.)

When I added this to the root .dockerignore

# standard
**/node_modules
**/Dockerfile
!**/Dockerfile.*
!**/docker-compose.yml

# project specific
firmware
schematics
tools/bossa
tools/kiosk
tools/wifi-connect
**/.vscode
**/dist

the packaging time decreased as expected, and the build is succeeding.

I have no idea what exactly caused the original error, but it is definitely related to .dockerignore mishandling in the latest CLI.

That’s indeed how it works: only a single .dockerignore file is considered at the project root. This is documented in balena help push (or build or deploy) and on the reference page. It had always been like that (since CLI v8), but it was “obscured” by the support for .gitignore files in releases v8 to v11. It is a legitimate expectation that .dockerignore files in subdirectories would be honored, especially for multi-container applications, and there is a CLI feature request issue for this: Consider a .dockerignore file per service for docker-compose compatibility · Issue #1870 · balena-io/balena-cli · GitHub

Maybe better documentation on the transition to v12 would help, pointing out specifically that only one file is honored, and that this is non-standard compared to Docker’s handling of a separate .dockerignore file at the same level as each container’s Dockerfile.

Or maybe don’t use Docker’s filename at all; it might be better to use a .balenaignore file for multi-container projects, since that file name has no meaning for docker-compose.

Thanks for the quick response, love your software

Thanks for the feedback! I have just updated the v12 release notes to highlight this issue. I will also implement runtime warnings if .dockerignore files are detected in subdirectories, and then I’ll implement docker-compose compatibility as per issue 1870.

Re .balenaignore, this was discussed extensively in issue 1032 and there are good arguments for it, but a decision was eventually made to avoid introducing new formats / solutions. I think / hope that implementing issue 1870 and achieving docker-compose compatibility will finally settle this matter. :slight_smile:

Hello! Just to let you know that this has been closed and the feature is implemented in CLI v12.2.0 for the push / build / deploy commands.
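
A rough sketch of the new usage, assuming the -m / --multi-dockerignore option described in the v12.2.0 release notes is the one that enables it:

    # opt in to honoring a .dockerignore file per service directory
    balena push CloudCue --multi-dockerignore

    # or using the short form of the flag
    balena push CloudCue -m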