Is it possible to filter files to be tar'ed during balena push?

Right now, it looks like balena push tars the entire source directory before pushing content out to Balena servers.

I have several gigabytes of dependencies and build products locally I’d rather not have to clean and rebuild every time I balena push. They’re sometimes nested deeply in my project, so I can’t easily separate source code from build products. This adds about 10-15 minutes of needless packaging and uploading time when I balena push.

It looks like there’s an issue that’s been open for some time related to this. Should I wait for that to be implemented, or is there some other solution I’m missing? Thanks!

Hi,

We support both .dockerignore and .gitignore, and both files are respected when the tar stream is generated. Does this help with the problem? This option is also mentioned in the referenced issue.

Interesting – I’m using balena --version 11.21.0, but my bundling time goes from 15 to 3 minutes when I delete some gitignored directories (node_modules and catkin build products).

I added the following project-root-level .dockerignore and still experienced 15-minute bundling times:

**/.git

**/node_modules
**/npm-debug

**/catkin_ws/build
**/catkin_ws/devel
**/catkin_ws/logs

/data

When I delete these directories manually (save .git) I see 3 minute bundling times. My docker-compose.yaml, at the same level as .dockerignore, looks like:

version: '2'
volumes:
  resin-data: {}

services:
  devices:
    build: 
      context: ./
      dockerfile: ./App1/Dockerfile
    privileged: true
    restart: always
    network_mode: host
    volumes:
      - resin-data:/data:rw
    labels:
      io.resin.features.kernel-modules: '1'
      io.resin.features.supervisor-api: '1'
      io.resin.features.resin-api: '1'
      io.resin.features.dbus: '1'

  robot:
    build: 
      context: ./
      dockerfile: ./App2/Dockerfile
    privileged: true
    restart: always
    network_mode: host
    depends_on:
      - App1
    volumes:
      - resin-data:/data:rw
    labels:
      io.resin.features.supervisor-api: '1'
      io.resin.features.resin-api: '1'
      io.resin.features.dbus: '1'

So I expect .dockerignore to be in the same build context for both services. Notice anything strange?

The .dockerignore should be in the same directory that you’re running balena push ... from. Assuming you’re doing this, I don’t see anything obviously bad. Do you see it uploading the files in the logs straight after invoking the push command?

I’m using DEBUG=1 balena push BalenaApp --detached --source $APP_WS but I don’t see anything apart from

[debug] original argv0="node" argv=[/usr/bin/node,/home/nick/.npm-global/bin/balena,push,BalenaApp,--detached,--source,/home/nick/app] length=7
[debug] Using /home/nick/app as build source
\ Packaging the project source...

for the entire bundling process.

Hi @nckswt.

Just in case @richbayliss’ comment wasn’t clear, you need to place a .dockerignore file in the root of your application source. This is a known issue with the balena CLI. It’s conventional (especially if you’re used to docker-compose) to place a .dockerignore file in each service directory; however, that doesn’t work at the moment. I usually do something like cat */.dockerignore > .dockerignore in my app directory whenever I change a service’s ignore file.
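For readers who want to script that workaround, here is a minimal sketch. The function name merge_dockerignore is made up for illustration and is not part of the balena CLI. It concatenates every <service>/.dockerignore into the root file and adds a blank line after each one, on the assumption that blank lines are skipped by the ignore-file parser.

```shell
# merge_dockerignore APP_DIR   (hypothetical helper, not a balena CLI feature)
# Concatenates every <service>/.dockerignore into APP_DIR/.dockerignore,
# in the same spirit as: cat */.dockerignore > .dockerignore
merge_dockerignore() {
    merge_app_dir=$1
    merge_out="$merge_app_dir/.dockerignore"
    merge_tmp="$merge_out.tmp.$$"
    : > "$merge_tmp"
    for merge_src in "$merge_app_dir"/*/.dockerignore; do
        # If no service has a .dockerignore the glob stays literal; skip it.
        [ -f "$merge_src" ] || continue
        cat "$merge_src" >> "$merge_tmp"
        # Blank separator so a file without a trailing newline cannot
        # fuse its last pattern with the next file's first pattern.
        printf '\n' >> "$merge_tmp"
    done
    # Replace the root file only after every source has been read.
    mv "$merge_tmp" "$merge_out"
}
```

Like the one-liner, this overwrites the root .dockerignore, so any patterns that exist only there would need to live in a service file or be appended afterwards. It also copies the patterns verbatim: a pattern written relative to a service directory is not rewritten to be relative to the application root.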

Another option, if you’re finding that the upload step is taking a long time, is to use the balena build/balena deploy workflow to build the containers on your local machine and deploy them to the application.

Thanks,
James.

Yup – that’s what I had done in the previous message (.dockerignore at the application root, right next to docker-compose.yaml).

Just to double-check, I also copied .dockerignore into each service root, but still did not see a speedup in the upload step.

I don’t think balena build or balena deploy would be an option any time soon. How else might I debug why it seems that .dockerignore and .gitignore aren’t being used?

Any chance you can paste the contents of your .dockerignore file?

Yup! It’s shown above

Sorry, yes, I missed that. Can you try manually expanding the ** to the full paths? I assume this is to match files in each of your services. Boring, but worth trying.

Sadly, using the full paths did not make any difference :cry:

I’ll add some logging to the balena_cli node module next. Hopefully that’ll give us some clues as to what’s happening.

So I’m wondering if it just takes a long time to marshal the file list to build the archive. Can you try running time sh -c 'find . -type f > /dev/null' and let me know how long it takes?

$ time sh -c 'find . -type f > /dev/null'

real	0m0.388s
user	0m0.216s
sys	0m0.170s

Additional info:

$ find . -type f | grep node_module | wc -l
108237

$ find . -type f | grep build | wc -l
26951

Well I guess that rules out that problem.
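Listing files with find only reads directory entries, while packaging also has to read every file's contents. A rough way to check whether the ignored directories account for the time is to compare how many bytes a plain tar would pack with and without them. This is only a sketch: packed_bytes is a made-up helper, and tar's --exclude patterns do not follow the same matching rules as .dockerignore, so treat the numbers as an approximation.

```shell
# packed_bytes DIR [TAR_OPTIONS...]   (hypothetical helper for diagnosis only)
# Prints the size in bytes of the tar stream produced for DIR.
# Extra arguments such as --exclude=node_modules are passed through to tar.
packed_bytes() {
    pack_dir=$1
    shift
    # Write the archive to stdout and count the bytes; tr strips the
    # padding that some wc implementations put around the number.
    tar -cf - -C "$pack_dir" "$@" . | wc -c | tr -d ' '
}
```

For example, running time packed_bytes "$APP_WS" and then time packed_bytes "$APP_WS" --exclude=node_modules --exclude=build would show both the byte counts and the wall-clock difference. If the second run is much faster and much smaller, the bundling time is probably dominated by reading files that were expected to be ignored.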

I’ll add some logging to the balena_cli node module next. Hopefully that’ll give us some clues as to what’s happening.

@nckswt, I’ve done just that, but so far could not reproduce the issue (assuming that the issue is that files are not being ignored, and instead being added to the tar stream pushed to balenaCloud).

I patched the CLI code to produce extra debug messages as in this diff: https://github.com/balena-io/balena-cli/compare/v11.21.0...investigate-dockerignore-v11.21.0?expand=1
In that patch, I also added a strategic “throw new Error()” to prevent the CLI from actually sending anything to balenaCloud, as I only wanted to test what files are included in the tar stream, and what files are filtered out.

I tested with CLI versions 11.21.0 and 11.25.13. My test project folder contained the following test files:

$ pwd
/Users/paulo/balena/other/support/nckswt
$ find .
./push-this.txt
./node_modules
./node_modules/sub2
./node_modules/sub2/sub2.txt
./node_modules/modules.txt
./robot
./robot/node_modules
./robot/node_modules/node_modules-robot.txt
./robot/node_modules/sub3
./robot/node_modules/sub3/sub3.txt
./.dockerignore
./docker-compose.yml
./.git
./data
./data/sub1
./data/sub1/sub1.txt
./data/data.txt
./devices
./devices/node_modules
./devices/node_modules/node_modules-devices.txt

The .dockerignore and docker-compose.yml files have the same contents as the ones you shared above. The result of running the patched CLI was:

$ ./bin/balena-dev push test-rpi --detached --source ~/balena/other/support/nckswt
FileIgnorer basePath="/Users/paulo/balena/other/support/nckswt"
addIgnoreFile type=0 fullPath="/Users/paulo/balena/other/support/nckswt/.dockerignore"
FileIgnorer filter TRUE relFile=".dockerignore"
FileIgnorer filter TRUE relFile="docker-compose.yml"
FileIgnorer filter TRUE relFile="push-this.txt"
FileIgnorer filter FALSE relFile="data/data.txt"
FileIgnorer filter FALSE relFile="node_modules/modules.txt"
FileIgnorer filter FALSE relFile="data/sub1/sub1.txt"
FileIgnorer filter FALSE relFile="devices/node_modules/node_modules-devices.txt"
FileIgnorer filter FALSE relFile="node_modules/sub2/sub2.txt"
FileIgnorer filter FALSE relFile="robot/node_modules/node_modules-robot.txt"
FileIgnorer filter FALSE relFile="robot/node_modules/sub3/sub3.txt"
tarDirectory packing file '/Users/paulo/balena/other/support/nckswt/.dockerignore'
tarDirectory packing file '/Users/paulo/balena/other/support/nckswt/docker-compose.yml'
tarDirectory packing file '/Users/paulo/balena/other/support/nckswt/push-this.txt'
foo

Note the TRUE and FALSE values above, where TRUE indicates files that are included in the tar stream, and FALSE indicates files that are filtered out. It appears to behave as intended. Something you may also notice in the output above is that excluded folders and subfolders are still traversed, even if all files in them are ultimately ignored. This behaviour is related to the fact that the dockerignore file allows specifying exceptions to filter rules using the exclamation mark (docs), something like “filter out all node_modules folders, except for this one file which is several subfolders down”. As implemented, each file is matched against each rule. I wonder if this alone could explain the time difference when you delete excluded folders; it’s what James hinted at, but I would expect not, given your find test results.
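To illustrate why an excluded folder cannot simply be skipped, here is a simplified sketch of "each file is matched against each rule, and the last matching rule wins". It is not the CLI's actual code: is_ignored is a made-up function, and it uses shell case patterns (where * also matches /) as a stand-in for real .dockerignore pattern matching.

```shell
# is_ignored FILE RULE...   (illustrative only, not the balena CLI's matcher)
# Returns 0 if FILE ends up ignored, 1 if it would be kept.
# Rules are checked in order; a later matching rule overrides earlier ones,
# and a rule starting with "!" re-includes what it matches.
is_ignored() {
    ign_file=$1
    shift
    ign_result=1                      # 1 = keep the file (not ignored)
    for ign_rule in "$@"; do
        case $ign_rule in
            '!'*) ign_negate=1; ign_pattern=${ign_rule#"!"} ;;
            *)    ign_negate=0; ign_pattern=$ign_rule ;;
        esac
        case $ign_file in
            # Match the pattern itself or anything below it.
            $ign_pattern|$ign_pattern/*)
                if [ "$ign_negate" -eq 1 ]; then
                    ign_result=1      # exception rule: keep the file
                else
                    ign_result=0      # normal rule: ignore the file
                fi
                ;;
        esac
    done
    return $ign_result
}
```

With the rules node_modules and !node_modules/keep/me.txt, the call is_ignored node_modules/a.txt node_modules '!node_modules/keep/me.txt' succeeds (the file is ignored), while the same call for node_modules/keep/me.txt fails (the file is kept). Since the exception can only be detected by looking at the file's full path, a walker using this approach has to descend into node_modules even though nearly everything in it is dropped.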

I wonder how to more closely reproduce the issue. I’ve created a zip file of the very simple project structure above, which perhaps you could try using / playing with as well – especially if you’re also able to add debug messages: https://drive.google.com/file/d/1bD5gPGcIapa3KDlzZI00pkhmUHIAauDV/view?usp=sharing

Otherwise, maybe you could try sending us your own small zip file with steps to reproduce the issue, including details about your operating system (which version of Linux?). Thank you for helping with the investigation.