Builder hangs with "Still working ..."

When building services with the Balena builder, the build takes a very long time to complete. It sometimes prints "Still working ..." continually for 20 minutes and can then fail, saying the remote end has hung up.

This occurs during an npm install command, so I assume it is not a fault of Balena. However, I cannot recreate the problem in any other environment, and because it happens only intermittently it is difficult to debug. I have tried adding verbose logging and cannot see any errors reported during the install.

Any help would be much appreciated.

Hi
The infrastructure your build runs on depends on the type of device you are building for, so to assess what problem you might be running into, we need to know which device type you are building for.
If you encounter this problem again, it would also help to attach the logs of the build process and provide timestamps that allow us to trace your build in the builder's logs.
Regards
Thomas

Hi Thomas,

Thanks for your quick response!

We are building for Raspberry Pi 3 and doing the push from CircleCI. We currently still push code to the git.resin endpoint, but I assume this does not make a difference. Here are some more details about one of the builds that had an issue.

id 792602
created_at 2019-02-14T17:23:53.235Z
status success
source cloud
start_timestamp 2019-02-14T17:23:53.223Z
end_timestamp 2019-02-14T17:47:42.783Z
update_timestamp 2019-02-14T17:47:42.789Z

As you can see, it took over 20 minutes to complete the build for all services, where normally they complete in just a couple of minutes. The long build time is due to the time taken for the interface service to run npm install. I have included the full logs for the interface service build.

testing_release-log_interface-5a1953c.log (2.4 KB)

Looking at the logs, there is a message saying that the tarball data seems to be corrupted. There is an issue about this on GitHub where someone has suggested that it is a proxy-related problem. Another post on Ask Ubuntu explains how to reset proxy settings; however, I am not sure how this would work in the builder environment.

Does the balena builder proxy all requests in order to give them access to the internet in a controlled manner? If so, could adjusting the proxy settings cause all outbound commands to fail?
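
One way to see which proxy settings, if any, are visible inside the build is to print them from a build step. This is a minimal sketch in POSIX shell, not a statement about how the builder is configured: the function name show_proxy_settings is hypothetical, the environment variable names are the conventional ones and may not be set at all, and the npm part only runs if npm is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: print the proxy settings visible to the current shell.
show_proxy_settings() {
    for name in HTTP_PROXY HTTPS_PROXY NO_PROXY http_proxy https_proxy no_proxy; do
        # Indirect lookup of the variable named in $name; "unset" if empty or missing.
        eval "value=\${$name:-unset}"
        echo "$name=$value"
    done
    # npm keeps its own proxy configuration; query it only if npm is installed.
    if command -v npm >/dev/null 2>&1; then
        echo "npm proxy=$(npm config get proxy)"
        echo "npm https-proxy=$(npm config get https-proxy)"
    fi
}

show_proxy_settings
```

If every line reports "unset" (and npm reports null), the build step itself is not being pointed at a proxy through these settings, although that says nothing about proxying done at the network level.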

Any thoughts or suggestions would be much appreciated.

Thanks,
Henry

The time of this degradation in service corresponds roughly to the time when we were having issues with our ARM native builder server, which was getting overloaded. This could cause long build times. We then performed an upgrade / modified some things on Feb 18 (scroll down on our status page), and during that time, images were being redirected to get built on servers which run emulated builds (these take much longer).

Did the issue clear up for you, or is this still occurring today?

Hi @dt-rush, thanks for your response. I am still experiencing the issue at the moment. Here are the details of the most recent build that took over 20 minutes.

id 798760
status success
start_timestamp 2019-02-20T15:40:55.524Z
end_timestamp 2019-02-20T16:02:46.679Z
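
For reference, the elapsed time of the two builds quoted in this thread can be worked out from their start and end timestamps. This is a minimal sketch in POSIX shell, assuming both timestamps of a build fall on the same UTC day (true for both builds here). The helper names seconds_of_day and build_duration are hypothetical, and fractional seconds are dropped.

```shell
#!/bin/sh
# Hypothetical helper: seconds since midnight for an ISO 8601 UTC timestamp
# such as 2019-02-14T17:23:53.223Z (fractional seconds are dropped).
seconds_of_day() {
    t=${1#*T}      # drop the date part      -> 17:23:53.223Z
    t=${t%%.*}     # drop the fraction       -> 17:23:53
    t=${t%Z}       # drop a trailing Z if there was no fraction
    h=${t%%:*}
    rest=${t#*:}
    m=${rest%%:*}
    s=${rest#*:}
    # Strip one leading zero so values like "08" are not read as octal.
    echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

# Hypothetical helper: elapsed time between two same-day timestamps.
build_duration() {
    start=$(seconds_of_day "$1")
    end=$(seconds_of_day "$2")
    d=$(( end - start ))
    echo "$(( d / 60 ))m $(( d % 60 ))s"
}

# Build 792602
build_duration 2019-02-14T17:23:53.223Z 2019-02-14T17:47:42.783Z   # prints 23m 49s
# Build 798760
build_duration 2019-02-20T15:40:55.524Z 2019-02-20T16:02:46.679Z   # prints 21m 51s
```

Both builds took a little over 20 minutes, which matches the description above.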

Hi @hpgmiskin,

Thanks for letting us know it’s still an issue. Does it always appear to be erroring on the @hackscience/hub-interface package, or does this vary?

I’ve just carried out a quick test, and the Arm builders are operating in a timely fashion as I would expect, so I suspect there’s either a particular package the builders are having issues with, or possibly a combination. Could you possibly publish the dependency section of your package.json so we could try and emulate what you’re seeing?

Additionally, I don’t see it in the logs, but do you happen to know which Arm builder was being used? That would also help us (it should tell you in the first few [Info] lines when a build starts).

Best regards, Heds

Hi @hpgmiskin,

I just wanted to check if you found the time to maybe post the dependency section of your package.json and the builder server used, as Heds suggested? Thanks!

Sorry for the delay in responding. I am not able to share the dependency section because we install private packages from npm, so sharing it would not let you replicate the issue.

I have been experimenting with using yarn to install packages, which has not caused any issues so far, but I would like to allow some more time to pass before I am confident in it. Here is the snippet from the Dockerfile that we use to install yarn.

# Install HTTPS method drivers
RUN apt-get update -qq \
    && apt-get install -y -qq dbus apt-transport-https ca-certificates

# Install yarn to solve extract issue https://github.com/npm/npm/issues/14059
RUN curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add - \
    && echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list \
    && apt-get update -qq \
    && apt-get install -y -qq yarn

Thanks for your help. For the moment, please treat this issue as stale, and I will update it when I have more information.