Hi,
This is the second time we “bricked” a device while switching it to local mode. I haven’t yet found a fully reproducible process, but here is the sequence of events as we recall it:
- Have an application running on the device, with an active update lock on `/tmp/balena/updates.lock`.
- On the balenaCloud dashboard, click “Enable local mode” for the device.
- Push to the device. The build goes smoothly, but starting the services gets stuck on `[Live] Waiting for device state to settle...`
- Disable local mode; the device doesn’t react.
- Put it back in local mode; still the same issue.
- Get frustrated, investigate.
To unbrick our device’s supervisor, we connected through SSH (`balena ssh <uuid.local>`). `balena ps -a` showed the following:
```
root@uuid:~# balena ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d5cf185f8751 fa7ca3bc2ac4 "bash start.sh 'expo…" 2 hours ago Up 2 minutes browser_3508250_1764659
c2af3dc495f8 7fa68b5dac67 "/usr/bin/entry.sh /…" 2 hours ago Exited (137) 29 minutes ago frontend_3508249_1764659
58bbf1ba40c2 10cdf088c9b2 "/usr/bin/entry.sh n…" 2 hours ago Up 2 minutes 0.0.0.0:8080->8080/tcp, 0.0.0.0:9229->9229/tcp app_3508248_1764659
1525d2f5e06c registry2.balena-cloud.com/v2/ee8a630b4962f1e2b4ad682dd9468f7a:latest "/usr/src/app/entry.…" 12 days ago Up About a minute (health: starting) resin_supervisor
```
Uh oh. The supervisor seems to be restarting: its container was created 12 days ago but has only been up for about a minute. Let’s check its logs (trimmed for readability):
```
root@79c84ce:~# balena logs resin_supervisor
[info] Supervisor v12.4.6 starting up...
[info] Setting host to discoverable
[warn] Invalid firewall mode: . Reverting to state: off
[info] 🔥 Applying firewall mode: off
[debug] Starting logging infrastructure
[info] Starting firewall
[warn] Ignoring unsupported or unknown compose fields: stdinOpen, envFile
[debug] Performing database cleanup for container log timestamps
[success] 🔥 Firewall mode applied
[debug] Starting api binder
(node:1) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
[info] API Binder bound to: https://api.balena-cloud.com/v6/
[event] Event: Supervisor start {}
[debug] Spawning journald with: chroot /mnt/root journalctl -a -S 2021-04-14 19:07:31 -o json CONTAINER_ID_FULL=d5cf185f87516c052104e587e4da28654b0b9d5912cd9e07416a5cbc11c4e0d1
[debug] Spawning journald with: chroot /mnt/root journalctl -a -S 2021-04-14 19:07:37 -o json CONTAINER_ID_FULL=58bbf1ba40c2db1e4528b73b4c46502e98b80b48ba9f86caab359dbdc1379653
[debug] Connectivity check enabled: true
[debug] Starting periodic check for IP addresses
[info] Reporting initial state, supervisor version and API info
[debug] Skipping preloading
[debug] VPN status path exists.
[info] VPN connection is active.
[info] Waiting for connectivity...
[info] Starting API server
[info] Supervisor API successfully started on port 48484
[info] Applying target state
[debug] Ensuring device is provisioned
[error] Scheduling another update attempt in 1000ms due to failure: Error: (HTTP code 400) unexpected - 2 matches found based on name: network 1_default is ambiguous
[error] at /usr/src/app/dist/app.js:10:2303379
[error] at /usr/src/app/dist/app.js:10:2303311
[error] at Modem.buildPayload (/usr/src/app/dist/app.js:10:2303331)
[error] at IncomingMessage.<anonymous> (/usr/src/app/dist/app.js:10:2302584)
[error] at IncomingMessage.emit (events.js:322:22)
[error] at endReadableNT (_stream_readable.js:1187:12)
[error] at processTicksAndRejections (internal/process/task_queues.js:84:21)
[error] Device state apply error Error: (HTTP code 400) unexpected - 2 matches found based on name: network 1_default is ambiguous
[error] at /usr/src/app/dist/app.js:10:2303379
[error] at /usr/src/app/dist/app.js:10:2303311
[error] at Modem.buildPayload (/usr/src/app/dist/app.js:10:2303331)
[error] at IncomingMessage.<anonymous> (/usr/src/app/dist/app.js:10:2302584)
[error] at IncomingMessage.emit (events.js:322:22)
[error] at endReadableNT (_stream_readable.js:1187:12)
[error] at processTicksAndRejections (internal/process/task_queues.js:84:21)
[...]
```
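For anyone hitting the same symptom, the supervisor log is where the ambiguous name shows up. Below is a minimal Python sketch that pulls the network name and match count out of that error, assuming the message keeps exactly the wording seen above. The function name `ambiguous_networks` is hypothetical and not part of any balena tool.

```python
import re

# Assumes the exact wording of the error in the supervisor log above, e.g.
# "2 matches found based on name: network 1_default is ambiguous"
AMBIGUOUS_RE = re.compile(
    r"(\d+) matches found based on name: network (\S+) is ambiguous"
)


def ambiguous_networks(log_text: str) -> dict[str, int]:
    """Return {network_name: match_count} for every ambiguity error in the log.

    The same error is logged repeatedly (once per update attempt), so each
    name is reported once, with the highest match count seen.
    """
    found: dict[str, int] = {}
    for match in AMBIGUOUS_RE.finditer(log_text):
        count, name = int(match.group(1)), match.group(2)
        found[name] = max(found.get(name, 0), count)
    return found
```

It could be fed the text captured from `balena logs resin_supervisor`; an empty result means the log shows no ambiguous-network errors.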
It looks like there are two networks with the same name, `1_default`:
```
root@79c84ce:~# balena network ls
NETWORK ID NAME DRIVER SCOPE
4f4a4c1f23ca 1_default bridge local
128b8c05392d 1_default bridge local
4c8184636574 1790024_default bridge local
5374b2c3295f bridge bridge local
f1991144dff9 host host local
e9f4bdaa0b25 none null local
2f97014f4a38 supervisor0 bridge local
```
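Spotting the duplicate by eye is easy here, but on a device with more networks a small check helps. A minimal Python sketch, assuming the default table layout printed by `balena network ls` (NETWORK ID, NAME, DRIVER, SCOPE) and that network names contain no spaces; `duplicate_networks` is a hypothetical helper name.

```python
from collections import defaultdict


def duplicate_networks(ls_output: str) -> dict[str, list[str]]:
    """Group network IDs by name and keep only names that appear more than once.

    Expects the default table printed by `balena network ls`; the header row
    ("NETWORK ID NAME DRIVER SCOPE") and blank lines are skipped.
    """
    by_name: dict[str, list[str]] = defaultdict(list)
    for line in ls_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] == "NETWORK":
            continue
        net_id, name = fields[0], fields[1]
        by_name[name].append(net_id)  # IDs stay in the order they were listed
    return {name: ids for name, ids in by_name.items() if len(ids) > 1}
```

Run against the listing above, it would report `1_default` with both of its IDs and nothing else.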
To restore the device to a usable state, we stopped and removed the application containers, then removed one of the two duplicate networks:
```
root@79c84ce:~# balena stop browser_3508250_1764659 frontend_3508249_1764659 app_3508248_1764659
browser_3508250_1764659
frontend_3508249_1764659
app_3508248_1764659
root@79c84ce:~# balena rm browser_3508250_1764659 frontend_3508249_1764659 app_3508248_1764659
browser_3508250_1764659
frontend_3508249_1764659
app_3508248_1764659
root@79c84ce:~# balena network ls
NETWORK ID NAME DRIVER SCOPE
4f4a4c1f23ca 1_default bridge local
128b8c05392d 1_default bridge local
4c8184636574 1790024_default bridge local
5374b2c3295f bridge bridge local
f1991144dff9 host host local
e9f4bdaa0b25 none null local
2f97014f4a38 supervisor0 bridge local
root@79c84ce:~# balena network rm 4f4a4c1f23ca
4f4a4c1f23ca
root@79c84ce:~# balena ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1525d2f5e06c registry2.balena-cloud.com/v2/ee8a630b4962f1e2b4ad682dd9468f7a:latest "/usr/src/app/entry.…" 12 days ago Up 3 minutes (health: starting) resin_supervisor
```
After removing the network and waiting a few minutes for the supervisor’s next attempt, everything went back to normal, and we could put the device in local mode and push to it again.
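The manual recovery above boils down to a fixed sequence of `balena` commands. Here is a minimal Python sketch that only builds that sequence (it runs nothing), assuming you already know the app container names and the IDs of each duplicated network. `recovery_commands` is a hypothetical helper, and keeping the last-listed network ID is an arbitrary choice that happens to match what we did (we removed `4f4a4c1f23ca`, the first one listed).

```python
import shlex


def recovery_commands(
    app_containers: list[str], duplicates: dict[str, list[str]]
) -> list[list[str]]:
    """Build, in order, the balena CLI commands from the session above.

    Stops and removes the app containers first, as we did (running containers
    hold endpoints on the network, which makes `network rm` fail), then
    removes every duplicate network ID except the last one listed per name.
    """
    commands: list[list[str]] = []
    if app_containers:
        commands.append(["balena", "stop", *app_containers])
        commands.append(["balena", "rm", *app_containers])
    for ids in duplicates.values():
        for net_id in ids[:-1]:  # keep ids[-1]; which copy survives is arbitrary
            commands.append(["balena", "network", "rm", net_id])
    return commands


if __name__ == "__main__":
    # Dry run with the values from this post: print the commands, don't execute.
    plan = recovery_commands(
        ["browser_3508250_1764659", "frontend_3508249_1764659", "app_3508248_1764659"],
        {"1_default": ["4f4a4c1f23ca", "128b8c05392d"]},
    )
    for cmd in plan:
        print(shlex.join(cmd))
```

Printing instead of executing is deliberate: removing containers and networks is destructive, so it is worth eyeballing the plan on the device before running each command by hand.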