Issues with takeover script

Hi, we’re planning to migrate a few hundred devices to Balena OS. These devices are mostly Intel NUCs running Ubuntu. My plan was to use the takeover script but unfortunately I can’t get it running.

I already fixed the names for the downloaded images, as it failed with the old resinos image names. See this commit. I also disabled the OS check, as we’re using Ubuntu 20.04.

The command I’m using:

./takeover_bin --pretend --no-os-check --log-level trace -c os_config.json -v "2.98.33" -i balena-cloud-genericx86-64-ext-2.98.33.img.gz 

Whenever I run the script, it fails with the following error:

2022-08-18 13:37:19 WARN  [takeover::stage1::migrate_info] Failed to remove takeover directory: '/balena-takeover', error : Os { code: 16, kind: ResourceBusy, message: "Device or resource busy" }
2022-08-18 13:37:19 ERROR [takeover] Migrate stage 1 returned an error: An invalid state was encountered, context: Failed to find root device

These are my block devices:

NAME                      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0                       7:0    0    62M  1 loop /snap/core20/1593
loop1                       7:1    0    62M  1 loop /snap/core20/1611
loop2                       7:2    0  67,2M  1 loop /snap/lxd/21835
loop3                       7:3    0  43,6M  1 loop /snap/snapd/14978
loop4                       7:4    0  67,8M  1 loop /snap/lxd/22753
loop5                       7:5    0    47M  1 loop /snap/snapd/16292
nvme0n1                   259:0    0 232,9G  0 disk 
├─nvme0n1p1               259:1    0   1,1G  0 part /boot/efi
├─nvme0n1p2               259:2    0     2G  0 part /boot
└─nvme0n1p3               259:3    0 229,9G  0 part 
  └─ubuntu--vg-ubuntu--lv 253:0    0   100G  0 lvm  /

I added some debug logging and found that neither root_device nor root_partition in block_device_info.rs is set. I was able to set nvme0n1 as the root device manually, but root_partition still does not get set. Mounting ubuntu--vg-ubuntu--lv as /dev/root didn’t work either.
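For anyone debugging the same failure, here is a minimal sketch (not the actual takeover code) of how the device backing `/` can be looked up from `/proc/mounts`-style text. The helper name `root_mount_source` is hypothetical. With the LVM layout above, the result would be a device-mapper node such as `/dev/mapper/ubuntu--vg-ubuntu--lv` rather than a partition of nvme0n1, which may be why the automatic detection comes up empty.

```rust
/// Return the source device of the `/` mount from /proc/mounts-style text.
/// Hypothetical helper for illustration only; not part of takeover.
fn root_mount_source(mounts: &str) -> Option<&str> {
    mounts
        .lines()
        .filter_map(|line| {
            // Each line is: <source> <mount point> <fs type> <options> ...
            let mut fields = line.split_whitespace();
            let source = fields.next()?;
            let target = fields.next()?;
            Some((source, target))
        })
        // Only consider real device nodes mounted on "/".
        .filter(|(source, target)| *target == "/" && source.starts_with("/dev/"))
        .map(|(source, _)| source)
        // Later entries shadow earlier ones, so keep the last match.
        .last()
}
```

On a live system the input would come from `std::fs::read_to_string("/proc/mounts")`.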

I don’t know what to try next but would like to avoid sending out hundreds of USB sticks to migrate our devices to Balena OS. Does someone have an idea how to fix this?

Hello @jonmueller welcome to the balena community.

I’ve never used the takeover script, so I’ve just pinged the people internally who can help with this.

Let us know if you test other things and the results that you get, please.

Let’s stay connected

Hey @jonmueller

We had an internal conversation about the takeover script. As you can see, the repo is unmaintained, and we work case by case with customers who need to take over their fleets.

We are currently working on a new version, but it’s not clear when it will be public. In the meantime, we are happy to support you on a best-effort basis here in the forums so that you can port your devices to balenaOS.

Happy to learn from the tests that you are doing!

Update:

I managed to successfully take over one of our NUCs.

I had to set both root_device and root_partition manually. My root_device was the SSD nvme0n1 and the root_partition was device dm-0, as this was the device used for ubuntu--vg-ubuntu--lv.
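To double-check which physical partition a device-mapper node such as dm-0 sits on, sysfs lists the underlying devices in `/sys/block/<dm-name>/slaves`. The following is a rough sketch, assuming that layout. The function name `dm_slaves` is made up for illustration and is not part of takeover.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// List the block devices a device-mapper node is built on by reading
/// `<sys_block>/<dm_name>/slaves`. Hypothetical helper, not takeover code.
fn dm_slaves(sys_block: &Path, dm_name: &str) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(sys_block.join(dm_name).join("slaves"))? {
        // Each entry is named after an underlying device, e.g. "nvme0n1p3".
        names.push(entry?.file_name().to_string_lossy().into_owned());
    }
    names.sort();
    Ok(names)
}
```

On a real machine this would be called as `dm_slaves(Path::new("/sys/block"), "dm-0")`. For the layout in this thread I would expect it to report nvme0n1p3, matching the lsblk output.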

After that, I had an issue where /balena-takeover/bin/takeover could not be found. This was because I had renamed the takeover binary to takeover_bin. After renaming it back to takeover, the script worked as expected and I now see the NUC in our Balena device list.

Hey Jon
It’s great to know you made the script work. And thanks for letting us know.
I wonder whether the changes you made manually were easy to find, and whether the documentation gave you enough information. Do you think it should be improved? As my colleague said, the repo is unmaintained, but if we can update it with these small changes, the community will benefit and others will be encouraged to improve it as well.
In any case, good luck with your project!

@jonmueller congratulations! Good job!

As my colleague said yesterday, is there anything we can improve in the documentation of the takeover script? Could you please suggest changes that would help other people take over their devices more easily?

Thanks

It would be nice to have a general explanation in the documentation of how the script works behind the scenes. The script searches for the root device and root partition. If they can’t be found, they can be set manually by hardcoding them in the source. If I had known this earlier, it would have saved me a few hours of debugging. It would also be nice to be able to set the root device and partition via command line arguments. I will add this functionality for internal use and may be able to open a PR. Other important things to mention in the documentation are the outdated image file names (resinos → balenaos) and that the compiled takeover binary must not be renamed.
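As a rough idea of what such command line overrides could look like, here is a sketch using only the standard library. The flag names `--root-device` and `--root-partition`, the `RootOverrides` struct and the `parse_root_overrides` function are all hypothetical. They do not exist in takeover today, and a real patch would more likely hook into the argument parser the project already uses.

```rust
use std::path::PathBuf;

/// Manual overrides for the automatic root detection (hypothetical).
#[derive(Debug, Default, PartialEq)]
struct RootOverrides {
    root_device: Option<PathBuf>,
    root_partition: Option<PathBuf>,
}

/// Pick the two hypothetical flags out of an argument list and ignore
/// everything else. Returns an error if a flag is missing its value.
fn parse_root_overrides(args: &[&str]) -> Result<RootOverrides, String> {
    let mut overrides = RootOverrides::default();
    let mut iter = args.iter();
    while let Some(&arg) = iter.next() {
        // Decide which field the next value belongs to.
        let slot = match arg {
            "--root-device" => &mut overrides.root_device,
            "--root-partition" => &mut overrides.root_partition,
            _ => continue,
        };
        let value = iter
            .next()
            .ok_or_else(|| format!("{} needs a value", arg))?;
        *slot = Some(PathBuf::from(*value));
    }
    Ok(overrides)
}
```

For the NUC in this thread, the call would carry `--root-device /dev/nvme0n1 --root-partition /dev/dm-0`. When a flag is absent, the field stays `None` and the automatic detection could be used as before.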

@jonmueller if you can open a PR, that would be amazing.

On the other hand, I will share your feedback with the OS team to improve the docs (e.g. explain how the script works and more).