For our test and demo environments I’d like to be able to host some ‘virtual’(?) Balena devices in AWS. I would like to manage them through the regular Balena dashboard as much as possible. I just need a place where they’ll stick around without the need to organise a bunch of hardware.
I managed to get an instance going in VirtualBox. But AFAIK you can’t put VirtualBox on EC2 or in a Docker container. Otherwise I could make something manageable using Vagrant.
I tried to make Balena in a container work under Kubernetes, but I couldn’t figure it out. It seemed like Kubernetes would not let Balena do all the mounting it needs, and I could not find a way around that.
I found this pull request that looks perfect for me. I could just start an EC2 instance with a Balena OS AMI. But the pull request looks like it has been idle for a while, and I don’t even know where to begin contributing to it. It looks like a really steep learning curve.
Since I got it going on VirtualBox, I’m going to see whether I can import that using AWS VM Import. That feels like a bit of a long shot, not to mention cumbersome. But if I manage to get EBS volumes for boot, state and data at the right mount points it might work, right?
In conclusion, I’m kind of stuck with this. Does anyone have any practical tips on how I could get a few virtual Balena devices set up on AWS?
Small update. The AWS VM import doesn’t work. The AWS import process fails with ClientError: Unsupported GRUB configuration - Unable to determine kernel version
Hey there @ErikHH, I definitely know some of my colleagues have been working on this, but I’m not sure what the latest is or whether their work has been written up anywhere. Let’s ping @ab77, who I think may be able to give an update and/or some pointers when he gets a chance.
And after a while I have a brand new Balena AMI in my AWS account!
Then I fire up a t3a.small instance (it’s roughly the same spec as our actual devices).
And then… sort of nothing happens at all. The AWS instance says it’s running, but nothing shows up in the Balena dashboard. And there is really no way I can get to it for any troubleshooting. The EC2 serial console doesn’t show anything.
When I try SSH on port 22222 using the preload key, the first attempt gives me:
kex_exchange_identification: read: Connection reset by peer
Subsequent SSH attempts are rejected with a Connection refused until I reboot the instance.
Any idea on how I can begin to troubleshoot this? Why can’t I get in with SSH?
Good to see that you made progress on running it on an AWS instance. The kex_exchange_identification error looks to me like a routing problem. Can you share the verbose output of ssh with ssh -v <instance-addr>?
Also, if I understand you correctly, you are able to see the device online on the dashboard. Can you try the console/terminal from the dashboard on that particular device?
OpenSSH_8.1p1, LibreSSL 2.7.3
debug1: Reading configuration data ~/.ssh/config
debug1: ~/.ssh/config line 1: Applying options for *
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 47: Applying options for *
debug1: Connecting to 54.154.135.226 [54.154.135.226] port 22222.
debug1: Connection established.
debug1: identity file ~/fleet-key.pem type -1
debug1: identity file ~/fleet-key.pem-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_8.1
kex_exchange_identification: Connection closed by remote host
The main problem is that the machine does not show up in the Balena dashboard after the EC2 instance is started. There is no console or terminal for me on that end.
You must have the private key corresponding to the public key you are inserting into the config. Otherwise your image will boot up, but you won’t have any access to it. You can use balena-cli to configure your image, or just mount the /boot partition from the image and update config.json manually. AWS SSH key pairs have no effect whatsoever in this configuration.
The other possibility, if you are using a configured image for our application, is that your EC2 instance running balenaOS doesn’t have Internet access. This could be because it’s sitting in a private subnet without a NAT gateway.
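To make the "update config.json manually" option a bit more concrete, here is a minimal sketch in Python. It assumes the public keys live in an `sshKeys` list under the `os` key of config.json, which is how I understand the balenaOS configuration docs; the function name `add_ssh_key` and the file path in the usage note are made up for illustration, so check them against your own image before relying on this.

```python
import json
from pathlib import Path


def add_ssh_key(config_path, public_key):
    """Add public_key to os.sshKeys in a config.json, keeping all other settings.

    Assumption: balenaOS reads public SSH keys from the "sshKeys" list
    under the "os" key of config.json.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())

    # "os" and "sshKeys" may both be missing in a freshly downloaded image.
    keys = config.setdefault("os", {}).setdefault("sshKeys", [])

    key = public_key.strip()
    if key not in keys:  # avoid duplicate entries on repeated runs
        keys.append(key)

    path.write_text(json.dumps(config, indent=2))
    return config
```

With the boot partition of the image mounted somewhere (the mount point here is hypothetical), usage would look like `add_ssh_key("/mnt/balena-boot/config.json", Path("fleet-key.pub").read_text())`. The matching private key is then the one to pass to ssh with `-i`.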
I would recommend you start with a configured image (download from the dashboard). If you are using an anonymous balenaOS image, you’ll need to preload it with the correct application, which actually does the configuration and joins to balenaCloud. Our preload code isn’t currently public.
It has internet connectivity. It’s an EC2 instance with a public IP in the default VPC. I’ve double-checked it by launching a regular AWS Linux instance in the same place.
I have the private key to that pair and it matches the public key I inserted in the config.
I tried over the weekend with a preconfigured image that I downloaded from the dashboard and put my own SSH key into it. But the result in the end is exactly the same.
So at this point I don’t see this going anywhere soon. So I’m just going to call this a day. If someone has a bright idea I’ll give it another go. But I need to spend time on other aspects of our project now. We’ll just have to make do with the few real devices we have at our disposal.
The only other thing worth trying is to deploy a dev instance of balenaOS (ensuring SSH access is locked down in the AWS EC2 security group to your IP only). That way you know it’s going to be listening for SSH on port 22222 without username/password, so you can terminal in and troubleshoot.
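Before digging into SSH itself, it can help to see how the TCP connection fails, since a timeout and a refusal usually point in different directions. Below is a small sketch using only the Python standard library; the helper name `probe_tcp` is made up for this post, and the interpretation in the comments is a rule of thumb rather than a diagnosis.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            # The TCP handshake completed; something accepted the connection.
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer at all; packets are often being dropped on the way,
        # for example by a security group that does not allow the port.
        return "timeout"
    except OSError as exc:
        # Anything else (unreachable network, DNS failure, ...).
        return f"error: {exc}"
```

Calling `probe_tcp("<instance-addr>", 22222)` from your own machine gives one of those four answers. Note that "open" only means the TCP handshake worked; an SSH daemon can still drop the session afterwards, which is what a kex_exchange_identification error looks like from the client side.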
Hello, I was also able to create an AMI with the generate_ami.sh script. The instances created from that image show up in the balena dashboard, but they don’t start provisioning.
With every reboot of my EC2 instance, a new device is created in balena.
I am also not able to get shell access.
In a screenshot I took from the EC2 console, there is an error from the resin flasher service.
Hi Stefan, can you try deploying a dev instance of balenaOS (ensuring SSH access is locked down in the AWS EC2 security group to your IP only)? That way you know it’s going to be listening for SSH on port 22222 without username/password, so you can terminal in and troubleshoot.
It’s been ages, sorry about that, work happened.
I’ve made some progress on this. I can troubleshoot it now.
I have launched a development image (intel-nuc-2.83.18+rev1-dev-v12.10.3) that I downloaded from our Balena dashboard (with SSH locked down to my IP). I cannot get in with SSH on port 2222; it times out.
But luckily I can get in using the EC2 serial console. I’m looking at resin-init-flasher.service, since that fails to start. Here’s what I got:
root@19c323d:~# systemctl status resin-init-flasher.service
* resin-init-flasher.service - Resin init flasher service
Loaded: loaded (/lib/systemd/system/resin-init-flasher.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2022-01-25 14:39:38 UTC; 52s ago
Process: 461 ExecStart=/bin/bash /usr/bin/resin-init-flasher (code=exited, status=1/FAILURE)
Main PID: 461 (code=exited, status=1/FAILURE)
Jan 25 14:39:22 localhost bash[461]: [resin-init-flasher] INFO: balena configuration found.
Jan 25 14:39:38 19c323d bash[461]: [resin-init-flasher] INFO: Timeout while waiting for openvpn to come alive. No>
Jan 25 14:39:38 19c323d bash[461]: [resin-init-flasher] INFO: Flash internal device... will take around 5 minutes>
Jan 25 14:39:38 19c323d bash[461]: [resin-init-flasher] INFO: nvme0n1 is our install media, skip it...
Jan 25 14:39:38 19c323d bash[680]: [ERROR] resin-device-progress : Device registration not complete, provisioning>
Jan 25 14:39:38 19c323d bash[461]: [resin-init-flasher] ERROR: Failed to find any block devices in nvme0n1 sda sd>
Jan 25 14:39:38 19c323d bash[461]: [resin-init-flasher] Cleanup.
Jan 25 14:39:38 19c323d systemd[1]: resin-init-flasher.service: Main process exited, code=exited, status=1/FAILURE
Jan 25 14:39:38 19c323d systemd[1]: resin-init-flasher.service: Failed with result 'exit-code'.
Jan 25 14:39:38 19c323d systemd[1]: Failed to start Resin init flasher service.
So it seems it can’t find a block device. That’s all I get from that. But fdisk -l gives me:
Disk /dev/nvme0n1: 8 GiB, 8589934592 bytes, 16777216 sectors
Disk model: Amazon Elastic Block Store
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x00000000
Device Boot Start End Sectors Size Id Type
/dev/nvme0n1p1 * 8192 90111 81920 40M e W95 FAT16 (LBA)
/dev/nvme0n1p2 90112 3014655 2924544 1.4G 83 Linux
/dev/nvme0n1p3 3014656 3022847 8192 4M 83 Linux
/dev/nvme0n1p4 3022848 3063807 40960 20M f W95 Ext'd (LBA)
/dev/nvme0n1p5 3031040 3039231 8192 4M 83 Linux
/dev/nvme0n1p6 3047424 3063807 16384 8M 83 Linux
Disk /dev/nvme1n1: 20 GiB, 21474836480 bytes, 41943040 sectors
Disk model: Amazon Elastic Block Store
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Looks like plenty of block devices to me. I’m confused.
Any ideas what other things I can check or do?
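One possible reading of the log, sketched below in Python: the flasher appears to walk a fixed list of device names, skip the one it booted from, and give up if none of the others exist. This is a guess from the log lines only, not the actual resin-init-flasher code. The candidate list in the example is hypothetical (the real one is truncated in the log), and `pick_internal_device` is a made-up name.

```python
def pick_internal_device(candidates, present, install_media):
    """Return the first candidate that exists and is not the install media, else None."""
    for name in candidates:
        if name == install_media:
            # Mirrors "nvme0n1 is our install media, skip it..."
            continue
        if name in present:
            return name
    return None


# Hypothetical candidate list; the real one is cut off in the log above.
candidates = ["nvme0n1", "sda"]
# The two disks that fdisk -l reported on the instance.
present = {"nvme0n1", "nvme1n1"}

print(pick_internal_device(candidates, present, "nvme0n1"))  # None
```

Under that assumption, a second disk named nvme1n1 would only be picked up if that exact name were on the candidate list, which would explain why fdisk shows disks while the flasher reports none.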
Thank you @ErikHH for creating this thread and sharing your progress. After you got in touch with our paid support, a colleague advised that the NUC and generic-x86 balenaOS images are “flasher images” that wrap the actual balenaOS image inside. (The purpose of the flasher image is to “flash”, that is byte-copy, the inner OS image to the device’s hard disk or SSD.) The flasher unwrap tool (balena-os/balena-image-flasher-unwrap on GitHub, a tool for unwrapping balena-image from a balena-image-flasher) can be used to extract the inner OS image. I understand that you had success in deploying a virtual device to AWS when using the inner OS image. I am sharing this here just in case it is helpful to other users who may be considering doing the same.