Balena RAID Arrays Support and Best Practices

Hi,

Is there any documentation on Balena’s side about RAID arrays? We have been following these guidelines to create software RAID arrays and it has been working great so far, but the fact that it works does not mean it is being done the best possible way.

Regards,

Hello Eugenio,
Docs lead here. I am not aware of any documentation or reference guides we have about setting up software RAID. However, I would love to know more about your use case and whether it is an opportunity for something we could support or write more about.

Hi @vipulgupta2048,

Thanks for replying. Our use case is a very simple one: we have built a VMS that usually records live video. After some time, a requirement came up to use RAID, which is very doable via software on Ubuntu.

We ended up enabling this in a Balena Docker container by following the instructions in the article linked above.

The main actions to set up a software RAID array from a container are:

  • Installing mdadm
  • Saving mdadm.conf in a volume that points to /etc/mdadm/mdadm.conf
  • Having an automated setup process to create/assemble the RAID array, which is also well explained in the document above (a rough sketch is shown just below).
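
To give an idea, the automated part looks roughly like this (a simplified sketch, not our exact script; it assumes the mdadm.conf volume is mounted at /etc/mdadm and that the array members are /dev/sda and /dev/sdb):

#!/bin/sh
# Simplified sketch -- /dev/sda and /dev/sdb are placeholders for the member disks
if [ ! -f /etc/mdadm/mdadm.conf ]; then
    # First run: create the array and record it in the persisted config
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
    mdadm --detail --scan > /etc/mdadm/mdadm.conf
else
    # Later runs: assemble the array described by the persisted config
    mdadm --assemble --scan
fi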

The only steps that are missing for us are the following:

  • Run update-initramfs -u
  • Save the mount point in /etc/fstab

We are not sure how to do either of these from a container. Can you help us with that?

Hi,

balenaOS should have all the necessary kernel bits for running an MD array. Unfortunately, there is currently no support in the host OS for the on-boot automation you are looking for, and Ubuntu’s approach is not directly applicable to balenaOS as-is: mdadm.conf, fstab and the initramdisk are all read-only on balenaOS, so you cannot update them with the configuration you need.

MD arrays can still be used, though; I see no technical obstacles. You just need to implement the array assembly, start and mount in your container’s startup script. A very simple version of this would be just running mdadm --assemble --scan and then mount /dev/mdXXX /somewhere. Some more logic will likely be necessary, since I would expect the array to auto-assemble on boot but stay in read-only mode by default. That said, it will not be as straightforward as with Ubuntu, but if you are willing to experiment a bit, we would be happy to assist and figure out how to make this work.
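
As a rough sketch of what that startup logic could look like (untested here; the device name and mount point are placeholders you would adjust):

#!/bin/sh
# Assemble any arrays described by the on-disk metadata (may be a no-op if
# the kernel already auto-assembled them on boot)
mdadm --assemble --scan || true

# If the array came up read-only, switch it to read-write before mounting
mdadm --readwrite /dev/md0 || true

mkdir -p /mnt/storage
mount /dev/md0 /mnt/storage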

Hi @mtoman ,

One of the reasons I ask is because on one of our previous products we were able to get it to mount on boot… It was on Balena v2.58, though…

Right now we are doing exactly what you mention: we assemble the array and mount it. The only issue for us is that the array gets created as /dev/md0, but after a reboot it is recognized by the device as /dev/md127, and when the device boots we see an error regarding an env var substitution…

The odd thing is that on our previous devices this is fine: the array is assembled automatically as /dev/md0 and only requires mounting… On these new devices we now have to stop the RAID array using mdadm --stop /dev/md*, then run mdadm --assemble --scan, and then mount.

And what’s concerning for us is whether this is going to bring any issues in the future; we are also concerned about why it is different…

Could you point me to where the initramdisk files are in Balena and what commands would be used to update them, even if they are read-only?

Thanks in advance, I’ll post here anything new that I discover.

Oh, also, I’m happy to enable support access for a device that has the RAID array set up properly and does not require --assemble to be run manually…

Hello, I’m reposting something mtoman sent yesterday, which for some reason did not appear here:

Hi,

Thanks for the details. The substitution errors you are getting from udev are harmless; they come from rules that we use to identify balenaOS partitions internally, and while they might fail for other devices, they won’t really touch them in any way. Could you confirm my understanding that you have balenaOS running on a dedicated “OS drive” and the RAID is on a set of dedicated drives for data only?

The most likely thing that changed the behavior is that since balenaOS v2.80.12 we enabled RAID auto-assembly in the kernel and started shipping mdadm as part of balenaOS. We did this because we started experimenting with balenaOS booting off an MD RAID1 array. It is in fact surprising to me that the auto-assembly was “just working” before.
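
You can check this on a device from the host OS shell, for example:

zcat /proc/config.gz | grep -E 'CONFIG_MD_AUTODETECT|CONFIG_MD_RAID'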

As for the rename/renumbering - I suspect the ambiguity comes from different auto-assembly strategies in the kernel and in mdadm but in general I would not recommend relying on the numbers much - it is similar to expecting that e.g. sda will point to the same drive every time.

I would like to understand whether having the array on md0 is of any particular importance to you or whether the only issue is with the mount command. If the mount command is the only concern, I can suggest mounting by filesystem UUID - you can use e.g. mount UUID=118ca7e1-dbe5-46db-bc01-34e6cb5723fd /somewhere instead of mount /dev/md0 /somewhere in your script to avoid specifying the device altogether, since the filesystem is usually what you are interested in anyway. A similar thing can be achieved with filesystem labels instead of UUID.
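
For example, either of these in your startup script would avoid referring to the md device node at all (the UUID and label here are just placeholders):

mount UUID=118ca7e1-dbe5-46db-bc01-34e6cb5723fd /somewhere
mount LABEL=mydata /somewhere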

If the md0 name is important for any other reason, you can use the mdadm --stop --scan and mdadm --assemble --scan trick as you are doing now; it would pick up the array name from the metadata and likely end up as md0 if it was created as such. Another way that comes to mind is creating a custom udev rule that would rename the device. The most correct way for mdadm configuration would indeed be with mdadm.conf, but there is no support for this in balenaOS at this moment - the initramdisk is generated at OS build time and bundled with the kernel into a single file (e.g. /mnt/boot/Image.gz on x86 device types). While it is theoretically possible to unbundle it, inject mdadm.conf and then put it back together, it feels an order of magnitude more complicated than any of the suggestions outlined above. And an OS update would overwrite the changes anyway.

Do you think any of the suggestions might work for you?

Hi @rosswesleyporter ,

Nice to know that the udev rule failures are not important.

As for our use case, we do have a dedicated SSD for the OS and multiple dedicated HDDs for data storage.

We arrived at the same conclusion as you: the 2.83.X version, by including mdadm, was conflicting with what we are doing right now, which is managing the RAID array from one of our services.

As far as relying on the path goes, we do things like HDD encryption that, as far as I know, rely on passing a path to the encryption command, so that would be the only reason to use the /dev/md0 path. We will take a look at our process to see if UUIDs or labels are usable for us.

As for the solution, we discussed it internally and right now we have the same opinion as you: we are going to stick with the stop/assemble process, because either way, we do not feel comfortable with the RAID array being assembled by the OS automatically without relying on any metadata or any persistently saved configuration based on the creation of the array.

As for modifying the OS, we are also not fans of doing that unless it is extremely necessary, but thanks.

Now, as for the reason that we are having to modify our solution:

  • Can Balena add a label that allows importing that mdadm.conf into the containers so it can be written there and kept as a single file?
  • Can Balena add a way of disabling mdadm in the host OS via a Device Configuration variable, like the firewall or delta updates ones?

We do feel that users should have the option to enable/disable this newly introduced OS functionality rather than having to work around it. Is this something that can be or will be introduced?

Hi Eugenio,

we do things like HDD encryption that, as far as I know, rely on passing a path to the encryption command

You should be able to convert the UUID to a path inside of your service using, e.g.:

$ blkid --uuid "[UUID]"
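
For example, something along these lines, with cryptsetup standing in as a placeholder for whatever encryption tooling you actually use (the UUID is a placeholder too):

# Resolve the filesystem UUID to a device path, then hand that path to the
# encryption command -- cryptsetup is only an illustration here
STORAGE_DEV=$(blkid --uuid "118ca7e1-dbe5-46db-bc01-34e6cb5723fd")
cryptsetup luksOpen "${STORAGE_DEV}" storage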

As far as your other questions go, I’m doing some experiments to see what solutions are available to us here.

Hello again,

Apologies for the lack of followup. I wrote the below message on Friday and it showed as posted on my end, but it didn’t seem to make its way to the forum.

I did some experiments with my own balenaOS device using v2.89.15. I was able to create a RAID array, and upon rebooting, the array was already assembled at /dev/md127 with the correct capacity. Inside of a container, I was able to mount this array without stopping and reassembling it.

blkid, however, does not work inside the container without udev populating the links under /dev/disk/*, so I used a balenalib base image, enabled the UDEV=1 variable, and ran the container privileged. This ensures that udev runs inside the container, creating the appropriate links.

Dockerfile

FROM balenalib/%%BALENA_ARCH%%-alpine

RUN apk add --update blkid # We want the full-fledged version that can resolve disks by label

COPY entry.sh .
CMD [ "./entry.sh" ]

entry.sh

#!/bin/sh
# Resolve the data filesystem by label so the script does not depend on
# whether the array shows up as /dev/md0 or /dev/md127
STORAGE_LABEL="STORAGE"
STORAGE_MNT_DIR="/mnt/storage"
STORAGE_DEV=$(blkid --label ${STORAGE_LABEL})

# Create the mount point and mount the array's filesystem
mkdir -p ${STORAGE_MNT_DIR}
mount ${STORAGE_DEV} ${STORAGE_MNT_DIR}

# Keep the container running
sleep infinity

docker-compose.yml

version: '2'
services:
  app:
    build: ./
    privileged: true
    environment:
      - UDEV=1 # enable udev in the balenalib base image, as mentioned above

After pushing this to my device in local mode, I can see the array is mounted.

/ # mount
overlay on / type overlay <snip>
/dev/md127 on /mnt/storage type ext4 (rw,relatime,stripe=256)

As to your current workflow:

we do not feel comfortable with the RAID array being assembled by the OS automatically without relying on any metadata or any persistently saved configuration based on the creation of the array

Can you elaborate on this? My understanding is that the configuration file is not necessary: the metadata for the array is stored on the disks that are part of that array, and the configuration file’s primary use is to keep track of arrays and member disks without inspecting the individual disks using e.g. mdadm --examine. Indeed, looking at the metadata for my member disks shows that they have the same metadata version and UUID, the latter of which is used to assemble the array automatically:

root@31bea91:~# mdadm --examine /dev/vdb
/dev/vdb:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 939bb66c:1699bb32:0be35c30:a3470102
           Name : 31bea91:0  (local to host 31bea91)
  Creation Time : Fri Apr 29 22:01:00 2022
     Raid Level : raid0
   Raid Devices : 2

 Avail Dev Size : 8378368 (4.00 GiB 4.29 GB)
    Data Offset : 10240 sectors
   Super Offset : 8 sectors
   Unused Space : before=10160 sectors, after=0 sectors
          State : clean
    Device UUID : cf487592:7b489cf6:cc8a45e2:d9f33240

    Update Time : Fri Apr 29 22:01:00 2022
  Bad Block Log : 512 entries available at offset 8 sectors
       Checksum : 8b7aedd6 - correct
         Events : 0

     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
root@31bea91:~# mdadm --examine /dev/vdc
/dev/vdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 939bb66c:1699bb32:0be35c30:a3470102
           Name : 31bea91:0  (local to host 31bea91)
  Creation Time : Fri Apr 29 22:01:00 2022
     Raid Level : raid0
   Raid Devices : 2

 Avail Dev Size : 8378368 (4.00 GiB 4.29 GB)
    Data Offset : 10240 sectors
   Super Offset : 8 sectors
   Unused Space : before=10160 sectors, after=0 sectors
          State : clean
    Device UUID : d1b17c2b:8046e46d:f2fba814:fa05058e

    Update Time : Fri Apr 29 22:01:00 2022
  Bad Block Log : 512 entries available at offset 8 sectors
       Checksum : 1bffd824 - correct
         Events : 0

     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AA ('A' == active, '.' == missing, 'R' == replacing)

Hi @eeb,

I just want to make sure you saw that jakogut sent a bit more information (above), in case it is useful to you.

Hi @rosswesleyporter and @jakogut ,

Sorry for the late reply; we have spent the last week and a half adjusting our solution to make it ready for production.

The blkid approach is a solution, but it still does not answer a number of questions and concerns we had:

  • The auto-assembly of the array as /dev/md127, even though the configuration we store (/etc/mdadm/mdadm.conf) points to /dev/md0, was our main concern. We honestly prefer to be explicit in our solution rather than rely on what mdadm puts together automatically when it does not know what to do… After this situation we definitely prefer to stop/assemble the array rather than rely on a path generated by the OS and mdadm that is not based on the configuration the array is supposed to have.
  • My questions about the configurability of features like this were not answered… The reason I asked them is because the introduction of a feature that could potentially affect solutions in the field should be something developers can decide whether or not to use, especially if it is a feature that was not even documented… I pasted the questions below for reference.
    • Can Balena add a label that allows importing that mdadm.conf into the containers so it can be written there and kept as a single file?
    • Can Balena add a way of disabling mdadm in the host OS via a Device Configuration variable, like the firewall or delta updates ones?

Regards,

No worries about the reply :slight_smile:

We honestly prefer to be explicit in our solution rather than rely on what mdadm puts together automatically when it does not know what to do…

Can you elaborate on this? Regardless of whether the array is assembled automatically by the kernel or by using mdadm to manually scan for arrays, the metadata is read from the members of the array in order to assemble it. It’s the same process; mdadm.conf simply allows for keeping a record of the desired arrays and overriding the defaults, which are to automatically assemble available arrays from the on-disk metadata.
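
For reference, mdadm.conf entries are normally generated from that same on-disk metadata anyway, e.g.:

# Record the arrays discovered on disk as ARRAY lines in mdadm.conf --
# a record of the desired arrays, not the source of truth for assembly
mdadm --detail --scan >> /etc/mdadm/mdadm.conf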

Can Balena add a label that allows importing that mdadm.conf into the containers so it can be written there and kept as a single file?

It may be possible to bind mount this configuration to an overlay, which could allow user applications to configure arrays on boot, but this remains to be tested. I suspect, after reviewing the merge that added this feature for all balenaOS devices, that modifying this config on the host OS will not have any effect, as mdadm isn’t being used to assemble arrays there. Rather, the kernel is finding and assembling them at boot without any help from userspace. However, this can also be done inside a container by stopping and reassembling an array with your configuration, as you’re doing now. Is there a reason this approach is currently deficient?
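
As a rough sketch of that approach (the config path is a placeholder for wherever your persistent volume is mounted):

# Stop the auto-assembled array, then reassemble it using your own configuration
mdadm --stop --scan
mdadm --assemble --scan --config /mnt/data/mdadm.conf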

Can Balena add a way of disabling mdadm in the host OS via a Device Configuration variable, like the firewall or delta updates ones?

I’m not 100% sure on this one, but I believe mdadm is currently unused in the host OS, as it is simply a userspace utility for managing and monitoring software RAID devices, and the existing functionality is all kernel-side. Even if we removed it, arrays would still be assembled on boot, as CONFIG_MD_AUTODETECT is enabled. This was also enabled in v2.58.6, as seen below.

root@balena:~# cat /etc/os-release
ID="balena-os"
NAME="balenaOS"
VERSION="2.58.6+rev1"
VERSION_ID="2.58.6+rev1"
PRETTY_NAME="balenaOS 2.58.6+rev1"
MACHINE="genericx86-64-ext"
VARIANT="Development"
VARIANT_ID="dev"
META_BALENA_VERSION="2.58.6"
RESIN_BOARD_REV="54c6ad9"
META_RESIN_REV="ef55525"
SLUG="genericx86-64-ext"
root@balena:~# uname -a
Linux balena 5.2.10-yocto-standard #1 SMP Fri Oct 23 09:05:55 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
root@balena:~# zcat /proc/config.gz | grep _MD
CONFIG_TCP_MD5SIG=y
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_MD5 is not set
CONFIG_SCTP_COOKIE_HMAC_MD5=y
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_AUTODETECT=y
CONFIG_MD_LINEAR=y
CONFIG_MD_RAID0=y
CONFIG_MD_RAID1=y
CONFIG_MD_RAID10=y
CONFIG_MD_RAID456=y

This explains why the array was assembled in that previous OS release as well. The merge referenced above simply enables this for all kernels and device types, even if a kernel was previously configured without it. For the generic x86_64 image, this PR has no effect, as these configs were already enabled and mdadm is unused. As far as mdadm goes, the presence of this utility simply allows for managing and monitoring arrays directly from the host OS shell, though this could also be done from a container, using, e.g.:

balena run --privileged alpine /bin/sh -c "apk add --update mdadm && mdadm ..."

The change from assembling the array at /dev/md0 to /dev/md127 was likely the result of an upstream kernel change, and is not unexpected, as these device names are not static. The upstream recommended way of finding the appropriate device is to use the partition or filesystem label or UUID.
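
With udev running in your container (as in the example earlier), the stable symlinks are also available directly, regardless of how the array is numbered:

ls -l /dev/disk/by-uuid/ /dev/disk/by-label/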

Let me know what you think, and if this helps clarify anything.