Live Migration of Containers using Resin

Hi,
I’m planning to live-migrate Docker containers from one resin device to another. Is there a provision in resinOS to do so, or is it possible at all? Just to be clear, here is the link which explains the checkpoint and restore method used to live-migrate the containers. I’d appreciate any replies! Thanks.

Hi @roop, for context, are you using the managed service (on resin.io), or are you running the open source resinOS yourself?

Hi @imrehg, I’m using the managed service on resin.io to control and monitor my devices.
Is there any available method to implement live migration using resin? Thanks!

Hi, can you also give a little more context as to what you are trying to achieve and why?
Also, what Docker version would work with CRIU?

Hi @floion, I have a web service deployed on an embedded device (Pi 3) through resin.io for an edge computing project. I’m trying to live-migrate the container to another embedded device (also a Pi 3) so that if the first device goes down I can keep the service running. For now, the web service just displays the number of seconds it has been up. E.g., if the displayed count is 29 when device A goes down, then after the live migration device B should continue counting from 29. CRIU is supported from Docker version 1.13 onwards, and I’m using version 17.03, which is newer. Here is the link for CRIU.

P.S. Apologies for the late reply.

Hey @roop, no worries at all! Your example looks interesting. I would guess it’s pretty far from the usual resin use case at the moment, because from what we’ve seen so far, the devices are not really interchangeable (i.e. the task has to run on a specific device), while yours is indeed more hosting-farm style. Which makes it interesting. :slight_smile:

From the link to CRIU, it uses Docker’s experimental checkpoint feature, so by default it won’t work on current resin.io devices. If you’d like to try, I’d recommend getting a -dev release of the latest resinOS onto a device, modifying the Docker startup script to enable this feature, putting the device into Local Mode, logging into the host, and trying it out to see how it works. You should have access to everything that is available.
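As a rough sketch of what that experiment could look like once the host’s Docker daemon is running with experimental features enabled, the Python helper below only builds the `docker checkpoint create` and `docker start --checkpoint` command lines and prints them. The container name `uptime-web`, the checkpoint name `cp1`, and the directory `/mnt/data/checkpoints` are placeholders made up for illustration, and whether CRIU is actually available on the resinOS host is something you would have to check.

```python
import shlex
from typing import List, Optional


def checkpoint_cmd(container: str, name: str,
                   checkpoint_dir: Optional[str] = None,
                   leave_running: bool = False) -> List[str]:
    """Build the argv for `docker checkpoint create` (experimental feature)."""
    cmd = ["docker", "checkpoint", "create"]
    if checkpoint_dir:
        # Store the checkpoint outside Docker's default location,
        # e.g. somewhere it can be copied to the other device from.
        cmd += ["--checkpoint-dir", checkpoint_dir]
    if leave_running:
        # Keep the container running after the checkpoint is taken.
        cmd.append("--leave-running")
    return cmd + [container, name]


def restore_cmd(container: str, name: str,
                checkpoint_dir: Optional[str] = None) -> List[str]:
    """Build the argv for starting a container from a checkpoint."""
    cmd = ["docker", "start", "--checkpoint", name]
    if checkpoint_dir:
        cmd += ["--checkpoint-dir", checkpoint_dir]
    return cmd + [container]


if __name__ == "__main__":
    # Placeholder names, for illustration only.
    for argv in (
        checkpoint_cmd("uptime-web", "cp1", "/mnt/data/checkpoints", leave_running=True),
        restore_cmd("uptime-web", "cp1", "/mnt/data/checkpoints"),
    ):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

On the target device, a container with the same name would have to be created from the same image before the restore command can start it, and the checkpoint files would have to be copied over first. How well a custom checkpoint directory works on restore may depend on the Docker version, so treat this as something to experiment with rather than a known-good recipe.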

A question about the use case, though. You say “device A goes down, then it’s migrated to device B” (paraphrasing). When would that migration take place if the device goes down (which I’m guessing means it becomes unreachable)? I’m trying to understand how you expect this to work.

Hey @imrehg, thanks for the reply! At the moment, I’m planning to implement this using a master-agents concept: a master node keeps track of, and stays in sync with, the status of all the agent nodes, and when an agent goes down, the master syncs the state to a free node. This way, a highly available service can be provided. I’m still researching how to implement this idea, and I’m not really sure it will work, hence this post. I’d love to get some input from you guys! Thanks again.
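To make the master-agents idea a bit more concrete, here is a minimal in-memory sketch in Python. Everything in it is hypothetical: the `Master` and `Agent` classes, the 5-second heartbeat timeout, and the device and service names are made up for illustration, and the "state" is just the uptime counter rather than a real CRIU checkpoint. It assumes each agent reports its latest state along with its heartbeat, so the master can only hand over whatever was synced before the agent disappeared.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Agent:
    name: str
    last_heartbeat: float            # time of the last heartbeat, in seconds
    service: Optional[str] = None    # service running here; None means the node is free


@dataclass
class Master:
    timeout: float = 5.0             # assumed heartbeat timeout, in seconds
    agents: Dict[str, Agent] = field(default_factory=dict)
    checkpoints: Dict[str, int] = field(default_factory=dict)  # service -> last synced state

    def register(self, name: str, now: float, service: Optional[str] = None) -> None:
        self.agents[name] = Agent(name, now, service)

    def heartbeat(self, name: str, now: float, state: Optional[int] = None) -> None:
        """Record that an agent is alive and, if given, its latest service state."""
        agent = self.agents[name]
        agent.last_heartbeat = now
        if agent.service is not None and state is not None:
            self.checkpoints[agent.service] = state

    def _alive(self, agent: Agent, now: float) -> bool:
        return now - agent.last_heartbeat <= self.timeout

    def failover(self, now: float) -> List[Tuple[str, str, Optional[int]]]:
        """Reassign services from timed-out agents to free, live agents.

        Returns (service, target agent, last synced state) for each move.
        """
        dead = [a for a in self.agents.values() if not self._alive(a, now)]
        moves = []
        for agent in dead:
            del self.agents[agent.name]
            if agent.service is None:
                continue
            spare = next((a for a in self.agents.values()
                          if a.service is None and self._alive(a, now)), None)
            if spare is None:
                continue  # no free node; in this sketch the service simply stays down
            spare.service = agent.service
            moves.append((agent.service, spare.name, self.checkpoints.get(agent.service)))
        return moves


# Device A syncs its counter at 29 and then goes silent; device B keeps reporting.
master = Master(timeout=5.0)
master.register("device-a", now=0.0, service="uptime-web")
master.register("device-b", now=0.0)
master.heartbeat("device-a", now=29.0, state=29)
master.heartbeat("device-b", now=36.0)
moves = master.failover(now=36.0)
print(moves)  # [('uptime-web', 'device-b', 29)]
```

Note that device B can only resume from the last state the master received, so anything that happened on device A after its final sync is lost. In a real setup the synced state would be a checkpoint taken while the device was still up, which is why how often checkpoints are taken matters.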