QEMU devices with management issues

Hi, we have the two devices below with containers running, but one of them doesn’t seem to be logging anything and neither is accessible via the Resin.io TTY:

apps/307727/devices/763172
apps/307727/devices/519904

They are both on Resin OS 2.3.0+rev1 (prod), supervisor 6.1.3.

Is it possible to check from your end what’s happening in the host OS?

Both devices are running on QEMU network bridges, and the iptables config is the container default.

– ab1

Hello,
The first device appears to be offline. Is it offline on purpose?

Are those devices in production? Is it ok if we try to access the device?

They are live, but the user must have taken that one offline.

Perhaps you could take a look at TTY access on apps/307727/devices/519904?

I see that the session is disconnected as soon as the terminal tries to start. I wonder if there’s something in that network blocking the access to the web terminal.

The container has full access to the network as far as I can see, but no traffic is going across the resin-vpn interface.

OK, I’ve managed to fix one of the devices (it was the firewall).

The remaining one /apps/307727/devices/763172/summary seems to have a problem with the supervisor not running (nothing listening on port 48484):

# ifconfig docker0
docker0   Link encap:Ethernet  HWaddr 02:42:cd:e0:aa:0c
          inet addr:10.114.101.1  Bcast:0.0.0.0  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

# ping 10.114.101.1 -c 1
PING 10.114.101.1 (10.114.101.1) 56(84) bytes of data.
64 bytes from 10.114.101.1: icmp_seq=1 ttl=64 time=0.134 ms
...

# telnet 10.114.101.1 48484
Trying 10.114.101.1...
telnet: Unable to connect to remote host: Connection refused

# telnet 127.0.0.1 48484
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused
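Worth noting: a “Connection refused” (rather than a timeout or “port unreachable”) usually means no process is listening on the port, not that a firewall is dropping packets. If telnet isn’t available on the host, a quick probe can be done with bash’s built-in /dev/tcp redirection — a minimal sketch, assuming a 2-second timeout is acceptable:

```shell
# Minimal port probe using bash's /dev/tcp (no telnet/nc required).
# Returns 0 if something accepts the connection, non-zero otherwise.
port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

if port_open 127.0.0.1 48484; then
  echo "supervisor API is listening on 48484"
else
  echo "nothing listening on 48484 (consistent with 'Connection refused')"
fi
```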

The firewall rules seem to be in order:

*filter
:INPUT ACCEPT [469:80190]
:FORWARD ACCEPT [17:1023]
:OUTPUT ACCEPT [450:48818]
:DOCKER - [0:0]
:DOCKER-ISOLATION - [0:0]
-A INPUT -i resin-vpn -p tcp -m tcp --dport 48484 -j ACCEPT
-A INPUT -i tun0 -p tcp -m tcp --dport 48484 -j ACCEPT
-A INPUT -i docker0 -p tcp -m tcp --dport 48484 -j ACCEPT
-A INPUT -i lo -p tcp -m tcp --dport 48484 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 48484 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -j DOCKER-ISOLATION
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A DOCKER-ISOLATION -j RETURN
COMMIT
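On the live host, `iptables -L INPUT -v -n` would show per-rule packet counters and confirm which rule is actually matching. As an offline sanity check, the ordering above can be verified from the saved dump — every interface that should reach the supervisor port needs an ACCEPT before the catch-all REJECT. A sketch over the INPUT rules quoted above:

```shell
# The relevant INPUT rules from the iptables-save dump above.
dump='-A INPUT -i resin-vpn -p tcp -m tcp --dport 48484 -j ACCEPT
-A INPUT -i tun0 -p tcp -m tcp --dport 48484 -j ACCEPT
-A INPUT -i docker0 -p tcp -m tcp --dport 48484 -j ACCEPT
-A INPUT -i lo -p tcp -m tcp --dport 48484 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 48484 -j REJECT --reject-with icmp-port-unreachable'

# Count the ACCEPT rules for the supervisor port (expect one per
# allowed interface: resin-vpn, tun0, docker0, lo).
accepts=$(printf '%s\n' "$dump" | grep -c 'dport 48484 -j ACCEPT')
echo "ACCEPT rules for 48484: $accepts"
```

Since the loopback and docker0 ACCEPTs precede the REJECT, the refused connections from 127.0.0.1 and 10.114.101.1 can’t be the firewall’s doing.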

Any ideas?

We are taking a look and we’ll let you know. I may have to reach out to the team for more suggestions on this matter.
Thanks,
ilias

How much RAM are you assigning to this device?

             total       used       free     shared    buffers     cached
Mem:           992        687        305         16         12        529
-/+ buffers/cache:        144        847
Swap:            0          0          0

Not enough?
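For what it’s worth, the “-/+ buffers/cache” row is the figure that matters here: it shows memory actually available once the kernel reclaims buffers and page cache. A quick sketch with the numbers from the output above (the one-MB difference from the printed 847 is rounding):

```shell
# Figures from the `free -m` output above, in MB.
total=992 used=687 free=305 buffers=12 cached=529

# Memory the kernel can hand back to applications:
# free + buffers + cached (the "-/+ buffers/cache" free column).
available=$((free + buffers + cached))
echo "effectively available: ${available} MB"
```

So with roughly 847 MB effectively free, RAM is unlikely to be the problem.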

Do the other devices that you have in the same application have the same configurations?

This one /apps/307727/devices/826833/summary is more or less the same as far as I can see and is working.

Are you able to get onto the host OS on the problematic device?

Is the supervisor running?

The supervisor is not running, but it is not a supervisor-related problem. The fact that it is not running is probably due to other factors, but we are not sure yet. We suspect filesystem corruption because we see lots of I/O errors, but we are still investigating.
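If you do get host OS access, scanning the kernel log for block-layer errors is a reasonable first step to confirm the corruption theory — a sketch, where the pattern list is a guess at typical error signatures, not exhaustive:

```shell
# Count common block/filesystem error signatures in the kernel log.
# Run as root on the host; the grep patterns are assumptions.
errors=$(dmesg 2>/dev/null | grep -icE 'i/o error|ext4-fs error|blk_update_request' || true)
echo "matching kernel log lines: ${errors:-0}"
```

A steadily growing count while the device is idle would point at failing flash media rather than a one-off glitch.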

It is possible - I have no direct access to the device, and the person who does isn’t responding. So if you can’t do anything from your end, let’s just forget about it, and when (if) they get in touch with me, I’ll tell them to rebuild.