ODROID big.LITTLE only uses 1/3 of its processing power


#1

On an ODROID-XU4 running a multi-container app, no combination of CPU-governing parameters will allow a specific container to consume more than 50% of the CPU. The relevant service definition, and a rough sketch of how I’m measuring it, are below.

recorder:
    cpu_shares: 820
    cpu_quota: 300000
    build:
      context: ./host-images/recorder
    privileged: true
    cpuset: 0-3
    restart: unless-stopped
    depends_on:
      - "config"
    network_mode: host
    mem_swappiness: 10
    volumes:
      - "resin-data:/data"

#4

After some further testing: even with a single-container setup, CPU is still limited to 50%. :woman_white_haired:


#5

Hi there,
What is the value of cpu_shares for other containers? Some more helpful info may be found here. Have you tried removing/adjusting cpu_quota?
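
You could also check what limits the engine actually applied on the device. A rough sketch, assuming cgroup v1 mounted under /sys/fs/cgroup/cpu and that the service is still named recorder (cpu_quota is measured against cpu_period, which defaults to 100000 µs, so a quota of 300000 should allow roughly three CPUs’ worth of runtime):

CID=$(docker ps -qf name=recorder)
cat /sys/fs/cgroup/cpu/docker/$CID/cpu.cfs_period_us   # default is 100000 (100 ms)
cat /sys/fs/cgroup/cpu/docker/$CID/cpu.cfs_quota_us    # 300000 here => up to ~3 CPUs of runtime
cat /sys/fs/cgroup/cpu/docker/$CID/cpu.shares          # 820 from the compose file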


#6

I’ve tried both. cpu_shares on the other containers is set to (1024 - 820) / num_containers.

Could this be a bug related to the XU4’s big.LITTLE ARM architecture? On the host, /sys/devices/system/cpu/ lists all 8 CPUs, but only 4 will be active at a time. Instead of 4 CPUs running at full speed, I’m seeing 4 capped at 50% of their potential clock speed.
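
For reference, this is roughly how I’m checking which cores are online and what they’re allowed to clock to (standard sysfs/cpufreq paths; exact availability can vary by kernel):

cat /sys/devices/system/cpu/online                              # 0-7 if both clusters are up
grep . /sys/devices/system/cpu/cpu*/online                      # per-core hotplug state
grep . /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq    # A7 vs A15 cluster max frequencies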


#7

Some further testing: using this cpuset test, everything performs as expected.

It is only cpu_quota and cpu_shares that seem to be ignored.
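
The comparison I ran looks roughly like this (a sketch; the busy loop is a stand-in for the real workload):

# Pinning with cpuset behaves as expected:
docker run --rm --cpuset-cpus=0-3 ubuntu sh -c 'for i in 1 2 3 4; do yes > /dev/null & done; sleep 30'

# But an explicit quota (3 CPUs' worth) or a high share weight still tops out around 50%:
docker run --rm --cpu-period=100000 --cpu-quota=300000 ubuntu sh -c 'for i in 1 2 3 4; do yes > /dev/null & done; sleep 30'
docker run --rm --cpu-shares=820 ubuntu sh -c 'for i in 1 2 3 4; do yes > /dev/null & done; sleep 30'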


#8

Seems like you are hitting this issue?


#9

@zubairlk Good catch. That’s exactly what’s happening. Given that it’s a two-year-old issue, should I consider it wontfix?


#10

@kazazes
I can’t say much about where the bug lies at the moment, though; it could be in the kernel or in Docker.
Or whether the latest kernel/Docker combination runs OK with big.LITTLE ARM cores.

You could try running Docker on the ODROID-XU4’s default distribution. If it manages to use all CPU cores, then we might be able to patch balenaOS/Docker to make it work.

But if that distribution (usually the latest stack, as maintained by the device maker) can’t make all CPUs work with Docker, we might be in a sticky situation. A quick way to test on the stock image would be something like the sketch below.
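
A rough sketch, using the get.docker.com convenience installer and an arbitrary stress loop (not exact commands):

curl -fsSL https://get.docker.com | sh
docker run --rm ubuntu nproc                                                     # should print 8 if all cores are visible
docker run --rm ubuntu sh -c 'for i in $(seq 8); do yes > /dev/null & done; sleep 30'
# watch per-core usage with top/htop on the host while the load runs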


#11

Docker runs fine on the default distribution. This patch seems to be the one.


#12

Hi there,
We opened an issue for tracking progress on this here
Thanks for reporting!
Cheers


#13

@kazazes

Can you please share the following from the ODROID-XU4 default distribution where you found Docker running fine?

uname -a
cat /etc/os-release
docker info

Is there a particular reason you think that patch is the root cause?

Thanks
ZubairLK


#14

@zubairlk It’s the stock ODROID Bionic release, with Docker installed via the get.docker.com script. No, I have no firm reason to believe that patch is the fix; I just knew things worked on the Hardkernel kernel and started looking through their patches. As you can see below, all 8 CPUs are available from within Docker.

root@odroid:~# uname -a
Linux odroid 4.14.85-152 #1 SMP PREEMPT Mon Dec 3 03:00:02 -02 2018 armv7l armv7l armv7l GNU/Linux

root@odroid:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.1 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

root@odroid:~# docker info
Containers: 1
 Running: 0
 Paused: 0
 Stopped: 1
Images: 1
Server Version: 18.09.0
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: c4446665cb9c30056f4998ed953e6d4ff22c7c39
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.14.85-152
Operating System: Ubuntu 18.04.1 LTS
OSType: linux
Architecture: armv7l
CPUs: 8
Total Memory: 1.948GiB
Name: odroid
ID: LNUC:IVMY:B2XH:XHNA:ZOLJ:2SIF:ZRZH:VTKV:EIAW:2VGB:TM5J:RAQR
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

root@odroid:~# lscpu
Architecture:        armv7l
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           2
Vendor ID:           ARM
Model:               3
Model name:          Cortex-A7
Stepping:            r0p3
CPU max MHz:         2000.0000
CPU min MHz:         200.0000
BogoMIPS:            90.00
Flags:               half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae

root@odroid:~# docker run -it ubuntu /bin/bash
    root@c40b776dfdb3:/# lscpu
    Architecture:        armv7l
    Byte Order:          Little Endian
    CPU(s):              8
    On-line CPU(s) list: 0-7
    Thread(s) per core:  1
    Core(s) per socket:  4
    Socket(s):           2
    Vendor ID:           ARM
    Model:               3
    Model name:          Cortex-A7
    Stepping:            r0p3
    CPU max MHz:         2000.0000
    CPU min MHz:         200.0000
    BogoMIPS:            90.00
    Flags:               half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae

#16

I’ve tested akuster/meta-odroid@sumo…thud and can confirm that the fix lies somewhere in that range. Since the bug doesn’t exist on @hardkernel’s even older images, the regression seems to have been introduced in meta-odroid:sumo (which was being built off of hardkernel until thud, I believe).

meta-odroid:thud is compatible with sumo, but can’t be pulled into this repo until some patch re-work is done. There are compile-time problems with both layers trying to write the boot image, and meta-balena’s config checks are throwing warnings because of the kernel bump. I will get to this after Christmas, but would love some help.

@balena-os, what is the timeline on balena-os/meta-balena#1313? thud compatibility on the meta-balena-* side would do away with this instantly.

meta-odroid is flying ahead with 4.19 LTS support with just one dev. I have been very impressed with balena to date on the tooling end, but in the middle of a time-critical development cycle this is really crushing us. We committed to this stack happily, but I’m now sensing that balena is focused on listing as much semi-functional board compatibility as possible rather than providing true support for a subset of boards beyond the Pis. I’m only realizing now that I can’t find any test cases for balena kernel images.

This may be a very simple fix, and I don’t mean to come off as condescending or unappreciative, but we’ve committed months to this platform for a large-scale production run, and I would hope a critical malfunction on one of your more powerful supported boards would be a priority. This issue seems to have gone unnoticed through the entire development of this image. The above-referenced moby/moby#26150 is two years old. Surely someone would have noticed that this board was running at a third of its processing power if any hardware QA were happening? There are a year’s worth of one-line-fix issues in meta-resin-odroid. I’m sorry to be blunt, but these are the questions I have to ask myself as I evaluate balena for a 1k+ node deployment.

The product in question is extremely dependent on consistent functionality of the underlying OS, and as Linus would say, there are no userspace bugs.

I fell in love with balena, and I will stick with it. I’ll switch our entire architecture to another supported board if I have to, and redesign our enclosures, thermal management, etc.; we pivoted our entire stack when we realized how good a fit this would be. I just want to voice my concern that this issue existed for so long unnoticed, that there is no published LTS plan (see Trello; the roadmap is out of sync with product releases by more than six months in some instances), and that your minimum SLA is $16k/year. Where is the focus: hobbyist boards, or a revolutionary IaaS provider?

If balenaCloud and balenaOS are beta products, please, please advertise them as such. People (myself included) are building products, and eventually businesses, on the assumption that this will scale with them. There seems to be a lot of flux and not a lot of direction.


#17

Hi @kazazes, I would first like to apologise for the inconvenience this bug has caused and thank you for highlighting it. We always try our best to cover as much of each supported board’s hardware as possible, and unfortunately we have to rely heavily on the BSP layers for that. In most cases we use or make a BSP, ideally an official one, and our tests pass. The QA tests themselves guarantee a minimal standard of functionality; they do not push the board to its limits, at least not without community assistance.

The XU4 board was added when we rapidly expanded the number of boards supported, and it was hoped it would gain a lot of traction; to this day, however, there are fewer than 50 in use on the platform, which means that bugs like this can unfortunately go unnoticed for a long time, whereas if this bug had been on the RPi3, Intel NUC or TX2 it would have been caught by the community very early on.

All that being said, we have seen a marked improvement in XU4 support, both from Hardkernel itself and from the BSP layer, and I think this board will start to see greater usage. I have asked the OS team for a timeline on thud support; it shouldn’t be too far off. When I have a firmer time frame I will come back and add it here.

As for balenaCloud and the OS, these are definitely not beta products: hundreds of companies deploy thousands of devices each and rely on us completely as the backbone of their infrastructure. Some of our more serious and niche customers even rely on us to build and support their custom hardware or device type, where we work with them to ensure their boards have the exact hardware functionality they need for their use case. We are in the process of defining our supported release versions of the OS, so keep an eye out for that early next year :slight_smile:

We really value your feedback, and that of everyone in the community. We are working our hardest to provide a platform that is both rock solid and constantly improving. In 2019 we hope to launch a number of things to improve stability and visibility across the stack; it’s going to be a great year for balena.


#18

@shaunmulligan, thank you for such a thorough response. I had no idea there were so few on the platform; I’d figured there were way more! I understand that you can’t run hardware tests on all of the boards, so that makes sense.

I look forward to continuing with the platform as we move toward a commercial release; we’ve switched to an Armbian build in the interim. Please let me know when you have that time frame.

Peter


#19

Hey @kazazes, just an update on this: thud support is under way here: https://github.com/balena-os/meta-balena/pull/1351. The one worry is the "balena-engine needs support for the go version provided by thud (1.11.1)" part, as docker/balena-engine is particularly sensitive to Go version updates, so we need to treat that with care and test thoroughly. Other than that, things seem to be progressing well, and hopefully we can have thud out soon.


#20

Any update here?


#21

Hi @kazazes,
The PR is still being finalized.
We will get back to you once it is released.


#22

@thgreasi could you please provide a link to the PR’s fork for testing? Can’t seem to find it on GitHub.


#23

@kazazes the branch for the above PR should be https://github.com/balena-os/meta-balena/tree/ag/thud-support .