I was playing with LXC to debug a problem a user had with OAI inside an unprivileged LXC container, so I set up LXC on my machine. What follows is a recap of the steps, run on an up-to-date Fedora 41 and compiled from various sources:
- Basic installation and running: LXC Getting Started
- Fedora-specific tips: Setting up unprivileged containers with LXC on Fedora 38
- Some details about Linux kernel primitives: Tutorial: Using Linux Primitives to Build Your Own Containers - Stéphane Graber & Christian Brauner. The interesting effect of an unprivileged container root account having all privileges but not the “right to use them” is discussed starting at minute 39.
Installation
Install LXC and start the relevant services.
sudo dnf install lxc lxc-templates lxc-extra
sudo systemctl start lxc
This also starts lxc-net for networking, which creates an lxcbr0 bridge
that provides connectivity to the containers.
On a default Fedora system, the firewall (firewalld) is installed and active, and it blocks IP address assignment to the containers. I did not pursue this further and stopped the firewall for my tests, but this is a security risk.
sudo systemctl stop firewalld.service
Some information on how this could be configured properly is here.
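Rather than stopping firewalld, a commonly suggested alternative (untested here) is to place the lxcbr0 bridge in firewalld's trusted zone. That zone accepts all traffic arriving from the containers, so it is still a trade-off, but the firewall stays up for the other interfaces. A minimal sketch, assuming firewalld is running and the bridge has its default name; the function name trust_lxc_bridge is made up for illustration:

```shell
# Hypothetical helper (not from the LXC or Fedora docs): put the LXC bridge
# into firewalld's "trusted" zone instead of stopping the firewall.
# Assumes firewalld is running and the bridge is named lxcbr0 by default.
trust_lxc_bridge() {
    local bridge="${1:-lxcbr0}"
    # Make the zone assignment permanent, then reload so it applies now.
    sudo firewall-cmd --permanent --zone=trusted --add-interface="$bridge" &&
        sudo firewall-cmd --reload
}

# Usage:
#   sudo systemctl start firewalld.service
#   trust_lxc_bridge
```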
Easy container start
The easy way, which is not recommended, is to start a privileged container. The following commands guide you through OS selection and container creation, then start the container.
sudo lxc-create --name mycontainer --template download
sudo lxc-start --name mycontainer
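The download template can also be told which image to fetch, which skips the interactive menu and is handy for scripting. A sketch, assuming the image server offers Fedora 41 for amd64 (check what is actually listed); the function name create_container and its default values are illustrative only:

```shell
# Hypothetical helper: create a privileged container from the download
# template without the interactive menu. Defaults are assumptions; pick an
# image that the image server actually provides.
create_container() {
    local name="$1" dist="${2:-fedora}" release="${3:-41}" arch="${4:-amd64}"
    # Everything after "--" is passed on to the download template itself.
    sudo lxc-create --name "$name" --template download -- \
        --dist "$dist" --release "$release" --arch "$arch"
}

# Usage:
#   create_container mycontainer
#   sudo lxc-start --name mycontainer
```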
You can then open a shell inside the container, inspect it, and list all containers (with their state) or only the stopped ones:
sudo lxc-attach --name mycontainer
sudo lxc-info -n mycontainer
sudo lxc-ls --fancy
sudo lxc-ls --stopped
The lxc-info command shows the container's IP address. Unless the firewall is
disabled or has suitable rules, the container will not get IP connectivity.
A container can be stopped and then destroyed as follows. Destroying a container while it is still running is not recommended.
sudo lxc-stop --name mycontainer
sudo lxc-destroy --name mycontainer
The general LXC configuration is at /etc/lxc/default.conf, and the
corresponding container configuration is at /var/lib/lxc/mycontainer/config.
In the latter directory, there is also the rootfs for the container.
The problem with that approach is that the root user inside the container is mapped to the root user on the host. For security reasons, it’s better to start containers unprivileged.
Correct container start
Creating unprivileged containers is slightly more complex, as it requires mapping the user and group IDs inside the container (starting with root) to a range of subordinate IDs belonging to the host user. The LXC documentation explains how to set this up, but in my case, the ranges already existed:
$ cat /etc/subuid
richie:524288:65536
$ cat /etc/subgid
richie:524288:65536
This means that user richie owns 65536 subordinate UIDs and GIDs on the host,
starting at 524288 (the LXC documentation uses 1000000 instead of the
preconfigured 524288 above).
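The mapping is a plain offset: container ID n (for 0 ≤ n < 65536) becomes host ID 524288 + n, so root inside the container runs as host UID 524288. A small sketch of that arithmetic, using a hypothetical helper called host_id:

```shell
# Hypothetical helper: translate an ID inside the container into the host ID
# it maps to, given the start and size of the subordinate range.
host_id() {
    local cid="$1" base="$2" count="$3"
    if [ "$cid" -lt 0 ] || [ "$cid" -ge "$count" ]; then
        echo "ID $cid is outside the mapped range" >&2
        return 1
    fi
    echo $((base + cid))
}

# Usage:
#   host_id 0 524288 65536      # container root     -> 524288
#   host_id 1000 524288 65536   # container UID 1000 -> 525288
```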
Now, we need to tell LXC about this ID mapping. First, copy the system-wide LXC
default config for new containers from /etc/lxc/default.conf to
~/.config/lxc/default.conf, then append the UID/GID mapping to that file (the LXC
Getting Started guide has a script for that!):
lxc.idmap = u 0 524288 65536
lxc.idmap = g 0 524288 65536
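For reference, here is a sketch of such a script (not the one from the LXC guide). It copies the system default and appends idmap lines built from the user's entries in /etc/subuid and /etc/subgid; note that it overwrites an existing ~/.config/lxc/default.conf. The function name write_user_lxc_conf is hypothetical, and all paths are parameters so it can be tried on scratch files first:

```shell
# Sketch: copy the system-wide default config and append idmap lines built
# from the caller's subordinate ID ranges. Overwrites the target file.
write_user_lxc_conf() {
    local user="$1"
    local system_conf="${2:-/etc/lxc/default.conf}"
    local user_conf="${3:-$HOME/.config/lxc/default.conf}"
    local subuid="${4:-/etc/subuid}" subgid="${5:-/etc/subgid}"
    local uid_range gid_range

    # Entries look like "user:start:count"; match the user name exactly
    # (a plain grep for "richie" would also match "richie2").
    uid_range="$(awk -F: -v u="$user" '$1 == u { print $2, $3; exit }' "$subuid")"
    gid_range="$(awk -F: -v u="$user" '$1 == u { print $2, $3; exit }' "$subgid")"
    if [ -z "$uid_range" ] || [ -z "$gid_range" ]; then
        echo "no subordinate ID range found for $user" >&2
        return 1
    fi

    mkdir -p "$(dirname "$user_conf")"
    cp "$system_conf" "$user_conf"
    # Container IDs 0..count-1 map to host IDs start..start+count-1.
    echo "lxc.idmap = u 0 $uid_range" >> "$user_conf"
    echo "lxc.idmap = g 0 $gid_range" >> "$user_conf"
}

# Usage:
#   write_user_lxc_conf "$(id -un)"
```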
Finally, the host user needs permission to create network devices; the following line allows up to 10 veth devices attached to lxcbr0:
echo "$(id -un) veth lxcbr0 10" | sudo tee -a /etc/lxc/lxc-usernet
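The tee -a command appends a new line every time it runs. If the step may be repeated, a variant that appends only when the exact line is missing avoids duplicate entries. A sketch, with allow_veth as a hypothetical name and the limit of 10 devices taken from the command above:

```shell
# Hypothetical helper: add the veth allowance to lxc-usernet only once.
allow_veth() {
    local user="$1" file="${2:-/etc/lxc/lxc-usernet}" line
    # Format: <user> <link type> <bridge> <max. number of interfaces>
    line="$user veth lxcbr0 10"
    # -x whole line, -F fixed string, -s no message if the file is missing
    if ! grep -qxFs -- "$line" "$file"; then
        echo "$line" | sudo tee -a "$file" > /dev/null
    fi
}

# Usage:
#   allow_veth "$(id -un)"
```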
Now we are ready to create and start a container. This cannot be done by calling
lxc-create and lxc-start directly because, as the LXC documentation explains:
To run unprivileged containers as an unprivileged user, the user must be allocated an empty delegated cgroup (this is required because of the leaf-node and delegation model of cgroup2, not because of liblxc). […] It is not possible to simply start a container from a shell as a user and automatically delegate a cgroup. Therefore, you need to wrap each call to any of the lxc-* commands in a systemd-run command.
Use this to create and start the container:
systemd-run --unit=my-unit --user --scope -p "Delegate=yes" -- lxc-create --name oai --template download
systemd-run --unit=my-unit --user --scope -p "Delegate=yes" -- lxc-start --name oai
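Since every lxc-* call needs the same systemd-run prefix, a small shell function can hold it. This sketch (lxc_user is a hypothetical name) leaves out --unit so that systemd-run generates a unique unit name for each call; with a fixed name such as my-unit, a later call would likely be rejected while a scope of that name is still active, for example while a container started from it keeps running.

```shell
# Hypothetical wrapper: run any lxc-* command in its own delegated user scope.
# Without --unit, systemd-run picks a fresh unit name for every call.
lxc_user() {
    systemd-run --user --scope -p "Delegate=yes" -- "$@"
}

# Usage:
#   lxc_user lxc-create --name oai --template download
#   lxc_user lxc-start --name oai
```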
An additional complication was that the container start was aborted with this error message:
Permission denied - Could not access /home/richie
The container's root user (mapped to host UID 524288) did not have enough rights to access my home directory,
which holds the container configuration and rootfs under .local/share/lxc/oai/.
The solution is to add a file access control list (ACL) entry that gives this
mapped UID search (x) permission on the home directory:
setfacl -m u:524288:x /home/richie/
getfacl /home/richie/
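The single ACL on the home directory was sufficient in this setup. If intermediate directories such as ~/.local or ~/.local/share are not traversable by other users on a given system, the mapped root UID presumably needs search permission on them too. A sketch that walks from a directory up to $HOME and grants x on each level (grant_traverse is a hypothetical helper):

```shell
# Sketch: give a (mapped) UID search permission on every directory from
# $dir up to and including $stop (default: $HOME).
grant_traverse() {
    local uid="$1" dir="$2" stop="${3:-$HOME}"
    # Refuse to walk upwards if $dir is not $stop or somewhere below it.
    case "$dir" in
        "$stop" | "$stop"/*) ;;
        *) echo "$dir is not inside $stop" >&2; return 1 ;;
    esac
    while :; do
        setfacl -m "u:$uid:x" "$dir" || return 1
        [ "$dir" = "$stop" ] && return 0
        dir="$(dirname "$dir")"
    done
}

# Usage:
#   grant_traverse 524288 "$HOME/.local/share/lxc"
#   getfacl "$HOME/.local"
```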
It was then possible to attach to and stop the container as before:
lxc-attach --name oai
lxc-stop --name oai
Mount a directory into the container
It’s relatively straightforward to mount a directory into an existing
container. First, create the mount point inside the running container. I wanted
to mount the OAI directory to debug the problem, so I created /oai in the
container and then stopped it. Adding
lxc.mount.entry = /home/richie/oai oai none bind 0 0
to ~/.local/share/lxc/oai/config bind-mounts the existing host directory
/home/richie/oai onto /oai in the container (the target oai is relative to the
container's rootfs). After restarting the container, the files should be visible inside it.
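As an alternative to creating the mount point by hand, LXC's create=dir mount option asks LXC to create it inside the rootfs when the container starts. A sketch that appends such an entry to a container config (add_bind_mount is a hypothetical helper; run it while the container is stopped):

```shell
# Hypothetical helper: append a bind mount to a container config. The target
# is relative to the container's rootfs; create=dir makes LXC create it.
add_bind_mount() {
    local config="$1" host_dir="$2" target="$3"
    echo "lxc.mount.entry = $host_dir $target none bind,create=dir 0 0" >> "$config"
}

# Usage:
#   add_bind_mount ~/.local/share/lxc/oai/config /home/richie/oai oai
```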
The OAI problem
The problem with the unprivileged container is that root inside the container
has all capabilities (such as CAP_SYS_NICE), but only within the container's
user namespace. It does not necessarily have the “right” to use them on the
host, because the user that created the container does not have the capability.
OAI uses a syscall to check whether CAP_SYS_NICE is in its effective capability
set, which should indicate that it can create threads with increased priority or
specific capabilities. The container root has that capability, but on my
system the normal user account does not. Therefore, OAI first detects
the capability, but then fails to actually create the threads.
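The detection step can be reproduced from a shell: the effective capability set of a process is printed as a hex mask on the CapEff line of /proc/<pid>/status, and CAP_SYS_NICE is capability number 23. A sketch of that check (function names are hypothetical; it reads /proc instead of issuing a syscall as OAI does):

```shell
# Sketch: test whether capability number $2 is set in the hex mask $1, as
# printed on the Cap* lines of /proc/<pid>/status.
cap_bit_set() {
    local mask="$1" bit="$2"
    [ $(( (0x$mask >> bit) & 1 )) -eq 1 ]
}

# CAP_SYS_NICE (23) in the effective set of the current shell ($$).
has_sys_nice() {
    cap_bit_set "$(awk '/^CapEff:/ { print $2 }' "/proc/$$/status")" 23
}

# Usage, e.g. as root inside the container:
#   has_sys_nice && echo "CAP_SYS_NICE is in the effective set"
#   chrt --fifo 1 true || echo "real-time scheduling refused"   # expected on the setup above
```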
One solution is to drop the capability from the container before starting OAI, by adding
lxc.cap.drop = sys_nice
to the container configuration file ~/.local/share/lxc/oai/config.
After stopping and restarting the container, OAI should detect the missing capability, print a warning, and then run normally.
This “limitation” is described in the YouTube video linked above (starting at minute 39).