Linux Namespaces from Scratch

By SumGuy 15 min read
Pull Back the Docker Curtain

You’ve heard it a thousand times: “containers are lightweight VMs” or “they’re just processes with some isolation.” Cool. But what’s actually happening under the hood when you run docker run -it ubuntu bash? What is this isolation, really?

Here’s the thing — containers aren’t magic. They’re not some orchestration of wizardry happening in the kernel. They’re just namespaces and cgroups. That’s it. A namespace is a kernel feature that makes one group of processes see a different view of system resources than another group. A cgroup is a kernel mechanism that limits how many resources a group of processes can consume.

Docker, Podman, Kubernetes — they’re all just sophisticated scripts that call unshare() and clone() under the hood. And honestly, once you understand how to build a container yourself with nothing but unshare and a bash shell, the mystique evaporates. You realize that containerization is less “advanced cloud magic” and more “clever use of kernel features that have been around since the early 2000s.”

This article is a walk-through. We’re going to build a container from scratch using only Linux namespaces. No Docker. No libraries. Just you, the kernel, and some shell commands. By the end, you’ll understand what a container actually is, how to inspect namespaces on your own system, debug containerized processes from the host, and why containers aren’t the security boundary you might think they are.


The Eight Namespace Types

The Linux kernel offers eight namespace types. Each one isolates a different subset of system resources. Here’s the rundown:

PID Namespace

Isolates process IDs. Inside a PID namespace, processes see a different PID tree. The first process started in the namespace becomes PID 1 inside it, acting as its own init: it adopts orphaned processes, and when it exits the kernel kills everything else in the namespace. Processes outside the namespace still exist, but processes inside can’t see them.

This is why docker run ubuntu ps aux shows only a handful of processes. Your host has 150+ processes running, but inside the container, you only see the ones that belong to that namespace.

Network (NET) Namespace

Isolates network interfaces, routing tables, firewall rules, and the loopback interface. Each NET namespace has its own view of ifconfig, ip route, and iptables. You can have two processes on the same host, both binding to port 8080, if they’re in different NET namespaces.

Docker uses this to give each container its own virtual network interface (usually a veth pair) and its own loopback.

Mount (MNT) Namespace

Isolates filesystem mounts. A process in one MNT namespace might see / as /var/lib/docker/overlay2/abc123/merged/, while the host sees / as the real root. You can mount filesystems, unmount them, change mount options — all without affecting other namespaces.

This is what gives each container its own root filesystem and volumes. The layering itself is done by overlayfs; the mount namespace is what makes that merged overlay the container’s private view of /.

UTS Namespace

Isolates the hostname and domain name. Inside a UTS namespace, you can run hostname newname, and the process sees that hostname. The host sees a different one. It’s the simplest namespace (UTS stands for UNIX Time-Sharing, after the utsname structure returned by uname()), and it’s why your container can have hostname my-app-1 while the host is called prod-server-7.

IPC Namespace

Isolates System V IPC resources (message queues, shared memory segments, and semaphores) as well as POSIX message queues. Processes in different IPC namespaces can’t see each other’s message queues or shared memory. Useful if you want two apps on the same host to not accidentally collide on IPC resources.

USER Namespace

Isolates user and group IDs. This is the interesting one. Inside a USER namespace, you can be root (UID 0) without being root on the host. The namespace maintains a mapping: “UIDs 0-65535 inside this namespace map to UIDs 100000-165535 on the host.” Brilliant for security: a process that gains root inside the namespace is still just an unprivileged user on the host.
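To make the mapping concrete, here is a small sketch of the arithmetic for a single mapping range. The helper name map_uid and its argument order are made up for illustration; they follow the inside-start, host-start, count layout of a line in /proc/<pid>/uid_map.

```shell
# map_uid INSIDE_START HOST_START COUNT UID
# Hypothetical helper: translate a UID seen inside a user namespace into the
# host UID, for one mapping range (inside start, host start, length).
map_uid() {
    inside_start=$1 host_start=$2 count=$3 uid=$4
    if [ "$uid" -ge "$inside_start" ] && [ "$uid" -lt $((inside_start + count)) ]; then
        echo $((host_start + uid - inside_start))
    else
        # IDs outside the range have no mapping at all.
        echo "unmapped"
    fi
}

map_uid 0 100000 65536 0       # container root
map_uid 0 100000 65536 1000    # an ordinary user inside the container
map_uid 0 100000 65536 70000   # outside the mapped range
```

With a range of 65536 IDs starting at host UID 100000, container root lands on host UID 100000 and container UID 1000 on 101000, neither of which is privileged on the host.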

CGROUP and TIME Namespaces

CGROUP namespaces isolate the view of cgroups: each namespace sees its own cgroup as if it were the root of the hierarchy. TIME namespaces (added in kernel 5.6) give a namespace its own offsets for the monotonic and boot-time clocks, letting containers think they booted at a different time than the host; the wall clock stays shared. Both are less commonly used directly, but both matter for a complete containerization story.


The unshare Command: Your Container Launcher

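unshare() is a syscall that creates new namespaces and moves the calling process into them. PID and TIME namespaces are the exception: the caller stays where it is, and only the children it creates afterward land in the new namespace. The unshare command-line tool wraps this syscall and lets you experiment.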

Here’s the simplest example:

Terminal window
$ sudo unshare --pid --fork --mount-proc bash
root@host:/home/you# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 5880 3360 pts/0 S 12:34 0:00 bash
root 7 0.0 0.0 9660 2940 pts/0 R+ 12:34 0:00 ps aux

Notice: your bash shell is now PID 1 inside the namespace. On the host, it might be PID 18394. But inside? It’s the big cheese.

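The --pid flag creates a new PID namespace. --fork is crucial: unshare() never moves the calling process into the new PID namespace, only its future children. Without --fork, bash itself stays in the old namespace, the first command it runs becomes PID 1, and once that command exits the namespace is dead and every later fork fails with “Cannot allocate memory.” With --fork, unshare forks first, so bash itself becomes the long-lived PID 1.

--mount-proc is a convenience flag that mounts a fresh /proc (in a new mount namespace, which it creates implicitly) so that ps and other tools see the namespace’s process list instead of the host’s.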
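If you want to check the PID 1 claim without sudo, here is a minimal sketch that pairs the PID namespace with an unprivileged USER namespace. The function name pid_inside is made up for this example, and it assumes your kernel and distro allow unprivileged user namespaces; where they don’t, it prints “unavailable” rather than failing.

```shell
# pid_inside: print the PID a shell sees for itself inside a fresh PID
# namespace. --user --map-root-user lets an ordinary user create the
# namespaces; --fork makes the new shell the first process (PID 1).
pid_inside() {
    unshare --user --map-root-user --pid --fork sh -c 'echo $$' 2>/dev/null \
        || echo "unavailable"
}

pid_inside
```

On a system that permits unprivileged user namespaces, the shell should report itself as PID 1.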


Building a Mini-Container in 50 Lines

Let’s write a shell script that creates a “container” using only unshare, mount, and a few basic commands. No Docker. No OCI runtimes. Just the kernel.

mini-container.sh
#!/bin/bash
# mini-container.sh: build a lightweight container from scratch (run as root)
set -e
CONTAINER_NAME=${1:-"mycontainer"}
ROOTFS="/tmp/${CONTAINER_NAME}-root"
# Step 1: Create a minimal rootfs
echo "[*] Creating rootfs at $ROOTFS"
mkdir -p "$ROOTFS"/{bin,lib,lib64,etc,proc,sys,tmp,root}
# Step 2: Copy essential binaries (bash, coreutils, ps)
# In a real setup, you'd do this more carefully or extract a base image
for bin in bash cat ls ps mkdir rm touch; do
  src="$(command -v "$bin")" || { echo "Warning: $bin not found"; continue; }
  cp -v "$src" "$ROOTFS/bin/"
  # Step 3: Copy this binary's shared libraries (crude but works for demo).
  # Matching every absolute path also catches the dynamic loader, which ldd
  # prints without "=>"; --parents keeps each library at its original path.
  ldd "$src" | grep -o '/[^ ]*' | while read -r lib; do
    cp -L --parents "$lib" "$ROOTFS" 2>/dev/null || true
  done
done
# Step 4: Create a basic /etc/passwd
echo "root:x:0:0:root:/root:/bin/bash" > "$ROOTFS/etc/passwd"
# Step 5: Enter namespaces and chroot
# /proc is mounted inside the rootfs so ps still works after the chroot
echo "[*] Entering container namespace..."
unshare \
  --pid --fork \
  --mount \
  --uts \
  --ipc \
  --net \
  --mount-proc="$ROOTFS/proc" \
  chroot "$ROOTFS" /bin/bash
echo "[*] Container exited"

Run this and you’ve got a “container” — a bash shell in a new PID, UTS, IPC, and NET namespace, with a fresh mount tree and its own view of /proc. It’s isolated from the host.

From the host, you can see the process with ps aux. From inside, it’s PID 1.


The Live Demo: Isolation in Action

Let’s see namespaces in action without the script scaffolding.

Setting the Hostname (UTS Namespace)

Terminal window
# On the host
$ hostname
prod-server-7
# Enter a UTS namespace and change the hostname
$ sudo unshare --uts bash
root@host:/home/you# hostname new-container
root@host:/home/you# hostname
new-container
# Exit back to the host
root@host:/home/you# exit
$ hostname
prod-server-7

The host’s hostname is unchanged. The UTS namespace let us lie to the process inside.
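The same experiment can be run without root by adding a USER namespace, which grants the capabilities needed to change the hostname inside the new UTS namespace only. This is a sketch: the function name uts_demo and the hostname sandbox-demo are arbitrary, and it assumes unprivileged user namespaces are enabled.

```shell
# uts_demo: set and print a hostname inside a throwaway UTS namespace.
# Prints "unavailable" where namespaces cannot be created without privileges.
uts_demo() {
    unshare --user --map-root-user --uts \
        sh -c 'hostname sandbox-demo && hostname' 2>/dev/null \
        || echo "unavailable"
}

uts_demo
hostname   # the host's name, untouched either way
```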

Network Isolation (NET Namespace)

Terminal window
# Host: two network interfaces
$ ip addr
1: lo: ...
2: eth0: inet 192.168.1.100/24 ...
# Enter a NET namespace
$ sudo unshare --net bash
root@host:/home/you# ip addr
1: lo: <LOOPBACK> ...
# Only loopback! No eth0. We're isolated from the host's network.
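root@host:/home/you# ip link add veth0 type veth peer name veth0-peer
# Now we have a virtual ethernet pair, with both ends inside this namespace
root@host:/home/you# ip addr
1: lo: ...
2: veth0-peer@veth0: ...
3: veth0@veth0-peer: ...

Inside the namespace, we created a veth pair. The host sees neither end: both live in the new namespace until you move one out (for example with ip link set veth0-peer netns 1, which hands it to the namespace of host PID 1). Two isolated network stacks.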
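To actually connect a namespace to the host, one end of the pair has to live on each side. Here is a hedged sketch of that wiring, driven from the host. The interface names (veth-host, veth-ctr), the 10.200.0.0/24 addresses, and the run and connect_netns helpers are all choices made for this example, and the real commands need root.

```shell
# run: execute a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# connect_netns PID: link the network namespace of PID to the host.
connect_netns() {
    pid=$1
    run ip link add veth-host type veth peer name veth-ctr
    run ip link set veth-ctr netns "$pid"        # move one end inside
    run ip addr add 10.200.0.1/24 dev veth-host
    run ip link set veth-host up
    run nsenter --target "$pid" --net ip addr add 10.200.0.2/24 dev veth-ctr
    run nsenter --target "$pid" --net ip link set veth-ctr up
    run nsenter --target "$pid" --net ip link set lo up
}

# Preview the commands without touching the system:
DRY_RUN=1 connect_netns 12345
```

Run it as root without DRY_RUN, passing the host PID of the namespaced shell, and the two sides should be able to ping each other on the 10.200.0.x addresses.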

Process Isolation (PID Namespace)

Terminal window
# Host: hundreds of processes
$ ps aux | wc -l
156
# Inside a PID namespace
$ sudo unshare --pid --fork --mount-proc bash
root@host:/home/you# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 5880 3360 pts/0 S 12:34 0:00 bash
root 8 0.0 0.0 9660 2940 pts/0 R+ 12:34 0:00 ps aux
# Only two processes visible.

The host’s 156 processes still exist — the PID namespace just hides them.


Inspecting Namespaces from the Host

Every process has an associated set of namespaces. You can inspect them in /proc/<pid>/ns/.

Terminal window
# Start a container in one terminal
$ docker run -it --rm ubuntu sleep 600
# Note the container ID or process ID
# In another terminal, find the PID
$ docker inspect --format='{{.State.Pid}}' <container-id>
12345
# Inspect its namespaces
$ sudo ls -la /proc/12345/ns/
lrwx------ 1 root root 0 Jun 14 12:45 cgroup -> 'cgroup:[4026532668]'
lrwx------ 1 root root 0 Jun 14 12:45 ipc -> 'ipc:[4026532669]'
lrwx------ 1 root root 0 Jun 14 12:45 mnt -> 'mnt:[4026532670]'
lrwx------ 1 root root 0 Jun 14 12:45 net -> 'net:[4026532671]'
lrwx------ 1 root root 0 Jun 14 12:45 pid -> 'pid:[4026532672]'
lrwx------ 1 root root 0 Jun 14 12:45 uts -> 'uts:[4026532673]'
lrwx------ 1 root root 0 Jun 14 12:45 user -> 'user:[4026531834]'
# The numbers are namespace IDs. Processes in the same namespace share the same ID.
# (user matches the host here: Docker creates no USER namespace unless userns-remap is enabled.)
# Compare with the host bash:
$ ls -la /proc/$$/ns/
lrwx------ 1 you you 0 Jun 14 12:45 cgroup -> 'cgroup:[4026531835]'
lrwx------ 1 you you 0 Jun 14 12:45 ipc -> 'ipc:[4026531839]'
...
# Different IDs = different namespaces. The container is isolated.
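Comparing those links by eye gets tedious, so here is a small sketch that does it for you. The helper name same_ns is made up; it only relies on readlink and the /proc/<pid>/ns/ layout shown above. Reading another user’s process needs root.

```shell
# same_ns PID_A PID_B TYPE: succeed if both processes share the namespace of
# the given type (pid, net, mnt, uts, ipc, user, cgroup, time).
# Returns 2 if either link cannot be read.
same_ns() {
    a=$(readlink "/proc/$1/ns/$3") || return 2
    b=$(readlink "/proc/$2/ns/$3") || return 2
    [ "$a" = "$b" ]
}

# Compare this shell with its parent across a few namespace types.
for t in pid net mnt uts; do
    if same_ns "$$" "$PPID" "$t"; then
        echo "$t: shared with parent"
    else
        echo "$t: different from parent (or not readable)"
    fi
done
```

Pointing it at a container, for example sudo-running same_ns 1 12345 net, tells you whether that container shares the host’s network stack.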

To see all namespaces on your system:

Terminal window
$ sudo lsns
NS TYPE NPROCS PID USER COMMAND
4026531835 cgroup 237 1 root /sbin/init
4026531839 ipc 237 1 root /sbin/init
4026531840 mnt 237 1 root /sbin/init
4026531956 net 237 1 root /sbin/init
4026531837 pid 237 1 root /sbin/init
4026531838 uts 237 1 root /sbin/init
4026531834 user 237 1 root /sbin/init
4026532668 cgroup 1 12345 root sleep 600
4026532669 ipc 1 12345 root sleep 600
...

Each row is a namespace. Processes in the same namespace share the same namespace ID.
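There is nothing hidden behind lsns: the information is all in /proc. As a rough sketch of the same idea (the function name list_ns is invented here, and the output format is simpler than what lsns prints), you can group processes by namespace ID yourself:

```shell
# list_ns TYPE: print each distinct namespace of TYPE with the number of
# processes in it, by reading /proc/<pid>/ns/TYPE for every PID we are
# allowed to inspect. Unreadable processes are silently skipped.
list_ns() {
    for p in /proc/[0-9]*; do
        readlink "$p/ns/$1" 2>/dev/null
    done | sort | uniq -c
}

list_ns net
```

Run as an ordinary user, it only covers your own processes; run as root, it covers everything on the box.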


Entering a Namespace: nsenter

Now you’re running something inside a container namespace (maybe via Docker). How do you inspect it from the host? You jump into its namespace with nsenter.

Terminal window
# Container is running (PID 12345 on the host)
$ sudo nsenter --target 12345 --pid --mount --uts --ipc --net bash
# You're now running a bash inside the container's namespaces
# You're root from the host's point of view, not the container's own process
# Because --mount is included, bash and every tool you run come from the
# container's filesystem. Drop --mount (keep only --net, say) to point host
# tools at the container's network or process view instead

nsenter is your debugging superpower. If a container is wedged and you can’t exec into it, nsenter lets you jump into its namespace directly from the host.


User Namespaces and Privilege Escalation

Here’s where it gets clever (and slightly scary): the USER namespace.

A USER namespace maps UIDs inside the namespace to different UIDs on the host. The canonical example:

Terminal window
# On the host, you're user 1000
$ id
uid=1000(you) gid=1000(you) ...
# Inside a USER namespace, you can become root (UID 0)
# and the host only sees you as UID 1000
$ unshare --user --map-root-user bash
root@host:/home/you# id
uid=0(root) gid=0(root) ...
# But on the host, the process is still 1000
$ ps aux | grep bash
you 18394 ... bash

This is powerful for security. A process that is root inside a USER namespace, or that escalates to root there, is still just an unprivileged user as far as the host is concerned. Its capabilities apply only to resources owned by that namespace.

However, and this is a big however, USER namespaces are complex. The mapping itself lives in /proc/<pid>/uid_map and /proc/<pid>/gid_map; /etc/subuid and /etc/subgid only define which host ID ranges an unprivileged user is allowed to map. If these are not set up correctly, you can still have privilege escalation risks. And not all kernel versions handle USER namespaces equally. Docker, for example, has had historical issues with USER namespace support.
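To see which host ID range your own account may map, you can read the subordinate ID files directly. A minimal sketch, assuming the usual name:start:count line format; the helper name subid_range is made up for this example.

```shell
# subid_range USER [FILE]: print "START COUNT" for USER's first entry in a
# subuid-style file (one name:start:count entry per line).
# FILE defaults to /etc/subuid; pass /etc/subgid for group IDs.
subid_range() {
    awk -F: -v u="$1" '$1 == u { print $2, $3; exit }' "${2:-/etc/subuid}"
}

# Show the current user's range, if the file exists and lists them.
subid_range "$(id -un)" 2>/dev/null || true
```

No output means the user has no delegated range, which is why rootless runtimes sometimes refuse to start until an entry is added.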

The lesson: USER namespaces are a layer of defense, not an iron wall. A well-designed container runtime uses them, but they’re not the only security mechanism at play.


What Containers DON’T Isolate

This is the part that keeps security engineers up at night.

Kernel: All containers share the same kernel. If there’s a kernel vulnerability, containers don’t protect you. A malicious container can potentially exploit a kernel bug and affect the host or other containers.

Time: The wall clock is shared by every container. Even the TIME namespace, which most container runtimes don’t enable by default, only offsets the monotonic and boot-time clocks. You can’t make a container think it’s 2020 while the host is in 2026.

System calls: All containers use the same kernel syscall interface. There’s no filtering at the namespace level. If a container process makes a dangerous call (say, open() on /dev/mem), the kernel honors it as long as the ordinary permission checks pass. This is why seccomp and AppArmor exist: they’re separate kernel mechanisms, configured by the runtime, that layer on top of namespaces to filter syscalls and restrict access.

Cgroups (sometimes): Namespaces isolate the view of cgroups, but they don’t enforce limits on their own. If you create a PID namespace without also setting cgroup limits, a runaway process in the namespace can consume all the host’s CPU and memory. Namespaces + cgroups together = real isolation.
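Adding the missing half is mostly a matter of writing numbers into files. Here is a hedged sketch, assuming cgroup v2 mounted at /sys/fs/cgroup with the memory and pids controllers enabled for child groups. The helper names and the group name demo are invented for this example, and the real thing needs root.

```shell
# CGROUP_ROOT is where cgroup v2 is mounted; overridable for experiments.
CGROUP_ROOT=${CGROUP_ROOT:-/sys/fs/cgroup}

# limit_group NAME MEM_BYTES MAX_PIDS: create a cgroup and cap its memory
# and its number of processes.
limit_group() {
    dir="$CGROUP_ROOT/$1"
    mkdir -p "$dir" || return 1
    echo "$2" > "$dir/memory.max"
    echo "$3" > "$dir/pids.max"
}

# join_group NAME PID: move a process into the cgroup. Children it starts
# afterward inherit the membership.
join_group() {
    echo "$2" > "$CGROUP_ROOT/$1/cgroup.procs"
}

# Example (as root): cap a group at 256 MiB and 64 processes, join it, then
# start the namespaced shell inside it.
#   limit_group demo $((256 * 1024 * 1024)) 64
#   join_group demo $$
#   unshare --pid --fork --mount-proc bash
```

The order matters: joining the cgroup before calling unshare means everything started in the new namespace is born inside the limits.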


How Docker Ties It Together

When you run docker run -it ubuntu bash, Docker:

  1. Creates a new UTS namespace (hostname)
  2. Creates a new PID namespace (process isolation)
  3. Creates a new IPC namespace (message queues)
  4. Creates a new MNT namespace (filesystem isolation)
  5. Creates a new NET namespace (network isolation)
  6. Optionally creates a USER namespace (if enabled)
  7. Sets up cgroup limits (CPU, memory, PIDs)
  8. Extracts a rootfs (usually via overlay2 layering) into the MNT namespace
  9. Remounts /proc, /sys, /dev inside the namespace
  10. Sets up veth pairs to bridge networking
  11. Runs your process (e.g., bash) inside all those namespaces

Docker is just a sophisticated orchestrator of kernel features. docker build reads a Dockerfile and builds image layers; docker run combines all of the above into a cohesive “container” experience.


Debugging Containerized Processes from the Host

You’ve got a container running, and something’s wrong. The logs are useless. How do you investigate?

Inspect /proc of the Container Process

Terminal window
# Find the container's host PID
$ docker inspect --format='{{.State.Pid}}' <container-id>
12345
# Check its file descriptors
$ sudo ls -la /proc/12345/fd/
# What files is the process using?
# Check its memory map
$ sudo cat /proc/12345/maps
# What memory regions has it mapped?
# Check which cgroup it belongs to
$ cat /proc/12345/cgroup
# The limits themselves live under /sys/fs/cgroup/<that path>/ (memory.max, pids.max, cpu.max on cgroup v2)
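Going from a PID to its actual limits takes two hops: read the cgroup path from /proc, then read the limit files under the cgroup mount. A minimal sketch, assuming cgroup v2 mounted at /sys/fs/cgroup; the helper names cgroup_v2_path and show_limits are invented for this example.

```shell
# cgroup_v2_path [FILE]: print the cgroup v2 path from a /proc/<pid>/cgroup
# file (the line starting with "0::"). FILE defaults to /proc/self/cgroup.
cgroup_v2_path() {
    sed -n 's/^0:://p' "${1:-/proc/self/cgroup}"
}

# show_limits PID: print the memory, PID, and CPU limits of PID's cgroup.
# Files that do not exist or cannot be read are skipped.
show_limits() {
    path=$(cgroup_v2_path "/proc/$1/cgroup") || return 1
    for f in memory.max pids.max cpu.max; do
        file="/sys/fs/cgroup$path/$f"
        if [ -r "$file" ]; then
            echo "$f: $(cat "$file")"
        fi
    done
}

show_limits "$$"
```

A value of max means no limit is set at that level of the hierarchy.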

Jump Into the Container’s Namespace with nsenter

Terminal window
# Execute a command inside the container's namespaces (but from the host)
$ sudo nsenter --target 12345 --pid --mount --uts --ipc --net ps aux
# With --mount, ps is the container's own binary and must exist in its image
# To use a host tool instead, enter only the namespace you need:
$ sudo nsenter --target 12345 --net ss -tlnp
# Or get an interactive shell
$ sudo nsenter --target 12345 --pid --mount --uts --ipc --net bash
# Now you're debugging from inside the container's view, as root on the host
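When the container image is too minimal to have any debugging tools, entering only the network namespace lets you bring the host’s tools with you. A small wrapper sketch; the name netns_exec and the NSENTER override (handy for previewing the command) are made up for this example.

```shell
# netns_exec PID CMD...: run a host binary inside PID's network namespace.
# Only --net is entered, so CMD is resolved on the host's filesystem.
# Set NSENTER=echo to print the arguments instead of running nsenter.
netns_exec() {
    pid=$1
    shift
    "${NSENTER:-nsenter}" --target "$pid" --net -- "$@"
}

# Preview only; drop NSENTER=echo and run as root to do it for real.
NSENTER=echo netns_exec 12345 ss -tlnp
```

The same pattern works for tcpdump, ip, or curl: the binary comes from the host, the sockets and interfaces come from the container.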

Trace Syscalls

Terminal window
# See what syscalls the container's process is making
$ sudo strace -p 12345
# Watch in real-time
# Or record and analyze
$ sudo timeout 5 strace -p 12345 > /tmp/trace.txt 2>&1
$ grep open /tmp/trace.txt # what files is it opening?
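Once you have a trace file, a quick histogram usually tells you more than scrolling through it. Here is a sketch that counts syscalls by name, assuming strace’s default one-call-per-line format, name(args) = result. The function name syscall_histogram is made up for this example.

```shell
# syscall_histogram FILE: count syscalls in an strace log, most frequent
# first. Lines that are not syscalls (attach/detach notices, signals) are
# ignored because they do not start with "name(".
syscall_histogram() {
    sed -n 's/^\([a-z_0-9]*\)(.*/\1/p' "$1" | sort | uniq -c | sort -rn
}

# Usage, after recording a trace as shown above:
#   syscall_histogram /tmp/trace.txt | head
```

A process stuck in a tight loop of the same failing call shows up immediately at the top of the list. For a summary straight from strace itself, its -c option prints per-syscall counts and timings.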

These techniques let you debug containers without ever logging into them, which is handy when the container is broken or your tools aren’t installed inside.


Your Mission: Build a Container in 50 Lines

Now it’s your turn. Here’s the skeleton; fill in the rest:

your-container.sh
#!/bin/bash
# YOUR ASSIGNMENT: Complete this script
# 1. Create a minimal rootfs with /bin, /etc, /proc, /sys, /tmp, /root
# 2. Copy bash and essential binaries (cat, ls, ps, mkdir)
# 3. Copy their shared libraries
# 4. Create /etc/passwd with a root entry
# 5. Use unshare to create PID, MNT, UTS, IPC, and NET namespaces
# 6. Mount /proc inside the namespace (so ps works)
# 7. Set a custom hostname inside the namespace
# 8. chroot into the rootfs
# 9. Drop into a bash shell
# 10. Verify that ps aux shows only your shell (as PID 1) and the ps process itself
# 11. Verify that hostname shows your custom name, not the host's
# 12. Exit and confirm the host is unaffected
# Bonus: Set up a veth pair for networking. Give the container a virtual interface.

The challenge: make it work without Docker. Once you do, you’ll have built a real container, and you’ll understand what Docker is actually doing.


The Takeaway

Containers aren’t magic. They’re a clever combination of kernel features that have existed for two decades. Namespaces isolate resources; cgroups enforce limits. Together, they create the illusion of a lightweight VM.

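Understanding namespaces means you can inspect the namespaces on your own system, debug containerized processes from the host, and explain why containers aren’t the security boundary people assume.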

Next time someone tells you “containers are just VMs,” you’ll smile. You know better. You’ve built one yourself.

Now go forth, build some minimal containers, and remember: every running process is already in a namespace. The kernel has been containerizing workloads since before Docker was a glimmer in Solomon Hykes’s eye.

