Linux Namespaces from Scratch

Pull Back the Docker Curtain

You’ve heard it a thousand times: “containers are lightweight VMs” or “they’re just processes with some isolation.” Cool. But what’s actually happening under the hood when you run docker run -it ubuntu bash? What is this isolation, really?

Containers aren’t magic. They’re not some orchestration of wizardry happening in the kernel. They’re just namespaces and cgroups. That’s it. A namespace is a kernel feature that makes one group of processes see a different view of system resources than another group. A cgroup is a kernel mechanism that limits how many resources a group of processes can consume.

Docker, Podman, and Kubernetes ultimately hand the job to a low-level runtime such as runc, which makes the same clone(), unshare(), and setns() system calls you can make yourself. And honestly, once you understand how to build a container yourself with nothing but unshare and a bash shell, the mystique evaporates. You realize that containerization is less “advanced cloud magic” and more “clever use of kernel features that have been around since the early 2000s.”

We’re going to build a container from scratch using only Linux namespaces. No Docker. No libraries. Just you, the kernel, and some shell commands. By the end, you’ll understand what a container actually is, how to inspect namespaces on your own system, debug containerized processes from the host, and why containers aren’t the security boundary you might think they are.

The Eight Namespace Types

The Linux kernel offers eight namespace types. Each one isolates a different subset of system resources. Here’s the rundown:
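Each type shows up as an entry in /proc/<pid>/ns. Here is a quick sketch that lists the types your kernel exposes for the current process (namespace_types is a made-up helper name, not a standard tool). It skips pid_for_children and time_for_children, which name the namespace a process’s future children will join rather than separate types:

```shell
#!/bin/bash
# namespace_types: print the namespace types the running kernel exposes,
# one per line, by listing /proc/self/ns (hypothetical helper).
namespace_types() {
  local entry
  for entry in /proc/self/ns/*; do
    entry=${entry##*/}                # strip the directory part
    case $entry in
      *_for_children) continue ;;     # pid_for_children, time_for_children
    esac
    echo "$entry"
  done
}

namespace_types
```

On a kernel with every namespace type compiled in (time namespaces need 5.6 or later), it should print eight names: cgroup, ipc, mnt, net, pid, time, user, uts.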

PID Namespace

Isolates process IDs. Inside a PID namespace, processes see their own PID tree. The first process created in the namespace becomes PID 1 inside it, acting as its init. Processes outside the namespace still exist, but processes inside can’t see them; the host, on the other hand, sees every process in the namespace under its ordinary host PID.

This is why docker run ubuntu ps aux shows only a handful of processes. Your host has 150+ processes running, but inside the container, you only see the ones that belong to that namespace.

Network (NET) Namespace

Isolates network interfaces, routing tables, firewall rules, and the loopback interface. Each NET namespace has its own view of ifconfig, ip route, and iptables. You can have two processes on the same host, both binding to port 8080, if they’re in different NET namespaces.

Docker uses this to give each container its own virtual network interface (usually a veth pair) and its own loopback.

Mount (MNT) Namespace

Isolates filesystem mounts. A container’s / might be what the host sees as /var/lib/docker/overlay2/abc123/merged/, while the host’s own / stays the real root. Inside a MNT namespace you can mount filesystems, unmount them, and change mount options without affecting other namespaces (subject to mount propagation settings).

This is how Docker gives each container its own root filesystem (an overlay of image layers) and its own volume mounts without the host or other containers seeing them.

UTS Namespace

Isolates the hostname and NIS domain name. Inside a UTS namespace, you can run hostname newname, and processes in that namespace see the new name while the host keeps its own. It’s the simplest namespace (UTS stands for UNIX Time-Sharing, after the utsname structure that uname() returns), and it’s why your container can have hostname my-app-1 while the host is called prod-server-7.

IPC Namespace

Isolates System V IPC resources (message queues, shared memory segments, and semaphores) as well as POSIX message queues. Processes in different IPC namespaces can’t see each other’s message queues or shared memory, which keeps two apps on the same host from accidentally colliding on IPC keys.

USER Namespace

Isolates user and group IDs. This is the interesting one. Inside a USER namespace, you can be root (UID 0) without being root on the host. The namespace maintains a mapping, for example: “UIDs 0-65535 inside this namespace map to UIDs 100000-165535 on the host.” That’s brilliant for security: a process that gains root inside the namespace is still just an unprivileged UID on the host.
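The mapping lives in /proc/<pid>/uid_map (and gid_map), one range per line: the first ID inside the namespace, the first ID outside it (in the parent namespace), and the length of the range. To make the arithmetic concrete, here is a sketch of the lookup the kernel performs, written as a hypothetical shell helper, map_uid, and tried on a range of 65536 IDs that starts at 0 inside and 100000 outside:

```shell
#!/bin/bash
# map_uid UID [MAPFILE]: translate a UID inside a user namespace to the UID
# outside it, using uid_map lines "<inside-start> <outside-start> <length>".
# Prints the outside UID, or returns 1 if no range covers UID
# (the kernel shows such IDs as the overflow UID, usually 65534 "nobody").
map_uid() {
  local uid=$1 mapfile=${2:-/proc/self/uid_map}
  awk -v uid="$uid" '
    uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); found = 1; exit }
    END { exit !found }
  ' "$mapfile"
}

example_map=$(mktemp)
echo '0 100000 65536' > "$example_map"
map_uid 0 "$example_map"        # root inside: prints 100000
map_uid 65535 "$example_map"    # last UID in the range: prints 165535
map_uid 65536 "$example_map" || echo "65536: not mapped"
rm -f "$example_map"
```

Called with no file argument, map_uid reads the current process’s own map: inside unshare --user --map-root-user, map_uid 0 should print your host UID, while on the host the map is the identity range 0 0 4294967295.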

CGROUP and TIME Namespaces

CGROUP namespaces isolate the view of cgroups: each namespace sees its own cgroup as the root of the hierarchy. TIME namespaces (added in kernel 5.6) give processes their own offsets for the monotonic and boot-time clocks (CLOCK_MONOTONIC and CLOCK_BOOTTIME), letting a container think it booted at a different time than the host; the wall-clock time is not virtualized. Both are less commonly used directly, but both matter for a complete containerization story.

The `unshare` Command: Your Container Launcher

unshare() is a syscall that creates new namespaces and moves the calling process into them (for PID and TIME namespaces, only the caller’s future children move in). The unshare command-line tool from util-linux wraps this syscall so you can experiment from a shell.

Here’s the simplest example:

$ sudo unshare --pid --fork --mount-proc bash
root@host:/home/you# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   5880  3360 pts/0    S    12:34   0:00 bash
root         7  0.0  0.0   9660  2940 pts/0    R+   12:34   0:00 ps aux

Notice: your bash shell is now PID 1 inside the namespace. On the host, it might be PID 18394. But inside? It’s the big cheese.

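The --pid flag creates a new PID namespace. --fork is crucial: the unshare() syscall doesn’t move the calling process into the new PID namespace; only children it creates afterwards land there. Without --fork, unshare would exec bash in the old namespace, bash’s first child would become PID 1 of the new namespace, and once that child exited, further forks would fail with “Cannot allocate memory”. With --fork, unshare forks a child that becomes PID 1 in the new namespace and runs bash there, while the parent simply waits for it. As PID 1, bash also takes on init’s job of reaping orphaned zombies inside the namespace.

--mount-proc mounts a fresh proc filesystem at /proc (it implies --mount, so the host’s /proc is untouched), which is why ps and other tools see only the new namespace’s processes.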
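You can watch the “only children move in” rule in /proc: /proc/<pid>/ns/pid is the PID namespace a process lives in, and /proc/<pid>/ns/pid_for_children (Linux 4.12 and later) is the one its future children will join. After unshare(CLONE_NEWPID), the two differ for the calling process. A sketch of a hypothetical helper that reports which state a process is in (reading another user’s process needs root):

```shell
#!/bin/bash
# pidns_status [PID]: report whether PID's future children will land in a
# different PID namespace than PID itself, the state unshare(CLONE_NEWPID)
# leaves the caller in. Requires /proc/PID/ns/pid_for_children (Linux 4.12+).
pidns_status() {
  local pid=${1:-$$} own kids
  own=$(readlink "/proc/$pid/ns/pid") || return 1
  kids=$(readlink "/proc/$pid/ns/pid_for_children") || return 1
  if [ "$own" = "$kids" ]; then
    echo "same: $own"
  else
    echo "pending: in $own, children go to $kids"
  fi
}

pidns_status $$
```

Inside sudo unshare --pid --fork bash, pidns_status reports same, because the forked shell is itself in the new namespace. Run as root from another terminal against the PID of the parent unshare process, it should report pending: that process created the namespace but never entered it.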

Building a Mini-Container in 50 Lines

Let’s write a shell script that creates a “container” using only unshare, mount, and a few basic commands. No Docker. No OCI runtimes. Just the kernel.

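#!/bin/bash

# mini-container.sh: build a lightweight container from scratch
# Run as root: creating these namespaces requires CAP_SYS_ADMIN

set -e

if [ "$(id -u)" -ne 0 ]; then
  echo "Please run as root (e.g. sudo $0)" >&2
  exit 1
fi

CONTAINER_NAME=${1:-"mycontainer"}
ROOTFS="/tmp/${CONTAINER_NAME}-root"

# Step 1: Create a minimal rootfs
echo "[*] Creating rootfs at $ROOTFS"
mkdir -p "$ROOTFS"/{bin,lib,lib64,etc,proc,sys,tmp,root}

# Step 2: Copy essential binaries (bash, coreutils, ps)
# In a real setup, you'd do this more carefully or extract a base image
for bin in bash cat ls ps mkdir rm touch; do
  path=$(command -v "$bin") || { echo "Warning: $bin not found"; continue; }
  cp -v "$path" "$ROOTFS/bin/"

  # Step 3: Copy every shared library the binary needs, including the dynamic
  # loader (e.g. /lib64/ld-linux-x86-64.so.2), at the same path inside the rootfs.
  # ldd prints "name => /path (addr)" for libraries and "/path (addr)" for the loader.
  for lib in $(ldd "$path" 2>/dev/null | grep -o '/[^ ]*' || true); do
    cp -v --parents "$lib" "$ROOTFS/"
  done
done

# Step 4: Create a basic /etc/passwd
echo "root:x:0:0:root:/root:/bin/bash" > "$ROOTFS/etc/passwd"

# Step 5: Enter namespaces and chroot
# --mount-proc=DIR mounts the new PID namespace's proc inside the rootfs;
# a plain --mount-proc would mount it at /proc outside the rootfs,
# where the chrooted shell can't see it and ps would find nothing
echo "[*] Entering container namespace..."
unshare \
  --pid --fork \
  --mount \
  --uts \
  --ipc \
  --net \
  --mount-proc="$ROOTFS/proc" \
  chroot "$ROOTFS" /bin/bash

echo "[*] Container exited"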

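Run this as root and you’ve got a “container”: a bash shell in new PID, UTS, IPC, and NET namespaces, with a private mount table and its own /proc. It’s isolated from the host’s process list, hostname, IPC objects, and network, though it is not yet a security boundary: the shell is still real root, and root can break out of a plain chroot.

From the host, you can see the shell with ps aux under an ordinary PID. From inside, it’s PID 1.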

The Live Demo: Isolation in Action

Let’s see namespaces in action without the script scaffolding.

Setting the Hostname (UTS Namespace)

# On the host
$ hostname
prod-server-7

# Enter a UTS namespace and change the hostname
$ sudo unshare --uts bash
root@host:/home/you# hostname new-container
root@host:/home/you# hostname
new-container

# Exit back to the host
root@host:/home/you# exit
$ hostname
prod-server-7

The host’s hostname is unchanged. The UTS namespace let us lie to the process inside.

Network Isolation (NET Namespace)

# Host: two network interfaces
$ ip addr
1: lo: ...
2: eth0: inet 192.168.1.100/24 ...

# Enter a NET namespace
$ sudo unshare --net bash
root@host:/home/you# ip addr
1: lo: <LOOPBACK> ...

# Only loopback! No eth0. We're isolated from the host's network.
root@host:/home/you# ip link add veth0 type veth peer name veth0-peer
# Now we have a virtual ethernet pair
root@host:/home/you# ip addr
1: lo: ...
2: veth0-peer@veth0: ...
3: veth0@veth0-peer: ...

Inside the namespace, we created a veth pair. Both ends live in this namespace; the host sees neither, since the interfaces were created inside the isolated network stack. (To actually bridge a container to the host, you create the pair on the host and then move one end into the namespace with ip link set veth0-peer netns <pid>, which is exactly what Docker does.)
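Here is a sketch of that host-side wiring as a shell function. The interface names and addresses are placeholders, it must run as root on the host, and it assumes a process is already running in the target NET namespace (for example, a bash started with unshare --net) and that nsenter is installed. Setting DRY_RUN=1 only prints the commands so you can review them first:

```shell
#!/bin/bash
# run_or_echo: execute a command, or just print it when DRY_RUN=1.
run_or_echo() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# connect_netns PID HOST_IF NS_IF HOST_CIDR NS_CIDR (hypothetical helper)
# Create a veth pair on the host, move one end into PID's NET namespace,
# and give both ends an address. Run as root.
connect_netns() {
  if [ $# -ne 5 ]; then
    echo "usage: connect_netns PID HOST_IF NS_IF HOST_CIDR NS_CIDR" >&2
    return 2
  fi
  local pid=$1 host_if=$2 ns_if=$3 host_cidr=$4 ns_cidr=$5
  run_or_echo ip link add "$host_if" type veth peer name "$ns_if"  # both ends start on the host
  run_or_echo ip link set "$ns_if" netns "$pid"                    # move one end into the namespace
  run_or_echo ip addr add "$host_cidr" dev "$host_if"
  run_or_echo ip link set "$host_if" up
  # Configure the moved end from inside the namespace
  run_or_echo nsenter --target "$pid" --net ip addr add "$ns_cidr" dev "$ns_if"
  run_or_echo nsenter --target "$pid" --net ip link set "$ns_if" up
  run_or_echo nsenter --target "$pid" --net ip link set lo up
}
```

For example, DRY_RUN=1 connect_netns 18394 veth-host veth-ctr 10.200.0.1/24 10.200.0.2/24 prints the seven commands; without DRY_RUN, the shell inside the namespace should then be able to ping 10.200.0.1. Reaching the outside world additionally needs a route and NAT (or a bridge), which is the part Docker automates.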

Process Isolation (PID Namespace)

# Host: hundreds of processes
$ ps aux | wc -l
156

# Inside a PID namespace
$ sudo unshare --pid --fork --mount-proc bash
root@host:/home/you# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   5880  3360 pts/0    S    12:34   0:00 bash
root         8  0.0  0.0   9660  2940 pts/0    R+   12:34   0:00 ps aux

# Only two processes visible.

The host’s 156 processes still exist; the PID namespace just hides them.

Inspecting Namespaces from the Host

Every process has an associated set of namespaces. You can inspect them in /proc/<pid>/ns/.

# Start a container in one terminal
$ docker run -it --rm ubuntu sleep 600
# Note the container ID or process ID

# In another terminal, find the PID
$ docker inspect --format='{{.State.Pid}}' <container-id>
12345

# Inspect its namespaces (as root: the process isn't yours)
$ sudo ls -la /proc/12345/ns/
lrwx------ 1 root root 0 Jun 14 12:45 cgroup -> 'cgroup:[4026532668]'
lrwx------ 1 root root 0 Jun 14 12:45 ipc -> 'ipc:[4026532669]'
lrwx------ 1 root root 0 Jun 14 12:45 mnt -> 'mnt:[4026532670]'
lrwx------ 1 root root 0 Jun 14 12:45 net -> 'net:[4026532671]'
lrwx------ 1 root root 0 Jun 14 12:45 pid -> 'pid:[4026532672]'
lrwx------ 1 root root 0 Jun 14 12:45 uts -> 'uts:[4026532673]'
lrwx------ 1 root root 0 Jun 14 12:45 user -> 'user:[4026531834]'

# The numbers are namespace IDs. Processes in the same namespace share the same ID.
# user matches the host's (see lsns below): Docker only creates a USER namespace with userns-remap.
# Compare with the host bash:
$ ls -la /proc/$$/ns/
lrwx------ 1 you you 0 Jun 14 12:45 cgroup -> 'cgroup:[4026531835]'
lrwx------ 1 you you 0 Jun 14 12:45 ipc -> 'ipc:[4026531839]'
...

# Different IDs = different namespaces. The container is isolated.
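Comparing links by eye gets tedious. A small sketch that does it for every namespace type at once (ns_diff is a hypothetical helper; reading another process’s links generally needs root, and unreadable links are skipped):

```shell
#!/bin/bash
# ns_diff PID_A PID_B: for each namespace type, report whether two processes
# share it, by comparing the targets of their /proc/PID/ns/* links.
ns_diff() {
  local a=$1 b=$2 link type ta tb
  for link in "/proc/$a/ns/"*; do
    type=${link##*/}
    ta=$(readlink "$link") || continue              # skip links we can't read
    tb=$(readlink "/proc/$b/ns/$type") || continue
    if [ "$ta" = "$tb" ]; then
      printf '%-18s same      %s\n' "$type" "$ta"
    else
      printf '%-18s DIFFERENT %s vs %s\n' "$type" "$ta" "$tb"
    fi
  done
}
```

From a root shell, ns_diff 1 12345 compares init with the container from earlier: expect DIFFERENT for the namespaces Docker created, and same for user (unless userns-remap is enabled) and time.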

To see all namespaces on your system:

$ lsns
NS TYPE   NPROCS PID USER COMMAND
4026531835 cgroup   237   1 root /sbin/init
4026531839 ipc      237   1 root /sbin/init
4026531840 mnt      237   1 root /sbin/init
4026531956 net      237   1 root /sbin/init
4026531837 pid      237   1 root /sbin/init
4026531838 uts      237   1 root /sbin/init
4026531834 user     237   1 root /sbin/init
4026532668 cgroup     1 12345 root sleep 600
4026532669 ipc        1 12345 root sleep 600
...

Each row is one namespace: NPROCS counts the processes in it, and PID is the lowest PID among them. Run lsns as root to include other users’ processes.
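Under the hood, lsns is essentially a tally over /proc. A rough equivalent for a single namespace type, assuming only bash and coreutils (ns_count is a hypothetical name; without root you’ll only see your own processes):

```shell
#!/bin/bash
# ns_count TYPE: count processes per namespace of one type (net, pid, mnt, ...),
# a rough lsns equivalent built only on /proc/*/ns links.
ns_count() {
  if [ $# -ne 1 ]; then
    echo "usage: ns_count TYPE" >&2
    return 2
  fi
  local type=$1 dir
  for dir in /proc/[0-9]*; do
    readlink "$dir/ns/$type" 2>/dev/null   # unreadable or vanished processes are skipped
  done | sort | uniq -c | sort -rn
}

ns_count net
```

Each output line is a process count followed by a namespace link such as net:[4026531840], largest namespace first.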

Entering a Namespace: `nsenter`

Say a process is running inside a container’s namespaces (maybe started by Docker). How do you inspect it from the host? You jump into its namespaces with nsenter.

# Container is running (PID 12345 on the host)
$ sudo nsenter --target 12345 --pid --mount --uts --ipc --net bash

# You're now running a bash inside the container's namespaces,
# but with your own host credentials (root, via sudo), not the container process's.
# With --mount, that bash and anything it runs come from the container's filesystem.
# Leave out --mount (e.g. sudo nsenter --target 12345 --net ip addr) to use the host's
# tools instead: handy when the image ships without a shell or debugging tools.

nsenter is your debugging superpower. If a container is wedged and you can’t exec into it, nsenter lets you jump into its namespace directly from the host.
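Looking up the PID by hand every time gets old. A small wrapper, assuming Docker is the runtime and that you run it as root (docker_nsenter is our own name, not a Docker command):

```shell
#!/bin/bash
# docker_nsenter CONTAINER [NSENTER_OPTIONS...] [COMMAND...]
# Resolve a container's host PID with docker inspect, then pass everything else to nsenter.
docker_nsenter() {
  if [ $# -lt 1 ]; then
    echo "usage: docker_nsenter CONTAINER [nsenter options] [command]" >&2
    return 2
  fi
  local container=$1 pid
  shift
  pid=$(docker inspect --format '{{.State.Pid}}' "$container") || return 1
  case $pid in
    ''|0|*[!0-9]*)                   # State.Pid is 0 when the container isn't running
      echo "container $container is not running" >&2
      return 1 ;;
  esac
  nsenter --target "$pid" "$@"
}
```

For instance, docker_nsenter my-app --net ss -tlnp lists the container’s listening sockets with the host’s ss, even if the image has no networking tools at all.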

User Namespaces and Privilege Escalation

Here’s where it gets clever (and slightly scary): the USER namespace.

A USER namespace maps UIDs inside the namespace to different UIDs on the host. The canonical example:

# On the host, you're user 1000
$ id
uid=1000(you) gid=1000(you) ...

# Inside a USER namespace, you can become root (UID 0)
# and the host only sees you as UID 1000
$ unshare --user --map-root-user bash
root@host:/home/you# id
uid=0(root) gid=0(root) ...

# But in a second terminal on the host, that bash is still owned by UID 1000
$ ps aux | grep bash
you      18394  ...  bash

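This is powerful for security. A process that becomes root inside a USER namespace, or even escapes its container, is still UID 1000 on the host and can only do what UID 1000 could do. The mapping alone gives it no way to escalate.

However, and this is a big however, USER namespaces are complex. The mapping itself is written to /proc/<pid>/uid_map and /proc/<pid>/gid_map; /etc/subuid and /etc/subgid only define which ranges of host IDs each unprivileged user may hand out (through the privileged helpers newuidmap and newgidmap). If those ranges are set up carelessly, for example overlapping between users, different users’ containers can end up sharing host UIDs. More importantly, user namespaces let unprivileged users reach kernel code (mounting, network configuration) that used to require root, and that code has been the entry point for a number of kernel privilege-escalation bugs; some distributions restrict unprivileged user namespaces for that reason. Docker supports them through userns-remap, but it is off by default.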

The lesson: USER namespaces are a layer of defense, not an iron wall. A well-designed container runtime uses them, but they’re not the only security mechanism at play.
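For reference, /etc/subuid and /etc/subgid hold one delegation per line in the form name:first-id:count. A minimal sketch that prints a user’s ranges (subuid_ranges is a hypothetical helper; it matches entries by user name only, although the files may also list numeric UIDs, and the file path is a parameter so you can point it at /etc/subgid):

```shell
#!/bin/bash
# subuid_ranges USER [FILE]: print "first last count" for each subordinate ID
# range delegated to USER in FILE (default /etc/subuid; lines look like "name:first:count").
subuid_ranges() {
  local user=$1 file=${2:-/etc/subuid}
  awk -F: -v user="$user" '
    $1 == user { printf "%d %d %d\n", $2, $2 + $3 - 1, $3 }
  ' "$file"
}
```

With an entry like you:100000:65536, subuid_ranges you prints 100000 165535 65536: newuidmap will let that user map host UIDs 100000 through 165535 into its own user namespaces, which is what rootless Podman relies on.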

What Containers DON’T Isolate

This is the part that keeps security engineers up at night.

Kernel: All containers share the same kernel. If there’s a kernel vulnerability, containers don’t protect you. A malicious container can exploit a kernel bug and affect the host or other containers.

Time: All containers see the same wall clock (CLOCK_REALTIME). A TIME namespace doesn’t change that, since it only offsets the monotonic and boot-time clocks, and most container runtimes don’t create one by default anyway. You can’t make a container think it’s 2020 while the host is in 2026.

System calls: All containers use the same kernel syscall interface, and namespaces do no filtering. If a container process makes a dangerous syscall (say, init_module to load a kernel module), only its capabilities and any seccomp filter stand in the way; the namespace doesn’t. This is why runtimes drop capabilities and layer seccomp (syscall filtering in the kernel) and LSMs such as AppArmor or SELinux on top of namespaces.

Resource limits: Namespaces, including the CGROUP namespace, change what a process can see, not how much it can use. If you create a PID namespace without also setting cgroup limits, a runaway process in the namespace can consume all the host’s CPU and memory. Namespaces + cgroups together = real isolation.
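Here’s roughly what the “+ cgroups” half looks like on a cgroup v2 host: create a child cgroup, write limits into its interface files, and move the process in. This is a sketch under assumptions: cgroup v2 is mounted at /sys/fs/cgroup, the memory and pids controllers are enabled in the parent’s cgroup.subtree_control, and you are root. CGROUP_ROOT is a parameter only so the function can be tried against a scratch directory; limit_process is a hypothetical name.

```shell
#!/bin/bash
# limit_process NAME PID MEMORY_MAX PIDS_MAX
# Create cgroup NAME under CGROUP_ROOT (default /sys/fs/cgroup, cgroup v2),
# set memory.max and pids.max, then move PID (and its future children) into it.
limit_process() {
  if [ $# -ne 4 ]; then
    echo "usage: limit_process NAME PID MEMORY_MAX PIDS_MAX" >&2
    return 2
  fi
  local name=$1 pid=$2 mem=$3 pids=$4
  local cg="${CGROUP_ROOT:-/sys/fs/cgroup}/$name"
  mkdir -p "$cg" || return 1
  echo "$mem"  > "$cg/memory.max" || return 1   # bytes, a suffix like 256M, or "max"
  echo "$pids" > "$cg/pids.max"   || return 1   # caps the number of tasks (fork bombs)
  echo "$pid"  > "$cg/cgroup.procs"             # move the process into the cgroup
}
```

As root, limit_process mini 18394 256M 100 would cap a shell with host PID 18394 at 256 MiB of memory and 100 tasks; remove the cgroup with rmdir once no processes are left in it.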

How Docker Ties It Together

When you run docker run -it ubuntu bash, Docker (through containerd and runc):

Creates a new UTS namespace (hostname)
Creates a new PID namespace (process isolation)
Creates a new IPC namespace (shared memory, semaphores, message queues)
Creates a new MNT namespace (filesystem isolation)
Creates a new NET namespace (network isolation)
Creates a new CGROUP namespace (by default on cgroup v2 hosts)
Creates a USER namespace only if userns-remap is enabled (off by default)
Sets up cgroup limits (CPU, memory, PIDs)
Mounts the image layers as an overlay2 filesystem and makes it the container’s root
Mounts fresh /proc, /sys, and /dev inside the namespace
Sets up a veth pair, attaching the host end to the docker0 bridge
Drops most capabilities and applies its default seccomp profile
Runs your process (e.g., bash) inside all those namespaces

Docker is just a sophisticated orchestrator of kernel features. docker build turns a Dockerfile into image layers, and docker run combines all of the above into a cohesive “container” experience.

Debugging Containerized Processes from the Host

You’ve got a container running, and something’s wrong. The logs are useless. How do you investigate?

Inspect `/proc` of the Container Process

# Find the container's host PID
$ docker inspect --format='{{.State.Pid}}' <container-id>
12345

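# Check its file descriptors (root needed for another user's process)
$ sudo ls -la /proc/12345/fd/
# What files and sockets does the process have open?

# Check its memory map
$ sudo cat /proc/12345/maps
# Which memory regions (binary, libraries, heap, stack) are mapped?

# Check which cgroup it belongs to
$ cat /proc/12345/cgroup
# The limits themselves live in that cgroup's directory under /sys/fs/cgroup (e.g. memory.max on cgroup v2)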
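To grab the basics in one go, here is a sketch of a summary function (proc_summary is a hypothetical name; point it at a container’s host PID from a root shell). A related trick from the same directory: /proc/<pid>/root gives access to the process’s root filesystem, so sudo ls /proc/12345/root/etc lists the container’s /etc with host tools.

```shell
#!/bin/bash
# proc_summary PID: one-screen overview of a process from its /proc entries.
proc_summary() {
  local pid=$1 dir="/proc/$1"
  if [ -z "$pid" ] || [ ! -d "$dir" ]; then
    echo "no such process: $pid" >&2
    return 1
  fi
  echo "command:        $(tr '\0' ' ' < "$dir/cmdline")"
  echo "open fds:       $(ls "$dir/fd" | wc -l)"
  echo "mapped regions: $(wc -l < "$dir/maps")"
  echo "cgroup:"
  sed 's/^/  /' "$dir/cgroup"   # one line on cgroup v2, one per hierarchy on v1
}
```

For example, proc_summary 12345 prints the container process’s command line, how many descriptors it holds, how many memory regions it maps, and the cgroup path whose files hold its limits.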

Jump Into the Container’s Namespace with `nsenter`

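# Run a command inside the container's namespaces (but from the host).
# With --mount, the command comes from the container's filesystem, so ps must exist in the image
$ sudo nsenter --target 12345 --pid --mount ps aux

# Leave out --mount to run the host's tools in the container's other namespaces
$ sudo nsenter --target 12345 --net ss -tlnp
# Lists the container's listening sockets without needing ss inside the image

# Or get an interactive shell
$ sudo nsenter --target 12345 --pid --mount --uts --ipc --net bash
# Now you're debugging from inside the container's view, with your host credentials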

Trace Syscalls

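# See what syscalls the container's process is making (-f also follows its threads and children)
$ sudo strace -f -p 12345
# Watch in real time; Ctrl-C detaches

# Or record for five seconds and analyze
$ sudo timeout 5 strace -f -o /tmp/trace.txt -p 12345
$ grep open /tmp/trace.txt  # what files is it opening? (matches open and openat)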

These techniques let you debug containers without ever logging into them, which is handy when the container is broken or your tools aren’t installed inside.
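To turn a saved trace such as /tmp/trace.txt into something readable, here is a sketch that tallies the paths passed to open and openat, most frequent first (opened_files is a hypothetical helper; it only understands strace’s default text output and ignores whether each open succeeded):

```shell
#!/bin/bash
# opened_files TRACE_FILE: count how often each path appears in open()/openat()
# calls in an strace log, most frequent first.
opened_files() {
  grep -oE 'open(at)?\([^"]*"[^"]*"' "$1" |   # e.g. openat(AT_FDCWD, "/etc/hosts"
    sed -E 's/.*"([^"]*)"$/\1/' |             # keep just the quoted path
    sort | uniq -c | sort -rn
}
```

Running opened_files /tmp/trace.txt | head shows the files the process touched most during the capture, which is often enough to spot a missing config file or a hot retry loop.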

Your Mission: Build a Container in 50 Lines

Now it’s your turn. Here’s the skeleton; fill in the rest:

#!/bin/bash
# YOUR ASSIGNMENT: Complete this script

# 1. Create a minimal rootfs with /bin, /etc, /proc, /sys, /tmp, /root
# 2. Copy bash and essential binaries (cat, ls, ps, mkdir)
# 3. Copy their shared libraries
# 4. Create /etc/passwd with a root entry
# 5. Use unshare to create PID, MNT, UTS, IPC, and NET namespaces
# 6. Mount /proc inside the namespace (so ps works)
# 7. Set a custom hostname inside the namespace
# 8. chroot into the rootfs
# 9. Drop into a bash shell
# 10. Verify that ps aux shows only your shell (PID 1) and the ps process itself
# 11. Verify that hostname shows your custom name, not the host's
# 12. Exit and confirm the host is unaffected

# Bonus: Set up a veth pair for networking. Give the container a virtual interface.

The challenge: make it work without Docker. Once you do, you’ll have built a real container, and you’ll understand what Docker is actually doing.

The Takeaway

Containers aren’t magic. They’re a clever combination of kernel features, the oldest of which have been around for more than two decades. Namespaces isolate resources; cgroups enforce limits. Together, they create the illusion of a lightweight VM.

Understanding namespaces means you can:

Debug containers from the host
Spot the difference between namespace isolation and security
Build your own container runtime (if you’re feeling ambitious)
Avoid common pitfalls (like forgetting that containers share the kernel)
Appreciate the elegance of Unix philosophy: small tools, big powers

Next time someone tells you “containers are just VMs,” you’ll smile. You know better. You’ve built one yourself.

Now go forth, build some minimal containers, and remember: every running process is already in a namespace. The kernel has been containerizing workloads since before Docker was a glimmer in Solomon Hykes’s eye.
