Longhorn vs Rook-Ceph

I just want PVCs to work in my cluster

You deploy a database to your Kubernetes cluster. Pod spins up. You add data. Then you reschedule the pod onto a different node. Your data evaporates like your budget after a conference trip.

Welcome to the persistent storage problem.

Kubernetes has this cute fiction that pods are ephemeral. Practical life says your Postgres needs data to actually… persist. So you need a storage backend — something that survives pod death and node reboots. You declare a PersistentVolumeClaim (PVC) in your manifests, and some provider has to make that real.

The two popular choices in the homelab / small-to-medium cluster space are Longhorn and Rook-Ceph. Both work. Both will chew through your afternoon. But they solve the problem in radically different ways — and which one doesn’t make you want to flip a table depends entirely on what you’re actually running.

The Storage Backend Problem

Before we pit them against each other, let’s be clear on what we’re actually solving.

Kubernetes itself is just an orchestrator. It schedules workloads, restarts failed pods, and manages networking. It does not come with persistent storage. (LocalPath provisioners exist, but they’re a trap — your data lives on a single node and if that node dies, so does your data. Great for testing. A disaster for anything stateful.)

A real storage backend needs to:

Accept data from a pod on any node
Keep it durable (replicated across multiple nodes, ideally)
Serve it back to the pod even if the pod moves
Not lose anything if a node catches fire

Longhorn and Rook-Ceph are the two philosophies for doing this at homelab scale.

Longhorn: The Lightweight Option

Longhorn is Rancher’s answer to the question: “What if we just made a storage provisioner that doesn’t require DevOps superpowers?”

Design Philosophy

Longhorn treats each volume independently. When you create a PVC, Longhorn:

Starts an “engine” for the volume (the controller process that fronts its replicas, run inside an instance-manager pod)
Creates N replicas (default 3) across different nodes
Writes synchronously to every healthy replica
Exposes the volume over iSCSI, which the node attaches as a block device and mounts into the workload pod

This is intentionally simple. You’re not managing a distributed filesystem or a crush map. You’re just saying “I have a volume, make copies of it on multiple nodes.”
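The "copies on multiple nodes" model has a cost that's easy to put a number on. Here is a minimal sketch, assuming every replica eventually holds a full copy of the volume (snapshots would add to this); `raw_gib` is a made-up helper name, not a Longhorn command.

```shell
#!/bin/sh
# Raw disk consumed across the cluster by one fully written volume:
# every replica holds a full copy, so raw = size * replicas.
raw_gib() {
  size_gib=$1
  replicas=$2
  echo $(( size_gib * replicas ))
}

# A 10Gi PVC at the default of 3 replicas
raw_gib 10 3   # prints 30
```

So a 10Gi claim can eat up to 30Gi of node disk once it fills up, which is worth remembering when you size the data path.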

What Longhorn Gives You

Install: One Helm chart. Seriously.

helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --create-namespace

Five minutes, as long as every node already has open-iscsi installed (Longhorn uses it to attach volumes). No cluster topology planning. No ceph-deploy ceremony.

Resource footprint: Minimal. Each node runs a small manager agent and an instance manager (~50MB RAM each). Every attached volume adds an engine process and its replica processes inside the instance managers (~200MB per volume). Total? A small 3-node cluster uses ~2GB RAM across the infrastructure.

StorageClass: Dead simple.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"  # minutes until a replica is considered stale
  fstype: "ext4"
allowVolumeExpansion: true

Web UI: Friendly. See your volumes, replicas, snapshot status, backups. Click around without fear. It’s not Ceph’s crush map — it’s just a volume list.
Backups: You can snapshot volumes and back them up to S3, NFS, or wherever. Incremental backups. Restore to a different cluster. This is genuinely useful for homelab ops.
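Single cluster: Longhorn replicates within one Kubernetes cluster. Cross-cluster protection goes through the backup store (a standby volume in a second cluster that is restored from backups), not live replication. If you want synchronous multi-cluster replication, you’ll be disappointed.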
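For the backup feature, scheduled backups are usually declared as a RecurringJob resource rather than clicked together in the UI. The sketch below only writes the manifest to a file. It assumes a backup target (S3 or NFS) is already configured in Longhorn and that your release serves the `longhorn.io/v1beta2` API (older releases use `v1beta1`). The job name and schedule are placeholders.

```shell
#!/bin/sh
# Write a RecurringJob manifest to a file; apply it later with kubectl.
cat > nightly-backup.yaml <<'EOF'
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: nightly-backup        # hypothetical name
  namespace: longhorn-system
spec:
  task: backup                # "snapshot" would keep it on the cluster instead
  cron: "0 3 * * *"           # every day at 03:00
  groups:
  - default                   # applies to volumes in Longhorn's default group
  retain: 7                   # keep the last 7 backups
  concurrency: 2              # volumes processed in parallel
EOF

echo "wrote nightly-backup.yaml; apply with: kubectl apply -f nightly-backup.yaml"
```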

Longhorn’s Tradeoffs

Scalability ceiling: Longhorn works great on 3–10 nodes. At 20+ nodes, you start fighting performance. Each volume replica needs network I/O sync, and Longhorn wasn’t architected for massive scale.
Network dependency: Replication happens over the regular cluster network. If your cluster’s network is flaky, so is your storage. Ceph replicates over the network as well, so switching backends does not make this problem go away.
No native shared filesystem: Longhorn volumes are block storage (RWO, ReadWriteOnce). If you need multiple pods to mount the same volume (RWX, ReadWriteMany), Longhorn can do it, but only by exporting the block volume over NFS from an extra share-manager pod, so you are still putting NFS on top of Longhorn. Ceph’s CephFS handles this natively.
Performance: Good, not stellar. Replication adds latency. If you have a workload that’s I/O-obsessed (high-speed data ingestion), Longhorn’s throughput won’t embarrass you, but Ceph on good hardware will still win.
Upgrade story: Pretty clean, honestly. Helm upgrades are smooth, and they’ve gotten better at rolling updates without evicting all your pods at once.

Rook-Ceph: The Industrial Warehouse

Rook is an operator that deploys real Ceph into Kubernetes. Ceph is an open-source distributed storage system that’s been hardened in production at massive scale (CERN, OpenStack clouds, etc.).

Design Philosophy

Ceph is a distributed storage system. It doesn’t think in terms of individual volumes. It thinks in terms of:

MONs (monitors): The small quorum of daemons that holds the cluster map and keeps Ceph state consistent
OSDs (object storage daemons): The daemons that actually store data, normally one per disk
MGR (manager): Metrics, the dashboard, and other management modules

Rook wraps this into Kubernetes operators so you don’t have to hand-configure a Ceph cluster separately. You write a CephCluster CR and Rook does the heavy lifting.

What Rook-Ceph Gives You

Install: More complex, but still reasonable.

helm repo add rook-release https://charts.rook.io/release
helm repo update
helm install rook-ceph rook-release/rook-ceph \
  --namespace rook-ceph \
  --create-namespace \
  --values values.yaml  # defines node selector, storage paths, etc.

# Then deploy the CephCluster CR
kubectl apply -f cluster.yaml

The cluster CR takes more time to stabilize (MONs need to elect, OSDs need to format and join). Budget 10–15 minutes.
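The MON election is a majority vote, which is why three monitors is the usual minimum. A quick sketch of the arithmetic (the function names are made up for illustration):

```shell
#!/bin/sh
# Monitors needed for a majority among N monitors
mon_quorum() { echo $(( $1 / 2 + 1 )); }

# Monitors that can fail while a majority still exists
mon_tolerated_failures() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 5; do
  echo "mons=$n quorum=$(mon_quorum "$n") can_lose=$(mon_tolerated_failures "$n")"
done
```

Two monitors tolerate zero failures, same as one, so an even count buys you nothing. Go from 1 to 3 to 5.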

Cluster topology: You define it explicitly.

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
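spec:
  dataDirHostPath: /var/lib/rook  # host directory for MON state and config (the path Rook's examples use)
  cephVersion:
    image: quay.io/ceph/ceph:v18.2.0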
  mon:
    count: 3
  mgr:
    count: 2
  storage:
    useAllNodes: false
    useAllDevices: false
    nodes:
    - name: node-1
      devices:
      - name: /dev/sdb  # dedicated storage device
    - name: node-2
      devices:
      - name: /dev/sdb
    - name: node-3
      devices:
      - name: /dev/sdb
  # tune Ceph performance / safety settings here
  cephConfig:
    global:
      osd_pool_default_size: "3"
      osd_pool_default_min_size: "2"

You’re telling Ceph exactly which nodes and which devices to use. This is powerful and scary.
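The two `osd_pool_default_*` settings in that CR are the ones that decide what happens when a node drops out. Here is a simplified model of how they interact. The state labels are mine, not Ceph's real placement-group state names, and `pg_state` is a hypothetical helper.

```shell
#!/bin/sh
# State of a placement group given how many of its replicas are up.
# size     = copies Ceph tries to keep (osd_pool_default_size)
# min_size = copies required to keep serving I/O (osd_pool_default_min_size)
pg_state() {
  up=$1; size=$2; min_size=$3
  if [ "$up" -ge "$size" ]; then
    echo "healthy"
  elif [ "$up" -ge "$min_size" ]; then
    echo "degraded"   # still serving I/O while Ceph re-replicates
  else
    echo "blocked"    # I/O pauses until enough replicas return
  fi
}

for up in 3 2 1; do
  echo "replicas_up=$up -> $(pg_state "$up" 3 2)"
done
```

With size 3 and min_size 2, losing one node is a non-event for your pods. Losing two stops writes rather than risking the last copy.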

Storage classes: Multiple flavors. Block (RBD), shared filesystem (CephFS), S3-compatible object storage (RGW).

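apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
  imageFormat: "2"
  imageFeatures: "layering"
  # Secrets created by the Rook operator; the CSI driver cannot provision without them
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
allowVolumeExpansion: true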

Shared filesystems: Want multiple pods to write to the same volume? CephFS has your back. RWX (ReadWriteMany) works natively.
Web UI: Ceph Dashboard is available (Rook exposes it). It’s more industrial — crush map visualization, CRUSH rule editing, OSD status, backfill/recovery progress. Powerful, less user-friendly than Longhorn’s UI.
Observability: Prometheus metrics out of the box. Ceph is very instrumented. You get pool usage, OSD latency, recovery rates, rebalancing status. Operators love this.
Performance: Ceph scales. With proper hardware (NVMe, 10GbE network), Ceph will handle serious throughput. This is why enterprises use it.
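One note on the shared-filesystem point: CephFS does not exist until you ask for it. A minimal sketch of the CephFilesystem resource, written to a file here so you can review it before applying. The filesystem name `myfs` is a placeholder, and the replica sizes assume at least three OSD hosts.

```shell
#!/bin/sh
# Write a CephFilesystem manifest to a file; apply it later with kubectl.
cat > filesystem.yaml <<'EOF'
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs                  # hypothetical filesystem name
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
  - name: replicated
    replicated:
      size: 3
  metadataServer:
    activeCount: 1            # one active MDS
    activeStandby: true       # plus a hot standby
EOF

echo "wrote filesystem.yaml; apply with: kubectl apply -f filesystem.yaml"
```

You still need a StorageClass that uses the `rook-ceph.cephfs.csi.ceph.com` provisioner before a ReadWriteMany PVC can bind to it.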

Rook-Ceph’s Tradeoffs

Resource overhead: Rook deploys a lot of pods.
- 3 MONs (~1GB RAM each)
- 2 MGRs (~1GB RAM each)
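- 3+ OSDs (one per storage device, ~2GB RAM each for the OSD daemon)
- Ceph Dashboard
- MDS (if using CephFS)
Total for a minimal 3-node cluster: the MONs, MGRs, and OSDs alone come to about 11GB, and once you add the CSI pods, the MDS, and headroom for recovery you should budget 15–20GB RAM just for Ceph. That’s roughly 8 to 10 times Longhorn’s footprint.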
Complexity: You need to understand Ceph concepts (MONs, OSDs, pools, CRUSH rules). If something breaks, debugging means reading Ceph docs. Longhorn breaks less often because there’s less surface area.
Dedicated devices: Ceph really wants dedicated raw block devices for its OSDs (HDDs, SSDs, or NVMe), and the MONs want fast local disk for their state. You can’t just hand Ceph the same disk your OS and Kubernetes are running on. Your nodes need extra storage attached.
Minimum cluster size: Ceph wants 3+ nodes for HA. You can run it on 1–2 nodes in a pinch, but you’re defeating the purpose.
Network: Ceph benefits from a fast, low-latency network (10GbE is recommended for serious deployments). Gigabit ethernet works, but you’re not getting full potential.
Upgrade complexity: Ceph rolling updates are better than they used to be, but they’re still orchestrated operations. Rook automates a lot, but you still monitor the upgrade carefully. You don’t want to evict all your OSDs at once.
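If you want to sanity-check the resource-overhead numbers against your own layout, the per-daemon figures add up like this. A small sketch using the estimates from the list above; `ram_total` is a made-up helper.

```shell
#!/bin/sh
# Sum "count GB-per-daemon" pairs, e.g. ram_total 3 1 2 1 3 2
ram_total() {
  total=0
  while [ "$#" -ge 2 ]; do
    total=$(( total + $1 * $2 ))
    shift 2
  done
  echo "$total"
}

# 3 MONs at 1GB, 2 MGRs at 1GB, 3 OSDs at 2GB
ram_total 3 1 2 1 3 2   # prints 11
```

The core daemons come to 11GB. The gap up to the 15–20GB budget is what the dashboard, MDS, CSI pods, and recovery spikes eat.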

The Comparison Matrix

Factor	Longhorn	Rook-Ceph
Install time	5 min	15 min
Cluster size fit	3–10 nodes	3–100+ nodes
RAM overhead	~2GB (3 nodes)	~15–20GB (3 nodes)
Disk requirement	Shared OK	Dedicated devices required
Setup complexity	Simple	Moderate–complex
Block storage (RWO)	Yes, iSCSI	Yes, RBD
Shared FS (RWX)	Via NFS export (not native)	Yes, CephFS
Snapshots	Yes	Yes
Backups	S3, NFS integration	S3 via RGW, snapshots
Performance	Good	Excellent at scale
Failure domain	Per-volume replicas	Cluster-wide, tunable
Web UI	Friendly	Industrial
Observability	Basic	Prometheus + Ceph Dashboard
Multi-cluster	No	Possible (advanced)
Upgrade story	Smooth	Careful coordination

The Real Decision Tree

Start with Longhorn if:

You have 3–5 nodes
You want storage working in under an hour
You don’t have spare disks for a Ceph cluster
You want backups to S3 (native support)
You don’t need shared filesystem volumes
Your network is plain gigabit (enough for Longhorn at this size)

Go Rook-Ceph if:

You have 10+ nodes and want to scale
You have dedicated block devices available
You need multiple pods reading/writing the same volume (CephFS)
You want S3-compatible object storage (RGW) or metrics-driven observability
Your infrastructure team can stomach complexity
You’re building for a 3+ year horizon (Ceph is battle-tested and widely supported)
You have a fast network (10GbE) and want to use it
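If you squint, the two lists collapse into a handful of questions. This is a deliberately crude sketch of that logic, not a rule: the thresholds come from the lists above, and `recommend` and its yes/no flags are invented for the example.

```shell
#!/bin/sh
# recommend NODES HAS_DEDICATED_DISKS NEEDS_SHARED_FS   (flags are "yes" or "no")
recommend() {
  nodes=$1; dedicated_disks=$2; needs_shared_fs=$3
  if [ "$dedicated_disks" != "yes" ]; then
    echo "longhorn"          # no spare devices means no sensible Ceph cluster
  elif [ "$needs_shared_fs" = "yes" ] || [ "$nodes" -ge 10 ]; then
    echo "rook-ceph"
  else
    echo "longhorn"
  fi
}

recommend 3 no no      # small cluster, no spare disks
recommend 12 yes no    # larger cluster with dedicated devices
recommend 4 yes yes    # small cluster that needs CephFS
```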

Honest check: Do you need either?

Before you choose, ask yourself: Am I actually running stateful workloads that can’t tolerate data loss? Or am I running databases that should be managed by the cloud provider (RDS, etc.)?

For many homelab setups, local-path-provisioner with backups to S3 gets 90% of the way there. You’re not handling node failures gracefully, but you’re also not running a second full Kubernetes cluster just for storage.

If you’re running your own Postgres, Redis, or Elasticsearch — pick Longhorn. It’s simpler, and Postgres can handle brief rebalancing. If you’re running something that demands serious distributed storage semantics (OpenStack compute, a data lake), Rook-Ceph is the correct choice.

The Setup You’ll Actually Do

Longhorn in 10 Lines

# Add the Helm repo
helm repo add longhorn https://charts.longhorn.io && helm repo update

# Install to the cluster
helm install longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --create-namespace \
  --set defaultSettings.defaultDataPath="/var/lib/longhorn"

# Verify
kubectl get po -n longhorn-system

Now create a PVC and use it:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mydata
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: postgres-test
spec:
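  containers:
  - name: postgres
    image: postgres:16
    env:
    - name: POSTGRES_PASSWORD
      value: change-me  # placeholder; the image refuses to start without a password
    - name: PGDATA
      value: /var/lib/postgresql/data/pgdata  # subdirectory, so initdb doesn't trip over lost+found
    volumeMounts:
    - name: data
      mountPath: /var/lib/postgresql/data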
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: mydata

Deploy it. Longhorn creates the volume, replicas sync, Postgres starts. Done.

Rook-Ceph: A Bit More Ceremony

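# Add the Helm repo and install the Rook operator
helm repo add rook-release https://charts.rook.io/release && helm repo update
helm install rook-ceph rook-release/rook-ceph \
  --namespace rook-ceph \
  --create-namespace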

# Wait for the operator to be ready
kubectl wait --for=condition=ready pod \
  -l app=rook-ceph-operator \
  -n rook-ceph \
  --timeout=300s

# Deploy the cluster (see YAML above, or use Rook's examples)
kubectl apply -f cluster.yaml

# Monitor the rollout
kubectl get cephcluster -n rook-ceph -w

Once the cluster is healthy (all MONs running, OSDs joining), create a pool and StorageClass. Then you can use PVCs the same way.
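The pool is one more small resource. Here is a minimal sketch of a CephBlockPool that matches the `replicapool` name the StorageClass earlier in this post points at. It assumes the `rook-ceph` namespace and at least three OSD hosts, and it only writes the file so you can look before you apply.

```shell
#!/bin/sh
# Write a CephBlockPool manifest to a file; apply it later with kubectl.
cat > pool.yaml <<'EOF'
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool           # must match "pool" in the StorageClass
  namespace: rook-ceph
spec:
  failureDomain: host         # spread replicas across nodes, not just disks
  replicated:
    size: 3
EOF

echo "wrote pool.yaml; apply with: kubectl apply -f pool.yaml"
```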

The difference: Longhorn gives you storage in 5 minutes. Ceph gives you enterprise storage in 15 to 20 minutes, and you’ll spend the next 3 months understanding CRUSH rules.

The Honest Take

Longhorn wins on simplicity. It’s the forklift that works. It’s not the fastest, it’s not the most scalable, but it gets your data off the floor without requiring a structural engineering degree.

Rook-Ceph wins on durability, scale, and features. If you’re serious about self-hosting infrastructure, Ceph is what many private clouds and hosting providers run under the hood. It handles complexity so you don’t have to (after you learn it once).

For most homelab scenarios, Longhorn is the right call. Save Ceph for when you’ve outgrown Longhorn and have the ops bandwidth to manage it.

Your 2 AM self will thank you for picking the simpler option.
