I just want PVCs to work in my cluster
You deploy a database to your Kubernetes cluster. Pod spins up. You add data. Then you reschedule the pod onto a different node. Your data evaporates like your budget after a conference trip.
Welcome to the persistent storage problem.
Kubernetes has this cute fiction that pods are ephemeral. Practical life says your Postgres needs data to actually… persist. So you need a storage backend — something that survives pod death and node reboots. You declare a PersistentVolumeClaim (PVC) in your manifests, and some provider has to make that real.
The two popular choices in the homelab / small-to-medium cluster space are Longhorn and Rook-Ceph. Both work. Both will chew through your afternoon. But they solve the problem in radically different ways — and which one doesn’t make you want to flip a table depends entirely on what you’re actually running.
The Storage Backend Problem
Before we pit them against each other, let’s be clear on what we’re actually solving.
Kubernetes itself is just an orchestrator. It schedules workloads, restarts failed pods, and manages networking. It does not ship a durable storage backend of its own. (Local-path provisioners exist, but they’re a trap: your data lives on a single node, and if that node dies, so does your data. Great for testing. A disaster for anything stateful.)
A real storage backend needs to:
- Accept data from a pod on any node
- Keep it durable (replicated across multiple nodes, ideally)
- Serve it back to the pod even if the pod moves
- Not lose anything if a node catches fire
Longhorn and Rook-Ceph represent two very different philosophies for doing this at homelab scale.
Longhorn: The Lightweight Option
Longhorn started at Rancher (it is now a CNCF project) as an answer to the question: “What if we just made a storage provisioner that doesn’t require DevOps superpowers?”
Design Philosophy
Longhorn treats each volume independently. When you create a PVC, Longhorn:
- Starts a dedicated “engine” for that volume (the volume’s controller) on the node where the volume is attached
- Creates N replicas (default 3) across different nodes
- Has the engine write synchronously to every replica
- Exposes the volume to the node as a block device over iSCSI, which the workload pod then mounts
This is intentionally simple. You’re not managing a distributed filesystem or a crush map. You’re just saying “I have a volume, make copies of it on multiple nodes.”
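To make the cost of “make copies of it on multiple nodes” concrete, here is a minimal shell sketch of the raw disk space a replicated volume can consume. The helper name `raw_gib` is made up for illustration, and the figure is an upper bound reached once the volume is full; snapshots and filesystem overhead are not modeled.

```shell
#!/usr/bin/env bash
# raw_gib SIZE_GIB REPLICAS
# Hypothetical helper: worst-case raw capacity used across the cluster when
# every replica holds a full copy of the volume (snapshots not counted).
raw_gib() {
  local size_gib=$1 replicas=$2
  echo $(( size_gib * replicas ))
}

# A 10 GiB PVC with the default of 3 replicas
raw_gib 10 3   # prints 30
```

In other words, budget disk for the replica count, not just for the size written in the PVC.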
What Longhorn Gives You
- Install: One Helm chart. Seriously.

      helm repo add longhorn https://charts.longhorn.io
      helm repo update
      helm install longhorn longhorn/longhorn \
        --namespace longhorn-system \
        --create-namespace

  Five minutes. No cluster topology planning. No ceph-deploy ceremony.
- Resource footprint: Minimal. Each node runs a small agent and instance manager (~50MB RAM each). Each volume adds an iSCSI engine (~200MB per engine). Total? A small 3-node cluster uses ~2GB RAM across the infrastructure.
- StorageClass: Dead simple.

      # storage-class.yaml
      apiVersion: storage.k8s.io/v1
      kind: StorageClass
      metadata:
        name: longhorn
      provisioner: driver.longhorn.io
      parameters:
        numberOfReplicas: "3"
        staleReplicaTimeout: "2880" # minutes until a replica is considered stale
        fsType: "ext4"
      allowVolumeExpansion: true

- Web UI: Friendly. See your volumes, replicas, snapshot status, backups. Click around without fear. It’s not Ceph’s CRUSH map, it’s just a volume list.
- Backups: You can snapshot volumes and back them up to S3, NFS, or wherever. Incremental backups. Restore to a different cluster. This is genuinely useful for homelab ops.
- Single cluster: Longhorn assumes one Kubernetes cluster. If you want multi-cluster replication, you’ll be disappointed.
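The backup feature in the list above can also be scheduled declaratively. The sketch below writes a Longhorn `RecurringJob` manifest to a file. It assumes a Longhorn release that serves the `longhorn.io/v1beta2` API and a backup target (S3 or NFS) that is already configured; the job name, schedule, and retention count are placeholders, not recommendations.

```shell
#!/usr/bin/env bash
# Write a RecurringJob manifest to a file; nothing is applied to the cluster here.
cat > nightly-backup.yaml <<'EOF'
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: nightly-backup        # hypothetical name
  namespace: longhorn-system
spec:
  task: backup                # "snapshot" would keep the copy on-cluster instead
  cron: "0 3 * * *"           # every day at 03:00
  groups:
    - default                 # Longhorn's built-in default group
  retain: 7                   # keep the last 7 backups
  concurrency: 2              # volumes processed in parallel
EOF

echo "wrote nightly-backup.yaml; apply with: kubectl apply -f nightly-backup.yaml"
```

Keeping the job in a manifest means the backup schedule lives in Git next to the rest of the cluster config instead of in UI clicks.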
Longhorn’s Tradeoffs
- Scalability ceiling: Longhorn works great on 3–10 nodes. At 20+ nodes, you start fighting performance. Each volume replica needs network I/O sync, and Longhorn wasn’t architected for massive scale.
- Network dependency: Replication happens over the regular cluster network. If that network is flaky, so is your storage.
- No native shared filesystem: Longhorn volumes are block storage (RWO, ReadWriteOnce). If you need multiple pods on the same volume (RWX, ReadWriteMany), you end up with NFS in front of a Longhorn volume, either run by yourself or through the NFS share-manager pod that recent Longhorn releases start for RWX volumes. Either way that’s an extra layer and an extra hop. Ceph’s CephFS handles this natively.
- Performance: Good, not stellar. Replication adds latency. If you have a workload that’s I/O-obsessed (high-speed data ingestion), Longhorn’s throughput won’t embarrass you, but Ceph on good hardware will still win.
- Upgrade story: Pretty clean, honestly. Helm upgrades are smooth, and they’ve gotten better at rolling updates without evicting all your pods at once.
Rook-Ceph: The Industrial Warehouse
Rook is an operator that deploys real Ceph into Kubernetes. Ceph is an open-source distributed storage system that’s been hardened in production at massive scale (CERN, OpenStack clouds, etc.).
Design Philosophy
Ceph is a distributed storage system. It doesn’t think in terms of individual volumes. It thinks in terms of:
- MONs (monitors): A small quorum of daemons that maintains the cluster map and keeps Ceph state consistent
- OSDs (object storage daemons): The daemons that actually store data, typically one per disk
- MGR (manager): Orchestration and metrics
Rook wraps this into Kubernetes operators so you don’t have to hand-configure a Ceph cluster separately. You write a CephCluster CR and Rook does the heavy lifting.
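MONs only agree on cluster state while a strict majority of them is up, which is why monitor counts are odd and 3 is the usual minimum. A small shell sketch of that arithmetic (the function names are made up for illustration):

```shell
#!/usr/bin/env bash
# mon_quorum COUNT: monitors needed for a strict majority.
mon_quorum() { echo $(( $1 / 2 + 1 )); }

# mon_tolerated COUNT: monitors that can fail while a majority survives.
mon_tolerated() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 5; do
  echo "mons=$n quorum=$(mon_quorum "$n") tolerated_failures=$(mon_tolerated "$n")"
done
```

Two monitors tolerate no more failures than one, so going from 1 to 2 buys nothing; going from 1 to 3 buys one failure.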
What Rook-Ceph Gives You
- Install: More complex, but still reasonable.

      helm repo add rook-release https://charts.rook.io/release
      helm repo update
      helm install rook-ceph rook-release/rook-ceph \
        --namespace rook-ceph \
        --create-namespace \
        --values values.yaml # operator settings (node selector, CSI options, etc.)

      # Then deploy the CephCluster CR
      kubectl apply -f cluster.yaml

  The cluster CR takes more time to stabilize (MONs need to elect, OSDs need to format and join). Budget 10–15 minutes.
- Cluster topology: You define it explicitly.

      # cluster.yaml
      apiVersion: ceph.rook.io/v1
      kind: CephCluster
      metadata:
        name: rook-ceph
        namespace: rook-ceph
      spec:
        cephVersion:
          image: quay.io/ceph/ceph:v18.2.0
        dataDirHostPath: /var/lib/rook # host path where Rook persists MON and config data
        mon:
          count: 3
        mgr:
          count: 2
        storage:
          useAllNodes: false
          useAllDevices: false
          nodes:
            - name: node-1
              devices:
                - name: /dev/sdb # dedicated storage device
            - name: node-2
              devices:
                - name: /dev/sdb
            - name: node-3
              devices:
                - name: /dev/sdb
        # tune Ceph performance / safety settings here
        cephConfig:
          global:
            osd_pool_default_size: "3"
            osd_pool_default_min_size: "2"

  You’re telling Ceph exactly which nodes and which devices to use. This is powerful and scary.
- Storage classes: Multiple flavors. Block (RBD), shared filesystem (CephFS), S3-compatible object storage (RGW).

      # rbd-storageclass.yaml
      apiVersion: storage.k8s.io/v1
      kind: StorageClass
      metadata:
        name: rook-ceph-block
      provisioner: rook-ceph.rbd.csi.ceph.com
      parameters:
        clusterID: rook-ceph
        pool: replicapool
        imageFormat: "2"
        imageFeatures: "layering"
        # CSI secrets created by the Rook operator (names assume the rook-ceph namespace)
        csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
        csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
        csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
        csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
        csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
        csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
        csi.storage.k8s.io/fstype: ext4
      allowVolumeExpansion: true

- Shared filesystems: Want multiple pods to write to the same volume? CephFS has your back. RWX (ReadWriteMany) works natively.
- Web UI: Ceph Dashboard is available (Rook exposes it). It’s more industrial: CRUSH map visualization, CRUSH rule editing, OSD status, backfill/recovery progress. Powerful, less user-friendly than Longhorn’s UI.
- Observability: Prometheus metrics out of the box. Ceph is very instrumented. You get pool usage, OSD latency, recovery rates, rebalancing status. Operators love this.
- Performance: Ceph scales. With proper hardware (NVMe, 10GbE network), Ceph will handle serious throughput. This is why enterprises use it.
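As an illustration of the shared-filesystem point in the list above, this sketch writes a ReadWriteMany claim to a file. The StorageClass name `rook-cephfs` is an assumption: it has to match whatever CephFS-backed StorageClass exists in your cluster, and the claim name and size are placeholders.

```shell
#!/usr/bin/env bash
# Write an RWX PersistentVolumeClaim to a file; nothing is applied here.
cat > shared-pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data             # hypothetical name
spec:
  accessModes:
    - ReadWriteMany             # several pods, possibly on different nodes
  storageClassName: rook-cephfs # assumed name of a CephFS-backed StorageClass
  resources:
    requests:
      storage: 5Gi
EOF

echo "wrote shared-pvc.yaml; apply with: kubectl apply -f shared-pvc.yaml"
```

Any number of pods can then reference `claimName: shared-data` in their volumes section and mount the same filesystem.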
Rook-Ceph’s Tradeoffs
- Resource overhead: Rook deploys a lot of pods.
  - 3 MONs (~1GB RAM each)
  - 2 MGRs (~1GB RAM each)
  - 3+ OSDs (one per storage device, ~2GB RAM each for the OSD daemon)
  - Ceph Dashboard
  - MDS (if using CephFS)

  Total for a minimal 3-node cluster: 15–20GB RAM just for Ceph. That’s roughly 8 to 10 times Longhorn’s ~2GB.
- Complexity: You need to understand Ceph concepts (MONs, OSDs, pools, CRUSH rules). If something breaks, debugging means reading Ceph docs. Longhorn has less surface area, so there is less to break and less to learn when it does.
- Dedicated devices: Ceph really wants dedicated block devices for its OSDs (HDDs or NVMe for data, with MONs happiest on SSD-backed storage). You can’t just hand Ceph the disk your OS and Kubernetes already live on: OSDs expect raw, unformatted devices or partitions. Your nodes need extra storage attached.
- Minimum cluster size: Ceph wants 3+ nodes for HA. You can run it on 1–2 nodes in a pinch, but you’re defeating the purpose.
- Network: Ceph benefits from a fast, low-latency network (10GbE is recommended for serious deployments). Gigabit Ethernet works, but you’re not getting its full potential.
- Upgrade complexity: Ceph rolling updates are better than they used to be, but they’re still orchestrated operations. Rook automates a lot, but you still monitor the upgrade carefully. You don’t want to evict all your OSDs at once.
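A quick back-of-the-envelope check on the RAM figures in the resource overhead item above, as a shell sketch. It only sums the daemons that come with a per-daemon number; the gap up to the 15–20GB estimate would presumably be the dashboard, MDS, CSI pods, and headroom, none of which are modeled here.

```shell
#!/usr/bin/env bash
# Per-daemon RAM figures (GB) taken from the resource overhead list.
mons=3;  mon_gb=1
mgrs=2;  mgr_gb=1
osds=3;  osd_gb=2

core_gb=$(( mons * mon_gb + mgrs * mgr_gb + osds * osd_gb ))
echo "core daemons: ${core_gb}GB"   # prints "core daemons: 11GB"
```

Add one more `osd_gb` for every extra disk you give Ceph, since each disk gets its own OSD daemon.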
The Comparison Matrix
| Factor | Longhorn | Rook-Ceph |
|---|---|---|
| Install time | 5 min | 15 min |
| Cluster size fit | 3–10 nodes | 3–100+ nodes |
| RAM overhead | ~2GB (3 nodes) | ~15–20GB (3 nodes) |
| Disk requirement | Shared OK | Dedicated devices required |
| Setup complexity | Simple | Moderate–complex |
| Block storage (RWO) | Yes, iSCSI | Yes, RBD |
| Shared FS (RWX) | Via an NFS layer on top | Yes, CephFS (native) |
| Snapshots | Yes | Yes |
| Backups | S3, NFS integration | S3 via RGW, snapshots |
| Performance | Good | Excellent at scale |
| Failure domain | Per-volume replicas | Cluster-wide, tunable |
| Web UI | Friendly | Industrial |
| Observability | Basic | Prometheus + Ceph Dashboard |
| Multi-cluster | No | Possible (advanced) |
| Upgrade story | Smooth | Careful coordination |
The Real Decision Tree
Start with Longhorn if:
- You have 3–5 nodes
- You want storage working in under an hour
- You don’t have spare disks for a Ceph cluster
- You want backups to S3 (native support)
- You don’t need shared filesystem volumes
- Your network is plain gigabit Ethernet (Ceph wants more bandwidth to shine)
Go Rook-Ceph if:
- You have 10+ nodes and want to scale
- You have dedicated block devices available
- You need multiple pods reading/writing the same volume (CephFS)
- You want S3-compatible object storage (RGW) or metrics-driven observability
- Your infrastructure team can stomach complexity
- You’re building for a 3+ year horizon (Ceph is battle-tested and widely supported)
- You have a fast network (10GbE) and want to use it
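The two lists above can be boiled down to a few checks. The function below is a deliberately crude shell sketch, not a real sizing tool: the name `pick_backend`, its arguments, and the way it weighs the criteria are all illustrative assumptions, and it ignores the softer factors (team experience, time horizon, network speed).

```shell
#!/usr/bin/env bash
# pick_backend NODES SPARE_DISKS NEED_SHARED
#   NODES        number of cluster nodes
#   SPARE_DISKS  yes/no: dedicated block devices available for Ceph
#   NEED_SHARED  yes/no: shared filesystem (RWX) or object storage required
pick_backend() {
  local nodes=$1 spare_disks=$2 need_shared=$3
  if [ "$spare_disks" != "yes" ]; then
    echo "longhorn"        # no dedicated devices: Ceph is off the table
  elif [ "$need_shared" = "yes" ] || [ "$nodes" -ge 10 ]; then
    echo "rook-ceph"       # CephFS/RGW needed, or the cluster is large
  else
    echo "longhorn"        # small cluster, block storage only
  fi
}

pick_backend 3 no no     # prints longhorn
pick_backend 12 yes no   # prints rook-ceph
pick_backend 4 yes yes   # prints rook-ceph
```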
Honest check: Do you need either?
Before you choose, ask yourself: Am I actually running stateful workloads that can’t tolerate data loss? Or am I running databases that should be managed by the cloud provider (RDS, etc.)?
For many homelab setups, local-path-provisioner with backups to S3 gets 90% of the way there. You’re not handling node failures gracefully, but you’re also not running a whole distributed storage system just to hold a few volumes.
If you’re running your own Postgres, Redis, or Elasticsearch — pick Longhorn. It’s simpler, and Postgres can handle brief rebalancing. If you’re running something that demands serious distributed storage semantics (OpenStack compute, a data lake), Rook-Ceph is the correct choice.
The Setup You’ll Actually Do
Longhorn in 10 Lines
    # Add the Helm repo
    helm repo add longhorn https://charts.longhorn.io && helm repo update

    # Install to the cluster
    helm install longhorn longhorn/longhorn \
      --namespace longhorn-system \
      --create-namespace \
      --set defaultSettings.defaultDataPath="/var/lib/longhorn"

    # Verify
    kubectl get po -n longhorn-system

Now create a PVC and use it:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: mydata
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: longhorn
      resources:
        requests:
          storage: 10Gi
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: postgres-test
    spec:
      containers:
        - name: postgres
          image: postgres:16
          env:
            - name: POSTGRES_PASSWORD # the image refuses to start without one
              value: change-me        # test value only; use a Secret for real data
            - name: PGDATA            # subdirectory, so lost+found on the fresh ext4 volume doesn't trip initdb
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: mydata

Deploy it. Longhorn creates the volume, replicas sync, Postgres starts. Done.
Rook-Ceph: A Bit More Ceremony
    # Install the Rook operator
    helm repo add rook-release https://charts.rook.io/release && helm repo update
    helm install rook-ceph rook-release/rook-ceph \
      --namespace rook-ceph \
      --create-namespace

    # Wait for the operator to be ready
    kubectl wait --for=condition=ready pod \
      -l app=rook-ceph-operator \
      -n rook-ceph \
      --timeout=300s

    # Deploy the cluster (see YAML above, or use Rook's examples)
    kubectl apply -f cluster.yaml

    # Monitor the rollout
    kubectl get cephcluster -n rook-ceph -w

Once the cluster is healthy (all MONs running, OSDs joining), create a pool and StorageClass. Then you can use PVCs the same way.
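For the “create a pool” step, a minimal sketch of a replicated block pool could look like the following. The pool name `replicapool` is chosen to match the `pool:` parameter of the RBD StorageClass shown earlier; the replica count and failure domain are assumptions that suit a 3-node cluster with one OSD per node.

```shell
#!/usr/bin/env bash
# Write a CephBlockPool manifest to a file; nothing is applied here.
cat > pool.yaml <<'EOF'
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool      # matches "pool: replicapool" in the RBD StorageClass
  namespace: rook-ceph
spec:
  failureDomain: host    # place each copy on a different node
  replicated:
    size: 3              # three copies of every object
EOF

echo "wrote pool.yaml; apply with: kubectl apply -f pool.yaml"
```

With `failureDomain: host` and `size: 3`, the pool needs at least three nodes with OSDs before it can place all copies.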
The difference: Longhorn gives you storage in 5 minutes. Ceph gives you enterprise storage in 20 minutes, and you’ll spend the next 3 months understanding CRUSH rules.
The Honest Take
Longhorn wins on simplicity. It’s the forklift that works. It’s not the fastest, it’s not the most scalable, but it gets your data off the floor without requiring a structural engineering degree.
Rook-Ceph wins on durability, scale, and features. If you’re serious about self-hosting infrastructure, Ceph is what many private clouds and hosting providers run under the hood. It handles complexity so you don’t have to (after you learn it once).
For most homelab scenarios, Longhorn is the right call. Save Ceph for when you’ve outgrown Longhorn and have the ops bandwidth to manage it.
Your 2 AM self will thank you for picking the simpler option.