
Storage -- Quick Reference

Key Numbers

| Parameter | VMware/vSAN | OVE (ODF/Ceph) | Azure Local (S2D) |
|---|---|---|---|
| Max nodes/cluster (storage) | 64 (vSAN) | No hard limit (tested 100+) | 16 nodes |
| Replication factor | FTT=1 or 2 (RAID-1 mirroring); RAID-5/6 erasure coding optional | RF=2 or RF=3 (configurable per pool) | 2-way or 3-way mirror |
| Usable capacity (RF=3 / 3-way) | ~33% of raw | ~33% of raw | ~33% of raw |
| Usable capacity (RF=2 / 2-way) | ~50% of raw | ~50% of raw | ~50% of raw |
| Typical NVMe IOPS/OSD | N/A | 10,000-50,000 per OSD | N/A (pooled) |
| Latency target (NVMe, 4K rw) | <200 us | <500 us (Ceph RBD, tuned) | <200 us (S2D all-NVMe) |
| Latency target (SSD, 4K rw) | <500 us | <1 ms (Ceph RBD) | <500 us |
| OSD memory default | N/A | 4 GiB per OSD (osd_memory_target) | N/A |
| MON count | N/A | 3 (small) or 5 (large) | N/A |
| PG count (per pool, default) | N/A | 128-256 (autoscaler adjusts) | N/A |
| Max disk/PVC size | VMDK: 62 TB | No hard limit (tested multi-TB) | VHDX: 64 TB |
| Volume snapshot support | VMDK delta disks | CSI VolumeSnapshot (Ceph, instant) | Hyper-V checkpoints |
| Online volume expansion | Yes (vSAN) | Yes (CSI ExpandVolume) | Yes (S2D + ReFS) |
| Thin provisioning | Yes | Yes (Ceph RBD default) | Yes (ReFS) |
| Encryption at rest | vSAN encryption | ODF cluster/PV encryption (LUKS) | BitLocker |
| Deduplication | vSAN dedup (all-flash) | No native dedup in RBD | ReFS dedup + compression |
| Compression | vSAN compression | BlueStore compression (zstd/lz4) | ReFS compression |
| Scrubbing interval | N/A | Daily (light), weekly (deep) | Background (Storage Spaces) |
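
Several of the Ceph values above (osd_memory_target, pool replication size, PG counts) can be read back from a running cluster through the rook-ceph toolbox. A minimal check, assuming the default openshift-storage namespace; pool names vary per deployment, so list them first:

# Confirm the per-OSD memory target (4 GiB default)
oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph config get osd osd_memory_target
# List pools with their replication size and pg_num
oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph osd pool ls detail
# Show what the PG autoscaler currently recommends per pool
oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph osd pool autoscale-status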

Decision Matrix

| Scenario | Recommended Backend | Volume Mode | StorageClass | Notes |
|---|---|---|---|---|
| VM boot disk (general) | Ceph RBD | Block | ocs-storagecluster-ceph-rbd | Block mode = less overhead, better perf |
| VM data disk (DB, high IOPS) | Ceph RBD (NVMe pool) | Block | ceph-rbd-nvme (custom) | Separate pool on NVMe-only OSDs |
| Shared filesystem (RWX) | CephFS | Filesystem | ocs-storagecluster-cephfs | For VMs needing a shared mount |
| Bulk/archive storage | CephFS or NFS | Filesystem | cephfs-bulk (custom) | Cheaper tier, HDD-backed pool |
| External SAN integration | NetApp ONTAP (Trident) | Block | ontap-san | iSCSI or FC; existing SAN investment |
| External NAS integration | NetApp ONTAP (Trident) | Filesystem | ontap-nas | NFS; good for legacy NFS mounts |
| Live migration support | Any RWX-capable backend | Block (RWX) | Ceph RBD (default RWX) | RWO blocks live migration |
| VM template / golden image | Ceph RBD | Block | Same as target | Use CSI clone (COW, near-instant) |
| Disaster recovery (sync) | ODF Metro DR | Block | DR-enabled StorageClass | RPO=0, requires stretched cluster |
| Disaster recovery (async) | ODF Regional DR | Block | DR-enabled StorageClass | RPO=minutes, multi-site |

Access mode cheat sheet:

| Access Mode | Meaning | Live Migration? | Use Case |
|---|---|---|---|
| RWO (ReadWriteOnce) | Single node read/write | No (pinned to one node) | Non-migratable VMs, temp disks |
| RWX (ReadWriteMany) | Multi-node read/write | Yes | Production VMs (default for Ceph RBD block); see example below |
| ROX (ReadOnlyMany) | Multi-node read-only | N/A | Shared config disks, ISOs |
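
For reference, a block-mode RWX PVC of the kind the matrix recommends for live-migratable VM disks can be created as below. This is a minimal sketch: the name vm-disk-1, namespace my-vms, and 100Gi size are placeholders; the StorageClass is the ODF default RBD class.

# Create a 100 GiB block-mode RWX PVC on the default Ceph RBD StorageClass
cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vm-disk-1
  namespace: my-vms
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Block
  resources:
    requests:
      storage: 100Gi
  storageClassName: ocs-storagecluster-ceph-rbd
EOF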

Essential Commands

| Task | VMware/vSAN | OVE (ODF/Ceph) | Azure Local (S2D) |
|---|---|---|---|
| List datastores/pools | govc datastore.info | oc get cephblockpool -n openshift-storage | Get-StoragePool -CimSession $c |
| List storage classes | N/A (SPBM policies) | oc get sc | Get-StoragePool |
| List PVCs | N/A | oc get pvc -A | Get-Volume -CimSession $c |
| List PVs | N/A | oc get pv | Get-VirtualDisk -CimSession $c |
| Check storage capacity | govc datastore.info ds1 | oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph df | Get-StoragePool \| select FriendlyName,Size,AllocatedSize |
| Check OSD/disk status | N/A | oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph osd tree | Get-PhysicalDisk -CimSession $c |
| Check cluster health | esxcli vsan health cluster list | oc exec ... -- ceph status | Get-HealthFault -CimSession $c |
| Create PVC (block) | N/A | oc apply -f pvc-block.yaml | N/A (Azure portal/PS) |
| Expand PVC | Extend VMDK | oc patch pvc my-pvc -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}' | Resize-VirtualDisk |
| Create snapshot | govc snapshot.create | oc apply -f volume-snapshot.yaml | Checkpoint-VM |
| Restore from snapshot | govc snapshot.revert | oc apply -f volume-snapshot-restore.yaml | Restore-VMSnapshot |
| Clone PVC | N/A | DataVolume with source.pvc (CSI clone) | Copy-VHD |
| Upload disk image | Content Library | virtctl image-upload dv my-dv --image-path=disk.qcow2 --size=50Gi | Azure portal |
| Check I/O stats | esxtop | oc exec ... -- ceph osd perf | Get-StorageQoSFlow |
| List slow OSD ops | N/A | oc exec ... -- ceph daemon osd.0 dump_blocked_ops | N/A |
| Repair/scrub | esxcli vsan trace | oc exec ... -- ceph pg deep-scrub <pg-id> | Repair-VirtualDisk |
| Check rebalance progress | N/A | oc exec ... -- ceph -w (watch) | Get-StorageJob -CimSession $c |
| Set pool replication | N/A | oc exec ... -- ceph osd pool set <pool> size 3 | Set-StorageTier |
| Ceph toolbox shell | N/A | oc rsh -n openshift-storage deploy/rook-ceph-tools | N/A |
| ODF dashboard | N/A | OpenShift Console > Storage > Data Foundation | N/A |
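
The snapshot and clone rows above reference manifests rather than one-liners. The following is a sketch of what those manifests might contain; PVC, namespace, and size values are placeholders, and the VolumeSnapshotClass shown is the usual ODF default (confirm with oc get volumesnapshotclass on your cluster):

# Snapshot an existing PVC (a volume-snapshot.yaml equivalent)
cat <<'EOF' | oc apply -f -
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: vm-disk-1-snap
  namespace: my-vms
spec:
  volumeSnapshotClassName: ocs-storagecluster-rbdplugin-snapclass
  source:
    persistentVolumeClaimName: vm-disk-1
EOF
# Clone a PVC with a CDI DataVolume (CSI clone, copy-on-write)
cat <<'EOF' | oc apply -f -
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: vm-disk-1-clone
  namespace: my-vms
spec:
  source:
    pvc:
      name: vm-disk-1
      namespace: my-vms
  storage:
    resources:
      requests:
        storage: 100Gi
    storageClassName: ocs-storagecluster-ceph-rbd
EOF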

Architecture at a Glance (OVE/ODF)

+===============================================================================+
| VM Guest OS: /dev/vda (virtio-blk) or /dev/sda (virtio-scsi)                  |
+===============================================================================+
| QEMU block layer (inside virt-launcher pod)                                   |
|   raw block device (block-mode PVC) or disk image file (filesystem-mode PVC) |
+===============================================================================+
| Kubernetes PVC (PersistentVolumeClaim)                                        |
|   StorageClass -> CSI driver -> Ceph RBD image                                |
+===============================================================================+
| CSI Layer: ceph-csi (rbd plugin)                                              |
|   Controller (Deployment): CreateVolume, Snapshot, Expand, Attach             |
|   Node (DaemonSet): Stage, Publish (map RBD to /dev on host, bind-mount)      |
+===============================================================================+
| Rook-Ceph Operator (manages all Ceph daemons as K8s workloads)                |
+===============================================================================+
| RADOS Cluster                                                                 |
|   MON (x3): Paxos quorum, cluster map, CRUSH map                             |
|   MGR (x2): Dashboard, Prometheus metrics, balancer                           |
|   OSD (1 per disk): BlueStore -> raw NVMe/SSD                                |
|     Pool -> PG -> OSD set (selected by CRUSH algorithm)                       |
|     Replication: primary writes -> replicate to secondary + tertiary OSD      |
+===============================================================================+
| Physical: NVMe/SSD drives, 25GbE NICs (public + cluster network)              |
+===============================================================================+
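
To see this layering on a live cluster, a PVC can be walked down to its RBD image and CRUSH placement. A rough sketch; my-pvc and my-namespace are placeholders, and the volumeAttributes keys shown are what ceph-csi sets today and may differ between versions:

# PVC -> backing PV name
oc get pvc my-pvc -n my-namespace -o jsonpath='{.spec.volumeName}{"\n"}'
# PV -> Ceph pool and RBD image name (ceph-csi volumeAttributes)
oc get pv <pv-name> -o jsonpath='{.spec.csi.volumeAttributes.pool}{" "}{.spec.csi.volumeAttributes.imageName}{"\n"}'
# RBD image details (size, features, parent if cloned)
oc exec -n openshift-storage deploy/rook-ceph-tools -- rbd info <pool>/<image>
# Which PG and OSD set an object maps to (CRUSH placement)
oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph osd map <pool> <object-name>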

Storage Troubleshooting Quick Checks

1. Ceph cluster not healthy (HEALTH_WARN or HEALTH_ERR)

# Check overall status
oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph status
# Check specific health warnings
oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph health detail
# Check OSD status (look for down/out)
oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph osd tree

2. PVC stuck in Pending state

# Check PVC events for provisioning errors
oc describe pvc my-pvc -n my-namespace
# Check CSI controller logs for CreateVolume failures
oc logs -n openshift-storage -l app=csi-rbdplugin-provisioner -c csi-rbdplugin --tail=50
# Check StorageClass exists and is default
oc get sc

3. VM disk I/O slow (high latency)

# Check OSD latency (commit_latency_ms > 10 = problem)
oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph osd perf
# Check for slow ops (blocked > 30s = degraded)
oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph daemon osd.0 dump_blocked_ops
# Check if recovery/backfill is consuming I/O
oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph -s | grep -E 'recovery|backfill'
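
If recovery or backfill is competing with client I/O, it can be throttled until latency recovers. A hedged example with illustrative values; clusters running the mClock scheduler (recent Ceph releases) manage recovery rates differently, so verify the effect before relying on it, and revert afterwards:

# Throttle recovery/backfill concurrency
oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph config set osd osd_max_backfills 1
oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph config set osd osd_recovery_max_active 1
# Confirm the change
oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph config get osd osd_max_backfills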

4. OSD crashed or not starting

# Check OSD pod status
oc get pods -n openshift-storage -l app=rook-ceph-osd --field-selector=status.phase!=Running
# Check OSD pod logs
oc logs -n openshift-storage rook-ceph-osd-<id>-<hash> --previous
# Check underlying disk health
oc debug node/<node-name> -- chroot /host smartctl -a /dev/nvme0n1
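
Ceph also keeps its own crash reports, which are often more informative than pod logs for a crashed OSD:

# List recorded daemon crashes and inspect one
oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph crash ls
oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph crash info <crash-id>
# Acknowledge crashes once investigated (clears the RECENT_CRASH health warning)
oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph crash archive-all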

5. Volume not attaching to VM (VM stuck in scheduling)

# Check VolumeAttachment objects
oc get volumeattachment | grep <pv-name>
# Check CSI node plugin logs on target node
oc get pods -n openshift-storage -l app=csi-rbdplugin --field-selector spec.nodeName=<node> -o name
oc logs -n openshift-storage <csi-rbdplugin-pod> -c csi-rbdplugin --tail=50
# Check if RBD image is locked by another node (stale mapping)
oc exec -n openshift-storage deploy/rook-ceph-tools -- rbd status <pool>/<image>
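
The <pool>/<image> values for that last command can be read from the PV itself, and watchers/locks can be listed before deciding a mapping is stale. A sketch assuming the ceph-csi volumeAttributes layout:

# Find pool and image name for the stuck PV
oc get pv <pv-name> -o jsonpath='{.spec.csi.volumeAttributes.pool}{" "}{.spec.csi.volumeAttributes.imageName}{"\n"}'
# Show current watchers (a watcher from a node no longer running the VM suggests a stale mapping)
oc exec -n openshift-storage deploy/rook-ceph-tools -- rbd status <pool>/<image>
# List any advisory locks held on the image
oc exec -n openshift-storage deploy/rook-ceph-tools -- rbd lock ls <pool>/<image>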