Storage -- Quick Reference
Key Numbers

| Parameter | VMware/vSAN | OVE (ODF/Ceph) | Azure Local (S2D) |
|---|---|---|---|
| Max nodes/cluster (storage) | 64 (vSAN) | No hard limit (tested 100+) | 16 nodes |
| Replication factor | RAID-1 mirroring (2-3 copies, FTT=1/2) or RAID-5/6 erasure coding | RF=2 or RF=3 (configurable per pool) | 2-way or 3-way mirror |
| Usable capacity (RF=3) | ~33% of raw | ~33% of raw | ~33% of raw |
| Usable capacity (RF=2) | ~50% of raw | ~50% of raw | ~50% of raw |
| Typical NVMe IOPS/OSD | N/A | 10,000-50,000 per OSD | N/A (pooled) |
| Latency target (NVMe, 4K rw) | <200 µs | <500 µs (Ceph RBD, tuned) | <200 µs (S2D all-NVMe) |
| Latency target (SSD, 4K rw) | <500 µs | <1 ms (Ceph RBD) | <500 µs |
| OSD memory default | N/A | 4 GiB per OSD (osd_memory_target) | N/A |
| MON count | N/A | 3 (small) or 5 (large) | N/A |
| PG count (per pool, default) | N/A | 128-256 (autoscaler adjusts) | N/A |
| Max virtual disk / PVC size | VMDK: 62 TB | No hard limit (tested multi-TB) | VHDX: 64 TB |
| Volume snapshot support | VMDK delta disks | CSI VolumeSnapshot (Ceph instant) | Hyper-V checkpoints |
| Online volume expansion | Yes (vSAN) | Yes (CSI ExpandVolume) | Yes (S2D + ReFS) |
| Thin provisioning | Yes | Yes (Ceph RBD default) | Yes (ReFS) |
| Encryption at rest | vSAN encryption | ODF cluster/PV encryption (LUKS) | BitLocker |
| Deduplication | vSAN 7+ | No native dedup in RBD | ReFS dedup + compression |
| Compression | vSAN compression | BlueStore compression (zstd/lz4) | ReFS compression |
| Scrubbing interval | N/A | Daily (light), weekly (deep) | Background (Storage Spaces) |

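Worked example for the capacity rows: 6 ODF nodes with 4 x 7.68 TB NVMe each is roughly 184 TB raw, so RF=3 leaves about 61 TB usable, before the free-space headroom Ceph wants (nearfull warns at 85%). The Ceph defaults quoted above can be confirmed from the toolbox; the pool name below is the ODF default and may differ in your deployment.

```bash
# Capacity arithmetic (RF=3): usable ~= raw / 3, e.g. 184 TB raw -> ~61 TB usable

# Confirm the replica count on the default ODF block pool (name may differ)
oc exec -n openshift-storage deploy/rook-ceph-tools -- \
  ceph osd pool get ocs-storagecluster-cephblockpool size

# Confirm the OSD memory target (default 4 GiB = 4294967296 bytes)
oc exec -n openshift-storage deploy/rook-ceph-tools -- \
  ceph config get osd osd_memory_target

# Confirm the PG autoscaler and current PG counts per pool
oc exec -n openshift-storage deploy/rook-ceph-tools -- \
  ceph osd pool autoscale-status
```
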
Decision Matrix

| Scenario | Recommended Backend | Volume Mode | StorageClass | Notes |
|---|---|---|---|---|
| VM boot disk (general) | Ceph RBD | Block | ocs-storagecluster-ceph-rbd | Block mode = less overhead, better perf |
| VM data disk (DB, high IOPS) | Ceph RBD (NVMe pool) | Block | ceph-rbd-nvme (custom) | Separate pool on NVMe-only OSDs (see the sketch after this table) |
| Shared filesystem (RWX) | CephFS | Filesystem | ocs-storagecluster-cephfs | For VMs needing a shared mount |
| Bulk/archive storage | CephFS or NFS | Filesystem | cephfs-bulk (custom) | Cheaper tier, HDD-backed pool |
| External SAN integration | NetApp ONTAP (Trident) | Block | ontap-san | iSCSI or FC; existing SAN investment |
| External NAS integration | NetApp ONTAP (Trident) | Filesystem | ontap-nas | NFS; good for legacy NFS mounts |
| Live migration support | Any RWX-capable backend | Block (RWX) | Ceph RBD (default RWX) | RWO blocks live migration |
| VM template / golden image | Ceph RBD | Block | Same as target | Use CSI clone (COW, near-instant) |
| Disaster recovery (sync) | ODF Metro DR | Block | DR-enabled StorageClass | RPO=0, requires stretched cluster |
| Disaster recovery (async) | ODF Regional DR | Block | DR-enabled StorageClass | RPO=minutes, multi-site |

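The `ceph-rbd-nvme (custom)` row assumes a dedicated pool on NVMe-classed OSDs. A minimal sketch follows, assuming the OSDs carry the `nvme` device class; the pool name, secret names, and parameters shown are common Rook/ODF defaults and should be verified against your cluster before use.

```bash
# Sketch only: dedicated NVMe CephBlockPool plus a matching StorageClass.
oc apply -f - <<'EOF'
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: ceph-rbd-nvme-pool
  namespace: openshift-storage
spec:
  deviceClass: nvme            # place data only on NVMe-classed OSDs
  replicated:
    size: 3                    # RF=3, ~33% usable
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd-nvme
provisioner: openshift-storage.rbd.csi.ceph.com
parameters:
  clusterID: openshift-storage
  pool: ceph-rbd-nvme-pool
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: openshift-storage
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: openshift-storage
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: openshift-storage
  csi.storage.k8s.io/fstype: ext4
allowVolumeExpansion: true
reclaimPolicy: Delete
EOF
```
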
Access mode cheat sheet (a block-mode RWX PVC sketch follows the table):

| Access Mode | Meaning | Live Migration? | Use Case |
|---|---|---|---|
| RWO (ReadWriteOnce) | Single-node read/write | No (pinned to one node) | Non-migratable VMs, temp disks |
| RWX (ReadWriteMany) | Multi-node read/write | Yes | Production VMs (default for Ceph RBD block) |
| ROX (ReadOnlyMany) | Multi-node read-only | N/A | Shared config disks, ISOs |

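Block-mode RWX is what live migration expects from Ceph RBD. A minimal PVC sketch; the name, namespace, size, and StorageClass are examples:

```bash
oc apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vm-rootdisk
  namespace: my-vms
spec:
  accessModes:
    - ReadWriteMany            # required for live migration
  volumeMode: Block            # raw device, no filesystem overhead
  resources:
    requests:
      storage: 50Gi
  storageClassName: ocs-storagecluster-ceph-rbd
EOF
```
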
Essential Commands

| Task | VMware/vSAN | OVE (ODF/Ceph) | Azure Local (S2D) |
|---|---|---|---|
| List datastores/pools | govc datastore.info | oc get cephblockpool -n openshift-storage | Get-StoragePool -CimSession $c |
| List storage classes | N/A (SPBM policies) | oc get sc | Get-StorageTier |
| List PVCs | N/A | oc get pvc -A | Get-Volume -CimSession $c |
| List PVs | N/A | oc get pv | Get-VirtualDisk -CimSession $c |
| Check storage capacity | govc datastore.info ds1 | oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph df | Get-StoragePool \| select FriendlyName,Size,AllocatedSize |
| Check OSD/disk status | N/A | oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph osd tree | Get-PhysicalDisk -CimSession $c |
| Check cluster health | esxcli vsan health cluster list | oc exec ... -- ceph status | Get-HealthFault -CimSession $c |
| Create PVC (block) | N/A | oc apply -f pvc-block.yaml | N/A (Azure portal/PS) |
| Expand PVC | Extend VMDK | oc patch pvc my-pvc -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}' | Resize-VirtualDisk |
| Create snapshot | govc snapshot.create | oc apply -f volume-snapshot.yaml (sketch after this table) | Checkpoint-VM |
| Restore from snapshot | govc snapshot.revert | oc apply -f volume-snapshot-restore.yaml | Restore-VMSnapshot |
| Clone PVC | N/A | DataVolume with source.pvc (CSI clone) | Convert-VHD (copy the VHDX) |
| Upload disk image | Content Library | virtctl image-upload dv my-dv --image-path=disk.qcow2 --size=50Gi | Azure portal |
| Check I/O stats | esxtop | oc exec ... -- ceph osd perf | Get-StorageQoSFlow |
| List slow OSD ops | N/A | oc exec ... -- ceph tell osd.0 dump_blocked_ops | N/A |
| Repair/scrub | esxcli vsan trace | oc exec ... -- ceph pg deep-scrub <pg-id> | Repair-VirtualDisk |
| Check rebalance progress | N/A | oc exec ... -- ceph -w (watch) | Get-StorageJob -CimSession $c |
| Set pool replication | N/A | oc exec ... -- ceph osd pool set <pool> size 3 | Set-StorageTier |
| Ceph toolbox shell | N/A | oc rsh -n openshift-storage deploy/rook-ceph-tools | N/A |
| ODF dashboard | N/A | OpenShift Console > Storage > Data Foundation | N/A |

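The `volume-snapshot.yaml` and restore manifests referenced in the table typically look like the sketch below. Names are examples, and the VolumeSnapshotClass shown is the usual ODF default; confirm yours with `oc get volumesnapshotclass`.

```bash
oc apply -f - <<'EOF'
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: vm-datadisk-snap
  namespace: my-vms
spec:
  volumeSnapshotClassName: ocs-storagecluster-rbdplugin-snapclass
  source:
    persistentVolumeClaimName: vm-datadisk
---
# Restore: a new PVC whose dataSource points at the snapshot
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vm-datadisk-restored
  namespace: my-vms
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Block
  resources:
    requests:
      storage: 200Gi           # must be >= the source PVC size
  storageClassName: ocs-storagecluster-ceph-rbd
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: vm-datadisk-snap
EOF
```
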
Architecture at a Glance (OVE/ODF)
```
+===============================================================================+
| VM Guest OS: /dev/vda (virtio-blk or virtio-scsi)                             |
+===============================================================================+
| QEMU block layer (inside virt-launcher pod)                                   |
|   raw block device (block-mode PVC) or qcow2 file (filesystem-mode PVC)       |
+===============================================================================+
| Kubernetes PVC (PersistentVolumeClaim)                                        |
|   StorageClass -> CSI driver -> Ceph RBD image                                |
+===============================================================================+
| CSI Layer: ceph-csi (rbd plugin)                                              |
|   Controller (Deployment): CreateVolume, Snapshot, Expand, Attach             |
|   Node (DaemonSet): Stage, Publish (map RBD to /dev on host, bind-mount)      |
+===============================================================================+
| Rook-Ceph Operator (manages all Ceph daemons as K8s workloads)                |
+===============================================================================+
| RADOS Cluster                                                                 |
|   MON (x3): Paxos quorum, cluster map, CRUSH map                              |
|   MGR (x2): Dashboard, Prometheus metrics, balancer                           |
|   OSD (1 per disk): BlueStore -> raw NVMe/SSD                                 |
|   Pool -> PG -> OSD set (selected by CRUSH algorithm)                         |
|   Replication: primary writes -> replicate to secondary + tertiary OSD        |
+===============================================================================+
| Physical: NVMe/SSD drives, 25GbE NICs (public + cluster network)              |
+===============================================================================+
```

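To walk those layers on a live cluster, a PVC can be traced from the Kubernetes object down to the RBD image and the OSD set CRUSH picked for it. PVC, pool, and object names below are examples; the `imageName` attribute is populated on the PV by ceph-csi.

```bash
# 1. PVC -> PV -> RBD image name
oc get pvc vm-rootdisk -n my-vms -o jsonpath='{.spec.volumeName}{"\n"}'
oc get pv <pv-name> -o jsonpath='{.spec.csi.volumeAttributes.imageName}{"\n"}'

# 2. RBD image -> size, features, and which node currently has it mapped (watchers)
oc exec -n openshift-storage deploy/rook-ceph-tools -- \
  rbd info ocs-storagecluster-cephblockpool/<image-name>
oc exec -n openshift-storage deploy/rook-ceph-tools -- \
  rbd status ocs-storagecluster-cephblockpool/<image-name>

# 3. RADOS object -> PG -> acting OSD set (CRUSH placement)
oc exec -n openshift-storage deploy/rook-ceph-tools -- \
  ceph osd map ocs-storagecluster-cephblockpool <object-name>
```
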
Storage Troubleshooting Quick Checks
1. Ceph cluster not healthy (HEALTH_WARN or HEALTH_ERR)

```bash
# Check overall status
oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph status

# Check specific health warnings
oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph health detail

# Check OSD status (look for down/out)
oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph osd tree
```

2. PVC stuck in Pending state

```bash
# Check PVC events for provisioning errors
oc describe pvc my-pvc -n my-namespace

# Check CSI controller logs for CreateVolume failures
oc logs -n openshift-storage -l app=csi-rbdplugin-provisioner -c csi-rbdplugin --tail=50

# Check the StorageClass exists and is the default
oc get sc
```

3. VM disk I/O slow (high latency)

```bash
# Check OSD latency (commit_latency_ms > 10 = problem)
oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph osd perf

# Check for slow ops (blocked > 30s = degraded)
oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph tell osd.0 dump_blocked_ops

# Check if recovery/backfill is consuming I/O
oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph -s | grep -E 'recovery|backfill'
```

4. OSD crashed or not starting

```bash
# Check OSD pod status
oc get pods -n openshift-storage -l app=rook-ceph-osd --field-selector=status.phase!=Running

# Check OSD pod logs
oc logs -n openshift-storage rook-ceph-osd-<id>-<hash> --previous

# Check underlying disk health
oc debug node/<node-name> -- chroot /host smartctl -a /dev/nvme0n1
```

5. Volume not attaching to VM (VM stuck in scheduling)

```bash
# Check VolumeAttachment objects
oc get volumeattachment | grep <pv-name>

# Find the CSI node plugin pod on the target node, then check its logs
oc get pods -n openshift-storage -l app=csi-rbdplugin --field-selector spec.nodeName=<node> -o name
oc logs -n openshift-storage <csi-rbdplugin-pod> -c csi-rbdplugin --tail=50

# Check if the RBD image is locked by another node (stale mapping)
oc exec -n openshift-storage deploy/rook-ceph-tools -- rbd status <pool>/<image>
```
