Storage Study Topics
1. Foundational Concepts
- Block vs File vs Object Storage: The three fundamental storage paradigms — Block provides raw disk volumes, File provides shared filesystems (NFS/SMB), Object provides HTTP-accessible buckets with metadata.
- LVM (Logical Volume Management): The Linux abstraction layer that sits between physical disks and filesystems, allowing volumes to be resized, snapshotted, and striped without touching hardware.
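- RAID Levels: The schemes (RAID 0/1/5/6/10) that stripe, mirror, or add parity across multiple disks to trade off performance, capacity, and redundancy. RAID 0 provides no redundancy at all, and many distributed systems use replication or erasure coding instead, but the same concepts carry over.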
- Thin Provisioning: Allocating storage space on paper but only consuming physical disk as data is actually written — allows overcommitment of capacity.
- Storage Tiering (Hot / Warm / Cold): Automatically moving data between fast (SSD/NVMe) and slow (HDD/archive) storage based on access frequency to optimize cost and performance.
- IOPS / Throughput / Latency: The three performance dimensions of storage — operations per second, data transfer rate, and response time — and how to benchmark them.
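The three performance dimensions above are linked by simple arithmetic, which helps when sanity-checking benchmark results. The sketch below assumes a fixed block size and a steady queue depth; the function names are illustrative and not part of any benchmarking tool.

```python
def throughput_mib_s(iops: float, block_size_kib: float) -> float:
    """Throughput (MiB/s) implied by an IOPS figure at a fixed block size."""
    return iops * block_size_kib / 1024.0


def iops_from_latency(queue_depth: int, latency_ms: float) -> float:
    """Little's law: sustained IOPS = outstanding I/Os / average latency."""
    return queue_depth * 1000.0 / latency_ms


if __name__ == "__main__":
    # 10,000 IOPS at 4 KiB blocks is only about 39 MiB/s of throughput.
    print(throughput_mib_s(10_000, 4))   # 39.0625
    # 32 outstanding I/Os at 0.5 ms average latency sustain 64,000 IOPS.
    print(iops_from_latency(32, 0.5))    # 64000.0
```

A small-block workload can therefore saturate IOPS long before throughput, which is why benchmarks should report block size and queue depth alongside the headline numbers.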
2. Current State / VMware Baseline
- vSAN (VMware Virtual SAN): VMware's HCI storage layer that pools local disks across ESXi hosts into a shared datastore — the storage baseline being migrated away from.
3. Storage Protocols
- iSCSI (Internet Small Computer Systems Interface): A protocol that carries SCSI block commands over TCP/IP, so servers can reach block storage across a standard Ethernet network.
- NVMe-oF (Non-Volatile Memory Express over Fabrics): The standard for extending the NVMe command set across a network "fabric" (Fibre Channel, RDMA, or TCP), so remote flash storage can be reached with low latency.
- MPIO (Multipath Input/Output): The host-side logic that presents multiple physical paths (adapters, cables, switches) to the same storage volume as one logical disk, providing failover and load balancing.
- Fibre Channel: The dedicated, purpose-built high-speed network used exclusively for storage traffic in enterprise data centers — separate from Ethernet.
- NFSv3: The classic network filesystem protocol: stateless, simple, and widely supported, but file locking depends on a separate sideband protocol (NLM) and security typically rests on client-asserted user IDs (AUTH_SYS).
- NFSv4: The modern evolution of NFS — stateful, with built-in security (Kerberos), ACLs, and a single port for firewall simplicity.
- SMB / CIFS: The file sharing protocol native to Windows environments (CIFS is the legacy name for the original SMB1 dialect). Typically needed when Windows VM workloads access shared file storage.
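Multipathing is easier to reason about with a toy model. The sketch below is not how any real MPIO driver is implemented; it only illustrates the idea of round-robin path selection with failover, and the `Path` and `MultipathDevice` names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Path:
    name: str
    healthy: bool = True


@dataclass
class MultipathDevice:
    """One logical disk reachable over several physical paths."""
    paths: List[Path]
    _next: int = 0  # index of the path to try first on the next I/O

    def select_path(self) -> Path:
        """Round-robin over healthy paths; skip failed ones."""
        n = len(self.paths)
        for offset in range(n):
            candidate = self.paths[(self._next + offset) % n]
            if candidate.healthy:
                self._next = (self._next + offset + 1) % n
                return candidate
        raise RuntimeError("all paths to the device have failed")


if __name__ == "__main__":
    dev = MultipathDevice([Path("hba0-portA"), Path("hba1-portB")])
    print([dev.select_path().name for _ in range(3)])  # alternates between paths
    dev.paths[0].healthy = False                       # simulate a cable pull
    print([dev.select_path().name for _ in range(2)])  # only the surviving path
```

The host keeps seeing a single disk throughout; only the path used for each I/O changes.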
4. Storage Architectures
- SAN (Storage Area Network): A dedicated high-speed network connecting servers to centralized block storage arrays — the traditional enterprise approach.
- NAS (Network Attached Storage): A file-level storage appliance that serves files over the regular data network via NFS or SMB.
- HCI / Software-Defined Storage (SDS): The paradigm of replacing dedicated storage hardware with software running on standard servers, pooling their local disks into a distributed storage system.
5. Software-Defined Storage Platforms
- Ceph / Rook-Ceph: The dominant open-source distributed storage system providing block, file, and object storage in a single platform. Rook is the Kubernetes operator that automates Ceph lifecycle management.
- OpenShift Data Foundation (ODF): Red Hat's productized and supported version of Ceph for OpenShift — the expected storage backend for OVE.
- Storage Spaces Direct (S2D): Microsoft's HCI storage engine that pools local drives across Azure Local / Windows Server nodes into a resilient, software-defined storage pool.
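Distributed platforms like these protect data by keeping multiple replicas or by erasure coding, and the choice drives how much raw capacity is actually usable. The sketch below is a simplified estimate: it ignores metadata overhead, rebuild reserve, and near-full thresholds, and the function names are illustrative.

```python
def usable_replicated(raw_tb: float, replicas: int) -> float:
    """Usable capacity when every object is stored `replicas` times."""
    return raw_tb / replicas


def usable_erasure_coded(raw_tb: float, k: int, m: int) -> float:
    """Usable capacity with k data chunks and m coding chunks per stripe."""
    return raw_tb * k / (k + m)


if __name__ == "__main__":
    raw = 120.0  # TB of raw disk across the cluster (example figure)
    print(usable_replicated(raw, 3))        # 40.0 -> three-way replication
    print(usable_erasure_coded(raw, 4, 2))  # 80.0 -> 4+2 erasure coding
```

Both layouts in the example tolerate two simultaneous failures, but erasure coding doubles the usable capacity at the cost of more CPU work and slower rebuilds.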
6. Kubernetes Storage Model
- CSI (Container Storage Interface): The standard plugin API that allows Kubernetes to consume storage from any backend that provides a CSI driver. It is the storage equivalent of CNI for networking.
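- Persistent Volumes (PV) / Persistent Volume Claims (PVC): The Kubernetes abstraction where a PV is a piece of storage in the cluster, provisioned statically by an admin or dynamically through a StorageClass, and a PVC is a workload's request for storage that binds to a matching PV.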
- StorageClasses: Kubernetes objects that define different "tiers" of storage (fast, replicated, cheap) and enable dynamic provisioning — the user picks a class, the system creates the volume.
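The relationship between a StorageClass and a PVC can be sketched by building the two manifests as Python dictionaries and printing them as JSON, which `kubectl apply -f -` accepts. The provisioner name `csi.example.com`, the class name, and the claim name are placeholders (assumptions for illustration); a real cluster would use the provisioner string of its installed CSI driver.

```python
import json


def build_storage_class(name: str, provisioner: str) -> dict:
    """A StorageClass that provisions volumes on demand via a CSI driver."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,  # placeholder; depends on the CSI driver
        "reclaimPolicy": "Delete",
        "volumeBindingMode": "WaitForFirstConsumer",
        "allowVolumeExpansion": True,
    }


def build_pvc(name: str, storage_class: str, size: str) -> dict:
    """A claim that asks the named class for a volume of the given size."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    sc = build_storage_class("fast-replicated", "csi.example.com")
    pvc = build_pvc("app-data", sc["metadata"]["name"], "20Gi")
    manifest = {"apiVersion": "v1", "kind": "List", "items": [sc, pvc]}
    print(json.dumps(manifest, indent=2))
```

The workload only names the class and the size; the PV itself is created by the driver behind the class when the claim is bound.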
7. Data Protection & Operations
- Snapshots & Clones (Storage-Level): Point-in-time copies of volumes — snapshots are read-only references, clones are writable copies. Used for backup, testing, and rapid provisioning.
- Storage Replication / DR: Synchronous or asynchronous copying of data between sites for disaster recovery, designed around two targets: RPO (how much recent data you can afford to lose) and RTO (how long recovery may take).
- Encryption at Rest: Encrypting data on the physical disks so that stolen or decommissioned hardware cannot be read. Commonly a regulatory or compliance requirement for financial institutions.
- Backup Integration (Veeam, Kasten): How the platform integrates with enterprise backup solutions — agent-based vs agentless, VM-level vs volume-level, application-consistent snapshots.
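The RPO and RTO targets mentioned under replication can be checked with back-of-the-envelope arithmetic. The sketch below assumes periodic asynchronous replication, where the worst case is a failure just before a cycle finishes transferring; the function names and the example figures are illustrative only.

```python
def worst_case_rpo_minutes(interval_min: float, transfer_min: float,
                           synchronous: bool = False) -> float:
    """Worst-case data loss window for a replication schedule.

    Synchronous replication acknowledges writes only after both sites
    have them, so the loss window is treated as zero. For periodic
    asynchronous replication, data written right after a cycle starts
    is unprotected until the next cycle has finished transferring.
    """
    if synchronous:
        return 0.0
    return interval_min + transfer_min


def meets_objectives(rpo_target: float, rto_target: float,
                     achieved_rpo: float, estimated_rto: float) -> bool:
    """True when both the data-loss and the recovery-time targets hold."""
    return achieved_rpo <= rpo_target and estimated_rto <= rto_target


if __name__ == "__main__":
    rpo = worst_case_rpo_minutes(interval_min=15, transfer_min=5)
    print(rpo)  # 20
    print(meets_objectives(rpo_target=30, rto_target=60,
                           achieved_rpo=rpo, estimated_rto=45))  # True
```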
8. Advanced Topics
- Object Storage (S3-compatible): HTTP-based storage accessed via APIs rather than filesystem mounts — used for backups, logs, artifacts, and unstructured data at scale.
- Data Locality: The strategy of keeping compute and storage on the same physical node to minimize network hops — critical for HCI performance.