Migration Tooling & Formats
Why This Matters
Migrating 5,000+ VMs is the single largest operational risk in this project. It is not a weekend activity. It is a sustained engineering operation spanning months, requiring deep knowledge of disk image formats, conversion toolchains, driver injection, network remapping, and application-level validation. A single misconfigured virtio driver can render a Windows Server VM unbootable. A misunderstood VMDK descriptor file can silently corrupt a database disk. A migration tool that cannot leverage Changed Block Tracking (CBT) will quadruple the cutover window for every VM.
The previous chapters covered the target platforms (KVM/KubeVirt for OVE, Hyper-V for Azure Local) and the source platform (VMware vSphere/ESXi). This chapter covers the bridge between them: the formats, tools, and operational processes that move a running VM from the old world to the new world without losing data, breaking applications, or exhausting the team.
Three distinct migration paths exist for this evaluation:
- VMware to OVE: Uses virt-v2v and/or the Migration Toolkit for Virtualization (MTV). Converts VMDK to QCOW2 or raw, injects virtio drivers, imports into KubeVirt as VirtualMachine CRDs with DataVolumes.
- VMware to Azure Local: Uses Azure Migrate. Converts VMDK to VHD/VHDX, adjusts boot configuration, imports into Hyper-V.
- VMware to Swisscom ESC: VMware-to-VMware migration (vMotion, HCX, or OVA export/import). No hypervisor conversion required because ESC currently runs on VMware vSphere. This is the simplest path technically but does not solve the strategic goal of leaving VMware.
Each path has different risk profiles, throughput characteristics, and failure modes. This chapter covers all three in the depth required to plan a production migration.
Concepts
1. OVA / OVF / VMDK / QCOW2
OVF -- Open Virtualization Format
OVF is a DMTF standard (DSP0243) that describes the metadata of a virtual machine in an XML document. It is not a disk image -- it is a manifest that describes the virtual hardware configuration, references disk files, defines network connections, and carries product metadata. Think of it as the blueprint that tells the target hypervisor how to reconstruct the VM.
An OVF file contains these key sections:
<?xml version="1.0" encoding="UTF-8"?>
<Envelope xmlns="http://schemas.dmtf.org/ovf/envelope/1"
xmlns:ovf="http://schemas.dmtf.org/ovf/envelope/1"
xmlns:rasd="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/..."
xmlns:vmw="http://www.vmware.com/schema/ovf">
<!-- References: pointers to the disk files included in this package -->
<References>
<File ovf:href="webserver-disk1.vmdk" ovf:id="file1" ovf:size="8589934592"/>
<File ovf:href="webserver-disk2.vmdk" ovf:id="file2" ovf:size="53687091200"/>
</References>
<!-- DiskSection: logical disk descriptions (capacity, format, parent) -->
<DiskSection>
<Disk ovf:capacity="50" ovf:capacityAllocationUnits="byte * 2^30"
ovf:diskId="vmdisk1" ovf:fileRef="file1"
ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html"/>
<Disk ovf:capacity="200" ovf:capacityAllocationUnits="byte * 2^30"
ovf:diskId="vmdisk2" ovf:fileRef="file2"
ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html"/>
</DiskSection>
<!-- NetworkSection: logical network names -->
<NetworkSection>
<Network ovf:name="VM Network">
<Description>The production network</Description>
</Network>
</NetworkSection>
<!-- VirtualSystem: the VM definition -->
<VirtualSystem ovf:id="webserver-01">
<ProductSection>
<Product>Internal Web Server</Product>
<Vendor>Infrastructure Team</Vendor>
<Version>2.1</Version>
</ProductSection>
<OperatingSystemSection ovf:id="101">
<Description>Red Hat Enterprise Linux 9 (64-bit)</Description>
</OperatingSystemSection>
<VirtualHardwareSection>
<!-- CPU: 4 vCPUs -->
<Item>
<rasd:ElementName>4 virtual CPUs</rasd:ElementName>
<rasd:ResourceType>3</rasd:ResourceType>
<rasd:VirtualQuantity>4</rasd:VirtualQuantity>
</Item>
<!-- Memory: 16 GB -->
<Item>
<rasd:ElementName>16384 MB of memory</rasd:ElementName>
<rasd:ResourceType>4</rasd:ResourceType>
<rasd:VirtualQuantity>16384</rasd:VirtualQuantity>
</Item>
<!-- Disk Controller: SCSI -->
<Item>
<rasd:ElementName>SCSI Controller</rasd:ElementName>
<rasd:ResourceSubType>lsilogic</rasd:ResourceSubType>
<rasd:ResourceType>6</rasd:ResourceType>
</Item>
<!-- Network Adapter: VMXNET3 -->
<Item>
<rasd:ElementName>Network adapter 1</rasd:ElementName>
<rasd:ResourceSubType>VmxNet3</rasd:ResourceSubType>
<rasd:ResourceType>10</rasd:ResourceType>
<rasd:Connection>VM Network</rasd:Connection>
</Item>
</VirtualHardwareSection>
</VirtualSystem>
</Envelope>
Why OVF matters for migration: OVF carries the metadata needed to recreate a VM on any platform that understands the format. However, VMware extends OVF with proprietary vmw: namespace elements (ExtraConfig keys, boot options, vApp properties) that other hypervisors ignore. During migration, these VMware-specific elements are typically discarded, and their equivalent settings must be reconfigured on the target platform manually or through the migration tool.
OVA -- Open Virtual Appliance
An OVA file is simply an OVF file plus all referenced disk files (VMDKs), packaged together as a single TAR archive. The TAR is not compressed -- the individual VMDKs inside may be compressed (streamOptimized format), but the TAR envelope is a plain concatenation.
OVA File Structure (TAR Archive)
================================================================
+--------------------------------------------------------------+
| webserver-01.ova (TAR file) |
| |
| +--------------------------------------------------------+ |
| | webserver-01.ovf (XML manifest) | |
| | - Virtual hardware definition (CPU, RAM, NICs) | |
| | - Disk references (file1, file2) | |
| | - Network mappings | |
| | - Product metadata | |
| +--------------------------------------------------------+ |
| |
| +--------------------------------------------------------+ |
| | webserver-01.mf (SHA256 manifest) | |
| | - SHA256(webserver-01.ovf) = a3f8c1... | |
| | - SHA256(webserver-disk1.vmdk) = 7b2e4d... | |
| | - SHA256(webserver-disk2.vmdk) = e9f1a2... | |
| +--------------------------------------------------------+ |
| |
| +--------------------------------------------------------+ |
| | webserver-01.cert (optional code signing) | |
| +--------------------------------------------------------+ |
| |
| +--------------------------------------------------------+ |
| | webserver-disk1.vmdk (boot disk, 50 GB) | |
| | - streamOptimized format for portability | |
| | - Sparse: only allocated blocks are stored | |
| +--------------------------------------------------------+ |
| |
| +--------------------------------------------------------+ |
| | webserver-disk2.vmdk (data disk, 200 GB) | |
| | - streamOptimized format | |
| +--------------------------------------------------------+ |
+--------------------------------------------------------------+
Key: OVF must be the FIRST file in the TAR archive.
The .mf file contains checksums for integrity verification.
The .cert file is an optional X.509 signature.
Practical note: When exporting VMs from vSphere for migration, OVA is convenient for single-VM transfers but inefficient at scale. Each OVA must be fully written to disk before it can be imported. For 5,000+ VM migrations, direct disk-level access via the vSphere API (VDDK/nbdkit) is far more efficient than OVA export/import.
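For smaller batches where OVA export is still used, the layout and checksums described above can be validated before import. The sketch below uses only Python's standard library (tarfile, hashlib) and assumes manifest lines in the `SHA256(name)= digest` form shown in the diagram; the function name verify_ova is ours, not part of any tool:

```python
import hashlib
import tarfile

def verify_ova(path: str) -> list:
    """Check OVA layout and .mf checksums; return a list of problems found."""
    problems = []
    with tarfile.open(path, "r") as tar:
        members = tar.getmembers()
        # Per DSP0243, the .ovf descriptor must be the first member of the TAR.
        if not members or not members[0].name.endswith(".ovf"):
            problems.append("descriptor (.ovf) is not the first file in the archive")
        # Parse the manifest: lines like "SHA256(file)= hexdigest"
        expected = {}
        for m in members:
            if m.name.endswith(".mf"):
                for line in tar.extractfile(m).read().decode().splitlines():
                    algo_and_name, _, digest = line.partition("=")
                    name = algo_and_name[algo_and_name.index("(") + 1:algo_and_name.index(")")]
                    expected[name] = digest.strip()
        # Recompute each referenced file's digest and compare.
        for name, want in expected.items():
            h = hashlib.sha256()
            f = tar.extractfile(name)
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
            if h.hexdigest() != want:
                problems.append(f"checksum mismatch for {name}")
    return problems
```

Running this before import catches truncated transfers and tampered disks early, instead of failing hours into a deploy.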
VMDK -- Virtual Machine Disk
VMDK is VMware's virtual disk format. It is more complex than most people realize. A VMDK is not a single monolithic file -- it consists of a descriptor file (a small text file with metadata) and one or more extent files (the actual data blocks).
VMDK Descriptor File:
VMDK Descriptor File (webserver-disk1.vmdk)
================================================================
# Disk DescriptorFile
version=1
CID=fffffffe
parentCID=ffffffff
createType="vmfs"
# Extent description
# access size-in-sectors type filename
RW 104857600 VMFS "webserver-disk1-flat.vmdk"
# The Disk Data Base (DDB) -- virtual geometry
ddb.virtualHWVersion = "19"
ddb.geometry.cylinders = "6527"
ddb.geometry.heads = "255"
ddb.geometry.sectors = "63"
ddb.adapterType = "lsilogic"
ddb.thinProvisioned = "1"
ddb.uuid = "60 00 C2 93 e4 d1 a7 bc-4d 8a 2c 73 45 9e 12 f7"
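The descriptor's structure (comments, key=value settings, extent lines) is simple enough to parse when auditing a fleet before migration. A minimal sketch, with a function name of our choosing; it handles the access keywords (RW, RDONLY, NOACCESS) defined in the VMDK format:

```python
def parse_vmdk_descriptor(text: str) -> dict:
    """Split a VMDK descriptor into key=value settings and extent lines."""
    info = {"settings": {}, "extents": []}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and blank lines
        if line.split()[0] in ("RW", "RDONLY", "NOACCESS"):
            # Extent line: access, size-in-sectors, type, quoted filename
            access, sectors, ext_type, filename = line.split(None, 3)
            info["extents"].append({
                "access": access,
                "sectors": int(sectors),
                "type": ext_type,
                "file": filename.strip('"'),
            })
        elif "=" in line:
            key, _, value = line.partition("=")
            info["settings"][key.strip()] = value.strip().strip('"')
    return info
```

From the parsed result, createType and ddb.thinProvisioned tell you which conversion path applies, and the extent sector counts let you size the target storage (sectors x 512 bytes).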
VMDK format variants and their migration implications:
| VMDK Type | createType | Description | Migration Implication |
|---|---|---|---|
| Thin | vmfs (thin-provisioned) | Allocates space on demand. The extent file grows as data is written. | Most common in production. Sparse on export. qemu-img can convert efficiently. |
| Thick Eager Zero | vmfs (eagerzeroedthick) | All space pre-allocated and zeroed at creation. | Full-size extent file. Conversion reads the entire file even if it is mostly zeros. Use qemu-img convert -O qcow2 to reclaim zero space. |
| Thick Lazy Zero | vmfs (zeroedthick) | All space pre-allocated but not zeroed until first write. | Similar to eager zero for migration -- the extent file is full size on disk. |
| Sparse | monolithicSparse | Single-file VMDK with embedded descriptor and sparse data. | Common for OVA exports. A single file simplifies handling. |
| Split Sparse | twoGbMaxExtentSparse | Split into 2 GB chunks for filesystems with file-size limits (e.g., FAT32). | Rare in enterprise. Must be reassembled before conversion. |
| Stream Optimized | streamOptimized | Compressed, read-only format designed for OVA distribution. | Not bootable directly. Must be converted to a flat or sparse format before use. |
| seSparse | seSparse | Space-efficient sparse. VMware's modern snapshot format. | Used for VM snapshots on VMFS 6+. Conversion must handle the grain directory structure. |
| vmfsSparse | vmfsSparse | Legacy sparse format for snapshots. | Snapshot delta disks. Must be consolidated before migration. |
VMDK Internal Structure (Sparse VMDK)
================================================================
+--------------------------------------------------------------+
| VMDK File |
| |
| +-- Header (512 bytes) --------------------------------+ |
| | Magic: KDMV (0x564d444b) | |
| | Version: 1 | |
| | Flags: 3 (valid new line detection + redundant GT) | |
| | Capacity: 104857600 sectors (50 GB) | |
| | Grain Size: 128 sectors (64 KB) | |
| | Descriptor Offset: 1 (sector) | |
| | Descriptor Size: 20 (sectors) | |
| | Num GTE per GT: 512 | |
| | GD Offset: sector of grain directory | |
| | Overhead: sectors before grain data starts | |
| +------------------------------------------------------+ |
| |
| +-- Embedded Descriptor ------+ |
| | (same as text descriptor | |
| | shown above) | |
| +-----------------------------+ |
| |
| +-- Grain Directory (GD) -----+ |
| | GD[0] -> GT sector 100 | Points to grain tables |
| | GD[1] -> GT sector 200 | |
| | GD[2] -> 0 (not allocated) | Zero = no data in range |
| | GD[3] -> GT sector 300 | |
| | ... | |
| +-----------------------------+ |
| |
| +-- Grain Tables (GT) --------+ |
| | GT[0]: | |
| | GTE[0] -> grain at 1000 | Points to 64KB data grains |
| | GTE[1] -> 0 (sparse) | Zero = unallocated (thin) |
| | GTE[2] -> grain at 1128 | |
| | ... | |
| | GT[1]: | |
| | GTE[0] -> grain at 2000 | |
| | ... | |
| +-----------------------------+ |
| |
| +-- Data Grains ----+----+----+----+ |
| | Grain 0 (64 KB) | G1 | G2 | G3 | ... |
| | Actual VM data | | | | |
| +-------------------+----+----+----+ |
+--------------------------------------------------------------+
Two-level lookup: GD -> GT -> Grain (data)
Unallocated regions (sparse) are represented by zero entries.
Snapshot chains: When a VMware snapshot is taken, the original VMDK becomes read-only, and a new delta VMDK (redo log) is created in vmfsSparse or seSparse format. Each subsequent snapshot adds another delta file. Reads traverse the chain from newest to oldest delta until an allocated grain is found. Before migration, consolidate all snapshot chains into a single flat VMDK: migrating a VM with active snapshots is unsupported by the conversion tools covered here and will typically fail or produce a corrupted disk.
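A pre-flight check for snapshot chains can be built on two real descriptor fields: a base disk carries parentCID=ffffffff, while a delta (redo-log) disk has a real parentCID and usually a parentFileNameHint. The helper name below is ours:

```python
def in_snapshot_chain(descriptor_text: str) -> bool:
    """Return True if this VMDK descriptor belongs to a snapshot delta disk."""
    fields = {}
    for line in descriptor_text.splitlines():
        if "=" in line and not line.lstrip().startswith("#"):
            k, _, v = line.partition("=")
            fields[k.strip()] = v.strip().strip('"')
    # ffffffff means "no parent"; anything else points at a parent disk.
    parent = fields.get("parentCID", "ffffffff").lower()
    return parent != "ffffffff" or "parentFileNameHint" in fields
```

In practice you would run this across an inventory export and consolidate every flagged VM in vSphere before scheduling its migration wave.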
QCOW2 -- QEMU Copy-On-Write Version 2
QCOW2 is QEMU's native disk image format and the format most commonly used for VM disks on OVE. It is far more sophisticated than a simple flat file. QCOW2 supports thin provisioning, snapshots, backing files (for template-based VMs), compression, encryption, and preallocation modes -- all managed through an internal two-level table structure.
QCOW2 Internal Structure
================================================================
+--------------------------------------------------------------+
| QCOW2 File |
| |
| +-- Header (variable, min 104 bytes) --+ |
| | Magic: QFI\xfb (0x514649fb) | |
| | Version: 3 | |
| | Backing file offset: 0 (or pointer) | |
| | Backing file size: 0 (or length) | |
| | Cluster bits: 16 (64 KB clusters) | |
| | Size: 53687091200 (50 GB virtual) | |
| | Crypt method: 0 (none) | |
| | L1 size: 100 entries | |
| | L1 table offset: 0x30000 | |
| | Refcount table offset: 0x10000 | |
| | Refcount table clusters: 1 | |
| | Nb snapshots: 0 | |
| | Header extensions: | |
| | - Feature name table | |
| | - Bitmap directory (dirty tracking) | |
| +---------------------------------------+ |
| |
| +-- L1 Table (top-level index) --------+ |
| | L1[0] -> L2 table at offset 0x40000 | |
| | L1[1] -> L2 table at offset 0x50000 | |
| | L1[2] -> 0 (unallocated range) | |
| | L1[3] -> L2 table at offset 0x60000 | |
| | ... | |
| | Each L1 entry covers: | |
| | L2_entries * cluster_size | |
| | = 8192 * 64KB = 512 MB per L1 | |
| +--------------------------------------+ |
| |
| +-- L2 Tables (second-level index) ----+ |
| | L2[0]: | |
| | Entry[0] -> cluster at 0x100000 | |
| | Entry[1] -> 0 (unallocated) | |
| | Entry[2] -> cluster at 0x110000 | |
| | Entry[3] -> compressed cluster* | |
| | ... | |
| | * Bit 62 marks a compressed cluster | |
| | (offset and size in the low bits) | |
| +---------------------------------------+ |
| |
| +-- Refcount Table + Refcount Blocks --+ |
| | Tracks how many references point | |
| | to each cluster (for snapshot COW): | |
| | | |
| | Refcount = 1: normal allocation | |
| | Refcount = 2+: shared by snapshots | |
| | Refcount = 0: free cluster | |
| | | |
| | When a snapshot-shared cluster is | |
| | written, QEMU copies it to a new | |
| | cluster (COW) and updates the L2 | |
| | entry. The old cluster's refcount | |
| | decrements. | |
| +--------------------------------------+ |
| |
| +-- Data Clusters ---+-------+-------+ |
| | Cluster 0 (64 KB) | C1 | C2 | ... |
| | Actual VM data | | | |
| +--------------------+-------+-------+ |
+--------------------------------------------------------------+
Address resolution:
Guest offset -> L1 index -> L2 table -> cluster offset
L1 index = guest_offset / (L2_entries * cluster_size)
L2 index = (guest_offset / cluster_size) % L2_entries
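The address-resolution formulas above are plain integer arithmetic and can be checked directly. A small sketch (the function name is ours) for the default 64 KiB clusters, where each 8-byte L2 entry means 8192 entries per L2 table and 512 MiB of guest address space per L1 entry:

```python
def qcow2_indices(guest_offset: int, cluster_bits: int = 16):
    """Map a guest byte offset to (L1 index, L2 index, offset within cluster)."""
    cluster_size = 1 << cluster_bits          # 64 KiB when cluster_bits=16
    l2_entries = cluster_size // 8            # 8-byte entries -> 8192 per table
    in_cluster = guest_offset % cluster_size
    cluster_index = guest_offset // cluster_size
    l2_index = cluster_index % l2_entries
    l1_index = cluster_index // l2_entries    # each L1 entry spans 512 MiB here
    return l1_index, l2_index, in_cluster
```

For the 50 GB disk in the header example, the highest L1 index is 99 -- which is why the L1 table needs 100 entries.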
QCOW2 backing files (template chains):
QCOW2 supports a backing_file pointer in the header. When a cluster is read but not present in the current file (L2 entry is zero), QEMU reads it from the backing file instead. This enables instant VM provisioning from templates: the template is a read-only base image, and each VM gets a thin overlay QCOW2 that only stores the differences.
QCOW2 Backing File Chain (Template Pattern)
================================================================
+----------------------------+
| golden-image.qcow2 | <-- Base image (read-only)
| (RHEL 9 template, 3 GB) | Contains full OS install
+----------------------------+
^ ^
| |
+------+-----+ +--+------------+
| vm-01.qcow2| | vm-02.qcow2 | <-- Overlay images (read-write)
| backing: | | backing: | Only store changed clusters
| golden- | | golden- | Initial size: ~200 KB
| image.qcow2| | image.qcow2 |
| (50 MB of | | (120 MB of |
| changes) | | changes) |
+-------------+ +--------------+
Read path for vm-01:
1. Guest reads sector X
2. QEMU checks L1/L2 in vm-01.qcow2
3. If cluster allocated -> return data from vm-01.qcow2
4. If cluster NOT allocated -> read from golden-image.qcow2
5. If also not in backing file -> return zeros
Write path for vm-01:
1. Guest writes sector X
2. QEMU allocates new cluster in vm-01.qcow2 (COW)
3. Writes data to vm-01.qcow2
4. Backing file is never modified
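The read and write paths above reduce to a small copy-on-write rule. This toy model (class name ours; clusters modeled as a dict, not real QCOW2 tables) captures the semantics:

```python
class Qcow2Overlay:
    """Toy model of a QCOW2 overlay with a read-only backing file."""

    def __init__(self, backing=None):
        self.clusters = {}            # cluster index -> data (the overlay)
        self.backing = backing or {}  # cluster index -> data (read-only base)

    def read(self, idx: int) -> bytes:
        if idx in self.clusters:      # allocated in the overlay
            return self.clusters[idx]
        if idx in self.backing:       # fall through to the backing file
            return self.backing[idx]
        return b"\x00" * 4            # unallocated everywhere -> zeros

    def write(self, idx: int, data: bytes) -> None:
        self.clusters[idx] = data     # COW: the backing file is never modified
```

This is why one golden image can back hundreds of VMs safely: writes only ever land in each VM's own overlay.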
QCOW2 preallocation modes:
| Mode | Behavior | Use Case |
|---|---|---|
| off (default) | Clusters allocated on first write. File starts small. | General workloads. Best storage efficiency. |
| metadata | L1/L2 tables pre-allocated; data clusters allocated on write. | Reduces metadata allocation overhead during I/O. |
| falloc | File space pre-allocated via fallocate(); data zeroed lazily. | Avoids fragmentation. Near-raw performance for sequential I/O. |
| full | File space pre-allocated and fully zeroed. | Maximum performance. No allocation overhead during writes. Same on-disk size as the virtual disk. |
Raw Format
A raw disk image is a byte-for-byte representation of a virtual disk. No headers, no metadata tables, no L1/L2 indirection. Offset 0 in the file is LBA 0 on the virtual disk. Offset N is LBA N.
Advantages:
- Maximum I/O performance: no metadata lookup on every read/write
- Simplest format: any tool can read it directly (dd, hexdump, mount via loopback)
- No format-specific bugs or corruption risks
Disadvantages:
- No thin provisioning at the image level (the file is always the full virtual disk size unless the filesystem supports sparse files or fallocate with hole-punching)
- No snapshots, no backing files, no compression within the format itself
When to use raw on OVE: When the underlying storage system handles thin provisioning and snapshots at the block device level (e.g., Ceph RBD, LVM thin), raw is the preferred format because the QCOW2 L1/L2 tables become redundant overhead. OVE with ODF (OpenShift Data Foundation, backed by Ceph RBD) typically uses raw images stored in PersistentVolumeClaims because Ceph itself provides thin provisioning, snapshots, and cloning at the RADOS level.
When to use QCOW2 on OVE: When the underlying storage does not provide native thin provisioning or snapshot capabilities (e.g., local disks, basic NFS), QCOW2's built-in features become necessary. QCOW2 is also useful when backing file chains (template patterns) are needed without storage-level COW.
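The raw-versus-QCOW2 rule above is simple enough to encode as policy for migration tooling. A sketch, with illustrative backend names and a function name of our choosing:

```python
def preferred_image_format(storage_backend: str) -> str:
    """Encode the rule above: raw when the storage backend provides thin
    provisioning and snapshots natively, QCOW2 otherwise.
    Backend identifiers here are illustrative, not from any real API."""
    native_cow = {"ceph-rbd", "lvm-thin"}  # thin/snapshots handled at block level
    return "raw" if storage_backend in native_cow else "qcow2"
```

Baking the decision into code keeps 5,000 conversions consistent instead of leaving the format choice to each operator.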
Format Conversion: qemu-img
qemu-img is the Swiss Army knife for disk format conversion. It is the core utility used by virt-v2v, MTV, and manual migration workflows.
# VMDK to QCOW2 (most common migration conversion)
qemu-img convert -f vmdk -O qcow2 source.vmdk target.qcow2
# VMDK to raw (for Ceph/RBD-backed storage)
qemu-img convert -f vmdk -O raw source.vmdk target.raw
# QCOW2 to raw (for performance-critical VMs)
qemu-img convert -f qcow2 -O raw source.qcow2 target.raw
# With progress output and parallel I/O
qemu-img convert -p -W -f vmdk -O qcow2 source.vmdk target.qcow2
#                 ^  ^
#                 |  +-- -W: allow out-of-order (parallel) writes to the target
#                 +----- -p: show progress percentage
# With QCOW2 options: preallocation, cluster size, compression
qemu-img convert -f vmdk -O qcow2 \
-o preallocation=metadata,cluster_size=65536,compat=1.1 \
source.vmdk target.qcow2
# Check image integrity after conversion
qemu-img check target.qcow2
# Show image info (format, virtual size, actual size, backing file)
qemu-img info --output=json target.qcow2
Conversion performance and space implications:
| Source Format | Target Format | Throughput (10 Gbps network, NVMe local) | Space Change |
|---|---|---|---|
| VMDK thin (50 GB virtual, 20 GB actual) | QCOW2 | ~500-800 MB/s with -W | ~20-22 GB (slight metadata overhead) |
| VMDK thin (50 GB virtual, 20 GB actual) | raw | ~500-800 MB/s | 50 GB (no thin provisioning in raw), or ~20 GB if the filesystem supports sparse files |
| VMDK thick (50 GB virtual, 50 GB actual) | QCOW2 | ~400-600 MB/s | ~20 GB (QCOW2 reclaims zero blocks) |
| VMDK streamOptimized | QCOW2 | ~200-400 MB/s (decompression overhead) | Varies by content |
Practical tip: For large-scale migrations, always use -W (out-of-order parallel writes) with qemu-img convert. Without it, writes are serialized and conversion is substantially slower. Also consider raising -m (the number of parallel coroutines, default 8) when the source and target storage can sustain more concurrent I/O.
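When driving many conversions from automation, it helps to build the qemu-img argv in one place so every VM gets the same flags. A sketch: the flags (-p, -W, -m, -O, -o) are real qemu-img convert options, but the wrapper function itself is hypothetical:

```python
def build_convert_cmd(src: str, dst: str, out_format: str = "qcow2",
                      coroutines: int = 8, options: str = "") -> list:
    """Assemble a qemu-img convert invocation with progress output,
    out-of-order writes (-W) and a configurable coroutine count (-m)."""
    cmd = ["qemu-img", "convert", "-p", "-W", "-m", str(coroutines),
           "-O", out_format]
    if options:  # e.g. "preallocation=metadata,cluster_size=65536,compat=1.1"
        cmd += ["-o", options]
    cmd += [src, dst]
    return cmd
```

The resulting list can be handed to subprocess.run(); centralizing it means a flag change (say, a different cluster_size) rolls out to the whole migration pipeline at once.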
vSphere API for Disk Export (VDDK, nbdkit)
In a production migration of 5,000+ VMs, you do not export OVA files manually. You use programmatic APIs to stream disk data directly from the vSphere storage layer.
VDDK (Virtual Disk Development Kit): VMware's proprietary C library for reading and writing VMDK files remotely. VDDK connects to vCenter, authenticates, opens a VM's disk, and reads blocks over the network. It supports Changed Block Tracking (CBT) -- the ability to read only the blocks that have changed since a specific snapshot, which is critical for warm migrations and incremental replication.
nbdkit: An open-source NBD (Network Block Device) server with a VDDK plugin. nbdkit acts as a bridge: it uses VMware's VDDK library to connect to vCenter and exposes the VM's disk as an NBD endpoint that open-source tools (qemu-img, virt-v2v) can read natively. This is how virt-v2v and MTV access VMware disks without needing direct VMFS access.
Disk Export Path: vCenter -> VDDK -> nbdkit -> qemu-img
================================================================
+----------+ HTTPS/SOAP +----------+
| vCenter | <----- API -------> | nbdkit |
| Server | | (with |
+----------+ | VDDK |
| | plugin) |
| VMkernel data path +----------+
| (NFC / NBDSSL) |
v | NBD protocol
+----------+ | (unix socket
| ESXi | | or TCP)
| Host | v
| +------+ | +----------+
| | VMDK | | ---- disk data ---> | qemu-img |
| | on | | (streamed) | convert |
| | VMFS | | +----------+
| +------+ | |
+----------+ v
+----------+
| target |
| .qcow2 |
| or .raw |
+----------+
CBT (Changed Block Tracking) flow for warm migration:
1. Initial snapshot: read ALL blocks via VDDK
2. VM continues running, CBT tracks changes
3. Delta sync: read ONLY changed blocks since last snapshot
4. Repeat delta syncs until change set is small
5. Final cutover: quiesce VM, read last delta, convert, boot on target
VDDK licensing: VDDK is freely downloadable from VMware but requires acceptance of VMware's SDK license agreement. The VDDK .so library must be placed on the migration host and is not redistributable. MTV ships a mechanism to mount the VDDK library into its conversion pods.
2. virt-v2v / Migration Toolkit for Virtualization (MTV)
virt-v2v: The Upstream Conversion Tool
virt-v2v is a command-line tool from the libguestfs project that converts virtual machines from foreign hypervisors (VMware, Hyper-V) to KVM. It is the upstream engine that MTV wraps in a Kubernetes-native workflow. Understanding virt-v2v's internals is essential because it is the tool that performs the actual conversion work -- even inside MTV, the conversion pod runs virt-v2v.
What virt-v2v does in a single conversion:
- Connects to the source -- vCenter (via VDDK/nbdkit) or local disk file
- Copies the disk(s) -- streams VMDK data through nbdkit, writes to the target format
- Inspects the guest OS -- uses libguestfs to mount the guest filesystem read-only and identify the OS type, version, installed drivers, and bootloader
- Removes source hypervisor artifacts -- uninstalls VMware Tools (open-vm-tools), removes VMware SVGA driver, removes VMware paravirtual SCSI driver references from boot configuration
- Injects target hypervisor drivers -- installs virtio drivers (vioscsi, viostor, NetKVM, balloon) for Windows; verifies virtio modules are present in initramfs for Linux
- Fixes the bootloader -- updates GRUB/BCD to reference virtio SCSI instead of VMware PVSCSI or LSI Logic
- Adjusts guest OS configuration -- fixes NIC naming (eth0 vs ens192 vs persistent names), updates fstab if needed, reconfigures network manager
- Outputs the result -- writes converted disk plus domain XML (for libvirt) or creates KubeVirt resources (when used with MTV)
virt-v2v Conversion Pipeline
================================================================
+---------+ +----------+ +----------+ +----------+
| Source | | Disk | | Guest OS | | Driver |
| Connect | --> | Copy & | --> | Inspect | --> | Inject |
| | | Convert | | | | |
| vCenter | | VMDK -> | | Mount FS | | Windows: |
| via VDDK | | QCOW2/ | | Detect | | virtio- |
| or local | | raw | | OS type | | win ISO |
| file | | | | Detect | | Linux: |
| | | | | drivers | | initramfs|
+---------+ +----------+ +----------+ | rebuild |
+----------+
|
v
+----------+ +----------+
| Boot | | Output |
| Fixup | --> | Write |
| | | |
| GRUB for | | QCOW2 + |
| Linux | | libvirt |
| BCD for | | XML, or |
| Windows | | kubevirt |
+----------+ | CRD |
+----------+
Input modes:
# Mode 1: Direct from VMware vCenter (most common for production)
virt-v2v -i libvirt \
-ic "vpx://vcenter.example.com/Datacenter/cluster/esxi-host?no_verify=1" \
-it vddk \
-io vddk-libdir=/opt/vmware-vddk-lib-7.0 \
-io vddk-thumbprint=AA:BB:CC:DD:... \
"vm-name" \
-o kubevirt \
-os /output/directory
# Mode 2: From local disk files (OVA or individual VMDK)
virt-v2v -i ova /path/to/exported.ova \
-o qemu \
-os /output/directory
# Mode 3: From a Hyper-V disk image (VHDX copied from the host)
virt-v2v -i disk /path/to/disk.vhdx \
-o local \
-os /output/directory
Driver Injection -- The Critical Step
Driver injection is where most conversions succeed or fail. A VM without the correct storage and network drivers for the target hypervisor cannot boot.
Linux VMs:
Linux VMs are generally straightforward. The virtio kernel modules (virtio_blk, virtio_scsi, virtio_net, virtio_balloon, virtio_rng) have been included in the mainline Linux kernel since version 2.6.25 (2008). For any RHEL 6+, SLES 12+, Ubuntu 14.04+, or Debian 8+ VM, the drivers are already in the kernel. virt-v2v's job for Linux is:
- Verify that the virtio modules exist in the kernel modules directory
- Rebuild the initramfs/initrd to include virtio modules (so the boot disk is accessible during early boot)
- Update GRUB configuration to remove VMware-specific kernel parameters
- Update /etc/fstab if disk device names change (e.g., /dev/sda stays the same with virtio-scsi, but becomes /dev/vda with virtio-blk)
- Remove VMware-specific udev rules that persist NIC names based on VMware MAC address prefixes
Windows VMs -- The Hard Case:
Windows does not include virtio drivers by default. They must be injected into the guest's driver store before the VM can boot on KVM. This is where the virtio-win driver package becomes critical.
The virtio-win ISO (or MSI installer) provides these drivers:
| Driver | Device | Purpose |
|---|---|---|
| viostor | VirtIO block device (virtio-blk) | Block storage access for the boot disk |
| vioscsi | VirtIO SCSI controller | Preferred over viostor for new installations |
| NetKVM | VirtIO network adapter | Network connectivity |
| Balloon | VirtIO balloon device | Memory ballooning for overcommitment |
| qxldod / qxl | QXL display adapter | Console display (replaces VMware SVGA) |
| pvpanic | PV Panic device | Crash notification to the hypervisor |
| vioinput | VirtIO input device | Keyboard/mouse for SPICE/VNC |
| viorng | VirtIO RNG device | Hardware random number generator |
| viofs | VirtIO FS (virtiofs) | Host-guest shared filesystem |
| vioserial | VirtIO serial device | QEMU Guest Agent communication channel |
| fwcfg | QEMU fw_cfg | Firmware configuration data access |
Windows Driver Injection Flow (virt-v2v)
================================================================
1. Mount Windows partition via libguestfs (read-write)
2. Detect Windows version and architecture:
- Read: \Windows\System32\config\SOFTWARE registry
- Determine: Windows Server 2019 x64, build 17763
3. Copy drivers from virtio-win ISO to guest:
Source: virtio-win.iso:/vioscsi/2k19/amd64/
Target: \Windows\System32\drivers\vioscsi.sys
\Windows\INF\vioscsi.inf
\Windows\INF\vioscsi.cat
4. Inject drivers into Windows driver store:
- Write registry entries to:
HKLM\SYSTEM\ControlSet001\Services\vioscsi
HKLM\SYSTEM\ControlSet001\Services\netkvm
HKLM\SYSTEM\ControlSet001\Services\viostor
- Mark drivers as boot-start (Start=0 for storage drivers)
- This ensures Windows loads the driver during boot,
before the filesystem is available
5. Update BCD (Boot Configuration Data):
- Ensure the boot disk reference uses the VirtIO SCSI
controller path instead of VMware PVSCSI
6. Remove VMware Tools:
- Delete VMware Tools service entries
- Remove VMware SVGA driver
- Remove VMware mouse driver
- Clean up "vm-tools-upgrader" scheduled tasks
7. Handle NIC renaming:
- Windows assigns NICs persistent names based on PCI slot
- New virtio NIC gets a new name ("Ethernet 2" or "Ethernet 3")
- Old VMware NIC configuration (IP, DNS, routes) is orphaned
- virt-v2v attempts to preserve network configuration,
but manual verification is recommended
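Step 4 of the flow above (marking storage drivers boot-start) reduces to a handful of registry values under the service key. The sketch below is illustrative data, not a real injection tool: the value names follow documented Windows service conventions, but the exact set virt-v2v writes is richer than this.

```python
def boot_start_service_entries(driver: str) -> dict:
    """Registry values that mark a storage driver boot-start (Start=0),
    mirroring step 4 above. Paths follow the HKLM\\SYSTEM hive layout;
    this is an illustrative subset, not virt-v2v's actual write set."""
    base = rf"SYSTEM\ControlSet001\Services\{driver}"
    return {
        rf"{base}\Type": 1,          # SERVICE_KERNEL_DRIVER
        rf"{base}\Start": 0,         # SERVICE_BOOT_START: load before filesystems
        rf"{base}\ErrorControl": 1,  # log-and-continue on failure
        rf"{base}\ImagePath": rf"system32\drivers\{driver}.sys",
    }
```

Start=0 is the critical value: with any later start type, Windows cannot load the storage driver early enough to find its own boot volume, which is exactly the 0x7B INACCESSIBLE_BOOT_DEVICE failure described below.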
Common Windows conversion failures:
| Failure | Symptom | Cause | Resolution |
|---|---|---|---|
| Blue Screen (BSOD) at boot | INACCESSIBLE_BOOT_DEVICE (0x7B) | Storage driver not injected or wrong version | Verify vioscsi/viostor driver matches Windows version. Boot from recovery media and inject manually. |
| No network after boot | VM boots but has no connectivity | NetKVM driver not installed or IP config lost | Install NetKVM driver from virtio-win ISO inside the guest. Reconfigure IP settings. |
| UEFI boot failure | VM does not POST, no bootloader found | Secure Boot enabled in source, OVMF does not have the same certificates | Disable Secure Boot in VM config or enroll the appropriate certificates in OVMF |
| Windows activation | "Windows is not activated" error | Hardware ID changed (motherboard UUID, BIOS serial) | Re-activate Windows. For KMS-activated VMs, ensure KMS server is reachable. For MAK/OEM, may need to contact Microsoft. |
| NIC ordering change | Applications bound to "Local Area Connection 2" cannot connect | VMware NIC removed, virtio NIC added with different name | Rename the NIC in Windows network settings, or update application configuration |
| Time zone / clock drift | Clock is wrong after migration | VMware uses UTC for BIOS clock, Windows expects local time, KVM defaults may differ | Set <clock offset="localtime"/> in libvirt XML for Windows VMs, or configure Windows to use UTC |
| Mouse/keyboard not working | Cannot interact with console | QXL/VirtIO input drivers not installed | Install full virtio-win driver set. Use tablet input device in VM config. |
| Hyper-V enlightenments | Poor performance on KVM | Windows detects KVM but Hyper-V enlightenments not enabled | Enable Hyper-V enlightenments in KubeVirt VM spec: hyperv: {relaxed, vapic, spinlocks, vpindex, runtime, synic, stimer, reset, frequencies} |
UEFI / Secure Boot considerations:
VMs with UEFI firmware (as opposed to legacy BIOS) require additional handling during conversion:
- The VM must be configured to use OVMF (Open Virtual Machine Firmware) on the target KVM host
- The EFI System Partition (ESP) must be preserved during disk conversion
- If Secure Boot was enabled on VMware, the VM's boot chain (bootloader, kernel) was signed with VMware's or Microsoft's certificates. OVMF includes Microsoft's UEFI CA certificates by default, so most Windows VMs with Secure Boot will work. Custom certificate chains require manual enrollment.
- virt-v2v detects UEFI firmware and sets the appropriate libvirt XML (<os firmware="efi"/>)
MTV -- Migration Toolkit for Virtualization
MTV is Red Hat's productized, Kubernetes-native migration tool built on top of virt-v2v. While virt-v2v is a single-VM command-line tool, MTV provides a multi-VM, workflow-driven migration platform with a web UI, REST API, provider inventory, network/storage mapping, migration plans, and wave execution. MTV is the tool that makes migrating 5,000 VMs operationally feasible -- virt-v2v alone would require 5,000 manual invocations.
MTV is based on the upstream open-source project Forklift (previously called Konveyor Forklift), which is part of the Konveyor community project for application modernization.
MTV Architecture:
MTV (Migration Toolkit for Virtualization) Architecture
================================================================
+=====================================================================+
| OpenShift / Kubernetes Cluster (Target) |
| |
| +---------------------------------------------------------------+ |
| | MTV Operator (Deployed via OLM) | |
| | | |
| | +-------------------------+ +----------------------------+ | |
| | | forklift-controller | | forklift-ui | | |
| | | | | (Web console plugin) | | |
| | | - Reconciles Migration | | | | |
| | | CRDs | | - Provider management | | |
| | | - Orchestrates plans | | - Plan creation wizard | | |
| | | - Manages conversion | | - Migration monitoring | | |
| | | pods | | - Network/storage mapping | | |
| | | - Handles rollback | | | | |
| | +-------------------------+ +----------------------------+ | |
| | | |
| | +-------------------------+ +----------------------------+ | |
| | | forklift-validation | | inventory-service | | |
| | | | | | | |
| | | - Pre-migration checks | | - Discovers VMs from | | |
| | | - OS compatibility | | vCenter provider | | |
| | | - Driver availability | | - Caches VM metadata | | |
| | | - Disk/NIC analysis | | - Tracks provider state | | |
| | | - Warm migration | | - Provides API for UI | | |
| | | feasibility | | and controller | | |
| | +-------------------------+ +----------------------------+ | |
| +---------------------------------------------------------------+ |
| |
| Per-VM Migration Execution: |
| +---------------------------------------------------------------+ |
| | Conversion Pod (per VM) | |
| | +----------------------+ +-------------------------------+ | |
| | | nbdkit + VDDK | | virt-v2v | | |
| | | | | | | |
| | | Connects to vCenter | | Converts disk format | | |
| | | Reads VMDK blocks | | Injects virtio drivers | | |
| | | Streams via NBD | | Fixes bootloader | | |
| | +----------------------+ +-------------------------------+ | |
| | | | |
| | v | |
| | +-------------------------------+ | |
| | | CDI (Containerized Data | | |
| | | Importer) | | |
| | | | | |
| | | Writes converted disk into | | |
| | | PVC (PersistentVolumeClaim) | | |
| | | as DataVolume | | |
| | +-------------------------------+ | |
| +---------------------------------------------------------------+ |
| |
| Result: |
| +---------------------------------------------------------------+ |
| | VirtualMachine CRD + DataVolume(s) | |
| | Ready to start on KubeVirt | |
| +---------------------------------------------------------------+ |
+=====================================================================+
Source:
+==========================+
| VMware vCenter |
| - Provides VM inventory |
| - VDDK disk access |
| - CBT for warm migration|
+==========================+
MTV Custom Resources:
MTV extends the Kubernetes API with several CRDs that define the migration workflow:
| CRD | Purpose |
|---|---|
| Provider | Defines a connection to a source (VMware vCenter, RHV, OpenStack) or target (KubeVirt) platform. Contains credentials, URL, and TLS configuration. |
| NetworkMap | Maps source networks (VMware port groups) to target networks (KubeVirt NADs or pod network). |
| StorageMap | Maps source datastores (VMFS, NFS) to target storage classes (e.g., ocs-storagecluster-ceph-rbd). |
| Plan | Defines a migration plan: which VMs to migrate, which network/storage maps to use, warm vs. cold mode, and pre/post-migration hooks. |
| Migration | Represents the execution of a Plan. Created when the user starts the migration. Tracks status per VM. |
| Hook | A pre- or post-migration hook (an Ansible playbook or container image) that runs at specific points in the migration lifecycle. |
Migration workflow: Provider to Plan to Migration
MTV Migration Workflow
================================================================
Step 1: Create Provider (connect to vCenter)
+----------------------------------------------------------+
| apiVersion: forklift.konveyor.io/v1beta1 |
| kind: Provider |
| metadata: |
| name: vmware-prod |
| spec: |
| type: vsphere |
| url: https://vcenter.example.com/sdk |
| secret: vcenter-credentials # vCenter username/pwd |
+----------------------------------------------------------+
|
| Inventory service discovers all VMs, networks,
| datastores, hosts, clusters from vCenter
v
Step 2: Create NetworkMap + StorageMap
+----------------------------------------------------------+
| NetworkMap: "VM Network" -> "br-prod" (Multus NAD) |
| "Backup LAN" -> "br-backup" (Multus NAD) |
| |
| StorageMap: "datastore-ssd" -> "ocs-rbd" (StorageClass) |
| "datastore-hdd" -> "ocs-rbd-bulk" |
+----------------------------------------------------------+
|
v
Step 3: Create Plan (select VMs, configure migration type)
+----------------------------------------------------------+
| Plan: "wave-01-webservers" |
| VMs: [web-01, web-02, web-03, ..., web-50] |
| NetworkMap: prod-network-map |
| StorageMap: prod-storage-map |
| Type: warm (pre-copy with CBT) |
| Hooks: |
| pre: run-pre-check-playbook |
| post: run-smoke-test-playbook |
+----------------------------------------------------------+
|
| Validation service checks each VM:
| - OS supported?
| - Disk format convertible?
| - Network mappings valid?
| - Snapshots consolidated?
| - VMware Tools installed? (needed for quiesce)
v
Step 4: Execute Migration (create Migration CR)
+----------------------------------------------------------+
| Migration: "wave-01-execution-1" |
| Plan: wave-01-webservers |
| Status per VM: |
| web-01: CopyingDisk (45%) |
| web-02: ConvertingGuest |
| web-03: Pending |
+----------------------------------------------------------+
|
| Controller creates conversion pods, manages
| parallelism, tracks progress, handles failures
v
Step 5: Result
+----------------------------------------------------------+
| VirtualMachine CRDs created in target namespace |
| DataVolumes with converted disks in PVCs |
| VMs ready to start |
+----------------------------------------------------------+
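The Step 2 mappings in the diagram above correspond to concrete CRs. A sketch against the forklift.konveyor.io/v1beta1 API; the openshift-mtv namespace, provider names, and NAD namespace are assumptions:

```yaml
apiVersion: forklift.konveyor.io/v1beta1
kind: NetworkMap
metadata:
  name: prod-network-map
  namespace: openshift-mtv
spec:
  provider:
    source:
      name: vmware-prod          # the vSphere Provider
      namespace: openshift-mtv
    destination:
      name: host                 # the local KubeVirt cluster provider
      namespace: openshift-mtv
  map:
    - source:
        name: VM Network         # VMware port group
      destination:
        type: multus
        name: br-prod            # Multus NetworkAttachmentDefinition
        namespace: vm-networks
---
apiVersion: forklift.konveyor.io/v1beta1
kind: StorageMap
metadata:
  name: prod-storage-map
  namespace: openshift-mtv
spec:
  provider:
    source:
      name: vmware-prod
      namespace: openshift-mtv
    destination:
      name: host
      namespace: openshift-mtv
  map:
    - source:
        name: datastore-ssd      # VMware datastore
      destination:
        storageClass: ocs-storagecluster-ceph-rbd
```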
Warm vs. Cold Migration
The choice between warm and cold migration directly determines the downtime per VM. Multiplied across 5,000+ VMs, that choice sets the total project timeline and the scale of business disruption.
Cold migration:
Cold Migration Timeline (per VM)
================================================================
Source (VMware) Target (KubeVirt)
| |
| VM running normally |
| ======================== |
| |
t=0 SHUT DOWN VM |
| | |
| | VM is offline |
| | |
| +-- Full disk copy --------->| Copy entire disk(s)
| | (100 GB @ 500 MB/s | via VDDK + nbdkit
| | = ~200 seconds) |
| | |
| +-- Convert (virt-v2v) ----->| Driver injection,
| | (~60-120 seconds) | bootloader fixup
| | |
| +-- Import to PVC --------->| CDI writes to DataVolume
| (~30-60 seconds) |
| |
| | START VM on KubeVirt
| | ======================
| |
| Total downtime: ~5-7 minutes |
| for a 100 GB VM |
| |
| For a 2 TB VM: |
| Full copy: ~70 minutes |
| Total downtime: ~75 minutes |
- Pros: Simple, reliable, no dependency on CBT or snapshot support
- Cons: Downtime equals the full disk copy time plus conversion time. For large VMs (500 GB to 2 TB+), downtime can be hours.
Warm migration (pre-copy with CBT):
Warm Migration Timeline (per VM)
================================================================
Source (VMware) Target (KubeVirt)
| |
| VM running normally |
| ======================== |
| |
t=0 Create snapshot S1 |
| Enable CBT |
| | |
| +-- Full disk copy --------->| Copy ALL blocks
| | (100 GB @ 500 MB/s | (initial sync)
| | = ~200 seconds) |
| | |
| VM continues running |
| CBT tracks changed blocks |
| | |
t+1h Create snapshot S2 |
| | |
| +-- Delta copy 1 ---------->| Copy ONLY changed blocks
| | (2 GB @ 500 MB/s | since S1 (via CBT query)
| | = ~4 seconds) |
| | |
| VM continues running |
| CBT tracks changes since S2 |
| | |
t+2h Create snapshot S3 |
| | |
| +-- Delta copy 2 ---------->| Copy changed blocks
| | (500 MB = ~1 second) | since S2
| | |
| VM continues running |
| | |
t+Xh CUTOVER (scheduled window) |
| | |
| +-- QUIESCE VM |
| | (graceful shutdown or |
| | freeze I/O) |
| | |
| +-- Final delta copy ------->| Copy last changes
| | (50 MB = <1 second) | since S3
| | |
| +-- Convert (virt-v2v) ----->| Driver injection,
| | (~60-120 seconds) | bootloader fixup
| | |
| +-- Import to PVC --------->| CDI writes to DataVolume
| (~30-60 seconds) |
| |
| | START VM on KubeVirt
| | ======================
| |
| Total downtime: ~2-4 minutes |
| regardless of disk size! |
| |
| Pre-copy phase (background): |
| hours/days before cutover |
- Pros: Downtime is nearly constant regardless of VM disk size. The bulk of the data is copied while the VM is still running. The cutover window is short and predictable.
- Cons: Requires CBT to be enabled on the source VM (VMware must support it, and the VM must not have snapshots that prevent CBT). Requires VDDK integration. More complex orchestration. If the VM's change rate is extremely high (e.g., a database with constant writes), the delta sets may not converge.
- When to use warm migration: For any VM with more than ~50 GB of disk data, or any VM where downtime must be minimized (production databases, trading systems, customer-facing applications).
MTV warm migration mechanics:
MTV implements warm migration through a series of snapshots and delta copies coordinated by the forklift-controller:
1. The controller creates an initial VMware snapshot on the source VM and enables CBT if not already active
2. A conversion pod copies all disk blocks via VDDK/nbdkit to a staging PVC
3. After the initial copy completes, the controller creates a new snapshot
4. The conversion pod queries CBT for changed blocks since the previous snapshot and copies only those blocks
5. Steps 3-4 repeat on a configurable interval (default: every 60 minutes)
6. When the operator triggers cutover, the controller:
   a. Quiesces the source VM (if VMware Tools is installed) or shuts it down
   b. Creates a final snapshot and copies the last delta
   c. Runs virt-v2v conversion on the staged disk
   d. Imports the converted disk into the target PVC via CDI
   e. Creates the VirtualMachine CRD
   f. Optionally starts the VM on KubeVirt
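Put together, a warm plan and its cutover look roughly like the following (forklift.konveyor.io/v1beta1; the namespaces, names, and cutover timestamp are placeholders):

```yaml
apiVersion: forklift.konveyor.io/v1beta1
kind: Plan
metadata:
  name: wave-01-webservers
  namespace: openshift-mtv
spec:
  warm: true                     # CBT-based pre-copy; cutover is triggered separately
  targetNamespace: migrated-vms
  provider:
    source:
      name: vmware-prod
      namespace: openshift-mtv
    destination:
      name: host
      namespace: openshift-mtv
  map:
    network:
      name: prod-network-map
      namespace: openshift-mtv
    storage:
      name: prod-storage-map
      namespace: openshift-mtv
  vms:
    - name: web-01
    - name: web-02
---
apiVersion: forklift.konveyor.io/v1beta1
kind: Migration
metadata:
  name: wave-01-execution-1
  namespace: openshift-mtv
spec:
  plan:
    name: wave-01-webservers
    namespace: openshift-mtv
  cutover: "2025-06-14T22:00:00Z"   # placeholder: scheduled final-delta + conversion window
```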
CDI -- Containerized Data Importer
CDI is the KubeVirt component that handles importing disk images into PersistentVolumeClaims. It is not specific to migration -- CDI is used for any disk import operation, including importing ISO images, container disk images, and HTTP-hosted disk files. During migration, CDI is the final step that writes the converted disk into the storage backend.
CDI works through DataVolume CRDs:
```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: web-01-boot-disk
  namespace: migrated-vms
spec:
  source:
    # Option 1: Import from HTTP URL
    http:
      url: "https://migration-staging.internal/web-01-boot.qcow2"
    # Option 2: Import from container registry
    # registry:
    #   url: "docker://registry.internal/vm-disks/web-01:latest"
    # Option 3: Upload from local file (via CDI upload proxy)
    # upload: {}
    # Option 4: Clone from existing PVC
    # pvc:
    #   name: golden-image-rhel9
    #   namespace: templates
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 100Gi
    storageClassName: ocs-storagecluster-ceph-rbd
```
CDI creates an importer pod that downloads, decompresses (if needed), converts (if needed), and writes the disk data into the target PVC. The importer pod supports QCOW2, raw, VMDK, VHD, and VHDX input formats. It can also apply GZIP or XZ decompression on the fly.
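Once CDI has populated the DataVolume, a VirtualMachine CRD references it as a volume. A minimal sketch (kubevirt.io/v1; sizing and names are illustrative -- during an MTV migration this object is generated automatically):

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: web-01
  namespace: migrated-vms
spec:
  running: false                 # start explicitly after validation
  template:
    spec:
      domain:
        cpu:
          cores: 2
        memory:
          guest: 4Gi
        devices:
          disks:
            - name: boot
              disk:
                bus: virtio      # requires virtio drivers injected during conversion
      volumes:
        - name: boot
          dataVolume:
            name: web-01-boot-disk   # the imported DataVolume
```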
Scale Considerations for 5,000+ VMs
Migrating at scale is not just running the same tool 5,000 times. Network bandwidth, storage I/O, vCenter API load, and conversion pod resource consumption all become bottlenecks.
Parallel migration capacity:
| Bottleneck | Limit | Mitigation |
|---|---|---|
| vCenter API load | vCenter can handle ~20-30 concurrent VDDK connections per host before performance degrades | Stagger migrations across hosts. Use multiple vCenter sessions. |
| Network bandwidth | 10 Gbps link = ~1.1 GB/s theoretical. 20 concurrent VMs each copying at 500 MB/s = 10 GB/s = saturated. | Dedicated migration VLAN. 25 Gbps or bonded 10 Gbps links. QoS policies. |
| Target storage IOPS | Ceph cluster write throughput is finite. 20 concurrent imports at 500 MB/s each = 10 GB/s sustained write. | Size the Ceph cluster for migration burst. Use dedicated pool. |
| Conversion pod resources | Each virt-v2v conversion pod needs ~2 vCPU and 2-4 GB RAM. 20 concurrent conversions = 40 vCPU + 60 GB RAM. | Dedicate migration worker nodes. Set resource limits in MTV config. |
| CDI concurrent imports | CDI limits the number of concurrent importer pods (configurable in the CDI configuration). | Raise the CDI import concurrency and, for MTV, tune the controller's maximum in-flight VM setting. Dedicate resource quota to importer pods. |
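The MTV-side parallelism knob lives on the operator's CR. A hedged sketch of raising it via the ForkliftController resource (the field name follows MTV documentation; verify it and the default value against your MTV version):

```yaml
apiVersion: forklift.konveyor.io/v1beta1
kind: ForkliftController
metadata:
  name: forklift-controller
  namespace: openshift-mtv
spec:
  controller_max_vm_inflight: 40   # max concurrent VM migrations (default is typically 20)
```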
Bandwidth planning formula:
Migration Bandwidth Planning
================================================================
Inputs:
Total data to migrate: 200 TB (5,000 VMs, average 40 GB used per VM)
Available bandwidth: 10 Gbps dedicated = 1.1 GB/s
Utilization factor: 70% (protocol overhead, contention)
Effective throughput: 0.77 GB/s = ~66 TB/day
Cold migration (all downtime):
200 TB / 66 TB/day = ~3 days of continuous copying
But VMs are offline during copy, so this is 3 days of total downtime
NOT ACCEPTABLE for 5,000+ VMs
Warm migration (background pre-copy):
Phase 1 (background): 200 TB / 66 TB/day = ~3 days
Phase 2 (cutover): 5,000 VMs * 2-4 minute cutover window each
At 20 parallel cutover slots = 5,000 / 20 = 250 rounds
250 rounds * 4 minutes (worst case) = ~17 hours of total cutover time
Spread across 30 migration waves over 8 weeks
25 Gbps network:
Effective: 2.2 GB/s -> 190 TB/day
Phase 1: ~1 day
This is why dedicated 25 Gbps migration network is recommended
Migration validation: hooks and health checks
MTV supports pre- and post-migration hooks that run at specific lifecycle points. These are critical for validating that a migrated VM actually works.
Pre-migration hooks (run before disk copy starts):
- Verify VM snapshots are consolidated
- Check that CBT is available
- Verify application is in a consistent state (e.g., drain connections, flush caches)
- Tag the VM in vCenter as "migration-in-progress"
Post-migration hooks (run after VM starts on KubeVirt):
- Verify network connectivity (ping gateway, DNS resolution)
- Verify application health (HTTP health check, database connection test)
- Verify disk integrity (filesystem check, application-level data validation)
- Update DNS records to point to the new VM's IP
- Update load balancer backend pools
- Send notification to migration tracking system
```yaml
apiVersion: forklift.konveyor.io/v1beta1
kind: Hook
metadata:
  name: post-migration-smoke-test
spec:
  image: registry.internal/migration-tools/smoke-test:latest
  playbook: |
    ---
    - hosts: localhost
      tasks:
        - name: Wait for VM to be reachable
          wait_for:
            host: "{{ vm_ip }}"
            port: 22
            timeout: 300
        - name: Verify SSH access
          command: ssh -o StrictHostKeyChecking=no admin@{{ vm_ip }} "hostname"
          register: hostname_result
        - name: Verify application health
          uri:
            url: "http://{{ vm_ip }}:8080/health"
            status_code: 200
          when: app_type == "webserver"
        - name: Update DNS
          nsupdate:
            server: "dns.internal"
            zone: "example.com"
            record: "{{ vm_name }}"
            value: "{{ vm_ip }}"
            type: "A"
```
3. Azure Migrate
Azure Migrate is Microsoft's discovery, assessment, and migration platform for moving workloads into Azure (cloud) or Azure Local (on-premises). For this evaluation, the relevant scenario is migrating VMware VMs to Azure Local -- an on-premises Hyper-V-based platform managed through Azure Arc.
Azure Migrate Architecture
Azure Migrate Architecture for VMware-to-Azure Local
================================================================
+=====================================================================+
| Azure Portal (cloud control plane) |
| |
| +----------------------------+ +-------------------------------+ |
| | Azure Migrate Hub | | Azure Migrate: Server | |
| | | | Migration | |
| | - Project management | | | |
| | - Assessment reports | | - Replication management | |
| | - Dependency visualization| | - Test migration | |
| | - Cost estimation | | - Production migration | |
| +----------------------------+ +-------------------------------+ |
+=====================================================================+
| |
| HTTPS (443) | HTTPS (443)
| (metadata, control) | (control, replication mgmt)
| |
+====|====================================|============================+
| On-Premises Environment | |
| | |
| +----------------------------+ | |
| | Azure Migrate Appliance | | |
| | (Windows Server VM) | | |
| | | | |
| | +----------------------+ | | |
| | | Discovery Agent | | | |
| | | - Connects to vCenter| | | |
| | | - Inventories VMs | | | |
| | | - Collects perf data | | | |
| | +----------------------+ | | |
| | | | |
| | +----------------------+ | | |
| | | Assessment Agent | | | |
| | | - Readiness analysis | | | |
| | | - Sizing (CPU/RAM) | | | |
| | | - Cost estimation | | | |
| | +----------------------+ | | |
| | | | |
| | +----------------------+ | | |
| | | Replication Provider | | | |
| | | (for agentless mode) | | | |
| | | - Snapshot-based | | | |
| | | replication | | | |
| | | - Delta sync via CBT | | | |
| | +----------------------+ | | |
| +----------------------------+ | |
| | | |
| | vCenter API | |
| v | |
| +----------------------------+ | |
| | VMware vCenter | | |
| | - Source VMs | | |
| | - VMDK disks | | |
| +----------------------------+ | |
| | |
| v |
| +---------------------------------------------------+ |
| | Azure Local Cluster (Target) | |
| | | |
| | Hyper-V VMs created from converted disks | |
| | VMDK -> VHD/VHDX conversion | |
| | Managed through Azure Arc | |
| +---------------------------------------------------+ |
+=====================================================================+
Discovery
Azure Migrate's discovery phase is agentless -- no software is installed on the VMs being discovered. The Migrate Appliance connects to vCenter via the vSphere API and collects:
- VM inventory: names, IDs, OS type, power state, CPU/RAM configuration, disk sizes, network adapters, IP addresses
- Performance data: CPU utilization, memory utilization, disk IOPS, disk throughput, network throughput -- collected over a configurable period (default: 30 days) to establish baselines
- Dependency mapping: Optional agent-based (Microsoft Monitoring Agent) or agentless (via vCenter guest operations API) dependency analysis that maps which VMs communicate with which others. This is critical for wave planning -- you want to migrate VMs that depend on each other in the same wave.
- Application discovery: Identification of installed software, roles, and features on Windows/Linux VMs
Discovery runs continuously, updating the inventory every 15 minutes for new or removed VMs and every 5 minutes for performance data.
Assessment
The assessment phase analyzes discovered VMs and produces readiness reports:
Readiness analysis:
- Is the OS supported on Azure Local? (Windows Server 2012 R2+, RHEL 7+, Ubuntu 16.04+, etc.)
- Are the disk sizes within Azure Local limits?
- Are there unsupported features? (e.g., shared disks, pass-through disks, RDM)
- UEFI or BIOS? Generation 1 or Generation 2 VM on Hyper-V?
Sizing recommendations:
- Based on collected performance data, Azure Migrate recommends the VM size (CPU/RAM) on the target platform
- "As-on-premises" sizing: match the source VM's configuration exactly
- "Performance-based" sizing: right-size based on actual utilization (e.g., a VM with 16 vCPUs but average 10% utilization gets recommended at 4 vCPUs)
Cost estimation:
- Estimates compute, storage, and licensing costs on Azure or Azure Local
- Includes Windows Server licensing cost calculation (Azure Hybrid Benefit considerations)
Migration Methods
Azure Migrate supports two migration methods for VMware VMs:
Agentless migration (recommended):
- No software installed on source VMs
- Uses VMware vSphere APIs for snapshot-based replication
- Takes an initial snapshot, copies all blocks, then uses CBT for delta syncs
- Similar in concept to MTV's warm migration
- Converts VMDK to VHD/VHDX during the copy process
- Supports up to 500 concurrent replications per appliance
Agent-based migration:
- Installs the Azure Site Recovery Mobility Service agent on each source VM
- The agent streams write operations to a Process Server (on-premises), which forwards them to Azure/Azure Local
- Required for physical servers or scenarios where agentless is not supported
- More intrusive (requires agent installation and reboot)
Azure Migrate for Azure Local: Key Differences from Cloud Migration
Migrating to Azure Local (on-premises) differs from migrating to Azure (cloud) in several important ways:
| Aspect | Azure (Cloud) | Azure Local (On-Premises) |
|---|---|---|
| Network path | Replication data traverses WAN/internet to Azure datacenter | Replication data stays on-premises (LAN) |
| Bandwidth | Limited by WAN bandwidth (typically 1-10 Gbps) | Full LAN bandwidth available (10-25 Gbps) |
| Target VM type | Azure VM sizes (standardized) | Hyper-V VMs (custom sizing) |
| Disk format | Managed Disks (VHD) | VHD/VHDX on local storage (CSV, Storage Spaces Direct) |
| Networking | Azure VNet, NSG | Azure Local logical networks, SDN (optional) |
| Management | Azure portal, fully managed | Azure Arc + local admin (hybrid management) |
| Availability | Azure SLAs | Customer-managed HA (Hyper-V failover clustering) |
Azure Local specific considerations:
- Azure Local clusters must be registered with Azure Arc before Azure Migrate can target them
- The target storage must be configured as Cluster Shared Volumes (CSVs) backed by Storage Spaces Direct
- Network mapping from VMware port groups to Azure Local logical networks must be configured
- Generation selection: BIOS-based VMs become Generation 1 Hyper-V VMs, UEFI-based VMs become Generation 2
Hyper-V Conversion: VMDK to VHD/VHDX
Azure Migrate handles disk format conversion automatically, but understanding the process is important for troubleshooting:
VHD (Virtual Hard Disk):
- Fixed or dynamically expanding format
- Maximum size: 2 TB (for Generation 1 VMs)
- Used by: Hyper-V Generation 1 VMs, Azure managed disks
VHDX (Virtual Hard Disk v2):
- Maximum size: 64 TB
- Supports 4 KB logical sector sizes (for 4Kn drives)
- Built-in protection against power failure corruption (log-based metadata updates)
- Used by: Hyper-V Generation 2 VMs
Driver injection for Hyper-V:
- Hyper-V Integration Services (enlightenments) are built into the Windows kernel since Windows Server 2016 and Windows 10
- For older Windows versions (2012 R2 and earlier), the Hyper-V Integration Services must be installed
- Linux VMs: Hyper-V modules (hv_vmbus, hv_storvsc, hv_netvsc, hv_utils, hv_balloon) are included in the Linux kernel since version 3.4. Most modern distributions work without additional driver injection.
- Azure Migrate injects the necessary drivers and the Azure VM Agent (waagent for Linux, WindowsAzureGuestAgent for Windows) during conversion
Network and Storage Mapping
Similar to MTV's NetworkMap and StorageMap, Azure Migrate requires mapping source VMware constructs to target Azure Local constructs:
Network mapping:
- Source: VMware port groups (e.g., "VLAN-100-Production", "VLAN-200-Backup")
- Target: Azure Local logical networks or VM switches
- IP address handling: can be static (manually assigned on target), DHCP, or preserved from source
Storage mapping:
- Source: VMware datastores (VMFS, NFS, vSAN)
- Target: Azure Local storage paths on Cluster Shared Volumes
- Disk type selection: Fixed or dynamic VHDX
Migration Strategy for 5,000+ VMs
This section addresses the operational reality of migrating an estate of this size. The tooling covered above handles individual VM conversions. This section covers how to orchestrate thousands of conversions into a managed program.
Wave Planning
Migration waves group VMs into batches that are migrated together during a scheduled window. Wave composition is the most important planning decision in the migration program.
Wave Planning Model
================================================================
Wave 0: Proof of Concept (2-4 weeks)
+----------------------------------------------------------+
| 10-20 VMs |
| Purpose: Validate tooling, process, team readiness |
| Selection criteria: |
| - Non-production, low-risk VMs |
| - Mix of Linux + Windows |
| - Mix of disk sizes (small, medium, large) |
| - At least one UEFI VM |
| - At least one VM with multiple NICs |
| - At least one VM with multiple disks |
| Exit criteria: |
| - All VMs boot and pass health checks |
| - Migration runbook validated |
| - Rollback procedure tested |
| - Team trained on tooling |
+----------------------------------------------------------+
Wave 1: Development & Test (2-3 weeks)
+----------------------------------------------------------+
| 200-500 VMs |
| Purpose: Scale validation, process refinement |
| Selection criteria: |
| - Development and test environments |
| - Applications with known low criticality |
| - VMs with owners who can validate quickly |
| Exit criteria: |
| - Throughput baseline established |
| - 95%+ success rate on first attempt |
| - Failure patterns documented |
+----------------------------------------------------------+
Waves 2-N: Production (8-16 weeks)
+----------------------------------------------------------+
| Remaining ~4,500 VMs in groups of 100-200 |
| Grouping criteria (in priority order): |
| 1. Application dependency (all VMs of an application |
| migrate together) |
| 2. Criticality tier (Tier 3 first, Tier 1 last) |
| 3. Complexity (simple VMs first, complex last) |
| 4. Business unit readiness |
| 5. Maintenance window alignment |
| |
| Each wave: |
| - Pre-migration: discovery validation, hook tests |
| - Migration: warm pre-copy (days), cutover (hours) |
| - Validation: health checks, application sign-off |
| - Soak: 5-10 business days dual-monitoring |
| - Decommission: power off source VMs, archive disks |
+----------------------------------------------------------+
Final Wave: Critical Infrastructure
+----------------------------------------------------------+
| 50-100 VMs |
| Domain controllers, DNS servers, monitoring, |
| certificate authorities, backup infrastructure |
| Requires most careful planning, smallest batch sizes, |
| and longest soak periods. |
+----------------------------------------------------------+
Grouping by application dependency is the most critical criterion. Migrating half of an application's VMs to the new platform while the other half remains on VMware creates a "split-brain" scenario where cross-platform network latency, firewall rules, and DNS inconsistencies can cause application failures. Use the dependency mapping from Azure Migrate or a CMDB to identify application groups.
Complexity classification:
| Complexity | Characteristics | Expected Migration Effort |
|---|---|---|
| Simple | Linux, single disk < 100 GB, single NIC, no special hardware, stateless | Fully automated. 5-10 minutes downtime. |
| Medium | Windows Server, 1-3 disks, multiple NICs, domain-joined, basic services | Automated with manual validation. 10-30 minutes downtime. |
| Complex | Large disks (500 GB+), clustered applications (Oracle RAC, SQL Always-On), GPU, UEFI + Secure Boot, custom drivers, RDM disks | Semi-automated. May require manual conversion steps. 1-4 hours downtime. |
| Special | Physical-to-virtual legacy systems, appliances with locked OS, VMs with USB passthrough, VMs with SR-IOV | Manual migration or rebuild. May not be convertible -- may require re-platforming. |
Migration Factory
A "migration factory" is the operational model for executing waves at a sustained pace. It is a dedicated team, with defined roles, tools, runbooks, and escalation paths, that operates like a production line.
Migration Factory Operating Model
================================================================
+-------------------+ +-------------------+ +-------------------+
| INTAKE | | EXECUTE | | VALIDATE |
| | | | | |
| - Receive wave | --> | - Run pre-copy | --> | - Health checks |
| manifest | | (warm mode) | | - App owner |
| - Validate VM | | - Schedule | | sign-off |
| readiness | | cutover window | | - Performance |
| - Check snapshot | | - Execute | | comparison |
| consolidation | | cutover | | - Soak period |
| - Verify network | | - Monitor | | monitoring |
| mappings | | conversion | | |
| - Assign to | | - Handle | | |
| migration slot | | failures | | |
+-------------------+ +-------------------+ +-------------------+
| | |
v v v
+-------------------+ +-------------------+ +-------------------+
| Roles: | | Roles: | | Roles: |
| Migration Lead | | Migration Eng. | | App Owner |
| App Owner | | Platform Eng. | | QA Engineer |
| (intake form) | | Network Eng. | | Migration Lead |
+-------------------+ +-------------------+ +-------------------+
Throughput target: 100-200 VMs per week at steady state
Team size: 4-6 migration engineers + 1 lead + shared platform/network
Key metrics for the migration factory:
| Metric | Target | Why |
|---|---|---|
| VMs migrated per week | 100-200 | Determines total project duration. At 150/week, 5,000 VMs = ~33 weeks. |
| First-attempt success rate | >95% | Failed migrations consume 3-5x the effort of successful ones. |
| Average cutover downtime | <15 minutes (warm), <60 minutes (cold) | Business tolerance for per-VM downtime. |
| Rollback rate | <5% | VMs that must revert to VMware after cutover. |
| Mean time to validate | <4 hours | Time from VM-start-on-target to application owner sign-off. |
Rollback Strategy
Every migration must have a tested rollback path. The source VMware environment must remain operational until the migrated VM is validated and the soak period is complete.
Rollback approach for warm migration:
- Do NOT decommission the source VM immediately after cutover. Power it off but retain the disks.
- If the migrated VM fails validation, power off the target VM, power on the source VM on VMware.
- DNS and load balancer changes must be reversible (use short TTLs during migration, keep old configuration ready).
- Set a soak period (5-10 business days) after which the source VM can be archived and eventually deleted.
- Archive source VMDKs to cold storage (object store, tape) for a defined retention period (90-180 days) as a safety net.
Rollback approach for cold migration:
Same as above -- the source VM is still on VMware, just powered off. Power it back on to roll back.
What makes rollback hard:
- If the migrated VM has been running in production and receiving new data (database writes, user uploads), reverting to the source VM loses that data. This is why soak periods should be kept as short as possible for write-heavy workloads, and why some organizations implement data synchronization in both directions during the soak period.
- DNS changes may have propagated to clients with long TTLs. Clients may cache the new IP address.
- If shared infrastructure (file servers, databases) has been updated to reference the new VM's address, rollback requires reverting those changes too.
Parallel Running (Dual-Stack Period)
During migration, both the VMware platform and the target platform (OVE or Azure Local) run simultaneously. This "dual-stack" period requires:
- Network connectivity between platforms: VMs on VMware must communicate with VMs on the target platform. This requires routing between the VMware network segments and the target platform's networks, or a shared L2 segment.
- Shared services: DNS, Active Directory, monitoring, backup, and logging must serve both platforms simultaneously.
- Capacity planning: The organization needs enough hardware to run both platforms at the same time. The target platform must be fully deployed before migration starts. Hardware from VMware can only be reclaimed after VMs are migrated and decommissioned.
- Monitoring: Unified monitoring that covers VMs on both platforms, with dashboards showing migration progress and comparative health metrics.
Acceptance Criteria
A migrated VM is "done" when all of the following are true:
| Criterion | Verification |
|---|---|
| VM boots successfully | Console access confirms OS boot, login prompt |
| Network connectivity | Ping gateway, resolve DNS, reach dependent services |
| Correct IP configuration | IP address, subnet, gateway, DNS match the pre-migration configuration |
| Storage accessible | All disks mounted, file systems intact, data readable |
| Application functional | Application-specific health check passes (HTTP 200, DB connection, etc.) |
| Performance acceptable | CPU, memory, disk I/O, network throughput within 20% of baseline |
| Monitoring integrated | VM appears in monitoring system, alerts fire correctly |
| Backup configured | Backup agent/schedule active on the new platform |
| Application owner sign-off | Written confirmation from the application team |
| Soak period complete | N business days without incidents |
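At 5,000+ VM scale, this checklist should be evaluated by tooling rather than by hand. A minimal sketch of encoding the criteria as an automated gate follows; the evidence schema and check names are illustrative, and only the 20% performance tolerance comes from the table above.

```python
# Acceptance-gate sketch: encode the per-VM "done" criteria as checks over
# collected evidence. The evidence dict schema is a hypothetical example.

PERF_TOLERANCE = 0.20  # "within 20% of baseline" from the acceptance table

def within_baseline(measured, baseline, tol=PERF_TOLERANCE):
    return baseline > 0 and abs(measured - baseline) / baseline <= tol

def acceptance_report(evidence):
    """Map each criterion to pass/fail; all must pass before sign-off."""
    perf = evidence["performance"]
    checks = {
        "boots": evidence["console_login_ok"],
        "network": evidence["gateway_ping_ok"] and evidence["dns_resolve_ok"],
        "ip_config": evidence["ip_config"] == evidence["baseline_ip_config"],
        "storage": all(evidence["disks_mounted"].values()),
        "application": evidence["app_health_ok"],
        "performance": all(
            within_baseline(perf[m]["measured"], perf[m]["baseline"])
            for m in ("cpu", "disk_iops")
        ),
        "monitoring": evidence["in_monitoring"],
        "backup": evidence["backup_scheduled"],
        "owner_signoff": evidence["owner_signoff"],
    }
    checks["done"] = all(checks.values())
    return checks

# Example evidence for one migrated VM (values are made up for illustration).
evidence = {
    "console_login_ok": True,
    "gateway_ping_ok": True,
    "dns_resolve_ok": True,
    "ip_config": {"ip": "10.1.2.3", "gw": "10.1.2.1"},
    "baseline_ip_config": {"ip": "10.1.2.3", "gw": "10.1.2.1"},
    "disks_mounted": {"sda": True, "sdb": True},
    "app_health_ok": True,
    "performance": {
        "cpu": {"measured": 0.55, "baseline": 0.50},          # +10%, within 20%
        "disk_iops": {"measured": 9000, "baseline": 10000},   # -10%, within 20%
    },
    "in_monitoring": True,
    "backup_scheduled": True,
    "owner_signoff": True,
}
print(acceptance_report(evidence)["done"])
```

The report can feed the wave tracker directly: a wave closes only when every VM in it reports `done`, and the per-criterion breakdown tells the team where a failing VM needs attention.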
Timeline Estimation
A realistic end-to-end estimate for the full estate (throughput on either target platform is broadly comparable):
Timeline Estimation for 5,000 VM Migration
================================================================
Assumptions:
- 25 Gbps dedicated migration network
- Average VM: 2 disks, 80 GB used data
- 70% Linux, 30% Windows
- 80% simple/medium, 20% complex/special
- Warm migration for all VMs > 50 GB
- 20 parallel migration slots
Phase 1: Setup & Wave 0 (4-6 weeks)
+-----------+----------------------------------------------+
| Week 1-2 | Deploy target platform, configure MTV/Azure |
| | Migrate, set up monitoring, build runbooks |
| Week 3-4 | Wave 0: 20 VMs (PoC validation) |
| Week 5-6 | Process review, runbook refinement |
+-----------+----------------------------------------------+
Phase 2: Dev/Test Waves (3-4 weeks)
+-----------+----------------------------------------------+
| Week 7-8 | Wave 1: 200-500 dev/test VMs |
| Week 9-10 | Validate, measure, optimize throughput |
+-----------+----------------------------------------------+
Phase 3: Production Waves (16-24 weeks)
+-----------+----------------------------------------------+
| Week 11+ | Waves 2-N: 200 VMs/wave, 1-2 waves/week |
| | ~200-280 VMs/week sustained throughput |
| | 4,500 VMs / 200-280 per week = 16-23 weeks |
| | With ramp-up and delays: plan for up to 24 weeks |
+-----------+----------------------------------------------+
Phase 4: Critical Infrastructure & Cleanup (4-6 weeks)
+-----------+----------------------------------------------+
| Final | Domain controllers, DNS, monitoring |
| weeks | Decommission VMware, reclaim hardware |
+-----------+----------------------------------------------+
Total estimated duration: 6-10 months
(assuming dedicated migration team and no major blockers)
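A quick sanity check of the assumptions above shows that raw network transfer is not the limiting factor; the wave process is. The sketch below uses the stated inputs (25 Gbps, 5,000 VMs, 80 GB average used data); the 60% effective-utilisation factor is an assumption covering protocol overhead, conversion time, and vCenter API limits, not a measured value.

```python
# Back-of-envelope check of the timeline assumptions above.
# EFFICIENCY is an assumed effective utilisation of the migration network.

GBPS = 25                 # dedicated migration network
EFFICIENCY = 0.60         # assumed effective utilisation (not measured)
VMS = 5000
AVG_USED_GB = 80          # average used data per VM

total_tb = VMS * AVG_USED_GB / 1000                 # total data to move
effective_gb_per_s = GBPS * EFFICIENCY / 8          # bits -> bytes per second
transfer_days = (total_tb * 1000) / effective_gb_per_s / 86400

vms_per_week = 250                                  # sustained wave throughput
migration_weeks = -(-4500 // vms_per_week)          # production VMs, ceiling

print(f"Total data: {total_tb:.0f} TB")
print(f"Raw transfer at {GBPS} Gbps x {EFFICIENCY:.0%}: {transfer_days:.1f} days")
print(f"Production waves at {vms_per_week} VMs/week: {migration_weeks} weeks")
```

Moving 400 TB takes only a few days of pure wire time, yet the program spans months: validation, driver injection, cutover windows, soak periods, and team capacity dominate the schedule, which is why adding bandwidth beyond 25 Gbps buys little.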
How the Candidates Handle This
| Aspect | OVE (MTV) | Azure Local (Azure Migrate) | Swisscom ESC |
|---|---|---|---|
| Primary tool | Migration Toolkit for Virtualization (MTV / Forklift) | Azure Migrate: Server Migration | Swisscom Professional Services (VMware HCX or vMotion) |
| Source platforms | VMware vSphere, Red Hat Virtualization, OpenStack, oVirt | VMware vSphere, Hyper-V, physical servers | VMware vSphere (VMware-to-VMware) |
| Disk format conversion | VMDK to QCOW2 or raw (via qemu-img / virt-v2v) | VMDK to VHD/VHDX (handled by migration service) | No conversion needed (VMware-to-VMware) |
| Warm migration | Yes. CBT-based delta sync via VDDK. Cutover downtime: 2-5 minutes. | Yes. Snapshot-based replication with CBT. Similar cutover window. | Yes (vMotion for same-vCenter, HCX for cross-vCenter). Near-zero downtime. |
| Cold migration | Yes. Full disk copy + convert + import. | Yes. Full disk copy + convert. | Yes (OVA export/import). |
| Driver injection | virt-v2v: virtio-win for Windows, initramfs rebuild for Linux. Automated. | Azure Migrate: Hyper-V Integration Services injection. Automated for supported OS. | Not needed (same hypervisor). |
| Parallel migrations | Configurable. Typically 10-20 concurrent. Limited by vCenter API and storage write IOPS. | Up to 500 concurrent replications per appliance. | Limited by network bandwidth and vMotion/HCX capacity. |
| UI | OpenShift Console plugin (MTV UI). Web-based. | Azure Portal. Cloud-based. Requires internet. | Swisscom portal + professional services. |
| API / Automation | Kubernetes CRDs. Fully automatable via kubectl, Ansible, Terraform. | Azure REST API, PowerShell, Azure CLI. | API availability unclear. Migration managed by Swisscom. |
| Pre-migration validation | Forklift-validation: automated OS/disk/network checks. | Azure Migrate Assessment: readiness, sizing, cost. | Swisscom professional services assessment. |
| Post-migration hooks | Yes. Ansible playbooks or container hooks at Plan level. | Limited. Azure Automation runbooks. | Not self-service. Managed by Swisscom. |
| Dependency mapping | Not built into MTV. Use third-party or manual CMDB. | Yes. Built-in agentless or agent-based dependency analysis. | Swisscom assessment. |
| Rollback | Manual. Keep source VM powered off, re-start if needed. | Built-in. Can resume replication to source (for Azure cloud). For Azure Local: manual. | VMware-to-VMware rollback is straightforward. |
| Assessment / right-sizing | Not built into MTV. Manual or use third-party tools. | Yes. Performance-based sizing recommendations. Cost estimation. | Included in Swisscom assessment. |
| Scale proven | Proven at 1,000+ VM scale in Red Hat customer deployments. Scaling to 5,000+ requires careful planning. | Proven at large scale (1,000+ VMs per appliance). Microsoft's most mature migration tool. | Swisscom manages migration; scale depends on engagement model. |
| VDDK dependency | Yes. VDDK library required for VMware source. Not redistributable -- must be obtained from VMware/Broadcom. | No. Uses vSphere APIs directly (no VDDK dependency). | N/A (VMware native tools). |
| Windows support | Full. virtio-win drivers for all supported Windows versions. UEFI supported. | Full. Hyper-V enlightenments built into Windows. UEFI Generation 2 supported. | Full (no conversion needed). |
| Linux support | Full. virtio modules in kernel. initramfs rebuild automated. | Full. Hyper-V modules in kernel. | Full (no conversion needed). |
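MTV's CRD-based workflow referenced in the table means a migration wave is declared as Kubernetes objects rather than clicked through a UI. The sketch below builds a warm-migration Plan manifest as a plain Python dict; the field names follow the `forklift.konveyor.io/v1beta1` Plan API, but all names, namespaces, and VM IDs are placeholders for illustration.

```python
# Sketch of an MTV (Forklift) Plan manifest built as a plain dict.
# Provider, map, namespace, and VM identifiers below are hypothetical
# placeholders; only the overall field layout follows the Plan CRD.

def mtv_plan(name, vm_ids, warm=True):
    return {
        "apiVersion": "forklift.konveyor.io/v1beta1",
        "kind": "Plan",
        "metadata": {"name": name, "namespace": "openshift-mtv"},
        "spec": {
            "warm": warm,  # CBT-based pre-copy with delta syncs before cutover
            "provider": {
                "source": {"name": "vmware-prod", "namespace": "openshift-mtv"},
                "destination": {"name": "host", "namespace": "openshift-mtv"},
            },
            "map": {
                "network": {"name": "wave2-netmap", "namespace": "openshift-mtv"},
                "storage": {"name": "wave2-storagemap", "namespace": "openshift-mtv"},
            },
            "targetNamespace": "migrated-vms",
            "vms": [{"id": vm_id} for vm_id in vm_ids],
        },
    }

plan = mtv_plan("wave-2", ["vm-1042", "vm-1043"])
print(plan["metadata"]["name"], len(plan["spec"]["vms"]), "VMs")
```

Serialized to YAML, such a manifest can live in Git and be applied with `kubectl apply`, which is what makes wave definitions reviewable and repeatable; actually executing the plan additionally requires a Migration resource referencing it.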
Key Takeaways
- Warm migration is non-negotiable at 5,000+ VM scale. Cold migration's downtime (proportional to disk size) is unacceptable for production VMs. Both MTV and Azure Migrate support warm migration with CBT-based delta sync, reducing cutover downtime to minutes regardless of disk size. The migration network should be 25 Gbps dedicated to support the sustained data transfer volume.
- VMDK snapshot chains must be consolidated before migration. No conversion tool can migrate a VM with active VMware snapshots. The pre-migration validation phase must verify that every VM's snapshot chain is fully consolidated into a single flat VMDK. This is a common source of migration failures that can be entirely prevented with upfront validation.
- Windows VM conversion is the highest-risk area. Linux VMs almost always convert cleanly because virtio drivers are in the kernel. Windows VMs require driver injection (virtio-win for OVE, Integration Services for Azure Local), bootloader reconfiguration, and frequently have NIC ordering and Windows activation issues. Budget extra time and manual validation for every Windows VM.
- MTV's Kubernetes-native approach is both a strength and a complexity. MTV's CRD-based workflow (Provider, NetworkMap, StorageMap, Plan, Migration) integrates naturally into GitOps and automation pipelines. However, it requires Kubernetes expertise that a VMware-trained team may not have. Azure Migrate's Azure Portal-based workflow is more accessible to traditional infrastructure teams but less automatable.
- VDDK is a licensing and supply-chain dependency for MTV. MTV requires VMware's proprietary VDDK library to access VMware disks. VDDK must be obtained from VMware/Broadcom under their SDK license. In a scenario where the organization is leaving VMware due to licensing concerns, maintaining a VDDK dependency during migration is an awkward but necessary reality. Azure Migrate does not have this dependency.
- The migration is a 6-10 month program, not a one-time event. Wave planning, migration factory operations, validation, soak periods, and rollback handling require a dedicated team operating at sustained capacity. The total elapsed time depends on parallel migration capacity (constrained by network bandwidth, target storage throughput, and team size) and the tolerance for downtime windows.
- Swisscom ESC migration is technically trivial but strategically questionable. VMware-to-VMware migration avoids all conversion risks (no format conversion, no driver injection, no bootloader changes). However, it does not address the strategic objective of leaving VMware. If the long-term goal is VMware independence, ESC migration is a lateral move that defers the conversion problem.
- Disk format selection on OVE depends on the storage backend. If OVE uses ODF (Ceph RBD), raw format is preferred because Ceph provides thin provisioning and snapshots at the RADOS layer, making QCOW2's features redundant overhead. If the storage backend lacks these capabilities, QCOW2 provides them at the image level. This decision should be made before migration begins, as converting between formats after migration adds unnecessary work.
- Post-migration validation must be application-aware, not just infrastructure-aware. A VM that boots, has network connectivity, and shows healthy CPU/memory metrics can still have a broken application. Migration hooks (MTV) or automation runbooks (Azure Migrate) should include application-specific health checks: HTTP endpoints, database connectivity tests, queue processing verification, and end-to-end transaction tests.
- Rollback readiness is a planning requirement, not an afterthought. Retain source VMware VMs (powered off) through the soak period. Use short DNS TTLs during cutover. Keep load balancer configurations reversible. Archive VMDKs to cold storage for a defined retention period. The cost of maintaining rollback capability is small compared to the cost of a failed migration with no way back.
Discussion Guide
Use these questions when engaging with vendors, Red Hat/Microsoft/Swisscom field teams, or internal subject matter experts.
Migration Tooling and Process
- Demonstrate a warm migration of a Windows Server 2022 VM with 500 GB of disk data, domain-joined, running IIS with a SQL Server backend. Walk through every step: pre-copy, delta syncs, cutover, driver injection, boot on target, application validation. What is the total cutover downtime? Does Windows activation survive the migration? Why this matters: Windows VMs with large disks and domain membership represent the most common complex migration scenario. The demonstration must prove that the full pipeline works end-to-end, including driver injection and post-conversion functionality.
- Show how MTV/Azure Migrate handles a VM with multiple NICs on different VLANs. After migration, are the NIC-to-VLAN mappings preserved? Is the NIC ordering inside the guest OS preserved? What happens to static IP configurations? Why this matters: Multi-NIC VMs (e.g., management + data + backup networks) are common in enterprise environments. NIC ordering changes can break applications that bind to specific interface names.
- What is the maximum number of concurrent migrations you have demonstrated in a production customer environment? What were the bottlenecks? How was the migration network sized? Why this matters: The vendor's answer reveals real-world scale limitations that may not appear in lab tests. The number must be compared against the organization's throughput requirements for the migration timeline.
- How does the tool handle a conversion failure mid-flight? If virt-v2v / Azure Migrate fails during driver injection on VM number 47 out of 200 in a wave, what happens to VM 47? What happens to the remaining 153 VMs in the wave? Is there automatic retry? Why this matters: At scale, failures are statistical certainties. The tool's failure handling (retry, skip, abort-wave) determines how much manual intervention is needed per wave.
- Demonstrate rollback: migrate a VM to the new platform, let it run for 24 hours with active data changes, then roll back to VMware. How is data handled? Is there any data loss? How long does rollback take? Why this matters: Rollback is the safety net. If rollback is painful or lossy, the team will hesitate to proceed with production migrations, slowing the entire program.
Disk Formats and Conversion
- What disk format is recommended on the target platform (QCOW2 vs. raw for OVE, fixed vs. dynamic VHDX for Azure Local), and why? What is the performance difference? Can the format be changed after migration without re-migrating the VM? Why this matters: The disk format choice affects I/O performance, storage efficiency, and snapshot capabilities. Making the wrong choice pre-migration may require costly re-conversion later.
- How does the conversion tool handle a VMDK with seSparse snapshots that have not been consolidated? Does it detect this condition and warn the user, or does it attempt conversion and fail? Why this matters: Unconsolidated snapshots are the most common pre-migration blocker. The tool should fail fast with a clear error message, not attempt a conversion that produces a corrupted disk.
Wave Planning and Scale
- For an estate of 5,000+ VMs, what is the recommended migration team size, wave cadence, and total project duration? What are the key assumptions behind that estimate? Provide reference customers of comparable scale. Why this matters: The vendor's answer calibrates timeline expectations against real-world experience. Reference customers provide validation that the tool and process have been proven at this scale.
- How does your tooling support wave planning? Can we group VMs by application, tag them, and execute them as a unit? Can we define dependencies between waves (e.g., "do not start Wave 3 until Wave 2 validation is complete")? Why this matters: Wave orchestration at scale requires tooling support, not just spreadsheets. The ability to define, track, and gate waves determines how safely the migration progresses.
- What happens if the VMware license expires or VMware support is terminated during the migration? Does the migration tool still function? Are there any VDDK or vCenter API dependencies that would break? Why this matters: The migration timeline may overlap with VMware contract expiration. If the migration tool depends on active VMware licensing (VDDK, vCenter), a license lapse could halt the migration program. This is a genuine risk that must be contractually and technically mitigated.