Infrastructure as Code
Why This Matters
The platform strategy mandates "Everything as Code" -- all infrastructure must be version-controlled, peer-reviewed, and automated. This is not a preference; it is a governance requirement. For a Tier-1 financial enterprise running 5,000+ VMs, manual provisioning through web consoles is a compliance risk, an audit liability, and an operational bottleneck. Every VM, every network policy, every storage volume must be traceable to a Git commit, reproducible from a pipeline, and destroyable without human intervention.
Today, the VMware estate is managed through a combination of vCenter UI operations, PowerCLI scripts, Terraform with the vSphere provider, and Ansible playbooks using the community.vmware collection. This hybrid approach works, but it is tightly coupled to VMware-specific APIs. Migrating to OVE, Azure Local, or Swisscom ESC means replacing or rewriting every IaC integration -- Terraform providers, Ansible collections, CI/CD pipelines, and approval workflows.
This chapter covers the two primary IaC toolchains (Terraform and Ansible), the Kubernetes-native alternative (GitOps with ArgoCD/Flux), and the emerging Crossplane model. Each approach has distinct strengths, and the right choice depends on the target platform. For OVE, GitOps may be the preferred model because VMs are Kubernetes-native objects. For Azure Local, Terraform with the Azure provider ecosystem is the natural fit. For Swisscom ESC, the IaC surface depends on what APIs Swisscom exposes.
The chapter assumes working knowledge of Git, YAML, and basic Kubernetes concepts. It provides complete, copy-pasteable examples that can be adapted for proof-of-concept deployments.
Concepts
1. Terraform Provider for KubeVirt / Hyper-V
Terraform Fundamentals
Terraform is a declarative infrastructure provisioning tool. You describe the desired end state of your infrastructure in HCL (HashiCorp Configuration Language) files, and Terraform calculates the difference between the current state and the desired state, then executes the necessary API calls to converge reality to the declaration.
Core concepts:
- Provider: A plugin that translates HCL resource definitions into API calls for a specific platform (vSphere, Kubernetes, Azure, AWS, etc.). Each provider ships its own set of resource types and data sources.
- Resource: A single infrastructure object managed by Terraform (a VM, a disk, a network, a DNS record). Resources have a type (e.g., `kubevirt_virtual_machine`) and a unique name within the configuration.
- Data Source: A read-only query against the provider's API. Data sources let you reference existing infrastructure without managing it (e.g., look up an existing namespace or network).
- State: A JSON file (`terraform.tfstate`) that records the mapping between HCL resource definitions and real-world infrastructure objects. State is the source of truth for what Terraform manages. Losing state means Terraform loses track of what it created.
- Plan: A dry-run that shows what Terraform would create, modify, or destroy without actually doing it. Every apply should be preceded by a plan review.
- Apply: Executes the planned changes against the target API.
- Destroy: Removes all resources managed by the current configuration.
Terraform Execution Flow
+-------------------+
| .tf files (HCL) | Developer writes declarative configuration
| - main.tf | in version-controlled .tf files
| - variables.tf |
| - outputs.tf |
+--------+----------+
|
v
+--------+----------+
| terraform init | Downloads provider plugins, initializes
| (one-time setup) | backend for state storage
+--------+----------+
|
v
+--------+----------+
| terraform plan | Reads current state + provider API,
| | computes diff, shows proposed changes
| + = create |
| ~ = modify | "Plan: 3 to add, 1 to change, 0 to destroy"
| - = destroy |
+--------+----------+
|
v (human or pipeline reviews plan)
+--------+----------+
| terraform apply | Executes API calls to converge
| | real infrastructure to desired state
| Provider API |
| calls: POST, | Updates terraform.tfstate with new
| PUT, DELETE | resource IDs and attributes
+--------+----------+
|
v
+--------+----------+
| terraform.tfstate | JSON file mapping HCL resources to
| (state file) | real infrastructure object IDs
| | MUST be stored securely (S3, GCS,
| | Terraform Cloud, GitLab backend)
+-------------------+
State management is Terraform's most critical operational concern. The state file contains resource IDs, IP addresses, and sometimes secrets. For a team of 10+ engineers managing 5,000+ VMs:
- State must be stored remotely (not in local files) using a backend like S3, Azure Blob Storage, or a GitLab-managed Terraform state.
- State must be locked during apply operations to prevent concurrent modifications.
- State must be backed up and versioned.
- State should be segmented: one large state file for 5,000 VMs is unmanageable. Use workspaces or separate root modules per environment, application, or team.
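One common segmentation pattern (a sketch, assuming an S3-compatible backend and a per-team, per-environment key convention) keeps the backend block empty and injects the state key at init time:
# backend.tf -- intentionally partial; values are supplied at init time
terraform {
  backend "s3" {}
}

# CI wrapper script (TEAM and ENV are placeholders)
terraform init \
  -backend-config="bucket=terraform-state-prod" \
  -backend-config="key=kubevirt/${TEAM}/${ENV}/terraform.tfstate" \
  -backend-config="region=eu-central-1" \
  -backend-config="dynamodb_table=terraform-locks"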
KubeVirt Terraform Provider
The KubeVirt Terraform provider (kubevirt/kubevirt in the Terraform registry) enables declarative management of KubeVirt VirtualMachine resources via Terraform. It speaks directly to the Kubernetes API server using a kubeconfig file and translates HCL resource definitions into KubeVirt Custom Resource operations.
Provider configuration:
# providers.tf
terraform {
required_version = ">= 1.5.0"
required_providers {
kubevirt = {
source = "kubevirt/kubevirt"
version = "~> 0.1"
}
}
backend "s3" {
bucket = "terraform-state-prod"
key = "kubevirt/vm-workloads/terraform.tfstate"
region = "eu-central-1"
dynamodb_table = "terraform-locks"
encrypt = true
}
}
provider "kubevirt" {
# Option 1: Use the default kubeconfig
# (reads from KUBECONFIG env var or ~/.kube/config)
# Option 2: Explicit path
config_path = var.kubeconfig_path
# Option 3: In-cluster (when Terraform runs inside the cluster)
# No configuration needed -- uses the pod's service account token
}
Resource types available in the KubeVirt provider:
| Resource Type | Description |
|---|---|
| `kubevirt_virtual_machine` | A VirtualMachine CR (persistent, restartable VM) |
| `kubevirt_data_volume` | A CDI DataVolume (disk image import or clone) |
The KubeVirt Terraform provider is relatively young compared to the vSphere provider. It covers the core VM and DataVolume resources but does not yet expose every KubeVirt CRD (e.g., MigrationPolicy, VirtualMachineClusterInstancetype, VirtualMachinePool are not directly supported). For resources not covered by the provider, the Kubernetes Terraform provider's kubernetes_manifest resource can fill the gap (see "Alternative" section below).
Complete example -- provisioning a VM with disks, network, and cloud-init:
# variables.tf
variable "kubeconfig_path" {
description = "Path to the kubeconfig file"
type = string
default = "~/.kube/config"
}
variable "namespace" {
description = "Kubernetes namespace for the VM"
type = string
default = "vm-workloads"
}
variable "vm_name" {
description = "Name of the virtual machine"
type = string
default = "rhel9-appserver-01"
}
variable "cpu_cores" {
description = "Number of vCPU cores"
type = number
default = 4
}
variable "memory" {
description = "Memory allocation (e.g., 8Gi)"
type = string
default = "8Gi"
}
variable "disk_size" {
description = "Root disk size (e.g., 100Gi)"
type = string
default = "100Gi"
}
variable "ssh_public_key" {
description = "SSH public key for cloud-init"
type = string
}
variable "network_name" {
description = "Name of the NetworkAttachmentDefinition for the VM network"
type = string
default = "vlan-100-prod"
}
variable "storage_class" {
description = "Kubernetes StorageClass for the VM disk"
type = string
default = "ocs-storagecluster-ceph-rbd"
}
variable "golden_image_namespace" {
description = "Namespace containing the golden image PVC"
type = string
default = "golden-images"
}
variable "golden_image_pvc" {
description = "Name of the golden image PVC to clone"
type = string
default = "rhel9-golden-20240601"
}
# main.tf
# --- Data Volume: Clone the root disk from a golden image ---
resource "kubevirt_data_volume" "root_disk" {
metadata {
name = "${var.vm_name}-rootdisk"
namespace = var.namespace
labels = {
"app.kubernetes.io/name" = var.vm_name
"app.kubernetes.io/managed-by" = "terraform"
"app.kubernetes.io/component" = "rootdisk"
}
}
spec {
source {
pvc {
name = var.golden_image_pvc
namespace = var.golden_image_namespace
}
}
pvc {
access_modes = ["ReadWriteMany"]
resources {
requests = {
storage = var.disk_size
}
}
storage_class_name = var.storage_class
}
}
}
# --- Data Volume: Additional data disk ---
resource "kubevirt_data_volume" "data_disk" {
metadata {
name = "${var.vm_name}-datadisk"
namespace = var.namespace
labels = {
"app.kubernetes.io/name" = var.vm_name
"app.kubernetes.io/managed-by" = "terraform"
"app.kubernetes.io/component" = "datadisk"
}
}
spec {
source {
blank {}
}
pvc {
access_modes = ["ReadWriteMany"]
resources {
requests = {
storage = "200Gi"
}
}
storage_class_name = var.storage_class
}
}
}
# --- Virtual Machine ---
resource "kubevirt_virtual_machine" "vm" {
metadata {
name = var.vm_name
namespace = var.namespace
labels = {
"app.kubernetes.io/name" = var.vm_name
"app.kubernetes.io/managed-by" = "terraform"
"app.kubernetes.io/part-of" = "appserver-fleet"
"env" = "production"
}
annotations = {
"vm.kubevirt.io/os" = "rhel9"
}
}
spec {
run_strategy = "Always"
template {
metadata {
labels = {
"app.kubernetes.io/name" = var.vm_name
"kubevirt.io/domain" = var.vm_name
}
}
spec {
domain {
cpu {
cores = var.cpu_cores
sockets = 1
threads = 1
}
resources {
requests = {
memory = var.memory
}
limits = {
memory = var.memory
}
}
devices {
disk {
name = "rootdisk"
disk {
bus = "virtio"
}
}
disk {
name = "datadisk"
disk {
bus = "virtio"
}
}
disk {
name = "cloudinit"
disk {
bus = "virtio"
}
}
interface {
name = "prod-net"
bridge {}
}
rng {}
}
}
network {
name = "prod-net"
multus {
network_name = var.network_name
}
}
volume {
name = "rootdisk"
data_volume {
name = kubevirt_data_volume.root_disk.metadata[0].name
}
}
volume {
name = "datadisk"
data_volume {
name = kubevirt_data_volume.data_disk.metadata[0].name
}
}
volume {
name = "cloudinit"
cloud_init_no_cloud {
user_data = <<-CLOUDINIT
#cloud-config
hostname: ${var.vm_name}
fqdn: ${var.vm_name}.internal.example.com
manage_etc_hosts: true
users:
- name: sysadmin
sudo: ALL=(ALL) NOPASSWD:ALL
shell: /bin/bash
ssh_authorized_keys:
- ${var.ssh_public_key}
packages:
- qemu-guest-agent
- nfs-utils
- python3
runcmd:
- systemctl enable --now qemu-guest-agent
- echo "Provisioned by Terraform at $(date)" > /etc/motd
write_files:
- path: /etc/sysctl.d/99-tuning.conf
content: |
net.core.somaxconn = 65535
vm.swappiness = 10
permissions: '0644'
CLOUDINIT
}
}
}
}
}
depends_on = [
kubevirt_data_volume.root_disk,
kubevirt_data_volume.data_disk,
]
}
# outputs.tf
output "vm_name" {
description = "Name of the created VM"
value = kubevirt_virtual_machine.vm.metadata[0].name
}
output "vm_namespace" {
description = "Namespace of the created VM"
value = kubevirt_virtual_machine.vm.metadata[0].namespace
}
output "root_disk_name" {
description = "Name of the root disk DataVolume"
value = kubevirt_data_volume.root_disk.metadata[0].name
}
output "data_disk_name" {
description = "Name of the data disk DataVolume"
value = kubevirt_data_volume.data_disk.metadata[0].name
}
Lifecycle management -- what happens on plan, apply, and destroy:
| Operation | Behavior |
|---|---|
| `terraform plan` | Reads the current state of the VM and DataVolumes from the Kubernetes API, computes the diff, and shows whether the VM would be created, modified, or destroyed. |
| `terraform apply` (create) | Creates the DataVolume CRs first (CDI begins cloning/importing disk images), then creates the VirtualMachine CR. KubeVirt's virt-controller creates the virt-launcher pod and starts the VM. Terraform waits for the resources to be created in the API but does not wait for the VM to finish booting. |
| `terraform apply` (modify) | Changes to certain fields (labels, annotations, resource requests) can be applied in-place. Changes to immutable fields (disk bus type, network interface type) force a destroy-and-recreate. The run_strategy can be changed in-place. |
| `terraform destroy` | Deletes the VirtualMachine CR first (which triggers graceful shutdown of the VMI and deletion of the virt-launcher pod), then deletes the DataVolume CRs (which deletes the underlying PVCs and PVs, releasing the storage). Data is permanently lost. |
State management -- how VM state maps to Terraform state:
The Terraform state file records the metadata (name, namespace, UID, resourceVersion) and the full spec of each managed resource. When the state file says a VM exists with 4 cores and 8Gi memory, Terraform trusts this until the next plan or apply, at which point it queries the Kubernetes API to detect drift.
Drift detection: If someone manually edits the VM via kubectl or virtctl (e.g., adds a label, changes memory), the next terraform plan will detect the drift and propose changes to bring the real resource back in line with the HCL definition. This is a strength of Terraform -- it enforces the declared state. It is also a risk: if the operations team makes an emergency change via kubectl and then someone runs terraform apply, Terraform will revert the emergency change.
Limitations and workarounds:
- Incomplete CRD coverage. The KubeVirt provider does not expose all KubeVirt CRDs. MigrationPolicy, VirtualMachineClusterInstancetype, VirtualMachineClusterPreference, VirtualMachinePool, and many others must be managed via the `kubernetes_manifest` resource or via raw `kubectl apply` outside Terraform.
- No lifecycle operation support. Terraform can set `run_strategy: Always` or `run_strategy: Halted`, but it cannot trigger a live migration, open a console, or invoke a restart. These imperative operations are outside Terraform's declarative model. Use `virtctl` or Ansible for day-2 lifecycle operations.
- Provider maturity. The KubeVirt Terraform provider is a community-maintained project. It does not have the same level of investment, documentation, or testing as the vSphere or AWS providers. Expect rough edges. Pin the provider version and test upgrades carefully.
- State file size at scale. Each VM resource in the state file is approximately 5--10 KB of JSON. At 5,000 VMs, the state file would be 25--50 MB -- manageable, but slow to plan. Segment state files by team, environment, or application.
- Import existing VMs. If VMs were created manually (e.g., during a PoC), `terraform import` can bring them under Terraform management, as sketched below. However, you must write the matching HCL configuration first; the provider does not support automatic configuration generation (the `terraform plan -generate-config-out` workflow available with some other providers).
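A sketch of the import workflow, assuming the matching HCL already exists; the namespace/name import ID format is an assumption and should be verified against the provider's documentation:
# Bring a manually created VM under Terraform management
terraform import kubevirt_virtual_machine.vm vm-workloads/rhel9-appserver-01

# Verify: the plan should show no changes once state and HCL agree
terraform plan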
Alternative: Kubernetes Terraform Provider with Raw Manifests
For KubeVirt resources not covered by the dedicated provider, the hashicorp/kubernetes provider offers the kubernetes_manifest resource, which can manage any Kubernetes resource by accepting raw YAML/JSON manifests.
# Using the Kubernetes provider for resources not in the KubeVirt provider
terraform {
required_providers {
kubernetes = {
source = "hashicorp/kubernetes"
version = "~> 2.30"
}
}
}
provider "kubernetes" {
config_path = var.kubeconfig_path
}
# Example: Create a MigrationPolicy (not supported by KubeVirt provider)
resource "kubernetes_manifest" "migration_policy" {
manifest = {
apiVersion = "migrations.kubevirt.io/v1alpha1"
kind = "MigrationPolicy"
metadata = {
name = "high-bandwidth-policy"
}
spec = {
selectors = {
namespaceSelector = {
"migration-tier" = "premium"
}
}
bandwidthPerMigration = "1Gi"
completionTimeoutPerGiB = 800
allowAutoConverge = true
allowPostCopy = false
}
}
}
# Example: Create a VirtualMachineClusterInstancetype
resource "kubernetes_manifest" "instancetype_large" {
manifest = {
apiVersion = "instancetype.kubevirt.io/v1beta1"
kind = "VirtualMachineClusterInstancetype"
metadata = {
name = "large"
labels = {
"instancetype.kubevirt.io/vendor" = "internal"
}
}
spec = {
cpu = {
guest = 8
}
memory = {
guest = "32Gi"
}
}
}
}
Tradeoff: kubernetes_manifest is fully generic but has weaker type safety. The KubeVirt provider validates HCL against the KubeVirt CRD schema at plan time; kubernetes_manifest validates only at apply time when the API server rejects invalid fields. For core VM resources, prefer the dedicated provider. For supplementary CRDs, use kubernetes_manifest.
OpenTofu as an Alternative to HashiCorp Terraform
In August 2023, HashiCorp changed Terraform's license from the Mozilla Public License 2.0 (MPL-2.0) to the Business Source License 1.1 (BSL 1.1). The BSL restricts using Terraform in products that compete with HashiCorp's commercial offerings. While this does not directly affect end-users who use Terraform to manage their own infrastructure, it created uncertainty for:
- Organizations that embed Terraform in internal developer platforms
- Managed service providers that offer Terraform-as-a-Service
- Vendors that build tooling on top of Terraform
In response, a group of community contributors and vendors forked Terraform 1.5.x under the name OpenTofu (initially OpenTF) and released it under the MPL-2.0 license; the project is now stewarded by the Linux Foundation. OpenTofu is a drop-in replacement for Terraform: it uses the same HCL language, the same state format, the same provider ecosystem, and the same CLI commands (tofu init, tofu plan, tofu apply).
What this means for the evaluation:
- All HCL examples in this chapter work identically with both Terraform and OpenTofu.
- If the organization's legal team has concerns about the BSL license, OpenTofu is a viable alternative with no code changes.
- OpenTofu is a Linux Foundation project (accepted in September 2023), placing it under the same open-governance umbrella as Kubernetes and the CNCF ecosystem that OVE builds on.
- The KubeVirt Terraform provider, the Kubernetes provider, and the Azure providers all work with OpenTofu.
- Monitor both projects. As of early 2026, both are actively maintained, but feature parity may diverge over time (for example, new language features such as provider-defined functions have landed in each project on separate release schedules).
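As a quick illustration of the drop-in claim, the workflow for the KubeVirt example above is unchanged apart from the binary name:
# Same HCL, same backend, same providers -- only the CLI binary differs
tofu init
tofu plan -out=kubevirt-vm.plan
tofu apply kubevirt-vm.plan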
Hyper-V / Azure Local Terraform
For Azure Local, the Terraform story is fundamentally different from KubeVirt. Azure Local is managed through Azure Resource Manager (ARM), which means the standard Azure Terraform provider (azurerm) is the primary IaC interface. VMs on Azure Local are represented as Azure Arc-enabled resources in the Azure control plane.
Provider configuration:
# providers.tf for Azure Local
terraform {
required_version = ">= 1.5.0"
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "~> 3.100"
}
}
backend "azurerm" {
resource_group_name = "terraform-state-rg"
storage_account_name = "tfstateprodstorage"
container_name = "tfstate"
key = "azure-local/vm-workloads.tfstate"
}
}
provider "azurerm" {
features {}
subscription_id = var.subscription_id
}
Resource types for Azure Local / Azure Stack HCI:
| Resource Type | Description |
|---|---|
| `azurerm_stack_hci_cluster` | The Azure Stack HCI cluster registration |
| `azurerm_stack_hci_logical_network` | Logical network on the HCI cluster |
| `azurerm_stack_hci_storage_path` | Storage path (local or shared) |
| `azurerm_stack_hci_marketplace_gallery_image` | Marketplace image for VM creation |
| `azurerm_stack_hci_network_interface` | NIC for an HCI VM |
| `azurerm_stack_hci_virtual_hard_disk` | Virtual hard disk (VHDX) |
| `azurerm_arc_machine` | Arc-connected machine representation |
Example -- provisioning a VM on Azure Local:
# main.tf for Azure Local VM
variable "resource_group_name" {
type = string
default = "hci-workloads-rg"
}
variable "location" {
type = string
default = "switzerlandnorth"
}
variable "custom_location_id" {
description = "Azure Custom Location ID for the HCI cluster"
type = string
}
variable "logical_network_id" {
description = "ID of the logical network on the HCI cluster"
type = string
}
variable "marketplace_image_id" {
description = "ID of the marketplace gallery image"
type = string
}
# --- Network Interface ---
resource "azurerm_stack_hci_network_interface" "vm_nic" {
name = "rhel9-appserver-01-nic"
resource_group_name = var.resource_group_name
location = var.location
custom_location_id = var.custom_location_id
ip_configuration {
name = "ipconfig1"
private_ip_address = "10.100.1.50"
subnet_id = var.logical_network_id
}
tags = {
managed-by = "terraform"
env = "production"
}
}
# --- Virtual Hard Disk ---
resource "azurerm_stack_hci_virtual_hard_disk" "data_disk" {
name = "rhel9-appserver-01-datadisk"
resource_group_name = var.resource_group_name
location = var.location
custom_location_id = var.custom_location_id
disk_size_gb = 200
dynamic_enabled = true
storage_path_id = var.storage_path_id
tags = {
managed-by = "terraform"
}
}
# Note: As of the azurerm provider ~3.100, the full
# azurerm_stack_hci_virtual_machine resource is still
# evolving. Check the provider changelog for the latest
# supported attributes. The resource may require using
# azapi_resource for full control.
The Azure API provider (azapi) for cutting-edge resources:
Because Azure Local resources are evolving rapidly, the azurerm provider sometimes lags behind the Azure API. The azapi provider allows direct Azure REST API calls using Terraform:
terraform {
required_providers {
azapi = {
source = "Azure/azapi"
version = "~> 1.12"
}
}
}
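# Note: azurerm_arc_machine.vm, referenced as parent_id below, is assumed to
# be defined elsewhere in the configuration -- the HCI VM instance is created
# as a child resource of the Arc-enabled machine.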
resource "azapi_resource" "hci_vm" {
type = "Microsoft.AzureStackHCI/virtualMachineInstances@2024-01-01"
parent_id = azurerm_arc_machine.vm.id
body = jsonencode({
extendedLocation = {
type = "CustomLocation"
name = var.custom_location_id
}
properties = {
hardwareProfile = {
vmSize = "Custom"
processors = 4
memoryMB = 8192
}
osProfile = {
computerName = "rhel9-appserver-01"
adminUsername = "sysadmin"
linuxConfiguration = {
ssh = {
publicKeys = [{
keyData = var.ssh_public_key
path = "/home/sysadmin/.ssh/authorized_keys"
}]
}
}
}
storageProfile = {
imageReference = {
id = var.marketplace_image_id
}
dataDisks = [{
id = azurerm_stack_hci_virtual_hard_disk.data_disk.id
}]
}
networkProfile = {
networkInterfaces = [{
id = azurerm_stack_hci_network_interface.vm_nic.id
}]
}
}
})
}
Key differences from the KubeVirt model:
| Aspect | KubeVirt (OVE) | Azure Local |
|---|---|---|
| API endpoint | Kubernetes API server (on-premises) | Azure Resource Manager (cloud control plane) |
| Authentication | kubeconfig / ServiceAccount token | Azure AD / Service Principal / Managed Identity |
| Provider | `kubevirt/kubevirt` + `hashicorp/kubernetes` | `hashicorp/azurerm` + `Azure/azapi` |
| State of provider maturity | Community-maintained, limited CRD coverage | Microsoft-backed, but HCI-specific resources are still evolving |
| Offline operation | Fully functional without internet | Requires connectivity to Azure ARM (cloud control plane) |
| Resource model | Kubernetes CRDs (VirtualMachine, DataVolume) | Azure resource types (Microsoft.AzureStackHCI/*) |
VMware Terraform Provider (Current State)
The current VMware estate uses the hashicorp/vsphere Terraform provider. Understanding what the team currently has helps scope the migration effort.
# Current VMware configuration (for migration context)
provider "vsphere" {
user = var.vsphere_user
password = var.vsphere_password
vsphere_server = var.vsphere_server
allow_unverified_ssl = false
}
data "vsphere_datacenter" "dc" {
name = "DC-ZRH-01"
}
data "vsphere_compute_cluster" "cluster" {
name = "Cluster-Prod-01"
datacenter_id = data.vsphere_datacenter.dc.id
}
data "vsphere_datastore" "datastore" {
name = "VSAN-Prod-01"
datacenter_id = data.vsphere_datacenter.dc.id
}
data "vsphere_network" "network" {
name = "VLAN-100-Prod"
datacenter_id = data.vsphere_datacenter.dc.id
}
data "vsphere_virtual_machine" "template" {
name = "templates/rhel9-golden"
datacenter_id = data.vsphere_datacenter.dc.id
}
resource "vsphere_virtual_machine" "vm" {
name = "rhel9-appserver-01"
resource_pool_id = data.vsphere_compute_cluster.cluster.resource_pool_id
datastore_id = data.vsphere_datastore.datastore.id
num_cpus = 4
memory = 8192
guest_id = data.vsphere_virtual_machine.template.guest_id
network_interface {
network_id = data.vsphere_network.network.id
adapter_type = data.vsphere_virtual_machine.template.network_interface_types[0]
}
disk {
label = "disk0"
size = 100
thin_provisioned = true
}
clone {
template_uuid = data.vsphere_virtual_machine.template.id
customize {
linux_options {
host_name = "rhel9-appserver-01"
domain = "internal.example.com"
}
network_interface {
ipv4_address = "10.100.1.50"
ipv4_netmask = 24
}
ipv4_gateway = "10.100.1.1"
}
}
}
Migration from vSphere provider to KubeVirt / Azure provider:
The migration is not a find-and-replace exercise. The resource models are fundamentally different:
- `vsphere_virtual_machine` is a monolithic resource that encapsulates CPU, memory, disk, network, and guest customization in a single resource block. `kubevirt_virtual_machine` follows the same pattern but with Kubernetes-native field names and structures. The mapping is conceptually straightforward but syntactically different.
- VMware data sources (`vsphere_datacenter`, `vsphere_compute_cluster`, `vsphere_datastore`) have no direct equivalents in KubeVirt. Kubernetes uses namespaces, StorageClasses, and NetworkAttachmentDefinitions instead.
- VMware guest customization (hostname, IP address, domain join) is built into the VM resource. In KubeVirt, this is handled by cloud-init or sysprep as a separate volume.
Migration strategy:
- Inventory all existing Terraform-managed VMware resources.
- Write equivalent KubeVirt or Azure HCL configurations for each resource.
- Use MTV (Migration Toolkit for Virtualization) to migrate the VM data (disks, configurations).
- Import the migrated VMs into the new Terraform state using `terraform import`.
- Run `terraform plan` to verify that the imported state matches the new HCL. Resolve any drift.
- Decommission the old VMware Terraform configurations.
This is a labor-intensive process. Budget for it explicitly.
Crossplane as a Kubernetes-Native Alternative to Terraform
Crossplane is a CNCF project that brings the Terraform-style declarative resource management model into Kubernetes itself. Instead of running Terraform from a CI/CD pipeline or a developer workstation, Crossplane runs as a set of controllers inside the Kubernetes cluster and manages infrastructure through Custom Resources.
Crossplane Architecture
+====================================================================+
| Kubernetes Cluster (same cluster as OVE, or a dedicated mgmt |
| cluster) |
| |
| +-------------------------------+ |
| | Crossplane Core Controllers | |
| | - Package manager | Watches Crossplane CRDs |
| | - Composition engine | and reconciles external |
| | - Provider reconcilers | resources via provider APIs |
| +-------------------------------+ |
| |
| +-------------------------------+ +----------------------------+ |
| | Provider: provider-kubernetes | | Provider: provider-azure | |
| | Manages K8s resources | | Manages Azure resources | |
| | (VMs, DataVolumes, etc.) | | (HCI VMs, networks, etc.) | |
| +-------------------------------+ +----------------------------+ |
| |
| Custom Resources: |
| +-------------------------------+ |
| | kind: VirtualMachine | Crossplane XRD (Composite |
| | apiVersion: infra.example/v1 | Resource Definition) that |
| | spec: | abstracts the platform- |
| | cpu: 4 | specific details behind a |
| | memory: 8Gi | unified API |
| | image: rhel9 | |
| +-------------------------------+ |
+====================================================================+
| |
v v
+------------------+ +--------------------+
| KubeVirt API | | Azure ARM API |
| (on-prem cluster)| | (cloud control |
| | | plane) |
+------------------+ +--------------------+
Why Crossplane matters for this evaluation:
- Platform abstraction. Crossplane's Composite Resource Definitions (XRDs) allow the platform team to define a unified VirtualMachine API that works across both OVE and Azure Local. Application teams request VMs through a platform-agnostic abstraction; the Crossplane composition translates the request into the correct provider-specific resources (a sketch follows below).
- GitOps-native. Crossplane resources are Kubernetes CRDs, which means they can be managed by ArgoCD or Flux just like any other Kubernetes resource. This eliminates the need for a separate Terraform state backend and CI/CD pipeline.
- No external state. The Kubernetes API server (etcd) is the state store. There is no `terraform.tfstate` file to manage, lock, or back up.
- Continuous reconciliation. Unlike Terraform (which only reconciles on apply), Crossplane controllers continuously reconcile the desired state with reality. If someone manually deletes a VM, Crossplane recreates it.
Tradeoff: Crossplane adds complexity to the Kubernetes cluster (more controllers, more CRDs, more RBAC to manage). It is a good fit for organizations that are all-in on Kubernetes; it is a poor fit if the team wants to keep IaC separate from the runtime platform.
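To make the abstraction concrete, a claim submitted by an application team might look like the following minimal sketch. The API group (infra.example/v1), kind, and spec fields are assumptions defined by a hypothetical platform XRD; compositionSelector is the standard Crossplane mechanism for choosing which composition (KubeVirt or Azure Local) satisfies the claim.
# Hypothetical claim against a platform-defined XRD (not a Crossplane built-in)
apiVersion: infra.example/v1
kind: VirtualMachineClaim
metadata:
  name: rhel9-appserver-01
  namespace: vm-workloads
spec:
  cpu: 4
  memory: 8Gi
  image: rhel9
  compositionSelector:
    matchLabels:
      platform: kubevirt   # a composition targeting azure-local could be selected instead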
2. Ansible Modules for VM Management
Ansible Fundamentals
Ansible is an agentless automation engine. It connects to target systems over SSH (Linux) or WinRM (Windows), executes modules to achieve a desired state, and reports results. Unlike Terraform (which is purely declarative and focused on provisioning), Ansible is procedural-first (playbooks execute tasks in order) with idempotent modules that can be used declaratively.
Core concepts:
- Inventory: A list of hosts (or, for KubeVirt, a connection to the Kubernetes API) that Ansible manages. Can be static (INI/YAML files) or dynamic (scripts or plugins that query an API).
- Playbook: A YAML file defining a sequence of tasks to execute. Playbooks are the unit of automation.
- Role: A reusable, parameterized bundle of tasks, templates, handlers, and variables. Roles are how teams share standardized automation.
- Collection: A distribution format for Ansible content (modules, roles, plugins). Collections are installed via `ansible-galaxy`. Key collections for this evaluation: `kubernetes.core`, `kubevirt.core`, `community.windows`, `community.vmware`.
- Module: A unit of work (e.g., "ensure this VM exists with 4 CPUs", "ensure this package is installed"). Modules are idempotent -- running them twice produces the same result.
- Idempotency: The guarantee that applying the same playbook multiple times has the same effect as applying it once. A module that creates a VM will skip creation if the VM already exists. This is what makes Ansible suitable for declarative-style infrastructure management despite its procedural execution model.
Ansible Execution Model
+==================================================================+
| Control Node (where ansible-playbook runs) |
| |
| +----------------------------+ |
| | ansible-playbook site.yml | |
| +----------------------------+ |
| | |
| v |
| +----------------------------+ |
| | Parse playbook YAML | |
| | Resolve roles, variables | |
| | Build task execution plan | |
| +----------------------------+ |
| | |
| v |
| +----------------------------+ |
| | For each host in inventory | |
| | For each task: | |
| | 1. Generate module | |
| | code + arguments | |
| | 2. Transfer to host | <--- SSH (Linux) or |
| | (via SSH/WinRM) | WinRM (Windows) |
| | 3. Execute module | |
| | 4. Collect result JSON | |
| | 5. Report changed/ok/ | |
| | failed | |
| +----------------------------+ |
| | |
| v |
| Results: |
| ok=12 changed=3 unreachable=0 failed=0 |
+==================================================================+
For KubeVirt management, the "host" is localhost (the control node),
and modules communicate with the Kubernetes API server via kubeconfig
instead of SSH to individual VMs.
Ansible vs. Terraform -- when to use which:
IaC Tool Decision Matrix
+------------------------------------------------------------------+
| Provisioning new infrastructure? |
| (VMs, networks, storage, DNS) |
| |
| YES --> Terraform / OpenTofu / Crossplane |
| Declarative, state-tracked, plan-before-apply |
| Good at: creating, modifying, destroying infra |
| Bad at: configuring software inside VMs |
| |
| Configuring existing infrastructure? |
| (OS patching, package install, config files, certificates) |
| |
| YES --> Ansible |
| Procedural + idempotent, agentless, SSH-based |
| Good at: OS config, app deployment, day-2 operations |
| Bad at: tracking infrastructure state, destroy lifecycle|
| |
| Both? |
| |
| YES --> Terraform for provisioning + Ansible for configuration |
| Terraform creates the VM, Ansible configures the OS |
| Common pattern: Terraform outputs VM IP, Ansible uses |
| it as dynamic inventory |
+------------------------------------------------------------------+
For KubeVirt/OVE, the boundary blurs. Ansible's kubevirt.core collection can both provision VMs (create VirtualMachine CRs) and configure guest OSes (via SSH after the VM boots). A single playbook can do both. Whether to use Terraform or Ansible for provisioning is a team preference and organizational standard, not a technical requirement.
KubeVirt Ansible
Two Ansible collections are relevant for managing KubeVirt VMs:
- `kubernetes.core` -- the general-purpose Kubernetes collection. Its `k8s` module can create, update, and delete any Kubernetes resource, including KubeVirt CRDs. This is the "raw manifest" approach.
- `kubevirt.core` -- a KubeVirt-specific collection with purpose-built modules:
  - `kubevirt_vm` -- manage VirtualMachine CRs
  - `kubevirt_vmi` -- manage VirtualMachineInstance CRs (for direct VMI manipulation, rare in practice)
  - inventory plugin -- dynamically discover running VMs as Ansible inventory hosts
Installation:
# Install required collections
ansible-galaxy collection install kubernetes.core kubevirt.core
# Verify
ansible-galaxy collection list | grep -E "kubernetes|kubevirt"
# kubernetes.core 3.2.0
# kubevirt.core 1.5.0
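For repeatable runs (and, later, for AAP execution environments), pin the collection versions in a requirements file. A minimal sketch, with illustrative version numbers:
# file: collections/requirements.yml -- pin collection versions so every
# engineer and pipeline runs the same code (versions shown are examples)
collections:
  - name: kubernetes.core
    version: "3.2.0"
  - name: kubevirt.core
    version: "1.5.0"
  - name: ansible.posix
    version: "1.5.4"

# Install from the pinned file:
# ansible-galaxy collection install -r collections/requirements.yml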
Full playbook example -- VM provisioning, configuration, and lifecycle operations:
This is a complete, production-oriented playbook that demonstrates end-to-end VM management with KubeVirt.
# file: playbooks/provision-appserver.yml
# Purpose: Provision a RHEL 9 VM on KubeVirt, wait for it to boot,
# configure the guest OS, and verify readiness.
#
# Usage:
# ansible-playbook playbooks/provision-appserver.yml \
# -e vm_name=rhel9-appserver-01 \
# -e namespace=vm-workloads \
# -e cpu_cores=4 \
# -e memory=8Gi \
# -e disk_size=100Gi
---
- name: Provision KubeVirt VM
hosts: localhost
connection: local
gather_facts: false
vars:
vm_name: "rhel9-appserver-01"
namespace: "vm-workloads"
cpu_cores: 4
memory: "8Gi"
disk_size: "100Gi"
storage_class: "ocs-storagecluster-ceph-rbd"
golden_image_pvc: "rhel9-golden-20240601"
golden_image_namespace: "golden-images"
network_name: "vlan-100-prod"
ssh_public_key: "{{ lookup('file', '~/.ssh/id_ed25519.pub') }}"
tasks:
# ---------------------------------------------------------------
# Step 1: Ensure namespace exists
# ---------------------------------------------------------------
- name: Create namespace if it does not exist
kubernetes.core.k8s:
state: present
definition:
apiVersion: v1
kind: Namespace
metadata:
name: "{{ namespace }}"
labels:
managed-by: ansible
migration-tier: premium
# ---------------------------------------------------------------
# Step 2: Create the DataVolume (clone from golden image)
# ---------------------------------------------------------------
- name: Create root disk DataVolume
kubernetes.core.k8s:
state: present
definition:
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
name: "{{ vm_name }}-rootdisk"
namespace: "{{ namespace }}"
labels:
app.kubernetes.io/name: "{{ vm_name }}"
app.kubernetes.io/managed-by: ansible
spec:
source:
pvc:
name: "{{ golden_image_pvc }}"
namespace: "{{ golden_image_namespace }}"
pvc:
accessModes:
- ReadWriteMany
resources:
requests:
storage: "{{ disk_size }}"
storageClassName: "{{ storage_class }}"
- name: Wait for DataVolume to complete cloning
kubernetes.core.k8s_info:
api_version: cdi.kubevirt.io/v1beta1
kind: DataVolume
name: "{{ vm_name }}-rootdisk"
namespace: "{{ namespace }}"
register: dv_status
until: >-
dv_status.resources | length > 0 and
dv_status.resources[0].status.phase | default('') == 'Succeeded'
retries: 60
delay: 10
# ---------------------------------------------------------------
# Step 3: Create the VirtualMachine
# ---------------------------------------------------------------
- name: Create VirtualMachine
kubevirt.core.kubevirt_vm:
state: present
name: "{{ vm_name }}"
namespace: "{{ namespace }}"
labels:
app.kubernetes.io/name: "{{ vm_name }}"
app.kubernetes.io/managed-by: ansible
env: production
running: true
spec:
domain:
cpu:
cores: "{{ cpu_cores }}"
sockets: 1
threads: 1
resources:
requests:
memory: "{{ memory }}"
limits:
memory: "{{ memory }}"
devices:
disks:
- name: rootdisk
disk:
bus: virtio
- name: cloudinit
disk:
bus: virtio
interfaces:
- name: prod-net
bridge: {}
rng: {}
networks:
- name: prod-net
multus:
networkName: "{{ network_name }}"
volumes:
- name: rootdisk
dataVolume:
name: "{{ vm_name }}-rootdisk"
- name: cloudinit
cloudInitNoCloud:
userData: |
#cloud-config
hostname: {{ vm_name }}
fqdn: {{ vm_name }}.internal.example.com
manage_etc_hosts: true
users:
- name: sysadmin
sudo: ALL=(ALL) NOPASSWD:ALL
shell: /bin/bash
ssh_authorized_keys:
- {{ ssh_public_key }}
packages:
- qemu-guest-agent
- python3
runcmd:
- systemctl enable --now qemu-guest-agent
# ---------------------------------------------------------------
# Step 4: Wait for VM to be running and guest agent to report
# ---------------------------------------------------------------
- name: Wait for VMI to reach Running phase
kubernetes.core.k8s_info:
api_version: kubevirt.io/v1
kind: VirtualMachineInstance
name: "{{ vm_name }}"
namespace: "{{ namespace }}"
register: vmi_status
until: >-
vmi_status.resources | length > 0 and
vmi_status.resources[0].status.phase | default('') == 'Running'
retries: 30
delay: 10
- name: Wait for guest agent to report IP address
kubernetes.core.k8s_info:
api_version: kubevirt.io/v1
kind: VirtualMachineInstance
name: "{{ vm_name }}"
namespace: "{{ namespace }}"
register: vmi_info
until: >-
vmi_info.resources[0].status.interfaces | default([]) | length > 0 and
vmi_info.resources[0].status.interfaces[0].ipAddress | default('') != ''
retries: 30
delay: 10
- name: Extract VM IP address
ansible.builtin.set_fact:
vm_ip: "{{ vmi_info.resources[0].status.interfaces[0].ipAddress }}"
- name: Display VM information
ansible.builtin.debug:
msg: |
VM provisioned successfully:
Name: {{ vm_name }}
Namespace: {{ namespace }}
IP: {{ vm_ip }}
CPU: {{ cpu_cores }} cores
Memory: {{ memory }}
Disk: {{ disk_size }}
# ---------------------------------------------------------------
# Step 5: Add the VM to in-memory inventory for configuration
# ---------------------------------------------------------------
- name: Add VM to runtime inventory
ansible.builtin.add_host:
name: "{{ vm_ip }}"
groups: new_vms
ansible_user: sysadmin
ansible_ssh_private_key_file: "~/.ssh/id_ed25519"
ansible_ssh_common_args: "-o StrictHostKeyChecking=no"
# ===================================================================
# Play 2: Configure the guest OS
# ===================================================================
- name: Configure VM guest OS
hosts: new_vms
become: true
gather_facts: true
tasks:
- name: Wait for SSH to become available
ansible.builtin.wait_for_connection:
delay: 10
timeout: 300
- name: Gather facts after connection
ansible.builtin.setup:
- name: Update all packages
ansible.builtin.dnf:
name: "*"
state: latest
register: pkg_update
- name: Install standard tooling
ansible.builtin.dnf:
name:
- vim
- tmux
- htop
- net-tools
- bind-utils
- lsof
- strace
- tcpdump
- chrony
state: present
- name: Configure chrony for NTP
ansible.builtin.copy:
dest: /etc/chrony.conf
content: |
server ntp1.internal.example.com iburst
server ntp2.internal.example.com iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
owner: root
group: root
mode: "0644"
notify: restart chrony
- name: Configure sysctl tuning
ansible.posix.sysctl:
name: "{{ item.key }}"
value: "{{ item.value }}"
sysctl_set: true
reload: true
loop:
- { key: "net.core.somaxconn", value: "65535" }
- { key: "vm.swappiness", value: "10" }
- { key: "net.ipv4.tcp_max_syn_backlog", value: "65535" }
- { key: "fs.file-max", value: "2097152" }
- name: Ensure firewalld is running
ansible.builtin.systemd:
name: firewalld
state: started
enabled: true
- name: Open application port
ansible.posix.firewalld:
port: 8443/tcp
permanent: true
state: enabled
immediate: true
- name: Verification -- display system info
ansible.builtin.debug:
msg: |
Configuration complete:
Hostname: {{ ansible_hostname }}
OS: {{ ansible_distribution }} {{ ansible_distribution_version }}
Kernel: {{ ansible_kernel }}
CPUs: {{ ansible_processor_vcpus }}
Memory: {{ ansible_memtotal_mb }} MB
IP: {{ ansible_default_ipv4.address | default('N/A') }}
Updates: {{ pkg_update.results | default([]) | length }} packages updated
handlers:
- name: restart chrony
ansible.builtin.systemd:
name: chronyd
state: restarted
Day-2 operations playbook -- patching, scaling, certificates:
# file: playbooks/day2-operations.yml
# Purpose: Common day-2 lifecycle operations for KubeVirt VMs
#
# Usage (patching):
# ansible-playbook playbooks/day2-operations.yml \
# --tags patch -e target_vms=vm-workloads
#
# Usage (scale CPU):
# ansible-playbook playbooks/day2-operations.yml \
# --tags scale -e vm_name=rhel9-appserver-01 \
# -e namespace=vm-workloads -e new_cpu_cores=8
---
# --- Tag: patch -- Patch guest OS packages ---
- name: "Day-2: Patch guest OS"
hosts: "{{ target_vms | default('all') }}"
become: true
tags: [patch]
tasks:
- name: Update all packages
ansible.builtin.dnf:
name: "*"
state: latest
register: patch_result
- name: Display patch results
ansible.builtin.debug:
msg: "{{ patch_result.results | default([]) | length }} packages updated"
    # /var/run/reboot-required is a Debian/Ubuntu convention; on RHEL use
    # needs-restarting (provided by the yum-utils/dnf-utils package).
    - name: Check if a reboot is required (RHEL)
      ansible.builtin.command:
        cmd: needs-restarting -r
      register: reboot_check
      changed_when: false
      failed_when: reboot_check.rc not in [0, 1]
    - name: Reboot if required (with wait)
      ansible.builtin.reboot:
        msg: "Rebooting for kernel update"
        reboot_timeout: 600
      when: reboot_check.rc == 1
# --- Tag: scale -- Scale VM CPU/Memory via KubeVirt API ---
- name: "Day-2: Scale VM resources"
hosts: localhost
connection: local
gather_facts: false
tags: [scale]
vars:
vm_name: ""
namespace: "vm-workloads"
new_cpu_cores: 4
new_memory: "8Gi"
tasks:
- name: Patch VirtualMachine CPU and memory
kubernetes.core.k8s:
state: present
merge_type: merge
definition:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
name: "{{ vm_name }}"
namespace: "{{ namespace }}"
spec:
template:
spec:
domain:
cpu:
cores: "{{ new_cpu_cores }}"
resources:
requests:
memory: "{{ new_memory }}"
limits:
memory: "{{ new_memory }}"
    # Note: For VMs with LiveUpdate enabled and maxSockets configured,
    # CPU hot-plug applies without a restart. Otherwise the VM must be
    # restarted. The k8s module cannot call the restart subresource, so
    # the simplest option is virtctl (installed on the control node).
    - name: Restart VM to apply changes (if hot-plug not available)
      ansible.builtin.command:
        cmd: "virtctl restart {{ vm_name }} -n {{ namespace }}"
      changed_when: true
# --- Tag: certs -- Rotate TLS certificates ---
- name: "Day-2: Rotate application certificates"
hosts: "{{ target_vms | default('all') }}"
become: true
tags: [certs]
vars:
cert_source_dir: "files/certs"
cert_dest_dir: "/etc/pki/tls"
tasks:
- name: Copy new certificate
ansible.builtin.copy:
src: "{{ cert_source_dir }}/app.crt"
dest: "{{ cert_dest_dir }}/certs/app.crt"
owner: root
group: root
mode: "0644"
notify: reload application
- name: Copy new private key
ansible.builtin.copy:
src: "{{ cert_source_dir }}/app.key"
dest: "{{ cert_dest_dir }}/private/app.key"
owner: root
group: root
mode: "0600"
notify: reload application
- name: Verify certificate validity
ansible.builtin.command:
cmd: >-
openssl x509 -in {{ cert_dest_dir }}/certs/app.crt
-noout -dates -subject
register: cert_info
changed_when: false
- name: Display certificate info
ansible.builtin.debug:
msg: "{{ cert_info.stdout }}"
handlers:
- name: reload application
ansible.builtin.systemd:
name: myapp
state: reloaded
KubeVirt dynamic inventory plugin:
The kubevirt.core collection includes an inventory plugin that discovers running VMs and makes them available as Ansible hosts. This eliminates the need to maintain a static inventory file.
# file: inventory/kubevirt.yml
# Ansible dynamic inventory for KubeVirt VMs
plugin: kubevirt.core.kubevirt
connections:
- namespaces:
- vm-workloads
- vm-development
# Filter: only VMs with the "ansible-managed: true" label
label_selector: "ansible-managed=true"
# Network interface to use for SSH connection
network_name: default
# Use the guest agent-reported IP address
use_service: false
# Map KubeVirt labels to Ansible groups
compose:
ansible_host: >-
status.interfaces[0].ipAddress
ansible_user: "'sysadmin'"
ansible_ssh_private_key_file: "'~/.ssh/id_ed25519'"
# Group VMs by labels
keyed_groups:
- key: labels['env']
prefix: env
separator: "_"
- key: labels['app.kubernetes.io/part-of']
prefix: app
separator: "_"
# Test the dynamic inventory
ansible-inventory -i inventory/kubevirt.yml --list
# Use in a playbook
ansible-playbook -i inventory/kubevirt.yml playbooks/day2-operations.yml --tags patch
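The groups produced by keyed_groups can be used as targets directly; for example (group names follow the env_<label> pattern defined above):
# Ping only VMs carrying the label env=production
ansible -i inventory/kubevirt.yml env_production -m ansible.builtin.ping

# Restrict the patch run to the same group
ansible-playbook -i inventory/kubevirt.yml playbooks/day2-operations.yml \
  --tags patch --limit env_production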
Hyper-V Ansible
Managing Hyper-V / Azure Local VMs with Ansible requires the community.windows collection and WinRM connectivity to the Hyper-V hosts.
Prerequisites:
- WinRM must be enabled and configured on all Hyper-V hosts (HTTPS preferred).
- A service account with Hyper-V operator privileges.
- The `pywinrm` Python package installed on the Ansible control node.
# file: inventory/hyperv-hosts.yml
all:
children:
hyperv_hosts:
hosts:
hv-node-01.internal.example.com:
ansible_connection: winrm
ansible_winrm_transport: credssp
ansible_winrm_server_cert_validation: validate
ansible_port: 5986
hv-node-02.internal.example.com:
ansible_connection: winrm
ansible_winrm_transport: credssp
ansible_winrm_server_cert_validation: validate
ansible_port: 5986
vars:
ansible_user: "DOMAIN\\svc-ansible"
ansible_password: "{{ vault_hyperv_password }}"
# file: playbooks/hyperv-provision-vm.yml
# Provision a VM on Hyper-V using the community.windows collection
---
- name: Provision Hyper-V VM
hosts: hyperv_hosts[0]
gather_facts: false
vars:
vm_name: "rhel9-appserver-01"
vm_cpu: 4
vm_memory_mb: 8192
vm_disk_path: "C:\\ClusterStorage\\Volume1\\VMs\\{{ vm_name }}"
vm_vhdx_size_bytes: 107374182400 # 100 GB
vm_switch: "Prod-vSwitch"
iso_path: "C:\\ISOs\\rhel-9.3-x86_64-dvd.iso"
tasks:
- name: Create VM directory
ansible.windows.win_file:
path: "{{ vm_disk_path }}"
state: directory
- name: Create VHDX disk
community.windows.win_powershell:
script: |
$vhdx = "{{ vm_disk_path }}\\{{ vm_name }}.vhdx"
if (-not (Test-Path $vhdx)) {
New-VHD -Path $vhdx -SizeBytes {{ vm_vhdx_size_bytes }} -Dynamic
Write-Output "created"
} else {
Write-Output "exists"
}
register: vhdx_result
- name: Create Hyper-V VM
community.windows.win_powershell:
script: |
$vm = Get-VM -Name "{{ vm_name }}" -ErrorAction SilentlyContinue
if (-not $vm) {
New-VM -Name "{{ vm_name }}" `
-MemoryStartupBytes {{ vm_memory_mb }}MB `
-VHDPath "{{ vm_disk_path }}\\{{ vm_name }}.vhdx" `
-SwitchName "{{ vm_switch }}" `
-Generation 2
Set-VM -Name "{{ vm_name }}" `
-ProcessorCount {{ vm_cpu }} `
-DynamicMemory `
-MemoryMinimumBytes 2GB `
-MemoryMaximumBytes {{ vm_memory_mb }}MB
# Enable Secure Boot with Microsoft UEFI CA
Set-VMFirmware -VMName "{{ vm_name }}" `
-SecureBootTemplate MicrosoftUEFICertificateAuthority
# Attach ISO for installation
Add-VMDvdDrive -VMName "{{ vm_name }}" `
-Path "{{ iso_path }}"
Write-Output "created"
} else {
Write-Output "exists"
}
register: vm_create_result
- name: Start VM
community.windows.win_powershell:
script: |
Start-VM -Name "{{ vm_name }}"
when: vm_create_result.output[0] == "created"
Note: The community.windows collection does not have a dedicated win_hyperv_guest module with full idempotency guarantees comparable to the kubevirt.core.kubevirt_vm module. Most Hyper-V operations require using win_powershell with custom PowerShell scripts and manual idempotency checks (the if (-not $vm) pattern above). This is a significant maturity gap compared to the KubeVirt Ansible integration.
For Azure Local specifically, Microsoft recommends using the Azure CLI (az stack-hci vm create) or ARM templates rather than direct Hyper-V PowerShell commands, because the VMs must be registered as Arc resources. Ansible can invoke Azure CLI commands via ansible.builtin.command or use the azure.azcollection collection for ARM-level operations.
VMware Ansible (Current State)
The current VMware estate uses the community.vmware collection, which provides mature, well-tested modules:
| Module | Purpose |
|---|---|
| `community.vmware.vmware_guest` | Create, manage, and delete VMs |
| `community.vmware.vmware_guest_disk` | Manage VM disks |
| `community.vmware.vmware_guest_network` | Manage VM network adapters |
| `community.vmware.vmware_guest_powerstate` | Control VM power state |
| `community.vmware.vmware_guest_snapshot` | Manage VM snapshots |
| `community.vmware.vmware_vmotion` | Trigger vMotion migrations |
| `community.vmware.vmware_cluster_info` | Query cluster information |
The community.vmware collection is one of the most mature Ansible collections. Any replacement must provide comparable module coverage and idempotency guarantees. As of 2026, the kubevirt.core collection is functional but has fewer modules. The community.windows collection for Hyper-V is even thinner. This gap must be factored into the migration timeline.
Ansible Automation Platform (AAP)
For an enterprise with 5,000+ VMs, running ansible-playbook from a developer's laptop is not a governance-compliant operating model. Red Hat's Ansible Automation Platform (AAP) -- formerly Ansible Tower -- provides:
- Web UI and REST API for launching playbooks and viewing results.
- Role-Based Access Control (RBAC): Define who can run which playbooks against which inventory groups. Critical for separation-of-duties requirements in financial services.
- Credential management: Store SSH keys, kubeconfig files, Azure service principal secrets, and vCenter passwords in an encrypted credential store. Playbooks never see plaintext secrets.
- Approval workflows: Require a second person to approve a playbook run before it executes. Mandatory for production changes in regulated environments.
- Job scheduling: Run playbooks on a cron schedule (e.g., nightly patching, weekly compliance scans).
- Execution environments: Containerized Ansible execution with pinned collection versions and Python dependencies. Eliminates "works on my laptop" problems.
- Audit trail: Every job run is logged with who initiated it, what parameters were used, what changed, and the full output. This log is the compliance artifact.
Ansible Automation Platform Architecture
+====================================================================+
| Ansible Automation Platform (AAP) |
| |
| +-----------------------------+ |
| | Automation Controller | Web UI + REST API |
| | (formerly Tower) | RBAC, credential store, |
| | | job scheduling, approvals |
| +-------------+---------------+ |
| | |
| v |
| +-----------------------------+ |
| | Execution Environments | Containerized Ansible |
| | (container images with | execution with pinned |
| | collections + Python deps) | dependencies |
| +-------------+---------------+ |
| | |
+====================================================================+
|
+----------+----------+------------------+
| | |
v v v
+------------+ +------------+ +------------+
| KubeVirt | | Hyper-V | | VM Guest |
| API Server | | Hosts | | OS (SSH) |
| (kubeconfig)| | (WinRM) | | |
+------------+ +------------+ +------------+
AAP is particularly relevant for the OVE evaluation because Red Hat bundles AAP with OpenShift Platform Plus. If the organization selects OVE, AAP becomes a natural fit for VM lifecycle automation -- provisioning VMs through KubeVirt, configuring guest OSes via SSH, and managing day-2 operations, all through a single automation platform with audit logging and approval workflows.
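Because the controller exposes everything through its REST API, CI/CD pipelines can trigger the same governed job templates instead of running ansible-playbook directly. A sketch (controller hostname, job template ID, and token are placeholders):
# Launch an AAP job template from a pipeline
curl -s -X POST \
  -H "Authorization: Bearer $CONTROLLER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"extra_vars": {"vm_name": "rhel9-appserver-01", "namespace": "vm-workloads"}}' \
  "https://aap.internal.example.com/api/v2/job_templates/42/launch/"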
GitOps with ArgoCD/Flux: The Kubernetes-Native IaC Model
For organizations adopting OVE, GitOps is potentially the most natural IaC model because KubeVirt VMs are Kubernetes-native Custom Resources. Instead of running Terraform from a pipeline or Ansible from AAP, you store VirtualMachine manifests in a Git repository and let a GitOps controller (ArgoCD or Flux) synchronize them to the cluster.
What is GitOps?
GitOps is an operational model where:
- The desired state of all infrastructure is declared in Git (the single source of truth).
- A controller running in the cluster continuously compares the desired state (Git) with the actual state (cluster).
- When drift is detected, the controller automatically reconciles (applies the changes from Git to the cluster).
- All changes go through Git -- pull requests, code review, approval, merge. No direct `kubectl apply`.
GitOps Sync Loop
+============================================================+
| |
| Git Repository |
| (e.g., GitLab, GitHub) |
| |
| repo: infra-gitops/vm-workloads |
| +------------------------------------------------------+ |
| | main branch | |
| | | |
| | vm-workloads/ | |
| | rhel9-appserver-01.yaml (VirtualMachine CR) | |
| | rhel9-appserver-02.yaml (VirtualMachine CR) | |
| | rhel9-database-01.yaml (VirtualMachine CR) | |
| | network-policy.yaml (NetworkPolicy) | |
| | resource-quota.yaml (ResourceQuota) | |
| | kustomization.yaml (Kustomize overlay) | |
| +------------------------------------------------------+ |
| |
+=======================+====================================+
|
1. ArgoCD polls Git repo
(or webhook triggers sync)
|
v
+=======================+====================================+
| Kubernetes Cluster (OVE) |
| |
| +------------------------------------------------------+ |
| | ArgoCD Controller | |
| | | |
| | 2. Compare Git manifests with live cluster state | |
| | | |
| | 3a. If in sync --> no action (healthy) | |
| | 3b. If out of sync --> apply diff to cluster | |
| | (create/update/delete resources) | |
| | | |
| | 4. Report sync status back to ArgoCD UI / Git | |
| +------------------------------------------------------+ |
| |
| +------------------------------------------------------+ |
| | Live Resources | |
| | VirtualMachine: rhel9-appserver-01 (Running) | |
| | VirtualMachine: rhel9-appserver-02 (Running) | |
| | VirtualMachine: rhel9-database-01 (Running) | |
| +------------------------------------------------------+ |
| |
+============================================================+
How ArgoCD syncs VirtualMachine manifests from Git to the cluster:
ArgoCD works with any Kubernetes resource, including KubeVirt CRDs. No special configuration is needed to manage VirtualMachines -- ArgoCD treats them like any other CR.
Step 1: Store VM manifests in Git.
# file: vm-workloads/base/rhel9-appserver-01.yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
name: rhel9-appserver-01
namespace: vm-workloads
labels:
app.kubernetes.io/name: rhel9-appserver-01
app.kubernetes.io/managed-by: argocd
env: production
annotations:
argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
runStrategy: Always
template:
metadata:
labels:
kubevirt.io/domain: rhel9-appserver-01
spec:
domain:
cpu:
cores: 4
sockets: 1
threads: 1
resources:
requests:
memory: 8Gi
limits:
memory: 8Gi
devices:
disks:
- name: rootdisk
disk:
bus: virtio
- name: cloudinit
disk:
bus: virtio
interfaces:
- name: prod-net
bridge: {}
rng: {}
networks:
- name: prod-net
multus:
networkName: vlan-100-prod
volumes:
- name: rootdisk
dataVolume:
name: rhel9-appserver-01-rootdisk
- name: cloudinit
cloudInitNoCloud:
userData: |
#cloud-config
hostname: rhel9-appserver-01
users:
- name: sysadmin
sudo: ALL=(ALL) NOPASSWD:ALL
ssh_authorized_keys:
- ssh-ed25519 AAAA... admin@example.com
packages:
- qemu-guest-agent
runcmd:
- systemctl enable --now qemu-guest-agent
---
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
name: rhel9-appserver-01-rootdisk
namespace: vm-workloads
spec:
source:
pvc:
name: rhel9-golden-20240601
namespace: golden-images
pvc:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 100Gi
storageClassName: ocs-storagecluster-ceph-rbd
Step 2: Use Kustomize for environment-specific overlays.
# file: vm-workloads/base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- rhel9-appserver-01.yaml
- rhel9-appserver-02.yaml
- rhel9-database-01.yaml
commonLabels:
managed-by: argocd
team: platform-engineering
# file: vm-workloads/overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base
namespace: vm-workloads-prod
patches:
- target:
kind: VirtualMachine
patch: |
- op: replace
path: /spec/template/spec/domain/resources/requests/memory
value: 16Gi
- op: replace
path: /spec/template/spec/domain/resources/limits/memory
value: 16Gi
# file: vm-workloads/overlays/staging/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base
namespace: vm-workloads-staging
patches:
- target:
kind: VirtualMachine
patch: |
- op: replace
path: /spec/template/spec/domain/cpu/cores
value: 2
- op: replace
path: /spec/template/spec/domain/resources/requests/memory
value: 4Gi
- op: replace
path: /spec/template/spec/domain/resources/limits/memory
value: 4Gi
Step 3: Create an ArgoCD Application.
# file: argocd/applications/vm-workloads-prod.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: vm-workloads-prod
namespace: argocd
finalizers:
- resources-finalizer.argocd.argoproj.io
spec:
project: infrastructure
source:
repoURL: https://gitlab.internal.example.com/infra-gitops/vm-workloads.git
targetRevision: main
path: overlays/production
destination:
server: https://kubernetes.default.svc
namespace: vm-workloads-prod
syncPolicy:
automated:
prune: false # Do NOT delete VMs removed from Git (safety)
selfHeal: true # Re-apply if someone manually changes a VM
syncOptions:
- CreateNamespace=true
- ServerSideApply=true
- RespectIgnoreDifferences=true
retry:
limit: 3
backoff:
duration: 30s
factor: 2
maxDuration: 5m
ignoreDifferences:
# Ignore status fields that KubeVirt controllers update
- group: kubevirt.io
kind: VirtualMachine
jsonPointers:
- /status
- group: cdi.kubevirt.io
kind: DataVolume
jsonPointers:
- /status
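The staging overlay from Step 2 needs its own Application. As an alternative to maintaining one Application file per environment by hand, an ArgoCD ApplicationSet with a list generator can stamp them out. A minimal sketch, assuming the same repository and the overlay paths shown above (use either this or the hand-written Application, not both):
# file: argocd/applicationsets/vm-workloads.yaml (illustrative sketch)
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: vm-workloads
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - env: staging
            namespace: vm-workloads-staging
          - env: production
            namespace: vm-workloads-prod
  template:
    metadata:
      # Generates Applications named vm-workloads-staging / vm-workloads-prod
      name: '{{namespace}}'
    spec:
      project: infrastructure
      source:
        repoURL: https://gitlab.internal.example.com/infra-gitops/vm-workloads.git
        targetRevision: main
        path: 'overlays/{{env}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{namespace}}'
      syncPolicy:
        automated:
          prune: false
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
          - ServerSideApply=true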
Key ArgoCD configuration decisions for VMs:
| Decision | Recommendation | Rationale |
|---|---|---|
| Automated sync | Yes, with `selfHeal: true` | Detect and revert manual `kubectl` changes. Enforces Git as the source of truth. |
| Pruning | `prune: false` (initially) | Removing a VM YAML from Git should NOT automatically delete a running production VM. This is a safety measure. Enable selective pruning per resource type after the team gains confidence. |
| Server-side apply | Yes | Required for large CRDs like VirtualMachine. Client-side apply can hit annotation size limits. |
| Ignore differences | Ignore `/status` on VMs and DataVolumes | KubeVirt controllers continuously update the status subresource. ArgoCD would otherwise show perpetual "OutOfSync" status. |
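The pruning decision can also be enforced per resource rather than per Application: ArgoCD honors a `Prune=false` sync-option annotation on individual objects, so even if pruning is later enabled at the Application level, critical VMs stay protected. A minimal sketch (an excerpt of a VM's metadata; the annotation extends the `SkipDryRunOnMissingResource` option already used in the manifest above):
# Excerpt (illustrative): per-resource prune protection on a VirtualMachine.
# Multiple ArgoCD sync options are comma-separated in one annotation value.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: rhel9-database-01
  namespace: vm-workloads
  annotations:
    argocd.argoproj.io/sync-options: Prune=false,SkipDryRunOnMissingResource=true
spec:
  runStrategy: Always
  # ... template unchanged from the base manifest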
GitOps vs. Terraform for VM lifecycle management:
| Aspect | Terraform | GitOps (ArgoCD/Flux) |
|---|---|---|
| Source of truth | `.tf` files + `terraform.tfstate` | Git repository (manifests in YAML) |
| State storage | External backend (S3, Azure Blob, etc.) | Kubernetes API server (etcd) |
| Reconciliation | On-demand (`terraform apply`) | Continuous (controller polls or watches Git) |
| Drift detection | On `terraform plan` | Continuous (controller compares Git vs. live) |
| Drift correction | Manual (`terraform apply`) | Automatic (if `selfHeal: true`) |
| Multi-platform | Excellent (any provider) | Kubernetes-only (CRDs must live in K8s) |
| Learning curve | HCL language, state management | YAML/Kustomize/Helm, ArgoCD configuration |
| Destroy workflow | `terraform destroy` (explicit) | Remove YAML from Git + enable pruning (risky) |
| Imperative operations | Not supported (use `virtctl`/Ansible) | Not supported (use `virtctl`/Ansible) |
| Audit trail | CI/CD pipeline logs | Git commit history (who, when, what, why) |
| Best fit | Multi-cloud, hybrid environments | Kubernetes-native platforms (OVE) |
The GitOps recommendation for OVE:
For an OVE deployment, GitOps is the strongest IaC model because:
- VMs are Kubernetes CRDs -- they are native Git-syncable objects.
- No external state file to manage, lock, or lose.
- The Git commit history is the audit trail. Every VM change is a reviewed, approved pull request.
- Self-healing reverts unauthorized manual changes -- a strong governance control.
- The same ArgoCD instance that manages application deployments also manages VMs -- one operational model.
However, GitOps alone does not cover everything:
- Day-2 guest OS operations (patching, certificate rotation, configuration changes inside the VM) still require Ansible or a similar configuration management tool. ArgoCD manages the VM resource definition; Ansible manages the software inside the VM.
- Imperative lifecycle operations (live migration, console access, snapshot triggers) require `virtctl` or Ansible. These are not declarative operations that can be expressed as a YAML manifest in Git.
- One-off or experimental VMs may be faster to create via `virtctl` or `kubectl` than through a full Git pull request workflow. Consider a "sandbox" namespace excluded from GitOps where the team can experiment freely (see the AppProject sketch after this list).
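One way to enforce that boundary is an ArgoCD AppProject that whitelists only the GitOps-managed namespaces, so a sandbox namespace is simply out of ArgoCD's reach. A minimal sketch, assuming the `infrastructure` project name and the namespaces used above (a hypothetical `vm-sandbox` namespace is excluded by omission):
# file: argocd/projects/infrastructure.yaml (illustrative sketch)
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: infrastructure
  namespace: argocd
spec:
  description: GitOps-managed VM workloads; sandbox namespaces intentionally excluded
  sourceRepos:
    - https://gitlab.internal.example.com/infra-gitops/*
  destinations:
    # Only these namespaces can be sync targets for Applications in this project.
    # A "vm-sandbox" namespace is not listed, so ArgoCD never manages it.
    - server: https://kubernetes.default.svc
      namespace: vm-workloads-prod
    - server: https://kubernetes.default.svc
      namespace: vm-workloads-staging
  clusterResourceWhitelist:
    # Permit namespace creation for the CreateNamespace=true sync option
    - group: ""
      kind: Namespace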
The combined model:
Recommended IaC Architecture for OVE
+====================================================================+
| |
| Git Repository |
| +--------------------------------------------------------------+ |
| | VM Definitions (YAML) | Kustomize Overlays | Helm Charts | |
| +--------------------------------------------------------------+ |
| | |
| Pull Request --> Review --> Approve --> Merge to main |
| | |
+================+===================================================+
|
+--------+--------+
| |
v v
+-------+------+ +------+-------+
| ArgoCD | | AAP / Ansible|
| (GitOps) | | (Day-2 Ops) |
| | | |
| Manages: | | Manages: |
| - VM CRDs | | - Guest OS |
| - DataVolumes| | - Patching |
| - Networks | | - Certs |
| - Quotas | | - Config |
| - Policies | | - Compliance |
+--------------+ +--------------+
| |
v v
+====================================================================+
| OVE Cluster (Kubernetes + KubeVirt) |
| - VirtualMachine CRDs managed by ArgoCD |
| - Guest OS configured by Ansible via SSH |
| - Imperative ops (migrate, console) via virtctl |
+====================================================================+
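On the Ansible side of this split, day-2 work such as rolling OS patching stays outside the Git-synced manifests. A minimal sketch of a batch-patching playbook, assuming the guests are reachable over SSH and grouped as `rhel9_appservers` (via a static inventory or the kubevirt.core dynamic inventory plugin); group name, batch size, and the health check are illustrative:
# file: playbooks/patch-rhel9-appservers.yml (illustrative sketch)
- name: Rolling guest OS patching with health gates
  hosts: rhel9_appservers
  serial: 5                      # patch five guests per batch
  max_fail_percentage: 0         # abort the whole run if any batch fails
  become: true
  tasks:
    - name: Apply all available package updates
      ansible.builtin.dnf:
        name: "*"
        state: latest
      register: patch_result

    - name: Reboot the guest if anything was updated
      ansible.builtin.reboot:
        reboot_timeout: 600
      when: patch_result is changed

    - name: Health check -- guest agent must be active after the reboot
      ansible.builtin.command: systemctl is-active qemu-guest-agent
      changed_when: false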
How the Candidates Handle This
| Capability | VMware (Current) | OVE (KubeVirt) | Azure Local (Hyper-V) | Swisscom ESC |
|---|---|---|---|---|
| Terraform Provider | `hashicorp/vsphere` -- mature, full-featured, widely used. Covers VMs, networks, storage, clusters, resource pools. | `kubevirt/kubevirt` -- community-maintained, covers VMs and DataVolumes. Supplement with `kubernetes_manifest` for other CRDs. Less mature. | `hashicorp/azurerm` + `Azure/azapi` -- Microsoft-backed, evolving HCI resource coverage. Strong for Azure-native operations. | Depends on Swisscom API exposure. If VMware-based, uses `hashicorp/vsphere`. If the API is abstracted, may require a custom provider. |
| Ansible Collection | `community.vmware` -- 50+ modules, mature, well-documented. `vmware_guest` is the gold standard for VM management. | `kubevirt.core` + `kubernetes.core` -- functional but fewer modules. `kubevirt_vm` for provisioning, `k8s` for raw manifests. Dynamic inventory plugin available. | `community.windows` -- requires PowerShell scripting for Hyper-V. No dedicated `win_hyperv_guest` module with full idempotency. `azure.azcollection` for ARM-level operations. | Depends on API exposure. Ansible can wrap any REST API via the `ansible.builtin.uri` module. |
| GitOps (ArgoCD/Flux) | Not applicable. VMware resources are not Kubernetes CRDs. PowerCLI/govc scripts can be wrapped but this is not native GitOps. | Native fit. VMs are Kubernetes CRDs. ArgoCD/Flux sync VM manifests from Git. Self-healing, drift detection, audit trail via Git history. Recommended model. | Not natively applicable. Azure Local VMs are ARM resources, not Kubernetes CRDs. Azure GitOps (Flux on Arc-enabled clusters) manages Kubernetes workloads but not Hyper-V VMs. | Not applicable unless Swisscom exposes a Kubernetes-native API. |
| Crossplane | Possible via `provider-terraform` wrapping the vSphere provider, but adds complexity without clear benefit. | Native fit. `provider-kubernetes` manages KubeVirt CRDs. Enables platform abstraction via XRDs. Good for multi-cluster or multi-platform scenarios. | Possible via `provider-azure` for ARM resources. Useful for hybrid OVE + Azure Local environments. | Unlikely unless Swisscom provides a Crossplane-compatible API. |
| State Management | Terraform state file. Remote backend required. State segmentation by cluster/team. | GitOps: Kubernetes etcd (no external state). Terraform: standard state file. Crossplane: Kubernetes etcd. | Terraform state in Azure Blob Storage. ARM template state managed by Azure. | Depends on tooling choice. |
| Enterprise Automation | vRealize Automation (vRA) / Aria Automation. Mature self-service portal with approval workflows, blueprints, catalog. | AAP (Ansible Automation Platform). RBAC, credential management, approval workflows, job scheduling. Bundled with OpenShift Platform Plus. | Azure Automation / Azure DevOps pipelines. ARM templates, Bicep. RBAC via Azure AD. | Swisscom-managed. Self-service capabilities depend on ESC portal features. |
| Offline Operation | Fully offline. vCenter + Terraform run on-premises. | Fully offline. Kubernetes API + ArgoCD + Ansible all run on-premises. Git server can be on-premises (GitLab). | Partially offline. VMs run locally but the ARM control plane requires Azure connectivity. Terraform apply requires Azure AD authentication. | Depends on Swisscom architecture. ESC is a managed service -- likely requires Swisscom network connectivity. |
| IaC Migration Effort | N/A (baseline). | High. Rewrite all Terraform HCL from vSphere to KubeVirt resources. Rewrite Ansible from `community.vmware` to `kubevirt.core`. Build the GitOps repo structure. Estimated: 3-6 months for 5,000+ VMs. | Medium-High. Rewrite Terraform HCL from vSphere to `azurerm`/`azapi`. Rewrite Ansible from `community.vmware` to `community.windows` + PowerShell. Estimated: 3-6 months. | Low (if Swisscom manages IaC) to High (if the customer manages IaC through Swisscom APIs). |
Key Takeaways
- The IaC toolchain is not portable across platforms. Every Terraform configuration, every Ansible playbook, every CI/CD pipeline that touches VMware APIs must be rewritten for the target platform. This is not a configuration change -- it is a development project. Budget for it as a workstream with its own timeline, testing, and rollout plan.
- GitOps is the strongest IaC model for OVE. Because KubeVirt VMs are Kubernetes CRDs, they integrate natively with ArgoCD/Flux. The Git repository becomes the single source of truth. The commit history becomes the audit trail. Self-healing reverts unauthorized changes. No external state file. For a regulated financial institution, the Git-based approval workflow (pull request --> review --> approve --> merge --> auto-sync) is a natural fit for change management processes.
- Terraform remains the right choice for Azure Local. Azure Local VMs are ARM resources managed through the Azure control plane. The `azurerm` and `azapi` Terraform providers are the natural IaC interface. GitOps does not apply because Azure Local VMs are not Kubernetes CRDs. Consider Terraform Cloud or GitLab-managed Terraform state for enterprise state management.
- Ansible bridges the provisioning-configuration gap. Regardless of platform, Ansible is the right tool for day-2 guest OS operations: patching, certificate rotation, configuration management, compliance scanning. The `kubevirt.core` collection enables a single-playbook workflow that provisions the VM via the Kubernetes API and then configures the guest OS via SSH (a minimal sketch of this pattern follows the list). For enterprise deployments, AAP provides the RBAC, credential management, and audit trail that `ansible-playbook` on a laptop does not.
- The KubeVirt Terraform provider is less mature than the vSphere provider. The team currently relies on a battle-tested Terraform provider with comprehensive resource coverage. The KubeVirt provider covers core resources but not the full CRD surface. The `kubernetes_manifest` resource fills gaps but with weaker type safety. If Terraform is the chosen IaC tool for OVE (instead of GitOps), budget for workarounds and monitor provider releases closely.
- OpenTofu mitigates the Terraform license risk. The BSL license change does not affect end-users today, but organizational legal teams may flag it. OpenTofu is a drop-in replacement under the MPL-2.0 license. All examples in this chapter work with both tools. The decision between Terraform and OpenTofu is a legal and strategic question, not a technical one.
- Crossplane is worth evaluating for multi-platform scenarios. If the organization operates both OVE and Azure Local (or plans to), Crossplane's Composite Resource Definitions can provide a unified VM provisioning API that abstracts platform-specific details. This reduces cognitive load for application teams but adds operational complexity to the platform team.
- Offline operation is a differentiator. OVE's IaC stack (GitOps with ArgoCD, Ansible, on-premises Git server) operates fully offline. Azure Local's IaC stack (Terraform with azurerm, Azure AD authentication) requires connectivity to Azure. For financial institutions with strict air-gap or data sovereignty requirements, this is a material consideration.
- The "destroy" workflow deserves special attention. Terraform `destroy` and GitOps pruning both delete VMs permanently. For 5,000+ VMs in production, accidental deletion is a catastrophic risk. Implement safeguards: `prevent_destroy` lifecycle blocks in Terraform, `prune: false` in ArgoCD, deletion protection annotations, and mandatory approval gates before any destructive operation.
- IaC is a team capability, not just a tooling choice. The team must be trained on the new IaC tools, workflows, and operational patterns. A VMware team that has used PowerCLI for a decade will not become proficient in Kustomize overlays and ArgoCD sync policies overnight. Invest in training as part of the migration budget.
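A minimal sketch of the single-playbook pattern referenced above, using `kubernetes.core.k8s` for the raw-manifest path (the `kubevirt.core` modules offer a more typed interface); the file paths, VM name, and DNS-resolvable guest address are assumptions:
# file: playbooks/provision-and-configure.yml (illustrative sketch)
- name: Provision the VirtualMachine through the Kubernetes API
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Apply the Git-tracked VM manifest (VirtualMachine + DataVolume)
      kubernetes.core.k8s:
        state: present
        src: ../vm-workloads/base/rhel9-appserver-01.yaml

    - name: Wait until the VM reports a Ready condition
      kubernetes.core.k8s_info:
        api_version: kubevirt.io/v1
        kind: VirtualMachine
        name: rhel9-appserver-01
        namespace: vm-workloads
      register: vm
      retries: 60
      delay: 10
      until: >
        vm.resources | length > 0 and
        (vm.resources[0].status.conditions | default([])
         | selectattr('type', 'equalto', 'Ready')
         | selectattr('status', 'equalto', 'True') | list | length) > 0

    - name: Add the new guest to an in-memory group (address assumed resolvable)
      ansible.builtin.add_host:
        name: rhel9-appserver-01.example.com
        groups: new_vms

- name: Configure the guest OS over SSH
  hosts: new_vms
  become: true
  tasks:
    - name: Install baseline packages
      ansible.builtin.dnf:
        name:
          - chrony
          - firewalld
        state: present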
Discussion Guide
Use these questions when engaging with vendors, Red Hat/Microsoft/Swisscom field teams, or internal subject matter experts.
Terraform and Provider Maturity
- What is the current release cadence and maintainer status of the KubeVirt Terraform provider? Is it backed by Red Hat or purely community-maintained? What is the commitment to keeping it aligned with new KubeVirt CRD versions? Why this matters: A community-maintained provider with a single maintainer is a supply-chain risk for an enterprise managing 5,000+ VMs. If the provider lags behind KubeVirt releases, the team is stuck with `kubernetes_manifest` workarounds or forking the provider.
- For Azure Local: demonstrate provisioning a VM entirely through Terraform (azurerm or azapi provider) with network, storage, and guest configuration. Which resource types are GA in the provider, and which require the azapi escape hatch? Why this matters: If critical resource types are not yet in the azurerm provider and require raw API calls via azapi, the IaC experience is significantly less mature than the vSphere Terraform provider the team currently uses.
- Show a Terraform plan for modifying a running VM's CPU count from 4 to 8. Does the plan show an in-place update or a destroy-and-recreate? What happens to the VM during the apply? Why this matters: If CPU changes require destroy-and-recreate, Terraform-managed scaling becomes disruptive. The team needs to understand which fields are mutable in-place and which trigger recreation.
GitOps and ArgoCD
- Demonstrate an ArgoCD-managed VirtualMachine deployment: commit a new VM manifest to Git, show ArgoCD detecting the change, syncing it to the cluster, and reporting the VM as healthy. What is the sync latency from Git merge to VM creation? Why this matters: GitOps sync latency directly affects provisioning SLAs. If ArgoCD polls every 3 minutes, a VM request takes at minimum 3 minutes before creation even starts. Webhook-triggered syncs are faster but require integration with the Git server.
- Show what happens when someone manually changes a GitOps-managed VM via kubectl (e.g., adds extra memory). Does ArgoCD detect the drift? Does self-healing revert it? How quickly? Why this matters: Self-healing is the key governance feature of GitOps. If it does not work reliably for VirtualMachine CRDs (e.g., due to status field noise), the model loses its value.
- How should the team handle the "delete a VM" workflow in GitOps? If a developer removes a VM YAML from the Git repo and merges, should ArgoCD automatically delete the running VM? What safeguards exist? Why this matters: Accidental VM deletion is the single biggest risk of GitOps with `prune: true`. The team needs a clear operational model -- deletion protection annotations, manual prune approval, or a two-step workflow (stop first, then delete; a minimal sketch follows).
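A sketch of that two-step pattern, expressed declaratively so each step is itself a reviewed merge request; the file and VM names reuse the earlier examples and are illustrative:
# Merge request 1 (illustrative): stop the VM but keep its manifest and disks.
# file: vm-workloads/base/rhel9-appserver-02.yaml (excerpt)
spec:
  runStrategy: Halted        # was: Always -- KubeVirt shuts the guest down
# Merge request 2, after a cooling-off period: remove the manifest from Git
# and run a manually approved sync with pruning enabled for that one resource.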
Ansible and Day-2 Operations
- Show a complete workflow: provision a VM via Ansible (kubevirt.core), wait for it to boot, SSH into the guest, install packages, configure NTP and firewall rules, and verify. How long does the end-to-end workflow take? Why this matters: End-to-end provisioning time (from playbook start to VM ready for application deployment) is the metric that matters for provisioning SLAs. The team needs to benchmark this against the current VMware + Ansible workflow.
- Demonstrate rolling OS patching across 50 VMs using Ansible with serial execution (e.g., 5 at a time), health checks between batches, and automatic rollback if a health check fails. Is this workflow achievable with the kubevirt.core collection? Why this matters: Day-2 patching at scale is the most frequent operational task. The workflow must support batch execution with health gates to avoid taking down all instances of a service simultaneously.
Enterprise Automation and Governance
- For OVE: show how AAP (Ansible Automation Platform) integrates with KubeVirt -- credential management for kubeconfig, RBAC for playbook execution, approval workflows for production changes, and audit logging. Is AAP bundled with the OVE subscription? Why this matters: Running Ansible from a laptop is not governance-compliant. The team needs an enterprise automation platform with audit trails, RBAC, and approval workflows. If AAP is included, it simplifies the commercial model.
- What is the recommended IaC operating model for the target platform? Should the team use Terraform, GitOps, Ansible, Crossplane, or a combination? Provide a reference architecture with clear boundaries between tools (which tool does what). Why this matters: Tool proliferation is a real risk. If the team ends up using Terraform for some VMs, ArgoCD for others, Ansible for day-2, and Crossplane for cross-platform abstraction, the operational complexity is worse than the current VMware model. The vendor should provide an opinionated reference architecture.