Bicep for Managing Bleeding-Edge AI Infrastructure on Azure


Introduction

As organizations race to deploy cutting-edge AI applications powered by large language models, machine learning pipelines, and GPU-intensive workloads, managing the underlying Azure infrastructure has become increasingly complex. Azure’s AI ecosystem spans Azure Machine Learning, Azure OpenAI Service, AI Foundry, and specialized GPU compute resources—each requiring precise configuration and orchestration.

Enter Bicep: Microsoft’s domain-specific language for deploying Azure resources through Infrastructure as Code (IaC). While Bicep simplifies infrastructure deployment across Azure, its application to bleeding-edge AI infrastructure presents unique challenges and opportunities. This guide explores how to leverage Bicep to provision, manage, and scale AI infrastructure on Azure, from GPU-enabled compute clusters to complete AI Foundry hubs.

By the end of this article, you’ll understand how to create modular, reusable Bicep templates for AI workloads, implement best practices for managing complex dependencies, and troubleshoot common deployment issues specific to GPU and AI resources.

Prerequisites

Before diving into Bicep for AI infrastructure, ensure you have:

  • An active Azure subscription with appropriate permissions
  • Azure CLI version 2.20.0 or later installed (2.53.0 or later if you plan to use .bicepparam files)
  • Visual Studio Code with the Bicep extension installed
  • Basic understanding of Azure Resource Manager (ARM) concepts
  • Familiarity with Azure Machine Learning or AI services
  • Understanding of GPU compute requirements for AI workloads
  • Access to GPU quota in your Azure subscription (critical for AI deployments)

Understanding Bicep for AI Infrastructure

Why Bicep for AI Workloads?

Traditional ARM templates written in JSON can become unwieldy when defining complex AI infrastructure with multiple interdependent resources. Bicep addresses this with several advantages particularly relevant to AI deployments:

Concise Syntax: Bicep reduces template complexity by up to 40% compared to equivalent JSON ARM templates, making it easier to manage the intricate relationships between AI services.

Immediate API Support: Because Bicep compiles directly to ARM templates, new Azure capabilities—like GPT-4o models, data zone provisioned deployments, or newly released GPU SKUs—can be declared as soon as their resource provider API versions ship, without waiting for tooling updates.

Modularity: Break down complex AI infrastructure into reusable modules for networking, storage, compute, and AI services that can be composed across projects.

Type Safety: Bicep’s type system catches configuration errors before deployment, critical when provisioning expensive GPU resources.
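
For example, parameter decorators let the compiler reject an unsupported GPU SKU or an out-of-range node count before anything is provisioned (the SKU list below is purely illustrative):

// Illustrative only: invalid values fail at validation time, not after GPU provisioning starts
@allowed([
  'Standard_NC24ads_A100_v4'
  'Standard_NC96ads_A100_v4'
])
param trainingVmSize string

@minValue(0)
@maxValue(100)
param maxNodes int = 4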

Key AI Infrastructure Components

Modern Azure AI infrastructure typically consists of:

  1. Azure Machine Learning Workspace - Central hub for ML operations
  2. Azure AI Services / OpenAI Service - Access to foundation models
  3. GPU Compute Clusters - Training and inference compute
  4. Storage Accounts - Data persistence and model storage
  5. Key Vault - Secrets and credential management
  6. Container Registry - Custom container images
  7. Application Insights - Monitoring and telemetry
  8. Virtual Networks - Network isolation and security

Core Concepts: Deploying AI Resources with Bicep

Structuring Your Bicep Project

For AI infrastructure, adopt a modular approach that separates concerns:

ai-infrastructure/
├── main.bicep                 # Orchestrator file
├── parameters/
│   ├── dev.bicepparam        # Development parameters
│   ├── staging.bicepparam    # Staging parameters
│   └── prod.bicepparam       # Production parameters
├── modules/
│   ├── networking.bicep      # VNet, subnets, NSGs
│   ├── storage.bicep         # Storage accounts
│   ├── keyvault.bicep        # Key Vault configuration
│   ├── ml-workspace.bicep    # ML workspace
│   ├── ai-services.bicep     # OpenAI/AI services
│   └── gpu-compute.bicep     # GPU compute clusters
└── README.md
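
The parameter files referenced above are plain .bicepparam files. A minimal sketch of parameters/dev.bicepparam, assuming the prefix and environment parameters declared in the main.bicep shown later:

// parameters/dev.bicepparam (illustrative)
using '../main.bicep'

param prefix = 'aiml'
param environment = 'dev'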

Deploying an Azure Machine Learning Workspace

Let’s start with a foundational ML workspace deployment:

// modules/ml-workspace.bicep
@description('Name of the Azure Machine Learning workspace')
param workspaceName string

@description('Location for all resources')
param location string = resourceGroup().location

@description('Storage account resource ID')
param storageAccountId string

@description('Key Vault resource ID')
param keyVaultId string

@description('Application Insights resource ID')
param appInsightsId string

@description('Container Registry resource ID (optional)')
param containerRegistryId string = ''

resource mlWorkspace 'Microsoft.MachineLearningServices/workspaces@2024-04-01' = {
  name: workspaceName
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    friendlyName: workspaceName
    storageAccount: storageAccountId
    keyVault: keyVaultId
    applicationInsights: appInsightsId
    containerRegistry: containerRegistryId != '' ? containerRegistryId : null
    publicNetworkAccess: 'Disabled'  // Best practice for production
    hbiWorkspace: false
    v1LegacyMode: false
  }
}

output workspaceId string = mlWorkspace.id
output workspaceName string = mlWorkspace.name

Provisioning GPU Compute Clusters

GPU compute is the backbone of AI training and inference. Here’s how to provision a GPU cluster:

// modules/gpu-compute.bicep
@description('Name of the compute cluster')
param computeName string

@description('Azure ML workspace name')
param workspaceName string

@description('VM size for compute nodes')
@allowed([
  'Standard_NC6s_v3'
  'Standard_NC12s_v3'
  'Standard_NC24s_v3'
  'Standard_ND40rs_v2'
  'Standard_NC24ads_A100_v4'
  'Standard_NC48ads_A100_v4'
  'Standard_NC96ads_A100_v4'
])
param vmSize string = 'Standard_NC6s_v3'

@description('Minimum number of nodes')
@minValue(0)
param minNodeCount int = 0

@description('Maximum number of nodes')
@minValue(1)
param maxNodeCount int = 4

@description('Idle time before scale down (in seconds)')
param idleTimeBeforeScaleDown int = 1800

resource workspace 'Microsoft.MachineLearningServices/workspaces@2024-04-01' existing = {
  name: workspaceName
}

resource gpuCompute 'Microsoft.MachineLearningServices/workspaces/computes@2024-04-01' = {
  parent: workspace
  name: computeName
  location: workspace.location
  properties: {
    computeType: 'AmlCompute'
    properties: {
      vmSize: vmSize
      vmPriority: 'Dedicated'  // Use 'LowPriority' for cost savings
      scaleSettings: {
        minNodeCount: minNodeCount
        maxNodeCount: maxNodeCount
        nodeIdleTimeBeforeScaleDown: 'PT${idleTimeBeforeScaleDown}S'
      }
      enableNodePublicIp: false  // Security best practice
      isolatedNetwork: false
      osType: 'Linux'
    }
  }
}

output computeId string = gpuCompute.id
output computeName string = gpuCompute.name
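
One caveat: with enableNodePublicIp set to false, the cluster generally must be attached to a virtual network subnet. A sketch of the additions the module above would need (subnetId is an assumed parameter, not part of the original module):

// Assumed extra parameter for modules/gpu-compute.bicep
@description('Resource ID of the subnet to place compute nodes in')
param subnetId string

// ...then, inside the nested AmlCompute properties block alongside scaleSettings:
//     subnet: {
//       id: subnetId
//     }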

Deploying Azure OpenAI Service

For generative AI workloads, deploy Azure OpenAI Service:

// modules/ai-services.bicep
@description('Name of the Azure OpenAI account')
param openAIName string

@description('Location for Azure OpenAI')
@allowed([
  'eastus'
  'eastus2'
  'southcentralus'
  'swedencentral'
  'westus'
  'westus3'
])
param location string = 'eastus'

@description('SKU name')
@allowed([
  'S0'
])
param skuName string = 'S0'

@description('Deploy GPT-4o model')
param deployGpt4o bool = true

resource openAIAccount 'Microsoft.CognitiveServices/accounts@2023-05-01' = {
  name: openAIName
  location: location
  kind: 'OpenAI'
  sku: {
    name: skuName
  }
  properties: {
    customSubDomainName: openAIName
    publicNetworkAccess: 'Enabled'
    networkAcls: {
      defaultAction: 'Allow'
    }
  }
}

// Deploy GPT-4o model if requested
resource gpt4oDeployment 'Microsoft.CognitiveServices/accounts/deployments@2023-05-01' = if (deployGpt4o) {
  parent: openAIAccount
  name: 'gpt-4o'
  sku: {
    name: 'Standard'
    capacity: 10
  }
  properties: {
    model: {
      format: 'OpenAI'
      name: 'gpt-4o'
      version: '2024-08-06'
    }
  }
}

output openAIId string = openAIAccount.id
output openAIEndpoint string = openAIAccount.properties.endpoint
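
Additional model deployments can be declared in the same module. One caveat: an Azure OpenAI account typically processes only one deployment operation at a time, so chaining deployments with an explicit dependency avoids conflict errors. A hedged sketch that adds an embeddings model (deployEmbeddings is an assumed parameter):

@description('Deploy a text embedding model alongside GPT-4o')
param deployEmbeddings bool = false

resource embeddingDeployment 'Microsoft.CognitiveServices/accounts/deployments@2023-05-01' = if (deployEmbeddings) {
  parent: openAIAccount
  name: 'text-embedding-ada-002'
  sku: {
    name: 'Standard'
    capacity: 10
  }
  properties: {
    model: {
      format: 'OpenAI'
      name: 'text-embedding-ada-002'
      version: '2'
    }
  }
  dependsOn: [
    gpt4oDeployment  // deployments on the same account are created one at a time
  ]
}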

Practical Implementation: Complete AI Infrastructure

Orchestrating the Deployment

Create a main orchestrator that composes all modules:

// main.bicep
targetScope = 'resourceGroup'

@description('Prefix for resource naming')
@minLength(2)
@maxLength(10)
param prefix string

@description('Environment name')
@allowed([
  'dev'
  'staging'
  'prod'
])
param environment string = 'dev'

@description('Location for all resources')
param location string = resourceGroup().location

// Variables
var uniqueSuffix = uniqueString(resourceGroup().id)
var storageAccountName = toLower(take('${prefix}st${uniqueSuffix}', 24))  // storage names: 3-24 lowercase alphanumeric characters
var keyVaultName = take('${prefix}-kv-${uniqueSuffix}', 24)               // Key Vault names: 3-24 characters
var mlWorkspaceName = '${prefix}-mlw-${environment}'
var openAIName = '${prefix}-oai-${environment}'
var computeName = 'gpu-cluster'

// Deploy storage account
module storage 'modules/storage.bicep' = {
  name: 'storage-deployment'
  params: {
    storageAccountName: storageAccountName
    location: location
  }
}

// Deploy Key Vault
module keyVault 'modules/keyvault.bicep' = {
  name: 'keyvault-deployment'
  params: {
    keyVaultName: keyVaultName
    location: location
  }
}

// Deploy Log Analytics workspace and Application Insights
// (classic Application Insights is retired; new components should be workspace-based)
resource logAnalytics 'Microsoft.OperationalInsights/workspaces@2022-10-01' = {
  name: '${prefix}-log-${uniqueSuffix}'
  location: location
}

resource appInsights 'Microsoft.Insights/components@2020-02-02' = {
  name: '${prefix}-ai-${uniqueSuffix}'
  location: location
  kind: 'web'
  properties: {
    Application_Type: 'web'
    WorkspaceResourceId: logAnalytics.id
  }
}

// Deploy ML Workspace
module mlWorkspace 'modules/ml-workspace.bicep' = {
  name: 'mlworkspace-deployment'
  params: {
    workspaceName: mlWorkspaceName
    location: location
    storageAccountId: storage.outputs.storageAccountId
    keyVaultId: keyVault.outputs.keyVaultId
    appInsightsId: appInsights.id
  }
}

// Deploy GPU Compute
module gpuCompute 'modules/gpu-compute.bicep' = {
  name: 'gpu-deployment'
  params: {
    computeName: computeName
    workspaceName: mlWorkspace.outputs.workspaceName
    vmSize: 'Standard_NC6s_v3'
    minNodeCount: 0
    maxNodeCount: 4
  }
  // No explicit dependsOn needed: referencing mlWorkspace.outputs.workspaceName already creates the dependency
}

// Deploy Azure OpenAI
module openAI 'modules/ai-services.bicep' = {
  name: 'openai-deployment'
  params: {
    openAIName: openAIName
    location: 'eastus'
    deployGpt4o: true
  }
}

output workspaceId string = mlWorkspace.outputs.workspaceId
output openAIEndpoint string = openAI.outputs.openAIEndpoint
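
main.bicep assumes that modules/storage.bicep and modules/keyvault.bicep expose storageAccountId and keyVaultId outputs. Minimal sketches of what those modules might look like (the names, API versions, and SKUs are illustrative choices, not the only valid ones):

// modules/storage.bicep (minimal sketch)
@description('Name of the storage account')
param storageAccountName string

@description('Location')
param location string = resourceGroup().location

resource storageAccount 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: storageAccountName
  location: location
  kind: 'StorageV2'
  sku: {
    name: 'Standard_LRS'
  }
  properties: {
    minimumTlsVersion: 'TLS1_2'
    allowBlobPublicAccess: false
  }
}

output storageAccountId string = storageAccount.id

// modules/keyvault.bicep (minimal sketch)
@description('Name of the Key Vault')
param keyVaultName string

@description('Location')
param location string = resourceGroup().location

resource keyVault 'Microsoft.KeyVault/vaults@2023-07-01' = {
  name: keyVaultName
  location: location
  properties: {
    sku: {
      family: 'A'
      name: 'standard'
    }
    tenantId: tenant().tenantId
    enableRbacAuthorization: true
  }
}

output keyVaultId string = keyVault.id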

Deployment Commands

Deploy your AI infrastructure with environment-specific parameters:

# Login to Azure
az login

# Set subscription
az account set --subscription "your-subscription-id"

# Create resource group
az group create \
  --name rg-ai-infrastructure-dev \
  --location eastus

# Deploy with inline parameters
az deployment group create \
  --resource-group rg-ai-infrastructure-dev \
  --template-file main.bicep \
  --parameters prefix=aiml environment=dev

# Or deploy with a Bicep parameters file (requires Azure CLI 2.53.0+;
# the template is resolved from the file's 'using' declaration)
az deployment group create \
  --resource-group rg-ai-infrastructure-dev \
  --parameters parameters/dev.bicepparam
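
Before applying changes, a what-if run previews what would be created, modified, or deleted; worth doing routinely given the cost of GPU resources:

# Preview the deployment without applying it
az deployment group what-if \
  --resource-group rg-ai-infrastructure-dev \
  --template-file main.bicep \
  --parameters prefix=aiml environment=dev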

Advanced Topics

Implementing Azure AI Foundry with Bicep

Azure AI Foundry provides a unified platform for AI development. Here’s how to deploy it:

// modules/ai-foundry-hub.bicep
@description('Name of the AI Hub')
param hubName string

@description('Location')
param location string = resourceGroup().location

@description('Storage account ID')
param storageAccountId string

@description('Key Vault ID')
param keyVaultId string

resource aiHub 'Microsoft.MachineLearningServices/workspaces@2024-04-01' = {
  name: hubName
  location: location
  kind: 'Hub'  // Critical: sets workspace as Hub
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    friendlyName: hubName
    storageAccount: storageAccountId
    keyVault: keyVaultId
    publicNetworkAccess: 'Enabled'
    v1LegacyMode: false
  }
}

// Create AI Services connection (ApiKey connections also need a credentials block;
// the parameter is declared inline here for brevity)
@description('API key for the connected AI Services account')
@secure()
param aiServicesKey string = ''

resource aiServicesConnection 'Microsoft.MachineLearningServices/workspaces/connections@2024-04-01' = {
  parent: aiHub
  name: 'aiservices-connection'
  properties: {
    category: 'AIServices'
    target: 'https://your-ai-services.cognitiveservices.azure.com'
    authType: 'ApiKey'
    credentials: {
      key: aiServicesKey
    }
  }
}

output hubId string = aiHub.id
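
A hub is normally paired with one or more projects. Projects are also Microsoft.MachineLearningServices/workspaces resources, with kind set to 'Project' and a reference back to the hub. A minimal sketch (projectName is an assumed parameter):

@description('Name of the AI Foundry project')
param projectName string = 'my-ai-project'

resource aiProject 'Microsoft.MachineLearningServices/workspaces@2024-04-01' = {
  name: projectName
  location: location
  kind: 'Project'  // Marks the workspace as an AI Foundry project
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    friendlyName: projectName
    hubResourceId: aiHub.id  // Associates the project with the hub above
  }
}

output projectId string = aiProject.id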

Managing Data Zone Provisioned Deployments

For high-throughput AI workloads, use data zone provisioned deployments:

// Data zone provisioned deployments use a dedicated SKU and require a recent API version
resource dataZoneDeployment 'Microsoft.CognitiveServices/accounts/deployments@2024-10-01' = {
  parent: openAIAccount
  name: 'gpt-4o-provisioned'
  sku: {
    name: 'DataZoneProvisionedManaged'  // Data zone provisioned throughput
    capacity: 100  // Provisioned throughput units (PTUs)
  }
  properties: {
    model: {
      format: 'OpenAI'
      name: 'gpt-4o'
      version: '2024-08-06'
    }
    versionUpgradeOption: 'OnceNewDefaultVersionAvailable'
  }
}

Infrastructure Workflow Visualization

At a high level, the deployment proceeds as follows:

  1. Start deployment and create the resource group
  2. Deploy the networking module
  3. Deploy storage and Key Vault
  4. Deploy Application Insights
  5. Deploy the ML workspace
  6. Deploy GPU compute? If yes, provision the GPU cluster; if no, skip GPU
  7. Deploy Azure OpenAI
  8. Deploy AI Foundry? If yes, create the AI hub and projects
  9. Validate the deployments
  10. Complete

Common Pitfalls and Troubleshooting

GPU Quota Limitations

Problem: Deployment fails with quota errors when provisioning GPU VMs.

Cause: Azure subscriptions have default quota limits for GPU resources that vary by region and SKU.

Solution:

  1. Check current quota usage:
az vm list-usage --location eastus --output table | grep "NC\|ND"
  2. Request quota increase via Azure Portal:

    • Navigate to “Help + support” → “New support request”
    • Issue type: “Service and subscription limits (quotas)”
    • Quota type: “Machine Learning service”
    • Specify required GPU SKU and region
  3. Use lower-tier GPU SKUs for development:

param vmSize string = 'Standard_NC6s_v3'  // 1 V100 GPU

Region Availability for AI Services

Problem: Deployment fails because Azure OpenAI or specific GPU SKUs aren’t available in the target region.

Solution: Always verify service availability before deployment:

@description('Location for Azure OpenAI - restricted regions')
@allowed([
  'eastus'
  'eastus2'
  'southcentralus'
  'swedencentral'
  'westus'
  'westus3'
])
param openAILocation string = 'eastus'

Implicit vs Explicit Dependencies

Problem: Resources deploy in wrong order, causing failures.

Best Practice: Use symbolic references for implicit dependencies:

// GOOD: Implicit dependency via symbolic reference
resource mlWorkspace 'Microsoft.MachineLearningServices/workspaces@2024-04-01' = {
  properties: {
    storageAccount: storageAccount.id  // Bicep handles ordering
  }
}

// AVOID: Explicit dependencies unless absolutely necessary
resource mlWorkspace 'Microsoft.MachineLearningServices/workspaces@2024-04-01' = {
  dependsOn: [
    storageAccount
  ]
}

Container Registry Authentication

Problem: ML workspace cannot access custom container images.

Solution: Ensure proper RBAC assignments:

// Grant ML workspace access to ACR
resource acrPullRole 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(containerRegistry.id, mlWorkspace.id, 'acrpull')
  scope: containerRegistry
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '7f951dda-4ed3-4680-a7ca-43fe172d538d')  // AcrPull role
    principalId: mlWorkspace.identity.principalId
    principalType: 'ServicePrincipal'
  }
}
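
The snippet above assumes containerRegistry and mlWorkspace are symbols in the same file. If the workspace is deployed through the module shown earlier, surface its managed identity as an output first and reference that instead, for example:

// Assumed addition to modules/ml-workspace.bicep
output workspacePrincipalId string = mlWorkspace.identity.principalId

The role assignment's principalId would then be mlWorkspace.outputs.workspacePrincipalId.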

Network Isolation Issues

Problem: Private endpoints and network isolation block resource access.

Troubleshooting Steps:

  1. Verify private endpoint configuration:
resource privateEndpoint 'Microsoft.Network/privateEndpoints@2023-05-01' = {
  name: '${mlWorkspace.name}-pe'
  location: location
  properties: {
    subnet: {
      id: subnetId
    }
    privateLinkServiceConnections: [
      {
        name: '${mlWorkspace.name}-plsc'
        properties: {
          privateLinkServiceId: mlWorkspace.id
          groupIds: [
            'amlworkspace'
          ]
        }
      }
    ]
  }
}
  2. Configure private DNS zones for the private endpoints (see the sketch below)
  3. Ensure compute clusters can reach the workspace over the private network
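
For step 2, the workspace private endpoint is typically linked to the privatelink.api.azureml.ms and privatelink.notebooks.azure.net zones. A sketch of the DNS zone group (assumes the two private DNS zones already exist as amlApiDnsZone and amlNotebooksDnsZone symbols):

resource dnsZoneGroup 'Microsoft.Network/privateEndpoints/privateDnsZoneGroups@2023-05-01' = {
  parent: privateEndpoint
  name: 'default'
  properties: {
    privateDnsZoneConfigs: [
      {
        name: 'amlworkspace-api'
        properties: {
          privateDnsZoneId: amlApiDnsZone.id  // privatelink.api.azureml.ms
        }
      }
      {
        name: 'amlworkspace-notebooks'
        properties: {
          privateDnsZoneId: amlNotebooksDnsZone.id  // privatelink.notebooks.azure.net
        }
      }
    ]
  }
}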

Model Deployment Failures

Problem: GPU inference endpoints fail to start.

Common causes and solutions:

  1. Insufficient GPU quota: Check quota as described above
  2. Incorrect VM SKU: Verify the GPU SKU supports your framework:
// For NVIDIA V100-based CUDA workloads
param inferenceVMSize string = 'Standard_NC6s_v3'

// Or, for NVIDIA A100 workloads (use one declaration, not both):
// param inferenceVMSize string = 'Standard_NC24ads_A100_v4'
  3. Driver compatibility: Ensure container images include correct NVIDIA drivers
  4. Resource constraints: Increase instance count or VM size:
// Instance size and count live on the online *deployment* (a child of the endpoint),
// not on the endpoint resource itself (partial example; model reference and other settings omitted):
resource inferenceDeployment 'Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments@2024-04-01' = {
  parent: inferenceEndpoint
  name: 'blue'
  location: location
  sku: {
    name: 'Default'
    capacity: 3  // Number of instances
  }
  properties: {
    endpointComputeType: 'Managed'
    instanceType: 'Standard_NC12s_v3'  // Increased from NC6s_v3
  }
}

Conclusion

Bicep provides a powerful, maintainable approach to managing complex AI infrastructure on Azure. By embracing modularity, leveraging type safety, and following best practices for GPU resource management, you can create reproducible, scalable AI environments that adapt to rapidly evolving requirements.

Key takeaways:

  • Structure Bicep projects with clear module separation for AI components
  • Always verify GPU quota and regional availability before deployment
  • Use symbolic references for implicit dependencies
  • Implement proper RBAC for service-to-service authentication
  • Test deployments in development environments before production rollout

As Azure’s AI capabilities continue to expand with new models, deployment types, and infrastructure options, Bicep’s immediate API support ensures your infrastructure code remains current and capable of leveraging the latest innovations.

Next Steps

  1. Explore Azure Verified Modules for production-ready Bicep modules
  2. Implement CI/CD pipelines with Azure DevOps or GitHub Actions
  3. Add deployment validation with az deployment group what-if
  4. Experiment with Azure AI Foundry multi-agent workflows
  5. Monitor costs with Azure Cost Management tags in Bicep templates