Terraform IaC Expert
Infrastructure as Code for AI workloads using Terraform across AWS, Azure, GCP, and OCI
Install this agent skill in your project:
npx add-skill https://github.com/frankxai/ai-architect/tree/main/skills/terraform-iac
Expert in Terraform and Infrastructure as Code for deploying AI infrastructure across multi-cloud environments.
Project Structure
infrastructure/
├── modules/
│ ├── aws-bedrock/
│ ├── azure-openai/
│ ├── oci-genai/
│ └── vector-store/
├── environments/
│ ├── dev/
│ ├── staging/
│ └── prod/
├── shared/
│ ├── networking/
│ └── security/
└── scripts/
Module Overview
| Module | Provider | Purpose |
|---|---|---|
| aws-bedrock | AWS | Bedrock, Knowledge Bases, VPC Endpoints |
| azure-openai | Azure | Azure OpenAI, AI Search, Private Endpoints |
| oci-genai | OCI | Dedicated AI Clusters (DACs), Endpoints, Agents, Knowledge Bases |
| vector-store | Multi | OpenSearch Serverless, Qdrant, Milvus |
Full module code: resources/modules.tf
AWS AI Infrastructure
Bedrock Module
module "aws_ai" {
source = "./modules/aws-bedrock"
prefix = "prod"
region = "us-east-1"
vpc_id = data.aws_vpc.main.id
private_subnet_ids = data.aws_subnets.private.ids
enable_private_endpoint = true
create_knowledge_base = true
}
Key Resources
- IAM roles for Bedrock access
- VPC endpoints for private connectivity
- Knowledge bases with OpenSearch
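The following is a minimal sketch of the first two resource types a module like `aws-bedrock` might contain, assuming the AWS provider 5.x. The variable names mirror the module call above. The resource names (`bedrock_endpoint`, `bedrock_runtime`, `bedrock`) are illustrative and are not taken from `resources/modules.tf`.

```hcl
# Hypothetical module inputs, matching the arguments passed in the module call.
variable "prefix" { type = string }
variable "region" { type = string }
variable "vpc_id" { type = string }
variable "private_subnet_ids" { type = list(string) }

data "aws_vpc" "selected" {
  id = var.vpc_id
}

# Security group that only admits HTTPS from inside the VPC.
resource "aws_security_group" "bedrock_endpoint" {
  name_prefix = "${var.prefix}-bedrock-ep-"
  vpc_id      = var.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [data.aws_vpc.selected.cidr_block]
  }
}

# Interface endpoint so Bedrock runtime calls stay on the AWS network.
resource "aws_vpc_endpoint" "bedrock_runtime" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.region}.bedrock-runtime"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.bedrock_endpoint.id]
  private_dns_enabled = true
}

# Service role that Bedrock (for example a knowledge base) can assume.
data "aws_iam_policy_document" "bedrock_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["bedrock.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "bedrock" {
  name               = "${var.prefix}-bedrock-kb"
  assume_role_policy = data.aws_iam_policy_document.bedrock_assume.json
}
```

The `bedrock-runtime` endpoint only covers model invocation. The agent and knowledge-base APIs use separate service names such as `bedrock-agent-runtime`, so add one endpoint per API the workload calls.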
Azure AI Infrastructure
Azure OpenAI Module
module "azure_ai" {
source = "./modules/azure-openai"
openai_name = "prod-openai"
location = "eastus"
resource_group_name = azurerm_resource_group.ai.name
gpt4o_capacity = 100 # Standard capacity is in units of 1K tokens per minute (100 = 100K TPM)
embedding_capacity = 50 # same units
enable_private_endpoint = true
}
Key Resources
- Cognitive Account (OpenAI kind)
- Model deployments (GPT-4o, embeddings)
- Private endpoints
- Azure AI Search
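Below is a sketch of the account, one model deployment, and the private endpoint that such a module might wrap. It assumes azurerm provider 4.x. `private_endpoint_subnet_id` is a hypothetical input, and the model version is an assumption; check which versions your region offers.

```hcl
variable "openai_name" { type = string }
variable "location" { type = string }
variable "resource_group_name" { type = string }
variable "gpt4o_capacity" { type = number }
variable "private_endpoint_subnet_id" { type = string } # hypothetical input

# Azure OpenAI is a Cognitive Services account of kind "OpenAI".
resource "azurerm_cognitive_account" "openai" {
  name                          = var.openai_name
  location                      = var.location
  resource_group_name           = var.resource_group_name
  kind                          = "OpenAI"
  sku_name                      = "S0"
  custom_subdomain_name         = var.openai_name # needed for private endpoints and Entra ID auth
  public_network_access_enabled = false
}

# One deployment per model; capacity is in units of 1K tokens per minute.
resource "azurerm_cognitive_deployment" "gpt4o" {
  name                 = "gpt-4o"
  cognitive_account_id = azurerm_cognitive_account.openai.id

  model {
    format  = "OpenAI"
    name    = "gpt-4o"
    version = "2024-08-06" # assumption: pick a version available in your region
  }

  sku {
    name     = "Standard"
    capacity = var.gpt4o_capacity
  }
}

# Private endpoint on the "account" sub-resource of the Cognitive Services account.
resource "azurerm_private_endpoint" "openai" {
  name                = "${var.openai_name}-pe"
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = var.private_endpoint_subnet_id

  private_service_connection {
    name                           = "${var.openai_name}-psc"
    private_connection_resource_id = azurerm_cognitive_account.openai.id
    subresource_names              = ["account"]
    is_manual_connection           = false
  }
}
```

The `sku` block is the azurerm 4.x spelling; older 3.x releases used a `scale` block. Name resolution also needs a private DNS zone (`privatelink.openai.azure.com`), which is omitted here.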
OCI AI Infrastructure
GenAI Module
module "oci_ai" {
source = "./modules/oci-genai"
prefix = "prod"
compartment_id = var.oci_compartment_id
cluster_type = "HOSTING"
unit_count = 10
unit_shape = "LARGE_COHERE"
create_agent = true
create_knowledge_base = true
}
Key Resources
- Dedicated AI Clusters (DAC)
- Model endpoints
- GenAI Agents
- Knowledge bases
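Here is a sketch of the DAC and endpoint resources an `oci-genai` module might wrap, assuming the `oracle/oci` provider's Generative AI resources. `model_id` is a hypothetical input. The unit shape must match the model family being hosted.

```hcl
variable "prefix" { type = string }
variable "compartment_id" { type = string }
variable "unit_count" { type = number }
variable "model_id" { type = string } # hypothetical: OCID of the base model to host

# A hosting DAC reserves dedicated GPU units for serving.
resource "oci_generative_ai_dedicated_ai_cluster" "hosting" {
  compartment_id = var.compartment_id
  type           = "HOSTING"
  unit_count     = var.unit_count
  unit_shape     = "LARGE_COHERE" # must match the hosted model family
  display_name   = "${var.prefix}-dac"
}

# An endpoint exposes one model on the DAC.
resource "oci_generative_ai_endpoint" "model" {
  compartment_id          = var.compartment_id
  dedicated_ai_cluster_id = oci_generative_ai_dedicated_ai_cluster.hosting.id
  model_id                = var.model_id
  display_name            = "${var.prefix}-endpoint"
}
```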
Vector Store Infrastructure
OpenSearch Serverless (AWS)
module "vectors" {
source = "./modules/vector-store/aws-opensearch"
prefix = "prod"
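vpc_endpoint_ids = [aws_opensearchserverless_vpc_endpoint.opensearch.id] # AOSS endpoints come from the OpenSearch Serverless API, not aws_vpc_endpoint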
allowed_principals = [aws_iam_role.bedrock.arn]
}
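An `aws-opensearch` vector-store module would roughly need the policies and collection below. This sketch assumes AWS provider 5.x. The policy names and the exact permission lists are illustrative choices, not taken from the original module.

```hcl
variable "prefix" { type = string }
variable "vpc_endpoint_ids" { type = list(string) }
variable "allowed_principals" { type = list(string) }

locals {
  collection = "${var.prefix}-vectors" # collection and policy names: 3-32 lowercase chars
}

# An encryption policy must cover the collection before it can be created.
resource "aws_opensearchserverless_security_policy" "encryption" {
  name = "${local.collection}-enc"
  type = "encryption"
  policy = jsonencode({
    Rules       = [{ ResourceType = "collection", Resource = ["collection/${local.collection}"] }]
    AWSOwnedKey = true
  })
}

# Network policy: reachable only through the listed OpenSearch Serverless VPC endpoints.
resource "aws_opensearchserverless_security_policy" "network" {
  name = "${local.collection}-net"
  type = "network"
  policy = jsonencode([{
    Rules           = [{ ResourceType = "collection", Resource = ["collection/${local.collection}"] }]
    AllowFromPublic = false
    SourceVPCEs     = var.vpc_endpoint_ids
  }])
}

# Data access policy: index read/write for the allowed IAM principals.
resource "aws_opensearchserverless_access_policy" "data" {
  name = "${local.collection}-data"
  type = "data"
  policy = jsonencode([{
    Rules = [
      {
        ResourceType = "collection"
        Resource     = ["collection/${local.collection}"]
        Permission   = ["aoss:DescribeCollectionItems"]
      },
      {
        ResourceType = "index"
        Resource     = ["index/${local.collection}/*"]
        Permission   = ["aoss:CreateIndex", "aoss:DescribeIndex", "aoss:ReadDocument", "aoss:WriteDocument"]
      }
    ]
    Principal = var.allowed_principals
  }])
}

resource "aws_opensearchserverless_collection" "vectors" {
  name       = local.collection
  type       = "VECTORSEARCH"
  depends_on = [aws_opensearchserverless_security_policy.encryption]
}
```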
Multi-Cloud Environment
# environments/prod/main.tf
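terraform {
  backend "s3" {
    bucket         = "terraform-state-ai-infra"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"            # assumed; backend blocks cannot reference variables
    encrypt        = true
    dynamodb_table = "terraform-state-lock" # hypothetical lock table name
  }
}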
provider "aws" { region = var.aws_region }
provider "azurerm" { features {} }
provider "oci" { ... }
# Deploy to all clouds
module "aws_ai" { source = "../../modules/aws-bedrock" ... }
module "azure_ai" { source = "../../modules/azure-openai" ... }
module "oci_ai" { source = "../../modules/oci-genai" ... }
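The environment file above leaves provider versions implicit. A separate `versions.tf` (hypothetical file name) could pin them. The constraints below are illustrative and have not been tested against these modules.

```hcl
# environments/prod/versions.tf (hypothetical file)
terraform {
  required_version = ">= 1.5"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
    oci = {
      source  = "oracle/oci"
      version = ">= 6.0"
    }
  }
}
```

With azurerm 4.x, the provider also needs a subscription ID. Supply it either as `subscription_id` in the `provider "azurerm"` block or through the `ARM_SUBSCRIPTION_ID` environment variable.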
Best Practices
State Management
- Remote state (S3, Azure Blob, OCI Object Storage)
- State locking (a DynamoDB table or S3 lockfile on AWS; Azure Blob locks state with blob leases automatically)
- Encrypt state at rest
- Separate state per environment
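Teams that keep state in Azure Blob Storage could use an `azurerm` backend like the one below; all names are hypothetical. It keys one state file per environment. The backend locks state with blob leases, and Azure Storage encrypts data at rest by default.

```hcl
# environments/prod/backend.tf: one state file per environment, keyed by path
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state" # hypothetical names
    storage_account_name = "tfstateaiinfra"
    container_name       = "tfstate"
    key                  = "prod/terraform.tfstate"
    use_azuread_auth     = true # authenticate with Entra ID instead of storage keys
  }
}
```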
Variable Validation
variable "environment" {
type = string
validation {
condition = contains(["dev", "staging", "prod"], var.environment)
error_message = "Environment must be dev, staging, or prod."
}
}
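Validation also helps with naming inputs. Several resource names are built from a prefix, and some services cap name length (OpenSearch Serverless collections, for example, allow 3-32 characters). A sketch with an assumed 16-character limit:

```hcl
variable "prefix" {
  type = string

  validation {
    # can() turns regex's "no match" error into false.
    condition     = can(regex("^[a-z][a-z0-9-]{1,15}$", var.prefix))
    error_message = "Prefix must start with a lowercase letter and contain 2-16 lowercase letters, digits, or hyphens."
  }
}
```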
Sensitive Data
- Use `sensitive = true` for outputs
- Reference secrets from secret managers
- Never commit `.tfvars` files with secrets
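The following sketch reads a secret from AWS Secrets Manager instead of a `.tfvars` file. The secret name and its JSON shape (`{"password": "..."}`) are assumptions.

```hcl
data "aws_secretsmanager_secret_version" "vector_db" {
  secret_id = "prod/ai-platform/vector-db" # hypothetical secret name
}

locals {
  # secret_string holds the stored JSON document (assumed shape: {"password": "..."})
  vector_db_password = jsondecode(data.aws_secretsmanager_secret_version.vector_db.secret_string)["password"]
}

output "vector_db_password" {
  value     = local.vector_db_password
  sensitive = true # required: the value derives from a sensitive attribute
}
```

The provider marks `secret_string` as sensitive, so Terraform rejects this output unless it also sets `sensitive = true`. The value is still written to state in plain text, which is why encrypted remote state matters.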
Tagging
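provider "aws" {
  region = var.aws_region

  # default_tags belongs inside the AWS provider block and applies to all taggable AWS resources
  default_tags {
    tags = {
      Environment = var.environment
      Project     = "ai-platform"
      ManagedBy   = "terraform"
    }
  }
}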
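The azurerm provider has no `default_tags` equivalent, so one approach is a shared map passed to each resource explicitly. The resource group name below is a hypothetical naming scheme. On OCI, the same map can go into `freeform_tags`.

```hcl
variable "environment" { type = string }

locals {
  common_tags = {
    Environment = var.environment
    Project     = "ai-platform"
    ManagedBy   = "terraform"
  }
}

resource "azurerm_resource_group" "ai" {
  name     = "rg-ai-${var.environment}" # hypothetical naming scheme
  location = "eastus"
  tags     = local.common_tags
}
```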
CI/CD Integration
GitHub Actions
- uses: actions/checkout@v4
- uses: hashicorp/setup-terraform@v3
- run: terraform init
- run: terraform plan -out=tfplan
- run: terraform apply -auto-approve tfplan
  if: github.ref == 'refs/heads/main'
Key Patterns
- Plan on PR, apply on merge
- Use workspaces or directories for environments
- Lock state during apply
- Store plan artifacts
Managed Kubernetes GPU
EKS GPU Nodes
eks_managed_node_groups = {
gpu = {
instance_types = ["g5.2xlarge"]
ami_type = "AL2_x86_64_GPU"
taints = [{ key = "nvidia.com/gpu" ... }]
}
}
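Without the community EKS module, the same GPU node group can be declared with the provider's `aws_eks_node_group` resource. This sketch assumes an existing cluster and node IAM role, passed in as hypothetical variables. The scaling sizes are illustrative.

```hcl
variable "cluster_name" { type = string }
variable "node_role_arn" { type = string }
variable "private_subnet_ids" { type = list(string) }

resource "aws_eks_node_group" "gpu" {
  cluster_name    = var.cluster_name
  node_group_name = "gpu"
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.private_subnet_ids
  instance_types  = ["g5.2xlarge"]
  ami_type        = "AL2_x86_64_GPU"

  scaling_config {
    desired_size = 1
    min_size     = 0
    max_size     = 2
  }

  # Only pods that tolerate this taint are scheduled onto GPU nodes.
  taint {
    key    = "nvidia.com/gpu"
    value  = "true"
    effect = "NO_SCHEDULE" # EKS API spelling of Kubernetes NoSchedule
  }
}
```

Newer EKS versions also offer an `AL2023_x86_64_NVIDIA` AMI type for GPU nodes.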
AKS GPU Nodes
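resource "azurerm_kubernetes_cluster_node_pool" "gpu" {
  name                  = "gpu"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.main.id # hypothetical cluster resource
  vm_size               = "Standard_NC24ads_A100_v4"
  node_taints           = ["nvidia.com/gpu=true:NoSchedule"]
}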
OKE GPU Nodes
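resource "oci_containerengine_node_pool" "gpu" {
  cluster_id     = oci_containerengine_cluster.main.id # hypothetical cluster resource
  compartment_id = var.oci_compartment_id
  name           = "gpu-pool"
  node_shape     = "BM.GPU.A100-v2.8"
}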
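Before OKE can launch nodes, a node pool also needs placement and a worker image. Below is a fuller sketch. Every input is a hypothetical variable, and the GPU-enabled image OCID is assumed to be looked up separately.

```hcl
variable "oke_cluster_id" { type = string }
variable "compartment_id" { type = string }
variable "availability_domain" { type = string }
variable "worker_subnet_id" { type = string }
variable "gpu_image_id" { type = string } # assumption: a GPU-enabled OKE worker image OCID

resource "oci_containerengine_node_pool" "gpu" {
  cluster_id     = var.oke_cluster_id
  compartment_id = var.compartment_id
  name           = "gpu-pool"
  node_shape     = "BM.GPU.A100-v2.8"

  # Where nodes are placed and how many to run.
  node_config_details {
    size = 1
    placement_configs {
      availability_domain = var.availability_domain
      subnet_id           = var.worker_subnet_id
    }
  }

  # Boot image for the worker nodes.
  node_source_details {
    source_type = "IMAGE"
    image_id    = var.gpu_image_id
  }

  # Label the nodes so GPU workloads can target them with a nodeSelector.
  initial_node_labels {
    key   = "nvidia.com/gpu"
    value = "true"
  }
}
```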
Full examples: resources/modules.tf
Resources
Infrastructure as Code for enterprise AI deployments.