
Effortless VM Management Using Tailscale and Terraform.

Introduction

I have a Hetzner bare-metal server in Finland that I use for a few different things. One of them is running workstations for different client work. Locally I have a Cursor/VSCode client connecting to the dev environment via the VSCode Remote Development plugin, with Tailscale providing private access.

Historically I've been using a mixture of Multipass and LXD VMs, but mostly in an ad hoc and manual manner. Recently, to keep things in good order, I've started looking at ways to automate the process of setting up new VMs and to document it.

Must haves

  • VSCode/Cursor - I've gotten very comfortable (perhaps too comfortable) with them for local development, and I don't want to switch to other fancy tools.
  • Private access to the VMs from the client - Tailscale fits the bill here.
  • Managing the VMs in an infrastructure-as-code manner - I decided to use Terraform, as I already manage the majority of my bare-metal infrastructure with it.
  • Close-to-the-metal IOPS inside the VM - backing the LXD storage pool with LVM instead of the qcow2 format should provide the raw disk IOPS that I need.

Nice to haves

  • Install all the essential tools on VM start-up - e.g. git, golang, docker, kubectl, etc. This can be managed via cloud-init. It's not a fully-fledged configuration management tool, but it's good enough for basic tasks; besides, the world has pretty much moved on from configuration management :/
  • Ideally the VM auto-joins the Tailscale network without manual bootstrapping. This can be achieved using Tailscale pre-auth keys.

The setup

Luckily Terraform has an LXD provider, which makes managing LXD VMs much easier. The only downside is that the current provider does not support resource importing, but that's not a blocker for my use case.

To remotely manage the LXD daemon you will need a trust token for it, which you can generate via

lxc config trust add
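
Note that this assumes the LXD API is already exposed over HTTPS. If it isn't, it can be enabled on the host first (8443 being the default port the Terraform provider expects):

lxc config set core.https_address "[::]:8443"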

For my use case, once I had the token I stored it in Google Secret Manager so that it can be fetched and used by Terraform.

To allow Terraform to manage the VM's auto-join to Tailscale, I also created a dedicated Tailscale API key and stored it in Google Secret Manager as well.
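
For completeness, pushing both secrets with the gcloud CLI looks roughly like this (a sketch, assuming gcloud is already authenticated against the project; the secret names match the ones referenced in the Terraform config below, and the token/key values are placeholders):

echo -n "$LXD_TRUST_TOKEN" | gcloud secrets create YOUR_LXD_SA_TOKEN_NAME --replication-policy=automatic --data-file=-
echo -n "$TAILSCALE_API_KEY" | gcloud secrets create tailscale-terraform-sa --replication-policy=automatic --data-file=-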

Here is the end-to-end Terraform config for it.

terraform {
  backend "gcs" {
    bucket = "YOUR_STATE_BUCKET_NAME"
    prefix = "YOUR_STATE_BUCKET_PREFIX"
  }

  required_providers {
    lxd = {
      source  = "terraform-lxd/lxd"
      version = "~>2.3.0"
    }
    tailscale = {
      source  = "tailscale/tailscale"
      version = "0.16.2"
    }
  }
}

locals {
  project_id         = "YOUR_PROJECT_ID"
  region             = "YOUR_REGION"
  lxd_sa_token_name  = "YOUR_LXD_SA_TOKEN_NAME"
  lxd_remote_name    = "YOUR_LXD_REMOTE_NAME"
  lxd_server_address = "https://YOUR_LXD_SERVER_ADDRESS:8443"
  lxd_storage_pool   = "YOUR_LXD_STORAGE_POOL"
}

provider "google" {
  project = local.project_id
  region  = local.region
}

data "google_secret_manager_secret_version" "tailscale_api_key" {
  project = local.project_id
  secret  = "tailscale-terraform-sa"
  version = "latest"
}


data "google_secret_manager_secret_version" "prod" {
  project = local.project_id
  secret  = local.lxd_sa_token_name
  version = "latest"
}

provider "tailscale" {
  api_key = data.google_secret_manager_secret_version.tailscale_api_key.secret_data
}

provider "lxd" {
  generate_client_certificates = true
  accept_remote_certificate    = true

  remote {
    name     = local.lxd_remote_name
    address  = local.lxd_server_address
    password = data.google_secret_manager_secret_version.prod.secret_data
    # default  = true
  }
}

resource "tailscale_tailnet_key" "key" {
  reusable      = true
  description   = "Key for auto-register vm to the tailnet"
  ephemeral     = false
  preauthorized = true
  expiry        = 3600 * 24 * 90
}

locals {
  vm_spec = {
    "THE-VM-NAME" = {
      "cpu"    = 4
      "image"  = "ubuntu:24.04"
      "memory" = "12GB"
      "disk"   = "80GB"
    }
  }

  pool_name         = local.lxd_storage_pool
  cloud_init_config = <<-EOF
#cloud-config

# Update and upgrade packages
package_update: true
package_upgrade: true

# Install required packages
packages:
  - curl
  - unzip
  - gnupg
  - software-properties-common
  - wget

# Add Tailscale repository and install
runcmd:
  # install tailscale
  - curl -fsSL https://tailscale.com/install.sh | sh
  - tailscale up --ssh --auth-key ${tailscale_tailnet_key.key.key}
  # install azure cli
  - curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
  # install aws cli
  - curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
  - unzip awscliv2.zip
  - ./aws/install
  # install kubectl
  - curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
  - install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
  - rm -rf awscliv2.zip kubectl
  # install terraform
  - wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor | tee /usr/share/keyrings/hashicorp-archive-keyring.gpg > /dev/null
  - gpg --no-default-keyring --keyring /usr/share/keyrings/hashicorp-archive-keyring.gpg --fingerprint
  - echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | tee /etc/apt/sources.list.d/hashicorp.list
  - apt-get update && apt-get install -y terraform

# Create user and add to sudo group
users:
  - name: jingkaihe
    sudo: ALL=(ALL) NOPASSWD:ALL
    groups: sudo
    shell: /bin/bash
    ssh_authorized_keys:
    - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHmJqiWIDlnAUasIYRgIiNJ1vKpYYEBFtpQ2m+I+RULI jingkai@hey.com

# Set up sudo access without password for jingkaihe
write_files:
  - path: /etc/sudoers.d/jingkaihe
    content: |
      jingkaihe ALL=(ALL) NOPASSWD:ALL
    permissions: '0440'

# Ensure SSH access (optional, remove if not needed)
ssh_pwauth: true
EOF
}

resource "lxd_instance" "vm" {
  for_each = local.vm_spec

  remote = local.lxd_remote_name
  name   = each.key
  image  = each.value.image
  type   = "virtual-machine"

  config = {
    "user.user-data" = local.cloud_init_config
  }

  limits = {
    cpu    = each.value.cpu
    memory = each.value.memory
  }
  device {
    name = "root"
    type = "disk"
    properties = {
      "size" = each.value.disk
      "path" = "/"
      "pool" = local.pool_name
    }
  }
}

In the Terraform code snippet above I created a VM that:

  • Comes with 4 CPUs, 12GB RAM and an 80GB disk.
  • Installs all the essential tools on VM start-up.
  • Creates a user jingkaihe in the sudo group, so that I can manage the VM as a non-root user without typing a password all the time.
  • Auto-joins the Tailscale network.
  • Installs the Azure CLI, AWS CLI, kubectl and Terraform.
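
Applying the config is just the usual Terraform workflow:

terraform init
terraform plan
terraform apply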

The LXD VM itself should be up and running very quickly, but it takes quite some time to install all the packages. Once it's all done, you can SSH into the VM with

ssh jingkaihe@${THE-VM-NAME}
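
If you want to wait for cloud-init to finish installing everything before jumping in, one option is to poll it from the LXD host (a sketch; substitute your own VM name):

lxc exec THE-VM-NAME -- cloud-init status --wait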

In my case the storage pool I use is an LVM pool, so I pretty much get raw-disk-grade IOPS in the VM.
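
For reference, creating such an LVM-backed pool on the LXD host looks roughly like this (a sketch; the pool name and the source block device are assumptions, and an existing volume group can be used as the source instead):

lxc storage create YOUR_LXD_STORAGE_POOL lvm source=/dev/nvme1n1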

Another thing worth mentioning is that I connect to the LXD daemon endpoint via Tailscale, which provides an extra layer of security. In fact, on the server only 443/tcp is open to the internet; everything else is blocked by the hardware firewall.