Unified Memory on AMD Strix Halo SoC
The Ryzen AI MAX+ 395 "Strix Halo" in Framework Desktop comes with 128GB of unified memory, but the GTT (Graphics Translation Tables) allocation defaults to only 64GB, which limits the ability to run large language models like gpt-oss-120b. This can be increased to 128GB through GRUB configuration changes.
The Problem
When trying to run large language models on the Framework Desktop with Ryzen AI MAX+ 395 "Strix Halo", you'll hit a 64GB GTT limit even though the chip has 128GB of unified memory. The default GTT allocation of 64GB isn't enough for models like gpt-oss-120b. The good news is that since this is an APU with unified memory architecture, we can bump up the GTT allocation to let the GPU access more system memory.
Increasing the Memory Limit
We can increase the GTT size to 128GB and disable the AMD IOMMU (to reduce memory access latency) with a few kernel parameters set through GRUB.
Step 1: Modify GRUB Configuration
Edit /etc/default/grub and add the following parameters to GRUB_CMDLINE_LINUX:
GRUB_CMDLINE_LINUX="... amd_iommu=off amdgpu.gttsize=131072 ttm.pages_limit=33554432"
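Rather than hand-editing the file, you can preview the appended line first. A minimal sketch; the add_params helper is illustrative, not a standard tool:

```shell
# Illustrative helper: print a grub config read from stdin with the three
# parameters appended to the GRUB_CMDLINE_LINUX line.
add_params() {
  sed -E 's/^(GRUB_CMDLINE_LINUX="[^"]*)"/\1 amd_iommu=off amdgpu.gttsize=131072 ttm.pages_limit=33554432"/'
}

# Preview the change without touching the real file:
# add_params < /etc/default/grub
```

Once the output looks right, make the same edit to /etc/default/grub itself.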
The parameters explained:
- amd_iommu=off - Disables AMD IOMMU to reduce random memory access latency
- amdgpu.gttsize=131072 - Sets GTT size to 128GB (131072 MB)
- ttm.pages_limit=33554432 - Sets the TTM (Translation Table Maps) page limit to match the GTT size
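The two numbers are linked: amdgpu.gttsize is given in MiB, while ttm.pages_limit counts 4 KiB pages. A quick shell sanity check that both describe the same 128GB:

```shell
# amdgpu.gttsize is in MiB; ttm.pages_limit is in 4 KiB pages.
gtt_mib=131072
echo $(( gtt_mib / 1024 ))                 # GiB described by gttsize -> 128
echo $(( gtt_mib * 1024 * 1024 / 4096 ))   # matching pages_limit -> 33554432
```

If you pick a different GTT size, recompute pages_limit the same way so the two limits stay in sync.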
Step 2: Reconfigure GRUB
Apply the changes by regenerating the GRUB configuration (the path below is for Fedora-style systems; Debian/Ubuntu-based distributions use sudo update-grub instead):
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
Step 3: Reboot
Restart the machine to apply the changes:
sudo systemctl reboot
Verification
After rebooting, you can verify the parameters were applied correctly:
sudo cat /sys/module/amdgpu/parameters/gttsize
sudo cat /sys/module/ttm/parameters/pages_limit
The first command should return 131072 (128GB) and the second should return 33554432.
To confirm IOMMU is disabled, check that the /sys/class/iommu/ directory exists but is empty:
ls /sys/class/iommu/
If IOMMU is properly disabled, this should return no output.
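The three checks above can be bundled into a single script. This is a sketch: the SYSROOT variable is a hypothetical knob added so the logic can be exercised against a fake directory tree; leave it empty on the real machine.

```shell
# Sketch of a one-shot post-reboot check. SYSROOT (hypothetical) lets the
# logic be tested against a fake tree; empty means the real /sys.
SYSROOT="${SYSROOT:-}"

check() {  # usage: check <sysfs-file> <expected-value>
  if [ "$(cat "$SYSROOT$1" 2>/dev/null)" = "$2" ]; then
    echo "OK   $1"
  else
    echo "FAIL $1"
  fi
}

check /sys/module/amdgpu/parameters/gttsize 131072
check /sys/module/ttm/parameters/pages_limit 33554432

# With the IOMMU disabled, /sys/class/iommu should be empty:
if [ -z "$(ls -A "$SYSROOT/sys/class/iommu" 2>/dev/null)" ]; then
  echo "OK   iommu disabled"
else
  echo "FAIL iommu still active"
fi
```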
Use Cases
This configuration is particularly useful for:
- Running large language models (120B+ parameters) locally
- GPU-intensive workloads requiring more VRAM than the default allocation
- Development and testing of LLM applications without cloud infrastructure
Considerations
Disabling the IOMMU removes DMA isolation, which has security implications, and it breaks PCI device passthrough (VFIO), which depends on the IOMMU. If you're running VMs or containers that rely on device passthrough, consider whether the trade-off is acceptable for your use case.