Unified Memory on AMD Strix Halo SoC
The Ryzen AI MAX+ 395 "Strix Halo" in Framework Desktop comes with 128GB of unified memory, but the GTT (Graphics Translation Tables) allocation defaults to only 64GB, which limits the ability to run large language models like gpt-oss-120b. This can be increased to 128GB through GRUB configuration changes.
The Problem
When trying to run large language models on the Framework Desktop with Ryzen AI MAX+ 395 "Strix Halo", you'll hit a 64GB GTT limit even though the chip has 128GB of unified memory. The default GTT allocation of 64GB isn't enough for models like gpt-oss-120b. The good news is that since this is an APU with unified memory architecture, we can bump up the GTT allocation to let the GPU access more system memory.
Increasing the Memory Limit
We can increase the GTT size to 128GB and disable the AMD IOMMU (to reduce memory access latency) with a few kernel parameters set through GRUB.
Step 1: Modify GRUB Configuration
Edit /etc/default/grub and add the following parameters to GRUB_CMDLINE_LINUX:
GRUB_CMDLINE_LINUX="... amd_iommu=off amdgpu.gttsize=131072 ttm.pages_limit=33554432"
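Rather than hand-editing the file, you can preview the appended line first. A minimal sketch; the add_params helper is illustrative, not a standard tool:

```shell
# Illustrative helper: print a grub config read from stdin with the three
# parameters appended to the GRUB_CMDLINE_LINUX line.
add_params() {
  sed -E 's/^(GRUB_CMDLINE_LINUX="[^"]*)"/\1 amd_iommu=off amdgpu.gttsize=131072 ttm.pages_limit=33554432"/'
}

# Preview the change without touching the real file:
# add_params < /etc/default/grub
```

Once the output looks right, make the same edit to /etc/default/grub itself.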
The parameters explained:
- amd_iommu=off - Disables AMD IOMMU to reduce random memory access latency
- amdgpu.gttsize=131072 - Sets GTT size to 128GB (131072 MB)
- ttm.pages_limit=33554432 - Sets the TTM (Translation Table Maps) page limit to match the GTT size
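The two numbers are linked: amdgpu.gttsize is given in MiB, while ttm.pages_limit counts 4 KiB pages. A quick shell sanity check that both describe the same 128GB:

```shell
# amdgpu.gttsize is in MiB; ttm.pages_limit is in 4 KiB pages.
gtt_mib=131072
echo $(( gtt_mib / 1024 ))                 # GiB described by gttsize -> 128
echo $(( gtt_mib * 1024 * 1024 / 4096 ))   # matching pages_limit -> 33554432
```

If you pick a different GTT size, recompute pages_limit the same way so the two limits stay in sync.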
Step 2: Reconfigure GRUB
Apply the changes by regenerating the GRUB configuration (the path below is for Fedora-style systems; Debian/Ubuntu-based distributions use sudo update-grub instead):
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
Step 3: Reboot
Restart the machine to apply the changes:
sudo systemctl reboot
Verification
After rebooting, you can verify the parameters were applied correctly:
sudo cat /sys/module/amdgpu/parameters/gttsize
sudo cat /sys/module/ttm/parameters/pages_limit
The first command should return 131072 (128GB) and the second should return 33554432.
To confirm IOMMU is disabled, check that the /sys/class/iommu/ directory exists but is empty:
ls /sys/class/iommu/
If IOMMU is properly disabled, this should return no output.
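The three checks above can be bundled into a single script. This is a sketch: the SYSROOT variable is a hypothetical knob added so the logic can be exercised against a fake directory tree; leave it empty on the real machine.

```shell
# Sketch of a one-shot post-reboot check. SYSROOT (hypothetical) lets the
# logic be tested against a fake tree; empty means the real /sys.
SYSROOT="${SYSROOT:-}"

check() {  # usage: check <sysfs-file> <expected-value>
  if [ "$(cat "$SYSROOT$1" 2>/dev/null)" = "$2" ]; then
    echo "OK   $1"
  else
    echo "FAIL $1"
  fi
}

check /sys/module/amdgpu/parameters/gttsize 131072
check /sys/module/ttm/parameters/pages_limit 33554432

# With the IOMMU disabled, /sys/class/iommu should be empty:
if [ -z "$(ls -A "$SYSROOT/sys/class/iommu" 2>/dev/null)" ]; then
  echo "OK   iommu disabled"
else
  echo "FAIL iommu still active"
fi
```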
Use Cases
This configuration is particularly useful for:
- Running large language models (120B+ parameters) locally
- GPU-intensive workloads requiring more VRAM than the default allocation
- Development and testing of LLM applications without cloud infrastructure
Considerations
Disabling the IOMMU removes DMA isolation, which has security implications, and it breaks PCI device passthrough (VFIO), which depends on the IOMMU. If you're running VMs or containers that rely on device passthrough, consider whether the trade-off is acceptable for your use case.