
Configuring Proxmox PCIe passthrough for Intel Arc GPUs to run multiple isolated Ollama LXC containers with SR-IOV

Why I Worked on This

I needed to run multiple isolated Ollama instances on my Proxmox server, each with dedicated GPU access. My Intel Arc A380 was sitting idle most of the time, and I wanted to split its resources across several LXC containers without the overhead of full VMs. The goal was simple: efficient GPU sharing for local LLM inference without containers stepping on each other.

Intel Arc GPUs support SR-IOV (Single Root I/O Virtualization), which theoretically allows splitting one physical GPU into multiple virtual functions. This seemed like the perfect fit for my use case—lighter than VM passthrough, more isolated than sharing a single GPU across containers.

My Real Setup

Hardware:

  • Intel Arc A380 (4GB GDDR6)
  • Proxmox VE 8.1 on an Intel i5-12400 system
  • Motherboard with VT-d and SR-IOV support enabled in BIOS

Software stack:

  • Proxmox host running kernel 6.5
  • Multiple unprivileged LXC containers (Debian 12 base)
  • Ollama installed in each container
  • Intel compute runtime and drivers for Arc support

What I Actually Did

Enabling IOMMU and SR-IOV

First, I verified IOMMU was working on the Proxmox host:

dmesg | grep -e DMAR -e IOMMU

I saw the expected "DMAR: IOMMU enabled" line. Then I checked that my Arc GPU was in its own IOMMU group:

pvesh get /nodes/proxmox/hardware/pci --pci-class-blacklist ""

The Arc A380 showed up isolated in IOMMU group 14. Good start.
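The same grouping can be inspected straight from sysfs; here is a small sketch (the helper name and the optional path argument are mine, added so it can be pointed at any tree rather than only the live `/sys`):

```shell
# List every IOMMU group and the PCI devices inside it.
# With no argument it walks the real sysfs tree on the host.
list_iommu_groups() {
    local base=${1:-/sys/kernel/iommu_groups}
    local dev group
    for dev in "$base"/*/devices/*; do
        [ -e "$dev" ] || continue          # skip if the glob matched nothing
        group=${dev#"$base"/}              # e.g. 14/devices/0000:03:00.0
        echo "group ${group%%/*}: ${dev##*/}"
    done
}
```

A GPU that shares its group with other devices will show up here immediately, which is worth checking before going any further.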

Next, I enabled SR-IOV for the GPU. Intel Arc cards expose SR-IOV through the i915 driver, but it's not enabled by default. I added this to my kernel command line in /etc/default/grub:

intel_iommu=on iommu=pt i915.enable_guc=3 i915.max_vfs=7

Setting enable_guc=3 enables both GuC submission and HuC firmware loading, which Arc GPUs require. max_vfs=7 tells the driver to create up to seven virtual functions from the physical GPU.
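In /etc/default/grub the full line ends up looking something like this (the quiet flag is the Proxmox default; treat the exact line as a sketch to merge with whatever is already there):

```shell
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt i915.enable_guc=3 i915.max_vfs=7"
```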

After updating grub and rebooting:

update-grub
reboot

I verified the virtual functions appeared:

lspci | grep VGA

I saw the physical function (0000:03:00.0) plus seven new virtual functions (0000:03:00.1 through 0000:03:00.7). Each VF showed up as a separate PCI device.
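There is also a runtime alternative to the max_vfs kernel parameter: the standard SR-IOV sysfs knob. A sketch (the helper takes the device's sysfs directory as an argument so it isn't tied to real hardware; my card's PCI address is shown in the comments):

```shell
# Read how many VFs a device currently exposes via the standard
# sriov_numvfs attribute.
vf_count() {
    cat "$1/sriov_numvfs"
}
# On the real host:
#   vf_count /sys/bus/pci/devices/0000:03:00.0
# VFs can also be created on the fly (the count must be 0 before changing it):
#   echo 7 > /sys/bus/pci/devices/0000:03:00.0/sriov_numvfs
```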

Blacklisting the Host Driver

The Proxmox host was trying to claim all the VFs with the i915 driver. For passthrough to work, I needed to prevent this and bind them to vfio-pci instead.

I created /etc/modprobe.d/vfio.conf:

options vfio-pci ids=8086:56a5
softdep i915 pre: vfio-pci

The device ID 8086:56a5 is specific to the Arc A380 VFs. I found this by running lspci -nn and looking at the VF entries.
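Pulling the IDs out of that output can be scripted. The sample lines below are stand-ins for real lspci -nn output on my host (run it yourself and substitute your own lines); the regex anchors on the trailing bracket pair so the class code earlier in the line is ignored:

```shell
# Extract the trailing [vendor:device] ID from lspci -nn style lines.
sample='03:00.0 VGA compatible controller [0300]: Intel Corporation DG2 [Arc A380] [8086:56a5]
03:00.1 Display controller [0380]: Intel Corporation DG2 [Arc A380] [8086:56a5]'
echo "$sample" | grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]$' | tr -d '[]' | sort -u
```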

Then I updated the initramfs and rebooted:

update-initramfs -u -k all
reboot

After reboot, I confirmed vfio-pci owned the VFs:

lspci -k -s 03:00.1

The output showed "Kernel driver in use: vfio-pci" instead of i915.
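The same check can be looped over all seven functions by following the driver symlink in sysfs; a sketch (the helper and its sysfs-root argument are mine, added so the logic can be exercised against a mock tree):

```shell
# Resolve which kernel driver owns a PCI function by following the
# driver symlink under sysfs.
bound_driver() {
    local sysfs=$1 addr=$2
    basename "$(readlink -f "$sysfs/bus/pci/devices/$addr/driver")"
}
# On the real host:
#   for fn in 1 2 3 4 5 6 7; do
#       echo "03:00.$fn: $(bound_driver /sys 0000:03:00.$fn)"
#   done
```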

Creating LXC Containers with GPU Access

I created unprivileged LXC containers through the Proxmox web UI, using Debian 12 templates. The quirk with LXC is that you can't directly pass through PCI devices the way you can with VMs, but you can bind-mount device nodes.

For each container, I needed to:

  1. Pass the render node from the host
  2. Set up proper permissions
  3. Install Intel compute runtime

I edited each container's config file (e.g., /etc/pve/lxc/100.conf) and added:

lxc.cgroup2.devices.allow: c 226:* rwm
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file

The c 226:* rule allows access to DRM character devices (major number 226). The mount entry binds the specific render node into the container. Each VF exposes its own render node (renderD128, renderD129, etc.), so I mapped different nodes to different containers.
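So a second container's config (a hypothetical /etc/pve/lxc/101.conf) would bind the next node instead:

```
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.mount.entry: /dev/dri/renderD129 dev/dri/renderD129 none bind,optional,create=file
```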

I also had to adjust the ID mappings so the containers could reach the GPU while staying unprivileged:

lxc.idmap: u 0 100000 65536
lxc.idmap: g 0 100000 44
lxc.idmap: g 44 44 1
lxc.idmap: g 45 100045 65491

This maps the host's video group (GID 44) into the container so the render group membership works correctly.
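My understanding is that Proxmox also requires the host to permit these custom mappings in /etc/subuid and /etc/subgid, where the root:44:1 line is what allows mapping host GID 44 at all. Treat the exact entries below as an assumption to verify against your own host:

```
# /etc/subuid
root:100000:65536

# /etc/subgid
root:100000:65536
root:44:1
```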

Installing Intel Drivers in Containers

Inside each container, I installed the Intel compute runtime:

apt update
apt install -y intel-opencl-icd intel-level-zero-gpu level-zero

Then I verified the GPU was visible:

ls -la /dev/dri/
clinfo

The render node showed up, and clinfo listed the Arc GPU as an available OpenCL device.

Running Ollama

I installed Ollama in each container using their official script:

curl -fsSL https://ollama.com/install.sh | sh

By default, Ollama detects available compute devices. I verified it found the GPU:

ollama run llama2

During inference, I checked GPU usage on the host:

intel_gpu_top

Each container's workload showed up as separate activity on different VFs. The GPU was being shared, but each container had its own isolated slice.

What Didn't Work

Initial VF Creation Failures

The first time I tried enabling SR-IOV, no virtual functions appeared. I had forgotten to enable VT-d in the BIOS. After enabling it and adding intel_iommu=on to the kernel parameters, the VFs showed up.

Permission Issues in Containers

My first attempt at LXC GPU access failed because I didn't properly map the video group. The containers could see /dev/dri/renderD128 but got permission denied errors when Ollama tried to use it. The idmap configuration fixed this, but it took me a while to realize the GID mapping was the issue.

Driver Version Mismatches

I initially tried using older Intel compute runtime packages from Debian stable. They didn't fully support the Arc A380, and Ollama would fall back to CPU inference. I had to install newer packages from Intel's own repositories to get proper Arc support.
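Intel publishes its client GPU packages in its own apt repository. This is a sketch of the sources entry I would expect; Intel documents the Ubuntu jammy suite, so pointing Debian 12 at it is an assumption to verify (the keyring must be fetched from Intel and dearmored to the path shown first):

```
# /etc/apt/sources.list.d/intel-gpu.list
deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy client
```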

Memory Limitations Per VF

Each VF gets a fraction of the GPU's total VRAM. With 4GB on the A380 and 7 VFs, that's roughly 512MB per VF (accounting for overhead). This is fine for smaller models like Llama 2 7B quantized, but larger models won't fit. I ended up using fewer VFs (3-4) to give each one more memory headroom.
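The arithmetic behind that choice can be sketched; the 10% overhead reserve is my rough assumption, not a measured number:

```shell
# Rough MB of VRAM per VF on a 4 GB card, reserving ~10% for overhead
# (the overhead figure is an assumption, not a measurement).
vram_per_vf() {
    echo $(( (4096 * 90 / 100) / $1 ))
}
```

With seven VFs that works out to 526 MB each; dropping to three VFs gives 1228 MB, which is the headroom that made the smaller quantized models comfortable.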

Performance Overhead

SR-IOV isn't free. There's some performance loss compared to direct GPU access. In my testing, inference speed was about 15-20% slower per VF compared to using the full GPU in a single VM. For my use case (multiple concurrent but not performance-critical inference tasks), this trade-off was acceptable.

Key Takeaways

SR-IOV works but has limits. Intel Arc GPUs do support SR-IOV, and it does work with Proxmox LXC containers. But you're splitting a relatively small GPU (4GB VRAM) into even smaller slices. This only makes sense for workloads that don't need the full GPU.

LXC GPU passthrough is hacky. Unlike VMs, LXC containers don't have clean PCI passthrough. You're mapping device nodes and fiddling with permissions. It works, but it feels fragile compared to VM passthrough.

Driver support matters more than hardware support. The Arc A380 technically supports SR-IOV, but you need recent kernel versions (6.5+) and up-to-date Intel drivers. Older software stacks won't expose the VFs properly.

IOMMU grouping is critical. If your GPU shares an IOMMU group with other devices, SR-IOV won't work cleanly. I got lucky with my motherboard having good IOMMU separation. If you're planning this setup, verify IOMMU groups before buying hardware.

Not all models fit. With limited VRAM per VF, you're restricted to smaller quantized models. Llama 2 7B works fine, but anything larger requires either fewer VFs or a different approach.

This isn't production-ready. I'm running this setup for personal experimentation. The performance overhead, memory constraints, and general hackiness of LXC GPU access mean I wouldn't deploy this for anything critical. For serious multi-tenant GPU workloads, proper VM passthrough or a dedicated GPU sharing solution would be better.

What I'd Do Differently

If I were starting over, I'd probably use fewer VFs (maybe 3) to give each one more VRAM. I'd also consider using VMs instead of LXC containers for cleaner device isolation, even though the overhead is higher.

For production use, I'd look at Intel's Data Center GPU Flex series instead of Arc. They're designed for this kind of workload and have better SR-IOV support with more VRAM per VF.

But for a home lab setup running multiple isolated Ollama instances? This works. It's not elegant, but it gets the job done.