Debugging out-of-memory crashes when running multiple GGUF models simultaneously in Ollama with shared VRAM pools
Why I Started Looking Into This

I run Ollama on a Proxmox VM with GPU passthrough, using a single RTX 3060 with 12GB VRAM. My workflow involves switching...