Implementing automatic model selection based on query complexity: using lightweight classifiers to route requests between quantized and full-precision models in Ollama
Why I wired Ollama to pick its own model

I run Ollama on a fanless N100 box in the living-room closet. The CPU has AVX-VNNI but no dGPU, so every millisecond...