Building Multi-Model AI Pipelines with LiteLLM Proxy: Load Balancing Requests Between Local Ollama and Cloud APIs with Automatic Fallback
Why I Built This

I run AI workloads on my home lab: mostly local models through Ollama, for privacy-sensitive tasks and cost control. But local inference has...
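As a concrete starting point, a minimal sketch of the kind of setup the title describes: a LiteLLM proxy config that routes to a local Ollama model first and falls back to a cloud API when the local backend fails. The model names (`llama3`, `gpt-4o-mini`), alias (`chat`), and Ollama port are assumptions for illustration, not taken from the original post.

```yaml
# config.yaml for `litellm --config config.yaml` (sketch, names are placeholders)
model_list:
  # Primary: local Ollama instance
  - model_name: chat
    litellm_params:
      model: ollama/llama3
      api_base: http://localhost:11434

  # Fallback: cloud provider, used only when the primary errors out
  - model_name: chat-fallback
    litellm_params:
      model: gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  # If a request to "chat" fails, retry it against "chat-fallback"
  fallbacks:
    - chat: ["chat-fallback"]
```

Clients then call the proxy's OpenAI-compatible endpoint with `model: chat` and never need to know which backend actually served the request.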