Implementing an OpenAI-compatible API gateway with LiteLLM to load-balance requests across Ollama, LM Studio, and vLLM backends
Why I Built This Gateway

I run multiple local LLM backends in my homelab: Ollama for quick inference, LM Studio for testing different models, and vLLM when I...
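To make the setup concrete, here is a minimal sketch of a LiteLLM proxy config that fronts the three backends behind one model alias. The ports and model names are assumptions (Ollama's default `11434`, LM Studio's default `1234`, and a vLLM server on `8000`), not the author's actual deployment; LiteLLM routes requests across entries that share the same `model_name`.

```yaml
# litellm_config.yaml -- hypothetical sketch, adjust hosts/models to your setup
model_list:
  # Ollama backend (default port 11434)
  - model_name: local-llama        # one alias, three deployments
    litellm_params:
      model: ollama/llama3
      api_base: http://localhost:11434

  # LM Studio exposes an OpenAI-compatible server (default port 1234)
  - model_name: local-llama
    litellm_params:
      model: openai/llama-3-8b-instruct
      api_base: http://localhost:1234/v1
      api_key: "lm-studio"         # LM Studio ignores the key, but one is required

  # vLLM's OpenAI-compatible server (commonly port 8000)
  - model_name: local-llama
    litellm_params:
      model: openai/meta-llama/Meta-Llama-3-8B-Instruct
      api_base: http://localhost:8000/v1
      api_key: "none"

router_settings:
  routing_strategy: simple-shuffle  # random spread across healthy deployments
```

Started with `litellm --config litellm_config.yaml`, the proxy then accepts standard OpenAI-style requests against `model: local-llama` and fans them out across whichever backends are up.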