Implementing automatic model selection based on query complexity: using lightweight classifiers to route requests between quantized and full-precision models in Ollama
Why I wired Ollama to pick its own model

I run Ollama on a fanless N100 box in the living-room closet. The CPU has AVX-VNNI but no dGPU, so every millisecond...