Benchmarking RTX 5090 vs 4090 for Local LLM Inference: Real-World Token/Second Gains with Ollama and LM Studio
Why I Benchmarked the RTX 5090 Against My 4090

I've been running local LLMs on my RTX 4090 for over a year now. My setup includes Ollama for quick CLI...