Building a self-hosted AI code review system with Continue.dev and local LLM models on Proxmox

Why I Built a Self-Hosted AI Code Review System

I've been running Continue.dev locally with Ollama for code completion for months. It works well enough—autocomplete suggestions appear, I accept or reject them, and my code stays on my machine. But code completion is reactive. I wanted something that could actively review my work before I pushed it.

The problem with cloud-based AI code review tools is the same as with cloud completion: my code leaves my infrastructure. For personal projects, maybe that's acceptable. For client work or anything under NDA, it's not. I needed a system that could analyze code quality, catch issues, and suggest improvements without sending anything outside my network.

I already had Proxmox running multiple VMs and containers. I already had Ollama serving local LLM models. The pieces existed—I just needed to wire them together into something that could review code on demand or automatically.

My Setup and Constraints

My infrastructure runs on Proxmox 8.1 with:

  • Dell R730 with dual Xeon E5-2680v4 processors (28 cores total)
  • 128GB RAM
  • No dedicated GPU—everything runs on CPU
  • ZFS storage for VM and container data

I already had an LXC container running Ollama with these models pulled:

  • codellama:13b-instruct (for detailed explanations)
  • codellama:7b-code (for faster completion)
  • mistral:7b-instruct (for general reasoning)

The Ollama container has 16GB RAM allocated and runs on 8 dedicated cores. Response times for 7B models are around 2-3 seconds. For 13B models, 5-8 seconds depending on prompt length.
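
For reference, reproducing that allocation comes down to a couple of commands. This is a rough sketch, assuming the Ollama LXC has container ID 201 (yours will differ):

# Give the Ollama container 16GB RAM and 8 cores (201 is an example ID)
pct set 201 --memory 16384 --cores 8

# Pull the models inside the container
ollama pull codellama:13b-instruct
ollama pull codellama:7b-code
ollama pull mistral:7b-instruct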

Continue.dev was already installed in VS Code on my workstation, configured to use the local Ollama instance via HTTP. That part worked fine for inline suggestions.
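
If you're reproducing this, it's worth confirming the Ollama HTTP API is reachable from the workstation before touching any Continue settings. Ollama listens on port 11434 by default, and the hostname here is the same one the Git hook uses later:

# List the models the Ollama instance is serving
curl http://proxmox-host:11434/api/tags

# Quick end-to-end generation test against the 7B code model
curl http://proxmox-host:11434/api/generate \
  -d '{"model": "codellama:7b-code", "prompt": "def add(a, b):", "stream": false}'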

What I Needed to Add

Code completion is one thing. Code review requires:

  • Analyzing entire files or diffs, not just the current line
  • Running checks automatically on commit or push
  • Storing review results somewhere accessible
  • Handling multiple languages and frameworks

Continue.dev has a CLI tool called cn that can run tasks from the command line. I tested it manually first:

cn review --file src/main.py --model codellama:13b-instruct

It worked. The output was structured—issues flagged, suggestions listed, severity levels assigned. But it was slow. A single Python file with 200 lines took about 15 seconds to review. That's acceptable for manual use, not for automated checks on every commit.

Building the Review Pipeline

I decided to split the system into three parts:

  1. A Git hook that triggers reviews on pre-commit or pre-push
  2. A lightweight API wrapper around Continue CLI to queue and process reviews
  3. A simple web interface to view results

Git Hook Setup

I created a pre-push hook in my project's .git/hooks/ directory:

#!/bin/bash
# Get the list of source files changed in the commits being pushed.
# Diff against the upstream branch: by push time nothing is staged,
# so `git diff --cached` would come back empty here.
FILES=$(git diff --name-only @{u}..HEAD --diff-filter=ACM 2>/dev/null | grep -E '\.(py|js|go|ts)$')

if [ -z "$FILES" ]; then
  exit 0
fi

# Send files to review API
PROJECT=$(basename "$(git rev-parse --show-toplevel)")
for FILE in $FILES; do
  curl -s -X POST http://proxmox-host:8090/review \
    -H "Content-Type: application/json" \
    -d "{\"file\": \"$FILE\", \"project\": \"$PROJECT\"}"
done

exit 0

This only reviews files that changed. No need to re-analyze the entire codebase every time.
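
Two small things worth checking before relying on the hook: the script has to be executable, and it helps to queue one review by hand to confirm the API is reachable (the project name below is just an example):

# Make the hook executable
chmod +x .git/hooks/pre-push

# Manually queue a single review against the API
curl -X POST http://proxmox-host:8090/review \
  -H "Content-Type: application/json" \
  -d '{"file": "src/main.py", "project": "demo"}'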

API Wrapper for Continue CLI

I wrote a small Flask app to queue review requests and process them asynchronously. This runs in another LXC container on Proxmox with 4GB RAM.

from flask import Flask, request, jsonify
import subprocess
import sqlite3
from datetime import datetime

app = Flask(__name__)

@app.route('/review', methods=['POST'])
def queue_review():
    data = request.json
    file_path = data.get('file')
    project = data.get('project')
    
    # Store in queue database
    conn = sqlite3.connect('reviews.db')
    cursor = conn.cursor()
    cursor.execute(
        "INSERT INTO review_queue (file, project, status, created_at) VALUES (?, ?, 'pending', ?)",
        (file_path, project, datetime.now().isoformat())
    )
    conn.commit()
    conn.close()
    
    return jsonify({"status": "queued"}), 202

@app.route('/process', methods=['POST'])
def process_reviews():
    conn = sqlite3.connect('reviews.db')
    cursor = conn.cursor()
    cursor.execute("SELECT id, file, project FROM review_queue WHERE status = 'pending' LIMIT 5")
    pending = cursor.fetchall()
    
    for review_id, file_path, project in pending:
        try:
            # Run Continue CLI review; the timeout leaves headroom for
            # large files, which take 30-45 seconds on this hardware
            result = subprocess.run(
                ['cn', 'review', '--file', file_path, '--model', 'codellama:13b-instruct'],
                capture_output=True,
                text=True,
                timeout=120
            )

            # Mark failed CLI runs instead of storing them as completed
            output = result.stdout if result.returncode == 0 else result.stderr
            status = 'completed' if result.returncode == 0 else 'failed'

            # Store result
            cursor.execute(
                "UPDATE review_queue SET status = ?, result = ?, completed_at = ? WHERE id = ?",
                (status, output, datetime.now().isoformat(), review_id)
            )
            conn.commit()
            
        except subprocess.TimeoutExpired:
            cursor.execute(
                "UPDATE review_queue SET status = 'timeout' WHERE id = ?",
                (review_id,)
            )
            conn.commit()
    
    conn.close()
    return jsonify({"processed": len(pending)}), 200

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8090)

The /review endpoint queues files. A separate cron job hits /process every minute to actually run reviews. This prevents Git operations from blocking while waiting for AI analysis.
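
The cron entry itself is a one-liner; a sketch, assuming it runs inside the same container as the Flask app (hence localhost):

# Process up to 5 pending reviews every minute
* * * * * curl -s -X POST http://localhost:8090/process > /dev/null 2>&1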

Database Schema

CREATE TABLE review_queue (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    file TEXT NOT NULL,
    project TEXT NOT NULL,
    status TEXT DEFAULT 'pending',
    result TEXT,
    created_at TEXT,
    completed_at TEXT
);
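
The Flask app assumes this table already exists, so create it once before the first push. Assuming the schema above is saved as schema.sql next to the app:

# One-time database setup in the API container
sqlite3 reviews.db < schema.sql

# Confirm the table exists
sqlite3 reviews.db ".tables"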

What Worked

The system runs. I push code, the hook fires, files get queued, and reviews complete within a few minutes. Results are stored in SQLite and accessible via a basic web UI (just a table showing file, status, and issues found).
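
When a browser isn't handy, the same results are one query away from the shell; the column names come straight from the schema above:

# Show the ten most recent reviews and their status
sqlite3 -header -column reviews.db \
  "SELECT file, status, created_at FROM review_queue ORDER BY id DESC LIMIT 10;"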

Continue.dev's review output is surprisingly useful. It catches:

  • Unused imports and variables
  • Missing error handling
  • Potential null pointer dereferences
  • Inefficient loops or queries
  • Security issues like SQL injection risks

It's not perfect, but it's better than nothing. And because it runs locally, I can review proprietary code without worrying about data leakage.

The 13B model is noticeably better than 7B for code review. It provides more context in its explanations and catches subtler issues. The trade-off is speed—13B takes 3-4x longer to process the same file.

What Didn't Work

The first version tried to run reviews synchronously in the Git hook. That was a mistake. Pushing a branch with 10 changed files meant waiting 2-3 minutes for all reviews to complete. Unacceptable.

I also tried using codellama:34b-instruct for higher quality reviews. It ran out of memory and crashed the Ollama container. 34B models need more RAM than I allocated. I could increase it, but that would steal resources from other VMs.

The web UI is ugly and minimal. I considered building something nicer with React, but decided against it. The point is functionality, not aesthetics. A plain HTML table works fine for viewing results.

Continue.dev's CLI doesn't support batch processing. Each file is reviewed individually, which means multiple API calls to Ollama. I looked into modifying the CLI to accept multiple files at once, but the codebase is complex enough that I decided against it.

Performance Observations

Average review times on my hardware:

  • Small file (< 100 lines): 8-12 seconds with codellama:13b-instruct
  • Medium file (100-300 lines): 15-25 seconds
  • Large file (300+ lines): 30-45 seconds

CPU usage spikes to 100% across all allocated cores during review. RAM usage stays under 12GB even with the 13B model.

The queue-based approach means I can push code and move on. Reviews finish in the background. If I need immediate feedback, I run cn review manually.

Limitations and Trade-offs

This system is not a replacement for human code review. It catches mechanical issues—syntax errors, common bugs, basic security flaws. It doesn't understand business logic or architectural decisions.

The models sometimes hallucinate. They'll suggest fixes that don't actually solve the problem or introduce new bugs. I still review every suggestion manually.

Processing time is slow compared to cloud-based tools. GitHub Copilot or CodeRabbit return results in seconds. My setup takes minutes. That's the cost of keeping everything local.

There's no integration with GitHub PRs or GitLab merge requests. Reviews happen locally, not in the CI pipeline. I could extend the system to post comments via API, but I haven't needed that yet.

Key Takeaways

  • Continue.dev's CLI is powerful but underdocumented. Expect to read source code.
  • Local LLMs are viable for code review if you have the hardware and accept slower responses.
  • Queue-based processing is essential for usability. Synchronous reviews block too long.
  • 13B models provide noticeably better results than 7B for analysis tasks.
  • Keep the system simple. Complex UIs and integrations add maintenance burden.

This setup works for my needs—private, local, and automated. It's not polished, but it's functional. If you have similar requirements and existing Proxmox infrastructure, the pieces are straightforward to assemble.