
Building a Self-Hosted AI Code Review Bot with Continue.dev, Ollama, and GitLab Webhooks for Merge Request Automation

Why I Built This

I run a self-hosted GitLab instance on Proxmox for my personal projects. Code reviews used to pile up—not because I didn't care, but because context-switching between tasks meant merge requests sat idle longer than they should. I wanted something that could do an initial pass on code changes automatically, flag obvious issues, and give me a head start before I reviewed manually.

I already had Ollama running locally for other AI experiments, and I'd been using Continue.dev in VS Code for code assistance. The idea was simple: hook GitLab's webhook system into a local service that could pull merge request diffs, run them through a local AI model, and post feedback as comments. No cloud services, no external APIs, no monthly fees.

My Setup

Here's what I was working with:

  • GitLab CE running in a Proxmox LXC container (192.168.1.50)
  • Ollama installed on my main Proxmox host (192.168.1.10), serving models locally
  • Continue.dev already configured in VS Code, using Ollama models for chat and code assistance
  • Python 3.11 for the webhook listener and review logic
  • Docker to containerize the webhook service for easier deployment

The model I chose was qwen2.5-coder:7b—it's fast enough for real-time responses, doesn't require massive GPU resources, and handles code understanding reasonably well. I'd already pulled it for Continue.dev work, so it was sitting there ready to use.

How I Built the Webhook Listener

GitLab's webhook system is straightforward. When a merge request is opened or updated, it can POST a JSON payload to any URL you specify. I needed a lightweight HTTP server to catch that payload, extract the diff, and trigger the review process.

I wrote a Flask app that listens on port 5000. The core logic:

  • Receive the webhook POST from GitLab
  • Extract project ID, merge request IID, and diff URL from the payload
  • Fetch the actual diff using GitLab's API
  • Send the diff to Ollama's OpenAI-compatible API (the same endpoint Continue.dev talks to)
  • Parse the response and post it back as a merge request comment

The Flask app itself is simple—about 150 lines including error handling. I used the requests library to interact with both GitLab and Ollama's HTTP endpoints.
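
Here is a stripped-down sketch of the listener. The helper functions fetch_diff, review_with_ollama, and post_comment are sketched in the sections below; names and structure are illustrative rather than the exact code.

```python
import os

import requests  # used by the helpers sketched in later sections
from flask import Flask, request

app = Flask(__name__)

# Pulled from the container's environment (variable names are illustrative)
GITLAB_URL = os.environ["GITLAB_URL"]          # e.g. http://192.168.1.50
GITLAB_TOKEN = os.environ["GITLAB_TOKEN"]      # personal access token with api scope
WEBHOOK_SECRET = os.environ["WEBHOOK_SECRET"]  # shared secret set on the GitLab webhook


@app.route("/webhook", methods=["POST"])
def webhook():
    # Secret-token validation also happens here; shown in the webhook configuration section
    payload = request.get_json(silent=True) or {}
    if payload.get("object_kind") != "merge_request":
        return "", 204

    attrs = payload["object_attributes"]
    if attrs.get("action") not in ("open", "update"):
        return "", 204

    project_id = payload["project"]["id"]
    mr_iid = attrs["iid"]

    diff = fetch_diff(project_id, mr_iid)      # GitLab API (next section)
    review = review_with_ollama(diff)          # Ollama API (further below)
    post_comment(project_id, mr_iid, review)   # posts the MR note
    return "", 200


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```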

Fetching the Diff

GitLab's API provides merge request diffs at /api/v4/projects/{project_id}/merge_requests/{mr_iid}/changes. I used a personal access token with api scope to authenticate. The diff comes back as a JSON object with file changes, including added/removed lines.

I filtered out binary files and focused only on text-based code files. The diff format is standard unified diff, which the AI model handles well without extra formatting.
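
In code, the fetch looks roughly like this; the binary filter is simplified to "skip entries without a textual diff":

```python
def fetch_diff(project_id, mr_iid):
    """Fetch the MR changes from GitLab and return one combined unified diff string."""
    url = f"{GITLAB_URL}/api/v4/projects/{project_id}/merge_requests/{mr_iid}/changes"
    resp = requests.get(url, headers={"PRIVATE-TOKEN": GITLAB_TOKEN}, timeout=30)
    resp.raise_for_status()

    chunks = []
    for change in resp.json()["changes"]:
        # Binary files (and renames with no content change) come back without a usable diff
        if not change.get("diff"):
            continue
        header = f"--- a/{change['old_path']}\n+++ b/{change['new_path']}\n"
        chunks.append(header + change["diff"])
    return "\n".join(chunks)
```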

Sending to Ollama

Ollama's API is OpenAI-compatible, which made integration easy. I structured the prompt like this:

"You are a code reviewer. Analyze the following diff and provide feedback on potential issues, code quality, and suggestions. Focus on logic errors, security concerns, and maintainability. Keep feedback concise."

Then I appended the raw diff. The request goes to http://192.168.1.10:11434/v1/chat/completions with the model set to qwen2.5-coder:7b.
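
A sketch of that call using plain requests against the OpenAI-compatible endpoint (in the deployed container the base URL comes from an environment variable):

```python
REVIEW_PROMPT = (
    "You are a code reviewer. Analyze the following diff and provide feedback on "
    "potential issues, code quality, and suggestions. Focus on logic errors, "
    "security concerns, and maintainability. Keep feedback concise."
)

def review_with_ollama(diff):
    """Send the diff to Ollama's OpenAI-compatible chat endpoint and return the review text."""
    resp = requests.post(
        "http://192.168.1.10:11434/v1/chat/completions",
        json={
            "model": "qwen2.5-coder:7b",
            "messages": [
                {"role": "system", "content": REVIEW_PROMPT},
                {"role": "user", "content": diff},
            ],
            "stream": False,
        },
        timeout=120,  # large diffs take a while on CPU-only inference
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```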

Response time varied—small diffs (under 200 lines) came back in 5-10 seconds. Larger diffs (500+ lines) took 30-40 seconds. Not instant, but acceptable for an automated first pass.

Posting the Comment

Once I had the AI's response, I posted it back to the merge request using GitLab's /api/v4/projects/{project_id}/merge_requests/{mr_iid}/notes endpoint. I prefixed each comment with "🤖 Automated Review" so it was clear this wasn't a human review.
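
The posting helper is a single authenticated POST (sketch):

```python
def post_comment(project_id, mr_iid, review_text):
    """Post the AI review back to the merge request as a note."""
    url = f"{GITLAB_URL}/api/v4/projects/{project_id}/merge_requests/{mr_iid}/notes"
    resp = requests.post(
        url,
        headers={"PRIVATE-TOKEN": GITLAB_TOKEN},
        json={"body": f"🤖 Automated Review\n\n{review_text}"},
        timeout=30,
    )
    resp.raise_for_status()
```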

Dockerizing the Service

I packaged the Flask app in a Docker container to keep it isolated and easy to redeploy. The Dockerfile:

  • Base image: python:3.11-slim
  • Installed Flask, requests, and python-dotenv
  • Exposed port 5000
  • Used environment variables for GitLab URL, token, and Ollama endpoint
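
Put together, it is only a few lines (app.py and the exact file layout are illustrative):

```dockerfile
FROM python:3.11-slim

WORKDIR /app
RUN pip install --no-cache-dir flask requests python-dotenv

COPY app.py .

# GITLAB_URL, GITLAB_TOKEN, WEBHOOK_SECRET (and the Ollama endpoint) are passed at runtime
EXPOSE 5000
CMD ["python", "app.py"]
```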

I run the container on the same Proxmox host as Ollama, using --network host so it can reach both GitLab (192.168.1.50) and Ollama (localhost:11434) without extra networking config.

Configuring GitLab Webhooks

In GitLab, I went to Settings → Webhooks for each project I wanted automated reviews on. I added:

  • URL: http://192.168.1.10:5000/webhook
  • Trigger: Merge request events (opened, updated)
  • Secret Token: A random string I generated, validated in the Flask app (snippet below) to prevent unauthorized requests
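
In the Flask app, the token check is the first thing the /webhook handler does, a constant-time comparison against the secret from the environment (sketch):

```python
import hmac

# Inside the /webhook handler, before any payload processing
token = request.headers.get("X-Gitlab-Token", "")
if not hmac.compare_digest(token, WEBHOOK_SECRET):
    abort(403)  # flask.abort: drop requests without the correct secret
```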

I tested the webhook using GitLab's "Test" button. The first attempt failed because I forgot to expose the Flask port in Docker. After fixing that, the webhook fired successfully and posted a comment on a test merge request.

What Worked

The system does what I needed:

  • Catches merge request events reliably
  • Fetches diffs without issues
  • Generates useful feedback on code structure, potential bugs, and style inconsistencies
  • Posts comments within a reasonable time frame

The AI caught things I sometimes miss on first glance—unused variables, missing error handling, overly complex conditionals. It's not perfect, but it's a helpful first filter.

Performance-wise, qwen2.5-coder:7b runs smoothly on my setup (AMD Ryzen 9, 64GB RAM, no dedicated GPU). CPU usage spikes during inference but stays manageable.

What Didn't Work

Large diffs overwhelm the model. Anything over 1000 lines produces generic, surface-level feedback. The model loses context and starts repeating itself. I added a check to skip reviews if the diff exceeds 800 lines and post a message saying "Diff too large for automated review."
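
The guard is just a line count on the combined diff before anything is sent to the model (a sketch reusing the helpers from earlier):

```python
MAX_DIFF_LINES = 800

diff = fetch_diff(project_id, mr_iid)
if len(diff.splitlines()) > MAX_DIFF_LINES:
    post_comment(project_id, mr_iid, "Diff too large for automated review.")
else:
    post_comment(project_id, mr_iid, review_with_ollama(diff))
```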

False positives on style issues. The model sometimes flags intentional design choices as problems. For example, it suggested breaking up a 50-line function that was deliberately kept together for clarity. I tuned the prompt to focus more on logic and security, less on style, which helped but didn't eliminate the issue.

No understanding of project context. The AI only sees the diff, not the broader codebase. It can't know if a function call is valid elsewhere in the project or if a variable is defined in another file. This limits its usefulness for complex changes.

Webhook retries caused duplicate comments. GitLab retries failed webhooks, and if my Flask app crashed mid-request, the retry would post the same comment twice. I added idempotency by checking if a comment with the same content already exists before posting.
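
The duplicate check lists the MR's existing notes and bails out if an identical body is already there (a sketch; pagination past the first 100 notes is ignored):

```python
def comment_already_posted(project_id, mr_iid, body):
    """Return True if a note with an identical body already exists on the merge request."""
    url = f"{GITLAB_URL}/api/v4/projects/{project_id}/merge_requests/{mr_iid}/notes"
    resp = requests.get(
        url,
        headers={"PRIVATE-TOKEN": GITLAB_TOKEN},
        params={"per_page": 100},
        timeout=30,
    )
    resp.raise_for_status()
    return any(note["body"] == body for note in resp.json())
```

post_comment calls this first and returns early if the note already exists.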

Limitations and Trade-offs

This setup is not a replacement for human review. The AI misses nuanced issues, doesn't understand business logic, and sometimes suggests changes that make no sense in context.

It's also not scalable beyond personal use. If multiple merge requests come in simultaneously, the Flask app processes them serially. With a 30-second review time per request, this creates a backlog. I could add a task queue (like Celery), but I haven't needed it yet.

Security-wise, the Flask app trusts the webhook secret token but doesn't validate GitLab's IP range. If someone spoofs the token, they could trigger fake reviews. Not a huge risk on my private network, but worth noting.

Key Takeaways

  • Local AI models are viable for automated code review if you keep expectations realistic.
  • GitLab's webhook system is flexible and easy to integrate with custom services.
  • Ollama's OpenAI-compatible API makes it simple to swap in different models without rewriting code.
  • Diff size matters—smaller, focused changes get better feedback than massive refactors.
  • Always include a way to identify automated comments so reviewers know what's AI-generated.

This system saves me time on initial reviews and catches low-hanging issues automatically. It's not perfect, but it works well enough that I've kept it running for the past few months.