Creating a local AI-powered search engine for archived documentation using embeddings with chromadb and sentence-transformers

Why I Built This

I maintain a lot of technical documentation across different projects. Configuration files, troubleshooting notes, setup guides, API references—all scattered across folders on my Synology NAS and local machines. When something breaks at 2 AM, I need answers fast, not a scavenger hunt through nested directories.

Keyword search doesn't cut it. I might remember "that Docker networking issue" but not the exact terms I used in the notes. I needed semantic search—something that understands what I'm asking for, not just matches strings.

Cloud-based solutions felt wrong for private documentation. I don't want my internal configs, API keys, or debugging notes sitting on someone else's server. So I built a local system using ChromaDB for vector storage and sentence-transformers for embeddings. No external APIs, no data leaving my network.

My Setup

I run this on a Proxmox VM with 8GB RAM and 4 CPU cores. The VM hosts a Python environment with ChromaDB as the vector database and sentence-transformers for generating embeddings locally. All my archived docs live on the Synology NAS, mounted via NFS to the VM.

I chose sentence-transformers over Ollama embeddings because I wanted smaller, faster models that don't require GPU acceleration. The all-MiniLM-L6-v2 model works well for my use case—it's 80MB, runs on CPU, and generates embeddings in milliseconds.
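
If you want to sanity-check that claim on your own hardware, a minimal timing sketch looks like this (numbers will obviously vary by CPU):

import time
from sentence_transformers import SentenceTransformer

# Load the same model used throughout this post (~80MB, downloads on first use)
model = SentenceTransformer('all-MiniLM-L6-v2')

sample = "Docker bridge network refuses to resolve container names after restart."

start = time.perf_counter()
embedding = model.encode(sample)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"Embedding dimension: {len(embedding)}")  # 384 for all-MiniLM-L6-v2
print(f"Encode time: {elapsed_ms:.1f} ms")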

ChromaDB stores everything in a local SQLite database with vector indices. No separate database server needed. The entire setup is self-contained and portable.

Key Components

  • Python 3.11 environment
  • ChromaDB 0.4.x for vector storage
  • sentence-transformers for embeddings
  • Simple Flask API for queries (optional, I mostly use CLI)
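
A quick sanity check that an environment matches this list (just a version-printing snippet):

import sys
import chromadb
import sentence_transformers

# Report the versions actually installed in the VM's Python environment
print(f"Python: {sys.version.split()[0]}")
print(f"chromadb: {chromadb.__version__}")
print(f"sentence-transformers: {sentence_transformers.__version__}")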

How I Process Documents

My documentation includes plain text files, Markdown notes, YAML configs, and Python scripts. I wrote a basic processor that reads files, splits them into chunks, generates embeddings, and stores everything in ChromaDB.

Chunking Strategy

I split documents into 500-character chunks with 50-character overlap. This size works because my notes are usually concise. Larger chunks would work for longer articles, but my docs are practical snippets, not essays.

Overlap matters. If a solution spans two chunks, the overlap ensures context isn't lost. I tried no overlap initially—search results were worse.

from sentence_transformers import SentenceTransformer
import chromadb
import os

# Local embedding model (~80MB, runs fine on CPU)
model = SentenceTransformer('all-MiniLM-L6-v2')

# Persistent local store; cosine distance matches how I read the scores later
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(
    "docs",
    metadata={"hnsw:space": "cosine"}
)

def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size chunks with a small overlap between them."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        start = end - overlap
    return chunks

def process_file(filepath):
    """Read a file, chunk it, embed each chunk, and store everything in ChromaDB."""
    with open(filepath, 'r', encoding='utf-8') as f:
        content = f.read()

    chunks = chunk_text(content)
    embeddings = model.encode(chunks).tolist()

    # IDs combine the file name and chunk index; metadata keeps the full path
    ids = [f"{os.path.basename(filepath)}_{i}" for i in range(len(chunks))]
    metadatas = [{"file": filepath, "chunk": i} for i in range(len(chunks))]

    collection.add(
        embeddings=embeddings,
        documents=chunks,
        metadatas=metadatas,
        ids=ids
    )

I run this script whenever I add new documentation. It takes about 30 seconds to process 100 files.
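
The batch run is just a directory walk over the NFS mount. A minimal sketch of it (the /mnt/docs path and the extension list are placeholders for my actual layout):

import os

DOCS_ROOT = "/mnt/docs"  # placeholder for the NFS mount point on the VM
EXTENSIONS = {".txt", ".md", ".yaml", ".yml", ".py"}

def index_all(root=DOCS_ROOT):
    """Walk the docs tree and index every supported file with process_file()."""
    count = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in EXTENSIONS:
                process_file(os.path.join(dirpath, name))  # defined above
                count += 1
    print(f"Indexed {count} files")

index_all()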

Metadata Tracking

Each chunk stores the source file path and chunk index. When I get search results, I know exactly where the information came from. This is critical for verification—I don't trust AI-generated answers without checking the source.

Querying the System

Searching is straightforward. I embed the query using the same model, then ChromaDB finds the most similar chunks based on cosine distance.

def search(query, n_results=5):
    """Embed the query with the same model and return the closest chunks."""
    query_embedding = model.encode([query]).tolist()

    results = collection.query(
        query_embeddings=query_embedding,
        n_results=n_results,
        include=["documents", "metadatas", "distances"]
    )

    return results

# Example usage
results = search("Docker bridge network configuration")
for doc, meta, dist in zip(results['documents'][0],
                           results['metadatas'][0],
                           results['distances'][0]):
    print(f"Distance: {dist:.3f}")
    print(f"Source: {meta['file']}")
    print(f"Content: {doc}\n")

The distance score tells me how relevant a match is. Anything below 0.5 is usually spot-on, results between 0.5 and 0.8 are worth checking, and anything above 0.8 I ignore.
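
Those cut-offs are easy to fold into a small wrapper around search() (the helper below is just a sketch using the thresholds above):

def search_filtered(query, n_results=5, max_distance=0.8):
    """Run search() and drop anything past the 'ignore it' threshold."""
    results = search(query, n_results=n_results)
    hits = []
    for doc, meta, dist in zip(results['documents'][0],
                               results['metadatas'][0],
                               results['distances'][0]):
        if dist <= max_distance:
            hits.append({"distance": dist, "file": meta["file"], "text": doc})
    return hits

for hit in search_filtered("Docker bridge network configuration"):
    label = "strong" if hit["distance"] < 0.5 else "check"
    print(f"[{label}] {hit['distance']:.3f}  {hit['file']}")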

What Works Well

Semantic search handles my vague queries better than grep ever did. "How do I fix DNS resolution in Docker" returns relevant chunks even though my notes say "container name resolution issues."

Speed is good. Queries return in under 200ms for my 5,000-chunk database. No noticeable lag even on the VM's modest CPU.

Privacy is absolute. Everything runs locally. No API calls, no telemetry, no external dependencies beyond the initial model download.

What Didn't Work

My first attempt used larger chunks (2000 characters). Search results were too broad—I'd get entire documentation sections when I needed a specific command. Smaller chunks fixed this.

I tried using all-mpnet-base-v2 for better accuracy, but it's 420MB and slower. The quality improvement wasn't worth the resource cost for my use case.

ChromaDB's default distance function (L2) gave weird results initially. Switching to cosine similarity made search more intuitive.

Limitations I Accept

This system doesn't generate answers—it just finds relevant chunks. I read the results myself instead of feeding them to an LLM. That's intentional. I don't want hallucinated solutions mixed with real documentation.

No automatic updates. When I modify a file, I need to re-process it manually. I could automate this with file watchers, but I haven't bothered yet. My docs don't change that often.
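
If I ever do automate it, a watchdog-based sketch along these lines would probably be enough (untested; the path is a placeholder, and it reuses collection and process_file from the indexing script):

import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class ReindexHandler(FileSystemEventHandler):
    def on_modified(self, event):
        if event.is_directory:
            return
        # Drop the file's old chunks so stale embeddings don't linger, then re-add
        collection.delete(where={"file": event.src_path})
        process_file(event.src_path)

observer = Observer()
observer.schedule(ReindexHandler(), path="/mnt/docs", recursive=True)  # placeholder path
observer.start()
try:
    while True:
        time.sleep(60)
finally:
    observer.stop()
    observer.join()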

Search quality depends on how I write documentation. Vague notes produce vague results. This forced me to write clearer, more specific documentation, which is probably a good thing.

Current State

I've been using this system for six months. It indexes about 800 files totaling 5MB of text. The ChromaDB database is 45MB on disk.

I query it several times a week, mostly for configuration lookups and troubleshooting references. It's faster than scrolling through files and more reliable than my memory.

The setup runs continuously on the VM with minimal resource usage—about 200MB RAM idle, spikes to 500MB during queries.

Key Takeaways

Semantic search transforms how I use archived documentation. I find what I need based on meaning, not memorized keywords.

Local embeddings are practical. The all-MiniLM-L6-v2 model is fast enough for real-time search without requiring GPU resources.

Chunk size matters more than I expected. Too large and results are unfocused. Too small and context is lost. 500 characters works for my documentation style.

ChromaDB's simplicity is its strength. No complex setup, no separate database server, just a Python library and local storage.

This system doesn't replace reading documentation—it helps me find the right documentation faster. That's exactly what I needed.