Why I Built This Setup
I run several AI inference services on my Proxmox cluster—mostly LLM APIs and embedding models exposed over gRPC. The problem was simple: each service had its own port, no TLS by default, and managing certificates manually across multiple endpoints was tedious. I needed a single entry point that could:
- Handle TLS automatically
- Route requests to the right backend based on hostname or path
- Balance load across multiple inference workers when needed
- Not require constant certificate maintenance
I had used Nginx for years, but the gRPC configuration always felt brittle—lots of directives, manual HTTP/2 setup, and certificate renewal scripts. Caddy kept coming up in self-hosting circles, and the claim was that it "just works" with gRPC and automatic TLS. I decided to test that claim.
My Real Setup
Here's what I was working with:
- Backend services: Two LXC containers on Proxmox, each running a Python-based gRPC server (port 50051) for text generation
- Domain: A subdomain (ai.vipinpg.com) pointed to my home IP via Cloudflare DNS
- Reverse proxy host: A separate LXC container running Debian 12, where I installed Caddy
- Goal: External clients hit ai.vipinpg.com, Caddy terminates TLS, forwards gRPC traffic to the backends using h2c (HTTP/2 Cleartext), and balances load when both workers are healthy (sketched below)
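The traffic flow, schematically:

```
gRPC client --TLS (HTTP/2)--> Caddy :443 --h2c--> 192.168.1.101:50051
                                          \-h2c--> 192.168.1.102:50051
```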
I did not use Docker for this—I prefer LXC containers for infrastructure services because they're lighter and easier to snapshot on Proxmox. The principles are identical for Docker setups, but the file paths and systemd commands differ slightly.
Installing Caddy
Debian 12's own repos ship an older Caddy build, so I added Caddy's official apt repository instead:
sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https curl
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list
sudo apt update
sudo apt install caddy
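A quick sanity check before touching any config:

```bash
caddy version            # confirm which build apt installed
systemctl status caddy   # the Debian package starts the service on install
```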
This gave me Caddy 2.8.x. The systemd service started automatically, but I stopped it immediately because the default config listens on port 80 and serves a placeholder page—not what I needed.
Configuring the Caddyfile
Caddy's configuration lives in /etc/caddy/Caddyfile. The default file had some commented examples, which I deleted. Here's what I wrote for my first working gRPC proxy:
ai.vipinpg.com {
    reverse_proxy 192.168.1.101:50051 192.168.1.102:50051 {
        transport http {
            versions h2c
        }
        lb_policy round_robin
        health_uri /health
        health_interval 10s
    }
}
Breaking this down:
- ai.vipinpg.com: The hostname Caddy listens for. It automatically provisions a Let's Encrypt certificate for this domain.
- reverse_proxy: The directive that forwards traffic. I listed both backend IPs directly.
- transport http { versions h2c }: This tells Caddy to use HTTP/2 Cleartext when talking to backends. gRPC requires HTTP/2, but my backends don't do TLS internally—they expect plain h2c.
- lb_policy round_robin: Distributes requests evenly across both backends.
- health_uri and health_interval: Caddy checks /health on each backend every 10 seconds. If a backend fails, it's removed from rotation until it recovers.
I saved the file and validated it:
sudo caddy validate --config /etc/caddy/Caddyfile
No errors. I restarted Caddy:
sudo systemctl restart caddy
Within 30 seconds, Caddy had obtained a TLS certificate from Let's Encrypt. I checked the logs:
sudo journalctl -u caddy -f
I saw successful ACME challenge completion and certificate storage. The automatic TLS claim was real: I did nothing beyond pointing my DNS at the host and writing a ten-line Caddyfile.
Testing with grpcurl
I use grpcurl to test gRPC endpoints. First, I verified the backend directly:
grpcurl -plaintext 192.168.1.101:50051 list
This returned the service definition. Then I tested through Caddy:
grpcurl ai.vipinpg.com:443 list
It worked. Caddy terminated TLS, converted the request to h2c, and forwarded it to the backend. I sent a few inference requests and confirmed both backends were being used (I logged which worker handled each request).
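An actual inference call through the proxy looked roughly like this; the service and method names here are placeholders for my real ones:

```bash
grpcurl -d '{"prompt": "Hello"}' \
  ai.vipinpg.com:443 textgen.TextGenerator/Generate
```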
What Didn't Work Initially
Missing h2c transport: My first attempt didn't include the transport http { versions h2c } block. Caddy tried to connect to backends using HTTP/1.1, which gRPC doesn't support. Requests failed with malformed HTTP response errors. Adding the h2c directive fixed it immediately.
Health check endpoint: I assumed my gRPC service would respond to /health by default. It didn't. Caddy marked both backends as unhealthy and stopped routing traffic. I had to implement a basic HTTP health endpoint in my Python service that returned 200 OK: about 10 lines of Flask running in a separate thread, roughly as sketched below.
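The port (8080) and layout here are illustrative rather than my exact code; if the health server listens on a different port than the gRPC service, Caddy's health_port subdirective points the checks there.

```python
# Minimal HTTP health endpoint running beside the gRPC server.
import threading

from flask import Flask

health_app = Flask(__name__)

@health_app.route("/health")
def health():
    # Return 200 only once the worker is actually ready to serve.
    return "OK", 200

def start_health_server(port: int = 8080) -> None:
    # daemon=True ties the thread's lifetime to the gRPC process
    threading.Thread(
        target=lambda: health_app.run(host="0.0.0.0", port=port),
        daemon=True,
    ).start()
```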
Port confusion: I initially forgot that Caddy listens on 443 for HTTPS by default. My firewall rules only allowed 80 and 8080. External clients couldn't connect until I opened 443. Obvious in hindsight, but it cost me 20 minutes of confusion.
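The fix was one rule; ufw shown here as an illustration, since the same idea applies to whatever firewall you run:

```bash
# I had 80 open already (Caddy also uses it for ACME HTTP challenges
# and HTTP->HTTPS redirects); 443 was the missing rule
sudo ufw allow 443/tcp
```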
Load Balancing Behavior
Caddy's round-robin policy worked as expected. I sent 100 requests and logged which backend handled each one—the distribution was 50/50. When I stopped one backend, Caddy detected the failure within 10 seconds (the health check interval) and routed all traffic to the remaining worker. When I restarted the stopped backend, Caddy added it back to rotation automatically.
I did not test other load balancing policies (random, least_conn, ip_hash) because round-robin met my needs. The documentation suggests they work similarly, but I can't confirm from experience.
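If you want to experiment, switching is a one-line change inside the reverse_proxy block (per the docs; I haven't verified these myself):

```
lb_policy least_conn
```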
Adding a Second Service
I later added an embedding service on a different subdomain. The Caddyfile became:
ai.vipinpg.com {
    reverse_proxy 192.168.1.101:50051 192.168.1.102:50051 {
        transport http {
            versions h2c
        }
        lb_policy round_robin
        health_uri /health
        health_interval 10s
    }
}

embed.vipinpg.com {
    reverse_proxy 192.168.1.103:50052 {
        transport http {
            versions h2c
        }
    }
}
Caddy provisioned a second certificate automatically. No additional configuration needed. This is where Caddy's simplicity shines—adding services is just adding blocks.
Certificate Management
Caddy stores certificates in /var/lib/caddy/.local/share/caddy/certificates. I checked this directory and found PEM files for both domains. Renewal happens automatically 30 days before expiration. I've been running this setup for four months now, and certificates have renewed twice without intervention.
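To eyeball the expiry dates yourself (the exact subdirectory layout depends on the ACME issuer):

```bash
sudo find /var/lib/caddy/.local/share/caddy/certificates -name '*.crt' \
  -exec openssl x509 -in {} -noout -subject -enddate \;
```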
I did not configure any certificate hooks or custom ACME servers. The defaults worked perfectly for my use case.
Limitations and Trade-offs
No gRPC awareness: Caddy proxies gRPC as opaque HTTP/2 traffic and doesn't understand the protocol deeply. Server reflection passes through fine (the grpcurl list test above relied on it), but Caddy itself can't make routing or load-balancing decisions based on gRPC methods or status codes. If your client can't use reflection, you need to provide the proto files to it separately.
Health checks are HTTP-based: Caddy's health checks use HTTP, not gRPC. This means you need a separate HTTP endpoint for health monitoring. If your service is pure gRPC with no HTTP support, you'll need to add that capability or skip health checks.
No built-in rate limiting for gRPC: Caddy has rate limiting plugins for HTTP, but they don't apply cleanly to gRPC. If you need per-client rate limiting, you'll have to implement it in your backend service.
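I haven't needed this yet, so treat the following as a sketch of one possible approach: a fixed-window limiter implemented as a grpcio server interceptor. The 60-per-minute limit and the x-client-id metadata key are assumptions, not part of my setup.

```python
# ratelimit.py: naive fixed-window rate limiting for unary RPCs.
import time
from collections import defaultdict

import grpc

class RateLimitInterceptor(grpc.ServerInterceptor):
    def __init__(self, max_per_minute: int = 60):  # illustrative limit
        self.max_per_minute = max_per_minute
        self.requests = defaultdict(list)  # client id -> timestamps

    def intercept_service(self, continuation, handler_call_details):
        metadata = dict(handler_call_details.invocation_metadata)
        client = metadata.get("x-client-id", "anonymous")  # assumed key
        now = time.time()
        recent = [t for t in self.requests[client] if now - t < 60.0]
        recent.append(now)
        self.requests[client] = recent
        if len(recent) > self.max_per_minute:
            def reject(request, context):
                context.abort(grpc.StatusCode.RESOURCE_EXHAUSTED,
                              "rate limit exceeded")
            # Covers unary-unary RPCs; streaming needs matching handlers.
            return grpc.unary_unary_rpc_method_handler(reject)
        return continuation(handler_call_details)

# Wired in at server construction:
# server = grpc.server(executor, interceptors=[RateLimitInterceptor()])
```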
Logging is basic: Caddy logs successful proxying but doesn't log gRPC method names or status codes by default. If you need detailed request logging, you'll need to configure structured logging or add middleware.
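One partial remedy: since a gRPC method is just the HTTP/2 request path, Caddy's per-site log directive captures it in the logged URI. A minimal sketch, with a log path of my choosing:

```
ai.vipinpg.com {
    log {
        output file /var/log/caddy/ai-access.log
        format json
    }
    # reverse_proxy block as shown earlier
}
```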
Performance Notes
I ran informal load tests using ghz (a gRPC benchmarking tool). With Caddy in the middle, I saw about 5-8% latency overhead compared to hitting backends directly. This is acceptable for my use case—most of the time is spent in inference anyway.
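For reference, the runs looked roughly like this; the proto file, method name, and payload are placeholders for my real service:

```bash
ghz --proto ./textgen.proto \
    --call textgen.TextGenerator/Generate \
    -d '{"prompt": "Hello"}' \
    -n 200 -c 10 \
    ai.vipinpg.com:443
```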
CPU usage on the Caddy container stayed under 10% during normal load (around 50 requests per second). Memory usage was stable at ~40MB. Caddy is efficient, but I didn't stress-test it beyond a few hundred concurrent connections.
Why This Worked for Me
Caddy solved my certificate management problem completely. I went from manually renewing certs every 90 days to zero maintenance. The Caddyfile syntax is clear enough that I can modify it months later without re-reading documentation.
The automatic h2c handling for gRPC was the other big win. Nginx required explicit grpc_pass directives and careful HTTP/2 configuration. Caddy just worked once I specified the transport.
Load balancing with health checks gave me basic high availability without needing a separate tool like HAProxy. For small-scale self-hosted setups, this is enough.
Key Takeaways
- Caddy's automatic TLS is not marketing—it genuinely works without intervention.
- gRPC proxying requires the h2c transport directive. Don't skip this.
- Health checks need HTTP endpoints. If your service is pure gRPC, add a basic HTTP server.
- The Caddyfile format is readable and easy to version control. Changes are low-risk.
- Caddy is not a full observability solution. You'll still need backend logging for detailed request tracking.
This setup has been running in production for my personal AI services since September 2024. It's required zero maintenance beyond updating Caddy once when a security patch was released. For self-hosted gRPC APIs, it's the simplest reverse proxy I've used.