Building Automated Container Security Scanning Pipelines: Integrating Trivy with Portainer Webhooks for Pre-Deployment Vulnerability Blocking

Why I Built This

I run a Proxmox cluster with multiple Docker hosts managed through Portainer. Over time, I’ve deployed dozens of containers—some from trusted sources, others from community repositories I barely vetted. The problem hit me when a security advisory came out for a base image I’d been using in production for months. I had no systematic way to know what vulnerabilities existed in my running containers until something broke or made the news.

I needed a way to scan images before they went live, not after. Manual scanning before every deployment wasn’t realistic—I update containers frequently, test new tools, and run automated workflows through n8n. The solution had to integrate with my existing Portainer setup and block deployments automatically if critical vulnerabilities were found.

My Setup

I’m using:

  • Portainer Business (self-hosted) managing 4 Docker environments across my Proxmox cluster
  • Trivy installed on a dedicated Debian 12 VM with 4GB RAM
  • n8n for webhook handling and orchestration
  • A simple Flask API I wrote to act as the scanning gateway
  • PostgreSQL database to track scan results over time

Portainer has webhook functionality that fires on specific events. I configured webhooks to trigger before container creation and updates. The webhook payload includes image details, which I extract and pass to Trivy for scanning.

How I Built the Pipeline

Installing and Configuring Trivy

I installed Trivy directly on the VM rather than running it in a container. This gave me more control over caching and database updates:

wget https://github.com/aquasecurity/trivy/releases/download/v0.48.3/trivy_0.48.3_Linux-64bit.deb
sudo dpkg -i trivy_0.48.3_Linux-64bit.deb

I configured Trivy to use a local cache directory and set up a daily cron job to update the vulnerability database:

0 2 * * * /usr/local/bin/trivy image --download-db-only

This runs at 2 AM and keeps the database current without blocking scans during the day.

Building the Scanning API

I wrote a lightweight Flask API to receive webhook calls from Portainer and execute Trivy scans. The API does four things:

  1. Receives the webhook payload
  2. Extracts the image name and tag
  3. Runs Trivy with specific severity filters
  4. Returns a pass/fail response based on findings

Here’s the core scanning function I use:

import subprocess
import json

def scan_image(image_name):
    """Run Trivy against an image and count CRITICAL/HIGH findings."""
    cmd = [
        'trivy', 'image',
        '--severity', 'CRITICAL,HIGH',
        '--format', 'json',
        '--quiet',
        image_name
    ]

    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Treat scanner errors as failures rather than silent passes.
        raise RuntimeError(f"Trivy scan failed: {result.stderr.strip()}")
    scan_data = json.loads(result.stdout)

    critical_count = 0
    high_count = 0

    # Trivy groups findings per target (OS packages, language packages, ...).
    # 'Vulnerabilities' can be null for a clean target, hence the 'or []'.
    for target in scan_data.get('Results', []):
        for vuln in target.get('Vulnerabilities') or []:
            if vuln.get('Severity') == 'CRITICAL':
                critical_count += 1
            elif vuln.get('Severity') == 'HIGH':
                high_count += 1

    return {
        'critical': critical_count,
        'high': high_count,
        'passed': critical_count == 0
    }

I initially blocked on any HIGH severity findings, but that was too aggressive. Many images have HIGH vulnerabilities that aren’t actually exploitable in my use case. I settled on blocking only CRITICAL vulnerabilities and logging HIGH ones for manual review.
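As a minimal sketch of that policy (the function name and return shape are my illustration here, not the exact code from my API):

```python
def enforce_policy(scan: dict) -> dict:
    """Turn scan_image()'s summary dict into a deployment decision.

    Policy: any CRITICAL finding blocks the deployment; HIGH findings
    pass but are flagged for manual review.
    """
    return {
        "block": scan["critical"] > 0,
        "needs_review": scan["high"] > 0,
    }
```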

Configuring Portainer Webhooks

Portainer’s webhook system is straightforward but limited. I set up webhooks at the environment level, not per-stack. This means every container creation or update in that environment triggers the webhook.

The webhook configuration in Portainer:

  • Event: Container creation and container update
  • Endpoint URL: My Flask API endpoint
  • Method: POST

The payload Portainer sends includes the image name, but not always in a consistent format. I had to handle variations like:

  • nginx:latest
  • docker.io/library/nginx:latest
  • registry.example.com/custom-image:v1.2.3

My API normalizes these before passing them to Trivy.
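A sketch of that normalization logic (the heuristics below are my reconstruction of the idea, not the exact code from my API):

```python
def normalize_image(raw: str) -> str:
    """Normalize an image reference to registry/repo:tag form.

    Heuristic: the first path segment is a registry host only if it
    contains a dot, a port colon, or is "localhost" (Docker's own rule).
    """
    name = raw.strip()
    first = name.split("/", 1)[0]
    has_registry = "/" in name and (
        "." in first or ":" in first or first == "localhost"
    )
    if not has_registry:
        # Bare official images like "nginx" live under docker.io/library/.
        if "/" not in name:
            name = "docker.io/library/" + name
        else:
            name = "docker.io/" + name
    # Default the tag to :latest; check only the last path segment so a
    # registry port (host:5000/img) isn't mistaken for a tag.
    last = name.rsplit("/", 1)[-1]
    if ":" not in last and "@" not in last:
        name += ":latest"
    return name
```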

Blocking Deployments

This is where I hit a limitation: Portainer webhooks are notification-only. They fire after an action starts, not before it completes. I cannot actually prevent a container from starting through the webhook alone.

My workaround:

  1. The webhook fires when a container is created
  2. My API scans the image
  3. If the scan fails (CRITICAL vulnerabilities found), the API calls Portainer’s REST API to immediately stop and remove the container
  4. I log the event and send a notification through n8n
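Step 3 can be sketched like this (the base URL, API key, and helper names are placeholders; the paths rely on Portainer proxying the Docker Engine API under /api/endpoints/{id}/docker/):

```python
import urllib.request

PORTAINER_URL = "https://portainer.example.com:9443"  # placeholder
API_KEY = "ptr_REDACTED"                              # placeholder

def portainer_request(method: str, path: str) -> urllib.request.Request:
    # Portainer authenticates REST calls via the X-API-Key header.
    return urllib.request.Request(
        PORTAINER_URL + path,
        method=method,
        headers={"X-API-Key": API_KEY},
    )

def kill_container(endpoint_id: int, container_id: str):
    """Build the stop + force-remove requests for a failed container.

    Because Portainer proxies the Docker Engine API, the paths mirror
    Docker's own /containers endpoints. The real handler would send
    each request with urllib.request.urlopen().
    """
    base = f"/api/endpoints/{endpoint_id}/docker/containers/{container_id}"
    stop = portainer_request("POST", base + "/stop")
    remove = portainer_request("DELETE", base + "?force=true")
    return stop, remove
```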

This means there’s a brief window (usually 2-5 seconds) where a vulnerable container is technically running. For my use case, this is acceptable because these containers aren’t exposed externally during startup, and my network is segmented.

Handling Private Registries

I use a private registry for some custom images. Trivy needs credentials to scan these. I configured Trivy to use Docker’s credential store:

docker login registry.example.com

Trivy automatically picks up the credentials from ~/.docker/config.json. This works, but I had to ensure the user running the Flask API has access to this file.
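If sharing ~/.docker/config.json with the API's service user is awkward, Trivy also reads credentials from the TRIVY_USERNAME and TRIVY_PASSWORD environment variables. A sketch (the helper name is mine):

```python
import os

def private_scan_cmd(image: str, username: str, password: str):
    """Build the command and environment for scanning a private-registry
    image via Trivy's TRIVY_USERNAME / TRIVY_PASSWORD variables, instead
    of relying on the Docker credential store."""
    env = dict(os.environ, TRIVY_USERNAME=username, TRIVY_PASSWORD=password)
    cmd = [
        "trivy", "image",
        "--severity", "CRITICAL,HIGH",
        "--format", "json",
        "--quiet",
        image,
    ]
    # Pass env to subprocess.run(cmd, env=env, ...) in the real scanner.
    return cmd, env
```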

What Didn’t Work

Scanning Large Images

Some of my images are 2-3GB. Trivy takes 30-60 seconds to scan these, which causes webhook timeouts. Portainer expects a response within 10 seconds.

I solved this by making the API respond immediately with “scan in progress” and performing the actual scan asynchronously. If the scan fails, I stop the container afterward. This isn’t perfect, but it’s the best I could do without modifying Portainer’s timeout settings (which aren’t exposed in the UI).

False Positives

Trivy reports vulnerabilities based on package versions, not actual exploitability. I’ve had scans flag vulnerabilities in base OS packages that are never executed by my application.

I added a suppression list in my API where I can manually mark specific CVEs as acceptable for certain images. This is not ideal—it requires manual curation—but it’s better than ignoring all HIGH severity findings.
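The suppression check itself is simple; a sketch (the image and CVE entries below are made-up examples, not real suppression decisions):

```python
# Per-image sets of CVE IDs I've reviewed and accepted. Example data only.
SUPPRESSED = {
    "docker.io/library/nginx:latest": {"CVE-2023-00000"},
}

def filter_vulns(image: str, vulns: list) -> list:
    """Drop suppressed findings for this image before counting severities."""
    accepted = SUPPRESSED.get(image, set())
    return [v for v in vulns if v.get("VulnerabilityID") not in accepted]
```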

Database Bloat

I log every scan result to PostgreSQL for historical tracking. After a few weeks, the database grew to several GB because I was storing full JSON scan outputs. I switched to storing only summary data (counts by severity) and now keep full results for failed scans only.

Performance Considerations

Trivy caches image layers, which significantly speeds up repeat scans. The first scan of an image takes 15-30 seconds. Subsequent scans of the same image (even with updated vulnerability data) take 2-5 seconds.

I allocated 4GB RAM to the scanning VM. Trivy can use more during scans of large images, but 4GB has been sufficient for my workload. CPU usage spikes briefly during scans but averages under 10%.

Integration with n8n

When a scan fails, my API sends a webhook to n8n, which:

  • Logs the failure to a dedicated Slack channel
  • Creates a ticket in my self-hosted Vikunja instance
  • Sends me a Pushover notification if it’s a production environment

This gives me visibility into what’s being blocked without having to constantly check logs.
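The payload my API sends to n8n is ordinary JSON; a sketch of how I shape it (field names are illustrative, since n8n's webhook node accepts any body):

```python
def build_notification(image: str, critical: int, high: int, environment: str) -> dict:
    """Shape the alert body POSTed to the n8n webhook.

    n8n branches on these fields: every failure goes to Slack and
    Vikunja, and "urgent" additionally triggers a Pushover push for
    production environments.
    """
    return {
        "image": image,
        "critical": critical,
        "high": high,
        "environment": environment,
        "urgent": environment == "production",
    }
```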

Limitations I’m Living With

This setup is not a true pre-deployment gate. There’s always a brief moment where a vulnerable container starts before being killed. For truly critical environments, this wouldn’t be acceptable. I’d need to implement scanning at the CI/CD level or use Portainer’s GitOps features with a custom admission controller.

I also don’t scan running containers continuously. Trivy can do this, but I haven’t integrated it yet. My pipeline only scans at deployment time. If a new vulnerability is discovered in an image I deployed last month, I won’t know unless I manually trigger a rescan or redeploy.

The suppression list requires manual maintenance. Every time I suppress a CVE, I document why in a separate markdown file, but this is manual work that doesn’t scale well.

What I Learned

Portainer’s webhook system is useful but limited. It’s designed for notifications, not enforcement. If I were building this again, I’d look into Portainer’s API-first deployment approach or move to a GitOps model where scanning happens before the deployment request even reaches Portainer.

Trivy is fast and accurate, but interpreting its results requires context. Not every vulnerability matters in every deployment. I spent more time building the suppression and filtering logic than I did integrating Trivy itself.

The biggest value isn’t in blocking every vulnerable image—it’s in having visibility. Before this pipeline, I had no idea what was running in my containers. Now I have a record of every image deployed, its vulnerabilities at deployment time, and a process to review findings before they go live.

This system has caught several images with known RCE vulnerabilities that I would have deployed without thinking twice. It’s not perfect, but it’s significantly better than deploying blind.
