Why I Built This
I take snapshots of my TrueNAS datasets every day. They’re automatic, they’re versioned, and they sit there quietly in the background. For a long time, I assumed that meant my backups were solid.
Then I had a moment of doubt: what if a snapshot is corrupted? What if the data I’m snapshotting is already broken? What if I can’t actually restore from these when I need to?
I didn’t want to wait for an emergency to find out. I needed a way to regularly verify that my snapshots could actually be restored and that the data inside them was intact. Not just once, but automatically, on a schedule I could trust.
My Setup
I run TrueNAS Scale on a dedicated box with a ZFS pool that holds my critical datasets. I already had periodic snapshot tasks configured through the TrueNAS UI, creating hourly, daily, and weekly snapshots with retention policies.
On the same network, I have a Proxmox host where I run Docker containers for various automation tasks. I use this environment to spin up lightweight test containers that can mount datasets, run checks, and report results.
The goal was simple: pick a recent snapshot, clone it to a temporary dataset, mount it inside a Docker container, run integrity checks on the files, and log the results. If something fails, I want to know immediately.
The Tools I Used
- TrueNAS Scale with ZFS snapshots already configured
- Docker on a separate Linux host (could also run directly on TrueNAS if you prefer)
- A simple Python script to orchestrate the verification process
- SSH access from the Docker host to TrueNAS for running ZFS commands
- Cronicle for scheduling the verification runs (you could use cron or any other scheduler)
How I Built the Pipeline
Step 1: Listing and Selecting Snapshots
The first thing I needed was a way to programmatically list snapshots and pick one to verify. I wrote a small script that SSHs into my TrueNAS box and runs:
zfs list -t snapshot -o name,creation -s creation
This gives me a list of all snapshots sorted by creation time. I filter for the dataset I care about and grab the most recent daily snapshot, skipping the very latest one. Snapshots themselves are immutable once taken, but the newest one may still be mid-replication or about to be rotated out by a retention policy, so I verify the one just behind it.
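The selection logic looks roughly like this. This is a sketch rather than my exact script: the host name, dataset, and snapshot naming convention are placeholders, and it assumes the `zfs list` invocation shown above (with `-H` added for machine-readable output).

```python
import subprocess

def list_snapshots(host, dataset):
    """Run `zfs list` on the TrueNAS box over SSH and return snapshot
    names for one dataset, oldest first (sorted by -s creation)."""
    out = subprocess.run(
        ["ssh", host, "zfs", "list", "-t", "snapshot",
         "-H", "-o", "name", "-s", "creation"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines()
            if line.startswith(dataset + "@")]

def pick_snapshot(snapshots, tag="daily"):
    """Pick the second-newest snapshot whose name contains the tag,
    deliberately skipping the very latest one."""
    matching = [s for s in snapshots if tag in s.split("@", 1)[1]]
    return matching[-2] if len(matching) >= 2 else None
```

The `tag` filter relies on the snapshot task naming its snapshots with a recognizable prefix like `daily-`, which is how the TrueNAS periodic snapshot tasks name mine.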
Step 2: Cloning the Snapshot
Instead of mounting the snapshot directly (which is read-only and can be tricky with permissions), I clone it to a temporary dataset:
zfs clone pool/dataset@snapshot-name pool/verify-temp
This creates a writable clone that I can mount and test without affecting the original snapshot. The clone is thin-provisioned, so it doesn’t immediately consume extra space.
Step 3: Mounting in a Docker Container
I created a simple Docker container based on Alpine Linux with a few utilities installed: rsync, sha256sum, and find. The container mounts the cloned dataset via NFS from TrueNAS.
I already had NFS shares configured on TrueNAS for other purposes, so I just added the temporary dataset to the allowed exports during the verification run. The Docker container mounts it like this:
docker run --rm \
  -v /mnt/nfs-verify:/data:ro \
  alpine-verify:latest \
  /scripts/verify.sh
The verify.sh script inside the container does the actual integrity checking.
Step 4: Running Integrity Checks
The verification script does a few things:
- Counts the total number of files in the dataset
- Checks for any files with unexpected permissions or ownership (this caught a misconfiguration once)
- Runs checksums on a sample of files to ensure they’re readable and not corrupted
- Attempts to open and read a few known critical files (like database dumps or config files)
I don’t checksum every single file because that would take too long on large datasets. Instead, I sample a percentage of files randomly and verify those. If the sample passes, I have reasonable confidence the rest is intact.
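The sampling check is the heart of the script. Here's a simplified version of that logic; the sample fraction and chunk size are illustrative, and my real script also records which files were sampled so runs are comparable:

```python
import hashlib
import random
from pathlib import Path

def sample_checksum(root, fraction=0.05, seed=None):
    """Walk the mounted clone, pick a random sample of files, and read
    each one fully through SHA-256. A file that can't be read to the end
    counts as a failure; on ZFS, an unreadable file usually means a
    block-level checksum error surfaced as an I/O error."""
    rng = random.Random(seed)
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    sample = rng.sample(files, max(1, int(len(files) * fraction))) if files else []
    failures = []
    for path in sample:
        try:
            digest = hashlib.sha256()
            with open(path, "rb") as f:
                # read in 1 MiB chunks so large files don't blow up memory
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
        except OSError:
            failures.append(path)
    return len(sample), failures
```

Passing a `seed` makes a run reproducible, which is handy when you want to re-check the exact files a failed run complained about.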
For datasets that contain database backups, I also added a step to restore the backup into a temporary PostgreSQL or MySQL container and run a basic query. This ensures the backup file isn’t just present, but actually usable.
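For the PostgreSQL case, the restore test boils down to a short sequence of docker commands. The sketch below just builds that sequence (container name, image tag, and paths are illustrative, and it assumes a plain-SQL dump); a real run also needs a readiness wait, for example looping on `pg_isready`, between starting the container and loading the dump:

```python
import os

def restore_test_commands(dump_path, container="pg-verify"):
    """Build the docker command sequence for a throwaway PostgreSQL
    restore test: start a server with the dump mounted, load it, run a
    sanity query, tear down. Returns the commands as argv lists."""
    dump_dir = os.path.dirname(os.path.abspath(dump_path))
    dump_file = os.path.basename(dump_path)
    return [
        # throwaway server with the dump directory mounted read-only
        ["docker", "run", "-d", "--rm", "--name", container,
         "-e", "POSTGRES_PASSWORD=verify",
         "-v", f"{dump_dir}:/dump:ro", "postgres:16"],
        # load the dump; a failure here means the backup is unusable
        ["docker", "exec", container, "psql", "-U", "postgres",
         "-f", f"/dump/{dump_file}"],
        # basic sanity query against the restored data
        ["docker", "exec", container, "psql", "-U", "postgres",
         "-c", "SELECT count(*) FROM pg_tables;"],
        # stop the server; --rm on the run removes the container
        ["docker", "stop", container],
    ]
```

Any non-zero exit along the way fails the whole verification run, which is exactly what caught the truncated backup described below.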
Step 5: Cleanup and Logging
After the checks complete, the script logs the results to a file and sends a summary to a monitoring endpoint I have set up (a simple webhook that posts to a Slack channel). If any check fails, the alert is immediate.
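The webhook side is nothing fancy. A minimal version, assuming a Slack-style incoming webhook that accepts a JSON body with a `text` field, looks like this:

```python
import json
import urllib.request

def build_summary(snapshot, checked, failures):
    """Format the verification result as a Slack-style webhook payload."""
    status = "FAILED" if failures else "OK"
    lines = [f"Snapshot verification {status}: {snapshot}",
             f"Files checked: {checked}"]
    if failures:
        lines.append("Failures: " + ", ".join(str(f) for f in failures))
    return {"text": "\n".join(lines)}

def post_summary(webhook_url, payload):
    """POST the payload to the webhook endpoint."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Keeping the payload builder separate from the HTTP call makes the formatting easy to test without actually posting to the channel.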
Then I clean up:
zfs destroy pool/verify-temp
The clone is deleted, and the space is reclaimed. The whole process takes between 5 and 20 minutes depending on the dataset size.
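One detail worth calling out: the destroy runs in a `finally` block, so a failed or crashed check never leaves a stray clone behind. A sketch of that clone-check-destroy lifecycle (host and dataset names are placeholders; the `dry_run`/`log` parameters exist so the orchestration itself can be tested without a real pool):

```python
import subprocess

def run_zfs(host, *args, dry_run=False, log=None):
    """Run a zfs command on the TrueNAS host over SSH. With dry_run,
    only record the command instead of executing it."""
    cmd = ["ssh", host, "zfs", *args]
    if log is not None:
        log.append(cmd)
    if not dry_run:
        subprocess.run(cmd, check=True)

def verify_with_cleanup(host, snapshot, clone="pool/verify-temp",
                        check=lambda: True, dry_run=False, log=None):
    """Clone the snapshot, run the checks, and always destroy the
    clone afterwards. The finally block keeps temporary clones from
    piling up when a check raises."""
    run_zfs(host, "clone", snapshot, clone, dry_run=dry_run, log=log)
    try:
        return check()
    finally:
        run_zfs(host, "destroy", clone, dry_run=dry_run, log=log)
```

Before I structured it this way, an exception mid-check would occasionally leave an orphaned `verify-temp` clone that blocked the next run.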
What Worked
This setup has been running for about six months now, and it’s caught two real issues:
The first was a snapshot of a dataset where the underlying data had been corrupted before the snapshot was taken. The files were there, but they were unreadable. I only discovered this because the verification script tried to checksum them and failed. I was able to restore from an older snapshot before the corruption happened.
The second issue was a database backup that was incomplete. The backup script had silently failed partway through, but the file was still created with a normal-looking filename. The restore test inside the Docker container failed immediately, and I fixed the backup script.
Both of these would have been disasters if I’d only found out during an actual restore scenario.
Why Docker Containers Work Well for This
Using Docker containers for the verification process keeps everything isolated. I can spin up a clean environment, run the checks, and tear it down without leaving any residue on the host system. It also makes it easy to test different types of data—I have separate container images for verifying database backups, media files, and configuration archives.
The containers are lightweight and start quickly, which is important when you’re running these checks on a schedule.
What Didn’t Work
My first attempt tried to mount the ZFS snapshot directly inside the Docker container using a bind mount. This was a mess. Permissions were wrong, the mount was read-only in ways that confused some of my verification tools, and I kept running into issues with stale NFS handles.
Cloning the snapshot to a temporary dataset solved all of these problems. It’s a bit more overhead, but it’s worth it for the reliability.
I also initially tried to verify every file in the dataset, which was way too slow. On a dataset with hundreds of thousands of small files, the verification run would take hours. Switching to a sampling approach (checking 5-10% of files randomly) gave me good enough confidence in a fraction of the time.
Another mistake: I didn’t set up proper alerting at first. The script would run, log results to a file, and I’d forget to check it. Adding the webhook to Slack made a huge difference—I actually see the results now, and I know immediately if something fails.
Key Takeaways
Snapshots are not backups until you’ve verified you can restore from them. Automating that verification is the only way to be sure it actually happens.
Cloning snapshots to temporary datasets is cleaner and more reliable than trying to mount them directly, especially when you need to run tests that expect normal filesystem behavior.
You don’t need to verify every byte of every file. A well-chosen sample plus targeted checks on critical files gives you enough confidence without burning hours of compute time.
Docker containers are a good fit for this kind of work. They’re disposable, isolated, and easy to customize for different types of data.
If you’re not testing your restores, you’re not really backed up. This pipeline gave me the confidence I was missing.