Why I worked on this
My main Postgres database lives in a container on a Proxmox VM. The VM’s disk is already an NFS mount from the TrueNAS box in the basement (10 GbE, spinning rust, raid-z2). When I moved the container to a new host last month, every docker compose up took 90 s instead of 8 s and a simple VACUUM inside Postgres felt like it was running on a USB-1 stick. I/O wait sat at 40 % while the array downstairs was basically idle. Something in the Docker → NFS chain was clearly choking.
My real setup
- TrueNAS 13.0-U5, NFS share exported with default options (rw,no_root_squash).
- Proxmox 8.1 VM, Ubuntu 22.04, kernel 6.5, Docker 24.0.7, Compose v2.23.
- Compose snippet that mattered (a quick way to verify the resulting mount follows this list):

      volumes:
        pgdata:
          driver: local
          driver_opts:
            type: nfs
            o: addr=10.0.40.5,rw,nfsvers=4.2
            device: ":/mnt/pool/docker/pg14"

- Database: Postgres 14, 250 GB, mostly write-heavy telemetry.
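Before tuning anything, it is worth confirming which options Docker actually hands to the kernel. A minimal check, assuming the compose project is named pg so the volume comes out as pg_pgdata (substitute your own project and volume names):

    # Show the driver_opts stored on the volume.
    docker volume inspect pg_pgdata

    # List the NFS mounts the kernel actually created and their effective options.
    findmnt -t nfs4
    nfsstat -m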
What didn’t work
- Single TCP connection (NFS default)
  iostat -x 1 showed only one outstanding I/O request no matter how many backends Postgres launched. Throughput capped at ~120 MB/s on a link that pushes 600 MB/s with dd (exact commands below).
- Synchronous writes (default mount)
  Every COMMIT waited for the server to flush ZFS to spinning disks. pg_test_fsync reported 250 fsync/s, the same number I got when I accidentally ran Postgres on a USB key in 2014.
- Cache-dropping “benchmarks”
  I tried echo 3 > /proc/sys/vm/drop_caches between runs to get “clean” numbers. All it did was make the database cold and angry; production performance never looked like those numbers anyway.
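For the record, these are the observation commands behind those numbers. A sketch, run on the Docker host; the container name pg is an assumption on my part:

    # Per-device queue depth and throughput, refreshed every second.
    iostat -x 1

    # fsync rate as Postgres sees it, run inside the container
    # (assumes the service container is called "pg" and the image ships pg_test_fsync).
    docker exec -it pg pg_test_fsync -f /var/lib/postgresql/data/fsynctest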
What worked (and why)
1. Enable multiple TCP connections with nconnect
Linux 5.3+ and NFS 4.1+ support the nconnect mount option. It stripes one NFS session over several TCP flows. I raised the count until throughput stopped growing; 4 turned out to be the sweet spot on my 10 GbE link.
o: addr=10.0.40.5,rw,nfsvers=4.2,nconnect=4
After a docker compose down && docker compose up the same VACUUM finished in 11 min instead of 47 min. iostat now showed 4–6 outstanding requests and 450–500 MB/s read bursts.
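If you want to see the nconnect effect before touching the compose file, the same options can be tested with a throwaway manual mount. A sketch, where /mnt/nfs-test is my own placeholder and the export path is the one from the setup above:

    # Throwaway mount with the same options the volume will use.
    sudo mkdir -p /mnt/nfs-test
    sudo mount -t nfs -o rw,nfsvers=4.2,nconnect=4 10.0.40.5:/mnt/pool/docker/pg14 /mnt/nfs-test

    # nconnect should appear in the effective mount options on recent kernels.
    nfsstat -m | grep -i nconnect

    # With nconnect=4 there should be four established TCP flows to port 2049.
    ss -tn state established '( dport = :2049 )'

    sudo umount /mnt/nfs-test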
2. Allow asynchronous commits with async
My UPS keeps the VM alive for 20 min; losing the last second of writes is acceptable. I added async to the mount so the server can acknowledge writes as soon as they hit RAM, not rust.
o: addr=10.0.40.5,rw,nfsvers=4.2,nconnect=4,async
pg_test_fsync jumped to 12 000 fsync/s — still not local-SSD territory, but 48 × better than before. Application latency (p95) dropped from 42 ms to 9 ms under a simulated 1 000 inserts/sec load.
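The “simulated 1 000 inserts/sec” figure came from a rate-limited pgbench run, nothing fancier. A rough sketch of that kind of load; the table, columns, and connection details are all assumptions, so adapt them to your schema:

    # One-off table for the synthetic telemetry load (hypothetical schema).
    psql -h 127.0.0.1 -U postgres -d postgres -c \
      'CREATE TABLE IF NOT EXISTS telemetry (sensor_id int, reading float8, recorded_at timestamptz);'

    # Custom pgbench script: one telemetry-style insert per transaction.
    printf '%s\n' \
      '\set sensor random(1, 500)' \
      'INSERT INTO telemetry (sensor_id, reading, recorded_at) VALUES (:sensor, random(), now());' \
      > /tmp/telemetry_insert.sql

    # 8 clients, rate-limited to ~1 000 transactions/sec, 60 s, latency report every 10 s.
    pgbench -h 127.0.0.1 -U postgres -n -f /tmp/telemetry_insert.sql \
      -c 8 -j 4 -R 1000 -T 60 -P 10 postgres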
3. Keep Postgres settings honest
I left wal_level = replica and fsync = on inside the container; the safety switch is the NFS layer, not the database. If you run async on the server side, fsync = off inside Postgres is redundant and dangerous.
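A quick way to confirm nothing inside the database quietly disagrees with that policy; the container name pg is again an assumption:

    # These should come back as "on" and "replica" if the settings above are in effect.
    docker exec -it pg psql -U postgres -c "SHOW fsync;"
    docker exec -it pg psql -U postgres -c "SHOW wal_level;"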
4. Re-export the share read-only for backups
Another VM needs nightly pgBackRest. I created a second TrueNAS share of the same dataset read-only and mounted it ro,nolock on the backup VM. That avoids contention and keeps me from accidentally typing DROP in the wrong place.
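On the backup VM the mount itself is nothing special. A sketch, assuming the read-only export serves the same dataset path and using a mountpoint of my own choosing:

    # Mount the read-only share for the nightly pgBackRest runs.
    sudo mkdir -p /mnt/pg14-backup
    sudo mount -t nfs -o ro,nolock 10.0.40.5:/mnt/pool/docker/pg14 /mnt/pg14-backup

    # Or persist it across reboots in /etc/fstab:
    # 10.0.40.5:/mnt/pool/docker/pg14  /mnt/pg14-backup  nfs  ro,nolock  0  0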
Key takeaways
- NFS 4.2 with nconnect gives almost linear throughput scaling up to wire speed, if the disk array can feed it.
- async on the server side is the cheapest latency win you can get, but only when you already trust your power and your replication.
- Docker Compose volume driver options are passed straight to the kernel; anything you can put in mount -t nfs -o … works here too.
- Measure with the real workload, not with dd. My 120 MB/s “limit” vanished only when Postgres itself issued parallel reads.
- Document the mount string somewhere outside the compose file; the next panic reboot will wipe it from memory.
I still wouldn’t run a latency-critical trading database over this stack, but for my telemetry and home-lab services the tuning above turned “unusable” into “good enough that I stopped thinking about it.”