
Building a Self-Hosted iCloud Photos Downloader Pipeline: Automating Batch Exports to Immich with Metadata Preservation and Duplicate Detection

Why I Built This Pipeline

I run Immich on a Raspberry Pi as a private photo sharing platform for family and friends. I don't use it as my primary photo organizer—that's still Apple Photos on my Mac. Immich is purely my self-hosted replacement for iCloud sharing, with the benefit of unlimited storage on my own hardware.

The problem I faced was getting my existing Apple Photos library into Immich without killing my Pi. When I first tried dumping thousands of photos at once, the server would freeze. The CPU would spike, containers would become unresponsive, and I'd have to hard reboot. I needed a way to export photos from Apple Photos with all their metadata and album structure intact, then feed them to Immich slowly enough that a Raspberry Pi could handle the processing.

My Setup

I'm running Immich in Docker containers on a Raspberry Pi 4 with an external USB drive for storage. On the Mac side, I have my Photos library that I continue to use normally. The pipeline I built has three distinct stages:

  • Export from Apple Photos on my Mac using osxphotos
  • Rsync the exported files to a staging directory on the Pi
  • Slowly move files from staging into Immich's external library directory

The key architectural decision was to keep the staging directory separate from Immich's external library. This gives me control over the ingestion rate and creates a clean separation between "files waiting to be processed" and "files Immich is actively managing."
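Concretely, the layout on the Pi's USB drive looks something like this (the paths here are illustrative, not prescriptive):

/mnt/usb/
├── staging/             ← rsync target: files waiting to be processed
└── immich-external/     ← Immich's external library: files Immich manages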

Exporting from Apple Photos

I use osxphotos, a Python tool that can read the Apple Photos database directly and export photos with their metadata preserved. I installed it in a Python virtual environment on my Mac:

python3 -m venv ~/.python_venv/osxphotos-env
source ~/.python_venv/osxphotos-env/bin/activate
python3 -m pip install osxphotos

I wrote a wrapper script that handles the export process. It exports photos organized by album, preserves metadata, and handles special characters in album names. I also filter out certain raw formats I don't want in Immich—specifically .orf files from an old Olympus camera that Immich doesn't handle well.
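The core of that wrapper looks roughly like this. The export directory is a placeholder and the flag set is illustrative; check osxphotos export --help for the options your version supports:

#!/usr/bin/env bash
# Sketch of the export wrapper (run with the osxphotos venv activated).
EXPORT_DIR=~/PhotosExport

# --directory "{album}": one folder per album
# --exiftool: write metadata into the exported files (requires exiftool)
# --update: incremental, only exports what's new or changed
# --download-missing: pull originals from iCloud if they're not on disk
osxphotos export "$EXPORT_DIR" \
  --directory "{album}" \
  --exiftool \
  --update \
  --download-missing

# Strip the Olympus raw files Immich doesn't handle well.
find "$EXPORT_DIR" -type f -iname '*.orf' -delete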

The export creates a directory structure where each album becomes a folder. After the export completes, the script uses rsync to transfer everything to the Pi's staging directory. I use rsync because it's resumable and only transfers what's changed.
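The transfer step is a single rsync invocation along these lines (host and paths are placeholders for my setup):

# Push the export to the Pi's staging area. -a preserves timestamps and
# permissions, --partial keeps partially transferred files so an
# interrupted run can resume where it left off.
rsync -avh --partial --progress \
  ~/PhotosExport/ pi@192.168.1.x:/mnt/usb/staging/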

The Controlled Import Script

This is where the real problem-solving happened. My first attempts at bulk importing would overwhelm the Pi completely. I tried limiting Immich's job concurrency to 1 and reducing container resources, but it still wasn't enough.

I wrote a bash script that moves files from the staging directory to Immich's external library in small batches—15 files at a time by default. After each batch, it triggers Immich's library scan API and then monitors CPU load. The script waits until CPU usage drops below a threshold before processing the next batch.

The CPU monitoring was critical. Without it, Immich would queue up processing jobs faster than it could complete them, and the whole system would grind to a halt. By watching CPU load and only feeding it more files when it's ready, the Pi stays responsive.

I use the -n flag with the copy/move commands, which prevents overwriting existing files. This acts as basic duplicate detection—if a file already exists in the destination, it's skipped. It's not sophisticated, but it works for my use case where I'm not dealing with renamed duplicates.
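Condensed down, the loop looks roughly like this. It's a sketch rather than my exact script: the paths, API key, library ID, and load threshold are placeholders, and the scan endpoint path has changed between Immich versions, so verify it against your server's API docs.

#!/usr/bin/env bash
# Sketch of the controlled import loop: move a small batch, ask Immich
# to scan, then wait for the Pi to catch up before the next batch.
set -euo pipefail

STAGING="/mnt/usb/staging"          # where rsync drops the exports
LIBRARY="/mnt/usb/immich-external"  # Immich's external library root
BATCH_SIZE=15
MAX_LOAD=2.0                        # resume only when 1-min load is below this
API_URL="http://192.168.1.x:2283/api"
API_KEY="my-api-key"
LIBRARY_ID="my-library-id"          # see the curl lookup further down

wait_for_cpu() {
  # Block until the 1-minute load average drops below MAX_LOAD.
  while true; do
    load=$(awk '{print $1}' /proc/loadavg)
    awk -v l="$load" -v m="$MAX_LOAD" 'BEGIN { exit !(l < m) }' && break
    echo "load $load >= $MAX_LOAD, waiting..."
    sleep 30
  done
}

cd "$STAGING"
while true; do
  # Next batch of files, paths kept relative so the album folders survive.
  mapfile -t batch < <(find . -type f | head -n "$BATCH_SIZE")
  [ "${#batch[@]}" -eq 0 ] && break

  for f in "${batch[@]}"; do
    dest="$LIBRARY/${f#./}"
    mkdir -p "$(dirname "$dest")"
    mv -n "$f" "$dest"              # -n: never overwrite = duplicate guard
  done

  # Trigger a library scan, then back off until the Pi has digested it.
  curl -s -X POST -H "x-api-key: $API_KEY" \
    "$API_URL/libraries/$LIBRARY_ID/scan" > /dev/null
  wait_for_cpu
done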

Album Creation

Once photos are in Immich's external library, I need to recreate the album structure. Immich has an external library feature that scans directories, but it doesn't automatically create albums from folder names.

I use a Docker container called immich-folder-album-creator that does exactly this. It reads the directory structure and creates corresponding albums in Immich via the API. I run it after the import is complete and the server load has settled down:

docker run \
  -e SYNC_MODE="1" \
  -e UNATTENDED="1" \
  -e API_URL="http://192.168.1.x:2283/api/" \
  -e API_KEY="my-api-key" \
  -e ROOT_PATH="/usr/src/app/external" \
  salvoxia/immich-folder-album-creator:latest \
  /script/immich_auto_album.sh

I had to get my Immich library ID first using a curl command to the API, then add that to my import script's configuration.
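Something along these lines works against recent Immich versions, though the libraries endpoint has been renamed across releases, so check the API documentation for yours:

# List external libraries and their IDs (jq is optional, just for readability).
curl -s -H "x-api-key: my-api-key" \
  http://192.168.1.x:2283/api/libraries | jq '.[] | {id, name}'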

What Didn't Work

My first approach was to point Immich directly at the rsync destination and let it scan everything at once. The Pi froze within minutes. Even after I reduced job concurrency and container memory limits, it couldn't handle the load.

I tried using Immich's CLI upload tool, but it was designed for uploading to the main library, not external libraries. The external library feature was what I needed, but it required this batch-processing approach to be viable on a Pi.

I also initially tried to handle duplicate detection by comparing file hashes, but that added too much overhead. The simple "skip if file exists" approach using cp -n was good enough and didn't slow things down.

The Staging Directory Strategy

Keeping a separate staging directory turned out to be more useful than I expected. It means I have a clean copy of all my exported photos that Immich hasn't touched. If I need to reinstall Immich or something goes wrong with the database, I can point a fresh instance at the external library directory and it will rebuild its library from the files on disk. And if that directory gets corrupted, I still have the staging directory as a backup.

It also makes the pipeline resumable. If the import script stops for any reason, I can restart it and it picks up where it left off.

What I Learned

Running Immich on a Raspberry Pi is possible, but you have to respect its limitations. Batch processing with CPU monitoring is essential. Without throttling, the Pi will accept more work than it can handle and become unresponsive.

The separation between export, staging, and import stages makes the whole system more maintainable. Each stage can fail independently without corrupting the others.

Metadata preservation matters. Using osxphotos instead of just copying files from the Photos library means I get proper dates, locations, and album associations. Immich can then use that metadata for search and organization.

Simple duplicate detection is often sufficient. I don't need perfect deduplication—I just need to avoid reprocessing the same files on subsequent runs.