Building a Personal Field Recording Pipeline on Unraid
After a morning in the field with a Zoom F3 recorder, I’d come home with a folder of WAV files, a phone camera roll of GPS-tagged photos, and a vague memory of what I’d heard. Processing it all — matching audio to location, running BirdNET, transcribing spoken notes, checking the weather — took long enough that I often just… didn’t.
fram-harness is the fix. Drop a session folder on the NAS, click a button, and get a report.
Work in progress. This post covers the design and motivation — a follow-up will cover the build and results.
The Problem
Every field recording session produces:
- WAV files from the Zoom F3, named by time-of-day (`HHMMSS_NNNN.WAV`)
- Phone photos with GPS EXIF data, taken at the same locations
- Spoken field notes — mumbled into the recorder mid-session
- A date buried in the folder name (`YYMMDD-Location-Detail-Place`)
Cross-referencing any of this by hand is tedious. The F3 has no GPS. BirdNET won’t run itself. The weather at the time is a detail I always forget to note.
The gap between “interesting recording session” and “useful documented session” was large enough that most sessions stayed undocumented. That’s the problem worth solving.
The Design
A lightweight web app on the NAS lets me browse session folders, select which analysis stages to run, and submit. Ductile picks it up and hands it to a pipeline container. The pipeline runs the stages in order and writes a report back to the session folder.
Web App (Flask, Unraid)
│
├─ Browse /mnt/field_Recording/F3/Orig/
├─ Session form: stage checkboxes + file picker
└─ On submit:
├─ Write pipeline_config.yaml → session folder
└─ POST to Ductile webhook
Ductile → pipeline-runner container
│
├─ EXIF scan [always] photos → GPS + timestamp map
├─ Transcribe [opt] faster-whisper CUDA → field notes
├─ Refine [opt] Ollama → cleaned text
├─ BirdNET [opt] CUDA inference → species detections
├─ Weather [opt] open-meteo → conditions at time/place
├─ Spectrogram [opt] melspec-to-video → MP4 per file
└─ Report [always] merge → session_report.md + session.json
The output is a session_report.md in the session folder — readable immediately, linkable from Obsidian, archivable.
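On submit, the web app writes a `pipeline_config.yaml` into the session folder for the runner to pick up. A sketch of what that file might contain — the keys and the example session name here are illustrative, not the actual schema:

```yaml
# Illustrative shape only — actual field names may differ
session: 240315-Coast-Dawn-Hide
stages:
  transcribe: true
  refine: true
  birdnet: true
  weather: true
  spectrogram: false
files:
  - 063012_0001.WAV
```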
The Hardware Constraint That Shapes Everything
The NAS has an RTX 2070 (8GB VRAM). Three pipeline stages want the GPU: faster-whisper, BirdNET, and the spectrogram generator. Running them concurrently would cause VRAM contention and unpredictable failures.
The solution is to route GPU jobs through Ductile’s queue — one at a time, serialised. CPU stages (EXIF, weather, report) run independently. This turns a potential mess into a clean dependency graph without needing custom orchestration logic.
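The serialisation idea can be shown without any of Ductile's actual mechanics: a minimal sketch in which GPU stages go through a single-consumer queue (so at most one ever touches the card) while CPU stages fan out in parallel. Function names here are hypothetical.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

def run_session(gpu_stages, cpu_stages):
    # GPU stages drain through one worker thread, so they run
    # strictly one at a time; CPU stages run concurrently in a pool.
    jobs: queue.Queue = queue.Queue()
    gpu_results = []

    def gpu_worker():
        while (job := jobs.get()) is not None:
            gpu_results.append(job())

    worker = threading.Thread(target=gpu_worker)
    worker.start()
    for stage in gpu_stages:
        jobs.put(stage)
    with ThreadPoolExecutor() as pool:
        cpu_results = list(pool.map(lambda s: s(), cpu_stages))
    jobs.put(None)  # sentinel: no more GPU work
    worker.join()
    return gpu_results + cpu_results
```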
GPS Without GPS
The F3 has no GPS. But my phone does — and I take reference photos throughout a session. The pipeline solves this by:
- Scanning all photos for EXIF GPS coordinates and timestamps
- Building a `timestamp → GPS` map
- For each WAV file, deriving an absolute datetime from the F3 filename (`HHMMSS`) + folder date (`YYMMDD`)
- Finding the nearest photo timestamp and inheriting its GPS coordinates
It’s not survey-grade positioning, but it’s accurate enough to place a recording within metres — plenty for field note purposes.
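The nearest-photo lookup is simple enough to sketch in a few lines, assuming the EXIF scan has already produced a dict of photo timestamps to coordinates (the function name is hypothetical):

```python
from datetime import datetime

def nearest_gps(recorded_at: datetime, photo_index: dict) -> tuple:
    # photo_index maps photo timestamp -> (lat, lon) from EXIF;
    # pick the photo taken closest in time to the recording
    closest = min(photo_index,
                  key=lambda ts: abs((ts - recorded_at).total_seconds()))
    return photo_index[closest]
```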
The Container Stack
| Container | GPU | Purpose |
|---|---|---|
| `ollama` | yes | LLM inference for field note cleanup |
| `birdnet-annotations` | yes | Bird species detection |
| `faster-whisper` | yes | Transcribe spoken field notes |
| `pipeline-runner` | no | Orchestration, EXIF, weather, report |
| `fram-web` | no | Web UI |
All containers run on the NAS. No cloud. No subscription. The model weights live on local storage.
Reuse from framai
fram-harness shares utility code with framai, an earlier project in the same space:
- EXIF extraction and GPS clustering (`utils/exif.py`)
- Weather lookups via open-meteo with caching (`weather.py`)
- Audio header/footer extraction for transcription (`utils/audio.py`)
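The weather lookup reduces to one request against open-meteo's historical archive endpoint, keyed by the GPS coordinates and session date the earlier stages produced. A sketch of building that request — the function name is hypothetical, and the real `weather.py` caching layer isn't shown:

```python
from urllib.parse import urlencode

ARCHIVE = "https://archive-api.open-meteo.com/v1/archive"

def weather_url(lat: float, lon: float, day: str) -> str:
    # One day of hourly conditions at the session's location
    params = {
        "latitude": lat,
        "longitude": lon,
        "start_date": day,
        "end_date": day,
        "hourly": "temperature_2m,wind_speed_10m,precipitation",
    }
    return f"{ARCHIVE}?{urlencode(params)}"
```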
This is the advantage of keeping personal projects in a coherent ecosystem — code written for one context finds a second life in another.
What’s Next
The pipeline design is complete. What remains is building it:
- `pipeline/f3_parser.py` — F3 filename → absolute datetime
- `pipeline/photo_align.py` — GPS inheritance from nearest photo
- `pipeline/stages/` — each analysis stage
- `webapp/` — Flask session browser and form
- Docker images for faster-whisper and the pipeline runner
- Ductile webhook configuration
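The filename parser is the simplest of these — a minimal sketch, assuming the `YYMMDD-Location-Detail-Place` folder and `HHMMSS_NNNN.WAV` file conventions described earlier (function names and the example session are hypothetical):

```python
from datetime import datetime

def session_date(folder_name: str) -> datetime:
    # Session folders are named YYMMDD-Location-Detail-Place
    return datetime.strptime(folder_name.split("-", 1)[0], "%y%m%d")

def wav_datetime(folder_name: str, wav_name: str) -> datetime:
    # F3 files are named HHMMSS_NNNN.WAV; combine with the folder date
    t = datetime.strptime(wav_name.split("_", 1)[0], "%H%M%S").time()
    return datetime.combine(session_date(folder_name), t)
```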
The first real test will be the next field session. The report writes itself, or it doesn’t — no middle ground.
On Automation and Field Work
There’s an argument that automating the analysis removes something from the practice — that the act of reviewing recordings by hand is part of learning to listen. I’ve thought about this.
My counter: I wasn’t reviewing them at all. The gap between session and analysis was producing nothing. An automated first pass that gets data into Obsidian is infinitely better than a manual process I don’t do. The recordings get heard either way; this just makes the threshold for hearing them lower.
The interesting moments still require human ears. fram-harness just makes sure I get to those moments.
And I do still review manually — opening sessions in Audacity, listening through, pulling up the spectrogram view to hunt for signatures I don’t recognise. That part I actually enjoy. The pipeline handles the cataloguing so the time I do spend with the recordings is the interesting part, not the bookkeeping.