GPU-Accelerated Bird Detection on Unraid: From 55x to 379x Realtime
A journey through Python TFLite limitations, Docker gotchas, and a Rust CLI that finally lit up the RTX 2070.
I run an Unraid NAS with an RTX 2070 (8GB VRAM). I also take a Zoom F3 field recorder out for nature recording sessions. After each trip I end up with a folder of 32-bit float WAV files — sometimes hours of recordings — and I want to know what birds were detected in them.
This is one stage of a larger pipeline I’m building (fram-harness) that processes field recording sessions end-to-end. For bird detection, the obvious starting point was birdnet-analyzer — the reference Python implementation from Cornell and Chemnitz University of Technology, covering 6,000+ species worldwide.
What followed was a three-step journey that ended with 52 minutes of audio being processed in 8 seconds.
Step 1 — The Python Container That Wouldn’t Use the GPU
The first Dockerfile was straightforward:
FROM python:3.11-slim
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
RUN apt-get update && apt-get install -y ffmpeg libsndfile1 \
&& rm -rf /var/lib/apt/lists/*
RUN uv pip install --system --no-cache birdnet-analyzer
Built fine. Ran fine. Produced detections. But nvidia-smi showed the GPU sitting at 30°C drawing 3W — idle.
First theory: missing cuDNN. The nvidia/cuda:12.4.1-runtime-ubuntu22.04 base image doesn’t include it, so I switched to nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04. Rebuilt, re-ran, checked again.
Still 3W.
The real reason
Digging into the birdnet-analyzer source:
os.environ["CUDA_VISIBLE_DEVICES"] = ""
That line is in birdnet_analyzer/model.py. The library explicitly disables CUDA and runs TFLite on CPU by design. This is not a packaging issue, not a driver issue — it’s a deliberate upstream choice. No amount of Docker configuration would change it.
birdnet-analyzer will never use your GPU.
`CUDA_VISIBLE_DEVICES = ""` is hardcoded in the model code. If you’re building a GPU bird detection pipeline, you need a different tool.
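The mechanism is worth spelling out, because it explains why no amount of container configuration helps. A minimal sketch: setting this variable to an empty string before any CUDA-aware library initializes hides every GPU from the process, regardless of drivers or `--gpus` flags.

```python
import os

# The same pin birdnet-analyzer hardcodes: an empty CUDA_VISIBLE_DEVICES
# makes CUDA-aware libraries enumerate zero GPUs, no matter what the
# host, the driver, or the Docker --gpus configuration provides.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# Any framework imported after this point sees no CUDA devices at all.
print(repr(os.environ["CUDA_VISIBLE_DEVICES"]))  # → ''
```

Because the library sets this on import, it wins over anything you export in the Dockerfile or compose file.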
Step 2 — Evaluating the Alternatives
birdnet-go (tphakala/birdnet-go) — 985 stars, Go implementation, Unraid Community Applications template. But it’s a real-time daemon: it expects a live audio source (microphone, RTSP stream) and runs continuous analysis. No batch file processing mode. Also uses TFLite + XNNPACK — no CUDA.
Wrong tool for batch processing field recordings.
birda (tphakala/birda) — same author, different tool. Written in Rust. Uses ONNX Runtime. The README: “GPU Acceleration: Optional CUDA support for faster inference on NVIDIA GPUs.”
| | birdnet-analyzer | birda |
|---|---|---|
| Language | Python | Rust |
| Inference backend | TFLite (CPU only) | ONNX Runtime |
| CUDA | Explicitly disabled | Supported |
| TensorRT | No | Yes (~2x over CUDA) |
| Models | BirdNET v2.4 | v2.4, v3.0, Google Perch v2 |
| Output formats | Raven, CSV | Raven, Audacity, JSON, Parquet, CSV |
| Batch processing | Yes | Yes |
The CUDA release for Linux (birda-linux-x64-cuda-v1.8.0.tar.gz) bundles everything: binary, ONNX Runtime, cuBLAS, cuDNN, cuFFT — no separate CUDA installation needed on the host beyond the NVIDIA driver.
Step 3 — Building the Docker Container
Attempt 1: Ubuntu 22.04 — glibc too old
birda: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.38' not found
Ubuntu 22.04 ships glibc 2.35; birda 1.8.0 requires 2.38, so the fix was switching to Ubuntu 24.04 (glibc 2.39). This is a common footgun with prebuilt Rust binaries — when a fresh binary fails with `GLIBC_X.XX' not found`, reach for Ubuntu 24.04.
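You can check which glibc a candidate base image ships before committing to a rebuild — a quick check using Python’s standard `platform.libc_ver()` works inside any image that has a Python interpreter (exact output varies by platform):

```python
import platform

# Report the glibc this environment links against.
# birda 1.8.0 needs >= 2.38; Ubuntu 22.04 ships 2.35, 24.04 ships 2.39.
libc, version = platform.libc_ver()
print(libc, version)  # e.g. "glibc 2.39" on Ubuntu 24.04
```

(`ldd --version` gives the same answer in images without Python.)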
Attempt 2: Missing libcurand
Ubuntu 24.04 built fine. Model installed. First run:
ERROR ort::ep: Failed to load libonnxruntime_providers_cuda.so:
libcurand.so.10: cannot open shared object file: No such file or directory
WARN: No execution providers registered successfully; may fall back to CPU.
The birda CUDA bundle includes most CUDA libraries — cuBLAS, cuDNN, cuFFT — but not cuRAND. ONNX Runtime’s CUDA execution provider needs it.
The fix: install cuRAND from NVIDIA’s apt repo. But the package name isn’t obvious. Query the repo first:
curl -s "https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/Packages" \
| grep "^Package: libcurand"
Package: libcurand-12-5
Package: libcurand-12-6
Package: libcurand-12-8
Package: libcurand-12-9
apt package names require exact minor versions.
`libcurand-12` doesn’t exist. `libcurand-12-6` does. Always query the repo rather than guessing — the version suffix changes with CUDA releases.
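The grep above is all you need in practice; as a sketch of what it’s doing, here is the same filter against a short offline excerpt of a Packages index (the excerpt is illustrative, not a live fetch):

```python
# Filter an apt Packages index excerpt for cuRAND package names.
# Note there is no bare "libcurand-12" entry — only exact minor versions.
index = """\
Package: libcublas-12-6
Package: libcurand-12-5
Package: libcurand-12-6
Package: libcurand-dev-12-6
"""

matches = [
    line.split(": ", 1)[1]
    for line in index.splitlines()
    if line.startswith("Package: libcurand")
]
print(matches)  # ['libcurand-12-5', 'libcurand-12-6', 'libcurand-dev-12-6']
```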
The working Dockerfile
FROM ubuntu:24.04
ENV DEBIAN_FRONTEND=noninteractive
# CUDA apt repo for cuRAND (required by ONNX CUDA provider)
RUN apt-get update && apt-get install -y curl && \
curl -sL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb \
-o /tmp/cuda-keyring.deb && \
dpkg -i /tmp/cuda-keyring.deb && \
apt-get update && apt-get install -y \
libsndfile1 ffmpeg libcurand-12-6 \
&& rm -rf /var/lib/apt/lists/* /tmp/cuda-keyring.deb
# Install birda + CUDA/cuDNN/ONNX libs (self-contained bundle)
RUN mkdir -p /app/birda && \
curl -sL https://github.com/tphakala/birda/releases/download/v1.8.0/birda-linux-x64-cuda-v1.8.0.tar.gz \
| tar -xz -C /app/birda/
ENV LD_LIBRARY_PATH=/app/birda
ENV PATH="/app/birda:$PATH"
# Pre-install birdnet-v24 model
RUN birda models install birdnet-v24
ENTRYPOINT ["birda"]
docker-compose.yml on Unraid:
services:
birda:
build: .
image: birda
container_name: birda
volumes:
- /mnt/user/field_Recording:/data/field_Recording
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
The Results
CPU (before GPU fix)
INFO birda::pipeline::processor: Processed 1048 segments in 52.45s
(20.0 segments/sec, 59.9x realtime)
INFO birda: Complete: 1 processed, 714 total detections in 57.06s
55x realtime on CPU. Already significantly faster than birdnet-analyzer.
GPU — RTX 2070, CUDA
INFO birda::pipeline::processor: Processed 1048 segments in 8.29s
(126.5 segments/sec, 379.5x realtime)
INFO birda: Complete: 1 processed, 714 total detections in 14.21s
379x realtime. 52 minutes of audio processed in 8 seconds of inference.
A 6–7x speedup over CPU. The RTX 2070 went from 3W idle to active inference load. The difference between “run overnight” and “done before coffee” is not trivial when you’re trying to build a habit of actually processing recordings.
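The realtime factors follow directly from the segment counts, assuming BirdNET’s standard 3-second analysis windows with no overlap (small differences from the logged figures are rounding):

```python
# Back-of-envelope check of the log numbers, assuming 3-second
# non-overlapping BirdNET windows (1048 segments ≈ 52.4 min of audio).
segments = 1048
window_s = 3.0
audio_s = segments * window_s            # 3144 s of audio

cpu_s, gpu_s = 52.45, 8.29               # inference times from the logs
print(round(audio_s / cpu_s, 1))         # ≈ 59.9x realtime on CPU
print(round(audio_s / gpu_s, 1))         # ≈ 379.3x realtime on GPU
print(round(cpu_s / gpu_s, 1))           # ≈ 6.3x inference speedup
```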
Running It
docker run --rm --gpus all \
-v /mnt/user/field_Recording:/data/field_Recording \
birda --gpu -m birdnet-v24 \
/data/field_Recording/F3/Orig/260329-NationalPark-ForestPath-Calala/290326_001.WAV \
--lat -30.5 --lon 151.6 \
--format raven \
-v
Output is a Raven selection table (.BirdNET.selection.table.txt) written next to the input file — same format as birdnet-analyzer, compatible with existing downstream parsers.
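Those tables are plain tab-separated text, so downstream parsing is trivial. A minimal sketch with Python’s stdlib `csv` module — the column names and values here are illustrative, so check the header row of your own output, which varies by tool and model version:

```python
import csv
import io

# Illustrative excerpt of a Raven-style selection table (tab-separated).
sample = (
    "Selection\tBegin Time (s)\tEnd Time (s)\tCommon Name\tConfidence\n"
    "1\t0.0\t3.0\tLaughing Kookaburra\t0.91\n"
    "2\t12.0\t15.0\tEastern Whipbird\t0.78\n"
)

# DictReader keys each row by the header, so downstream code can refer
# to columns by name instead of position.
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
for row in rows:
    print(row["Common Name"], row["Confidence"])
```

For real files, replace `io.StringIO(sample)` with `open(path, newline="")`.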
What’s Next
This container is one stage in fram-harness. birda’s JSON output mode (--format json) and batch directory processing will integrate cleanly with the pipeline orchestrator.
TensorRT support (--tensorrt) is also available for another ~2x speedup on top of CUDA — it isn’t bundled due to size, so the TensorRT runtime has to be installed separately from NVIDIA.
Key Takeaways
- birdnet-analyzer explicitly disables CUDA — hardcoded in `model.py`, not a config issue
- birda’s CUDA bundle is self-contained — one tarball covers everything except cuRAND
- Ubuntu 24.04 for recent Rust binaries — the glibc 2.38+ requirement rules out 22.04
- Query apt before guessing package names — `libcurand-12` doesn’t exist; `libcurand-12-6` does
- A consumer GPU from 2019 is fast enough — 379x realtime on an RTX 2070 makes batch processing trivial