GPU-Accelerated Bird Detection on Unraid: From 55x to 379x Realtime
A journey through Python TFLite limitations, Docker gotchas, and a Rust CLI that finally lit up the RTX 2070.
I run an Unraid NAS with an RTX 2070 (8GB VRAM). I also take a Zoom F3 field recorder out for nature recording sessions. After each trip I end up with a folder of 32-bit float WAV files — sometimes hours of recordings — and I want to know what birds were detected in them.
This is one stage of a larger pipeline I’m building (fram-harness) that processes field recording sessions end-to-end. For bird detection, the obvious starting point was birdnet-analyzer — the reference Python implementation from Cornell and Chemnitz University of Technology, covering 6,000+ species worldwide.
What followed was a three-step journey that ended with 52 minutes of audio being processed in 8 seconds.
Step 1 — The Python Container That Wouldn’t Use the GPU
The first Dockerfile was straightforward:
FROM python:3.11-slim
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
RUN apt-get update && apt-get install -y ffmpeg libsndfile1 \
&& rm -rf /var/lib/apt/lists/*
RUN uv pip install --system --no-cache birdnet-analyzer
Built fine. Ran fine. Produced detections. But nvidia-smi showed the GPU sitting at 30°C drawing 3W — idle.
First theory: missing cuDNN. The nvidia/cuda:12.4.1-runtime-ubuntu22.04 base image doesn’t include it, so I switched to nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04. Rebuilt, re-ran, checked again.
Still 3W.
The real reason
Digging into the birdnet-analyzer source:
os.environ["CUDA_VISIBLE_DEVICES"] = ""
That line is in birdnet_analyzer/model.py. The library explicitly disables CUDA and runs TFLite on CPU by design. This is not a packaging issue, not a driver issue — it’s a deliberate upstream choice. No amount of Docker configuration would change it.
birdnet-analyzer will never use your GPU.
`CUDA_VISIBLE_DEVICES = ""` is hardcoded in the model code. If you’re building a GPU bird detection pipeline, you need a different tool.
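The mechanism is worth spelling out, because it explains why no amount of container configuration helps. A minimal sketch: setting this variable to an empty string before any CUDA-aware library initializes hides every GPU from the process, regardless of drivers or `--gpus` flags.

```python
import os

# The same pin birdnet-analyzer hardcodes: an empty CUDA_VISIBLE_DEVICES
# makes CUDA-aware libraries enumerate zero GPUs, no matter what the
# host, the driver, or the Docker --gpus configuration provides.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# Any framework imported after this point sees no CUDA devices at all.
print(repr(os.environ["CUDA_VISIBLE_DEVICES"]))  # → ''
```

Because the library sets this on import, it wins over anything you export in the Dockerfile or compose file.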
Step 2 — Evaluating the Alternatives
birdnet-go (tphakala/birdnet-go) — 985 stars, Go implementation, Unraid Community Applications template. But it’s a real-time daemon: it expects a live audio source (microphone, RTSP stream) and runs continuous analysis. No batch file processing mode. Also uses TFLite + XNNPACK — no CUDA.
Wrong tool for batch processing field recordings.
birda (tphakala/birda) — same author, different tool. Written in Rust. Uses ONNX Runtime. The README: “GPU Acceleration: Optional CUDA support for faster inference on NVIDIA GPUs.”
| | birdnet-analyzer | birda |
|---|---|---|
| Language | Python | Rust |
| Inference backend | TFLite (CPU only) | ONNX Runtime |
| CUDA | Explicitly disabled | Supported |
| TensorRT | No | Yes (~2x over CUDA) |
| Models | BirdNET v2.4 | v2.4, v3.0, Google Perch v2 |
| Output formats | Raven, CSV | Raven, Audacity, JSON, Parquet, CSV |
| Batch processing | Yes | Yes |
The CUDA release for Linux (birda-linux-x64-cuda-v1.8.0.tar.gz) bundles everything: binary, ONNX Runtime, cuBLAS, cuDNN, cuFFT — no separate CUDA installation needed on the host beyond the NVIDIA driver.
Step 3 — Building the Docker Container
Attempt 1: Ubuntu 22.04 — glibc too old
birda: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.38' not found
Ubuntu 22.04 ships glibc 2.35; birda 1.8.0 requires 2.38, so the fix was switching to Ubuntu 24.04 (glibc 2.39). This is a common footgun with prebuilt Rust binaries — when a fresh binary fails with `GLIBC_X.XX' not found`, reach for Ubuntu 24.04.
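You can check which glibc a candidate base image ships before committing to a rebuild — a quick check using Python’s standard `platform.libc_ver()` works inside any image that has a Python interpreter (exact output varies by platform):

```python
import platform

# Report the glibc this environment links against.
# birda 1.8.0 needs >= 2.38; Ubuntu 22.04 ships 2.35, 24.04 ships 2.39.
libc, version = platform.libc_ver()
print(libc, version)  # e.g. "glibc 2.39" on Ubuntu 24.04
```

(`ldd --version` gives the same answer in images without Python.)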
Attempt 2: Missing libcurand
Ubuntu 24.04 built fine. Model installed. First run:
ERROR ort::ep: Failed to load libonnxruntime_providers_cuda.so:
libcurand.so.10: cannot open shared object file: No such file or directory
WARN: No execution providers registered successfully; may fall back to CPU.
The birda CUDA bundle includes most CUDA libraries — cuBLAS, cuDNN, cuFFT — but not cuRAND. ONNX Runtime’s CUDA execution provider needs it.
The fix: install cuRAND from NVIDIA’s apt repo. But the package name isn’t obvious. Query the repo first:
curl -s "https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/Packages" \
| grep "^Package: libcurand"
Package: libcurand-12-5
Package: libcurand-12-6
Package: libcurand-12-8
Package: libcurand-12-9
apt package names require exact minor versions.
`libcurand-12` doesn’t exist. `libcurand-12-6` does. Always query the repo rather than guessing — the version suffix changes with CUDA releases.
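The grep above is all you need in practice; as a sketch of what it’s doing, here is the same filter against a short offline excerpt of a Packages index (the excerpt is illustrative, not a live fetch):

```python
# Filter an apt Packages index excerpt for cuRAND package names.
# Note there is no bare "libcurand-12" entry — only exact minor versions.
index = """\
Package: libcublas-12-6
Package: libcurand-12-5
Package: libcurand-12-6
Package: libcurand-dev-12-6
"""

matches = [
    line.split(": ", 1)[1]
    for line in index.splitlines()
    if line.startswith("Package: libcurand")
]
print(matches)  # ['libcurand-12-5', 'libcurand-12-6', 'libcurand-dev-12-6']
```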
The working Dockerfile
FROM ubuntu:24.04
ENV DEBIAN_FRONTEND=noninteractive
# CUDA apt repo for cuRAND (required by ONNX CUDA provider)
RUN apt-get update && apt-get install -y curl && \
curl -sL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb \
-o /tmp/cuda-keyring.deb && \
dpkg -i /tmp/cuda-keyring.deb && \
apt-get update && apt-get install -y \
libsndfile1 ffmpeg libcurand-12-6 \
&& rm -rf /var/lib/apt/lists/* /tmp/cuda-keyring.deb
# Install birda + CUDA/cuDNN/ONNX libs (self-contained bundle)
RUN mkdir -p /app/birda && \
curl -sL https://github.com/tphakala/birda/releases/download/v1.8.0/birda-linux-x64-cuda-v1.8.0.tar.gz \
| tar -xz -C /app/birda/
ENV LD_LIBRARY_PATH=/app/birda
ENV PATH="/app/birda:$PATH"
# Pre-install birdnet-v24 model
RUN birda models install birdnet-v24
ENTRYPOINT ["birda"]
docker-compose.yml on Unraid:
services:
birda:
build: .
image: birda
container_name: birda
volumes:
- /mnt/user/field_Recording:/data/field_Recording
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
The Results
CPU (before GPU fix)
INFO birda::pipeline::processor: Processed 1048 segments in 52.45s
(20.0 segments/sec, 59.9x realtime)
INFO birda: Complete: 1 processed, 714 total detections in 57.06s
55x realtime on CPU. Already significantly faster than birdnet-analyzer.
GPU — RTX 2070, CUDA
INFO birda::pipeline::processor: Processed 1048 segments in 8.29s
(126.5 segments/sec, 379.5x realtime)
INFO birda: Complete: 1 processed, 714 total detections in 14.21s
379x realtime. 52 minutes of audio processed in 8 seconds of inference.
A 6–7x speedup over CPU. The RTX 2070 went from 3W idle to active inference load. The difference between “run overnight” and “done before coffee” is not trivial when you’re trying to build a habit of actually processing recordings.
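The realtime factors follow directly from the segment counts, assuming BirdNET’s standard 3-second analysis windows with no overlap (small differences from the logged figures are rounding):

```python
# Back-of-envelope check of the log numbers, assuming 3-second
# non-overlapping BirdNET windows (1048 segments ≈ 52.4 min of audio).
segments = 1048
window_s = 3.0
audio_s = segments * window_s            # 3144 s of audio

cpu_s, gpu_s = 52.45, 8.29               # inference times from the logs
print(round(audio_s / cpu_s, 1))         # ≈ 59.9x realtime on CPU
print(round(audio_s / gpu_s, 1))         # ≈ 379.3x realtime on GPU
print(round(cpu_s / gpu_s, 1))           # ≈ 6.3x inference speedup
```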
Running It
docker run --rm --gpus all \
-v /mnt/user/field_Recording:/data/field_Recording \
birda --gpu -m birdnet-v24 \
/data/field_Recording/F3/Orig/260329-NationalPark-ForestPath-Calala/290326_001.WAV \
--lat -30.5 --lon 151.6 \
--format raven \
-v
Output is a Raven selection table (.BirdNET.selection.table.txt) written next to the input file — same format as birdnet-analyzer, compatible with existing downstream parsers.
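Those tables are plain tab-separated text, so downstream parsing is trivial. A minimal sketch with Python’s stdlib `csv` module — the column names and values here are illustrative, so check the header row of your own output, which varies by tool and model version:

```python
import csv
import io

# Illustrative excerpt of a Raven-style selection table (tab-separated).
sample = (
    "Selection\tBegin Time (s)\tEnd Time (s)\tCommon Name\tConfidence\n"
    "1\t0.0\t3.0\tLaughing Kookaburra\t0.91\n"
    "2\t12.0\t15.0\tEastern Whipbird\t0.78\n"
)

# DictReader keys each row by the header, so downstream code can refer
# to columns by name instead of position.
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
for row in rows:
    print(row["Common Name"], row["Confidence"])
```

For real files, replace `io.StringIO(sample)` with `open(path, newline="")`.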
What’s Next
This container is one stage in fram-harness. birda’s JSON output mode (--format json) and batch directory processing will integrate cleanly with the pipeline orchestrator.
TensorRT support (--tensorrt) is also available for another ~2x speedup on top of CUDA — it isn’t bundled due to size, so the TensorRT runtime has to be installed separately from NVIDIA.
Key Takeaways
- birdnet-analyzer explicitly disables CUDA — hardcoded in `model.py`, not a config issue
- birda’s CUDA bundle is self-contained — one tarball covers everything except cuRAND
- Ubuntu 24.04 for recent Rust binaries — the glibc 2.38+ requirement rules out 22.04
- Query apt before guessing package names — `libcurand-12` doesn’t exist; `libcurand-12-6` does
- A consumer GPU from 2019 is fast enough — 379x realtime on an RTX 2070 makes batch processing trivial