A Health-Data Pipeline on Unraid + Ductile, Primitives Up

Tags: homelab, unraid, ductile, docker, sqlite, health-data, event-driven, pipelines

A walkthrough of an event-driven pipeline that joins Garmin and Withings source data into a single daily summary on a NAS. Architecture diagram first, then we work down through the container, the plugin manifest, the host-side YAML, and the trigger envelope — all the way to the row that lands on disk.

No story arc. Each section ends in the file that does the work.


The Architecture — Three Hosts, One Seam

            ┌─────────────────── Unraid (data layer) ───────────────────┐
            │                                                            │
  Withings ─┼─► withings plugin ─► event: health.new_data                │
   API      │                            │                               │
            │                            ▼                               │
            │                   pipeline: health-summary                 │
  Garmin   ─┼─► garmin Docker ──► (file-mtime watcher emits future)      │
   cron     │           │                │                               │
            │           ▼                ▼                               │
            │  /healthdata/garmin/  health_data_summary plugin           │
            │  /healthdata/withings/      │                              │
            │           │                 ▼                              │
            │           └────► docker run ductile-healthdata             │
            │                            │                               │
            │                            ▼                               │
            │              /healthdata/healthdata.db ◄── the seam        │
            └────────────────────────────┼───────────────────────────────┘
                                         │ (NAS-mounted SQLite, read-only consumer view)

            ┌─────────────────── Thinkpad (reporting layer) ─────────────┐
            │  health_weekly_report (cron) ─► fabric ─► email_send       │
            └────────────────────────────────────────────────────────────┘

Three hosts, one event, one file seam:

  • Unraid runs the data layer — source plugins (Withings, Garmin) and the joiner (health_data_summary).
  • Thinkpad runs the reporting layer — weekly digest assembled from the joined DB and emailed out.
  • Mac is the dev/deploy host — it edits the NAS-mounted clones and ssh-rebuilds containers on Unraid.

The seam between data and reporting is a SQLite file on the NAS (healthdata.db), not an API. The reporting host opens it read-only with mode=ro&immutable=1. No network protocol, no auth, no schema migration coupling — just a file that one host writes and another reads.
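A minimal sketch of how a consumer could open the seam with Python's standard library, assuming the reporting host sees the NAS file at a local path. The helper name is illustrative and not taken from the reporting plugin; only the mode=ro&immutable=1 parameters come from the text above. mode=ro makes SQLite refuse writes, and immutable=1 tells SQLite to assume the file will not change while the connection is open, so it skips locking and change detection.

```python
import sqlite3
from pathlib import Path


def open_seam_readonly(db_path: str) -> sqlite3.Connection:
    """Open a SQLite file read-only through a URI filename.

    Hypothetical helper: the function name is an assumption, the URI
    parameters are the ones quoted in the article.
    """
    # as_uri() yields file:///abs/path and percent-encodes unsafe characters.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro&immutable=1"
    # uri=True makes sqlite3 interpret the string as a URI, not a plain path.
    return sqlite3.connect(uri, uri=True)
```

Any write attempted on the returned connection raises sqlite3.OperationalError, so a bug on the reporting side cannot modify the data layer's file.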


The ETL Container — stdlib-only Protocol

ductile-healthdata is a Python container that joins Garmin and Withings DBs. Its Dockerfile is a handful of lines, with no pip install:

FROM python:3.12-slim

WORKDIR /app
COPY healthdata_etl/run.py /app/run.py
RUN chmod +x /app/run.py

VOLUME ["/app/data/healthdata"]

ENTRYPOINT ["python3", "/app/run.py"]

The protocol is JSON on stdin, JSON on stdout. The container reads one request envelope and exits:

{
  "command": "handle",
  "config": {
    "healthdata_db":      "/app/data/healthdata/healthdata.db",
    "garmin_summary_db":  "/app/data/healthdata/garmin/DBs/garmin_summary.db",
    "withings_db":        "/app/data/healthdata/withings/withings.db"
  },
  "event": {
    "type": "health.new_data",
    "payload": {
      "source": "garmin",
      "dirty_periods": ["2026-04-26"],
      "detected_at": "2026-04-26T07:00:00+10:00"
    }
  }
}

The response shape includes status, result, events, state_updates, and logs. State updates are a presence-stable snapshot — five keys, every time:

{
  "summary_count": 52,
  "metric_count": 44,
  "pending_updates_total": 2,
  "latest_garmin_source_day": "2026-04-25",
  "latest_withings_source_date": "2026-04-24 21:19:52"
}

Why stdin/stdout instead of a daemon: the gateway already runs Docker, so a one-shot docker run is the cheapest invocation it has. There’s nothing to keep alive, nothing to monitor, no port to bind. The container starts, reads, joins, writes, exits. ~2 seconds end-to-end on the Unraid host.
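The one-shot protocol can be sketched as below. This is not the container's actual run.py: the function names, the "ok" success value, and the use of None for unknown snapshot values are assumptions made for illustration. The error shape mirrors the one the wrapper script emits, and the five snapshot keys are the ones listed above.

```python
import json
import sys

# The five snapshot keys from the article; present in every response.
STATE_KEYS = (
    "summary_count",
    "metric_count",
    "pending_updates_total",
    "latest_garmin_source_day",
    "latest_withings_source_date",
)


def stable_snapshot(known: dict) -> dict:
    """Return all five keys, using None where a value is not known."""
    return {key: known.get(key) for key in STATE_KEYS}


def respond(request: dict) -> dict:
    """Build a response envelope for one request (no real ETL here)."""
    command = request.get("command")
    if command != "handle":
        msg = f"unsupported command: {command}"
        return {"status": "error", "error": msg, "retry": False,
                "logs": [{"level": "error", "message": msg}]}
    payload = request.get("event", {}).get("payload", {})
    periods = payload.get("dirty_periods", [])
    # Placeholder: a real handler would run the join for each period here.
    return {
        "status": "ok",  # assumed success value, not confirmed by the article
        "result": f"{payload.get('source')} ETL: {len(periods)} periods",
        "events": [],
        "state_updates": stable_snapshot({}),
        "logs": [{"level": "info", "message": "handled"}],
    }


def main(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Read one JSON envelope, write one JSON response, then return."""
    json.dump(respond(json.load(stdin)), stdout)
```

Calling main() at the bottom of the script gives the read-one, write-one, exit behavior described above. Taking the streams as parameters keeps the handler testable without a container.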


The Wrapper Plugin — 44 Lines of Shell

The container doesn’t speak directly to the gateway. A wrapper plugin in ductile-plugins/health_data_summary/ is what the gateway invokes; the wrapper does two things: pass the request through unchanged, and shell out to docker run.

Here is the entire run.sh:

#!/usr/bin/env bash
set -euo pipefail

REQUEST="$(cat)"

IMAGE=$(printf '%s' "$REQUEST" | jq -r '.config.image // "ductile-healthdata:latest"')
HOST_DIR=$(printf '%s' "$REQUEST" | jq -r '.config.host_healthdata_dir // "/mnt/user/Projects/healthdata"')
CONT_DIR=$(printf '%s' "$REQUEST" | jq -r '.config.container_healthdata_dir // "/app/data/healthdata"')

err_response() {
  jq -n --arg msg "$1" \
    '{status:"error", error:$msg, retry:false, logs:[{level:"error", message:$msg}]}'
}

if ! command -v docker >/dev/null 2>&1; then
  err_response "docker CLI not available in plugin runtime"
  exit 0
fi

if ! docker image inspect "$IMAGE" >/dev/null 2>&1; then
  err_response "ductile-healthdata image not found: $IMAGE — build it from github.com/mattjoyce/ductile-healthdata"
  exit 0
fi

exec docker run --rm -i \
  -v "$HOST_DIR:$CONT_DIR" \
  "$IMAGE" <<<"$REQUEST"

The wrapper buys two things by sitting between the gateway and the container:

  1. Language-agnostic plugins. The gateway is Go. The container is Python. The wrapper is Bash. The plugin contract is JSON over stdin/stdout, so anything that can read stdin and write JSON works.
  2. Image lifecycle separation. The container can be rebuilt independently of the gateway. The gateway is rebuilt only when its own source or its plugin manifest set changes.
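To illustrate the language-agnostic point, here is a rough Python equivalent of the core of run.sh. It is a sketch, not a drop-in replacement: the function names are hypothetical and the image checks from the shell version are omitted. The defaults and the docker run flags are copied from the script above.

```python
import json
import subprocess

# Same defaults the shell wrapper falls back to.
DEFAULTS = {
    "image": "ductile-healthdata:latest",
    "host_healthdata_dir": "/mnt/user/Projects/healthdata",
    "container_healthdata_dir": "/app/data/healthdata",
}


def setting(config: dict, key: str) -> str:
    """Mimic jq's `//` operator: fall back when the value is null or false."""
    value = config.get(key)
    return DEFAULTS[key] if value is None or value is False else value


def docker_argv(request: dict) -> list[str]:
    """Build the docker run command line that run.sh executes."""
    config = request.get("config") or {}
    mount = f"{setting(config, 'host_healthdata_dir')}:{setting(config, 'container_healthdata_dir')}"
    return ["docker", "run", "--rm", "-i", "-v", mount, setting(config, "image")]


def run_wrapper(raw_request: str) -> str:
    """Feed the raw request to the container's stdin and return its stdout."""
    argv = docker_argv(json.loads(raw_request))
    done = subprocess.run(argv, input=raw_request, capture_output=True,
                          text=True, check=False)
    return done.stdout
```

Splitting argv construction from execution means the mapping from config to command line can be checked on a machine without Docker.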

The Plugin Manifest — What the Gateway Needs to Know

The wrapper plugin’s manifest.yaml declares the contract the gateway enforces:

manifest_spec: ductile.plugin
manifest_version: 1
name: health_data_summary
version: 0.2.0
protocol: 2
entrypoint: run.sh
description: "Join Garmin + Withings DBs into a unified daily_health_summary row per dirty day. Delegates execution to the ductile-healthdata Docker image."
concurrency_safe: false

commands:
  - name: handle
    type: write
    description: "Event-dispatcher entry point. Routes by event.type."
    idempotent: true
    retry_safe: true
    values:
      consume:
        - payload.source
        - payload.dirty_periods
        - payload.detected_at
      emit:
        - event: healthdata.etl.completed
          values:
            - payload.source
            - payload.periods_processed
            - payload.metrics_written
            - payload.result

  - name: health
    type: read
    description: "Probe DBs and image availability. Emits no state_updates."
    idempotent: true
    retry_safe: true
    values:
      consume: []
      emit:
        - event: healthdata_etl.health
          values:
            - payload.healthdata_db
            - payload.sources
            - payload.pending_updates
            - payload.checked_at

fact_outputs:
  - when:
      command: handle
    from: state_updates
    fact_type: health_data_summary.snapshot
    compatibility_view: mirror_object

config_keys:
  required:
    - healthdata_db
    - garmin_summary_db
    - withings_db
  optional:
    - image
    - host_healthdata_dir
    - container_healthdata_dir

The interesting parts:

  • commands.handle is the event-dispatcher entry point, not a logical operation like summarize. The router in this version of ductile dispatches command: "handle" for every uses: step. Plugins that want pipeline-driven invocation declare a handle command and route inside it on event.type.
  • values.consume / values.emit are the plugin’s value-flow contract — the gateway uses these for static analysis and to validate downstream pipelines.
  • fact_outputs tells the gateway: “when handle completes, take its state_updates block and store it as a fact of type health_data_summary.snapshot.” That snapshot is what other plugins query when they need to know “what’s the latest state of health_data_summary without re-running it?”
  • concurrency_safe: false because the container writes to a SQLite file. The gateway will serialize calls.
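The dotted entries under values.consume address fields inside the event. The gateway is written in Go and its real checks are not shown here, so the following is only a sketch of what such a check could look like; resolve_path and missing_values are hypothetical names.

```python
from typing import Any

# Sentinel so that a legitimately stored None is not mistaken for "absent".
_MISSING = object()


def resolve_path(event: dict, dotted: str) -> Any:
    """Walk a dotted path such as 'payload.source' through nested dicts."""
    current: Any = event
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def missing_values(event: dict, consume: list[str]) -> list[str]:
    """Return the consumed paths that the event does not provide."""
    return [path for path in consume if resolve_path(event, path) is _MISSING]
```

With the handle command's consume list and the health.new_data event shown earlier, missing_values returns an empty list. Dropping detected_at from the payload makes it return that one path.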

Host Registration — plugins.yaml

Discovery alone isn’t enough. A plugin must be registered in plugins.yaml to be enabled, configured, and scheduled. The Unraid host’s entry:

plugins:
  health_data_summary:
    enabled: true
    timeout: 120s
    max_attempts: 2
    concurrency_safe: false
    config:
      healthdata_db: /app/data/healthdata/healthdata.db
      garmin_summary_db: /app/data/healthdata/garmin/DBs/garmin_summary.db
      withings_db: /app/data/healthdata/withings/withings.db
      # image: ductile-healthdata:latest         (default)
      # host_healthdata_dir: /mnt/user/Projects/healthdata  (default)
      # container_healthdata_dir: /app/data/healthdata     (default)

The three required config keys provide the in-container DB paths. The container sees them under /app/data/healthdata/... because the wrapper bind-mounts the host’s /mnt/user/Projects/healthdata into the container at /app/data/healthdata on every invocation. The same paths in the envelope therefore work on any host that follows the convention.
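The bind mount makes every in-container path correspond to exactly one host path. A small sketch of that mapping, useful when hunting for a file on the host; to_host_path is a hypothetical helper and its defaults are the wrapper's defaults.

```python
from pathlib import PurePosixPath


def to_host_path(
    container_path: str,
    host_dir: str = "/mnt/user/Projects/healthdata",
    container_dir: str = "/app/data/healthdata",
) -> str:
    """Translate a path inside the container to its location on the host.

    Raises ValueError if the path is not under the mounted directory.
    """
    relative = PurePosixPath(container_path).relative_to(container_dir)
    return str(PurePosixPath(host_dir) / relative)
```

For example, to_host_path("/app/data/healthdata/healthdata.db") gives /mnt/user/Projects/healthdata/healthdata.db.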

Two subtle things:

  • timeout: 120s is generous. The ETL itself takes ~2s; the timeout is a safety net for cold-cache cases (a large Garmin DB read).
  • concurrency_safe: false is repeated here even though the manifest already says it. The gateway lets host config tighten guarantees but not loosen them.
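The tighten-but-not-loosen rule amounts to a logical AND of the two declarations. The sketch below assumes a loosening value in host config is simply ignored; whether the real gateway ignores it or rejects the config is not stated here, and the function name is hypothetical.

```python
from typing import Optional


def effective_concurrency_safe(manifest_value: bool,
                               host_value: Optional[bool] = None) -> bool:
    """Combine manifest and host settings; the host can only tighten.

    None means the host config does not mention the key.
    """
    if host_value is None:
        return manifest_value
    # True only if both agree it is safe, so host True cannot override
    # a manifest that says False.
    return manifest_value and host_value
```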

Event Routing — pipelines.yaml

A pipeline is what binds an event to a plugin invocation. The whole entry is five lines:

pipelines:
  - name: health-summary
    on: health.new_data
    steps:
      - id: summarize
        uses: health_data_summary

on: is the trigger event. uses: names the plugin to invoke (the gateway implicitly calls handle). id: is the step name that appears in job logs.

That’s the entire wiring. Anything in the system that emits a health.new_data event — the Withings plugin’s poll, a future Garmin watcher, an HTTP trigger — will route through this pipeline.
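A sketch of the matching step, assuming pipelines.yaml has already been parsed into plain dicts. plan_invocations is a hypothetical name and the job dict layout is invented for illustration; the implicit handle command is the behavior described above.

```python
def plan_invocations(pipelines: list[dict], event: dict) -> list[dict]:
    """List the plugin invocations an event would trigger, in step order."""
    jobs = []
    for pipeline in pipelines:
        # A pipeline fires only when its `on:` equals the event type.
        if pipeline.get("on") != event.get("type"):
            continue
        for step in pipeline.get("steps", []):
            jobs.append({
                "pipeline": pipeline["name"],
                "step": step["id"],
                "plugin": step["uses"],
                "command": "handle",  # implicit for every uses: step
                "event": event,
            })
    return jobs
```

An event of any other type yields an empty list, which is why adding a new subscriber is purely a pipelines.yaml change.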


The Trigger — One Event, Two Hops

   withings.poll  (every 6h)
        │
        ▼
   emit: health.new_data { source: "withings", dirty_periods: [...] }
        │
        ▼
   router: match `on: health.new_data`
        │
        ▼
   pipeline: health-summary
        │
        ▼
   step: summarize  →  plugin: health_data_summary  command: handle
        │
        ▼
   wrapper run.sh  →  docker run ductile-healthdata:latest
        │
        ▼
   container: handle_command → _resolve_etl_inputs → withings_etl
        │
        ▼
   write rows to /app/data/healthdata/healthdata.db
        │
        ▼
   emit: healthdata.etl.completed + state_updates snapshot
        │
        ▼
   gateway: store snapshot as fact health_data_summary.snapshot

For a synthetic trigger over the API, the request is wrapped in a payload envelope:

curl -X POST http://localhost:8888/pipeline/health-summary \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"payload":{"source":"garmin","dirty_periods":["2026-04-26"],"detected_at":"2026-04-26T10:30:00+10:00"}}'

The gateway constructs the event from the pipeline’s on: field and the body’s payload, then enqueues a job. The synchronous response is the queued job ID; the result is fetched with GET /job/{id}.
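The same trigger can be built with Python's standard library. This sketch only constructs the request; the helper names are hypothetical, and because the JSON field that carries the job ID is not shown here, parsing the response is left out.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8888"  # same address as the curl example


def build_trigger(pipeline: str, payload: dict, token: str) -> urllib.request.Request:
    """Build the POST request that enqueues a pipeline run."""
    body = json.dumps({"payload": payload}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/pipeline/{pipeline}",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def job_url(job_id: str) -> str:
    """URL to poll for the result of a queued job."""
    return f"{BASE_URL}/job/{job_id}"
```

Passing the request to urllib.request.urlopen sends it. The job ID from that response goes into job_url for the follow-up GET.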

A real trigger run, end to end:

{
  "status": "succeeded",
  "result": {
    "result": "garmin ETL: 2 periods, 52 metrics written",
    "events": [{
      "type": "healthdata.etl.completed",
      "payload": {
        "source": "garmin",
        "periods_processed": 2,
        "metrics_written": 52
      }
    }],
    "state_updates": {
      "summary_count": 52,
      "metric_count": 44,
      "pending_updates_total": 2,
      "latest_garmin_source_day": "2026-04-25",
      "latest_withings_source_date": "2026-04-24 21:19:52"
    }
  }
}

The healthdata.db file appears at /mnt/user/Projects/healthdata/healthdata.db on the host. 69 KB after the first run; grows with each new dirty day.


The Output — Fact, Event, Snapshot

Every successful invocation produces three artifacts, each consumed by a different audience:

1. The event (healthdata.etl.completed) is for downstream pipelines that want to react. A future “publish to dashboard” or “post to Discord” pipeline subscribes by adding on: healthdata.etl.completed.

2. The fact (health_data_summary.snapshot) is for plugins that want to query state without invoking. The gateway stores the latest snapshot per plugin in its fact store. A reporting plugin can ask “what was the latest_garmin_source_day at the last summarize?” without paying the cost of a docker invocation.

3. The DB row (daily_health_summary in healthdata.db) is the actual data: one row per day, joining columns from both sources. The reporting layer reads this directly.
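The fact artifact can be pictured as a latest-value map keyed by fact type. The sketch below is an in-memory stand-in with hypothetical names; the real fact store's persistence and the meaning of compatibility_view are not covered. The rule fields (when.command, from, fact_type) are the ones in the manifest.

```python
class FactStore:
    """Keeps only the most recent snapshot per fact type."""

    def __init__(self) -> None:
        self._latest: dict[str, dict] = {}

    def record(self, fact_type: str, snapshot: dict) -> None:
        # Copy so later mutation of the response does not alter the fact.
        self._latest[fact_type] = dict(snapshot)

    def latest(self, fact_type: str, key: str):
        """Read one value from the latest snapshot, or None if absent."""
        return self._latest.get(fact_type, {}).get(key)


def apply_fact_outputs(store: FactStore, rules: list[dict],
                       command: str, response: dict) -> None:
    """Store response blocks as facts according to fact_outputs rules."""
    for rule in rules:
        if rule["when"]["command"] == command and rule["from"] in response:
            store.record(rule["fact_type"], response[rule["from"]])
```

A reporting plugin would then call store.latest("health_data_summary.snapshot", "latest_garmin_source_day") instead of triggering a container run.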

The fact and event are operational artifacts the gateway manages. The DB row is the substantive output — what the whole pipeline exists to produce.
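One way to get a one-row-per-day table that tolerates retries is a SQLite upsert keyed on the day. The sketch below is an assumption about technique, not the ETL's actual code: the real schema is not shown here, so every column name other than the table name daily_health_summary is hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS daily_health_summary (
    day          TEXT PRIMARY KEY,  -- hypothetical column names throughout
    garmin_steps INTEGER,
    withings_kg  REAL
)
"""


def upsert_day(conn: sqlite3.Connection, day: str,
               garmin_steps=None, withings_kg=None) -> None:
    """Insert or merge one day's values; None leaves a column untouched."""
    conn.execute(
        """
        INSERT INTO daily_health_summary (day, garmin_steps, withings_kg)
        VALUES (?, ?, ?)
        ON CONFLICT(day) DO UPDATE SET
            garmin_steps = COALESCE(excluded.garmin_steps, garmin_steps),
            withings_kg  = COALESCE(excluded.withings_kg, withings_kg)
        """,
        (day, garmin_steps, withings_kg),
    )
    conn.commit()
```

Running the same update twice leaves the table unchanged, which is the property the manifest's idempotent: true and retry_safe: true flags promise. Each source can also fill in its own columns without clobbering the other's.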


The Constraints That Shaped It

The shape of this system isn’t accidental. Five constraints drove the design:

  • No Python or git on Unraid for plugin development. Every container build, every plugin edit, every config change happens on the Mac via the NAS-mounted share. The Unraid box only runs docker over ssh.
  • Plugin source is baked into the gateway image at build time via Docker’s additional_contexts. There is no live plugin reload. After a plugin edit, the gateway image is rebuilt; the running container is recreated.
  • The NAS share is the build context. additional_contexts.plugins-extra: /mnt/user/Projects/ductile-plugins means a git pull on the Mac immediately becomes the next gateway build’s source.
  • Commands are manifest-gated. The router rejects calls to commands the manifest doesn’t declare — even if the underlying code would handle them. Schedules and pipelines must move in lockstep with plugin command renames or the gateway refuses to start.
  • The fact store is the read-side projection. Snapshots written under fact_outputs rules are how plugins share state without coupling to each other’s invocation.
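The manifest-gating constraint can be pictured as a startup check over every (plugin, command) pair that schedules and pipelines reference. This is a sketch with hypothetical names and data shapes, not the gateway's Go implementation.

```python
def undeclared_references(
    declared: dict[str, set[str]],
    references: list[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Return (plugin, command) pairs that no manifest declares.

    declared maps a plugin name to the command names in its manifest.
    """
    return [
        (plugin, command)
        for plugin, command in references
        if command not in declared.get(plugin, set())
    ]
```

With declared = {"health_data_summary": {"handle", "health"}}, a leftover reference to ("health_data_summary", "summarize") is reported. A gateway following the rule above would refuse to start until that reference is updated.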

The result is a system where a 6-hour cron trigger on a Withings sync produces a row in a SQLite file on a NAS, and a separate host’s weekly job reads that file to assemble an email — with no shared service, no message bus, and no schema migration coordination between the two halves. Just events, files, and per-plugin snapshots.