A Health-Data Pipeline on Unraid + Ductile, Primitives Up
A walkthrough of an event-driven pipeline that joins Garmin and Withings source data into a single daily summary on a NAS. Architecture diagram first, then we work down through the container, the plugin manifest, the host-side YAML, and the trigger envelope — all the way to the row that lands on disk.
No story arc. Each section ends in the file that does the work.
The Architecture — Three Hosts, One Seam
          ┌───────────────────── Unraid (data layer) ──────────────────────┐
          │                                                                │
Withings ─┼─► withings plugin ──────► event: health.new_data               │
API       │                                      │                         │
          │                                      ▼                         │
          │                          pipeline: health-summary              │
Garmin ───┼─► garmin Docker ───► (file-mtime watcher emits future)         │
cron      │     │                                │                         │
          │     ▼                                ▼                         │
          │  /healthdata/garmin/    health_data_summary plugin             │
          │  /healthdata/withings/               │                         │
          │     │                                ▼                         │
          │     └────────────────► docker run ductile-healthdata           │
          │                                      │                         │
          │                                      ▼                         │
          │                         /healthdata/healthdata.db ◄── the seam │
          └──────────────────────────────────────┼─────────────────────────┘
                                                 │  (NAS-mounted SQLite, read-only consumer view)
                                                 ▼
          ┌────────────────── Thinkpad (reporting layer) ──────────────────┐
          │ health_weekly_report (cron) ─► fabric ─► email_send            │
          └────────────────────────────────────────────────────────────────┘
Three hosts, one event, one file seam:
- Unraid runs the data layer: source plugins (Withings, Garmin) and the joiner (health_data_summary).
- Thinkpad runs the reporting layer: a weekly digest assembled from the joined DB and emailed out.
- Mac is the dev/deploy host: it edits the NAS-mounted clones and ssh-rebuilds containers on Unraid.
The seam between data and reporting is a SQLite file on the NAS (healthdata.db), not an API. The reporting host opens it read-only with mode=ro&immutable=1. No network protocol, no auth, no schema migration coupling — just a file that one host writes and another reads.
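Here is a minimal sketch of the reporting-side open, using Python's standard sqlite3 module. The function name open_seam_readonly is illustrative and is not taken from health_weekly_report.

The two URI parameters do different things. mode=ro makes any write fail. immutable=1 goes further and tells SQLite the file cannot change while it is open, so SQLite skips all locking and change detection. If the file does change anyway, SQLite may return incorrect results or report corruption. Opening a fresh connection per report run, rather than holding one open while the Unraid side writes, is therefore the safer assumption.

```python
import sqlite3
from pathlib import Path


def open_seam_readonly(db_path: str) -> sqlite3.Connection:
    """Open the shared healthdata.db read-only via a SQLite URI.

    mode=ro rejects writes; immutable=1 also disables locking and change
    detection, so use it only for short-lived reads of a file that is not
    being written at that moment.
    """
    # as_uri() needs an absolute path and percent-encodes unusual characters.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro&immutable=1"
    return sqlite3.connect(uri, uri=True)
```

On the Thinkpad this would be called with wherever the NAS share is mounted there (the text doesn't give the mount point), followed by ordinary SELECTs against daily_health_summary.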
The ETL Container — stdlib-only Protocol
ductile-healthdata is a Python container that joins Garmin and Withings DBs. Its Dockerfile is twelve lines, no pip install:
FROM python:3.12-slim
WORKDIR /app
COPY healthdata_etl/run.py /app/run.py
RUN chmod +x /app/run.py
VOLUME ["/app/data/healthdata"]
ENTRYPOINT ["python3", "/app/run.py"]
The protocol is JSON on stdin, JSON on stdout. The container reads one request envelope and exits:
{
"command": "handle",
"config": {
"healthdata_db": "/app/data/healthdata/healthdata.db",
"garmin_summary_db": "/app/data/healthdata/garmin/DBs/garmin_summary.db",
"withings_db": "/app/data/healthdata/withings/withings.db"
},
"event": {
"type": "health.new_data",
"payload": {
"source": "garmin",
"dirty_periods": ["2026-04-26"],
"detected_at": "2026-04-26T07:00:00+10:00"
}
}
}
The response shape includes status, result, events, state_updates, and logs. State updates are a presence-stable snapshot — five keys, every time:
{
"summary_count": 52,
"metric_count": 44,
"pending_updates_total": 2,
"latest_garmin_source_day": "2026-04-25",
"latest_withings_source_date": "2026-04-24 21:19:52"
}
Why stdin/stdout instead of a daemon: the gateway already runs Docker, so a one-shot docker run is the cheapest invocation it has. There’s nothing to keep alive, nothing to monitor, no port to bind. The container starts, reads, joins, writes, exits. ~2 seconds end-to-end on the Unraid host.
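The one-shot flow can be sketched in Python. This is not the container's run.py; its handle_command and _resolve_etl_inputs aren't shown here. The names handle_request and run_etl are assumptions, and so is the "ok" success status. The sketch routes only handle and leaves out the manifest's health probe.

What it does show:
- one envelope in, one envelope out;
- routing on event.type inside handle;
- a state_updates block that always carries the same five keys, with None standing in for anything the ETL could not determine.

```python
import json
import sys
from typing import Any, Callable

# The five snapshot keys from the example above.
SNAPSHOT_KEYS = (
    "summary_count",
    "metric_count",
    "pending_updates_total",
    "latest_garmin_source_day",
    "latest_withings_source_date",
)


def error_response(msg: str) -> dict[str, Any]:
    # Same shape as the wrapper's err_response.
    return {"status": "error", "error": msg, "retry": False,
            "logs": [{"level": "error", "message": msg}]}


def snapshot(raw: dict[str, Any]) -> dict[str, Any]:
    """Presence-stable state_updates: every key present, None when unknown."""
    return {key: raw.get(key) for key in SNAPSHOT_KEYS}


def handle_request(request: dict[str, Any],
                   run_etl: Callable[[dict, dict], dict]) -> dict[str, Any]:
    """Route one request envelope; run_etl is a hypothetical stand-in for the join."""
    if request.get("command") != "handle":
        return error_response(f"unsupported command: {request.get('command')!r}")
    event = request.get("event") or {}
    if event.get("type") != "health.new_data":
        return error_response(f"unhandled event type: {event.get('type')!r}")
    payload = event.get("payload") or {}
    # run_etl(config, payload) -> {"periods": int, "metrics": int, "snapshot": dict}
    outcome = run_etl(request.get("config") or {}, payload)
    source = payload.get("source", "unknown")
    summary = f"{source} ETL: {outcome['periods']} periods, {outcome['metrics']} metrics written"
    return {
        "status": "ok",  # assumed success value
        "result": summary,
        "events": [{
            "type": "healthdata.etl.completed",
            "payload": {"source": source,
                        "periods_processed": outcome["periods"],
                        "metrics_written": outcome["metrics"]},
        }],
        "state_updates": snapshot(outcome.get("snapshot") or {}),
        "logs": [{"level": "info", "message": summary}],
    }


def main(run_etl: Callable[[dict, dict], dict]) -> None:
    """One-shot entry point: read one envelope from stdin, write one to stdout."""
    try:
        request = json.load(sys.stdin)
    except json.JSONDecodeError as exc:
        response = error_response(f"invalid request JSON: {exc}")
    else:
        response = handle_request(request, run_etl)
    json.dump(response, sys.stdout)
    sys.stdout.write("\n")
```

In a container entrypoint, main(real_etl) would be the last call in the script. docker run -i supplies stdin, and the wrapper's exec hands the container's stdout straight back to the gateway.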
The Wrapper Plugin — 44 Lines of Shell
The container doesn’t speak directly to the gateway. A wrapper plugin in ductile-plugins/health_data_summary/ is what the gateway invokes; the wrapper does two things: pass the request through unchanged, and shell out to docker run.
Here is the entire run.sh:
#!/usr/bin/env bash
set -euo pipefail
REQUEST="$(cat)"
IMAGE=$(printf '%s' "$REQUEST" | jq -r '.config.image // "ductile-healthdata:latest"')
HOST_DIR=$(printf '%s' "$REQUEST" | jq -r '.config.host_healthdata_dir // "/mnt/user/Projects/healthdata"')
CONT_DIR=$(printf '%s' "$REQUEST" | jq -r '.config.container_healthdata_dir // "/app/data/healthdata"')
err_response() {
jq -n --arg msg "$1" \
'{status:"error", error:$msg, retry:false, logs:[{level:"error", message:$msg}]}'
}
if ! command -v docker >/dev/null 2>&1; then
err_response "docker CLI not available in plugin runtime"
exit 0
fi
if ! docker image inspect "$IMAGE" >/dev/null 2>&1; then
err_response "ductile-healthdata image not found: $IMAGE — build it from github.com/mattjoyce/ductile-healthdata"
exit 0
fi
exec docker run --rm -i \
-v "$HOST_DIR:$CONT_DIR" \
"$IMAGE" <<<"$REQUEST"
The wrapper buys two things by sitting between the gateway and the container:
- Language-agnostic plugins. The gateway is Go. The container is Python. The wrapper is Bash. The plugin contract is JSON over stdin/stdout, so anything that can read stdin and write JSON works.
- Image lifecycle separation. The container can be rebuilt independently of the gateway. The gateway is rebuilt only when its own source or the plugin sources baked into it (wrapper scripts and manifests) change.
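For readers who would rather test the wrapper's decisions than read jq, here is the same config resolution restated in Python. docker_argv and jq_alt are hypothetical helpers and are not part of the plugin.

One detail is worth copying exactly. jq's // operator falls back only on null or false, so an explicit empty string in config.image is passed through rather than replaced by the default. Python's or operator would get that case wrong.

```python
from typing import Any

# Defaults copied from run.sh.
DEFAULTS = {
    "image": "ductile-healthdata:latest",
    "host_healthdata_dir": "/mnt/user/Projects/healthdata",
    "container_healthdata_dir": "/app/data/healthdata",
}


def jq_alt(value: Any, default: Any) -> Any:
    """Mirror jq's // operator: fall back only when the value is null or false."""
    return default if value is None or value is False else value


def docker_argv(request: dict[str, Any]) -> list[str]:
    """Build the argv that run.sh execs; the request itself goes on stdin."""
    config = request.get("config") or {}
    image = jq_alt(config.get("image"), DEFAULTS["image"])
    host_dir = jq_alt(config.get("host_healthdata_dir"), DEFAULTS["host_healthdata_dir"])
    cont_dir = jq_alt(config.get("container_healthdata_dir"),
                      DEFAULTS["container_healthdata_dir"])
    # Bind-mount source is a host path; target is the in-container path.
    return ["docker", "run", "--rm", "-i", "-v", f"{host_dir}:{cont_dir}", image]
```

subprocess.run(docker_argv(req), input=json.dumps(req), text=True, capture_output=True) would then mirror the exec line, with the request on stdin.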
The Plugin Manifest — What the Gateway Needs to Know
The wrapper plugin’s manifest.yaml declares the contract the gateway enforces:
manifest_spec: ductile.plugin
manifest_version: 1
name: health_data_summary
version: 0.2.0
protocol: 2
entrypoint: run.sh
description: "Join Garmin + Withings DBs into a unified daily_health_summary row per dirty day. Delegates execution to the ductile-healthdata Docker image."
concurrency_safe: false
commands:
  - name: handle
    type: write
    description: "Event-dispatcher entry point. Routes by event.type."
    idempotent: true
    retry_safe: true
    values:
      consume:
        - payload.source
        - payload.dirty_periods
        - payload.detected_at
      emit:
        - event: healthdata.etl.completed
          values:
            - payload.source
            - payload.periods_processed
            - payload.metrics_written
            - payload.result
  - name: health
    type: read
    description: "Probe DBs and image availability. Emits no state_updates."
    idempotent: true
    retry_safe: true
    values:
      consume: []
      emit:
        - event: healthdata_etl.health
          values:
            - payload.healthdata_db
            - payload.sources
            - payload.pending_updates
            - payload.checked_at
fact_outputs:
  - when:
      command: handle
    from: state_updates
    fact_type: health_data_summary.snapshot
    compatibility_view: mirror_object
config_keys:
  required:
    - healthdata_db
    - garmin_summary_db
    - withings_db
  optional:
    - image
    - host_healthdata_dir
    - container_healthdata_dir
The interesting parts:
- commands.handle is the event-dispatcher entry point, not a logical operation like summarize. The router in this version of ductile dispatches command: "handle" for every uses: step. Plugins that want pipeline-driven invocation declare a handle command and route inside it on event.type.
- values.consume / values.emit are the plugin’s value-flow contract: the gateway uses these for static analysis and to validate downstream pipelines.
- fact_outputs tells the gateway: “when handle completes, take its state_updates block and store it as a fact of type health_data_summary.snapshot.” That snapshot is what other plugins query when they need to know “what’s the latest state of health_data_summary without re-running it?”
- concurrency_safe: false because the container writes to a SQLite file. The gateway will serialize calls.
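The value-flow contract lends itself to a simple static check: a downstream step should consume only value paths that the upstream event promises to emit. The gateway itself is Go, and its actual analysis isn't shown.

This Python sketch only illustrates the idea. The names emitted_values, unmet_consumes and lookup are hypothetical. It runs against the handle command's contract from the manifest above, written as the dict a YAML loader would produce.

```python
from typing import Any

# The handle command's contract from manifest.yaml, as parsed YAML.
HEALTH_DATA_SUMMARY = {
    "commands": [{
        "name": "handle",
        "values": {
            "consume": ["payload.source", "payload.dirty_periods", "payload.detected_at"],
            "emit": [{"event": "healthdata.etl.completed",
                      "values": ["payload.source", "payload.periods_processed",
                                 "payload.metrics_written", "payload.result"]}],
        },
    }],
}


def emitted_values(manifest: dict[str, Any], event_type: str) -> set[str]:
    """Collect the value paths a manifest promises for one emitted event type."""
    paths: set[str] = set()
    for command in manifest.get("commands", []):
        for emit in command.get("values", {}).get("emit", []):
            if emit.get("event") == event_type:
                paths.update(emit.get("values", []))
    return paths


def unmet_consumes(upstream: dict[str, Any], event_type: str,
                   consumes: list[str]) -> list[str]:
    """Paths a downstream step consumes that the upstream plugin never emits."""
    available = emitted_values(upstream, event_type)
    return [path for path in consumes if path not in available]


def lookup(envelope: dict[str, Any], dotted: str) -> Any:
    """Resolve a dotted path like 'payload.source' against an event envelope."""
    node: Any = envelope
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(dotted)
        node = node[part]
    return node
```

Consider a future dashboard pipeline that consumes payload.metrics_written from healthdata.etl.completed: it would pass. One that consumes a path the manifest never declares would be flagged before any container runs.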
Host Registration — plugins.yaml
Discovery alone isn’t enough. A plugin must be registered in plugins.yaml to be enabled, configured, and scheduled. The Unraid host’s entry:
plugins:
  health_data_summary:
    enabled: true
    timeout: 120s
    max_attempts: 2
    concurrency_safe: false
    config:
      healthdata_db: /app/data/healthdata/healthdata.db
      garmin_summary_db: /app/data/healthdata/garmin/DBs/garmin_summary.db
      withings_db: /app/data/healthdata/withings/withings.db
      # image: ductile-healthdata:latest (default)
      # host_healthdata_dir: /mnt/user/Projects/healthdata (default)
      # container_healthdata_dir: /app/data/healthdata (default)
The three required config keys are in-container DB paths. The container sees them under /app/data/healthdata/... because the wrapper bind-mounts the host’s /mnt/user/Projects/healthdata into the container at /app/data/healthdata on every invocation. The same envelope paths therefore work on any host that follows this mount convention.
Two subtle things:
- timeout: 120s is generous. The ETL itself takes ~2s; the timeout is a safety net for cold-cache cases (large garmin DB read).
- concurrency_safe: false is repeated here even though the manifest already says it. The gateway lets host config tighten guarantees but not loosen them.
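Both rules reduce to a few lines. This is a hedged Python sketch, not ductile's code:
- effective_concurrency_safe encodes "tighten but not loosen" as a logical AND, and an omitted host value defers to the manifest;
- missing_required checks the host's config block against the manifest's config_keys.required list.

Both names are hypothetical.

```python
from typing import Any, Optional


def effective_concurrency_safe(manifest_safe: bool, host_safe: Optional[bool]) -> bool:
    """Host config may tighten (True -> False) but never loosen (False -> True)."""
    if host_safe is None:  # host did not set it; the manifest decides
        return manifest_safe
    return manifest_safe and host_safe


def missing_required(config_keys: dict[str, list[str]], config: dict[str, Any]) -> list[str]:
    """Required manifest keys absent from the host's plugins.yaml config block."""
    return [key for key in config_keys.get("required", []) if key not in config]
```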
Event Routing — pipelines.yaml
A pipeline is what binds an event to a plugin invocation. The whole entry is five lines:
pipelines:
  - name: health-summary
    on: health.new_data
    steps:
      - id: summarize
        uses: health_data_summary
on: is the trigger event. uses: names the plugin to invoke (the gateway implicitly calls handle). id: is the step name that appears in job logs.
That’s the entire wiring. Anything in the system that emits a health.new_data event — the Withings plugin’s poll, a future Garmin watcher, an HTTP trigger — will route through this pipeline.
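The routing step can be sketched as an event-type lookup: each matching pipeline's steps expand into plugin calls, with the command fixed to handle. Invocation and route are hypothetical names. The sketch says nothing about how the real gateway sequences multi-step pipelines or passes outputs between steps.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Invocation:
    pipeline: str
    step_id: str
    plugin: str
    command: str  # always "handle" for uses: steps, per the text above


def route(pipelines: list[dict[str, Any]], event: dict[str, Any]) -> list[Invocation]:
    """Expand one event into the plugin invocations its matching pipelines request."""
    invocations = []
    for pipeline in pipelines:
        if pipeline.get("on") != event.get("type"):
            continue
        for step in pipeline.get("steps", []):
            invocations.append(
                Invocation(pipeline["name"], step["id"], step["uses"], "handle"))
    return invocations


# pipelines.yaml above, as the dict a YAML loader would produce.
PIPELINES = [{
    "name": "health-summary",
    "on": "health.new_data",
    "steps": [{"id": "summarize", "uses": "health_data_summary"}],
}]
```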
The Trigger — One Event, Two Hops
withings.poll (every 6h)
│
▼
emit: health.new_data { source: "withings", dirty_periods: [...] }
│
▼
router: match `on: health.new_data`
│
▼
pipeline: health-summary
│
▼
step: summarize → plugin: health_data_summary command: handle
│
▼
wrapper run.sh → docker run ductile-healthdata:latest
│
▼
container: handle_command → _resolve_etl_inputs → withings_etl
│
▼
write rows to /app/data/healthdata/healthdata.db
│
▼
emit: healthdata.etl.completed + state_updates snapshot
│
▼
gateway: store snapshot as fact health_data_summary.snapshot
For a synthetic trigger over the API, the request is wrapped in a payload envelope:
curl -X POST http://localhost:8888/pipeline/health-summary \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"payload":{"source":"garmin","dirty_periods":["2026-04-26"],"detected_at":"2026-04-26T10:30:00+10:00"}}'
The gateway constructs the event from the pipeline’s on: field and the body’s payload, then enqueues a job. The synchronous response is the queued job ID; the result is fetched with GET /job/{id}.
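Here is the same trigger from Python's standard library, as a sketch. Several details are assumptions:
- BASE_URL is taken from the curl example.
- Sending the bearer token on GET /job/{id} is assumed; the text only shows auth on the POST.
- The "failed" status is assumed; the text only shows "succeeded".

The field that carries the job ID in the POST response isn't shown either, so the code leaves parsing it to the caller. wait_for_job takes the fetch function as a parameter so it can be exercised without a live gateway.

```python
import json
import time
import urllib.request
from typing import Any, Callable

BASE_URL = "http://localhost:8888"  # gateway address from the curl example


def trigger_request(pipeline: str, payload: dict[str, Any], token: str) -> urllib.request.Request:
    """Build the same POST as the curl example; nothing is sent until urlopen()."""
    body = json.dumps({"payload": payload}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/pipeline/{pipeline}",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


def fetch_job(job_id: str, token: str) -> dict[str, Any]:
    """GET /job/{id} and decode the JSON body (auth header assumed)."""
    req = urllib.request.Request(f"{BASE_URL}/job/{job_id}",
                                 headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def wait_for_job(job_id: str, fetch: Callable[[str], dict[str, Any]],
                 terminal: frozenset = frozenset({"succeeded", "failed"}),
                 interval: float = 1.0, attempts: int = 120) -> dict[str, Any]:
    """Poll until the job reaches a terminal status or attempts run out."""
    for _ in range(attempts):
        job = fetch(job_id)
        if job.get("status") in terminal:
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {attempts} polls")
```

Against a live gateway you would enqueue with urllib.request.urlopen(trigger_request(...)), read the job ID from the response, then call wait_for_job(job_id, lambda j: fetch_job(j, token)).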
A real trigger run, end to end:
{
"status": "succeeded",
"result": {
"result": "garmin ETL: 2 periods, 52 metrics written",
"events": [{
"type": "healthdata.etl.completed",
"payload": {
"source": "garmin",
"periods_processed": 2,
"metrics_written": 52
}
}],
"state_updates": {
"summary_count": 52,
"metric_count": 44,
"pending_updates_total": 2,
"latest_garmin_source_day": "2026-04-25",
"latest_withings_source_date": "2026-04-24 21:19:52"
}
}
}
The healthdata.db file appears at /mnt/user/Projects/healthdata/healthdata.db on the host. It is 69 KB after the first run and grows with each new dirty day.
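One row per dirty day suggests an upsert keyed on the day. The text doesn't show daily_health_summary's columns, so steps, weight_kg and updated_at below are hypothetical, as is upsert_days. The pattern is SQLite's INSERT ... ON CONFLICT ... DO UPDATE, available since SQLite 3.24.

Because rows are keyed by day, re-running the same dirty days rewrites those rows rather than adding new ones. That is one way to earn the manifest's idempotent: true. The sketch assumes each source map holds the complete data for a day, since a missing side is written as NULL.

```python
import sqlite3
from typing import Iterable

# Hypothetical schema: the real daily_health_summary columns are not shown.
SCHEMA = """
CREATE TABLE IF NOT EXISTS daily_health_summary (
    day TEXT PRIMARY KEY,
    steps INTEGER,          -- hypothetical Garmin-sourced column
    weight_kg REAL,         -- hypothetical Withings-sourced column
    updated_at TEXT NOT NULL
)
"""


def upsert_days(conn: sqlite3.Connection, days: Iterable[str],
                garmin: dict[str, dict], withings: dict[str, dict],
                now: str) -> int:
    """Write one summary row per dirty day, replacing any existing row for that day.

    garmin/withings map day -> source fields; a missing side leaves its columns NULL.
    """
    conn.execute(SCHEMA)
    written = 0
    with conn:  # one transaction for the whole batch
        for day in days:
            g, w = garmin.get(day, {}), withings.get(day, {})
            conn.execute(
                """
                INSERT INTO daily_health_summary (day, steps, weight_kg, updated_at)
                VALUES (?, ?, ?, ?)
                ON CONFLICT(day) DO UPDATE SET
                    steps = excluded.steps,
                    weight_kg = excluded.weight_kg,
                    updated_at = excluded.updated_at
                """,
                (day, g.get("steps"), w.get("weight_kg"), now),
            )
            written += 1
    return written
```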
The Output — Fact, Event, Snapshot
Every successful invocation produces three artifacts, each consumed by a different audience:
1. The event (healthdata.etl.completed) is for downstream pipelines that want to react. A future “publish to dashboard” or “post to Discord” pipeline subscribes by adding on: healthdata.etl.completed.
2. The fact (health_data_summary.snapshot) is for plugins that want to query state without invoking. The gateway stores the latest snapshot per plugin in its fact store. A reporting plugin can ask “what was the latest_garmin_source_day at the last summarize?” without paying the cost of a docker invocation.
3. The DB row (daily_health_summary in healthdata.db) is the actual data: one row per day, joining columns from both sources. The reporting layer reads this directly.
The fact and event are operational artifacts the gateway manages. The DB row is the substantive output — what the whole pipeline exists to produce.
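The fact side can be pictured as a latest-value map keyed by fact type. FactStore and its methods are hypothetical. The rule shape mirrors the manifest's fact_outputs entry; compatibility_view: mirror_object is omitted because its semantics aren't described here. Error responses are skipped so a failed run cannot overwrite the last good snapshot, which is an assumption about the real gateway's behavior.

```python
import copy
from typing import Any, Optional


class FactStore:
    """Latest snapshot per fact type: a read-side projection of state_updates."""

    def __init__(self) -> None:
        self._latest: dict[str, dict[str, Any]] = {}

    def apply(self, fact_outputs: list[dict[str, Any]], command: str,
              response: dict[str, Any]) -> None:
        """Record facts for every rule whose when.command matches this command."""
        if response.get("status") == "error":
            return  # assumption: failed runs never replace the stored fact
        for rule in fact_outputs:
            if rule.get("when", {}).get("command") != command:
                continue
            block = response.get(rule.get("from", ""))
            if isinstance(block, dict):
                # Copy so later mutation of the response cannot change the fact.
                self._latest[rule["fact_type"]] = copy.deepcopy(block)

    def latest(self, fact_type: str) -> Optional[dict[str, Any]]:
        """Answer a state query without invoking the plugin or starting a container."""
        snap = self._latest.get(fact_type)
        return copy.deepcopy(snap) if snap is not None else None


# The manifest's fact_outputs rule, as parsed YAML (compatibility_view omitted).
FACT_OUTPUTS = [{
    "when": {"command": "handle"},
    "from": "state_updates",
    "fact_type": "health_data_summary.snapshot",
}]
```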
The Constraints That Shaped It
The shape of this system isn’t accidental. Five constraints drove the design:
- No Python or git on Unraid for plugin development. Every container build, every plugin edit, every config change happens on the Mac via the NAS-mounted share. The Unraid box only runs docker over ssh.
- Plugin source is baked into the gateway image at build time via Docker’s additional_contexts. There is no live plugin reload. After a plugin edit, the gateway image is rebuilt and the running container is recreated.
- The NAS share is the build context. additional_contexts.plugins-extra: /mnt/user/Projects/ductile-plugins means a git pull on the Mac immediately becomes the next gateway build’s source.
- Commands are manifest-gated. The router rejects calls to commands the manifest doesn’t declare, even if the underlying code would handle them. Schedules and pipelines must move in lockstep with plugin command renames, or the gateway refuses to start.
- The fact store is the read-side projection. Snapshots written under fact_outputs rules are how plugins share state without coupling to each other’s invocation.
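The manifest-gated, refuse-to-start behavior amounts to a cross-reference check over manifests, plugins.yaml and pipeline/schedule wiring. This Python sketch is illustrative only. wiring_errors is a hypothetical name, and the schedule entry shape ({"plugin": ..., "command": ...}) is assumed because the text doesn't show the schedule format. An empty list means the wiring is consistent; anything else would block startup.

```python
from typing import Any


def wiring_errors(manifests: dict[str, dict[str, Any]],
                  registered: dict[str, dict[str, Any]],
                  pipelines: list[dict[str, Any]],
                  schedules: list[dict[str, Any]]) -> list[str]:
    """Every reference must hit a registered plugin and a manifest-declared command."""
    def declared(plugin: str) -> set[str]:
        return {c["name"] for c in manifests.get(plugin, {}).get("commands", [])}

    errors = []
    for pipeline in pipelines:
        for step in pipeline.get("steps", []):
            plugin = step["uses"]
            where = f"{pipeline['name']}/{step['id']}"
            if plugin not in registered:
                errors.append(f"{where}: plugin {plugin!r} not in plugins.yaml")
            elif "handle" not in declared(plugin):  # uses: steps always call handle
                errors.append(f"{where}: {plugin!r} declares no 'handle'")
    for sched in schedules:  # hypothetical schedule shape
        plugin, command = sched["plugin"], sched["command"]
        if plugin not in registered:
            errors.append(f"schedule: plugin {plugin!r} not in plugins.yaml")
        elif command not in declared(plugin):
            errors.append(f"schedule: {plugin!r} has no command {command!r}")
    return errors
```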
The result is a system where a 6-hour cron trigger on a Withings sync produces a row in a SQLite file on a NAS, and a separate host’s weekly job reads that file to assemble an email — with no shared service, no message bus, and no schema migration coordination between the two halves. Just events, files, and per-plugin snapshots.