0007 — the libav-direct engine + the ffmpeg-wasi project
- Status: DRAFT / SCOPING. The design for the WASM media engine that afmpeg drives. Pivots away from "compile the ffmpeg CLI to wasm" (see §1). Supersedes spec 0002. Review before building.
- Date: 2026-06-28
- Parent: 0001-afmpeg.md; supersedes 0002-wasm-build-pipeline.md
- Owns: R-AF-3 (the codec/filter set, reframed), R-AF-6 (reproducible build), R-AF-10 (licence variants)
1. Why this supersedes 0002 (the pivot)
Spec 0002 assumed we'd compile the ffmpeg CLI to wasm (adapting go-ffmpreg). Research (recorded in 0001 §9 and the session decision log) found that path is a dead end for a security-first, current-FFmpeg, CGO-free tool:
- go-ffmpreg pins FFmpeg n5.1, which is end-of-life (the 5.1 branch ended at n5.1.10; current is n8.x). Shipping an EOL media decoder is a non-starter.
- FFmpeg 7.0+ rewrote the `ffmpeg` CLI to be multithreaded (`fftools/ffmpeg_sched.c`; `ffmpeg_deps="… threads"`). It cannot be built without threads, and wazero implements the threads instructions (atomics) but not wasi-threads' `thread-spawn`, so the modern CLI can't run on wazero. Switching to a thread-capable runtime (Wasmtime, etc.) means CGO, which forfeits afmpeg's entire reason to exist (R-AF-1).
The way under the wall: the threading is only in fftools (the CLI). The libav*
libraries build to wasm32-wasi single-threaded with no trouble. So instead of the
CLI, we link the libraries directly and drive them with our own thin C program. This was
spike-proven (2026-06-28): FFmpeg n8.1.2 libav* compiled to wasm, a C driver
linked them, and it ran under wazero (with afmpeg's existing setjmp/longjmp env module),
reporting version_info: n8.1.2 and a working codec/muxer/filter API.
This gives us current FFmpeg, on pure-Go wazero, CGO-free — and makes us the reference for server-side (WASI) FFmpeg, since the CLI wall stops everyone else.
2. Two repositories (the architecture)
```text
┌─ ffmpeg-wasi (NEW repo) ──────────────────────────┐       ┌─ afmpeg (this repo, permissive) ───────┐
│ libav* (current FFmpeg) + codec deps              │       │ Runtime.Run(fs, args…) + vfs + env     │
│ + OUR C driver: a JSON job-spec → libav           │  ───► │ Command builder → driver job spec      │
│ (demux/decode/filter/encode/mux), single-thread   │ .wasm │ WithModuleURL(<artifact>, WithSHA256)  │
│ → driver.wasm release artifact (LGPL + GPL)       │       │ (Run stays generic for any module)     │
└───────────────────────────────────────────────────┘       └────────────────────────────────────────┘
```
The interface between them is the job-spec vocabulary (§4) — a versioned compatibility surface, not "arbitrary ffmpeg CLI args."
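Because the vocabulary is the versioned contract between the repos, a host can reject a job the driver cannot understand before running anything. The spec does not yet fix a version format, so this is a minimal Go sketch assuming a hypothetical `major.minor` vocabulary version: majors must match exactly (breaking changes), and the driver's minor must be at least the one the job needs (additive changes).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// vocabVersion is a HYPOTHETICAL "major.minor" version of the job-spec vocabulary.
type vocabVersion struct{ Major, Minor int }

// parseVocabVersion parses a "major.minor" string such as "1.2".
func parseVocabVersion(s string) (vocabVersion, error) {
	parts := strings.Split(s, ".")
	if len(parts) != 2 {
		return vocabVersion{}, fmt.Errorf("version %q: want major.minor", s)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return vocabVersion{}, fmt.Errorf("version %q: bad major: %w", s, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return vocabVersion{}, fmt.Errorf("version %q: bad minor: %w", s, err)
	}
	return vocabVersion{major, minor}, nil
}

// compatible reports whether a driver speaking `driver` can run a job that
// needs `required`: same major, and driver minor >= required minor.
func compatible(driver, required string) (bool, error) {
	d, err := parseVocabVersion(driver)
	if err != nil {
		return false, err
	}
	r, err := parseVocabVersion(required)
	if err != nil {
		return false, err
	}
	return d.Major == r.Major && d.Minor >= r.Minor, nil
}

func main() {
	for _, req := range []string{"1.0", "1.3", "2.0"} {
		ok, err := compatible("1.2", req)
		fmt.Println(req, ok, err)
	}
}
```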
3. The ffmpeg-wasi engine
- Libraries: current FFmpeg (target n8.x), `libavformat`/`libavcodec`/`libavfilter`/`libavutil`/`libswscale`/`libswresample`, compiled to `wasm32-wasi`, single-threaded (`--disable-pthreads`, `--disable-asm`), via wasi-sdk/clang with the LLVM native setjmp/longjmp lowering (`-mllvm -wasm-enable-sjlj`, which emits the `__wasm_setjmp`/`__wasm_longjmp` imports that afmpeg's runtime already provides; spec 0004 R-0004-9).
- The driver (`driver.c`, our code, MIT): a WASI command. It reads a job spec (§4), opens inputs/outputs against the mounted WASI fs, builds the filtergraph via `avfilter_graph_parse2`, configures decoders/encoders, runs the processing loop, and exits. No CLI, no threads. Errors → stderr + non-zero exit; probe results → stdout JSON.
- The artifact: `driver.wasm` (+ `.gz`), published per §7.
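The driver's error/output convention gives the host one uniform way to read any run. A minimal Go sketch of that host side, assuming a hypothetical `runOutcome` record of what a wazero invocation captured (exit code, stdout, stderr); it is not afmpeg's actual `Result` type.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// runOutcome is a HYPOTHETICAL record of one driver invocation.
type runOutcome struct {
	ExitCode int
	Stdout   []byte
	Stderr   []byte
}

// decodeOutcome applies the driver contract: a non-zero exit is an error whose
// message is the driver's stderr; otherwise non-empty stdout is JSON decoded into v.
func decodeOutcome(o runOutcome, v any) error {
	if o.ExitCode != 0 {
		msg := strings.TrimSpace(string(o.Stderr))
		if msg == "" {
			msg = "no stderr output"
		}
		return fmt.Errorf("driver exited with code %d: %s", o.ExitCode, msg)
	}
	if len(o.Stdout) == 0 || v == nil {
		return nil // nothing to decode, or the caller does not want the JSON
	}
	if err := json.Unmarshal(o.Stdout, v); err != nil {
		return fmt.Errorf("driver stdout is not valid JSON: %w", err)
	}
	return nil
}

func main() {
	var got map[string]any
	err := decodeOutcome(runOutcome{Stdout: []byte(`{"format":"mp4"}`)}, &got)
	fmt.Println(got["format"], err)

	err = decodeOutcome(runOutcome{ExitCode: 1, Stderr: []byte("example error\n")}, nil)
	fmt.Println(err)
}
```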
4. The vocabulary (the contract) — clean & custom (D-FW-B, RESOLVED)
Not an ffmpeg-CLI subset (that would be a leaky, partial-compat trap). A structured job
spec the driver reads — mirroring afmpeg's existing Command model, with the one place
reinvention is folly (the filter graph) delegated to libav's own parser:
```jsonc
// operation "process": transcode / filter / mux
{
  "op": "process",
  "inputs": [ { "path": "in/clip.mp4", "options": { "…": "…" } } ],
  "filter": "[0:v]scale=1280:-2[v]",  // ffmpeg filtergraph STRING (libavfilter parses it)
  "outputs": [ { "path": "out/x.mp4",
                 "map": ["[v]"],
                 "video_codec": "libx264", "audio_codec": "aac",
                 "options": { "crf": "23", "movflags": "+faststart" } } ]
}

// operation "probe": report stream info (duration, codecs, …) as JSON on stdout
{ "op": "probe", "inputs": [ { "path": "in/clip.mp4" } ] }
```
- Structured inputs/outputs/codecs/maps/options: typed, validated, and exposing exactly the driver's capabilities (no "looks-like-ffmpeg-but-isn't"). We own and document this surface.
- The `filter` string is ffmpeg's filtergraph syntax: we delegate to `avfilter_graph_parse2`. Reinventing that DSL would be folly, and users' filtergraph knowledge transfers directly.
- Transport (a decision to confirm during implementation): the spec is passed either as a single argv argument or as a file in the vfs that the driver reads. afmpeg's `Run`/vfs already carry it.
- This is afmpeg's `Command` (Inputs / FilterComplex string / Outputs), so afmpeg's builder serialises to the job spec rather than to CLI args; no model redesign.
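The job spec maps naturally onto plain Go structs with JSON tags, which is roughly what serialising a `Command` to it involves. A sketch under stated assumptions: the JSON field names come from the example above, while the type names (`JobSpec`, `JobInput`, `JobOutput`) and the `driverArgv` helper are hypothetical and are not afmpeg's real `Command.JobSpec()` output. The helper shows the single-argv-argument transport; the file-in-vfs option would write the same bytes to a path instead.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// JobInput is one input file, resolved against the mounted WASI fs (HYPOTHETICAL type).
type JobInput struct {
	Path    string            `json:"path"`
	Options map[string]string `json:"options,omitempty"`
}

// JobOutput is one output file with its stream mapping and encoders (HYPOTHETICAL type).
type JobOutput struct {
	Path       string            `json:"path"`
	Map        []string          `json:"map,omitempty"`
	VideoCodec string            `json:"video_codec,omitempty"`
	AudioCodec string            `json:"audio_codec,omitempty"`
	Options    map[string]string `json:"options,omitempty"`
}

// JobSpec is a whole "process" or "probe" job (HYPOTHETICAL type).
type JobSpec struct {
	Op      string      `json:"op"`
	Inputs  []JobInput  `json:"inputs"`
	Filter  string      `json:"filter,omitempty"` // ffmpeg filtergraph string, parsed by libavfilter
	Outputs []JobOutput `json:"outputs,omitempty"`
}

// driverArgv builds argv for the single-argument transport:
// argv[0] is the program name, argv[1] the whole spec as JSON.
func driverArgv(spec JobSpec) ([]string, error) {
	b, err := json.Marshal(spec)
	if err != nil {
		return nil, err
	}
	return []string{"driver", string(b)}, nil
}

func main() {
	spec := JobSpec{
		Op:     "process",
		Inputs: []JobInput{{Path: "in/clip.mp4"}},
		Filter: "[0:v]scale=1280:-2[v]",
		Outputs: []JobOutput{{
			Path: "out/x.mp4", Map: []string{"[v]"},
			VideoCodec: "libx264", AudioCodec: "aac",
			Options: map[string]string{"crf": "23", "movflags": "+faststart"},
		}},
	}
	argv, err := driverArgv(spec)
	if err != nil {
		panic(err)
	}
	fmt.Println(argv[1])
}
```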
5. Licensing — MIT tooling, both LGPL+GPL artifacts shipped (D-FW-C, RESOLVED pending review)
Three licences, kept deliberately distinct:
- The ffmpeg-wasi repo source (build tooling + `driver.c`) is MIT (ours). It is clean-room (the libav-direct pivot means we don't use go-ffmpreg's 40 KB GPL CLI patch), and it vendors no FFmpeg/x264 source (both are cloned at build time). The fact that keeps the tooling MIT: it never links libav/x264; it only orchestrates the build (clone → configure → compile → package), so it is not a derivative of GPL code. `driver.c` is linked into the artifacts, but it is our own code and stays MIT as source; MIT is GPL-compatible, so only the combined artifact takes on LGPL/GPL terms. This MIT pipeline is genuinely valuable, reusable IP: the reference "FFmpeg → WASI, libav-direct" build nobody else has.
- The released artifacts carry the licence their contents demand; libav* is LGPL-2.1+:
  - LGPL variant (default): no `--enable-gpl`, no x264. H.264 encode via openh264 (BSD; document the self-compiled AVC-patent caveat) or omitted. Proprietary-compatible.
  - GPL/full variant (opt-in): `--enable-gpl` + libx264, best-in-class H.264.
- We ship BOTH variants in every release, so a consumer who just wants a working module picks the licence that fits and skips building. This does not compromise us: distributing two separate, independent artifacts together is mere aggregation (GPLv3 §5), so the GPL artifact does not infect the LGPL artifact, the MIT tooling/driver source, or afmpeg.
- Obligations we meet: each asset is clearly licence-labelled; the provenance manifest records variant + licence; and we satisfy GPL/LGPL corresponding-source (and LGPL relink) via the public MIT repo (our scripts + `driver.c`) + the pinned upstream FFmpeg/x264, so anyone can rebuild/relink from public sources.
- LGPL is the floor (we cannot relicense libav* below it); MIT is what we own (tooling + driver); afmpeg stays permissive (it downloads an artifact; GPL/LGPL obligations are triggered by redistribution, not by running, so they attach to whoever conveys the artifact, never to afmpeg's source). (A real licence review precedes any release.)
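The two variants differ in a handful of configure switches and in the licence the provenance manifest records. A hypothetical Go sketch of how the MIT tooling might encode that table, with LGPL as the default: `--enable-gpl`, `--enable-libx264`, `--enable-libopenh264`, `--disable-pthreads` and `--disable-asm` are real FFmpeg configure options, but the `variant` type, the table and the SPDX labels (GPL-2.0-or-later assumes no `--enable-version3`) are illustrative, not the repo's actual configure lines.

```go
package main

import (
	"fmt"
	"strings"
)

// variant describes one release artifact flavour (HYPOTHETICAL tooling type).
type variant struct {
	Name      string   // artifact label, recorded in the provenance manifest
	License   string   // SPDX id recorded alongside the artifact
	Configure []string // variant-specific FFmpeg configure flags
}

// commonFlags are shared by both variants (single-threaded, no asm; see §3).
var commonFlags = []string{"--disable-pthreads", "--disable-asm"}

var variants = map[string]variant{
	// No --enable-gpl, no x264; openh264 could also be left out entirely.
	"lgpl": {Name: "lgpl", License: "LGPL-2.1-or-later",
		Configure: []string{"--enable-libopenh264"}},
	"gpl": {Name: "gpl", License: "GPL-2.0-or-later",
		Configure: []string{"--enable-gpl", "--enable-libx264"}},
}

// configureLine returns the flags for a variant; "" selects the LGPL default.
func configureLine(name string) (string, variant, error) {
	if name == "" {
		name = "lgpl"
	}
	v, ok := variants[name]
	if !ok {
		return "", variant{}, fmt.Errorf("unknown variant %q", name)
	}
	// Copy commonFlags so appending never mutates the shared slice.
	flags := append(append([]string{}, commonFlags...), v.Configure...)
	return strings.Join(flags, " "), v, nil
}

func main() {
	for _, n := range []string{"", "gpl"} {
		line, v, err := configureLine(n)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s (%s): ./configure %s\n", v.Name, v.License, line)
	}
}
```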
6. The codec/filter baseline (R-AF-3, reframed)
A general baseline (validated by the §9 workflow spread, R-FW-6), not one consumer's set: demux/decode of
common containers/codecs; encode of at least H.264 (libx264 in the GPL variant, openh264 in the LGPL variant),
AAC (native), plus common image/audio encoders; the general filter set
(scale/crop/pad/overlay/concat/xfade/format + the audio filters). A lean
vs full build is selectable. Finalised in the ffmpeg-wasi repo's own build spec.
7. Versioning & release (D-FW: track upstream FFmpeg)
- Tag = upstream FFmpeg version + build revision, e.g. `n8.1.2-1` (the suffix bumps for toolchain/config rebuilds of the same FFmpeg). Releases are cut when a new FFmpeg version lands or a rebuild is needed, not driven by conventional commits/releaser-pleaser.
- Custom release pipeline (no goreleaser): tag → Docker build (both variants) → emit `.wasm` (+ `.gz`) + a provenance manifest (FFmpeg/dep/toolchain versions, configure line, variant, licence) + `checksums.txt` → publish as GitLab release assets.
- afmpeg consumers pin `WithModuleURL(<release asset>, WithSHA256(<published sum>))`.
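A consumer who fetches an asset directly, rather than through `WithSHA256`, can check it against `checksums.txt` and read the FFmpeg version off the tag. A minimal Go sketch, assuming `checksums.txt` uses the common `sha256sum` layout (`<hex digest>  <filename>` per line) and that tags always have the `n<version>-<revision>` shape shown above; `splitTag` and `verifyChecksum` are hypothetical helpers, not part of afmpeg.

```go
package main

import (
	"bufio"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// splitTag splits a release tag such as "n8.1.2-1" into the upstream FFmpeg
// version ("n8.1.2") and the build revision (1). HYPOTHETICAL helper.
func splitTag(tag string) (upstream string, rev int, err error) {
	i := strings.LastIndex(tag, "-")
	if i <= 0 || !strings.HasPrefix(tag, "n") {
		return "", 0, fmt.Errorf("tag %q: want n<version>-<rev>", tag)
	}
	rev, err = strconv.Atoi(tag[i+1:])
	if err != nil || rev < 1 {
		return "", 0, fmt.Errorf("tag %q: bad build revision", tag)
	}
	return tag[:i], rev, nil
}

// verifyChecksum looks up name in a sha256sum-style checksums file and
// compares the recorded digest with the SHA-256 of data. HYPOTHETICAL helper.
func verifyChecksum(checksums, name string, data []byte) error {
	sum := sha256.Sum256(data)
	got := hex.EncodeToString(sum[:])
	sc := bufio.NewScanner(strings.NewReader(checksums))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue
		}
		// sha256sum marks binary mode with a leading '*' on the filename.
		if strings.TrimPrefix(fields[1], "*") != name {
			continue
		}
		if !strings.EqualFold(fields[0], got) {
			return fmt.Errorf("%s: checksum mismatch (got %s)", name, got)
		}
		return nil
	}
	if err := sc.Err(); err != nil {
		return err
	}
	return fmt.Errorf("%s: not listed in checksums file", name)
}

func main() {
	up, rev, err := splitTag("n8.1.2-1")
	fmt.Println(up, rev, err)

	artifact := []byte("pretend this is driver.wasm")
	sum := sha256.Sum256(artifact)
	checksums := hex.EncodeToString(sum[:]) + "  driver.wasm\n"
	fmt.Println(verifyChecksum(checksums, "driver.wasm", artifact))
}
```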
8. afmpeg-side integration
- `Runtime.Run` stays generic: it runs any wasm module with args over the vfs; the interim go-ffmpreg path (raw ffmpeg args) still works for anyone who wants it.
- The `Command` builder targets the job-spec vocabulary (§4): its `Args()` (or a new `Job()`) serialises to the driver's spec. `Probe` becomes an `op:"probe"` job parsing the driver's JSON, replacing the `ffmpeg -i` stderr scrape (spec 0004 D-0004-A) once the driver is the module in use.
- Pinning: afmpeg documents/pins a known-good `ffmpeg-wasi` artifact + sha; the job-spec vocabulary version is the compatibility check between the repos.
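Moving `Probe` onto an `op:"probe"` job turns a stderr scrape into ordinary JSON decoding. A sketch, assuming a hypothetical stdout shape (`format`, `duration_sec`, `streams[]` with `index`/`type`/`codec`); the Go fields echo the `Probe{Format, DurationSec, Streams}` result afmpeg exposes, but the JSON tags and the `ProbeStream` type are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ProbeStream is one stream in a HYPOTHETICAL probe report.
type ProbeStream struct {
	Index int    `json:"index"`
	Type  string `json:"type"`  // e.g. "video", "audio"
	Codec string `json:"codec"` // e.g. "h264", "aac"
}

// Probe mirrors the fields named in the spec; the JSON tags are assumptions.
type Probe struct {
	Format      string        `json:"format"`
	DurationSec float64       `json:"duration_sec"`
	Streams     []ProbeStream `json:"streams"`
}

// parseProbe decodes the driver's probe stdout and sanity-checks it.
func parseProbe(stdout []byte) (Probe, error) {
	var p Probe
	if err := json.Unmarshal(stdout, &p); err != nil {
		return Probe{}, fmt.Errorf("probe: decoding driver output: %w", err)
	}
	if p.DurationSec < 0 {
		return Probe{}, fmt.Errorf("probe: negative duration %v", p.DurationSec)
	}
	return p, nil
}

func main() {
	// Sample stdout in the assumed shape (not real driver output).
	out := []byte(`{"format":"mov,mp4,m4a,3gp,3g2,mj2","duration_sec":12.5,
		"streams":[{"index":0,"type":"video","codec":"h264"},
		           {"index":1,"type":"audio","codec":"aac"}]}`)
	p, err := parseProbe(out)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %.1fs, %d streams\n", p.Format, p.DurationSec, len(p.Streams))
}
```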
Status (2026-06-28) — validated end-to-end
The whole stack — keyrx → afmpeg → ffmpeg-wasi — was validated before the first
ffmpeg-wasi release: keyrx's afmpeg renderer drives the engine to produce a real reel
(PNG stills → xfade-concat + audio mix → h264/aac mp4) entirely in memory, no system
ffmpeg. Done on the afmpeg side:
- ✅ `Result.Stdout`: exposes the engine's structured (probe/process) JSON.
- ✅ `Command.JobSpec()` + `Runtime.RunJob()`: the generic emitter from the `Command` struct to the job spec; `Args()` stays for CLI ffmpeg. No consumer concepts leaked in.
- ✅ `WithModuleURL` + `WithSHA256`: pin a published `ffmpeg-wasi` artifact.
- ✅ Integration tests (`TestIntegration_FFmpegWasiDriver`, `TestIntegration_RunJob`, gated on `AFMPEG_TEST_FFMPEG_WASI`) prove the seam against the real driver.
Remaining afmpeg work (next steps)
- ✅ `Probe` over the driver (done, v0.4.0): `Probe` drives the engine's `probe` op (`{"op":"probe"}`) and parses `Result.Stdout` into `Probe{Format, DurationSec, Streams}`. Probe's CLI path was removed entirely (job-spec only). This unblocks keyrx swapping its `ffprobe`-based `ProbeDuration` for `afmpeg.Probe`.
- Runtime reuse guidance: `New` compiles the module (expensive); document compile-once, reuse-many (the runtime already serialises invocations).
- A "consume ffmpeg-wasi" how-to: extend `docs/how-to/obtain-a-module` further as the API settles.
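The reuse guidance amounts to paying the expensive compile once and sharing it. A generic Go sketch of that pattern, assuming a hypothetical `compile` callback in place of whatever `New` does internally (this is not afmpeg's API): `sync.Once` guards compilation and a mutex serialises invocations, as the note above says the runtime already does.

```go
package main

import (
	"fmt"
	"sync"
)

// compiled stands in for a compiled wasm module (HYPOTHETICAL).
type compiled struct{ id int }

// sharedRuntime compiles lazily, exactly once, and serialises runs on the result.
type sharedRuntime struct {
	compile func() (*compiled, error) // the expensive step, e.g. compiling driver.wasm

	once sync.Once
	mod  *compiled
	err  error

	mu sync.Mutex // one invocation at a time
}

// Run compiles on first use, then executes job while holding the lock.
func (r *sharedRuntime) Run(job func(*compiled) string) (string, error) {
	r.once.Do(func() { r.mod, r.err = r.compile() })
	if r.err != nil {
		return "", r.err
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	return job(r.mod), nil
}

func main() {
	compiles := 0
	rt := &sharedRuntime{compile: func() (*compiled, error) {
		compiles++ // runs once; sync.Once orders it before every later Run
		return &compiled{id: 1}, nil
	}}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, err := rt.Run(func(m *compiled) string { return "done" }); err != nil {
				panic(err)
			}
		}()
	}
	wg.Wait()
	fmt.Println("compiles:", compiles)
}
```

One design caveat: `sync.Once` also caches a failed compile, so code that wants to retry after an error needs a different guard.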
9. Requirements
- R-FW-1: Current FFmpeg libav* builds to `wasm32-wasi`, single-threaded, CGO-free; loads and runs under wazero (composing afmpeg's env module + features). Spike-proven.
- R-FW-2: The driver executes a job spec end-to-end over the WASI fs: a real in-memory transcode (decode→filter→encode→mux), no host fs. (Next validation beyond the spike.)
- R-FW-3: The job-spec vocabulary (§4): structured I/O/codecs + the libav filtergraph string; `process` and `probe` operations; versioned.
- R-FW-4: LGPL (default) + GPL (full) variants from clean-room, permissively-licensed tooling; MIT driver; per-artifact licence recorded.
- R-FW-5: Reproducible build; tag tracks upstream FFmpeg; release pipeline publishes checksummed, provenance-stamped artifacts.
- R-FW-6: Validated across the workflow spread: transcode, scale, overlay, concat, thumbnail, audio-extract, probe.
- R-FW-7: afmpeg integration: `Command` → job spec; `Run` stays generic; pinned consumption.
- R-FW-8: Docs & marketing are a first-class, day-one deliverable, not an afterthought. ffmpeg-wasi is a flagship "nobody else has done this" project, so the narrative is part of the product. It ships:
  - A full Diátaxis docs site: tutorials (your first in-memory transcode), how-to (each workflow + choosing a variant + verifying checksums), reference (the job-spec vocabulary, the artifacts/variants/provenance, the supported codec/filter matrix), explanation (the architecture, why libav-direct beats the CLI/threads wall, the licensing model).
  - A marketing narrative leaning into the genuine differentiators: current, maintained FFmpeg (not EOL) · WASI-native / server-side (not the browser one) · pure-Go-embeddable, CGO-free (wazero) · sandboxed · the reference for FFmpeg on WASI, because everyone else hit the threading wall and we went under it.
10. Decisions
- D-FW-A (name). RESOLVED 2026-06-28: `ffmpeg-wasi`. The most truthful name: it is FFmpeg's libav*, and `wasi` (not `wasm`) owns the uncontested server-side niche, distinct from the crowded browser `ffmpeg.wasm`. Keeps the searchable, honest `ffmpeg` keyword.
- D-FW-B (vocabulary). RESOLVED 2026-06-28: clean custom structured job spec (not an ffmpeg-CLI subset), with the filter graph delegated to libav's parser.
- D-FW-C (licensing). RESOLVED 2026-06-28 (pending legal review): the repo source (build tooling + `driver.c`) is MIT and is owned, reusable IP (the tooling orchestrates and never links GPL code; `driver.c` is linked only into the built artifacts and stays MIT as source). Both LGPL (default) and GPL/x264 (opt-in) artifacts ship in every release for consumer convenience; this is mere aggregation (GPLv3 §5), no compromise. LGPL is the floor; afmpeg stays permissive; corresponding-source/relink obligations are met via the public repo + pinned upstream.
- D-FW-D (separate repo). RESOLVED: the GPL/LGPL engine lives in `ffmpeg-wasi`; afmpeg consumes the artifact and stays permissive.
- D-FW-E (driver language). RESOLVED 2026-06-28: C (not Rust). The driver is a thin shim over C libraries; Rust's safety would cover only the glue (the codecs stay C), the wasm sandbox already contains the blast radius of memory bugs to the guest, and Rust adds a second toolchain + FFI/allocator integration risk for marginal benefit.
11. Phased roadmap
- Phase A, engine build (R-FW-1/4/5): clean-room build of current libav* + the LGPL/GPL variants; the release pipeline. (The spike de-risked the compile/link/run.)
- Phase B, the driver (R-FW-2/3/6): the job-spec parser + the processing loop; the workflow-spread validation (real in-memory transcode).
- Phase C, afmpeg integration (R-FW-7): `Command` → job spec; pinned consumption; reframe `Probe` onto the driver.
- Phase D, hardening (0006 carries over): perf, LGPL/openh264 hardening, the lean/full matrix.
12. Definition of done (this scoping spec)
- The pivot, the two-repo split, the vocabulary, the licensing posture, and the versioning are recorded and agreed.
- 0002 is marked superseded.
- The ffmpeg-wasi repo is created with its own build spec citing this one; afmpeg's specs are reframed accordingly.