Log Watching

The component that turns “something happened on this system” into bytes a parser can read. Correctness under rotation, truncation, and burst load is non-negotiable.

The claim

fail2zig follows every log source it is configured for without dropping events, without double-reading after rotation, and without polling the filesystem for file sources. It uses the lowest-level primitive that fits each source: inotify for files, and a journalctl poll for the systemd journal.

Both feed the same event loop. Parser functions do not care which watcher found the bytes; every line arrives as a slice through the same interface.

The topology

flowchart LR
    subgraph Sources
        A["/var/log/auth.log"]
        B["/var/log/nginx/error.log"]
        C[systemd journal]
    end
    A -->|inotify| W[Log Watcher]
    B -->|inotify| W
    C -->|journalctl poll| J[Journald Source]
    J -->|lines| W
    W -->|slices| P[Parser Engine]

Every source produces the same output: a slice of bytes representing one log line. The watcher is the abstraction that hides the source mechanism from the rest of the daemon.

File-based sources on Linux

The Linux file mechanism is inotify (engine/core/log_watcher.zig). Each watched file gets two watches:

a file watch for IN_MODIFY | IN_MOVE_SELF | IN_DELETE_SELF — appended content and the two ways a file can disappear out from under us;
a parent-directory watch for IN_CREATE | IN_MOVED_TO — the new file arriving after a rotation.

The inotify fd is IN_NONBLOCK | IN_CLOEXEC and is registered into the same epoll instance as every other fd the daemon cares about — the IPC socket, the timer fd, signalfd, and the metrics HTTP listener. There is no io_uring in this path; the event loop is epoll-based (io_uring is a planned future backend, not shipped). When the inotify fd is ready, the watcher drains all buffered events in one non-blocking pass.

On IN_MODIFY, the watcher reads forward from the file’s current offset to EOF, splits on \n, and hands each line to the parser. When a file is first attached, the offset is seeded to the current end of file — a fresh start does not replay historical log content. The offset is tracked in memory per file; it is not persisted across restarts (a restart re-attaches at EOF).
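The offset handling described above can be sketched in a few lines. This is an illustrative Python model, not the Zig code in log_watcher.zig: the `FileTail` class and its method names are hypothetical, and the inotify wiring is left out so that only the "seed at EOF, read forward, split on newline" logic is shown.

```python
import os


class FileTail:
    """Tracks an in-memory read offset for one file (hypothetical helper)."""

    def __init__(self, path):
        self.path = path
        # Seed at the current end of file: existing content is never replayed.
        self.offset = os.path.getsize(path)
        # Bytes read after the last newline, waiting for the rest of the line.
        self.partial = b""

    def on_modify(self):
        """Read from the saved offset to EOF and return the complete lines."""
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        self.offset += len(data)
        # Everything before the final newline is complete; the tail is kept.
        *lines, self.partial = (self.partial + data).split(b"\n")
        return lines
```

Calling `on_modify()` once per IN_MODIFY returns only lines that have been terminated by a newline; a half-written line stays in `partial` until the writer finishes it.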

Rotation and truncation

Rotation is where most log-following code has bugs. The cases handled:

| Event | What happened | What the watcher does |
| --- | --- | --- |
| `IN_MOVE_SELF` | `mv auth.log auth.log.1` | Detach the old fd; wait for the parent `IN_CREATE` / `IN_MOVED_TO`, then reopen at offset 0. |
| `IN_DELETE_SELF` | `rm auth.log` | Same reopen path as a move. |
| `IN_CREATE` / `IN_MOVED_TO` | a new file appears at the watched path | Open the new inode, start watching it, read from offset 0. |
| Copytruncate | file copied aside, then truncated in place | Detected when the offset exceeds the current file size; reset offset to 0 and resume. |

A 64-byte content fingerprint is sampled on read to catch truncate-and-rewrite races that a size check alone would miss. A read error (permission loss, unmount) detaches the file cleanly rather than spinning.
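A rough model of the two checks, in Python for illustration only. The text does not say which 64 bytes are sampled, so this sketch assumes the first 64 bytes of the file; `fingerprint` and `resume_offset` are hypothetical names, not functions from the daemon.

```python
import os

FINGERPRINT_LEN = 64  # size from the text; sampling the file head is an assumption


def fingerprint(path):
    """Return up to FINGERPRINT_LEN bytes from the start of the file."""
    with open(path, "rb") as f:
        return f.read(FINGERPRINT_LEN)


def resume_offset(path, offset, saved_fingerprint):
    """Return the offset to read from, or 0 if the content was replaced.

    offset > size        the file shrank underneath us (copytruncate)
    fingerprint differs  the file was truncated and rewritten past the old
                         offset, which a size check alone cannot see
    """
    if offset > os.path.getsize(path):
        return 0
    current = fingerprint(path)[: len(saved_fingerprint)]
    if saved_fingerprint and current != saved_fingerprint:
        return 0
    return offset
```

Comparing only the first `len(saved_fingerprint)` bytes keeps the check valid when the fingerprint was taken while the file was still shorter than 64 bytes.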

Shared directories (BUG-007, fixed in v0.2.2). When several jails watch files in the same directory, inotify collapses the parent-directory watch to a single descriptor. An earlier version mapped that descriptor to one jail, so after the first rotation the other jails silently stopped consuming events. The watch table is now many-to-one: a parent event reopens every watched file under that descriptor, and stale IN_IGNORED events are tracked per descriptor so a reused watch number can’t clobber a fresh watch. This was a real protection failure caught on a live host; it now has regression tests.
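The shape of the fix can be modelled as a table keyed by watch descriptor. The Python below is a sketch under assumptions: `WatchTable`, its method names, and the jail names and paths in the usage note are hypothetical, and the real table lives in the Zig watcher.

```python
from collections import defaultdict


class WatchTable:
    """Maps one parent-directory watch descriptor to every file under it."""

    def __init__(self):
        self.files_by_wd = defaultdict(set)  # wd -> {(jail, path), ...}
        self.stale = defaultdict(int)        # wd -> IN_IGNORED events still owed

    def add(self, wd, jail, path):
        """Register a watched file; several jails may share the same wd."""
        self.files_by_wd[wd].add((jail, path))

    def remove_watch(self, wd):
        """Forget a watch. The kernel still delivers one IN_IGNORED for it."""
        self.files_by_wd.pop(wd, None)
        self.stale[wd] += 1

    def on_ignored(self, wd):
        """Return True if this IN_IGNORED belongs to an already-removed watch."""
        if self.stale[wd] > 0:
            self.stale[wd] -= 1
            return True  # stale: must not touch a fresh watch reusing wd
        self.files_by_wd.pop(wd, None)
        return False

    def on_parent_event(self, wd):
        """Return every (jail, path) that has to be reopened for this wd."""
        return sorted(self.files_by_wd.get(wd, ()))
```

With two jails registered under the same descriptor, for example `add(1, "sshd", "/var/log/auth.log")` and `add(1, "postfix", "/var/log/mail.log")`, a single parent event returns both entries, which is the behaviour the one-to-one mapping lacked.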

Why not `tail -f` the whole file

Holding every log file open with blocking reads was rejected for three reasons. Blocking reads don’t compose with the single-threaded epoll loop: you’d need a thread or a poll per file. tail -f famously misses bytes across rotation. And 50 sources would mean 50 open fds where inotify needs far fewer. The kernel already solves “notify me when this file changes” correctly; the watcher uses that.

The systemd journal

On modern distributions sshd logs to the journal and /var/log/auth.log may not exist at all. For those hosts fail2zig reads the journal — but not via libsystemd/sd_journal. The v1 implementation (engine/core/journald_source.zig) spawns journalctl -o json as a subprocess and polls it on a 1-second timer:

First run issues a baseline query (-n 1) to capture the journal’s current cursor. That entry is never handed to the parser — there is no replay of history, so starting the daemon cannot ban on yesterday’s logs.
Steady state polls forward with --after-cursor=<cursor>, decoding the JSON records and extracting MESSAGE for matching.
The cursor is persisted to a CRC32-checked sidecar (journald-cursors.bin, next to the state file, written atomically at mode 0600) so a restart resumes exactly where it left off.
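A minimal sketch of a checksummed, atomically replaced cursor file, in Python for illustration. The on-disk layout here (cursor bytes followed by a little-endian CRC32) is an assumption; the actual format of journald-cursors.bin is not described in this page, and `save_cursor` / `load_cursor` are hypothetical names.

```python
import os
import struct
import zlib


def save_cursor(path, cursor):
    """Write the cursor plus its CRC32 to a temp file, then rename into place."""
    payload = cursor.encode("utf-8")
    record = payload + struct.pack("<I", zlib.crc32(payload))
    tmp = path + ".tmp"
    # Create the temp file with owner-only permissions (0600).
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(record)
        f.flush()
        os.fsync(f.fileno())
    # A rename within one filesystem replaces the old file atomically.
    os.replace(tmp, path)


def load_cursor(path):
    """Return the stored cursor, or None if it is missing or fails its CRC."""
    try:
        with open(path, "rb") as f:
            record = f.read()
    except FileNotFoundError:
        return None
    if len(record) < 4:
        return None
    payload, (crc,) = record[:-4], struct.unpack("<I", record[-4:])
    if zlib.crc32(payload) != crc:
        return None
    return payload.decode("utf-8")
```

Treating a failed checksum the same as a missing file means a corrupted sidecar falls back to the baseline query rather than resuming from a bad cursor; whether the daemon makes that exact choice is not stated here.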

The journal selector is fixed for the sshd filter: SYSLOG_IDENTIFIER=sshd, SYSLOG_IDENTIFIER=sshd-session, and the matching _COMM values (sshd’s listener/session split). In v0.2.2 only the sshd filter has a journald selector — a non-sshd jail set to source = journald fails closed at startup rather than reading nothing.

Why a subprocess, not libsystemd

The on-disk journal format is explicitly not a stable ABI, so a native reader would chase every libsystemd change. Linking libsystemd would also add a shared library to an otherwise static binary. Polling journalctl -o json keeps the binary static and the journal contract on the stable CLI surface. The trade-off: a jail using source = journald depends on the journalctl binary and spawns it as a subprocess. This is the one runtime subprocess fail2zig uses; the ban/enforcement path never shells out (it programs nftables directly via netlink). If journalctl is absent and a jail explicitly requests source = journald, the daemon fails closed; under the default source = auto it degrades to the file tailer instead.

Line length

A single log line is capped at 4096 bytes (a compile-time constant — there is no config knob for it). A longer line is delivered truncated to 4096 bytes with a truncated flag set, and the remainder up to the next newline is skipped so an over-long line can’t fragment into phantom lines. Truncated lines are not acted on — they are dropped at dispatch rather than risk a partial match.
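The capping rule can be illustrated with a small Python sketch. It simplifies one thing: it assumes the whole over-long line is already in memory, whereas a fixed-size buffer would need a "skip until newline" state. The function names are hypothetical.

```python
MAX_LINE = 4096  # the cap stated in the text


def split_lines(data, max_line=MAX_LINE):
    """Split complete lines out of data, capping each at max_line bytes.

    Returns (lines, rest). Each entry in lines is a (bytes, truncated) pair;
    rest is the trailing data that has no newline yet.
    """
    lines = []
    start = 0
    while True:
        end = data.find(b"\n", start)
        if end == -1:
            break
        line = data[start:end]
        # Keep the first max_line bytes; the remainder up to the newline is
        # skipped, so it can never show up as a separate phantom line.
        lines.append((line[:max_line], len(line) > max_line))
        start = end + 1
    return lines, data[start:]


def dispatch(lines):
    """Drop truncated lines so a partial line is never matched."""
    return [line for line, truncated in lines if not truncated]
```

Feeding it `b"ok\n" + b"x" * 5000 + b"\nnext\n"` yields three lines, the middle one flagged as truncated, and `dispatch` passes only `b"ok"` and `b"next"` on.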

Back-pressure

There is no unbounded buffering anywhere in the watcher. The per-file line buffer has a fixed capacity (64 KB) and compacts in place as the read head advances; the inotify fd is drained non-blocking, so a burst coalesces into one read rather than a backlog of events.

The journald poll is bounded per tick: at most 4096 entries and 16 MB of journalctl output are processed in one poll, after which the cursor is committed and the rest is read on the next tick. Nothing is dropped — work is deferred, and the cursor only advances past entries actually processed.
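The "defer, don't drop" rule can be sketched as follows. This Python model works on an in-memory buffer of journalctl -o json output and assumes it contains only complete lines; how the daemon actually applies the limits while reading the subprocess is not described here. `process_poll` is a hypothetical name, while `__CURSOR` and `MESSAGE` are standard fields of the journal's JSON output.

```python
import json

MAX_ENTRIES = 4096            # per-tick limits from the text
MAX_BYTES = 16 * 1024 * 1024


def process_poll(output, cursor, max_entries=MAX_ENTRIES, max_bytes=MAX_BYTES):
    """Decode one tick of journalctl -o json output within the limits.

    Returns (messages, cursor). The cursor advances only past entries decoded
    here; anything beyond the limits is requested again on the next tick.
    """
    messages = []
    consumed = 0
    for raw in output.splitlines(keepends=True):
        if len(messages) >= max_entries or consumed + len(raw) > max_bytes:
            break  # defer the rest; the cursor has not moved past it
        consumed += len(raw)
        if not raw.strip():
            continue
        entry = json.loads(raw)
        message = entry.get("MESSAGE")
        if isinstance(message, list):
            # journalctl serializes non-UTF-8 field values as byte arrays.
            message = bytes(message).decode("utf-8", "replace")
        messages.append(message)
        cursor = entry["__CURSOR"]
    return messages, cursor
```

Because the returned cursor is the one to persist and to pass to the next `--after-cursor` query, an entry that was cut off by either limit is simply fetched again rather than lost.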

How to verify

The resolved source and read-health for every jail are visible from the client — there is no separate stats command:

# Per-jail resolved SOURCE and HEALTH (ok / broken / unknown):
$ fail2zig-client jails
 
# Overall protection state, including DEGRADED if a source stops reading:
$ fail2zig-client status

A journald-backed sshd jail reports its source as journald (sshd); a file jail reports its log path. The rotation and journald mechanics have inline tests:

$ zig build test -- "log_watcher"
$ zig build test -- "journald"

Parser engine — what happens to log lines after the watcher.
Zero runtime dependencies — the dependency boundary, including the journalctl subprocess used by journald sources.
Configuration reference — the per-jail source key (auto / file / journald) and logpath.
