Parser Engine

The component that turns attacker-controlled log lines into structured events. Its correctness under hostile input is the product.

The design in one line

fail2zig does not run a regex engine in-process. Every built-in filter compiles at build time to a specialised match function. Those functions operate on slices of the buffered log line and never allocate. There is no runtime pattern language and no regex VM anywhere in the daemon.

That is the entire parser. Everything below is why those choices matter and how they hold.

Why no regex engine

The regex engine in most intrusion-prevention daemons is the largest single attack surface they have: a general-purpose matcher running in-process on bytes the attacker chose. Engines in this position have a history of security-relevant bugs: catastrophic backtracking in backtracking matchers such as PCRE and Python’s re, and buffer overreads and integer overflows in pattern and quantifier handling. Even RE2, which avoids backtracking by design, is a large body of matching code exposed to hostile bytes. Running a regex engine here asks the daemon to interpret a pattern program against attacker-chosen input, which is close to the worst possible shape for attack surface.

fail2ban’s filters are failregex = … lines compiled by Python’s re and applied to every log line. A single filter with .* in the wrong place can stall the runtime under crafted input (ReDoS): feed a line that pushes the engine into exponential backtracking and the daemon stops processing legitimate traffic.

fail2zig solves this structurally, not by hardening. Its filters are comptime-specialised match functions. They are not programmable at runtime — attacker-controlled input cannot influence what code the parser executes, only which path it takes through fixed, bounds-checked code.

Comptime specialisation

Zig’s comptime runs code at build time. fail2zig uses it to generate a specialised match function per filter from a pattern definition. Each function has the signature:

pub const MatchFn = *const fn (line: []const u8) ?ParseResult;

The filter registry (engine/filters/registry.zig) holds the 15 built-in filters; matching tries them in order and returns the first hit, or null. A pattern compiles (via engine/core/parser.zig) to a straight, inlined byte scan — startsWith checks, literal advances, and capture extraction — with no regex VM, no interpretation, and no runtime backtracking. The compiler emits exactly the scan the pattern requires.
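As a minimal sketch of the idea, assuming a much simpler pattern shape than the real one: the Pattern struct, the compile helper, the ip_text field of ParseResult, and the two sample prefixes below are all hypothetical and only illustrate how a comptime pattern can become a fixed match function that is tried in registry order. The sample log line is simplified (no timestamp or host prefix).

```zig
const std = @import("std");

// Hypothetical result type; the real ParseResult's fields are not shown here.
pub const ParseResult = struct {
    ip_text: []const u8, // a slice into the input line, not a copy
};

pub const MatchFn = *const fn (line: []const u8) ?ParseResult;

// Hypothetical pattern shape: a literal prefix, then a capture that runs
// up to the first occurrence of a terminator byte.
pub const Pattern = struct {
    prefix: []const u8,
    terminator: u8,
};

/// Build a match function specialised to `pattern` at compile time.
pub fn compile(comptime pattern: Pattern) MatchFn {
    const Specialised = struct {
        fn match(line: []const u8) ?ParseResult {
            // `pattern` is a compile-time constant inside this function.
            if (!std.mem.startsWith(u8, line, pattern.prefix)) return null;
            const rest = line[pattern.prefix.len..];
            const end = std.mem.indexOfScalar(u8, rest, pattern.terminator) orelse return null;
            if (end == 0) return null; // an empty capture is not a match
            return ParseResult{ .ip_text = rest[0..end] };
        }
    };
    return &Specialised.match;
}

// A two-entry stand-in for the registry; entries are fixed at build time.
const filters = [_]MatchFn{
    compile(.{ .prefix = "Failed password for root from ", .terminator = ' ' }),
    compile(.{ .prefix = "Invalid user admin from ", .terminator = ' ' }),
};

/// Try each filter in order and return the first hit, or null.
pub fn matchLine(line: []const u8) ?ParseResult {
    for (filters) |f| {
        if (f(line)) |hit| return hit;
    }
    return null;
}

pub fn main() void {
    const line = "Failed password for root from 203.0.113.7 port 22 ssh2";
    if (matchLine(line)) |hit| {
        std.debug.print("matched ip text: {s}\n", .{hit.ip_text});
    }
}
```

Nothing in the input can add, remove, or alter an entry in filters; a line can only select which of the fixed functions returns a hit.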

This is not an optimisation that can be undone later; it is the architecture. Built-in filters are code that ships with the daemon, not config entries.

No runtime filter language

fail2zig has no [[filter]] config section and no runtime pattern matcher. The filter key on a jail must name one of the 15 built-ins; an unknown filter on an enabled jail makes the daemon fail closed (refuse to start) rather than run a jail that silently matches nothing. Extending coverage means contributing a comptime filter to the registry — a code change, built and tested like any other — not a regex typed into a config file. That is a deliberate trade: the parser’s surface is fixed at build time, so there is no attacker-reachable pattern compiler to get wrong.
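The fail-closed check can be sketched as a validation pass that runs before the daemon does any work. This is an illustration under stated assumptions, not the daemon's actual config code: the Jail struct, validateJails, the error name, and the three filter names are placeholders (the real registry holds 15 built-ins, and their names are not listed here).

```zig
const std = @import("std");

// Placeholder names standing in for the built-in filter list.
const builtin_filters = [_][]const u8{ "sshd", "nginx-http-auth", "postfix" };

pub const ConfigError = error{UnknownFilter};

// Hypothetical jail shape, reduced to the fields this check needs.
pub const Jail = struct {
    name: []const u8,
    filter: []const u8,
    enabled: bool,
};

fn isBuiltin(name: []const u8) bool {
    for (builtin_filters) |known| {
        if (std.mem.eql(u8, known, name)) return true;
    }
    return false;
}

/// Reject the whole configuration if any enabled jail names an unknown filter.
pub fn validateJails(jails: []const Jail) ConfigError!void {
    for (jails) |jail| {
        if (!jail.enabled) continue; // assumption: only enabled jails are checked
        if (!isBuiltin(jail.filter)) return error.UnknownFilter;
    }
}

pub fn main() void {
    const jails = [_]Jail{
        .{ .name = "ssh", .filter = "sshd", .enabled = true },
        .{ .name = "web", .filter = "my-custom-regex", .enabled = true },
    };
    validateJails(&jails) catch |err| {
        // Fail closed: report the error and do not start.
        std.debug.print("refusing to start: {s}\n", .{@errorName(err)});
        return;
    };
    std.debug.print("config ok\n", .{});
}
```

Returning an error for the whole configuration, rather than skipping the bad jail, is what keeps a typo in a filter name from turning into a jail that silently matches nothing.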

IP extraction

Finding and validating the source IP in a line is the parser’s most frequent operation. It is a scalar, single-pass scan — there is no SIMD or vectorisation in the engine.

IPv4 accumulates decimal octets in one pass and rejects on the first out-of-range octet; the candidate span is capped at 15 bytes, the length of 255.255.255.255.
IPv6 is validated structurally in a single pass (group count, a single ::), and only the proven-valid span is handed once to std.net.Ip6Address.parse — not re-parsed repeatedly.
IPv4-mapped IPv6 (::ffff:a.b.c.d) is canonicalised to its IPv4 form so a ban can’t be evaded by switching address family (SEC-001).

Cold/reject paths are marked @branchHint(.unlikely). Nothing is allocated; the extracted address is a fixed-size value on the stack.
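The IPv4 scan and the IPv4-mapped canonicalisation described above might look roughly like the following. It is a sketch, not the engine's code: parseIpv4 and unmapIpv4 are hypothetical names, leading zeros are accepted here for brevity, the structural IPv6 validation is omitted, and the branch hints are left out to keep the example short.

```zig
const std = @import("std");

/// Parse a dotted-quad IPv4 address in one pass over the bytes.
/// Returns null on any malformed input. Nothing is allocated.
pub fn parseIpv4(text: []const u8) ?[4]u8 {
    if (text.len == 0 or text.len > 15) return null; // "255.255.255.255" is 15 bytes
    var out: [4]u8 = undefined;
    var octet: u16 = 0; // wide enough to hold a three-digit value before the range check
    var digits: u8 = 0;
    var index: usize = 0;
    for (text) |c| {
        if (c >= '0' and c <= '9') {
            if (digits == 3) return null; // at most three digits per octet
            octet = octet * 10 + (c - '0');
            if (octet > 255) return null; // reject on the first out-of-range octet
            digits += 1;
        } else if (c == '.') {
            if (digits == 0 or index == 3) return null; // empty octet or a fourth dot
            out[index] = @intCast(octet);
            index += 1;
            octet = 0;
            digits = 0;
        } else {
            return null; // any other byte ends the attempt
        }
    }
    if (digits == 0 or index != 3) return null; // must end on the fourth octet
    out[3] = @intCast(octet);
    return out;
}

/// If `addr` is an IPv4-mapped IPv6 address (::ffff:a.b.c.d), return the
/// embedded IPv4 bytes so both spellings map to the same ban entry.
pub fn unmapIpv4(addr: [16]u8) ?[4]u8 {
    const mapped_prefix = ([_]u8{0} ** 10) ++ [_]u8{ 0xff, 0xff };
    if (!std.mem.eql(u8, addr[0..12], &mapped_prefix)) return null;
    return addr[12..16].*;
}

pub fn main() void {
    if (parseIpv4("203.0.113.7")) |ip| {
        std.debug.print("{d}.{d}.{d}.{d}\n", .{ ip[0], ip[1], ip[2], ip[3] });
    }
}
```

The result is a [4]u8 returned by value, so the caller holds a fixed-size address and no reference to the log line.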

Zero-copy against buffered input

The watcher hands the parser a slice of its line buffer — a pointer and a length — not a freshly allocated copy. The parser works on that slice in place, and capture fields are emitted as further slices into the same buffer.

The line is never copied and nothing is allocated. The ban decision receives only the extracted IP, as 4 bytes (IPv4) or 16 bytes (IPv6) in a stack value; the original line is never retained past the parser call. This is a direct consequence of the bounded-memory posture: if the parser allocated per line, daemon memory would grow with log volume. Because it does not, log volume is a throughput concern, not a memory concern.
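To make the slice relationship concrete, here is a small sketch in which a capture is returned as a sub-slice of the line it came from. The captureAfter helper and the local buffer standing in for the watcher's line buffer are hypothetical; only the pointer-and-length idea comes from the text above.

```zig
const std = @import("std");

/// Return the field that follows `before`, up to the next space, as a
/// sub-slice of `line`. No bytes are copied and nothing is allocated.
fn captureAfter(line: []const u8, before: []const u8) ?[]const u8 {
    const start = (std.mem.indexOf(u8, line, before) orelse return null) + before.len;
    const rest = line[start..];
    const end = std.mem.indexOfScalar(u8, rest, ' ') orelse rest.len;
    return rest[0..end];
}

pub fn main() void {
    // Stand-in for the watcher's line buffer.
    var buf: [4096]u8 = undefined;
    const text: []const u8 = "Failed password for root from 203.0.113.7 port 22 ssh2";
    @memcpy(buf[0..text.len], text);
    const line: []const u8 = buf[0..text.len];

    const cap = captureAfter(line, " from ") orelse return;
    // The capture points into `buf`; its offset is just pointer arithmetic.
    const offset = @intFromPtr(cap.ptr) - @intFromPtr(line.ptr);
    std.debug.print("capture \"{s}\" starts at offset {d}\n", .{ cap, offset });
}
```

Because the capture aliases the buffer, it is only valid until the buffer is reused for the next line, which is why the value handed to the ban decision is a fixed-size address rather than a slice.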

Bounds-checking on every byte

Every slice access in the parser is bounds-checked, every loop has a termination guard against the slice length, and every capture is validated (a bad IP capture returns null and the line is dropped, never passed up as garbage). This is the language, not a layer on top: Zig’s ReleaseSafe keeps runtime bounds checks active on all slice access, and fail2zig ships production binaries in ReleaseSafe — not ReleaseFast — precisely to keep those checks on the attacker-controlled path.

Input limits

A log line is capped at 4096 bytes by the watcher’s line buffer (a compile-time constant), so the parser never sees an oversized slice; an over-length line is delivered truncated and is not acted on. Lines that look valid but fail capture (malformed IP, out-of-range timestamp) are dropped rather than passed up as a partial match. Ban and jail activity is observable through the Prometheus /metrics endpoint and fail2zig-client.
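One way a fixed-size line buffer can enforce this cap is sketched below. The LineBuffer type and its methods are hypothetical, and this sketch simply withholds an overflowed line from the parser, which is one possible reading of "delivered truncated and not acted on"; the watcher's actual mechanism may differ.

```zig
const std = @import("std");

pub const max_line_len = 4096; // the cap stated above, fixed at compile time

pub const LineBuffer = struct {
    bytes: [max_line_len]u8 = undefined,
    len: usize = 0,
    truncated: bool = false,

    /// Append one byte of the current line; bytes past the cap are discarded.
    pub fn push(self: *LineBuffer, byte: u8) void {
        if (self.len == max_line_len) {
            self.truncated = true;
            return;
        }
        self.bytes[self.len] = byte;
        self.len += 1;
    }

    /// The slice to parse, or null if the line overflowed and must not be acted on.
    pub fn line(self: *const LineBuffer) ?[]const u8 {
        if (self.truncated) return null;
        return self.bytes[0..self.len];
    }

    /// Prepare the buffer for the next line.
    pub fn reset(self: *LineBuffer) void {
        self.len = 0;
        self.truncated = false;
    }
};

pub fn main() void {
    var lb = LineBuffer{};
    for ("short line") |b| lb.push(b);
    if (lb.line()) |l| {
        std.debug.print("parse: {s}\n", .{l});
    }
}
```

The buffer's size is part of its type, so the parser can never be handed a slice longer than max_line_len regardless of what appears in the log.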

Throughput

Parsing is not the bottleneck. On a single core against a real auth.log replay, the end-to-end log-line-to-parsed-event rate measured on the benchmark suite is ~5.96M lines/sec — the ceiling is the memory bandwidth of the log reader, not the parser. That headroom is why fail2zig keeps ReleaseSafe bounds checks on: the safety margin costs a few percent of a number that is already orders of magnitude beyond what a single host produces.

How to verify

# Run the parser + filter tests (fuzz harnesses run inside the suite):
$ zig build test -- "parser"
$ zig build test -- "registry"
 
# The filters are compiled in — confirm the binary carries no regex engine:
$ strings zig-out/bin/fail2zig | grep -iE 'pcre|re2'   # expect no output

Memory model — why the parser allocates nothing.
Zero runtime dependencies — why a regex engine would be a runtime dependency fail2zig will not ship.
Built-in filters — the 15 filters that have comptime-specialised parsers.
Configuration reference — the filter key and why an unknown filter fails closed.
