Type · position statement
Audience · operators, reviewers
Revised · 2026-06-06

Why a new implementation.

fail2ban built the category and still works for most of the hosts running it. This page is not an argument that it was wrong. It is an argument that the architecture it chose in 2004 has reached its ceiling, and the tools to go further are now mature.

01 · What fail2ban did right

The category exists because fail2ban proved it could.

In 2004, automatically correlating log failures and writing firewall bans on the host was not an established pattern. fail2ban made it one.

What it introduced

jail · filter · maxretry · findtime · bantime · ignoreip

Those concepts survived because they were operationally correct. fail2zig keeps the model. What changes is the implementation ceiling — not the category definition.
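The rule those option names describe: an address that produces maxretry failures within findtime seconds is banned for bantime seconds. A minimal Rust sketch of that rule follows, to make the semantics concrete. It is not code from fail2ban or fail2zig. JailCfg, Offender, and the per-address ring of the last maxretry timestamps are hypothetical, and ignoreip handling is left out.

```rust
use std::collections::VecDeque;

/// Hypothetical jail parameters, named after the fail2ban options.
pub struct JailCfg {
    pub maxretry: usize, // failures that trigger a ban (assumed >= 1)
    pub findtime: u64,   // window length, seconds
    pub bantime: u64,    // ban length, seconds
}

/// Hypothetical per-address state: timestamps of the last `maxretry` failures.
pub struct Offender {
    fails: VecDeque<u64>,
    banned_until: u64, // 0 while the address is not banned
}

impl Offender {
    pub fn new(cfg: &JailCfg) -> Self {
        Offender { fails: VecDeque::with_capacity(cfg.maxretry), banned_until: 0 }
    }

    pub fn banned_until(&self) -> u64 {
        self.banned_until
    }

    /// Record one failure at `now` (seconds); returns true if it triggers a ban.
    pub fn record_failure(&mut self, cfg: &JailCfg, now: u64) -> bool {
        if self.banned_until > now {
            return false; // already banned, nothing to decide
        }
        if self.fails.len() == cfg.maxretry {
            self.fails.pop_front(); // keep only the last maxretry timestamps
        }
        self.fails.push_back(now);
        if self.fails.len() < cfg.maxretry {
            return false; // not enough failures yet
        }
        // The oldest of the last maxretry failures decides whether they all fit in findtime.
        let oldest = self.fails[0];
        if now.saturating_sub(oldest) < cfg.findtime {
            self.banned_until = now + cfg.bantime;
            self.fails.clear(); // count afresh once the ban is issued
            return true;
        }
        false
    }
}
```

Because only the last maxretry timestamps are kept, per-address state stays the same size no matter how many failures one address produces.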

02 · What the model costs in 2026

Four concrete problems the shell-action model cannot resolve.

PR.01

An interpreter in the TCB

CPython, the Python standard library, python3-systemd, and a chain of C extension modules all execute in the daemon's address space as root. The interpreter can load any bytecode and parse any input. Whatever else a root daemon should be, a runtime that accepts arbitrary bytecode as legitimate input carries a far larger trust contract than a single static binary.

PR.02

Process-per-action overhead

Every ban forks /bin/sh and execs iptables / nft / ipset. On a host being scanned at a few hundred attempts per second — a modest VPS workload in 2026 — the fork-exec cost becomes the binding constraint on how fast bans can actually land. A shell-action can be written efficiently; it cannot be written faster than execve.
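Reduced to its essentials, a shell action looks roughly like the Rust sketch below. It is not fail2ban's code, and run_shell_action is a hypothetical name. The point is the shape: each call creates a process and execs /bin/sh, the shell then execs the firewall CLI, and the daemon blocks until the shell exits before it knows whether the ban landed.

```rust
use std::io;
use std::process::Command;

/// Hypothetical helper: run one action as `/bin/sh -c "<command>"`.
/// Each call creates a process and execs the shell; for an external command such as
/// nft, iptables, or ipset, the shell must exec that binary too, so one ban costs
/// at least two execve calls. Blocks until the shell exits.
/// Returns the exit code, or None if the shell was terminated by a signal.
pub fn run_shell_action(command: &str) -> io::Result<Option<i32>> {
    let status = Command::new("/bin/sh").arg("-c").arg(command).status()?;
    Ok(status.code())
}
```

A ban action would template the offending address into something like an nft add element command. Whatever the template contains, the process creation, the execs, and the wait are paid on every ban.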

PR.03

Unbounded memory growth

Attackers shaping log traffic to pressurise the state tracker is a real scenario. A dictionary-backed data model grows without bound under that pressure, and it resizes as it grows, which pauses the event loop. Under sustained attack, those pauses stack up and the daemon falls behind its own log stream.

PR.04

No runtime in minimal images

Distroless images, scratch containers, OpenWrt routers, hardened minimal servers, FreeBSD jails: the platforms where a host IPS matters most are precisely the platforms that do not tolerate a 30 MB Python runtime. Operators on those platforms today run no host IPS at all, which is the wrong answer.

03 · Why this is architectural

These are structural, not patchable.

All four problems in the previous section come from the same choice: orchestrate in Python, act through shell. That is the shape of the architecture, not the quality of any single piece of it.

What can't be patched

Tune the shell template

execve is still on every ban. The process spawn is the operation, not an implementation detail.

Rewrite the filter

CPython is still in the daemon's address space. The interpreter is the trust contract, not the code it loads.

Cache the state tracker

The dictionary still resizes. Growth under attacker pressure is the data model, not a bug in the lookup path.

A higher ceiling requires a different shape: a statically compiled binary, explicitly bounded memory, firewall state written directly to the kernel, and a data model designed for attacker-shaped input.

04 · What fail2zig chose

Different ingredients, same problem.

The work the category has always done, rebuilt with the tools that were not available when it was first invented.

The binary is the TCB

A single statically linked musl binary. No interpreter, no VM, no dynamic loading, no third-party Zig packages in the critical path. What you audit is exactly what runs: one binary, with nothing loading code at runtime that wasn't reviewed before it shipped. See the trusted computing base.

Direct netlink, no shell

Bans are written to the kernel directly over an AF_NETLINK socket; the default nftables backend never spawns the nft binary. The kernel write path drops from the milliseconds a fork-exec chain costs to tens of microseconds, and the nftables ban path involves exactly one program: fail2zig itself. Only the optional iptables / ipset fallback backends exec their CLIs. See netlink interop.

A hard memory ceiling

A memory ceiling set at startup (memory_ceiling_mb) bounds the state tracker, which is both the dominant memory consumer and the structure that attack traffic actually grows. The ceiling determines a fixed entry cap and eviction enforces it, so tracked state cannot grow without bound regardless of attack volume. When the tracker reaches capacity, eviction fires: the daemon does not resize a growing map, does not pause the event loop, and is not killed by the OOM killer. The parser hot path allocates nothing.
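The general idea of a ceiling-derived entry cap can be sketched in Rust. This is not fail2zig's data structure: the set-associative layout, WAYS, the Entry fields, and least-recently-seen eviction are assumptions chosen to keep the sketch short. The table is sized once from a byte budget and never grows; when a set is full, the least recently seen entry in that set is overwritten, so each insert costs at most WAYS comparisons.

```rust
/// Entries per set (assumed); eviction only ever scans these.
const WAYS: usize = 8;

#[derive(Clone, Copy, Default)]
struct Entry {
    addr: u32,      // IPv4 address
    failures: u32,  // failures counted for this address
    last_seen: u64, // monotonic seconds; 0 marks an empty slot
}

/// Hypothetical fixed-size, set-associative failure tracker. Sized once, never resized.
pub struct Tracker {
    slots: Box<[Entry]>, // nsets * WAYS entries, allocated at startup
    nsets: usize,
    pub evictions: u64,
}

impl Tracker {
    /// Derive the entry cap from a byte budget; None if the budget cannot hold one full set.
    pub fn with_budget(budget_bytes: usize) -> Option<Tracker> {
        let nsets = budget_bytes / (std::mem::size_of::<Entry>() * WAYS);
        if nsets == 0 {
            return None;
        }
        Some(Tracker {
            slots: vec![Entry::default(); nsets * WAYS].into_boxed_slice(),
            nsets,
            evictions: 0,
        })
    }

    pub fn capacity(&self) -> usize {
        self.slots.len()
    }

    /// Count one failure for `addr` at `now` (must be > 0); returns the running count.
    pub fn hit(&mut self, addr: u32, now: u64) -> u32 {
        // Multiplicative hash picks the set; only that set is ever touched.
        let set = (addr.wrapping_mul(2_654_435_761) as usize) % self.nsets;
        let ways = &mut self.slots[set * WAYS..(set + 1) * WAYS];
        if let Some(e) = ways.iter_mut().find(|e| e.last_seen != 0 && e.addr == addr) {
            e.failures += 1;
            e.last_seen = now;
            return e.failures;
        }
        // Take an empty slot (last_seen 0) or evict the least recently seen entry.
        let victim = ways.iter_mut().min_by_key(|e| e.last_seen).unwrap();
        if victim.last_seen != 0 {
            self.evictions += 1;
        }
        *victim = Entry { addr, failures: 1, last_seen: now };
        1
    }
}
```

At startup, a daemon could hand a tracker like this its share of memory_ceiling_mb converted to bytes; how a real ceiling is split between the tracker and everything else is not specified here.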

A filter engine compiled at build time

Filters are specialised at compile time via Zig's comptime. No runtime regex engine, no pattern interpreter — attackers cannot influence how their input is parsed because the parser was fixed before the binary shipped.
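This Rust sketch does not reproduce Zig's comptime specialisation; it shows only the result the paragraph describes, a matcher whose structure is fixed when the binary is built, with no regex engine and no allocation. The log format (an OpenSSH-style "Failed password for ... from <ipv4> port ..." line) and the function names are illustrative assumptions. Because the user field is attacker-controlled, the sketch anchors on the last " from " in the line, so a username that embeds a fake address cannot redirect the match.

```rust
/// Parse a dotted-quad IPv4 address at the start of `s`.
/// Returns the address and the unparsed remainder.
fn parse_ipv4(s: &str) -> Option<(u32, &str)> {
    let b = s.as_bytes();
    let mut i = 0;
    let mut addr: u32 = 0;
    for octet in 0..4 {
        if octet > 0 {
            if b.get(i) != Some(&b'.') {
                return None;
            }
            i += 1;
        }
        let start = i;
        let mut v: u32 = 0;
        // At most three ASCII digits per octet.
        while i < b.len() && b[i].is_ascii_digit() && i - start < 3 {
            v = v * 10 + u32::from(b[i] - b'0');
            i += 1;
        }
        if i == start || v > 255 {
            return None;
        }
        addr = (addr << 8) | v;
    }
    Some((addr, &s[i..]))
}

/// Hypothetical fixed matcher for "Failed password for <user> from <ipv4> port <n>".
pub fn match_sshd_failed(line: &str) -> Option<u32> {
    let (_, rest) = line.split_once("Failed password for ")?;
    // The user field is attacker-controlled: anchor on the LAST " from ".
    let (_, tail) = rest.rsplit_once(" from ")?;
    let (addr, after) = parse_ipv4(tail)?;
    if after.starts_with(" port ") {
        Some(addr)
    } else {
        None
    }
}
```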

Zero runtime dependencies

No Python, no Perl, no Ruby, no Node. No D-Bus. No /bin/sh. No plugins. The default ban path is the binary and the kernel, nothing else. The only subprocesses anywhere are optional ones: the journald source's journalctl poll and the ipset / iptables fallback backends. See zero runtime dependencies for which kernel ABIs the binary depends on and why those are the acceptable boundary.

fail2ban is the adoption path

Your jails, filters, thresholds, and ignore lists port over with one command: fail2zig --import-config /etc/fail2ban. The mental model is identical. The operational vocabulary is identical. The only thing that changes is what runs as root on your machine.

05 · What fail2zig is not

Scoping is a feature.

Not a WAF. fail2zig does not inspect request bodies, enforce rules on HTTP payloads, or sit in the request path. WAFs belong at the edge. fail2zig lives on the host.

Not a SIEM. fail2zig does not aggregate, store, index, or correlate logs across hosts. It reads one host's logs, decides who is attacking it, and tells the kernel to drop their packets.

Not an IDS. Intrusion detection is a deeper problem space — protocol decoding, packet inspection, behavioural anomaly detection. Projects like Suricata and Zeek do that work. fail2zig consumes the signals an IDS (or the services themselves) produce and enforces the blunt response.

Not a network firewall. nftables is the firewall. fail2zig writes to it. A network firewall, a cloud security group, and a host IPS are three different layers; running fail2zig does not replace the first two.

Not a replacement for hardening. fail2zig bans attackers after they have already probed. Disabling password auth for SSH, keeping services patched, and minimising exposed surface are what prevent the probe-to-compromise path in the first place. fail2zig raises the cost of probing; it does not replace the reasons probing is ineffective against a well-run host.

Where to go from here.

If you run fail2ban today and the problems above are ones you have actually hit, the import path is the shortest route.

Next: install · Migration guide · Read the threat model