Why a new implementation.
fail2ban built the category and still works for most of the hosts running it. This page is not an argument that it was wrong. It is an argument that the architecture it chose in 2004 has reached its ceiling, and the tools to go further are now mature.
The category exists because fail2ban proved it could.
In 2004, automatically correlating log failures and writing firewall bans on the host was not an established pattern. fail2ban made it one.
The core vocabulary (jail, filter, maxretry, findtime, bantime, ignoreip) survived because it was operationally correct. fail2zig keeps that model. What changes is the implementation ceiling, not the definition of the category.
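As a concrete reference for that vocabulary, here is a minimal Rust sketch of the counting rule as it is commonly described. An address is banned once it produces maxretry failures within findtime seconds, the ban lasts bantime seconds, and addresses on the ignoreip list are never counted. The Jail type and its methods are hypothetical, times are plain seconds, and ignoreip is matched as exact strings only (fail2ban itself also accepts CIDR ranges and hostnames there). This is not fail2zig's or fail2ban's code.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical jail parameters named after fail2ban's options (times in seconds).
pub struct Jail {
    pub maxretry: usize,
    pub findtime: u64,
    pub bantime: u64,
    pub ignoreip: Vec<String>,
    /// Timestamps of recent failures, per address, oldest first.
    failures: HashMap<String, VecDeque<u64>>,
    /// Time at which each banned address's ban expires.
    banned_until: HashMap<String, u64>,
}

impl Jail {
    pub fn new(maxretry: usize, findtime: u64, bantime: u64, ignoreip: Vec<String>) -> Self {
        Jail {
            maxretry,
            findtime,
            bantime,
            ignoreip,
            failures: HashMap::new(),
            banned_until: HashMap::new(),
        }
    }

    /// Records one failure from `ip` at time `now`.
    /// Returns true only when this failure starts a new ban.
    pub fn record_failure(&mut self, ip: &str, now: u64) -> bool {
        // Ignored and already-banned addresses are not counted.
        if self.ignoreip.iter().any(|ignored| ignored == ip) || self.is_banned(ip, now) {
            return false;
        }
        let window = self.failures.entry(ip.to_string()).or_default();
        window.push_back(now);
        // Keep only failures that happened less than findtime seconds ago.
        while let Some(&oldest) = window.front() {
            if now.saturating_sub(oldest) >= self.findtime {
                window.pop_front();
            } else {
                break;
            }
        }
        if window.len() >= self.maxretry {
            window.clear();
            self.banned_until.insert(ip.to_string(), now + self.bantime);
            return true;
        }
        false
    }

    /// True while `ip` has an unexpired ban at time `now`.
    pub fn is_banned(&self, ip: &str, now: u64) -> bool {
        self.banned_until.get(ip).map_or(false, |&until| now < until)
    }
}
```

With maxretry = 3, findtime = 600, and bantime = 3600, three failures from one address at t = 0, 100, and 200 trigger a ban that lasts until t = 3800. Three failures spaced 700 seconds apart never do, because each earlier failure has left the window before the next one arrives.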
Four concrete problems the shell-action model cannot resolve.
An interpreter in the TCB
CPython, the Python standard library, python3-systemd, and a chain of C extension modules all execute in the daemon's address space with root privileges. The interpreter can load arbitrary bytecode and parse arbitrary input. Whatever else a root daemon should be, a design that accepts arbitrary bytecode as legitimate input carries a larger trust contract than a single static binary.
Process-per-action overhead
Every ban forks /bin/sh, which then execs iptables, nft, or ipset. On a host being scanned at a few hundred attempts per second (a modest VPS workload in 2026), the fork-exec cost becomes the binding constraint on how fast bans actually land. A shell action can be written efficiently, but it cannot run faster than execve.
Unbounded memory growth
Attackers can deliberately shape log traffic to put pressure on the state tracker; this is a real scenario. A dictionary-backed data model grows without bound under that pressure, and each resize during growth pauses the event loop. Under sustained attack those pauses accumulate, and the daemon falls behind its own log stream.
No runtime in minimal images
Distroless images, scratch containers, OpenWrt routers, hardened minimal servers, FreeBSD jails. The platforms where a host IPS matters most are precisely the platforms that do not tolerate a 30 MB Python runtime. Operators running those today run no host IPS at all, which is the wrong answer.
These are structural, not patchable.
All four problems in the previous section come from the same choice: orchestrate in Python, act through shell. That is the shape of the architecture, not the quality of any single piece of it.
Optimising the action script does not help: execve still runs on every ban. The process spawn is the operation itself, not an implementation detail.
A higher ceiling requires a different shape: a statically compiled binary, explicitly bounded memory, firewall state written directly to the kernel, and a data model designed for attacker-shaped input.
Different ingredients, same problem.
The work the category has always done, rebuilt with the tools that were not available when it was first invented.
A single statically linked musl binary. No interpreter, no VM, no dynamic loading, no third-party Zig packages in the critical path. What you audit is what runs. Beyond musl and the kernel, the trusted computing base is ~12,000 lines of Zig you can read in an afternoon.
Bans are written to the kernel directly through AF_NETLINK; no nft, iptables, or ipset binary is ever spawned. Ban latency drops from milliseconds to tens of microseconds, and the supply chain for the ban path contains exactly one program. See netlink interop.
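At the byte level, every netlink request begins with the 16-byte struct nlmsghdr from linux/netlink.h (length, type, flags, sequence number, and port ID, all in host byte order) and is padded to a 4-byte boundary. The Rust sketch below only frames an opaque payload with that header. The nftables-specific parts (nfnetlink batch messages and the set-element attributes that would carry a banned address) are deliberately omitted, and encode_nlmsg is a hypothetical helper, not fail2zig's code.

```rust
/// Flag values from <linux/netlink.h>.
pub const NLM_F_REQUEST: u16 = 0x01;
pub const NLM_F_ACK: u16 = 0x04;

/// Size of struct nlmsghdr, which is already 4-byte aligned.
pub const NLMSG_HDRLEN: usize = 16;

/// Rounds `len` up to the 4-byte netlink alignment (NLMSG_ALIGN).
pub fn nlmsg_align(len: usize) -> usize {
    (len + 3) & !3
}

/// Hypothetical helper: frames `payload` as one netlink message.
/// nlmsg_len covers header plus payload; the buffer is then zero-padded
/// to the next 4-byte boundary so another message could follow it.
pub fn encode_nlmsg(msg_type: u16, flags: u16, seq: u32, payload: &[u8]) -> Vec<u8> {
    let msg_len = NLMSG_HDRLEN + payload.len();
    let mut buf = Vec::with_capacity(nlmsg_align(msg_len));
    buf.extend_from_slice(&(msg_len as u32).to_ne_bytes()); // nlmsg_len
    buf.extend_from_slice(&msg_type.to_ne_bytes()); // nlmsg_type
    buf.extend_from_slice(&flags.to_ne_bytes()); // nlmsg_flags
    buf.extend_from_slice(&seq.to_ne_bytes()); // nlmsg_seq
    buf.extend_from_slice(&0u32.to_ne_bytes()); // nlmsg_pid: sender port ID, left as 0 here
    buf.extend_from_slice(payload);
    buf.resize(nlmsg_align(msg_len), 0); // alignment padding
    buf
}
```

Sending a framed request is one system call on an already-open AF_NETLINK socket, rather than a fork and exec for every ban.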
Memory comes from a fixed-size arena, configured at startup and enforced at the allocator level, so the daemon cannot exceed it regardless of attack volume. When the state tracker reaches capacity, the eviction policy runs: the daemon does not resize, does not pause the event loop, and does not get killed by the OOM killer.
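A minimal Rust sketch of the bounded-state idea follows. All slots are allocated once at startup, and when every slot is in use a new address evicts an existing entry instead of triggering a resize. The BoundedTracker type and its eviction policy (replace the entry seen least recently, found by a linear scan) are assumptions for illustration. The page does not say which policy fail2zig uses, and a production tracker would more likely index slots through a fixed-size hash table than scan them.

```rust
use std::net::IpAddr;

/// One tracked address. IpAddr is Copy, so a slot owns no heap memory.
#[derive(Clone, Copy, Debug)]
struct Slot {
    ip: IpAddr,
    failures: u32,
    last_seen: u64,
}

/// Hypothetical fixed-capacity failure tracker: all slots are allocated
/// once, up front, and the vector is never resized afterwards.
pub struct BoundedTracker {
    slots: Vec<Option<Slot>>,
}

impl BoundedTracker {
    pub fn with_capacity(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        BoundedTracker { slots: vec![None; capacity] }
    }

    pub fn capacity(&self) -> usize {
        self.slots.len()
    }

    /// Records a failure for `ip` at time `now`; returns its updated count.
    pub fn record(&mut self, ip: IpAddr, now: u64) -> u32 {
        // Known address: update its slot in place.
        if let Some(slot) = self.slots.iter_mut().flatten().find(|s| s.ip == ip) {
            slot.failures += 1;
            slot.last_seen = now;
            return slot.failures;
        }
        // New address: take a free slot, or evict the least recently seen entry.
        let index = match self.slots.iter().position(Option::is_none) {
            Some(free) => free,
            None => self.least_recent_index(),
        };
        self.slots[index] = Some(Slot { ip, failures: 1, last_seen: now });
        1
    }

    /// Current failure count for `ip` (0 if it is not tracked).
    pub fn failures(&self, ip: IpAddr) -> u32 {
        self.slots
            .iter()
            .flatten()
            .find(|s| s.ip == ip)
            .map_or(0, |s| s.failures)
    }

    /// Index of the slot with the oldest last_seen time.
    /// Only called when every slot is occupied.
    fn least_recent_index(&self) -> usize {
        self.slots
            .iter()
            .enumerate()
            .min_by_key(|(_, slot)| slot.map_or(0, |s| s.last_seen))
            .map(|(i, _)| i)
            .expect("capacity is non-zero")
    }
}
```

Because the slot vector is allocated once in with_capacity, a flood of new source addresses changes which addresses are tracked, not how much memory the tracker uses.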
Filters are specialised at compile time via Zig's comptime. There is no runtime regex engine and no pattern interpreter: attackers cannot influence how their input is parsed, because the parser was fixed before the binary shipped.
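Zig's comptime lets a filter pattern become matching code while the binary is built. As a rough stand-in for what such specialisation yields, here is a hand-written Rust matcher for a single sshd-style failure line. At runtime it performs only fixed literal checks and an address parse, with no regex engine or pattern interpreter. The line format and the match_failed_password function are illustrative assumptions, not one of fail2zig's filters.

```rust
use std::net::IpAddr;

/// Hypothetical fixed filter for lines such as
/// "Failed password for root from 203.0.113.7 port 22 ssh2".
/// Returns the source address only if the whole line has that shape.
pub fn match_failed_password(line: &str) -> Option<IpAddr> {
    // Fixed literal prefix; any other line is not this filter's concern.
    let rest = line.strip_prefix("Failed password for ")?;
    // The username is attacker-controlled and may itself contain " from ",
    // so split on the last occurrence, which sshd itself wrote.
    let (_user, tail) = rest.rsplit_once(" from ")?;
    // What remains must be exactly "<addr> port <digits> ssh2".
    let (addr, tail) = tail.split_once(" port ")?;
    let port = tail.strip_suffix(" ssh2")?;
    if port.is_empty() || !port.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // The address must parse as IPv4 or IPv6; anything else is rejected.
    addr.parse().ok()
}
```

Splitting on the last " from " matters because the username is attacker-controlled: a login attempt as user "x from 198.51.100.9 port 1 ssh2" yields the real source address, not 198.51.100.9.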
No Python, no Perl, no Ruby, no Node. No D-Bus. No /bin/sh. No plugins. The binary plus the Linux kernel is the complete dependency graph. See zero runtime dependencies for the kernel ABIs the binary does depend on, and why they are an acceptable boundary.
Your jails, filters, thresholds, and ignore lists port over with one command: fail2zig --import-config /etc/fail2ban. The mental model is identical. The operational vocabulary is identical. The only thing that changes is what runs as root on your machine.
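For orientation, fail2ban keeps per-jail settings as INI-style key = value lines under a [jailname] header, typically in jail.local. The Rust sketch below pulls maxretry, findtime, bantime, and ignoreip out of one section. JailConfig and parse_jail are hypothetical and are not the --import-config implementation, which would also have to handle [DEFAULT] inheritance, jail.d/ overrides, interpolation, continuation lines, and time suffixes such as 10m, all of which this sketch ignores.

```rust
/// Hypothetical subset of one jail's settings; times are plain seconds.
#[derive(Debug, Default, PartialEq)]
pub struct JailConfig {
    pub maxretry: Option<u32>,
    pub findtime: Option<u64>,
    pub bantime: Option<u64>,
    pub ignoreip: Vec<String>,
}

/// Hypothetical reader for one `[section]` of an INI-style jail file.
/// Only plain non-negative integers are parsed; values such as "10m"
/// or "-1" come back as None.
pub fn parse_jail(text: &str, section: &str) -> JailConfig {
    let header = format!("[{}]", section);
    let mut cfg = JailConfig::default();
    let mut in_section = false;
    for raw in text.lines() {
        let line = raw.trim();
        // Skip blank lines and comments.
        if line.is_empty() || line.starts_with('#') || line.starts_with(';') {
            continue;
        }
        // A new header switches sections; only ours is read.
        if line.starts_with('[') {
            in_section = line == header;
            continue;
        }
        if !in_section {
            continue;
        }
        let Some((key, value)) = line.split_once('=') else { continue };
        let value = value.trim();
        match key.trim() {
            "maxretry" => cfg.maxretry = value.parse().ok(),
            "findtime" => cfg.findtime = value.parse().ok(),
            "bantime" => cfg.bantime = value.parse().ok(),
            // ignoreip entries may be separated by spaces or commas.
            "ignoreip" => {
                cfg.ignoreip = value
                    .split(|c: char| c.is_whitespace() || c == ',')
                    .filter(|s| !s.is_empty())
                    .map(String::from)
                    .collect()
            }
            _ => {}
        }
    }
    cfg
}
```

The point of the sketch is that the vocabulary maps one to one: each key a fail2ban operator already writes becomes a field of the same name.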
Scoping is a feature.
Where to go from here.
If you run fail2ban today and the values above speak to the problems you have actually hit, the import path is the shortest route.