
क Karna: we built our own WAF. Modern, Fast and Free.

We replaced ModSecurity with Karna, our open-source WAF engine in Lua and C running as a Kong plugin. CRS-compatible, MCP-aware, can sanitize instead of blocking, and 2 to 4 times faster than ModSecurity in our benchmarks. The honest story of why, how, and how to try it.

Andrea Menin

08 Jun 2026 • 14 min read

TL;DR

For about four years the WAF module of the Sicuranext WAAP ran on ModSecurity. It served us well, but it fought us on the things that matter at scale: CPU burned mostly inside PCRE, an nginx reload just to change one customer's rule, no real control over what happens when a rule matches, rate-limiting that is primitive by 2026 standards, and an audit log in a format no modern log-analytics tool wants. So we built our own WAF engine in Lua and C, called Karna, that runs as a plugin inside Kong Gateway. It is 100% compatible with the OWASP Core Rule Set (so all the detection knowledge you already trust still works), it blocks attacks 2 to 4 times faster than ModSecurity, it understands modern MCP / JSON-RPC agent traffic, and it can sanitize a request instead of blocking it. Today it is open source: github.com/sicuranext/karna.

I know what you're thinking: "just show me the benchmark"... here it is: https://karna.sicuranext.com/benchmark.html

This post is the honest story of why we did it, what we built, the performance wall we almost gave up on, and how you can run it yourself.

A word on why we think we have something useful to say here. The Sicuranext WAAP handles around 20 million HTTP requests a day, so we have spent years on the real-world user experience, integration and performance of ModSecurity on nginx, not in a lab. I also spent about five years as a developer on the OWASP Core Rule Set team, so I know the project's strengths and its sharp edges from the inside. None of what follows is a drive-by complaint.

(Too) Many years on ModSecurity

ModSecurity is a great piece of software, and I have used it since 2016. It is also the reason the OWASP Core Rule Set project exists, and I think the CRS is still the best open detection ruleset in the world. I am not here to bury it.

But our product is a WAAP that lives on Kong Gateway in the AWS cloud, behind an AWS Application Load Balancer, with our own custom cache and CDN. ModSecurity was designed in 2002 for Apache: a single server, with configuration files. Two decades and a whole new runtime later, a lot of what we were doing was working around assumptions that simply do not hold in Kong/Nginx/AWS. At some point we asked ourselves a question that sounds reckless when you say it out loud: what if we wrote our own WAF?

We did. Here is everything that pushed us there.

Maybe you're wondering why we did not just use AWS WAF... (lol 🤣). You can find out for yourself by reading https://blog.sicuranext.com/aws-waf-bypass/. Not only is it useless at protecting web apps (missing features, bypasses by design), its costs are also nonsense.

What went wrong with ModSecurity (and the OWASP Core Rule Set) for us

This is not "ModSecurity is bad". This is "ModSecurity is the wrong runtime for a Kong-based WAAP in 2026". The list:

Rules live in files. ModSecurity loads its rules from .conf files on disk. We wanted the opposite: rules that live in a datastore, not on a filesystem, so one customer's rule can change centrally and take effect on the very next request. In Karna a rule is just data in the plugin config, pushed over the Admin API, with no file to edit and no reload.
The Core Rule Set is written for ModSecurity's engine, not a clean one. Two decades of CRS carry ModSecurity-engine idioms that a modern engine would never invent, and to stay compatible we had to emulate every one of them. A few we hit head-on, straight from Karna's own parser and engine:
- There is no "does this variable exist". CRS tests for presence by counting a variable and comparing the count to zero, as in SecRule &TX:foo "@eq 0". We had to expose a virtual count: variable so that &VAR @gt 0 keeps meaning what CRS expects.
- The anomaly score is built entirely out of side effects: rules setvar into TX:* and later rules read them back through %{tx.*} macro expansion. We had to implement the whole TX variable bag and a macro resolver just to run the score-then-block model at all.
- Tuning happens through ctl: directives (ctl:ruleRemoveById, ctl:ruleRemoveTargetByTag) that rewrite the active rule set in the middle of a request. Karna parses and applies those at runtime too.
- CRS targets ModSecurity-internal variable names for multipart and XML (MULTIPART_*, XML:/*). Each one had to be bridged onto Karna's native namespaces by hand.
Response rules are dangerous as designed. The way CRS inspects responses today does not make sense to us, and it is genuinely risky: buffering and scanning response bodies is a denial-of-service vector in its own right. We documented exactly how it backfires in Response filter denial of service: a new way to shut down a website.
The audit log fights modern tooling. It is line-based and ModSecurity-shaped, so every pipeline has to reverse-engineer the format before it can do anything useful. We wanted structured JSON we could ship straight into Loki, OpenSearch or S3 and query without writing a parser first.
Rate-limiting is stuck in the Apache era. ModSecurity counters lean on SecAction initcol and an Apache persistence file. By 2026 that is close to useless: no shared state across workers or nodes, no clean per-key windows, and an awkward fit next to the Redis nearly every stack already runs. We wanted real, Redis-backed rate-limiting as a first-class rule action.
The multipart parser can be bypassed. This one is personal. I went looking for ways to slip a malicious file past a WAF and found plenty, and I wrote it all up in Breaking down multipart parsers: validation bypass. Once you have seen how many parsers disagree with the backend they protect, you stop trusting off-the-shelf parsing for security inspection.
JSON inside a key=value parameter is a nightmare. Modern clients happily send JSON inside a single parameter: a=b&c={"foo":"bar"}. ModSecurity sees c as one opaque blob and you lose all structure. For Karna it is automatic: it detects the embedded JSON and flattens it into per-field variables you can target individually, and it does this the same way for the query string, for headers like Cookie, and for the request body.
CPU burns for no obvious reason. This is the one that hurt most. We would watch CPU spike with no clear link to the traffic we were actually serving. We traced it to two recurring culprits: the sheer number of ctl: directives inside large exclusion rule sets, which the engine re-walks on every request, and regexes with heavy backtracking, where a single pathological pattern can pin a worker. On a WAF, speed is the whole game, and ModSecurity gave us too little control over either.
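
To make the CRS idioms in the list above concrete, here is a minimal Python sketch of a TX variable bag with the count test and %{tx.*} macro expansion. It is not Karna's Lua code: the TxBag class and its method names are hypothetical, and expanding an unknown macro to an empty string is an assumption of this sketch.

```python
import re

# Matches %{tx.name} macros, case-insensitively.
MACRO = re.compile(r"%\{tx\.([a-z0-9_]+)\}", re.IGNORECASE)


class TxBag:
    """Hypothetical per-request bag of TX variables."""

    def __init__(self):
        self.vars = {}

    def expand(self, text):
        """Resolve %{tx.name} macros; unknown names become '' (assumption)."""
        return MACRO.sub(lambda m: str(self.vars.get(m.group(1).lower(), "")), text)

    def setvar(self, name, value):
        """Emulate setvar:tx.name=value and the increment form setvar:tx.name=+N."""
        value = self.expand(str(value))
        name = name.lower()
        if value.startswith("+"):
            self.vars[name] = int(self.vars.get(name, 0)) + int(value[1:])
        else:
            self.vars[name] = value

    def count(self, name):
        """Emulate &TX:name, the only way CRS asks whether a variable exists."""
        return 1 if name.lower() in self.vars else 0


bag = TxBag()
bag.setvar("critical_anomaly_score", 5)
# Two rules match and each adds the critical score through a macro.
bag.setvar("inbound_anomaly_score", "+%{tx.critical_anomaly_score}")
bag.setvar("inbound_anomaly_score", "+%{tx.critical_anomaly_score}")
print(bag.expand("inbound score is %{tx.inbound_anomaly_score}"))
```

The point of the sketch is the coupling: a later rule can only decide to block by reading back what earlier rules wrote, so the bag and the macro resolver have to exist before a single CRS rule can run.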

Add it up, and the conclusion was clear: keep the detection knowledge the CRS encodes, drop the ModSecurity runtime, and fix the CRS-isms that only ever made sense inside it.

Meet Karna

Karna is a self-contained WAF engine written in Lua (with a couple of small C helpers via FFI) that runs as a Kong plugin. A few things we are proud of:

It speaks CRS, fully. Karna loads the OWASP Core Rule Set 4.x and runs it. On our own regression harness, with the production-default config, it passes 100% of PL1 (2757/2757), 100% of PL2 (4071/4071) and 99.9% of PL3 and PL4. The detection you already trust just works.

Rules are plain JSON. Here are the shapes you will actually write.

A rule matches one or more variables with an operator, and on a match it runs an action. This one blocks known scanners by their user-agent:

{
  "id": "block_scanners",
  "phase": "access",
  "message": "known scanner user-agent",
  "conditions": [
    {
      "op": "rx",
      "variables": ["request.header.value:user-agent"],
      "value": "(?i)(sqlmap|nikto|nmap|masscan)"
    }
  ],
  "action": { "fixed_response": { "status_code": 403, "body": "Forbidden\n" } }
}
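
If the rule shape is new to you, this Python sketch shows one way such a rule could be evaluated. It is an illustration only, not Karna's engine: the resolve and evaluate helpers, the request dictionary layout, and the assumption that all conditions must match are ours.

```python
import re

RULE = {
    "id": "block_scanners",
    "conditions": [
        {
            "op": "rx",
            "variables": ["request.header.value:user-agent"],
            "value": "(?i)(sqlmap|nikto|nmap|masscan)",
        }
    ],
    "action": {"fixed_response": {"status_code": 403, "body": "Forbidden\n"}},
}

# Operator name -> predicate(value, argument).
OPERATORS = {
    "rx": lambda value, arg: re.search(arg, value) is not None,
    "beginsWith": lambda value, arg: value.startswith(arg),
}


def resolve(variable, request):
    """Turn a variable selector into a list of string values (hypothetical)."""
    namespace, _, key = variable.partition(":")
    if namespace == "request.header.value":
        value = request["headers"].get(key.lower())
        return [value] if value is not None else []
    if namespace == "request.raw_path":
        return [request["path"]]
    return []


def evaluate(rule, request):
    """Return the rule's action if every condition matches, else None."""
    for cond in rule["conditions"]:
        op = OPERATORS[cond["op"]]
        values = [v for var in cond["variables"] for v in resolve(var, request)]
        if not any(op(v, cond["value"]) for v in values):
            return None
    return rule.get("action")


scanner = {"path": "/", "headers": {"user-agent": "sqlmap/1.7"}}
browser = {"path": "/", "headers": {"user-agent": "Mozilla/5.0"}}
print(evaluate(RULE, scanner), evaluate(RULE, browser))
```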

When a CRS rule is too aggressive on one route, you kill the false positive without disabling it everywhere. Rule controls scope the change to where it belongs. Here CRS rule 941100 (reflected XSS) is dropped only for the rich-text editor under /admin/articles, where HTML in the body is expected; on every other path it still runs:

{
  "id": "fp_richtext_xss",
  "phase": "access",
  "message": "rich-text editor legitimately posts HTML",
  "conditions": [
    { "op": "beginsWith", "variables": ["request.raw_path"], "value": "/admin/articles" }
  ],
  "rule_control": [
    { "remove_rule": { "rule_id": "941100" } }
  ]
}

Rate-limiting is a rule action, Redis-backed and keyed by a macro (the client IP by default). This throttles login attempts to 10 per minute per IP and answers 429 with an automatic Retry-After:

{
  "id": "rl_login",
  "phase": "access",
  "message": "throttle login attempts per client IP",
  "conditions": [
    { "op": "beginsWith", "variables": ["request.raw_path"], "value": "/login" }
  ],
  "action": {
    "rate_limit": {
      "key": "%{remote_addr}",
      "limit": 10,
      "window_seconds": 60,
      "response": { "status_code": 429, "body": "Slow down\n" }
    }
  }
}
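
For readers who want the mechanics, here is a rough Python sketch of a per-key limiter that produces a Retry-After value. It assumes a fixed-window counter, which this post does not confirm as Karna's algorithm, and it uses an in-memory dict where a real deployment would use Redis (an INCR plus an expiry per key). The FixedWindowLimiter class is hypothetical.

```python
import math
import time


class FixedWindowLimiter:
    """Fixed-window counter per key; a dict stands in for Redis here."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counters = {}  # (key, window index) -> hits in that window

    def check(self, key):
        """Return (allowed, retry_after_seconds) for one request."""
        now = self.clock()
        index = int(now // self.window)
        hits = self.counters.get((key, index), 0) + 1
        self.counters[(key, index)] = hits
        if hits <= self.limit:
            return True, 0
        # Seconds until the current window ends, rounded up.
        return False, math.ceil((index + 1) * self.window - now)


limiter = FixedWindowLimiter(limit=10, window_seconds=60)
for attempt in range(1, 13):
    allowed, retry_after = limiter.check("203.0.113.7")
    if not allowed:
        print(f"attempt {attempt}: 429, Retry-After: {retry_after}")
```

Keying the counter by an expanded macro such as %{remote_addr} just means the string passed to check is built per request.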

Sanitize, do not block. This is the feature we missed the most. A real name like O'Brien, or an address like Via dell'Orso, 5, trips the same SQL-injection heuristics as an attack. Every WAF answers with a 403 and a broken signup form. Karna's fix_matched_parts action strips the dangerous characters out of the matched field and lets the request through, so a false positive costs a character, not a customer:

{
  "id": "vp_signup_name",
  "phase": "access",
  "conditions": [{ "op": "libinjection_sqli", "variables": ["request.arg.value:name"] }],
  "action": { "fix_matched_parts": { "remove_chars_pattern": "[<>\"'&;]" } }
}

The request reaches your app with the unsafe characters removed, and the audit log records action: "sanitized". No 403.
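
The effect of remove_chars_pattern is easy to picture with a few lines of Python. This is a sketch of the idea, assuming the pattern is applied as a plain regex substitution on the matched field; the function below only borrows the action's name and is not Karna's implementation.

```python
import re


def fix_matched_parts(value, remove_chars_pattern):
    """Strip every character matching the pattern and return the cleaned value."""
    return re.sub(remove_chars_pattern, "", value)


# The JSON string "[<>\"'&;]" from the rule decodes to this character class.
PATTERN = "[<>\"'&;]"

for name in ["O'Brien", "Via dell'Orso, 5", "1' OR 1=1--"]:
    print(repr(name), "->", repr(fix_matched_parts(name, PATTERN)))
```

The real name loses an apostrophe and goes through; the injection attempt loses the quote it needs to break out of the string.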

It sees inside JSON smuggled in a parameter. Remember the a=b&c={...} problem. Because Karna flattens embedded JSON into real variables, you can point a detection straight at the nested fields. This runs libinjection against every value inside a JSON-encoded parameter, so an SQLi hidden in payload={"q":"1' OR 1=1--"} is caught even though it never appears as a plain argument:

{
  "id": "sqli_in_json_param",
  "phase": "access",
  "message": "SQLi inside a JSON-encoded parameter",
  "conditions": [
    { "op": "libinjection_sqli", "variables": ["request.body.urlencode.json"] }
  ],
  "action": { "fixed_response": { "status_code": 403, "body": "Forbidden\n" } }
}
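
The flattening step can be sketched in a few lines of Python. This is our illustration of the idea, not Karna's parser: the dotted naming scheme (payload.q) and the parse_args and flatten helpers are assumptions made for the example.

```python
import json
from urllib.parse import parse_qsl


def flatten(prefix, node, out):
    """Walk a decoded JSON value and store each leaf under a dotted name."""
    if isinstance(node, dict):
        for key, value in node.items():
            flatten(f"{prefix}.{key}", value, out)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            flatten(f"{prefix}.{index}", value, out)
    else:
        out[prefix] = "" if node is None else str(node)


def parse_args(query):
    """Return (plain args, flattened fields of any JSON-looking values)."""
    plain, embedded = {}, {}
    for name, value in parse_qsl(query, keep_blank_values=True):
        plain[name] = value
        stripped = value.strip()
        if stripped[:1] in ("{", "["):
            try:
                flatten(name, json.loads(stripped), embedded)
            except ValueError:
                pass  # looked like JSON but was not; keep it as a plain value
    return plain, embedded


plain, embedded = parse_args('a=b&payload={"q":"1\' OR 1=1--"}')
print(plain)
print(embedded)
```

Once the nested value has a name of its own, a detection can target it like any other argument.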

It is MCP-aware. Karna parses the JSON-RPC envelope on the request side and reassembles SSE responses on the Streamable HTTP transport, evaluating rules per event. Your model gateway gets the same layered defense as the rest of your API.
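
Evaluating rules per event means the engine first has to rebuild events from body chunks that can split anywhere. The Python sketch below shows that reassembly step under simplifying assumptions: it only handles data: fields, it expects every event to carry one JSON-RPC message, and the SseReassembler class is hypothetical rather than Karna's code.

```python
import json


def _data_value(line):
    """Return the payload of a 'data:' line, minus one optional leading space."""
    value = line[len("data:"):]
    return value[1:] if value.startswith(" ") else value


class SseReassembler:
    """Buffer response chunks and emit one JSON-RPC message per complete event."""

    def __init__(self):
        self.buffer = ""

    def feed(self, chunk):
        """Add a chunk; return the decoded messages of every event it completes."""
        # Normalise CRLF on the whole buffer so a split "\r" + "\n" still works.
        self.buffer = (self.buffer + chunk).replace("\r\n", "\n")
        messages = []
        while "\n\n" in self.buffer:
            raw_event, self.buffer = self.buffer.split("\n\n", 1)
            data = [_data_value(l) for l in raw_event.split("\n") if l.startswith("data:")]
            if data:
                messages.append(json.loads("\n".join(data)))
        return messages


stream = SseReassembler()
print(stream.feed('data: {"jsonrpc":"2.0","id":1,'))
print(stream.feed('"result":{"ok":true}}\n\ndata: {"jsonrpc"'))
```

Each message returned by feed is a complete JSON-RPC object, which is the unit a rule can inspect.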

You can rewrite CRS behavior from config. Want the whole XSS rule family to sanitize instead of block? You do not fork the CRS, you add one override:

{ "selector": { "tags": ["attack-xss"] },
  "action": { "type": "fix", "remove_chars_pattern": "[<>\"'&;]" } }

rule_action_overrides and rule_response_overrides let you switch existing rules to fix, passthrough or block, and customize the response, by id, range or tag. The cached rule pack is never touched.

The rest of the toolbox: libinjection via FFI for SQLi/XSS, always-on validation gates (method, path, headers, content-type) that run before any rule, native Redis-backed rate limiting, a clean JSON audit log v2, and the ability for a Karna detection to write into kong.ctx.shared so a sibling Kong plugin downstream can react without even knowing Karna is there.

And a little operator tool. We ship a small Python script, karna-rules, that pushes a JSON file of local rules or overrides onto a service through the Admin API, interactively or directly. It is the kind of thing you write for yourself and then realize everyone needs.

Why Kong Gateway

Karna is a pure Lua plugin. luarocks make, kong reload, done. No rebuild, ever. You change rules at runtime through the Admin API with no restart.

But the part people miss: because Karna runs on Kong, it sits in front of anything. You do not need a Kong-native architecture. Kong is the reverse proxy, so Karna guards whatever is already behind it, in any language, on any host, including things that are already in production. Run it DB-less with a tiny declarative config and traffic flows client -> Karna/Kong -> your app.

Why Karna is so fast?

Here is the part most launch posts leave out.

The first time we benchmarked Karna against a real baseline, it was bad. On benign traffic, Kong with Karna was roughly 6 times slower than Kong without it. We did a clean bisection on a dedicated box and the numbers were brutal: bare Kong did 2229 requests/second, Kong with Karna did 360. The engine was eating about 84% of the gateway's throughput on legitimate traffic.

A WAF that degrades good traffic that hard is not shippable. We were genuinely close to throwing the whole thing away and going back to ModSecurity.

First we got honest about the number. The 6x figure was against no WAF at all, which is not a fair fight. Measured against ModSecurity 3 doing the same CRS work on the same box, we were about 2.9x slower. Still bad, but a real target instead of a scary one.

Then we tried a lot of things that did not work, and this is the honest part:

We compiled rules into Lua closures. Neutral to slightly negative. LuaJIT was already tracing the hot path fine.
We refactored the operator dispatch. Neutral.
We tried an Aho-Corasick prefilter on the regex rules. It lost, because the per-scan overhead beat the savings on short inputs.
We built a literal prefilter and hoped it was the answer. It only gated a fraction of the rules.
We tried a per-worker LRU transform cache. It was slower than a plain Lua table.

We reverted all of them. The lesson, repeated until it stuck: on an interpreted engine, micro-optimizations are noise, and the only real lever is changing WHAT you scan, not how fast each tiny step runs.

Then it started to turn. Every win that landed was algorithmic, and there were a lot of them. The engine went from eating 84% of the gateway to comfortably outrunning every ModSecurity stack we tested. The full breakdown, with every number and the soundness gate that kept us honest, is later in this post under Why Karna is so fast.

Benchmarks: Karna beats the field

We benchmarked Karna against the three most deployed open-source WAF stacks: Apache + ModSecurity 2, nginx + ModSecurity 3, and OWASP Coraza (the Go WAF, on Caddy). Same dedicated host (Hetzner CCX, 2 cores per container), same OWASP CRS at PL1, same traffic from k6 at 20 virtual users. Every WAF returns the same HTTP status on every request (benign 200, attack 403, k6 checks_rate = 1.0), so these numbers compare throughput, not leniency.

Requests per second, higher is better:

Scenario	Apache+ModSec2	nginx+ModSec3	Coraza+Caddy	Karna
Attack blocking	852	1623	570	3326
Mixed real-world traffic	612	1270	337	1569
API with embedded attacks	190	688	184	815
Benign GET, no cache	392	1139	319	1310
Big urlencoded body (950 args)	1.9	14.4	1.8	20.1
Deeply nested JSON (depth 400)	201	220	107	228
Multipart upload (3 files)	486	580	277	541

Karna leads on the job a WAF exists for, blocking attacks, by a wide margin: more than double nginx+ModSec3 and almost 4x Apache+ModSec2. It wins almost every other scenario too, and runs 2 to 11 times faster than Coraza across the board. Full per-scenario numbers and methodology are in BENCHMARKS.md.
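
If you want to check the ratios yourself, this short Python snippet recomputes them from the table above. The numbers are copied from the table; the speedups helper is just a convenience for this example.

```python
# Requests per second: (Apache+ModSec2, nginx+ModSec3, Coraza+Caddy, Karna)
RESULTS = {
    "Attack blocking": (852, 1623, 570, 3326),
    "Mixed real-world traffic": (612, 1270, 337, 1569),
    "API with embedded attacks": (190, 688, 184, 815),
    "Benign GET, no cache": (392, 1139, 319, 1310),
    "Big urlencoded body (950 args)": (1.9, 14.4, 1.8, 20.1),
    "Deeply nested JSON (depth 400)": (201, 220, 107, 228),
    "Multipart upload (3 files)": (486, 580, 277, 541),
}
STACKS = ("Apache+ModSec2", "nginx+ModSec3", "Coraza+Caddy")


def speedups(scenario):
    """Return Karna's throughput divided by each other stack's throughput."""
    *others, karna = RESULTS[scenario]
    return {stack: karna / rps for stack, rps in zip(STACKS, others)}


for scenario in RESULTS:
    ratios = ", ".join(f"{s} {r:.2f}x" for s, r in speedups(scenario).items())
    print(f"{scenario}: {ratios}")
```

A ratio below 1.0 means Karna is slower. With these numbers that happens exactly once, on multipart uploads against nginx+ModSec3.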

OK, but why Karna is so fast?!?

Stop redoing static work (10 to 12x). Profiling was blunt: the cost was not the regexes, it was work repeated for every value of every rule. Macro detection, transform cache keys, lowercasing the keyword files on each call. We computed all of it once, at load time, and cached it per rule. Benign throughput went up 10 to 12 times. This single change is what made the rest worth doing.

Keep file uploads out of ARGS (25x on multipart). Uploaded file bytes were being merged into the ARGS map, so every ARGS-targeted rule re-scanned the whole file, megabytes at a time. ModSecurity deliberately keeps file content out of ARGS, and we did the same. On multipart uploads that was 25x, and it closed a real denial-of-service amplifier at the same time.

Skip regexes that cannot possibly match (+13 to 15%). From each CRS regex we extract a required literal substring once, at load time. Before running the regex we do a cheap string.find for that literal. If it is not present the regex cannot match, so we skip it entirely. It only gates a fraction of the rules, but the ones it gates it gates for free.
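
The gate itself is tiny. Here is a Python sketch of the idea, with the required literal supplied by hand; extracting it automatically from a regex is the hard part and is not shown. The PrefilteredRegex class and the example pattern are hypothetical, and the literal check is case-sensitive, so a case-insensitive pattern would need both sides lowercased first.

```python
import re


class PrefilteredRegex:
    """Run the regex only when a literal it requires is present in the input."""

    def __init__(self, pattern, required_literal):
        self.regex = re.compile(pattern)
        self.literal = required_literal
        self.regex_runs = 0  # counts how often the expensive path was taken

    def search(self, value):
        # Cheap substring test: without the literal the regex cannot match.
        if self.literal not in value:
            return False
        self.regex_runs += 1
        return self.regex.search(value) is not None


rule = PrefilteredRegex(r"union\s+select", required_literal="union")
for value in ["id=42", "q=trade union news", "id=1 union  select 2"]:
    print(value, "->", rule.search(value))
print("regex executed", rule.regex_runs, "times")
```

The gate is only sound if the literal really is required by every possible match, which is why it can only be applied to the subset of patterns where such a literal exists.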

The big one, RE2::Set (+102%). This was the breakthrough. Instead of running roughly 210 CRS @rx regexes one at a time in the Lua VM, we compile all of them into a single Google RE2::Set automaton in C, loaded over FFI, and ask one question per input: which of these patterns matched. One linear pass instead of hundreds of separate matches. Before betting on it we checked the thing that could have killed it: zero of the 292 @rx patterns in CRS use a PCRE-only feature like a backreference or lookaround, so RE2 can run 100% of them with identical semantics. On the worst scenario, the benign-heavy one that nearly ended the project, throughput went from 305 to 618 requests per second. More than double. And because RE2 is linear-time by construction, that entire rule class is ReDoS-immune for free. We gate it carefully: 173 of the 283 @rx rules go through the RE2::Set pass, selected by which variable namespaces the request actually populates, so we never scan a body that is not there.

Aho-Corasick for phrase matching (+18% on top). The @pm and @pmFromFile operators were looping over about 17 keyword files in Lua for every value. We moved them into a C Aho-Corasick automaton: one linear pass finds every keyword at once. The funny part is that the same technique lost when we first tried it as a prefilter in front of the regexes, because the per-call overhead beat the savings on short inputs, and won here, because matching against thousands of fixed phrases is exactly what it is built for.
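
For anyone who has not met the algorithm, here is a compact Aho-Corasick in Python. It is a teaching sketch, not the C automaton described above: it builds a trie of the phrases, adds failure links with a breadth-first pass, and then reports every phrase found in a single left-to-right scan.

```python
from collections import deque


class AhoCorasick:
    """Match many fixed phrases in one pass over the input."""

    def __init__(self, phrases):
        self.goto = [{}]   # goto[node][char] -> child node; node 0 is the root
        self.fail = [0]    # failure link: longest proper suffix that is a trie path
        self.out = [[]]    # phrases that end at this node (own plus inherited)
        for phrase in phrases:
            node = 0
            for ch in phrase:
                nxt = self.goto[node].get(ch)
                if nxt is None:
                    nxt = len(self.goto)
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[node][ch] = nxt
                node = nxt
            self.out[node].append(phrase)
        # Breadth-first pass: shallower nodes are finished before deeper ones.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                fallback = self.fail[node]
                while fallback and ch not in self.goto[fallback]:
                    fallback = self.fail[fallback]
                self.fail[child] = self.goto[fallback].get(ch, 0)
                self.out[child] = self.out[child] + self.out[self.fail[child]]
                queue.append(child)

    def find_all(self, text):
        """Return the set of phrases that occur anywhere in text."""
        node, found = 0, set()
        for ch in text:
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            found.update(self.out[node])
        return found


matcher = AhoCorasick(["select", "union", "ion", "sleep("])
print(sorted(matcher.find_all("id=1 union select sleep(5)")))
```

The scan cost depends on the length of the input, not on the number of phrases, which is why it suits large keyword lists and does poorly as a gate in front of a few regexes on short inputs.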

Two smaller wins. A PCRE-JIT prescan for strict CRLF framing in the multipart parser bought another 7.6% on uploads, and a nested transform cache that removes a per-call string concatenation added 2 to 4%.

Choosing what to scan, without losing coverage. Speed also came from not scanning what cannot matter. Behind an API gateway, a slice of CRS exists to enforce protocol hygiene that Kong and nginx already enforce upstream. We prune 54 of those redundant protocol-enforcement rules. They never fire on real traffic, and removing them does not change the CRS regression suite by a single test. Body-scanning rules are skipped when there is no body. Multipart scanning is skipped when the request is not multipart. The ARGS copy is skipped on the fast path. None of this is sampling or guessing: every request still gets the full, correct rule set for its actual shape.

A guard against the thing we were running from. CRS ships a handful of regexes that backtrack catastrophically. We cap PCRE backtracking per match with lua_regex_match_limit, and a safe_re_match wrapper turns a blown cap into a fail-closed no-match for negative operators and a plain no-match for positive ones. A pathological pattern can no longer pin a worker the way it did under ModSecurity.

The rule we never broke. Every one of these had to produce a byte-for-byte identical result on the CRS regression suite before it shipped. If an optimization changed even one test, it did not ship. That is why all of them are on by default: they are faster, and provably not weaker. The dead ends we mentioned earlier (rule-to-closure compilation, an operator-dispatch refactor, Aho-Corasick as a regex prefilter, the literal prefilter as a standalone answer, a per-worker LRU transform cache) all failed that bar or simply did not move the needle, so we reverted them.

Why Karna is slower on multipart (and why we are fine with it)

You probably noticed the one row where Karna does not win: multipart uploads. Karna does 541 requests/second, nginx+ModSec3 does 580. We are about 7% behind (we still beat Apache by 11% and Coraza by 2x, but nginx wins this one).

We could close that gap by relaxing our multipart parser. We will not, and here is why.

Remember that bypass research I mentioned? Our multipart parser is the strictest in the field on purpose, because we are the ones who documented how the loose ones get bypassed. It rejects the things lenient parsers wave through:

duplicate name or filename parameters (validator reads one, backend reads the other),
the filename*= RFC 5987 form (validators do not URL-decode it, backends do),
unquoted parameter values,
bare \n instead of strict \r\n line framing,
a missing closing --boundary--.
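
The checks in the list above can be pictured with a small Python sketch. It is not Karna's parser: the two helpers are hypothetical, they only look at a part's header block and at the tail of the body, and treating any bytes after the closing boundary as an error is an assumption of this example.

```python
import re

# One Content-Disposition parameter: ;name="quoted" or ;name=unquoted
PARAM = re.compile(r';\s*([A-Za-z0-9_*-]+)=("(?:[^"\\]|\\.)*"|[^;]*)')


def header_block_errors(header_block):
    """Return the strictness violations found in one part's header block."""
    errors = []
    if re.search(r"(?<!\r)\n", header_block):
        errors.append("bare LF line ending")
    for line in header_block.split("\n"):
        line = line.rstrip("\r")
        if not line.lower().startswith("content-disposition:"):
            continue
        seen = set()
        for name, value in PARAM.findall(line):
            name = name.lower()
            if name.endswith("*"):
                errors.append(f"extended parameter {name}")
                continue
            if name in seen:
                errors.append(f"duplicate parameter {name}")
            seen.add(name)
            if not value.startswith('"'):
                errors.append(f"unquoted value for {name}")
    return errors


def has_closing_boundary(body, boundary):
    """True if the body ends with the closing delimiter --boundary--."""
    return body.rstrip(b"\r\n").endswith(b"--" + boundary.encode("ascii") + b"--")


evil = 'Content-Disposition: form-data; name="f"; filename="a.png"; filename="a.php"\r\n'
print(header_block_errors(evil))
```

Every one of these cases is a place where two parsers can read the same bytes differently, so the strict answer is to reject the request rather than pick an interpretation.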

nginx+ModSec3 wins multipart because ModSecurity 3 has a native C++ body parser and it is more permissive. We pay a few percent to be the parser that sees the same file the backend will see. For a security product, that is not a hard trade. If you are curious about exactly what those bypasses look like, the research post walks through them against real commercial and open-source WAFs.

Try it

Karna is open source under Apache-2.0: github.com/sicuranext/karna.

The repo ships a Docker setup that bakes Kong, the OWASP CoreRuleSet, libinjection and Karna into one image. Point it at whatever you already run with a DB-less config:

# docker/kong.yml
_format_version: "3.0"
services:
  - name: my-app
    url: http://my-backend:8080      # your existing app, any language
    routes:
      - name: my-app
        paths: ["/"]
    plugins:
      - name: karna
        config:
          engine_blocking_mode: false   # start in detection-only
          paranoia_level: 1
          auditlog_enabled: true

git clone https://github.com/sicuranext/karna.git
cd karna

# build the self-contained image, then bring it up DB-less in front of your app
docker build -f docker/Dockerfile -t karna .
docker compose -f docker/docker-compose.prod.yml up -d

Start in detection-only, watch the JSON audit log, then flip engine_blocking_mode to true when the signal is clean. Need to push some local rules or CRS overrides? The karna-rules script in scripts/ does that interactively. The README has the full configuration reference.

Come say hi

Karna is young and we want it to be useful to people outside Sicuranext too. If you run it, break it, or have an idea, open an issue on the repo.

Happy filtering.