
ChaCha20 + Poly1305 from RFC to Running Code

April 24, 2026 · crypto · stdlib · wave-9

ChaCha20-Poly1305, built from Daniel J. Bernstein's ChaCha20 cipher and Poly1305 MAC, is by 2026 a standard AEAD in TLS 1.3 and Noise, the only AEAD in WireGuard, and a common choice for most modern secure channels. The specification is RFC 8439, and it fits on about twenty readable pages.

The appeal: no S-box lookups to make constant-time (there aren't any), none of the cache-timing exposure that table-based AES has without dedicated hardware, and a design so simple you can implement it correctly from the spec alone. That last part is the thesis of this post.

◉ The ChaCha20 quarter-round is four lines

ChaCha20 operates on a 4×4 matrix of 32-bit words. The core operation is the "quarter round," which mixes four of them:

fn chacha20_qr(state: list, a: int, b: int, c: int, d: int) {
    state[a] = (state[a] + state[b]) & 0xFFFFFFFF
    state[d] = rotl32(state[d] ^ state[a], 16)
    state[c] = (state[c] + state[d]) & 0xFFFFFFFF
    state[b] = rotl32(state[b] ^ state[c], 12)
    state[a] = (state[a] + state[b]) & 0xFFFFFFFF
    state[d] = rotl32(state[d] ^ state[a], 8)
    state[c] = (state[c] + state[d]) & 0xFFFFFFFF
    state[b] = rotl32(state[b] ^ state[c], 7)
}

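A block is 20 rounds, organised as 10 double rounds: four quarter-rounds down the columns, then four along the diagonals. After the last round, the original input state is added back word by word and the 16 words are serialised little-endian into a 64-byte keystream block. That's it for the core cipher.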

The state is seeded with four constants ("expand 32-byte k"), a 256-bit key, a 32-bit counter, and a 96-bit nonce. To encrypt, you generate keystream blocks and XOR with the plaintext. Decryption is the same function — XOR is its own inverse.
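The state layout and round structure described above can be sketched in a few dozen lines. This sketch is in Python rather than Lateralus, so it can be run independently as a cross-check. The function names mirror the ones used in this post, but the bodies are an assumption about how such a module might look, not the stdlib source.

```python
import struct

MASK32 = 0xFFFFFFFF

def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def quarter_round(s: list, a: int, b: int, c: int, d: int) -> None:
    """ChaCha20 quarter round on four words of the state, in place."""
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 7)

def chacha20_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    """One 64-byte keystream block (RFC 8439 section 2.3)."""
    # Words 0-3: constants, 4-11: key, 12: block counter, 13-15: nonce.
    init = list(struct.unpack("<4I", b"expand 32-byte k"))
    init += struct.unpack("<8I", key)
    init.append(counter & MASK32)
    init += struct.unpack("<3I", nonce)
    s = init.copy()
    for _ in range(10):  # 10 double rounds = 20 rounds
        # Column quarter-rounds.
        quarter_round(s, 0, 4, 8, 12)
        quarter_round(s, 1, 5, 9, 13)
        quarter_round(s, 2, 6, 10, 14)
        quarter_round(s, 3, 7, 11, 15)
        # Diagonal quarter-rounds.
        quarter_round(s, 0, 5, 10, 15)
        quarter_round(s, 1, 6, 11, 12)
        quarter_round(s, 2, 7, 8, 13)
        quarter_round(s, 3, 4, 9, 14)
    # Add the original state back in, then serialise little-endian.
    out = [(x + y) & MASK32 for x, y in zip(s, init)]
    return struct.pack("<16I", *out)

def chacha20_xor(key: bytes, nonce: bytes, counter: int, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    out = bytearray()
    for i in range(0, len(data), 64):
        block = chacha20_block(key, nonce, counter + i // 64)
        out += bytes(p ^ k for p, k in zip(data[i:i + 64], block))
    return bytes(out)
```

Calling `chacha20_xor` twice with the same key, nonce, and starting counter returns the original bytes, which is the "XOR is its own inverse" property in practice.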

◉ Poly1305 is one polynomial evaluation

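Poly1305 authenticates a message by evaluating a polynomial over the prime field GF(2¹³⁰ − 5), with the message split into 16-byte chunks that each become one coefficient (a high bit is appended to every chunk, so a short final chunk stays distinguishable from a zero-padded one). The "clamping" step in § 2.5.1 clears 22 fixed bits of r, which keeps partial products small enough that limb-based implementations can defer carries; that's the only trick. The whole MAC fits in one function: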

pub fn poly1305_mac(key: list, msg: list) -> list {
    let r = clamp_r(key)                 // low 128 bits, clamped
    let s = key_to_int(key, 16, 16)      // high 128 bits, masking pad
    let prime = 1361129467683753853853498429727072845819   // 2^130 - 5
    let acc = 0
    let i = 0
    while i < len(msg) {
        let block = chunk(msg, i, 16)
        let n = read_le(block) + (1 << (8 * len(block)))
        acc = ((acc + n) * r) % prime
        i = i + 16
    }
    acc = (acc + s) % (1 << 128)         // tag is (acc + s) mod 2^128
    return int_to_le(acc, 16)
}

Compare this to, say, GHASH, the MAC inside AES-GCM, which multiplies in the binary field GF(2¹²⁸) and needs either a carry-less multiplication instruction or lookup tables to run fast. Poly1305's one big-integer multiply-and-add per block is the entire construction.
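For cross-checking, here is the same MAC as a minimal Python sketch, with the clamp mask written out. It follows the pseudocode in RFC 8439 § 2.5.1 and leans on Python's arbitrary-precision integers, so it is neither constant-time nor fast. The helper name `clamp_r` matches the listing above, but its body here is an assumption about what that helper does.

```python
def clamp_r(r: int) -> int:
    """Clear the 22 bits of r that RFC 8439 requires to be zero."""
    return r & 0x0FFFFFFC0FFFFFFC0FFFFFFC0FFFFFFF

def poly1305_mac(key: bytes, msg: bytes) -> bytes:
    """Poly1305 tag of msg under a 32-byte one-time key."""
    r = clamp_r(int.from_bytes(key[:16], "little"))  # low 128 bits, clamped
    s = int.from_bytes(key[16:32], "little")         # high 128 bits, masking pad
    prime = (1 << 130) - 5
    acc = 0
    for i in range(0, len(msg), 16):
        block = msg[i:i + 16]
        # Append a 1 bit just above the chunk's top byte.
        n = int.from_bytes(block, "little") + (1 << (8 * len(block)))
        acc = ((acc + n) * r) % prime
    # The tag is the low 128 bits of acc + s.
    return ((acc + s) % (1 << 128)).to_bytes(16, "little")

# Test vector from RFC 8439 section 2.5.2.
key = bytes.fromhex(
    "85d6be7857556d337f4452fe42d506a8"
    "0103808afb0db2fd4abff6af4149f51b"
)
tag = poly1305_mac(key, b"Cryptographic Forum Research Group")
assert tag.hex() == "a8061dc1305136c6c22b8baf0c0127a9"
```

Writing the prime as `(1 << 130) - 5` instead of a decimal literal removes one place where a mistyped digit could hide.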

◉ Putting them together

AEAD is the pairing: ChaCha20 to encrypt, Poly1305 to authenticate, with a one-time Poly1305 key derived from ChaCha20's own block zero. Here's the full seal from the Wave 9 demo:

import chacha20
import poly1305

fn seal(key: list, nonce: list, plaintext: list) -> list {
    // Derive the one-time Poly1305 key from ChaCha20 block 0.
    let p1key = slice(chacha20_block(key, nonce, 0), 0, 32)
    // Encrypt plaintext starting at counter=1 (counter=0 is reserved).
    let ct    = chacha20_xor(key, nonce, 1, plaintext)
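    // MAC input per RFC 8439 §2.8 with empty AAD: ct, zero padding to a
    // 16-byte boundary, then len(aad)=0 and len(ct) as 64-bit little-endian.
    // pad16 and le64 are assumed helper names, not confirmed stdlib functions.
    let tag   = poly1305_mac(p1key, ct + pad16(ct) + le64(0) + le64(len(ct)))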
    return ct + tag
}

A dozen or so lines of glue over two pure-Lateralus stdlib modules. Open the example file to see it called end-to-end after an X25519 key exchange.
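One detail worth spelling out is what Poly1305 is actually fed. RFC 8439 § 2.8 does not MAC the bare ciphertext: the input is the associated data, zero padding to a 16-byte boundary, the ciphertext, more zero padding, and then both lengths as 64-bit little-endian integers. A minimal Python sketch of that layout follows. The names `pad16` and `aead_mac_data` are made up for illustration and are not Lateralus stdlib functions.

```python
def pad16(data: bytes) -> bytes:
    """Zero bytes that bring len(data) up to a multiple of 16 (none if aligned)."""
    return b"\x00" * (-len(data) % 16)

def aead_mac_data(aad: bytes, ct: bytes) -> bytes:
    """Poly1305 input for ChaCha20-Poly1305 per RFC 8439 section 2.8."""
    return (
        aad + pad16(aad)
        + ct + pad16(ct)
        + len(aad).to_bytes(8, "little")
        + len(ct).to_bytes(8, "little")
    )

# With no associated data, the layout is ct, padding, a zero length, len(ct).
data = aead_mac_data(b"", b"abc")
assert data == b"abc" + b"\x00" * 13 + (0).to_bytes(8, "little") + (3).to_bytes(8, "little")
```

The two length fields make the encoding unambiguous: without them, moving bytes between the associated data and the ciphertext could produce the same MAC input.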

◉ Test vectors that must pass or you have a bug

The spec gives exact test vectors. We pin them in tests/stdlib_spiral_wave_9.ltl:

When all three pass, you haven't just "tried to implement it" — you've implemented it.
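To show what pinning a vector looks like, here is a small Python sketch using the quarter-round vector from RFC 8439 § 2.1.1. It is not a copy of the Lateralus test file, and the function names are illustrative. The inputs and expected outputs are the ones the RFC publishes.

```python
def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def quarter_round(a: int, b: int, c: int, d: int) -> tuple:
    """ChaCha20 quarter round on four 32-bit words, returning the new words."""
    a = (a + b) & 0xFFFFFFFF
    d = rotl32(d ^ a, 16)
    c = (c + d) & 0xFFFFFFFF
    b = rotl32(b ^ c, 12)
    a = (a + b) & 0xFFFFFFFF
    d = rotl32(d ^ a, 8)
    c = (c + d) & 0xFFFFFFFF
    b = rotl32(b ^ c, 7)
    return a, b, c, d

def test_quarter_round_vector() -> None:
    """RFC 8439 section 2.1.1: fixed inputs, fixed expected outputs."""
    got = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
    assert got == (0xEA2A92F4, 0xCB1CF8CE, 0x4581472E, 0x5881C4BB)

test_quarter_round_vector()
```

A test like this fails loudly on a single wrong rotation constant, which is the kind of slip that otherwise still produces plausible-looking output.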

◉ "But isn't rolling crypto evil?"

Yes, if by "crypto" you mean a novel construction. No, if you mean implementing a published, standardised primitive in a language you already trust, checking it against byte-for-byte test vectors, and reading the spec rather than copy-pasting from Stack Overflow.

The thing we avoid by shipping this is an opaque dependency: a pre-built libsodium.so or cryptography.whl that your attacker can substitute at install time and you can't audit. The Lateralus stdlib files are text. They're diffable. They fit on screen. If the BLAKE2s IV table on line 8 of blake2s.ltl doesn't match the RFC, your grep will find it immediately.

◉ Performance is fine

In the tree-walking interpreter, these are too slow for bulk data. That's by design — the interpreter is the correctness baseline. The C99 backend (lateralus cgcc -O2) produces native binaries with byte-identical output, and at that point ChaCha20 is throughput-competitive with any other pure-C implementation. On compute-heavy benchmarks today the C99 backend is 30–60× faster than CPython.

◉ The surrounding cast

ChaCha20 + Poly1305 is just the AEAD. To use it in a real handshake you want:

Five files, five RFCs, and you have a readable WireGuard-shaped handshake in a language that reads like Python. That's what the Wave 9 "AEAD lane" ships.

If you're shopping for a scripting language to prototype a secure-channel protocol — and you don't want to link against OpenSSL or pull in a 40 MB cryptography wheel — this is the niche Lateralus is built for.