gni-compression is on npm — What a month of building a domain-adaptive LLM compressor taught me

Seven articles ago I shipped a serialization layer that recovered 1M+ messages losslessly. Today the package is on npm and the compression numbers are real. Here's where I landed.

What shipped

gni-compression is a domain-adaptive lossless compression package for LLM conversation data. It's a Rust native binary (via napi-rs) with a thin JS wrapper. Two functions:

    const { compress, decompress } = require('gni-compression')

    const compressed = await compress(Buffer.from(longContext))
    const restored = await decompress(compressed) // lossless, verified

No warmup. No session state. The domain knowledge is baked into a pre-trained dictionary (gcdict.bin) bundled with the package — trained on real LLM conversation corpora.

The numbers

Benchmarked against brotli-6 across five public corpora (50 messages each, lossless round-trip verified):

    Corpus          GN ratio   Savings   brotli-6 ratio
    WildChat        4.94x      79.8%     ~2.1x
    ShareGPT        8.65x      88.4%     ~2.0x
    LMSYS           10.38x     90.4%     ~2.1x
    Ubuntu IRC      8.40x      88.1%     ~1.2x
    Claude convos   12.40x     91.9%     ~1.9x

(Savings here is just 1 - 1/ratio.)

Ubuntu IRC is the surprising one. Messages average 67 bytes — too short for brotli to do much (1.2x). GN gets 8.4x because IRC vocabulary is extremely consistent. Short repetitive messages are where a domain dictionary wins hardest.

Why the numbers are what they are

The architecture splits input into separate token-ID and literal streams before compression. Token IDs are compact integers referencing the pre-trained vocabulary. Literals are the residual bytes that didn't match anything in the dictionary. Each stream compresses independently with different characteristics: the tok stream is tiny (integers, high redundancy), and the lit stream is whatever didn't compress semantically — it gets deflate with the GCdict applied.

When I swept minimum phrase length I found the vocabulary isn't a smooth distribution — it's two clusters with a gap:

· minLen 4→5: token count drops 68% (short filler tokens)
· minLen 5–9: flat, essentially nothing lives here
· minLen 10+: another 84% drop (long phrase tokens)

This means compression cuts filler preferentially. That's probably why we see a small, consistent downstream quality improvement when feeding compressed context back to models — the signal-to-noise ratio improves.

What it took to get here

Phase 1 (article 1) was a serialization layer; it caught a CRC32 bug in our own validation before it hit anyone. Getting from that to a published package with real compression ratios took:

· figuring out why the pure JS engine lost to brotli on every corpus (it does — the Rust GCdict pipeline is what actually wins),
· solving the round-trip problem (the raw split format has no inverse without the original buffer — I had to rebuild around the interleaved format), and
· training a dictionary that generalizes across corpora without overfitting any single one.

The version history on npm reflects that — 3.x was the interleaved pipeline, 4.x settled the API.

Why I built it

I'm building NN Dash, a persistent AI agent scaffold that routes across Claude, GPT, and local Ollama. The goal is to make a long-running AI relationship essentially free. GN is what makes multi-thousand-message context sessions viable without the token bill killing it. The compression work went into an NLNet grant application, and the algorithm is solid enough to write up formally.
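To make that session use case concrete, here's a minimal sketch of what archiving a long conversation through the compressor could look like. Everything except compress/decompress (the store object, the session shape, the helper names) is a hypothetical stand-in, not part of the package:

    // Sketch only: `store` and the session shape are hypothetical stand-ins;
    // the only gni-compression API assumed is compress/decompress.
    const { compress, decompress } = require('gni-compression')

    async function archiveSession(store, sessionId, messages) {
      // Serialize the conversation however the scaffold already does,
      // then compress the whole blob in one call (no warmup, no session state).
      const raw = Buffer.from(JSON.stringify(messages))
      const packed = await compress(raw)
      await store.put(sessionId, packed)
      return packed.length / raw.length // fraction of original size kept
    }

    async function restoreSession(store, sessionId) {
      const packed = await store.get(sessionId)
      const raw = await decompress(packed) // lossless, byte-identical
      return JSON.parse(raw.toString())
    }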
Use it

    npm install gni-compression

    const { compress, decompress } = require('gni-compression')

    const compressed = await compress(Buffer.from(longContext))
    const restored = await decompress(compressed)

Source: github.com/atomsrkull/glasik-core (MIT)

Feedback on the numbers, methodology, or use cases welcome.
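If you want to check the ratios against your own data, the round-trip methodology is easy to reproduce. A minimal sketch, assuming you load your own array of message strings; only compress and decompress come from the package:

    // Benchmark sketch: `messages` is whatever corpus you load yourself;
    // only compress/decompress come from gni-compression.
    const { compress, decompress } = require('gni-compression')

    async function measure(messages) {
      let rawBytes = 0
      let packedBytes = 0
      for (const msg of messages) {
        const raw = Buffer.from(msg)
        const packed = await compress(raw)
        const restored = await decompress(packed)
        if (!raw.equals(restored)) throw new Error('round trip failed') // must be lossless
        rawBytes += raw.length
        packedBytes += packed.length
      }
      const ratio = rawBytes / packedBytes
      return { ratio, savings: 1 - 1 / ratio }
    }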
