Verified Performance
Two formats for different needs: TQN for LLM token efficiency, TBF for binary size optimization.
Token Efficiency
Best for LLM inputs • 1,000 records
Binary Size
Best for storage & network • 1,000 records
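Token counts verified with tiktoken cl100k_base (the GPT-4/GPT-3.5 Turbo encoding). GPT-4o uses o200k_base and Claude models use their own tokenizer, so counts for those models will differ somewhat.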
Cost based on GPT-4o/Claude Sonnet pricing ($3/1M input tokens, March 2026)
Choose the Right Format
TQN (Text)
Tauq Notation
- 49% fewer tokens than JSON
- Human-readable and editable
- Best for LLM prompts and responses
- Schema reuse with !def/!use
- Streaming parser support
TBF (Binary)
Tauq Binary Format
- Up to 83% smaller than JSON
- CLI generic: ~44-56% reduction; schema-aware: ~83% reduction
- Columnar encoding for analytics
- Best for storage and network transfer
- Apache Iceberg integration
Benchmark Methodology
Tokenizer
- tiktoken cl100k_base
- Used by GPT-4 and GPT-3.5 Turbo (GPT-4o uses o200k_base; Claude uses its own tokenizer)
- Industry-standard BPE
- Reproducible results
Fair Comparison
- TOON via toon-python
- Proper spec-compliant encoding
- No artificial handicaps
- Same source data
Reproducibility
- Docker container
- Seeded random data
- Open source benchmark suite
- 408 automated tests
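The measurement described under Tokenizer amounts to encoding each serialized payload and counting the resulting tokens. Below is a minimal sketch of that step, assuming the third-party `tiktoken` package is installed. The helper names `compare_formats` and `tiktoken_counter` are hypothetical and are not part of the Tauq benchmark suite.

```python
from typing import Callable, Dict


def compare_formats(payloads: Dict[str, str],
                    count_tokens: Callable[[str], int],
                    baseline: str = "JSON") -> Dict[str, dict]:
    """Count tokens per format and report the change relative to the baseline."""
    counts = {name: count_tokens(text) for name, text in payloads.items()}
    base = counts[baseline]
    return {
        name: {"tokens": n, "vs_baseline_pct": round((n - base) / base * 100, 1)}
        for name, n in counts.items()
    }


def tiktoken_counter(encoding_name: str = "cl100k_base") -> Callable[[str], int]:
    """Build a token counter backed by tiktoken.

    The import is lazy so compare_formats can also be used with any other
    tokenizer that exposes a str -> int counting function.
    """
    import tiktoken  # third-party: pip install tiktoken

    enc = tiktoken.get_encoding(encoding_name)
    return lambda text: len(enc.encode(text))
```

A call such as `compare_formats({"JSON": json_text, "TQN": tqn_text}, tiktoken_counter())` would then return the token count and signed percentage for each format. Passing the counter as an argument makes it easy to repeat the comparison with a different encoding.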
Token Efficiency Results
9 datasets testing different data structures. All counts are token counts verified with tiktoken cl100k_base.
| Dataset | Description | TQN vs JSON | JSON | TQN | TOON | TQN vs TOON | Winner |
|---|---|---|---|---|---|---|---|
| flat_100 | 100 user records (5 fields) | -53.8% | 2,402 | 1,109 | 1,411 | -21.4% | TQN |
| flat_1000 | 1,000 user records (5 fields) | -54.1% | 24,005 | 11,012 | 14,015 | -21.4% | TQN |
| mixed_structure | Nested objects with arrays | -41.2% | 689 | 405 | 457 | -11.4% | TQN |
| wide_records | 100 records with 10 fields each | -55.1% | 6,494 | 2,915 | 3,120 | -6.6% | TQN |
| heterogeneous | Records with varying schemas | -20.8% | 72 | 57 | 98 | -41.8% | TQN |
| timeseries | 200 timestamp/value pairs | -19.9% | 5,003 | 4,007 | 4,008 | -0.0% | TQN |
| ecommerce | Product catalog with nested data | -42.1% | 2,970 | 1,719 | 1,779 | -3.4% | TQN |
| api_response | Paginated API response | -30.0% | 1,089 | 762 | 706 | +7.9% | TOON |
| config_style | Application config object | +6.1% | 49 | 52 | 58 | -10.3% | TQN |
| TOTAL | | -48.5% | 42,773 | 22,038 | 25,652 | -14.1% | TQN |
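Against TOON, TQN wins 7 of 9 datasets, TOON wins on api_response, and timeseries is effectively tied (4,007 vs 4,008 tokens). On the tiny config_style object, plain JSON (49 tokens) is smaller than both TQN (52) and TOON (58).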
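The percentages in the table follow directly from the raw token counts. The sketch below recomputes the totals and the headline percentages from the per-dataset rows. The `ROWS` layout and the function names are illustrative only.

```python
# Token counts per dataset, copied from the table: (JSON, TQN, TOON).
ROWS = {
    "flat_100": (2402, 1109, 1411),
    "flat_1000": (24005, 11012, 14015),
    "mixed_structure": (689, 405, 457),
    "wide_records": (6494, 2915, 3120),
    "heterogeneous": (72, 57, 98),
    "timeseries": (5003, 4007, 4008),
    "ecommerce": (2970, 1719, 1779),
    "api_response": (1089, 762, 706),
    "config_style": (49, 52, 58),
}


def pct_change(new: int, old: int) -> float:
    """Signed percentage change from old to new, rounded to one decimal."""
    return round((new - old) / old * 100, 1)


def column_totals(rows: dict) -> tuple:
    """Sum each column (JSON, TQN, TOON) across all datasets."""
    return tuple(sum(col) for col in zip(*rows.values()))


json_total, tqn_total, toon_total = column_totals(ROWS)
print("TQN vs JSON:", pct_change(tqn_total, json_total))
print("TQN vs TOON:", pct_change(tqn_total, toon_total))
```

A negative value means TQN used fewer tokens than the format it is compared against. A positive value, as for config_style against JSON, means it used more.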
Binary Size Results
TBF's columnar encoding with schema-aware compression reduces size by 82-84% versus JSON on the datasets below.
| Dataset | Description | JSON | TQN | TQN vs JSON | TBF | TBF vs JSON |
|---|---|---|---|---|---|---|
| flat_1000 | 1,000 user records | 92 KB | 43 KB | -53% | 16 KB | -83% |
| ecommerce | Product catalog | 45 KB | 22 KB | -51% | 8 KB | -82% |
| timeseries | 200 data points | 38 KB | 28 KB | -26% | 6 KB | -84% |
Why TQN Beats TOON on Tokens
1. Space Delimiters Beat Commas
Spaces often merge with adjacent tokens during BPE, while commas create separate tokens.
TOON (comma-delimited):
1,User1,user1@example.com,21,false
TQN (space-delimited):
1 User1 user1@example.com 21 false
2. Simpler Schema Syntax
TQN's !def is more compact than TOON's count-prefixed headers.
TOON:
[1000]{id,name,email,age,active}:
TQN:
!def Record id name email age active
3. No Count Prefix Required
TOON requires knowing array length upfront. TQN supports true streaming.
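The three differences above (delimiter, header syntax, count prefix) can be sketched as two small writers that mirror the example lines on this page. This is an illustrative sketch, not the Tauq or TOON reference encoder. It handles only simple bareword values and omits the quoting and escaping that values containing spaces, commas, or quotes would need. The function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List


def _cell(value) -> str:
    """Render a scalar as a bareword; booleans are lowercased as in the examples."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def toon_table(records: List[Dict]) -> str:
    """TOON-style table: the header carries the row count, so the whole
    list must be available before the first line can be written."""
    fields = list(records[0])
    header = f"[{len(records)}]{{{','.join(fields)}}}:"
    rows = [",".join(_cell(r[f]) for f in fields) for r in records]
    return "\n".join([header, *rows])


def tqn_stream(records: Iterable[Dict], name: str = "Record") -> Iterator[str]:
    """TQN-style output: emit a !def line from the first record, then one
    space-delimited row per record as it arrives (no count needed)."""
    fields = None
    for record in records:
        if fields is None:
            fields = list(record)
            yield f"!def {name} {' '.join(fields)}"
        yield " ".join(_cell(record[f]) for f in fields)
```

Because `tqn_stream` is a generator over any iterable, it can start writing rows from a database cursor or network stream before the final record count is known. `toon_table` needs a fully materialized list to fill in `[N]`.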
TBF Compression: Two Paths
Generic Encoding (CLI: ~44-56% reduction)
Fast, no setup: tauq build data.tqn --format tbf
Standard serde serialization without schema knowledge. Perfect for quick conversions and dynamic data.
Schema-Aware Encoding (Rust API: ~83% reduction)
Best compression with type hints: #[derive(TableEncode)]
Uses compile-time schema + columnar layout for maximum compression:
Columnar Encoding
All values of each column stored together, enabling better compression of similar values.
Adaptive Integers
Automatically selects U8/U16/U32/VarInt based on actual value ranges.
Dictionary Compression
Repeated strings stored once, referenced by index.
Schema Hints
Offset encoding: store ages 18-100 as 0-82 in a single byte.
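The four techniques above can be illustrated in a few lines each. The following is a conceptual sketch only. It does not reproduce TBF's actual wire format or the Rust API, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple


def to_columns(records: List[dict]) -> Dict[str, list]:
    """Columnar layout: transpose row-oriented records into one list per field."""
    return {field: [r[field] for r in records] for field in records[0]}


def int_width(values: List[int]) -> str:
    """Adaptive integers: pick the narrowest unsigned width that fits every value."""
    if min(values) < 0:
        raise ValueError("this sketch handles non-negative integers only")
    hi = max(values)
    for name, limit in (("U8", 0xFF), ("U16", 0xFFFF), ("U32", 0xFFFFFFFF)):
        if hi <= limit:
            return name
    return "VarInt"


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Dictionary compression: store each distinct string once and replace
    every occurrence with its index into that table."""
    index: Dict[str, int] = {}
    codes = [index.setdefault(v, len(index)) for v in values]
    return list(index), codes


def offset_encode(values: List[int], minimum: int) -> bytes:
    """Schema hint: store (value - minimum) in one byte, e.g. ages 18-100 -> 0-82.
    bytes() raises ValueError if any shifted value falls outside 0-255."""
    return bytes(v - minimum for v in values)
```

For example, `offset_encode([18, 100, 21], minimum=18)` yields the three bytes 0, 82, and 3. `dictionary_encode(["US", "DE", "US"])` yields the table `["US", "DE"]` with codes `[0, 1, 0]`.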
Format Comparison
Same data in each format (3 records):
JSON (minified)
[{"id":1,"name":"User1","email":"user1@example.com","age":21,"active":false},{"id":2,"name":"User2","email":"user2@example.com","age":22,"active":true},{"id":3,"name":"User3","email":"user3@example.com","age":23,"active":false}]
TOON (v3.0 spec)
[3]{id,name,email,age,active}:
1,User1,user1@example.com,21,false
2,User2,user2@example.com,22,true
3,User3,user3@example.com,23,false
TQN (Tauq Notation)
!def Record id name email age active
1 User1 user1@example.com 21 false
2 User2 user2@example.com 22 true
3 User3 user3@example.com 23 false
TBF (Binary)
// TBF: Binary columnar encoding
// [MAGIC][rows:3][cols:5]
// [col:id → U16][1,2,3]
// [col:name → Dict]["User1","User2","User3"]
// [col:email → Dict][...]
// [col:age → U8][21,22,23]
// [col:active → Bool][0b010]
// Total: ~35 bytes vs 228 bytes JSON
Real-World Impact
* Based on GPT-4o/Claude Sonnet ($3/1M) to Claude Opus ($5/1M) pricing, March 2026
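As a rough illustration of how token savings translate into input cost at the prices quoted above, the sketch below applies the $3/1M and $5/1M rates to the benchmark totals (42,773 JSON tokens vs 22,038 TQN tokens). The function names are hypothetical, and real bills depend on the provider's own tokenizer and current pricing.

```python
def input_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of sending `tokens` input tokens at a flat per-million price."""
    return tokens / 1_000_000 * usd_per_million


def savings_usd(baseline_tokens: int, compact_tokens: int,
                usd_per_million: float) -> float:
    """Input cost saved by sending the compact encoding instead of the baseline."""
    return (input_cost_usd(baseline_tokens, usd_per_million)
            - input_cost_usd(compact_tokens, usd_per_million))


JSON_TOKENS, TQN_TOKENS = 42_773, 22_038  # benchmark totals from the results table

for price in (3.0, 5.0):
    saved = savings_usd(JSON_TOKENS, TQN_TOKENS, price)
    print(f"${price:.0f}/1M tokens: saves ${saved:.4f} per pass over the benchmark data")
```

One pass over the benchmark data saves about $0.06 at $3/1M and about $0.10 at $5/1M. The saving scales linearly with request volume.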
Transparency Notes
- TOON wins in 1 scenario: api_response (+7.9%). Tied on timeseries.
- TQN's advantage: space delimiters, bareword values, simpler schema syntax
- TBF's advantage: columnar layout, adaptive compression, schema-aware encoding
- Fair comparison: TOON encoded via official toon-python library
Run Your Own Benchmarks
Verify these results yourself:
git clone https://github.com/epistates/tauq
cd tauq/benchmarks
docker build -t tauq-benchmark .
docker run --rm tauq-benchmark
Feature Comparison
| Feature | TQN | TBF | TOON | JSON |
|---|---|---|---|---|
| Token efficiency | -49% vs JSON | N/A (binary) | -40% vs JSON | baseline |
| Binary size | -51% vs JSON | -83% vs JSON | ~-45% vs JSON | baseline |
| Human readable | Yes | No | Yes | Yes |
| Streaming support | Native | Batch | Requires [N] | SAX parsers |
| Schema reuse | !def/!use | Type hints | Inline only | No |
| Iceberg integration | No | Yes | No | No |
| Comments | Yes (#) | No | No | No |