Verified Performance

Two formats for different needs: TQN for LLM token efficiency, TBF for binary size optimization.

Token Efficiency

Best for LLM inputs • 1,000 records

  • JSON (minified): 24,005 tokens (baseline)
  • TOON: 14,015 tokens (-41.6%)
  • Tauq (TQN): 11,012 tokens (-54.1%, best)

49% fewer tokens vs JSON across the full benchmark suite.

Binary Size

Best for storage & network • 1,000 records

  • JSON (minified): 92 KB (baseline)
  • Tauq (TQN): 43 KB (-53%)
  • Tauq (TBF): 16 KB (-83%, best)

83% smaller than JSON.
  • 12,993 tokens saved vs JSON (per 1K records)
  • 14.1% better than TOON
  • $0.04 saved per 1K records
  • 408 tests passing

Token counts verified with tiktoken cl100k_base (GPT-4-family BPE, used here as a proxy for modern LLM tokenizers)
Cost based on $3/1M input-token pricing (GPT-4o/Claude Sonnet class, March 2026)

Choose the Right Format

TQN (Text)

Tauq Notation

  • 49% fewer tokens than JSON
  • Human-readable and editable
  • Best for LLM prompts and responses
  • Schema reuse with !def/!use
  • Streaming parser support

TBF (Binary)

Tauq Binary Format

  • Up to 83% smaller than JSON
  • Generic CLI encoding: ~44-56% reduction; schema-aware: ~83%
  • Columnar encoding for analytics
  • Best for storage and network transfer
  • Apache Iceberg integration

Benchmark Methodology

Tokenizer

  • tiktoken cl100k_base
  • Industry-standard, openly available BPE
  • Common proxy for GPT/Claude-class tokenizers
  • Reproducible results

Fair Comparison

  • TOON encoded via the official toon-python library
  • Spec-compliant encoding
  • No artificial handicaps
  • Same source data for every format

Reproducibility

  • Docker container
  • Seeded random data
  • Open-source benchmark suite
  • 408 automated tests

Token Efficiency Results

9 datasets testing different data structures. All counts verified with tiktoken cl100k_base.

| Dataset | Description | JSON | TQN | TOON | TQN vs JSON | TQN vs TOON | Winner |
|---|---|---|---|---|---|---|---|
| flat_100 | 100 user records (5 fields) | 2,402 | 1,109 | 1,411 | -53.8% | -21.4% | TQN |
| flat_1000 | 1,000 user records (5 fields) | 24,005 | 11,012 | 14,015 | -54.1% | -21.4% | TQN |
| mixed_structure | Nested objects with arrays | 689 | 405 | 457 | -41.2% | -11.4% | TQN |
| wide_records | 100 records with 10 fields each | 6,494 | 2,915 | 3,120 | -55.1% | -6.6% | TQN |
| heterogeneous | Records with varying schemas | 72 | 57 | 98 | -20.8% | -41.8% | TQN |
| timeseries | 200 timestamp/value pairs | 5,003 | 4,007 | 4,008 | -19.9% | 0.0% | Tie |
| ecommerce | Product catalog with nested data | 2,970 | 1,719 | 1,779 | -42.1% | -3.4% | TQN |
| api_response | Paginated API response | 1,089 | 762 | 706 | -30.0% | +7.9% | TOON |
| config_style | Application config object | 49 | 52 | 58 | +6.1% | -10.3% | TQN |
| TOTAL | | 42,773 | 22,038 | 25,652 | -48.5% | -14.1% | TQN |

TQN wins 7 of 9 datasets. TOON wins on api_response. Tied on timeseries.

Binary Size Results

TBF's columnar encoding with schema-aware compression achieves dramatic size reductions.

| Dataset | Description | JSON | TQN | TQN vs JSON | TBF | TBF vs JSON |
|---|---|---|---|---|---|---|
| flat_1000 | 1,000 user records | 92 KB | 43 KB | -53% | 16 KB | -83% |
| ecommerce | Product catalog | 45 KB | 22 KB | -51% | 8 KB | -82% |
| timeseries | 200 data points | 38 KB | 28 KB | -26% | 6 KB | -84% |

Why TQN Beats TOON on Tokens

1. Space Delimiters Beat Commas

Spaces often merge with adjacent tokens during BPE, while commas create separate tokens.

TOON (comma-delimited)
1,User1,user1@example.com,21,false
TQN (space-delimited)
1 User1 user1@example.com 21 false

2. Simpler Schema Syntax

TQN's !def is more compact than TOON's count-prefixed headers.

TOON: Inline header with count
[1000]{id,name,email,age,active}:
TQN: Clean schema definition
!def Record id name email age active

3. No Count Prefix Required

TOON requires knowing array length upfront. TQN supports true streaming.
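
The streaming claim can be illustrated with a short parser sketch. This is a hypothetical helper, not the official Tauq API: because TQN rows carry no count prefix, each line can be decoded the moment it arrives.

```python
# Minimal streaming TQN table parser (illustrative sketch only).
# Yields one record per line as soon as the line is available --
# no up-front row count needed, unlike TOON's [N] header.
def parse_tqn_stream(lines):
    fields = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):   # TQN allows # comments
            continue
        if line.startswith("!def"):
            # "!def Record id name email age active" -> field names
            fields = line.split()[2:]
            continue
        yield dict(zip(fields, line.split()))  # space-delimited values

sample = """\
!def Record id name email age active
1 User1 user1@example.com 21 false
2 User2 user2@example.com 22 true
"""
records = list(parse_tqn_stream(sample.splitlines()))
print(records[0]["email"])  # → user1@example.com
```

Because the generator yields per line, the same code works unchanged on a socket or file handle that produces lines incrementally.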

TBF Compression: Two Paths

Generic Encoding (CLI: ~44-56% reduction)

Fast, no setup: tauq build data.tqn --format tbf

Standard serde serialization without schema knowledge. Perfect for quick conversions and dynamic data.

Schema-Aware Encoding (Rust API: ~83% reduction)

Best compression with type hints: #[derive(TableEncode)]

Uses compile-time schema + columnar layout for maximum compression:

Columnar Encoding

All values of each column stored together, enabling better compression of similar values.

Adaptive Integers

Automatically selects U8/U16/U32/VarInt based on actual value ranges.

Dictionary Compression

Repeated strings stored once, referenced by index.

Schema Hints

Offset encoding: store ages 18-100 as 0-82 in a single byte.
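
Two of these tricks can be sketched in a few lines of Python. This is illustrative only; the actual TBF wire format is not specified on this page.

```python
# TBF-style column tricks (sketch, not the real TBF encoding):
# offset encoding for bounded integers, dictionary encoding for
# repeated strings.
def encode_offset_column(values, lo):
    """Store each value as (value - lo), one unsigned byte per row."""
    return bytes(v - lo for v in values)

def decode_offset_column(data, lo):
    return [b + lo for b in data]

def encode_dict_column(values):
    """Repeated strings stored once; each row becomes a small index."""
    table = sorted(set(values))
    lookup = {s: i for i, s in enumerate(table)}
    return table, bytes(lookup[v] for v in values)

ages = [21, 34, 18, 100, 34]
packed = encode_offset_column(ages, lo=18)   # 5 bytes for 5 ages
table, idx = encode_dict_column(["User1", "User2", "User1", "User1", "User2"])
print(len(packed), table, list(idx))
# → 5 ['User1', 'User2'] [0, 1, 0, 0, 1]
```

A real columnar encoder would also pick the integer width (U8/U16/U32/VarInt) from the observed value range, as described above.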

Format Comparison

Same data in each format (3 records):

JSON (minified)

[{"id":1,"name":"User1","email":"user1@example.com","age":21,"active":false},{"id":2,"name":"User2","email":"user2@example.com","age":22,"active":true},{"id":3,"name":"User3","email":"user3@example.com","age":23,"active":false}]
~72 tokens • 180 bytes

TOON (v3.0 spec)

[3]{id,name,email,age,active}:
  1,User1,user1@example.com,21,false
  2,User2,user2@example.com,22,true
  3,User3,user3@example.com,23,false
~42 tokens • 140 bytes

TQN (Tauq Notation)

!def Record id name email age active
1 User1 user1@example.com 21 false
2 User2 user2@example.com 22 true
3 User3 user3@example.com 23 false
~33 tokens • 130 bytes

TBF (Binary)

// TBF: Binary columnar encoding
// [MAGIC][rows:3][cols:5]
// [col:id → U16][1,2,3]
// [col:name → Dict]["User1","User2","User3"]
// [col:email → Dict][...]
// [col:age → U8][21,22,23]
// [col:active → Bool][0b101]
// Total: ~35 bytes vs 180 bytes JSON
N/A tokens • ~35 bytes

Real-World Impact

  • $39-$65 saved per 1M records*
  • 12,993 tokens saved per 1K records
  • 83% binary size reduction
  • Iceberg: data lake ready

* Based on GPT-4o/Claude Sonnet ($3/1M) to Claude Opus ($5/1M) pricing, March 2026
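
The quoted savings follow directly from the page's stated figures, as a quick arithmetic check shows:

```python
# Back-of-envelope check of the savings quoted above, using the page's
# stated rates: 12,993 tokens saved per 1K records, $3-$5 per 1M input
# tokens.
TOKENS_SAVED_PER_1K_RECORDS = 12_993

per_1k = TOKENS_SAVED_PER_1K_RECORDS * 3 / 1_000_000           # at $3/1M
per_1m_low = TOKENS_SAVED_PER_1K_RECORDS * 1_000 * 3 / 1_000_000
per_1m_high = TOKENS_SAVED_PER_1K_RECORDS * 1_000 * 5 / 1_000_000

print(f"${per_1k:.2f} per 1K records, ${per_1m_low:.0f}-${per_1m_high:.0f} per 1M")
# → $0.04 per 1K records, $39-$65 per 1M
```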

Transparency Notes

  • TOON wins in 1 scenario: api_response (+7.9%). Tied on timeseries.
  • TQN's advantage: space delimiters, bareword values, simpler schema syntax
  • TBF's advantage: columnar layout, adaptive compression, schema-aware encoding
  • Fair comparison: TOON encoded via official toon-python library

Run Your Own Benchmarks

Verify these results yourself:

git clone https://github.com/epistates/tauq
cd tauq/benchmarks
docker build -t tauq-benchmark .
docker run --rm tauq-benchmark

Feature Comparison

| Feature | TQN | TBF | TOON | JSON |
|---|---|---|---|---|
| Token efficiency | -49% vs JSON | N/A (binary) | -40% vs JSON | baseline |
| Binary size | -51% vs JSON | -83% vs JSON | ~-45% vs JSON | baseline |
| Human readable | Yes | No | Yes | Yes |
| Streaming support | Native | Batch | Requires [N] count | SAX parsers |
| Schema reuse | !def/!use | Type hints | Inline only | No |
| Iceberg integration | No | Yes | No | No |
| Comments | Yes (#) | No | No | No |