Verified Performance

Primary benchmark: 1,000 user records with tiktoken cl100k_base (GPT-4/Claude tokenizer).

JSON (minified)   24,005 tokens   baseline
TOON              12,002 tokens   -50.0%
Tauq              11,012 tokens   -54.1%

  • 12,993 tokens saved vs JSON (per 1,000 records)
  • 10.8% better than TOON (average across datasets)
  • $0.04 saved per 1,000 records

* All measurements verified with tiktoken cl100k_base tokenizer
** Cost based on GPT-4o/Claude 3.5 Sonnet pricing ($3 per 1M input tokens)
*** TOON encoded via toon-python library for fair comparison. Tauq v0.1.0

Benchmark Methodology

Tokenizer

  • tiktoken cl100k_base
  • Used by GPT-4, GPT-4o, and Claude
  • Industry-standard BPE tokenization
  • Reproducible with the Python tiktoken library

Fair Comparison

  • TOON via toon-python library
  • Proper [N]{fields}: headers
  • Comma-delimited values per spec
  • No artificial handicaps

Complete Results

We tested across 10 different dataset types to ensure a comprehensive comparison. All token counts were verified with tiktoken cl100k_base. Results vary by dataset: flat structured data shows the highest efficiency gains (up to 54% vs JSON), while nested and irregular data show more modest improvements (aggregate: 44% vs JSON, 11% vs TOON).

Dataset          Description                        Tauq vs JSON    JSON    Tauq    TOON   Tauq vs TOON   Winner
flat_100         100 user records (5 fields)             -53.8%    2,402   1,109   1,199        -7.5%    Tauq
flat_1000        1,000 user records (5 fields)           -54.1%   24,005  11,012  12,002        -8.2%    Tauq
mixed_structure  Nested objects with arrays              -41.2%      689     405     457       -11.4%    Tauq
deeply_nested    10 orgs with deep nesting               -23.3%    6,990   5,359   7,630       -29.8%    Tauq
wide_records     100 records with 15 fields each         -53.1%   10,494   4,925   4,891        +0.7%    TOON
heterogeneous    100 records with varying schemas        -18.5%    1,594   1,299   1,765       -26.4%    Tauq
timeseries       200 timestamp/value pairs               -19.9%    5,003   4,007   3,799        +5.5%    TOON
ecommerce        Product catalog with nested data        -42.1%    2,970   1,719   1,879        -8.5%    Tauq
api_response     Paginated API response                  -30.0%    1,089     762     706        +7.9%    TOON
config_style     Realistic application config            +15.6%      411     475     502        -5.4%    Tauq
TOTAL                                                    -44.2%   55,647  31,072  34,830       -10.8%    Tauq

Tauq wins 7 of 10 dataset types. TOON wins on wide_records, timeseries, and api_response.

Performance by Category

  • Flat tabular data (TOON's primary use case): Tauq -8.2% vs TOON
  • Heterogeneous data (100 records with varying schemas): Tauq -26.4% vs TOON
  • Deeply nested data (10 orgs with deep nesting): Tauq -29.8% vs TOON

Primary Benchmark: 1,000 Records

Results Summary

Format   Tokens   vs JSON    vs TOON    Cost Savings*
JSON     24,005   baseline    +100%     $0.00
TOON     12,002   -50.0%     baseline   $0.036
Tauq     11,012   -54.1%      -8.2%     $0.039

* Cost based on GPT-4o/Claude Sonnet 4 pricing ($3 per 1M input tokens, Nov 2025)

** TOON encoded via toon-python library for fair comparison. Tauq v0.1.0

Why Tauq Beats TOON

1. Space Delimiters Beat Commas

In cl100k_base, spaces are often absorbed into preceding tokens, while commas create separate tokens.

TOON (comma-delimited)
1,User1,user1@example.com,21,false
Tauq (space-delimited)
1 User1 user1@example.com 21 false

2. Simpler Schema Syntax

Tauq's !def is more compact than TOON's count-prefixed headers.

TOON: Inline header with count
[1000]{id,name,email,age,active}:
Tauq: Clean schema definition
!def Record id name email age active
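The header/row layout above can be sketched as a tiny encoder. The function name `encode_records` and the boolean handling are assumptions for illustration, not the official Tauq library API:

```python
# Sketch of a Tauq-style tabular encoder, based only on the layout shown
# above: one "!def <name> <fields...>" header, then space-delimited rows.
def encode_records(name, fields, records):
    lines = ["!def {} {}".format(name, " ".join(fields))]
    for rec in records:
        lines.append(" ".join(
            str(rec[f]).lower() if isinstance(rec[f], bool) else str(rec[f])
            for f in fields))
    return "\n".join(lines)

print(encode_records("Record", ["id", "name", "active"],
                     [{"id": 1, "name": "User1", "active": False}]))
# !def Record id name active
# 1 User1 false
```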

3. Bareword-Friendly Values

Common patterns like emails and paths don't need quotes in Tauq.

JSON: quote characters add extra tokens around the value
"user@example.com"
Tauq: bareword, no quoting overhead
user@example.com
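Tauq's exact quoting grammar isn't shown here, so the sketch below assumes one plausible rule: emit a bareword unless the value contains whitespace or a quote character. The function name is hypothetical:

```python
import re

# Hypothetical quoting rule for a Tauq-style emitter: values with
# whitespace or quotes get quoted and escaped; everything else is a
# bareword. The real Tauq grammar may differ.
def emit_value(value: str) -> str:
    if re.search(r'[\s"]', value):
        return '"' + value.replace('"', '\\"') + '"'
    return value

print(emit_value("user@example.com"))   # bareword, no quotes
print(emit_value("Jane Doe"))           # contains a space -> quoted
```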

4. No Count Prefix Required

TOON requires knowing array length upfront. Tauq supports true streaming.
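The streaming difference can be made concrete with a generator. This is a sketch under the layout shown earlier; the function name is illustrative, not an official API:

```python
# Because Tauq rows carry no [N] count prefix, an encoder can emit the
# header and then rows one at a time, without ever knowing the total.
# A TOON encoder must know N before writing its [N]{...}: header.
import itertools

def stream_rows(fields, records):
    yield "!def Record " + " ".join(fields)
    for rec in records:
        yield " ".join(str(rec[f]) for f in fields)

# An unbounded source: length is never computed.
source = ({"id": i, "name": f"User{i}"} for i in itertools.count(1))
rows = stream_rows(["id", "name"], source)
print(next(rows))  # !def Record id name
print(next(rows))  # 1 User1
```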

Real-World Impact

  • $39-$65 saved per 1M records*
  • 12,993 fewer tokens per 1K records
  • 54% reduction vs JSON
  • 10.8% more efficient than TOON

* Based on GPT-4o/Claude Sonnet ($3/1M) to Claude Opus 4.5 ($5/1M) pricing as of Nov 2025

Sample Output Comparison

Here's how the same data looks in each format (first 3 records of flat_100):

JSON (minified)

[{"id":1,"name":"User1","email":"user1@example.com","age":21,"active":false},{"id":2,"name":"User2","email":"user2@example.com","age":22,"active":true},{"id":3,"name":"User3","email":"user3@example.com","age":23,"active":false}]
~72 tokens for 3 records

TOON (v3.0 spec)

[3]{id,name,email,age,active}:
  1,User1,user1@example.com,21,false
  2,User2,user2@example.com,22,true
  3,User3,user3@example.com,23,false
~42 tokens for 3 records

Tauq

!def Record id name email age active
1 User1 user1@example.com 21 false
2 User2 user2@example.com 22 true
3 User3 user3@example.com 23 false
~33 tokens for 3 records
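Decoding the Tauq sample above is equally simple. This is a minimal reader sketch; the int/bool coercion is a simplifying assumption, not the official decoding rules:

```python
# Parse the !def header, then split space-delimited rows into dicts.
SAMPLE = """!def Record id name email age active
1 User1 user1@example.com 21 false
2 User2 user2@example.com 22 true"""

def coerce(tok):
    if tok in ("true", "false"):
        return tok == "true"
    try:
        return int(tok)
    except ValueError:
        return tok

lines = SAMPLE.splitlines()
fields = lines[0].split()[2:]   # drop "!def" and the schema name
records = [dict(zip(fields, map(coerce, line.split()))) for line in lines[1:]]
print(records[0])
```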

Transparency Notes

  • TOON wins in 3 scenarios: wide_records (+0.7%), timeseries (+5.5%), and api_response (+7.9%) where TOON's compact inline syntax helps.
  • Tauq's advantage comes from: space delimiters (absorbed into tokens), bareword emails/paths, and simpler schema syntax.
  • TOON encoded fairly: We use the official toon-python library, not a strawman implementation.
  • Reproducible: Run docker build -t tauq-benchmark . && docker run --rm tauq-benchmark in the benchmarks directory to verify all numbers.

Run Your Own Benchmarks

Verify these results yourself using our benchmark suite:

git clone https://github.com/epistates/tauq
cd tauq/benchmarks
docker build -t tauq-benchmark .
docker run --rm tauq-benchmark

The benchmark generates all test datasets, encodes them in JSON/TOON/Tauq, and counts tokens with tiktoken cl100k_base.

Feature Comparison

Feature                        Tauq                      TOON                  JSON
Token efficiency (flat data)   -54% vs JSON              -50% vs JSON          baseline
Streaming support              Native (no count prefix)  Requires [N] prefix   SAX/streaming parsers
Schema reuse across document   Yes (!def/!use)           No (inline only)      No
Comments                       Yes (#)                   No                    No
File imports                   Yes (!import)             No                    No
Query language                 Yes (TQQ)                 No                    jq (external)
Spec maturity                  1.0                       v3.0                  RFC 8259