TBF Overview

Understanding the format structure, encoding types, and wire format details of Tauq Binary Format.

Format Structure

TBF files consist of a fixed header followed by columnar data:

[MAGIC:4][VERSION:1][ROW_COUNT:varint][COL_COUNT:varint][COL_1][COL_2]...[COL_N]
Magic Header: 0x54 0x42 0x46 0x01 ("TBF" + version)
Row Count: Variable-length integer: Number of records
Column Count: Variable-length integer: Number of fields per record
Column Data: Each column encoded sequentially

Encoding Types

Integer Encodings

Encoding Range Bytes
I8 -128 to 127 1
I16 -32,768 to 32,767 2
I32 -2B to 2B 4
I64 Full i64 range 8
U8 0 to 255 1
U16 0 to 65,535 2
U32 0 to 4B 4
U64 Full u64 range 8
VarInt Adaptive 1-10

Offset Encodings

Store values relative to a base offset, useful for constrained ranges:

// Age values 18-100 stored as 0-82 in a single byte
#[tauq(encoding = "u8", offset = 18)]
age: u32,

// Years 2020-2275 stored as 0-255
#[tauq(encoding = "u8", offset = 2020)]
year: u32,

String Encodings

Encoding Best For Description
Dictionary Repeated values Builds dictionary, stores indices
Inline Unique values Length-prefixed UTF-8
Auto Unknown Analyzes data, picks best

Float Encodings

Encoding Precision Bytes
Float32 ~7 digits 4
Float64 ~15 digits 8

Columnar Layout

TBF uses columnar encoding for tabular data, storing all values of each column together:

Traditional (Row-Major)

[id:1, name:"Alice", age:30]
[id:2, name:"Bob", age:25]
...

TBF (Column-Major)

[1, 2, 3, ...]
[Alice, Bob, Carol, ...]
[30, 25, 28, ...]

Benefits

  • Better compression: Similar values stored together
  • SIMD-friendly: Process multiple values at once
  • Selective reading: Read only needed columns

Adaptive Encoding

The AdaptiveIntEncoder analyzes values and picks optimal encoding automatically:

// Small range (0-255): uses U8
let ages = vec![25, 30, 28, 35, 42];

// Large range: uses VarInt
let ids = vec![1, 1000, 1000000, 999999999];

Wire Format Details

Varint Encoding

Variable-length integers use continuation bits for compact representation:

Value 0-127:     [0xxxxxxx]
Value 128-16383: [1xxxxxxx][0xxxxxxx]
...continues for larger values

Column Header

Each column starts with type information and value count:

[COLUMN_TYPE:varint][VALUE_COUNT:varint][...data...]

Column Types:

  • 0: Bool (packed bits)
  • 1: Float32
  • 2: Float64
  • 3: VarInt
  • 4: Dictionary string
  • 5: Inline string
  • 6-13: Fixed-width integers (I8, I16, I32, I64, U8, U16, U32, U64)

Performance Characteristics

Operation Complexity Notes
Encode O(n) Single pass
Decode O(n) Single pass
Random access O(1) With index
Compression ~84% vs JSON Depends on data

Full Encoding Example

use tauq::tbf::{TableSchemaBuilder, FieldEncoding};

// Define schema
let schema = TableSchemaBuilder::new()
    .column("id", FieldEncoding::U16)
    .column("name", FieldEncoding::Dictionary)
    .column("age", FieldEncoding::U8Offset { offset: 18 })
    .column("salary", FieldEncoding::Float64)
    .build();

// Encode data
let bytes = data.encode_with_schema(&schema);