Unorthodox Java: Building QuestDB

High-Performance Time-Series Database

Jaromir Hamala
QuestDB Engineering Team

About Me

  • Jaromir Hamala - QuestDB Engineering Team
  • Passion for concurrency, performance, and distributed systems
  • Working on QuestDB
  • Before: Hazelcast, C2B2 - the birthplace of Payara AS

What is QuestDB?

  • Open-source time-series database (Apache License 2.0)
  • SQL with time-series extensions
  • PostgreSQL Wire Protocol compatible
  • High-speed ingestion: InfluxDB line protocol

GitHub: https://github.com/questdb/questdb
Live Demo: https://demo.questdb.io

The Numbers

What do we mean by "high-performance"?

  • Millions of rows ingested per second
  • Query billions of rows efficiently

No magic, just hard work and clever engineering

Demo

Language Breakdown

Implementation

  • 90% Java
  • 10% C/C++/Rust

But... Unorthodox Java!

  • In-house standard library
  • Zero GC on hot path
  • JNI when needed
  • SIMD optimizations

The Big Question

Why Java at all?! 🤔

Let's explore why we chose Java and how we made it work for high-performance computing...

Java is great!

It gives us

  • Community - People who know it well
  • Tooling - Excellent profilers, debuggers, IDEs
  • JIT Compiler - Adaptive optimization at runtime

But also: Rust wasn’t an option yet

commit 95b8095427c4e2c7814ad56d06b5fc65f6685130
Author: bluestreak01 <bluestreak@gmail.com>
Date:   Mon Apr 28 16:29:15 2014 -0700

    Initial commit

Rust 1.0 was released on May 15, 2015, almost a year after QuestDB's initial commit

QuestDB Design Principles

Core Philosophy

  1. No Allocation on Hot-Path
  2. Know your memory layout
  3. Core infrastructure MUST NOT allocate
  4. No 3rd party Java libraries

The founder has a background in High Frequency Trading

Why Zero-GC?

We have ZGC, Shenandoah, and Azul C4, but...

  • Concurrent collectors still cost CPU cycles
  • And memory bandwidth
  • And application throughput (barriers)

What This Led To...

Three Core Disciplines

  1. Memory Discipline - Zero allocation, off-heap patterns
  2. Execution Discipline - Custom JIT, runtime bytecode generation
  3. Concurrency Discipline - Lock-free algorithms, sharding
┌─────────────────────┐
│ Memory Discipline   │
├─────────────────────┤
│ • Zero Allocation   │
│ • Off-heap Memory   │
│ • Flyweight Pattern │
└─────────────────────┘
        ↓
┌─────────────────────┐
│Execution Discipline │
├─────────────────────┤
│ • Custom JIT        │
│ • Runtime Bytecode  │
│ • SIMD Operations   │
└─────────────────────┘
        ↓
┌──────────────────────┐
│Concurrency Discipline│
├──────────────────────┤
│ • Sharded GROUP BY   │
│ • Lock-free Design   │
│ • Single Writer      │
└──────────────────────┘

DISCIPLINE 1: MEMORY

The Foundation of Performance

"The fastest allocation is the one that never happens"

Frontend vs Backend Memory Strategy

Frontend

  • Parser, Planner, Optimizer
  • Pure Java objects (mostly)
  • Uses Object Pooling
    • Example: AST nodes
  • Fast allocation/deallocation

Backend

  • Runtime, JIT, Storage
  • Uses Off-heap Memory
  • Long-lived data
  • Direct memory addresses
  • Easy JNI communication

Different parts, different strategies!

What Our Stdlib Provides

  1. I/O - Network and file operations, including io_uring
  2. Collections - Specialized for primitives (no boxing!)
  3. Strings - CharSequence-based, not String
  4. Numbers - Fast parsing/printing
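
For a feel of the collections point above: a JDK ArrayList<Integer> boxes every element, while a primitive-specialized list keeps raw ints in one flat array. A minimal sketch, in the spirit of QuestDB's in-house collections (names here are illustrative):

// Minimal primitive-specialized list: raw ints in a flat array,
// no Integer wrapper allocated per element.
public final class IntListSketch {
    private int[] data = new int[16];
    private int size = 0;

    public void add(int value) {
        if (size == data.length) {
            data = java.util.Arrays.copyOf(data, data.length << 1);
        }
        data[size++] = value; // primitive store, no boxing
    }

    public int get(int index) {
        return data[index];
    }

    public int size() {
        return size;
    }
}

Compare with ArrayList<Integer>, which allocates a heap object per element and scatters the values across the heap, ruining cache locality.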

Memory Technique 1: Zero Allocation

Single-threaded pools

// SqlParser uses object pool for AST nodes
ObjectPool<ExpressionNode> expressionNodePool;

// Acquire nodes during parsing
ExpressionNode node = expressionNodePool.next();
node.of(...);

// Mass release after plan creation
expressionNodePool.clear(); // O(1) - just reset position!

Frontend optimization - parse without allocation
Demo: SqlParser.parseTableName()
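
A minimal sketch of the pool mechanics (simplified; QuestDB's real ObjectPool carries more machinery, but the idea is the same):

import java.util.ArrayList;
import java.util.function.Supplier;

// Single-threaded object pool: next() hands out a pooled instance,
// growing only on first use; clear() releases everything in O(1).
public final class SimpleObjectPool<T> {
    private final ArrayList<T> items = new ArrayList<>();
    private final Supplier<T> factory;
    private int pos = 0; // index of the next free object

    public SimpleObjectPool(Supplier<T> factory) {
        this.factory = factory;
    }

    public T next() {
        if (pos == items.size()) {
            items.add(factory.get()); // allocate only on the first pass
        }
        return items.get(pos++);      // reuse on every later parse
    }

    public void clear() {
        pos = 0; // nothing is freed; objects are recycled next time
    }
}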

Memory Technique 1b: Zero Allocation

Postgres Wire Protocol and double columns

DEMO - PGConnectionContext.appendRecord()

Memory Technique 1c: Zero Allocation

IPv4 to String Conversion

SELECT ip_address, CAST(ip_address AS STRING) as ip_str

// Traditional: Creates garbage for each row
public String getIPv4String(Record rec) {
    int ipv4 = arg.getIPv4(rec);
    String s = IPUtils.formatIPv4(ipv4); // New String per row!
    return s;
}

// QuestDB: Reusable StringSink
private final StringSink sinkA = new StringSink();
public CharSequence getStrA(Record rec) {
    sinkA.clear();  // Reset, don't allocate!
    Numbers.intToIPv4Sink(sinkA, arg.getIPv4(rec));
    return sinkA;
}
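
The trick that makes this safe is returning CharSequence instead of String: the caller must consume the characters before the next row overwrites the sink. A stripped-down sink might look like this (illustrative, not QuestDB's actual StringSink):

// Reusable character sink: clear() resets the length, the backing
// char[] survives, so steady-state appends allocate nothing.
public final class SinkSketch implements CharSequence {
    private char[] buf = new char[32];
    private int len = 0;

    public void clear() {
        len = 0; // keep the buffer, drop the content
    }

    public SinkSketch put(char c) {
        if (len == buf.length) {
            buf = java.util.Arrays.copyOf(buf, len << 1);
        }
        buf[len++] = c;
        return this;
    }

    @Override public int length() { return len; }
    @Override public char charAt(int index) { return buf[index]; }
    @Override public CharSequence subSequence(int start, int end) {
        return new String(buf, start, end - start);
    }
    @Override public String toString() { return new String(buf, 0, len); }
}

The contract is strict: the returned CharSequence is only valid until the next clear(). That is presumably why the field above is called sinkA: a sibling sinkB lets two values be compared without clobbering each other.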

Memory Technique 2: Off-Heap Memory

Direct Memory Access

// Allocate off-heap memory
long ptr = Unsafe.malloc(size);

// Direct memory operations
Unsafe.getUnsafe().putLong(ptr + offset, value);

// Manual memory management
Unsafe.free(ptr);

Benefits:

  • No GC pressure
  • Predictable memory layout
  • Cache-friendly access patterns
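
The Unsafe above is QuestDB's own wrapper around the JDK's sun.misc.Unsafe. To experiment with the same pattern on a modern JDK without Unsafe, the Foreign Function & Memory API (final since Java 22) gives deterministic off-heap allocation. A sketch, assuming Java 22+:

import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Off-heap allocate / write / read / free with the FFM API.
public class OffHeapSketch {
    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(1024);   // off-heap; the GC never sees it
            seg.set(ValueLayout.JAVA_LONG, 0, 42L);     // direct write at offset 0
            long v = seg.get(ValueLayout.JAVA_LONG, 0); // direct read
            System.out.println(v);
        } // closing the arena frees the memory deterministically
    }
}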

Memory-Mapped Files (mmap)

// Map file directly into memory
long ptr = Files.mmap(fd, size, Files.MAP_RW);

// Read data directly from memory address
long value = Unsafe.getUnsafe().getLong(ptr + offset);

// Write data directly to memory address
Unsafe.getUnsafe().putLong(ptr + offset, newValue);

Benefits:

  • Zero-copy - No data copying between kernel/userspace
  • Simple - Kernel handles paging
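
Files.mmap here is QuestDB's own wrapper over the mmap(2) syscall. The closest standard-JDK equivalent is FileChannel.map, which is enough to play with the idea (a sketch; the file name is arbitrary):

import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Map a file into memory and access it like an array.
public class MmapSketch {
    public static void main(String[] args) throws Exception {
        try (FileChannel ch = FileChannel.open(Path.of("data.bin"),
                StandardOpenOption.CREATE, StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buf.putLong(0, 123L);               // write goes to the page cache
            System.out.println(buf.getLong(0)); // read back, no read() syscall
        }
    }
}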

Flyweight Pattern with Off-heap

public class FlyweightDirectUtf16Sink implements CharSequence {
    private long ptr;  // Start of memory region
    private long lo;   // Current position
    private long hi;   // End of memory region
    
    // Point to existing memory - no allocation!
    public FlyweightDirectUtf16Sink of(long start, long end) {
        this.ptr = start;
        this.lo = start;
        this.hi = end;
        return this;
    }
    
    // Direct memory access
    public char charAt(int index) {
        return Unsafe.getUnsafe().getChar(ptr + index * 2L);
    }
    
    [...]
}

Key insight: Same object, different memory regions!
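
A self-contained version of the idea, using the FFM API instead of raw pointers (assuming Java 22+; QuestDB's real sinks work on raw addresses as shown above):

import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// One view object, re-pointed at different off-heap regions:
// zero allocation per row once the view exists.
public class FlyweightSketch {
    static final class Utf16View implements CharSequence {
        private MemorySegment seg;

        Utf16View of(MemorySegment seg) { this.seg = seg; return this; }

        @Override public int length() { return (int) (seg.byteSize() / 2); }
        @Override public char charAt(int i) {
            return seg.get(ValueLayout.JAVA_CHAR_UNALIGNED, i * 2L);
        }
        @Override public CharSequence subSequence(int s, int e) {
            return new Utf16View().of(seg.asSlice(s * 2L, (e - s) * 2L));
        }
        @Override public String toString() {
            StringBuilder sb = new StringBuilder(length());
            for (int i = 0; i < length(); i++) sb.append(charAt(i));
            return sb.toString();
        }
    }

    static MemorySegment utf16(Arena arena, String s) {
        MemorySegment seg = arena.allocate(s.length() * 2L);
        for (int i = 0; i < s.length(); i++) {
            seg.set(ValueLayout.JAVA_CHAR_UNALIGNED, i * 2L, s.charAt(i));
        }
        return seg;
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            Utf16View view = new Utf16View();                   // allocated once
            System.out.println(view.of(utf16(arena, "hello")));
            System.out.println(view.of(utf16(arena, "world"))); // same object, new region
        }
    }
}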

Memory Discipline Recap

  • No GC pressure - Off-heap data is not tracked by the GC
  • Memory layout control - Cache-friendly, predictable access
  • Zero allocation - Reuse, don't recreate

DISCIPLINE 2: EXECUTION

Making Every CPU Cycle Count

"Let the machine do what it does best"

Memory Layout Matters

Row vs Columnar Storage

Traditional Row Storage

┌─────────────────────────────────────────────┐
│ Row 1: [id=1, sensor='A', temp=23.5, ts=t1] │
├─────────────────────────────────────────────┤
│ Row 2: [id=2, sensor='B', temp=24.1, ts=t2] │
├─────────────────────────────────────────────┤
│ Row 3: [id=3, sensor='A', temp=23.8, ts=t3] │
├─────────────────────────────────────────────┤
│ Row 4: [id=4, sensor='C', temp=22.9, ts=t4] │
└─────────────────────────────────────────────┘

Problem for analytics:

  • To read all temperatures, must skip over other fields
  • Poor cache utilization
  • Can't use SIMD effectively

Columnar Storage

id column:     [1, 2, 3, 4, ...]
sensor column: ['A', 'B', 'A', 'C', ...]
temp column:   [23.5, 24.1, 23.8, 22.9, ...]
ts column:     [t1, t2, t3, t4, ...]

Benefits:

  • Read only what you need
  • Sequential memory access
  • Cache-friendly
  • SIMD operations on entire columns
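
A tiny demonstration of the difference (illustrative numbers only). Summing temperatures in the columnar form touches one flat array; in the row form every access drags the neighbouring fields through the cache too:

// Row layout vs. columnar layout for a temperature sum.
public class LayoutSketch {
    record SensorRow(int id, char sensor, double temp, long ts) {}

    public static void main(String[] args) {
        // row layout: temp values interleaved with other fields
        SensorRow[] rows = {
            new SensorRow(1, 'A', 23.5, 1L),
            new SensorRow(2, 'B', 24.1, 2L)
        };
        double rowSum = 0;
        for (SensorRow r : rows) rowSum += r.temp(); // pointer chase per row

        // columnar layout: one contiguous array per column
        double[] temp = {23.5, 24.1};
        double colSum = 0;
        for (double t : temp) colSum += t; // sequential, prefetch- and SIMD-friendly

        System.out.println(rowSum + " " + colSum);
    }
}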

QuestDB's Secret: Time Ordering

Traditional Columnar (unordered):
temp: [24.1, 22.9, 23.5, 23.8, ...]  ← Random time order

QuestDB Columnar (time-ordered):
temp: [22.9, 23.5, 23.8, 24.1, ...]
       ↑                        ↑
    Oldest                  Newest

Key Invariant: All columns are physically sorted by time

Why Time Ordering Matters

Efficient Time Filtering

SELECT avg(temp) FROM sensors 
WHERE ts > now() - '1h'

  • Binary search to find time range start
  • Sequential read of recent data
  • No index needed!

Cache Locality

  • Recent data (most queried) stays hot in cache
  • Natural prefetching for sequential access
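
The "binary search, then sequential read" step is easy to demonstrate on a sorted array (a sketch; QuestDB does this against memory-mapped columns, not Java arrays):

import java.util.Arrays;

// Find the start of a time range in a sorted timestamp column.
public class TimeRangeSketch {
    public static void main(String[] args) {
        long[] ts = {100, 200, 300, 400, 500}; // sorted designated timestamps
        long cutoff = 250;                     // e.g. now() minus one hour

        int idx = Arrays.binarySearch(ts, cutoff);
        if (idx < 0) {
            idx = -idx - 1; // insertion point: first timestamp above the cutoff
        }
        // everything from idx onward is one sequential scan
        System.out.println("rows in range: " + (ts.length - idx)); // 3
    }
}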

What is SIMD? Single Instruction, Multiple Data

Traditional (Scalar):

a[0] + b[0] = c[0]  ← One operation
a[1] + b[1] = c[1]  ← One operation  
a[2] + b[2] = c[2]  ← One operation
a[3] + b[3] = c[3]  ← One operation

SIMD (Vectorized):

┌───────────────────┐   ┌───────────────────┐   ┌───────────────────┐
│a[0]│a[1]│a[2]│a[3]│ + │b[0]│b[1]│b[2]│b[3]│ = │c[0]│c[1]│c[2]│c[3]│
└───────────────────┘   └───────────────────┘   └───────────────────┘
               ↑ One instruction processes 4 values! ↑

AVX2: 256-bit registers = 8 ints or 4 doubles at once!
AVX512: 512-bit registers

Explicit SIMD in Java I

JEP 338: Vector API (Incubator)
Authors	Vladimir Ivanov, Razvan Lupusoru, Paul Sandoz, Sandhya Viswanathan
Owner	Paul Sandoz
Type	Feature
Scope	JDK
Status	Closed / Delivered
Release	16
Component	hotspot / compiler
Discussion	panama-dev at openjdk.java.net
Effort	M
Duration	M
Relates to	JEP 414: Vector API (Second Incubator)
Reviewed by	John Rose, Maurizio Cimadamore, Yang Zhang
Endorsed by	John Rose, Vladimir Kozlov
Created	2018/04/06 22:58
Updated	2021/08/28 00:15
Issue	8201271

Explicit SIMD in Java II

JEP 508: Vector API (Tenth Incubator)
Owner	Ian Graves
Type	Feature
Scope	JDK
Status	Closed / Delivered
Release	25
Component	core-libs
Discussion	panama-dev at openjdk.org
Effort	XS
Duration	XS
Relates to	JEP 489: Vector API (Ninth Incubator)
Reviewed by	Jatin Bhateja, Sandhya Viswanathan, Vladimir Ivanov
Endorsed by	Paul Sandoz
Created	2025/03/31 18:19
Updated	2025/05/21 21:28
Issue	8353296
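
For contrast with what follows: this is what explicit SIMD looks like with the incubating Vector API (a sketch; needs --add-modules jdk.incubator.vector). QuestDB does not use it, as the next slide explains:

import jdk.incubator.vector.DoubleVector;
import jdk.incubator.vector.VectorSpecies;

// Element-wise addition, 4 doubles per instruction with AVX2.
public class VectorAddSketch {
    static final VectorSpecies<Double> SPECIES = DoubleVector.SPECIES_256;

    static void add(double[] a, double[] b, double[] c) {
        int i = 0;
        int upper = SPECIES.loopBound(a.length);
        for (; i < upper; i += SPECIES.length()) {
            DoubleVector va = DoubleVector.fromArray(SPECIES, a, i);
            DoubleVector vb = DoubleVector.fromArray(SPECIES, b, i);
            va.add(vb).intoArray(c, i); // one SIMD instruction, 4 lanes
        }
        for (; i < a.length; i++) {     // scalar tail for the leftovers
            c[i] = a[i] + b[i];
        }
    }
}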

Execution Technique 1: JIT compiled SIMD filters

Not Java Vector API - Our Own JIT!

Built with:

  • asmjit library for code generation
  • C++ backend, Java frontend
  • AVX2 instructions for vectorization
  • Processes 8 rows simultaneously (256-bit registers)

Example: Filter on INT column processes 8 values at once

Execution Technique 1b: JIT Architecture

SQL JIT Architecture

Frontend (Java):

  • Analyzes filter suitability
  • Serializes AST to IR

Backend (C++):

  • Uses asmjit library
  • Emits x86-64 machine code
  • AVX2 vectorization
SQL Query: WHERE total_amount > 150
                    ↓
┌─────────────────────────────────────────┐
│           Frontend (Java)               │
├─────────────────────────────────────────┤
│ 1. Parse filter expression              │
│ 2. Build Abstract Syntax Tree (AST)     │
│ 3. Analyze suitability for JIT          │
│ 4. Serialize to Intermediate Rep (IR)   │
└─────────────────┬───────────────────────┘
                  │ IR
                  ↓
┌─────────────────────────────────────────┐
│           Backend (C++)                 │
├─────────────────────────────────────────┤
│ 1. Parse IR from Java frontend          │
│ 2. Generate x86-64 machine code         │
│ 3. Use AVX2 SIMD instructions           │
│ 4. Return function pointer to Java      │
└─────────────────┬───────────────────────┘
                  │ Native function pointer
                  ↓
    Vectorized filter processes 8 rows at once!

JIT Performance Impact

Real Query Example

SELECT * FROM trips 
WHERE total_amount > 150 
AND pickup_datetime IN ('2009-01')

Single-thread scanning 13.5M rows (out of 1.6B total rows):

  • Without JIT: 150ms (hot run)
  • With JIT: 35ms (hot run)
  • 76% reduction in execution time
  • 3.3 GB/s filtering rate

Live DEMO (actually faster, because it runs multithreaded)

Pre-JIT Filtering

  • Operator function call tree
  • Row-by-row processing
  • Virtual method calls
  • Interpreted execution

public boolean hasNext() {
    while (base.hasNext()) {
        if (filter.getBool(record)) {
            return true;
        }
    }
    return false;
}

JIT Filtering

  • Direct machine code
  • Vectorized (8 rows at once)
  • No virtual calls
  • But: x86-64 specific, ARM not supported yet

11K lines of code, 250+ commits to build it!

Execution Technique 2: Runtime Bytecode Generation

Custom Comparators for ORDER BY

SELECT * FROM readings 
ORDER BY sensor_id, batch_id DESC, timestamp

Problem: Generic comparator with virtual calls is slow!

Solution: Generate specialized bytecode at runtime

Traditional Generic Comparator

public int compare(Record a, Record b, int[] columns, int[] types) {
    for (int i = 0; i < columns.length; i++) {
        int col = Math.abs(columns[i]) - 1;
        boolean desc = columns[i] < 0;
        int cmp = 0;
        
        switch (types[col]) {
            case STRING:
                cmp = compareString(a.getStr(col), b.getStr(col));
                break;
            case LONG:
                cmp = Long.compare(a.getLong(col), b.getLong(col));
                break;
            case DOUBLE:
                cmp = Double.compare(a.getDouble(col), b.getDouble(col));
                break;
            // ... 20+ more types!
        }
        if (desc) cmp = -cmp;
        if (cmp != 0) return cmp;
    }
    return 0;
}

Problems: Type checking overhead, virtual calls, no optimization

QuestDB: Runtime Bytecode Generation

Demo: RecordComparatorCompiler

Generated Class Structure

// Generated class for ORDER BY sensor_id, batch_id DESC, timestamp
public class GeneratedComparator implements RecordComparator {
    
    // Fields to cache left record values
    private int f0;    // sensor_id column
    private int f1;    // batch_id column  
    private long f2;   // timestamp column
    
    // Cache left record values
    public void setLeft(Record record) {
        this.f0 = record.getInt(0);
        this.f1 = record.getInt(1);
        this.f2 = record.getLong(2);
    }
    
    // Compare cached left with right record
    public int compare(Record right) {
        int cmp = Integer.compare(this.f0, right.getInt(0));
        if (cmp != 0) return cmp;
        cmp = -Integer.compare(this.f1, right.getInt(1)); // DESC!
        if (cmp != 0) return cmp;
        return Long.compare(this.f2, right.getLong(2));
    }
}

QuestDB: Runtime Bytecode Generation

// Generated at runtime for ORDER BY sensor_id, batch_id DESC, timestamp
public int compare(Record r) {
    int cmp = Integer.compare(this.f0, r.getInt(0));
    if (cmp != 0) return cmp;
    cmp = -Integer.compare(this.f1, r.getInt(1));
    if (cmp != 0) return cmp;
    return Long.compare(this.f2, r.getLong(2));
}

RecordComparatorCompiler generates:

  • Custom class per query
  • Type-specific comparison inlined
  • No switches, no virtual calls, no boxing
  • JVM can optimize this perfectly!
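
QuestDB writes the bytecode directly with an in-house assembler (no third-party libraries, per the design principles). To get a feel for per-query code generation using nothing but the JDK, here is a sketch that generates and compiles Java source at runtime instead; same idea, coarser granularity, and it needs a full JDK at runtime:

import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;

// Generate, compile, and load a query-specific comparator at runtime.
public class CodegenSketch {
    public static void main(String[] args) throws Exception {
        String src = """
                import java.util.Comparator;
                public class GenCmp implements Comparator<long[]> {
                    public int compare(long[] a, long[] b) {
                        int cmp = Long.compare(a[0], b[0]); // first key ASC
                        if (cmp != 0) return cmp;
                        return -Long.compare(a[1], b[1]);   // second key DESC
                    }
                }
                """;
        Path dir = Files.createTempDirectory("codegen");
        Path file = dir.resolve("GenCmp.java");
        Files.writeString(file, src);

        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        javac.run(null, null, null, file.toString()); // .class lands next to the source

        try (URLClassLoader loader = new URLClassLoader(new URL[]{dir.toUri().toURL()})) {
            @SuppressWarnings("unchecked")
            Comparator<long[]> cmp = (Comparator<long[]>)
                    loader.loadClass("GenCmp").getDeclaredConstructor().newInstance();
            System.out.println(cmp.compare(new long[]{1, 5}, new long[]{1, 9})); // > 0: DESC
        }
    }
}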

DISCIPLINE 3: CONCURRENCY

Scaling Without Contention

"Share nothing, merge something"

Parallel GROUP BY Evolution

The Journey to Scale

From single-threaded to massively parallel execution

Single-threaded GROUP BY

SELECT sensor, max(temperature) FROM readings GROUP BY sensor
Input Data                  Output Map
┌─────────┐                ┌──────────┐
│ NYC, 23 │───────────────►│ NYC: 23  │
│ SFO, 32 │  Single Worker │ SFO: 32  │
│ NYC, 21 │                │ ...      │
│ NYC, 22 │                │ ...      │
│ ...     │                │ ...      │
│ Row N   │                └──────────┘
└─────────┘         

Problem: Only uses one CPU core!

Concurrent GROUP BY?

SELECT sensor, max(temperature) FROM readings GROUP BY sensor
Input Data
┌─────────┐ worker0  ┌────────────────┐
│Partition│─────────►│ SFO: 32        │
│   1     │          │ NYC: 23        │
├─────────┤ worker1  │ ....           │
│Partition│─────────►│ ....           │ 
│   2     │          │ ....           │
├─────────┤ worker2  │ ....           │ 
│Partition│─────────►│ ....           │
│   3     │          └────────────────┘      
└─────────┘            Concurrent Map

Problem: Contention on shared map

Single Writer Principle Violated!

Naive Parallel GROUP BY I

Input Data
┌─────────┐ worker0  ┌─────────┐
│Partition│─────────►│HashMap 1│
│   1     │          └─────────┘
├─────────┤ worker1  ┌─────────┐
│Partition│─────────►│HashMap 2│
│   2     │          └─────────┘
├─────────┤ worker2  ┌─────────┐
│Partition│─────────►│HashMap 3│
│   3     │          └─────────┘      
└─────────┘      

Problem: The same key is now in multiple maps!

Naive Parallel GROUP BY II

Input Data
┌─────────┐ worker0  ┌─────────┐
│Partition│─────────►│HashMap 1│ \
│   1     │          └─────────┘  \
├─────────┤ worker1  ┌─────────┐   \   merge  ┌─────────┐
│Partition│─────────►│HashMap 2│    ──────────│  Result │
│   2     │          └─────────┘   /          └─────────┘
├─────────┤ worker2  ┌─────────┐  /     
│Partition│─────────►│HashMap 3│ /      
│   3     │          └─────────┘      
└─────────┘      

Problem: Merge becomes bottleneck with high cardinality!

Solution: Sharded GROUP BY

Each worker creates multiple small maps (shards):

  Worker 0          Worker 1         Worker 2         Worker 3
┌──────────┐      ┌──────────┐     ┌──────────┐     ┌──────────┐
│Map Shard0│      │Map Shard0│     │Map Shard0│     │Map Shard0│
│Map Shard1│      │Map Shard1│     │Map Shard1│     │Map Shard1│
│Map Shard2│      │Map Shard2│     │Map Shard2│     │Map Shard2│
│Map Shard3│      │Map Shard3│     │Map Shard3│     │Map Shard3│
└──────────┘      └──────────┘     └──────────┘     └──────────┘

Key → Shard: hash(key) % 4

Key property: Each key always maps to the same shard number!

Sharded GROUP BY - Parallel Merge

 Thread 0   Thread 1    Thread 2    Thread 3       Final Result
┌──────┐    ┌──────┐    ┌──────┐    ┌──────┐       ┌─────────┐
│Shard0│────│Shard0│────│Shard0│────│Shard0│──────►│ Result0 │
└──────┘    └──────┘    └──────┘    └──────┘       └─────────┘
┌──────┐    ┌──────┐    ┌──────┐    ┌──────┐       ┌─────────┐
│Shard1│────│Shard1│────│Shard1│────│Shard1│──────►│ Result1 │
└──────┘    └──────┘    └──────┘    └──────┘       └─────────┘
┌──────┐    ┌──────┐    ┌──────┐    ┌──────┐       ┌─────────┐
│Shard2│────│Shard2│────│Shard2│────│Shard2│──────►│ Result2 │
└──────┘    └──────┘    └──────┘    └──────┘       └─────────┘
┌──────┐    ┌──────┐    ┌──────┐    ┌──────┐       ┌─────────┐
│Shard3│────│Shard3│────│Shard3│────│Shard3│──────►│ Result3 │
└──────┘    └──────┘    └──────┘    └──────┘       └─────────┘

4 parallel merges instead of 1! No key appears in multiple results.
Result: No single-threaded bottleneck!
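
The whole scheme fits in a page of plain Java (a compact sketch, assuming Java 19+ for the auto-closeable executor; QuestDB's real implementation uses off-heap maps and its own job scheduler):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sharded GROUP BY: each worker scans its partition into private
// per-shard maps; merges then run shard-by-shard in parallel.
public class ShardedGroupBySketch {
    static final int SHARDS = 4;

    record Row(String key, double value) {}

    public static void main(String[] args) throws Exception {
        List<List<Row>> partitions = List.of(
                List.of(new Row("NYC", 23), new Row("SFO", 32)),
                List.of(new Row("NYC", 21), new Row("NYC", 22)));
        int workers = partitions.size();

        // maps[w][s]: worker w's private map for shard s; never shared while scanning
        @SuppressWarnings("unchecked")
        Map<String, Double>[][] maps = new HashMap[workers][SHARDS];

        try (ExecutorService pool = Executors.newFixedThreadPool(workers)) {
            // phase 1: parallel scan, no locks, no shared state
            List<Callable<Void>> scans = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int worker = w;
                scans.add(() -> {
                    for (int s = 0; s < SHARDS; s++) maps[worker][s] = new HashMap<>();
                    for (Row r : partitions.get(worker)) {
                        int shard = Math.floorMod(r.key().hashCode(), SHARDS);
                        maps[worker][shard].merge(r.key(), r.value(), Math::max);
                    }
                    return null;
                });
            }
            pool.invokeAll(scans);

            // phase 2: parallel merge; shard s only ever meets other shard-s maps,
            // so no key can land in two result maps
            List<Callable<Map<String, Double>>> merges = new ArrayList<>();
            for (int s = 0; s < SHARDS; s++) {
                final int shard = s;
                merges.add(() -> {
                    Map<String, Double> out = new HashMap<>();
                    for (int w = 0; w < workers; w++) {
                        maps[w][shard].forEach((k, v) -> out.merge(k, v, Math::max));
                    }
                    return out;
                });
            }
            for (var f : pool.invokeAll(merges)) {
                System.out.println(f.get()); // disjoint partial results, e.g. {NYC=23.0}
            }
        }
    }
}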

The Three Disciplines - Recap

Memory Discipline ✓

  • Zero allocation patterns
  • Off-heap memory management
  • Flyweight views

Execution Discipline ✓

  • Custom JIT compiler
  • Runtime bytecode generation
  • SIMD vectorization

Concurrency Discipline ✓

  • Sharded algorithms
  • Lock-free design
  • Single writer principle

Result

Millions of rows/sec ingestion
Queries over billions of rows
Near-zero GC pauses

JNI is NOT Slow!

The Secret: Pass Primitives Only!

// SLOW: Passing object references
native void processData(String[] data);  // Object refs = slow!

// FAST: Passing memory addresses
native void processData(long address, int length);  // Just primitives!

Off-heap data enables fast JNI

  • Direct memory addresses (long)
  • No object references
  • No GC coordination needed

This is why we use off-heap memory!
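
A sketch of the Java side of such a boundary (all names hypothetical). The native method receives nothing but primitives, so the call is essentially a plain C function call; the C side casts the jlong back to a pointer and never calls back into the JVM:

// Hypothetical primitives-only JNI boundary.
public final class NativeSum {
    static {
        System.loadLibrary("nativesum"); // assumed native library on java.library.path
    }

    // Sums `count` longs starting at off-heap address `address`.
    // No arrays, no Strings, no object references cross the boundary.
    public static native long sumLongs(long address, int count);

    private NativeSum() {
    }
}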

Real-World Performance

Full Table Scan Example

1.6 billion rows - All taxi trips data

SELECT * FROM trips 
WHERE total_amount > 150 
AND passenger_count = 1

With JIT: Significant speedup even on cold runs!
Peak filtering rate: 9.4 GB/s (single thread)

Time-Series Specifics

SAMPLE BY Query

SELECT timestamp, avg(temperature)
FROM sensors
WHERE device_id = 'sensor1'
SAMPLE BY 1h

Optimized for:

  • Time-ordered access
  • Recent data queries
  • High-cardinality aggregations

Lessons Learned

  1. Java CAN be fast - With the right approach
  2. Hardware sympathy - Know your CPU and memory
  3. Question conventions - Standard library isn't sacred
  4. Batch operations - Amortize costs
  5. Measure everything - Benchmarks guide optimization

Why Java After All?

The Good

  • Excellent tooling - Profilers, debuggers, IDEs
  • JIT compiler - Adaptive optimization
  • Developer productivity - Fast iteration

The Trade-offs

  • Deep JVM knowledge required
  • We had to build our own infrastructure
  • Careful coding discipline - super important!

Is it worth it?

  • Maybe? It works for us!
  • Depends on your use case - High-performance, low-latency systems
  • Java is not the bottleneck - It's how you use it!
  • Unorthodox Java - Yes, but it can be done!

Community & Resources

Get Involved!

Learn More

  • QuestDB documentation
  • Performance blog posts
  • Benchmark results

Q&A

Thank you! 🙏

Jaromir Hamala
QuestDB Engineering Team

Questions?

  • Specific optimization techniques?
  • Architecture decisions?
  • Performance measurements?
  • Getting started with QuestDB?

Bonus: Code Examples

Want to try QuestDB?

Visit: https://demo.questdb.io/index.html

Or run locally:

# Docker
docker run -p 9000:9000 questdb/questdb

Visit: http://localhost:9000