The columnar store is the unfair advantage.

Every other eBPF observability tool routes data to an external database. aacyn keeps it in-process — a custom C columnar store with SIMD acceleration that ingests 5M events/sec and scans 5M rows in under 300 microseconds.

Benchmark Results

All measurements from a single Minisforum UM890 Pro (Ryzen 9 8945HS, 32GB DDR5). Full methodology and reproducibility instructions in BENCHMARKS.md.

5,089,364 · Ingestion throughput · events/sec, single consumer mini PC

286μs · Scan latency (max duration) · across 5M events, AVX-512, 16 floats per instruction

35μs · Scan latency (error count) · across 5M events, uint8 column fits in L2 cache

16ms · p99 ingestion latency · at 50K concurrent requests/sec

0.00% · Error rate under load · 254M events, zero errors, zero drops

SIMD Scan Performance

5 million events · 62MB columnar data · AVX-512 on AMD Ryzen 9 8945HS. All scans complete in under half a millisecond at p99.

Operation | Median | p99 | Effective Rate
scan_duration_max | 286μs | 402μs | 17.5B events/sec
scan_error_count | 35μs | 60μs | 141.6B events/sec
scan_duration_filter (>10ms) | 298μs | 415μs | 16.8B events/sec
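
The Effective Rate column appears to be the number of events scanned divided by the median latency. A minimal sketch of that arithmetic follows; the derivation is an assumption and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Effective scan rate in events per second: events scanned / elapsed time.
 * Assumption: the "Effective Rate" column is derived this way. */
double effective_rate(uint64_t events, double latency_us)
{
    assert(latency_us > 0.0);
    return (double)events / (latency_us * 1e-6);
}
```

For example, effective_rate(5000000, 286.0) comes out at roughly 1.75e10, which lines up with the 17.5B events/sec listed for scan_duration_max.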

How It Works

Five design decisions that eliminate the bottlenecks in traditional observability pipelines.

Columnar (SoA) layout

Timestamps, durations, and error flags are stored in separate contiguous arrays — not interleaved rows. The CPU prefetcher sees a straight line through memory. No pointer chasing, no cache misses from mixed-type rows.
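
A minimal sketch of what such a layout could look like in C. The struct, field names, and the uint64 timestamp type are assumptions for illustration (the float durations and uint8 error flags follow the figures quoted above), and the scan shown is a plain scalar loop rather than the engine's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical column set: one contiguous array per field (SoA). */
typedef struct {
    uint64_t *timestamp_ns;  /* assumed type */
    float    *duration_ms;
    uint8_t  *is_error;      /* 0 or 1 */
    size_t    len;           /* rows currently stored */
    size_t    capacity;      /* rows each array can hold */
} event_columns;

/* Append one event by writing the same row index in every column.
 * Returns 0 on success, -1 when the columns are full. */
int columns_push(event_columns *c, uint64_t ts, float dur, uint8_t err)
{
    assert(c != NULL);
    if (c->len == c->capacity)
        return -1;
    c->timestamp_ns[c->len] = ts;
    c->duration_ms[c->len]  = dur;
    c->is_error[c->len]     = err;
    c->len++;
    return 0;
}

/* Count errors by walking only the 1-byte error column; the timestamp
 * and duration arrays are never touched, so they never enter the cache. */
uint64_t scan_error_count(const event_columns *c)
{
    assert(c != NULL);
    uint64_t n = 0;
    for (size_t i = 0; i < c->len; i++)
        n += c->is_error[i];
    return n;
}
```

The point of the split is visible in scan_error_count: a query over one field reads one byte per event instead of a whole row.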

mmap'd ring buffer

The entire store is backed by a memory-mapped file. The OS manages write-back to disk asynchronously. On restart, the process maps the same file again and the ring buffer is available immediately, served from the page cache when its pages are still resident: no replay, no WAL, no recovery log.
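
The idea can be sketched with POSIX mmap. Everything below is an assumption for illustration: the header layout, the single float column, and the ring_open / ring_push / ring_close names are hypothetical, not the store's real format.

```c
#define _POSIX_C_SOURCE 200809L  /* for ftruncate under strict C modes */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical file layout: a small header followed by the ring slots. */
typedef struct {
    uint64_t head;      /* total values ever written */
    uint64_t capacity;  /* number of slots */
    float    slots[];   /* ring storage */
} ring_file;

/* Map (and create if needed) a file-backed ring. Returns NULL on failure.
 * Sketch only: a real store would validate the header before resizing. */
ring_file *ring_open(const char *path, uint64_t capacity)
{
    assert(path != NULL && capacity > 0);
    size_t bytes = sizeof(ring_file) + capacity * sizeof(float);
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)bytes) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;
    ring_file *r = p;
    if (r->capacity == 0) {
        r->capacity = capacity;  /* new file: ftruncate zero-filled it */
    } else if (r->capacity != capacity) {
        munmap(p, bytes);
        return NULL;
    }
    return r;
}

/* Write into the mapped memory; the kernel flushes dirty pages later. */
void ring_push(ring_file *r, float v)
{
    r->slots[r->head % r->capacity] = v;
    r->head++;
}

void ring_close(ring_file *r)
{
    munmap(r, sizeof(ring_file) + r->capacity * sizeof(float));
}
```

Because head lives inside the mapped header, reopening the file restores both the data and the write position without any replay step.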

FlatBuffer binary protocol

Events are ingested as pre-serialized FlatBuffer payloads — 16 bytes per event, no JSON parsing. The C engine reads the buffer with a bounds check and memcpy's directly into the column arrays. Zero allocation in the hot path.
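
A rough sketch of that hot path, assuming each event arrives as a fixed 16-byte record in host byte order. The field offsets, the wire_columns struct, and ingest_batch are hypothetical; the real payload schema is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { EVENT_WIRE_SIZE = 16 };  /* 16 bytes per event, as stated above */

/* Assumed record layout:
 *   offset 0:  uint64 timestamp
 *   offset 8:  float  duration
 *   offset 12: uint8  error flag
 *   offset 13: 3 bytes of padding */
typedef struct {
    uint64_t *timestamp;
    float    *duration;
    uint8_t  *error;
    size_t    len;
    size_t    capacity;
} wire_columns;

/* Copy a batch of records into the column arrays without allocating.
 * Returns the number of events ingested, or -1 if the payload length is
 * not a whole number of records or the batch does not fit. */
long ingest_batch(wire_columns *c, const uint8_t *buf, size_t buf_len)
{
    assert(c != NULL && buf != NULL);
    if (buf_len % EVENT_WIRE_SIZE != 0)
        return -1;                       /* bounds check on the payload */
    size_t n = buf_len / EVENT_WIRE_SIZE;
    if (n > c->capacity - c->len)
        return -1;                       /* bounds check on the columns */
    for (size_t i = 0; i < n; i++) {
        const uint8_t *rec = buf + i * EVENT_WIRE_SIZE;
        size_t row = c->len + i;
        memcpy(&c->timestamp[row], rec, sizeof(uint64_t));
        memcpy(&c->duration[row], rec + 8, sizeof(float));
        c->error[row] = rec[12];
    }
    c->len += n;
    return (long)n;
}
```

Both checks run once per batch before any bytes are copied, so the per-event loop contains nothing but the copies.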

AVX-512 / NEON SIMD scans

Queries compile to SIMD intrinsics at build time. AVX-512 processes 16 floats per instruction; NEON processes 4. Both paths have a scalar fallback. The scan_duration_max function reads 5M floats in 286μs — that's 17.5 billion effective events/sec of scan bandwidth.
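
One way such a scan could be written is shown below: an AVX-512 path selected at compile time, with a scalar loop that doubles as the fallback and the tail handler. This is a sketch under those assumptions, not the engine's source; it ignores NaN handling, and a NEON path would follow the same pattern with 4-wide vectors.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Maximum of n floats (n must be at least 1). */
float scan_duration_max(const float *v, size_t n)
{
    assert(v != NULL && n > 0);
    float best = v[0];
    size_t i = 0;
#if defined(__AVX512F__)
    if (n >= 16) {
        /* Keep 16 running maxima, one per lane. */
        __m512 acc = _mm512_loadu_ps(v);
        for (i = 16; i + 16 <= n; i += 16)
            acc = _mm512_max_ps(acc, _mm512_loadu_ps(v + i));
        /* Collapse the 16 lanes into a single value. */
        best = _mm512_reduce_max_ps(acc);
    }
#endif
    /* Scalar fallback; also handles the tail after the vector loop. */
    for (; i < n; i++)
        if (v[i] > best)
            best = v[i];
    return best;
}
```

Built without AVX-512 enabled (for example without -mavx512f on GCC or Clang), the same function compiles to the scalar loop only.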

bun:ffi — zero-copy FFI

TypeScript calls into C through Bun's FFI layer with a raw pointer. No serialization, no JavaScriptCore GC pressure, no context switching. The TS side passes a pointer; the C side writes directly into the output buffer. Round-trip is measured in nanoseconds.
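
The C half of that contract might look like the sketch below. The scan_summary record and the aacyn_scan_summary name are hypothetical; the point is only that the caller owns the output memory and the C function fills it in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result record. The caller allocates it and passes a pointer. */
typedef struct {
    float    max_duration;
    uint32_t error_count;
} scan_summary;

/* Both sides of the FFI boundary must agree on this size. */
static_assert(sizeof(scan_summary) == 8, "unexpected scan_summary layout");

/* Reads the columns through raw pointers and writes the result into *out.
 * Nothing is allocated or serialized. Returns 0 on success, -1 on bad input. */
int32_t aacyn_scan_summary(const float *durations, const uint8_t *errors,
                           uint64_t n, scan_summary *out)
{
    if (durations == NULL || errors == NULL || out == NULL || n == 0)
        return -1;
    float best = durations[0];
    uint32_t errs = 0;
    for (uint64_t i = 0; i < n; i++) {
        if (durations[i] > best)
            best = durations[i];
        errs += errors[i];
    }
    out->max_duration = best;
    out->error_count = errs;
    return 0;
}
```

On the TypeScript side, a function like this would typically be declared through bun:ffi's dlopen with pointer-typed arguments, with the output record backed by a typed array the caller already holds.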

Industry Comparison

Directionally valid comparisons from publicly documented benchmarks.

Platform | Throughput | Hardware
aacyn | 5,089,364 evt/sec | 1× mini PC (8C/16T)
ClickHouse (logs) | ~120,000 rows/sec | Single node (8C, 16GB)
Vector (Datadog) | ~76 MiB/sec | File-to-TCP pipeline

These are different workloads — ClickHouse persists to disk with full indexing; Vector routes bytes through a pipeline. aacyn's advantage is architectural: by keeping data in columnar memory and avoiding external database round-trips, it eliminates the largest sources of latency in observability pipelines. See BENCHMARKS.md for detailed apples-to-apples notes.

Binary Protocol vs. JSON

The same ingestion pipeline, comparing FlatBuffer binary payloads against equivalent JSON.

Path | Throughput | p95 latency
JSON (Path A) | 314K evt/sec | 218.79ms
Binary (Path B) | 5.09M evt/sec | 12.73ms

16.2× throughput improvement

Reproduce the benchmarks yourself.

Everything is open source. The benchmark harness, the data generator, and the methodology are all in the repository.

BENCHMARKS.md → · Architecture comparison →