
Building Zue Part 1: Files Are Just Arrays (If You're Brave)


The “Oh No” Moment

It was 3 AM. My OS kernel assignment was “working.” I had processes switching context. I had a networking stack that… technically networked.

But I was bored.

For this project I wanted to learn how computers talk to other computers without setting the server room on fire. I wanted to know why Kafka is so fast and why S3 never loses my cat photos.

So I did the logical thing: I threw away my kernel (I had a ton of fun, definitely picking it up later) and decided to build a distributed storage engine in Zig.

I call it Zue. This is Part 1 of my journey, where I learn that files are actually just arrays, and the OS is hiding things from me.


The Philosophy: Append-Only

The first rule of high-performance storage: Random I/O is the enemy.

Spinning disks hate jumping around. Even NVMe drives prefer a straight line. So Zue is an Append-Only Log. We never overwrite data. We just add it to the pile.

When you “update” a key in Zue, we don’t go back and change the old value; we just write a new record at the end. The old one is dead to us (until compaction runs, which is a problem for Future Me).
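Zue itself is written in Zig, but the append-only idea sketches cleanly in C. Here’s a toy in-memory version: `put` never overwrites, and `get` just trusts the newest record. All names here are illustrative, not Zue’s actual API.

```c
#include <string.h>
#include <stddef.h>

/* Toy append-only log: updates append, reads scan from the newest end. */
#define LOG_CAP 128

typedef struct { const char *key; const char *val; } ToyRecord;
typedef struct { ToyRecord recs[LOG_CAP]; size_t len; } ToyLog;

void toy_put(ToyLog *log, const char *key, const char *val) {
    if (log->len < LOG_CAP)
        log->recs[log->len++] = (ToyRecord){ key, val }; /* append, never edit */
}

const char *toy_get(const ToyLog *log, const char *key) {
    for (size_t i = log->len; i > 0; i--)   /* newest record wins */
        if (strcmp(log->recs[i - 1].key, key) == 0)
            return log->recs[i - 1].val;
    return NULL;
}
```

“Updating” `user:123` twice leaves two records in the log; only the second one is ever returned.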

The Log Structure

Here is how Zue actually organizes bytes on disk. It’s not just one giant file—it’s a series of size-capped files called Segments.

```mermaid
%%{init: {'theme':'base', 'themeVariables': {'fontSize':'23px'}, 'flowchart':{'nodeSpacing': 40, 'rankSpacing': 40}}}%%
flowchart LR
  subgraph Log [Log]
    direction TB
    S1[Segment 0]
    S2[Segment 1]
    S3[Segment ...]
  end
  subgraph Segment [Segment]
    direction TB
    IndexFile[Index File]
    LogFile[Log File]
  end
  subgraph Index ["Index Entry (12B)"]
    direction TB
    I_CRC["CRC (4B)"]
    I_Off["Rel Offset (4B)"]
    I_Pos["Position (4B)"]
  end
  subgraph Record ["Log Record"]
    direction TB
    R_CRC["CRC (4B)"]
    R_TS["Timestamp (8B)"]
    R_Key_Len["Key len (4B)"]
    R_Key["Key"]
    R_Val_Len["Value len (4B)"]
    R_Val["Value"]
  end
  S2 --> Segment
  IndexFile --> Index
  LogFile --> Record
  style Log fill:transparent,stroke:#6366f1,stroke-width:3px
  style Segment fill:transparent,stroke:#6366f1,stroke-width:3px
  style Index fill:transparent,stroke:#6366f1,stroke-width:3px
  style Record fill:transparent,stroke:#6366f1,stroke-width:3px
  style S1 fill:transparent,stroke:#10b981,stroke-width:3px
  style S2 fill:transparent,stroke:#10b981,stroke-width:4px
  style S3 fill:transparent,stroke:#10b981,stroke-width:3px
  style IndexFile fill:transparent,stroke:#3b82f6,stroke-width:3px
  style LogFile fill:transparent,stroke:#3b82f6,stroke-width:3px
  style I_Off fill:transparent,stroke:#ef4444,stroke-width:3px
  style I_Pos fill:transparent,stroke:#ef4444,stroke-width:3px
  style I_CRC fill:transparent,stroke:#ef4444,stroke-width:3px
  style R_CRC fill:transparent,stroke:#f59e0b,stroke-width:3px
  style R_TS fill:transparent,stroke:#f59e0b,stroke-width:3px
  style R_Key_Len fill:transparent,stroke:#f59e0b,stroke-width:3px
  style R_Key fill:transparent,stroke:#f59e0b,stroke-width:3px
  style R_Val_Len fill:transparent,stroke:#f59e0b,stroke-width:3px
  style R_Val fill:transparent,stroke:#f59e0b,stroke-width:3px
```

Why segments? Because deleting a 10TB file to free up space is… not graceful. Deleting a 1GB segment file that contains only old data? Instant.
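Here’s how a Log Record from the diagram above might serialize, sketched in C. The field order (CRC, timestamp, key length, key, value length, value) follows the diagram; the bitwise CRC32 and native byte order are simplifications of mine, not Zue’s actual code.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* On-disk record layout, mirroring the diagram:
   CRC(4) | timestamp(8) | key_len(4) | key | val_len(4) | value */

/* Toy bitwise CRC32 (reflected, poly 0xEDB88320); real code would use a
   table-driven or library implementation. */
static uint32_t crc32_sketch(const uint8_t *buf, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t)(-(int32_t)(crc & 1)));
    }
    return ~crc;
}

/* Serialize one record into out; returns total bytes written.
   Uses native byte order for brevity. */
size_t encode_record(uint8_t *out, uint64_t ts,
                     const char *key, uint32_t klen,
                     const char *val, uint32_t vlen) {
    uint8_t *p = out + 4;              /* leave room for the CRC */
    memcpy(p, &ts, 8);    p += 8;
    memcpy(p, &klen, 4);  p += 4;
    memcpy(p, key, klen); p += klen;
    memcpy(p, &vlen, 4);  p += 4;
    memcpy(p, val, vlen); p += vlen;
    uint32_t crc = crc32_sketch(out + 4, (size_t)(p - out - 4));
    memcpy(out, &crc, 4);              /* CRC covers everything after itself */
    return (size_t)(p - out);
}
```

A record with a 3-byte key and 5-byte value comes out to 4+8+4+3+4+5 = 28 bytes, and the CRC lets a reader detect a torn write at the tail of the log.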


The Sparse Index

If you have a 10GB log file, how do you find key “user:123”?

  1. Scan everything? O(N). Too slow.
  2. Index every key? O(1). Too much RAM.

Zue uses a Sparse Index. Instead of indexing every record, we only write down the location of one record per 4KB of log data.
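The “one entry per 4KB” rule might look like this in C. The struct mirrors the 12-byte index entry from the diagram; the threshold logic and names are my sketch of the approach, not Zue’s actual code.

```c
#include <stdint.h>
#include <stddef.h>

/* Sparse index: one 12-byte entry per ~4 KiB of log data. */
#define INDEX_INTERVAL 4096
#define MAX_ENTRIES 1024

typedef struct {
    uint32_t crc;        /* checksum of this entry */
    uint32_t rel_offset; /* record offset relative to the segment base */
    uint32_t position;   /* byte position of the record in the log file */
} IndexEntry;

typedef struct {
    IndexEntry entries[MAX_ENTRIES];
    size_t count;
    uint32_t bytes_since_last; /* log bytes written since the last entry */
} SparseIndex;

/* Called after every appended record; indexes only every ~4 KiB. */
void sparse_index_maybe_add(SparseIndex *idx, uint32_t rel_offset,
                            uint32_t position, uint32_t record_len) {
    if (idx->count == 0 || idx->bytes_since_last >= INDEX_INTERVAL) {
        if (idx->count < MAX_ENTRIES) {
            idx->entries[idx->count].crc = 0; /* CRC elided in this sketch */
            idx->entries[idx->count].rel_offset = rel_offset;
            idx->entries[idx->count].position = position;
            idx->count++;
            idx->bytes_since_last = 0;
        }
    }
    idx->bytes_since_last += record_len;
}
```

Appending ten 1KB records produces only two index entries, which is the whole point: the index stays tiny no matter how big the log gets.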

To find “Banana”:

  1. Check the index. “Banana” is between “Apple” (Offset 1000) and “Cat” (Offset 5000).
  2. Jump to Offset 1000.
  3. Scan forward linearly until you hit “Banana”.

We trade a tiny bit of CPU (linear scan) for massive RAM savings. It’s the “good enough” engineering principle in action.
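The “Banana” lookup above is a binary search over the sparse entries, followed by a forward scan of the log. The key-ordered entry struct here is illustrative (the real index in the diagram stores offsets); the seek logic is the part that matters.

```c
#include <string.h>
#include <stddef.h>

/* Illustrative key-ordered sparse index entry for the "Banana" example. */
typedef struct {
    const char *key; /* first key at this log position */
    long pos;        /* byte offset into the log file */
} SparseEntry;

/* Binary search for the greatest indexed key <= target. Returns the log
   position to start a linear forward scan from, or -1 if target sorts
   before the first indexed key. */
long sparse_seek(const SparseEntry *idx, size_t n, const char *target) {
    long best = -1;
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(idx[mid].key, target) <= 0) {
            best = idx[mid].pos; /* candidate start point */
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return best; /* caller scans records forward from here */
}
```

Seeking “banana” in an index of {“apple”: 1000, “cat”: 5000, “dog”: 9000} lands on position 1000, and the reader scans at most one index interval’s worth of records from there.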


The Cheat Code: mmap

Digging into how file I/O works, I found that every conventional file operation goes through read() and write() syscalls. Every request crosses into the kernel, which dutifully does its thing—checking permissions, copying buffers, and generally context-switching my CPU cycles away.

It’s slow.

If I already proved to the kernel that “I am not the bad guy,” it’s just really dumb to keep doing that again and again for every single request.

I wanted speed.

Enter mmap (memory-mapped I/O, aka the “Trust Me Bro” card). It lets you pretend a file on disk is just an array in RAM. You want to read byte 4000? data[4000]. You want to write? data[5000] = 0xFF. The OS handles the dirty work of flushing pages to disk.
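Zue does this from Zig, but the raw syscall dance is clearest in C. A minimal sketch (path and sizes are illustrative): map the file, poke it like an array, and let msync push the dirty pages back.

```c
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Map a file and treat it like an array. Returns 0 on success. */
int demo_mmap_write(const char *path) {
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return -1;
    if (ftruncate(fd, 4096) < 0) { close(fd); return -1; } /* give it a size */

    unsigned char *data = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);
    if (data == MAP_FAILED) { close(fd); return -1; }

    data[0] = 0x41;    /* a "write to disk" via a plain array store */
    data[4000] = 0xFF; /* no write() syscall involved */

    msync(data, 4096, MS_SYNC); /* ask the OS to flush dirty pages */
    munmap(data, 4096);
    close(fd);
    return 0;
}
```

MAP_SHARED is the important flag: it makes the mapping write-through to the file, instead of giving you a private copy-on-write snapshot.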

```mermaid
%%{init: {'theme':'base', 'themeVariables': {'fontSize':'16px'}, 'flowchart':{'nodeSpacing': 60, 'rankSpacing': 50}}}%%
flowchart TB
  App["Application Code<br/>ptr = mmap(file)"]
  subgraph Disk ["DISK"]
    F0["[0] 0x41"]
    F1["[1] 0x42"]
    F2["[2] 0x43"]
    F3["[3] 0x44"]
  end
  Map["mmap() creates<br/>mapping"]
  subgraph RAM ["PHYSICAL RAM"]
    R0["0x41"]
    R1["0x42"]
    R2["0x43"]
    R3["0x44"]
  end
  Ptr["ptr[0], ptr[1], ptr[2], ptr[3]<br/>Direct pointer access<br/>NO syscalls!"]
  Write["ptr[2] = 0xFF"]
  subgraph RAM2 ["RAM AFTER WRITE"]
    R20["0x41"]
    R21["0x42"]
    R22["0xFF"]
    R23["0x44"]
  end
  Sync["OS syncs<br/>dirty pages"]
  subgraph Disk2 ["DISK AFTER SYNC"]
    F20["[0] 0x41"]
    F21["[1] 0x42"]
    F22["[2] 0xFF"]
    F23["[3] 0x44"]
  end
  App --> Map
  F0 -.-> Map
  F1 -.-> Map
  F2 -.-> Map
  F3 -.-> Map
  Map --> R0
  Map --> R1
  Map --> R2
  Map --> R3
  R0 --> Ptr
  R1 --> Ptr
  R2 --> Ptr
  R3 --> Ptr
  Ptr --> Write
  Write --> R20
  Write --> R21
  Write --> R22
  Write --> R23
  R22 --> Sync
  Sync --> F20
  Sync --> F21
  Sync --> F22
  Sync --> F23
  style App fill:transparent,stroke:#8b5cf6,stroke-width:3px
  style Map fill:transparent,stroke:#10b981,stroke-width:3px
  style Ptr fill:transparent,stroke:#10b981,stroke-width:3px
  style Write fill:transparent,stroke:#f59e0b,stroke-width:3px
  style Sync fill:transparent,stroke:#10b981,stroke-width:3px
  style Disk fill:transparent,stroke:#3b82f6,stroke-width:3px
  style RAM fill:transparent,stroke:#f59e0b,stroke-width:3px
  style RAM2 fill:transparent,stroke:#f59e0b,stroke-width:3px
  style Disk2 fill:transparent,stroke:#10b981,stroke-width:3px
  style F0 fill:transparent,stroke:#3b82f6,stroke-width:3px
  style F1 fill:transparent,stroke:#3b82f6,stroke-width:3px
  style F2 fill:transparent,stroke:#3b82f6,stroke-width:3px
  style F3 fill:transparent,stroke:#3b82f6,stroke-width:3px
  style R0 fill:transparent,stroke:#f59e0b,stroke-width:3px
  style R1 fill:transparent,stroke:#f59e0b,stroke-width:3px
  style R2 fill:transparent,stroke:#f59e0b,stroke-width:3px
  style R3 fill:transparent,stroke:#f59e0b,stroke-width:3px
  style R20 fill:transparent,stroke:#f59e0b,stroke-width:3px
  style R21 fill:transparent,stroke:#f59e0b,stroke-width:3px
  style R22 fill:transparent,stroke:#ef4444,stroke-width:4px
  style R23 fill:transparent,stroke:#f59e0b,stroke-width:3px
  style F20 fill:transparent,stroke:#10b981,stroke-width:3px
  style F21 fill:transparent,stroke:#10b981,stroke-width:3px
  style F22 fill:transparent,stroke:#ef4444,stroke-width:4px
  style F23 fill:transparent,stroke:#10b981,stroke-width:3px
```

The macOS Trap

In Linux, when your file grows, you call mremap. The kernel says “Sure thing,” and seamlessly expands your virtual memory mapping. It’s fast, atomic, and beautiful.

In macOS? mremap does not exist.

I found this out the hard way: segfault.

To make Zue work on my MacBook, I had to implement a workaround:

  1. Check whether the write will exceed the current mapping.
  2. munmap: un-map the whole file. (Scary.)
  3. ftruncate: grow the file.
  4. mmap: re-map the grown file.

It’s slow. It’s ugly. But it works.
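The four-step dance above, sketched in C (error handling trimmed; a sketch of the approach, not Zue’s actual code):

```c
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>

/* macOS has no mremap, so growing the mapping is done by hand:
   unmap the old view, grow the file, map the whole thing again.
   MAP_SHARED means data written before the unmap survives in the
   page cache and reappears in the new mapping. */
void *grow_mapping(int fd, void *old_map, size_t old_size, size_t new_size) {
    if (munmap(old_map, old_size) != 0)      /* 1. drop the old mapping */
        return MAP_FAILED;
    if (ftruncate(fd, (off_t)new_size) != 0) /* 2. grow the file */
        return MAP_FAILED;
    return mmap(NULL, new_size,              /* 3. map the grown file */
                PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

Note the returned pointer is almost certainly a different address than the old one, so every cached pointer into the old mapping is now poison; this is exactly the kind of thing that segfaults at 3 AM.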


Next Up: Part 2 - Networks Are Liars