
Building Zue Part 2: Networks Are Liars


The Protocol Problem

When I started Zue, I thought: “I’ll just use JSON! It’s debuggable, easy, and everyone loves it.”

Then I realized that serializing { "key": "user:123", "value": "..." } for every single request is:

  1. Slow: String parsing is CPU-intensive.
  2. Verbose: The keys often take up more space than the data itself.

So I went lower level. I designed a simple Length-Prefixed Binary Protocol.

The Packet Structure

Every message in Zue looks exactly like this on the wire. No delimiters. No newlines. Just raw bytes.

%%{init: {'theme':'base', 'themeVariables': {'fontSize':'16px'}, 'flowchart':{'nodeSpacing': 60, 'rankSpacing': 60}}}%%
flowchart LR
    subgraph Packet ["Binary Packet Structure"]
        direction LR
        Len["Length (4 Bytes)<br/>(Little Endian)"]
        Type["Type (1 Byte)<br/>(Enum)"]
        Pay["Payload (Variable)<br/>(Protobuf / Struct)"]
    end
    Len --> Type --> Pay
    style Len fill:transparent,stroke:#10b981,stroke-width:3px
    style Type fill:transparent,stroke:#f59e0b,stroke-width:3px
    style Pay fill:transparent,stroke:#3b82f6,stroke-width:3px
    style Packet fill:transparent,stroke:#6366f1,stroke-width:2px
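To make the framing concrete, here is a minimal encoding sketch in C++. This is not Zue's actual code: it assumes the 4-byte length counts only the payload bytes that follow the type byte, and the enum values are purely illustrative.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical message types for illustration; Zue's real wire enum may differ.
enum class MsgType : uint8_t { Append = 1, Read = 2, Replicate = 3, Heartbeat = 4 };

// Frame a payload as [length (4 B, little endian)][type (1 B)][payload].
// Assumption: the length field counts only the payload bytes.
std::vector<uint8_t> encode_frame(MsgType type, const std::string& payload) {
    std::vector<uint8_t> frame;
    frame.reserve(5 + payload.size());

    uint32_t len = static_cast<uint32_t>(payload.size());
    // Write the length byte-by-byte so the result is little endian
    // regardless of the host's byte order.
    for (int i = 0; i < 4; ++i) {
        frame.push_back(static_cast<uint8_t>((len >> (8 * i)) & 0xFF));
    }
    frame.push_back(static_cast<uint8_t>(type));
    frame.insert(frame.end(), payload.begin(), payload.end());
    return frame;
}
```

Writing the length out byte-by-byte sidesteps host byte-order questions entirely; the wire format stays little endian no matter what machine produced it.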

Why length-prefixed? Because TCP is a stream.

The “Partial Read” Trap

Beginners (read: me, two weeks ago) think that if you call send("Hello") on the client, you will receive “Hello” in one recv() call on the server.

Wrong.

You might get “Hel”. Then “lo”. Or you might get “HelloWor” if two messages got stuck together.

Length prefixing solves this. My server reads 4 bytes first. It gets the number 50. It then keeps calling read() until exactly 50 bytes have arrived.

%%{init: {'theme':'base', 'themeVariables': {'fontSize':'16px'}}}%%
sequenceDiagram
    participant Net as Network
    participant Buf as App Buffer
    Net->>Buf: "Length: 50" (4 bytes)
    Note right of Buf: App knows: Wait for 50 bytes.
    Net->>Buf: "Payload: Hello..." (10 bytes)
    Note right of Buf: Total: 10/50. Keep reading.
    Net--xBuf: *Pause* (Network Lag)
    Net->>Buf: "...World!" (40 bytes)
    Note right of Buf: Total: 50/50. Done!
    Buf->>App: Deserialize()
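The fix boils down to a helper that refuses to return until it has exactly the byte count it was asked for. A rough sketch of that idea, using plain blocking POSIX reads for simplicity (the event-loop version would buffer partial frames instead of blocking):

```cpp
#include <cerrno>
#include <cstdint>
#include <unistd.h>
#include <vector>

// Keep calling read() until exactly `len` bytes arrive.
// Returns false on EOF or a hard error.
bool read_exact(int fd, uint8_t* buf, size_t len) {
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, buf + got, len - got);
        if (n > 0)               got += static_cast<size_t>(n); // partial read: keep going
        else if (n == 0)         return false;                  // peer closed the connection
        else if (errno != EINTR) return false;                  // real error (EINTR: just retry)
    }
    return true;
}

// Read one frame: 4-byte little-endian length, 1-byte type, then `len` payload bytes.
bool read_frame(int fd, uint8_t& type, std::vector<uint8_t>& payload) {
    uint8_t hdr[5];
    if (!read_exact(fd, hdr, 5)) return false;
    uint32_t len = hdr[0] | (hdr[1] << 8) | (hdr[2] << 16)
                 | (static_cast<uint32_t>(hdr[3]) << 24);
    type = hdr[4];
    payload.resize(len);
    return read_exact(fd, payload.data(), len);
}
```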

The Async Event Loop

The second biggest mistake I made was using blocking I/O.

In v1, if a Follower node was slow to respond, the Leader would just… wait. The client would wait. The universe would wait.

To fix this, I rewrote the entire engine around a single-threaded Event Loop built on poll().

Leader Loop

The Leader never blocks. It checks sockets. If they have data, it reads. If they don’t, it moves on. It runs a tickRepair function periodically to fix broken followers in the background.

%%{init: {'theme':'base', 'themeVariables': {'fontSize':'30px'}, 'flowchart':{'nodeSpacing': 30, 'rankSpacing': 30}}}%%
flowchart LR
    Start([Start Loop])
    Timer{"Timer<br/>Expired?<br/>2s interval"}
    Heartbeat["Send Heartbeats<br/>to Followers"]
    Repair["Run Repair Tick<br/>tickRepair"]
    Ready{"POLL<br/>Sockets Ready?"}
    Type{"Socket<br/>Type?"}
    Accept["Accept New<br/>Connection"]
    MsgType{"Message<br/>Type?"}
    Append["Append Request:<br/>Replicate with Quorum"]
    Read["Read Request:<br/>Read from Log"]
    Cleanup["Handle<br/>Retry/Success/Failure"]
    Start --> Ready
    Timer -->|"Yes"| Heartbeat
    Heartbeat --> Repair
    Repair --> Start
    Timer -->|"No"| Start
    Ready -->|"No"| Timer
    Ready -->|"Yes"| Type
    Type -->|"Listener"| Accept
    Type -->|"Client"| MsgType
    Accept --> Cleanup
    MsgType -->|"Append"| Append
    MsgType -->|"Read"| Read
    Append --> Cleanup
    Read --> Cleanup
    Cleanup --> Timer
    style Start fill:transparent,stroke:#10b981,stroke-width:4px
    style Heartbeat fill:transparent,stroke:#f59e0b,stroke-width:3px
    style Repair fill:transparent,stroke:#f59e0b,stroke-width:3px
    style Append fill:transparent,stroke:#ef4444,stroke-width:4px
    style Timer fill:transparent,stroke:#8b5cf6,stroke-width:3px
    style Ready fill:transparent,stroke:#8b5cf6,stroke-width:3px
    style Type fill:transparent,stroke:#8b5cf6,stroke-width:3px
    style MsgType fill:transparent,stroke:#8b5cf6,stroke-width:3px
    style Accept fill:transparent,stroke:#3b82f6,stroke-width:3px
    style Read fill:transparent,stroke:#3b82f6,stroke-width:3px
    style Cleanup fill:transparent,stroke:#3b82f6,stroke-width:3px
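The skeleton of such a loop looks roughly like this. It is a sketch, not Zue's real code: the 100 ms timeout, the commented-out dispatch calls, and all names are placeholders.

```cpp
#include <poll.h>
#include <sys/socket.h>
#include <chrono>
#include <vector>

// Illustrative single-threaded poll() loop for the Leader.
void leader_loop(int listen_fd) {
    std::vector<pollfd> fds = {{listen_fd, POLLIN, 0}};
    auto last_tick = std::chrono::steady_clock::now();

    while (true) {
        // Wake up at least every 100 ms so the timer branch still runs
        // when no socket has data.
        poll(fds.data(), fds.size(), 100);

        for (size_t i = 0; i < fds.size(); ++i) {
            if (!(fds[i].revents & POLLIN)) continue;

            if (fds[i].fd == listen_fd) {
                // Listener socket: register the new connection and move on.
                int client = accept(listen_fd, nullptr, nullptr);
                if (client >= 0) fds.push_back({client, POLLIN, 0});
            } else {
                // Client socket: read one frame, dispatch on its type byte
                // (Append -> replicate with quorum, Read -> serve from the log).
                // Placeholder: dispatch(fds[i].fd);
            }
        }

        // Timer branch: heartbeats plus background repair, roughly every 2 s.
        auto now = std::chrono::steady_clock::now();
        if (now - last_tick >= std::chrono::seconds(2)) {
            // Placeholders: send_heartbeats(); tick_repair();
            last_tick = now;
        }
    }
}
```

The key property is the timeout on poll(): even with zero network traffic, the loop wakes up often enough to check the heartbeat timer, so nothing ever sits and waits on a single slow socket.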

Follower Loop

The Followers are simpler. They just do what they’re told. If a client tries to write to them, they politely yell “I am not the Leader!” and close the door.

%%{init: {'theme':'base', 'themeVariables': {'fontSize':'30px'}, 'flowchart':{'nodeSpacing': 30, 'rankSpacing': 30}}}%%
flowchart LR
    Start([Start Loop])
    Ready{"Sockets<br/>Ready?"}
    Type{"Socket<br/>Type?"}
    Accept["Accept New<br/>Connection"]
    MsgType{"Message<br/>Type?"}
    ClientReq["Client Request:<br/>Reject, Redirect to Leader"]
    Replicate["Replicate Request:<br/>Validate Offset & Append"]
    Heartbeat["Heartbeat Request:<br/>Update Timestamp"]
    Cleanup["Send<br/>Response"]
    Start --> Ready
    Ready -->|"No"| Start
    Ready -->|"Yes"| Type
    Type -->|"Listener"| Accept
    Type -->|"Client"| MsgType
    Accept --> Cleanup
    MsgType -->|"Read"| ClientReq
    MsgType -->|"Replicate"| Replicate
    MsgType -->|"Heartbeat"| Heartbeat
    ClientReq --> Cleanup
    Replicate --> Cleanup
    Heartbeat --> Cleanup
    Cleanup --> Start
    style Start fill:transparent,stroke:#10b981,stroke-width:4px
    style Replicate fill:transparent,stroke:#f59e0b,stroke-width:4px
    style Heartbeat fill:transparent,stroke:#f59e0b,stroke-width:3px
    style ClientReq fill:transparent,stroke:#ef4444,stroke-width:3px
    style Ready fill:transparent,stroke:#8b5cf6,stroke-width:3px
    style Type fill:transparent,stroke:#8b5cf6,stroke-width:3px
    style MsgType fill:transparent,stroke:#8b5cf6,stroke-width:3px
    style Accept fill:transparent,stroke:#3b82f6,stroke-width:3px
    style Cleanup fill:transparent,stroke:#3b82f6,stroke-width:3px
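The "I am not the Leader!" part is little more than a switch on the frame's type byte. A hypothetical sketch, reusing the made-up MsgType enum from earlier (the response strings are illustrative, not Zue's wire format):

```cpp
#include <cstdint>
#include <string>

// Same illustrative enum as in the encoding sketch above.
enum class MsgType : uint8_t { Append = 1, Read = 2, Replicate = 3, Heartbeat = 4 };

// What a Follower does with each incoming frame type.
std::string follower_dispatch(MsgType type) {
    switch (type) {
        case MsgType::Replicate: return "OK";          // from the Leader: validate offset, append to log
        case MsgType::Heartbeat: return "ACK";         // from the Leader: record "leader is alive" timestamp
        case MsgType::Append:                          // from a client: politely refuse
        case MsgType::Read:      return "NOT_LEADER";  // from a client: redirect to the Leader
        default:                 return "BAD_REQUEST";
    }
}
```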

This architecture allowed Zue to handle 100+ concurrent clients on a single thread. Blocking is a crime.


Next Up: Part 3 - Herding Cats (Distributed Consensus)