<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  
  <title>jonas keller / notes on systems</title>
  <subtitle>notes on systems</subtitle>
  <link href="https://example.com/feed.xml" rel="self" />
  <link href="https://example.com/" />
  <updated>2026-06-14T00:00:00Z</updated>
  <id>https://example.com/</id>
  <author>
    <name>Jonas Keller</name>
  </author>
  <entry>
    <title>Writing a toy allocator in 200 lines of C</title>
    <link href="https://example.com/posts/toy-allocator/" />
    <updated>2026-06-14T00:00:00Z</updated>
    <id>https://example.com/posts/toy-allocator/</id>
    <content type="html">&lt;p&gt;Every few years I talk myself into writing a memory allocator from scratch. It is the kind of project that looks trivial from a distance — bump a pointer, hand back the bytes — and reveals a surprising amount of texture the moment you ask it to do anything real. This post walks through a small one, roughly two hundred lines, and the handful of decisions that mattered most.&lt;/p&gt;
&lt;p&gt;The goal here is not to beat &lt;code&gt;malloc&lt;/code&gt;. The goal is to understand the shape of the problem: how free lists fragment, why alignment is load-bearing, and where the real cost hides once you start measuring.&lt;/p&gt;
&lt;h2&gt;Start with the arena&lt;/h2&gt;
&lt;p&gt;The simplest useful primitive is an arena: a single contiguous block you carve allocations out of by advancing an offset. There is no per-allocation bookkeeping and freeing is a no-op until you reset the whole thing. It is fast precisely because it refuses to answer the hard question.&lt;/p&gt;
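&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;

typedef struct { char *cursor, *end; } Arena;   // assumed layout; align must be a power of two

void *arena_alloc(Arena *a, size_t n, size_t align) {
    uintptr_t p = (uintptr_t)a-&amp;gt;cursor;
    uintptr_t aligned = (p + (align - 1)) &amp;amp; ~((uintptr_t)align - 1);
    // Compare n against the space left so a huge request cannot wrap the address.
    if (aligned &amp;lt; p || aligned &amp;gt; (uintptr_t)a-&amp;gt;end ||
        n &amp;gt; (uintptr_t)a-&amp;gt;end - aligned) return NULL;   // out of room
    a-&amp;gt;cursor = (char *)(aligned + n);
    return (void *)aligned;
}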
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Three lines of arithmetic and one branch. The alignment mask is the part people get wrong: it rounds the address up to the next multiple of &lt;code&gt;align&lt;/code&gt;, which only works when &lt;code&gt;align&lt;/code&gt; is a power of two, and only then do you check that the request still fits. Skip the rounding and you will spend an afternoon chasing a bus error on the one platform that cares.&lt;/p&gt;
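&lt;p&gt;To see the mask on concrete numbers, here is a minimal sketch that pulls the rounding step into a helper. The names &lt;code&gt;align_up&lt;/code&gt; and &lt;code&gt;is_pow2&lt;/code&gt; are hypothetical, not part of the allocator above, and the helper assumes &lt;code&gt;align&lt;/code&gt; is a power of two.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;

/* Hypothetical helpers for illustration; not part of the allocator above. */

/* True when x has exactly one bit set. */
int is_pow2(size_t x) {
    return x != 0 &amp;amp;&amp;amp; (x &amp;amp; (x - 1)) == 0;
}

/* Round p up to the next multiple of align (a power of two). */
uintptr_t align_up(uintptr_t p, size_t align) {
    assert(is_pow2(align));
    uintptr_t mask = (uintptr_t)align - 1;   /* e.g. align 8 gives mask 0b111 */
    return (p + mask) &amp;amp; ~mask;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With these definitions, &lt;code&gt;align_up(13, 8)&lt;/code&gt; yields 16 and &lt;code&gt;align_up(16, 8)&lt;/code&gt; stays at 16. The same mask expression with an alignment of 12 would turn address 13 into 16, which is not a multiple of 12, hence the power-of-two assertion.&lt;/p&gt;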
&lt;h2&gt;Where it gets interesting&lt;/h2&gt;
&lt;p&gt;An arena is wonderful right up until you need to free individual objects. The moment you do, you are back in the land of free lists, size classes, and coalescing — and every one of them is a trade between speed, fragmentation, and how much metadata you are willing to carry.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Free lists are O(1) to push and pop, but a naive one fragments badly under mixed sizes.&lt;/li&gt;
&lt;li&gt;Size classes trade a little internal waste for far less external fragmentation.&lt;/li&gt;
&lt;li&gt;Coalescing adjacent free blocks fights fragmentation but costs metadata: the classic boundary-tag scheme adds a footer to every block.&lt;/li&gt;
&lt;/ul&gt;
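&lt;p&gt;The first two trade-offs can be sketched in a few lines. This is an illustration under stated assumptions, not the allocator from this post: the four class sizes, the &lt;code&gt;FreeNode&lt;/code&gt; type, and every function name are hypothetical, and it assumes each freed block is at least as large and as aligned as a pointer.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;stddef.h&amp;gt;

/* Hypothetical size classes; real allocators tune these from measurements. */
#define NUM_CLASSES 4
static const size_t class_size[NUM_CLASSES] = {16, 32, 64, 128};

/* Intrusive node: a free block stores the list link in its own bytes. */
typedef struct FreeNode { struct FreeNode *next; } FreeNode;
static FreeNode *free_lists[NUM_CLASSES];

/* Smallest class that fits n, or -1 if n is too large for any class. */
int size_class(size_t n) {
    for (int i = 0; i &amp;lt; NUM_CLASSES; i++)
        if (n &amp;lt;= class_size[i]) return i;
    return -1;
}

/* O(1) push onto the head of the list for class cls. */
void freelist_push(int cls, void *block) {
    assert(cls &amp;gt;= 0 &amp;amp;&amp;amp; cls &amp;lt; NUM_CLASSES);
    FreeNode *node = block;
    node-&amp;gt;next = free_lists[cls];
    free_lists[cls] = node;
}

/* O(1) pop from the head; returns NULL when the class has no free block. */
void *freelist_pop(int cls) {
    assert(cls &amp;gt;= 0 &amp;amp;&amp;amp; cls &amp;lt; NUM_CLASSES);
    FreeNode *node = free_lists[cls];
    if (node) free_lists[cls] = node-&amp;gt;next;
    return node;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this sketch a 17-byte request lands in the 32-byte class, so 15 bytes are internal waste, but any block freed from that class can satisfy any later request in it. Coalescing is left out here because it needs per-block size metadata.&lt;/p&gt;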
&lt;blockquote&gt;
&lt;p&gt;The allocator you actually want depends entirely on your allocation pattern — and you will not know that pattern until you measure it.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Which is the real lesson. Measure first.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Why your p99 latency lies to you</title>
    <link href="https://example.com/posts/p99-latency/" />
    <updated>2026-05-02T00:00:00Z</updated>
    <id>https://example.com/posts/p99-latency/</id>
    <content type="html">&lt;p&gt;Tail latency is the number everyone quotes and almost nobody computes correctly. We put p99 on dashboards, alert on it, and write it into SLOs — and then measure it in a way that quietly removes the worst cases from the sample.&lt;/p&gt;
&lt;h2&gt;Coordinated omission&lt;/h2&gt;
&lt;p&gt;The classic mistake is sending a request, waiting for the response, and only &lt;em&gt;then&lt;/em&gt; sending the next one. When the system stalls, your load generator stalls with it, so the long pause never produces the flood of slow samples it should have. You measured the system&#39;s good behavior and called it the tail.&lt;/p&gt;
&lt;p&gt;The fix is to decouple request scheduling from response timing: issue requests on a fixed schedule and record latency against the time the request &lt;em&gt;should&lt;/em&gt; have started, not when you got around to sending it.&lt;/p&gt;
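&lt;p&gt;The difference shows up even in a toy simulation. The sketch below uses virtual time in milliseconds and a single simulated connection; the function names and the service times fed to it are made up for illustration, not taken from a real load generator.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;stddef.h&amp;gt;

/* Closed loop: each request is sent right after the previous response,
   so the recorded latency is only the service time. */
void measure_closed_loop(const long *service_ms, size_t n, long *latency_ms) {
    for (size_t i = 0; i &amp;lt; n; i++) latency_ms[i] = service_ms[i];
}

/* Fixed schedule: request i is due at i * interval_ms, and latency is
   measured from that intended start, including time spent queued. */
void measure_fixed_schedule(const long *service_ms, size_t n,
                            long interval_ms, long *latency_ms) {
    assert(interval_ms &amp;gt; 0);
    long free_at = 0;   /* when the simulated connection is next idle */
    for (size_t i = 0; i &amp;lt; n; i++) {
        long intended = (long)i * interval_ms;
        long sent = intended &amp;gt; free_at ? intended : free_at;
        long done = sent + service_ms[i];
        latency_ms[i] = done - intended;
        free_at = done;
    }
}

/* Count samples strictly above a threshold. */
size_t count_over(const long *latency_ms, size_t n, long threshold_ms) {
    size_t count = 0;
    for (size_t i = 0; i &amp;lt; n; i++)
        if (latency_ms[i] &amp;gt; threshold_ms) count++;
    return count;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Feed both functions the synthetic service times 1, 1, 100, 1, 1, 1 with a 10 ms interval. The closed loop records one sample over 50 ms. The fixed schedule records four (100, 91, 82, and 73 ms), because the three requests that were due during the stall are charged for the time they waited.&lt;/p&gt;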
&lt;h2&gt;Measure the thing you promised&lt;/h2&gt;
&lt;p&gt;A p99 computed over the wrong population is not a conservative estimate — it is a different number that happens to share a name. Before you trust a tail metric, ask what got dropped to produce it.&lt;/p&gt;
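&lt;p&gt;A small sketch makes the population effect concrete. It assumes the nearest-rank definition of a percentile, which is one common choice among several, and the function names are hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;

/* Ascending comparison for qsort without overflow-prone subtraction. */
static int cmp_long(const void *a, const void *b) {
    long x = *(const long *)a, y = *(const long *)b;
    return (x &amp;gt; y) - (x &amp;lt; y);
}

/* Nearest-rank percentile; sorts samples in place. pct is in 1..100. */
long percentile(long *samples, size_t n, unsigned pct) {
    assert(n &amp;gt; 0 &amp;amp;&amp;amp; pct &amp;gt; 0 &amp;amp;&amp;amp; pct &amp;lt;= 100);
    qsort(samples, n, sizeof samples[0], cmp_long);
    size_t rank = (n * pct + 99) / 100;   /* ceil(n * pct / 100) */
    return samples[rank - 1];
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Take 200 synthetic samples, 196 of them at 1 ms and 4 at 500 ms. Over the full set, &lt;code&gt;percentile(samples, 200, 99)&lt;/code&gt; is 500. Drop the four slow samples, say because timeouts were filtered out upstream, and the p99 of the remaining 196 is 1.&lt;/p&gt;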
</content>
  </entry>
</feed>