Designing Event-Based Systems: Lessons from Human Communication
When you’re building a system, especially one designed to scale and evolve, you inevitably encounter events. They are how software components communicate without being locked together.
Think about human communication. A direct call, like a Remote Procedure Call (RPC), is synchronous and coupled: you dial, ask, and wait for an answer. It works—until you need to notify a dozen busy recipients.
That’s where event-based systems shine. Asynchronous (no waiting for replies) and decoupled (components work independently), they’re built for scale. But one question always arises: What should go in the message?
It’s like writing a note to many future selves, each with unique contexts and needs. Do you send a nudge, a full update, or just what changed?
Every event you emit is a design decision—not just for your system, but for every consumer that builds on top of it. That single message shapes what their architecture needs to look like, how much state they have to manage, and how brittle or flexible their solution becomes.
The Pager: A Simple Nudge
Let’s start with the lightest possible message: something happened.
You include the entity ID, maybe the event type ("organization.updated"), and perhaps a timestamp. That’s it. This message doesn’t contain the change—it just announces it.
It’s like a pager: it buzzes, and you know something’s up. But it’s on you to figure out what.
{
  "entityId": "8b0a723a-d287-02d0-0000-000000000000",
  "eventType": "organization.updated",
  "timestamp": "2025-01-01T00:00:00Z"
}
This approach works well in smaller systems, or tightly integrated ones, where consumers already hold the relevant state or can fetch it easily. Producers stay lean—they ring the bell and move on, without worrying about payload shape or who’s listening.
The simplicity is appealing, but it comes at a cost. Pager-style messages push complexity onto the consumers. They need to be online, know how to fetch the data, and handle cases where the state has already moved on—or the fetch fails entirely.
It scales the producer. It burdens the consumer. Sometimes that’s a fair trade. Often, it’s a trap.
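To make that consumer burden concrete, here is a minimal sketch of what handling a pager-style event might look like. Everything named here is an assumption for illustration: `fetch_entity` stands in for whatever lookup the consumer has against the producer, and `FetchError` is a made-up exception type.

```python
from typing import Callable, Optional


class FetchError(Exception):
    """Raised by the (hypothetical) fetcher when the lookup fails."""


def handle_notification(
    event: dict,
    fetch_entity: Callable[[str], Optional[dict]],
    max_attempts: int = 3,
) -> Optional[dict]:
    """Resolve a pager-style event into the entity's current state.

    Returns None if the fetcher reports that the entity no longer exists.
    Raises FetchError once every attempt has failed, so the caller can
    decide whether to retry later or park the event.
    """
    entity_id = event["entityId"]
    last_error = None
    for _ in range(max_attempts):
        try:
            # The result may already be newer than the change this event
            # announced: the consumer sees "latest", not "as of the event".
            return fetch_entity(entity_id)
        except FetchError as exc:
            last_error = exc
    raise FetchError(
        f"gave up on {entity_id} after {max_attempts} attempts"
    ) from last_error
```

Note what the producer never had to think about: retries, missing entities, and the gap between when the event fired and when the fetch ran. All of that lives in the consumer.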
The Email: Everything You Need
Unlike a lightweight announcement, this kind of message is packed with everything the recipient needs—like a self-contained email. No lookups. No follow-ups.
{
  "entityId": "8b0a723a-d287-02d0-0000-000000000000",
  "eventType": "organization.updated",
  "timestamp": "2025-01-01T00:00:00Z",
  "payload": {
    "id": "8b0a723a-d287-02d0-0000-000000000000",
    "name": "Acme Corp",
    "tier": "enterprise",
    "timezone": "America/Los_Angeles"
  }
}
This isn’t just saying an organization changed; it’s showing the organization’s current state: the name, tier, timezone, and anything else that matters. What it doesn’t say is which of those fields changed. The message doesn’t just buzz. It speaks, with enough context for the consumer to act immediately.
This approach shines when there are many consumers or when the producer doesn’t know who’s listening. For example: an Organization emits full events when it updates. Downstream, Workspaces—which belong to that organization—materialize key fields like name and tier into their own local state. So when a Workspace emits its own event, it carries those org details forward. No lookups. No joins. Just one flattened, ready-to-use message.
This kind of flattening can be a superpower. It makes dependencies explicit. Consumers don’t fetch—they observe. New services are easy to onboard, and the system becomes more resilient by reducing reliance on real-time queries.
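Here is a rough sketch of the Workspace example, assuming the event shape shown above. The class, the `workspace.updated` event type, and the `organizationId`, `organizationName`, and `organizationTier` field names are all invented for illustration; a real service would keep this state in a durable store rather than a dict.

```python
class WorkspaceProjection:
    """Keeps a local copy of the org fields a Workspace cares about."""

    def __init__(self):
        # org id -> {"name": ..., "tier": ...}
        self.orgs = {}

    def on_organization_updated(self, event: dict) -> None:
        # Full-payload events carry everything needed, so there is no fetch.
        payload = event["payload"]
        self.orgs[payload["id"]] = {
            "name": payload["name"],
            "tier": payload["tier"],
        }

    def build_workspace_event(self, workspace: dict, timestamp: str) -> dict:
        # Flatten the materialized org details into the outgoing event.
        # Org fields are None if no organization event has been seen yet.
        org = self.orgs.get(workspace["organizationId"], {})
        return {
            "entityId": workspace["id"],
            "eventType": "workspace.updated",  # hypothetical event type
            "timestamp": timestamp,
            "payload": {
                "id": workspace["id"],
                "name": workspace["name"],
                "organizationId": workspace["organizationId"],
                "organizationName": org.get("name"),
                "organizationTier": org.get("tier"),
            },
        }
```

A consumer of the workspace event gets the org name and tier for free, which is the whole point. The flip side is that a change to the organization now implies re-emitting events for every workspace that copied those fields.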
But it comes at a cost. Full-payload events require discipline. A small schema change—say, a field like tier changing type—can break downstream consumers unless you’ve built for compatibility. And when changes cascade through multiple entities, event traffic grows noisier. The more you flatten, the more you need to be intentional about versioning and propagation.
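As a small taste of what "built for compatibility" can mean on the consumer side, here is a sketch of a tolerant reader. It assumes a made-up change where `tier` moves from a plain string to an object with a `name` field; that newer shape is purely hypothetical and not something the events above define.

```python
from typing import Optional


def read_tier(payload: dict) -> Optional[str]:
    """Return the tier name from either payload shape, or None if absent."""
    tier = payload.get("tier")
    if tier is None:
        return None
    if isinstance(tier, str):
        # Original shape: "tier": "enterprise"
        return tier
    if isinstance(tier, dict) and isinstance(tier.get("name"), str):
        # Hypothetical newer shape: "tier": {"name": "enterprise"}
        return tier["name"]
    # Fail loudly on anything unexpected instead of guessing.
    raise ValueError(f"unrecognized tier shape: {tier!r}")
```

This only covers the reading side of one field. It says nothing about how producers should version or roll out the change.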
There’s a whole conversation to be had about how to do that well—how to evolve full messages without breaking consumers, and how to manage propagation at scale.
We’ll get into that in a future post.
The SMS: Just the Change
Now imagine a system handling millions of events per second. Sending full payloads for every change would swamp the network, overwhelm storage, and slow everything down. So instead, you send just the change—like an SMS: short, precise, and to the point.
{
  "entityId": "8b0a723a-d287-02d0-0000-000000000000",
  "eventType": "organization.updated",
  "timestamp": "2025-01-01T00:00:00Z",
  "diff": {
    "tier": { "old": "community", "new": "enterprise" }
  }
}
This “diff” style is hyper-efficient. Only the updated fields are included, reducing data transfer to the bare minimum. It shines in high-throughput environments—think sensor streams, telemetry feeds, or financial tickers—where the volume is relentless and every byte counts.
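On the producer side, building that `diff` can be as simple as comparing the state before and after the update. This is a minimal sketch for flat records only; the function name is made up, and treating a missing field as `None` is an assumption (it means a field that is absent and a field explicitly set to `None` look the same).

```python
def compute_diff(old: dict, new: dict) -> dict:
    """Return {"field": {"old": ..., "new": ...}} for every changed field."""
    diff = {}
    # Union of keys so added and removed fields are both picked up.
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before != after:
            diff[key] = {"old": before, "new": after}
    return diff
```

For example, comparing `{"name": "Acme Corp", "tier": "community"}` against the same record with `tier` set to `"enterprise"` yields only the `tier` entry, matching the shape in the message above.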
But that efficiency comes with a cost. Consumers can’t just read the message—they have to rebuild state from it. That means strict event ordering, bulletproof replay logic, and durable event stores. Miss an event, or process them out of order, and things break in subtle, painful ways.
It eases the network but puts pressure on your infrastructure. Now your system needs to retain enough event history to rebuild state from scratch, handle rehydration correctly, and guard against every edge case in the playback path.
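Here is a sketch of that playback path on the consumer side, assuming the `diff` shape shown above. The names are invented for illustration. It uses the `old` value in each diff as a cheap sanity check: if the local state doesn’t match what the producer says it changed from, an event was probably missed or applied out of order.

```python
class OutOfSyncError(Exception):
    """Local state does not match the 'old' value a diff claims."""


def apply_diff(state: dict, event: dict) -> dict:
    """Return a new state with the event's diff applied."""
    new_state = dict(state)
    for field, change in event["diff"].items():
        if new_state.get(field) != change["old"]:
            raise OutOfSyncError(
                f"{field}: expected {change['old']!r}, "
                f"have {new_state.get(field)!r}"
            )
        new_state[field] = change["new"]
    return new_state


def replay(initial: dict, events: list) -> dict:
    """Rebuild state by applying diff events in the order given."""
    state = dict(initial)
    for event in events:
        state = apply_diff(state, event)
    return state
```

This check only catches problems that touch the same field twice. A production system would more likely rely on something explicit, such as a per-entity sequence number, which the example events here don’t carry.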
It’s a powerful pattern when scale demands it. But it’s not the default.
You’ve made the system lean—but only by moving the weight somewhere else.
What They Need to Hear
In the end, building good event-based systems is less about how you publish and more about who is listening. It’s about understanding who’s on the other end, and what they need from you.
What do they know? What do they need? What can they rely on?
That’s the real design work. Not picking Kafka or SQS or whatever messaging layer is popular this week.
So the next time you’re designing an event message, start by asking: What should we put in it?
And then ask a better question: Who’s listening?