Bluesky's code is largely open source, so you can run your own social network with it. What's missing? A performant DataPlane implementation. Closing this gap would be an important step towards digital independence. We wanted to contribute our share and decided to build a performant DataPlane for Bluesky. When we started the project, we expected it to be a Go, Rust or even Node project. Instead, we landed on Elixir. Here is why and how we made that decision.
Four Languages, One DataPlane: How We Picked
We set out to evaluate Go, Rust, Node and Elixir for a from-scratch implementation of the Bluesky AppView DataPlane, fully expecting one of the usual suspects to win. The outcome surprised us. This is how we reasoned from the workload - and how we ended up somewhere we didn't anticipate when we started.
TL;DR - The DataPlane's workload splits cleanly in two: a hot path (timeline reads, served from memory, concurrency-bound) and a cold path (records, threads and profiles, I/O-bound). We expected Go, Rust or Node to win; Elixir fit best. Its one real weakness - raw per-core compute - is localized to the follower-graph set operations, which we offload to a small Rust NIF (Native Implemented Function). What's left is high-concurrency serving and a burst-absorbing fan-out queue, which the BEAM lets us build in-process instead of bolting on Redis or Kafka. The rest of this post is how we reasoned our way there.
The component nobody talks about
Bluesky's infrastructure is, refreshingly, mostly open source. There's one notable exception: the DataPlane, a part of the AppView. It exists publicly only as a Node-and-Postgres reference implementation, while the production Bluesky network runs on a dedicated, ScyllaDB-backed, closed-source DataPlane (as documented across Jaz's blog and the Pragmatic Engineer's deep-dive on Bluesky's architecture).
That gap is exactly where things get interesting. The reference implementation tells you what the DataPlane does; it doesn't tell you how to make it survive contact with real traffic. If you want to run your own, you have to answer the scaling question yourself - and the first step is understanding the workload well enough to stop treating it as one thing.

ScyllaDB is Bluesky's operational choice, not part of the interface. The DataPlane's contract is a gRPC service that answers high-volume, low-complexity queries and returns skeletons - lists of IDs, counts, booleans - which a higher layer later hydrates into full views. What sits behind that contract is entirely up to you: both the language and the datastore. So before picking either, we spent our time on the only thing that actually constrains the choice: the shape of the load.
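To make the skeleton contract concrete, here is a minimal Python sketch. The type names, their fields and the `hydrate` helper are hypothetical; they are not the real atproto lexicons or the DataPlane's gRPC schema. The only point is the division of labour: the DataPlane answers with identifiers and a cursor, and a higher layer turns those into views.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimelineSkeleton:
    """Hypothetical DataPlane answer: only post identifiers plus a paging cursor."""
    post_uris: tuple[str, ...]
    cursor: Optional[str] = None


@dataclass(frozen=True)
class PostView:
    """Hypothetical hydrated view, built outside the DataPlane."""
    uri: str
    author: str
    text: str
    like_count: int


def hydrate(skeleton: TimelineSkeleton,
            records: dict[str, dict],
            like_counts: dict[str, int]) -> list[PostView]:
    """Turn a skeleton into full views; records that can't be found (e.g. deleted) are skipped."""
    views = []
    for uri in skeleton.post_uris:
        record = records.get(uri)
        if record is None:
            continue
        views.append(PostView(uri=uri,
                              author=record["author"],
                              text=record["text"],
                              like_count=like_counts.get(uri, 0)))
    return views
```

Keeping the skeleton this thin is what lets the DataPlane stay a high-volume, low-complexity service: everything expensive to render lives one layer up.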
A tale of two workloads
Here is the central observation. The network's historical data is enormous - terabytes of it. But when a user opens the app and looks at their timeline, they almost never scroll back to the dawn of time. They read a few dozen posts, get distracted and wander off to engage with a post, inspect a profile or follow a thread.
A note on specifics: it's been observed that Bluesky's timeline doesn't serve much beyond the last day or two of content, and that deeper cursor positions tend to fill with very recent posts rather than true history. Treat the exact window as illustrative unless you've measured it on your own deployment - the architectural point holds regardless of the precise number.
This creates a tension. Most “historic” data - meaning anything more than a few days old - is never looked at again in the timeline. When old data matters, it's almost always in a different context: someone inspecting a profile, or following up on a past thread. Yet to compile a timeline, the reference implementation has to perform joins against potentially terabyte-scale tables. That neither performs nor scales well - and timeline requests are the lion's share of everything the DataPlane is asked to do.

The conclusion writes itself: timeline generation deserves a fundamentally different treatment from the use cases that retrieve individual records or threads. Conflating them is what makes the naive implementation hurt. Once you separate them, you find they don't just differ in degree - they have opposite resource profiles, and they want different things from the runtime underneath.
| | Hot path - timelines | Cold path - records, threads, profiles |
|---|---|---|
| Data age | Recent (last day or two) | Historic (anything older) |
| Data volume | A tiny sliver | Terabytes |
| Request share | The dominant workload | Comparatively rare |
| Bound by | Memory + compute | I/O |
| Lives in | Memory | Database, fetched on demand |
| Strategy | Fan-out + bounded timeline length | Swappable datastore behind an interface |
The hot path: timelines, served from memory
Many large social platforms solve timeline fan-out with a hybrid strategy, and for good reason. A post from an account with a few thousand followers is distributed immediately - fan-out on write, pushed into followers' timelines as it arrives. A post from an account with millions of followers is not fanned out; it's added to a follower's timeline only when that follower actually requests it - fan-in on read. The first keeps write amplification bounded for ordinary accounts; the second avoids the thundering-herd write storm that a celebrity post would otherwise cause.
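The hybrid strategy boils down to one branch on the author's follower count. Below is a minimal in-memory Python sketch; the `FANOUT_FOLLOWER_LIMIT` value of 10,000 is an arbitrary placeholder (real systems tune this cutoff), and all names are hypothetical. Posts from small accounts are pushed into followers' inboxes on write; posts from large accounts are merged in when a follower reads.

```python
from collections import defaultdict

FANOUT_FOLLOWER_LIMIT = 10_000  # hypothetical cutoff, not a Bluesky value


class HybridFanout:
    def __init__(self, followers: dict[str, set[str]], limit: int = FANOUT_FOLLOWER_LIMIT):
        self.followers = followers          # author -> IDs of accounts following them
        self.limit = limit
        self.inboxes = defaultdict(list)    # reader -> [(ts, post_id)], filled on write
        self.outboxes = defaultdict(list)   # author -> [(ts, post_id)], read on demand

    def is_large(self, author: str) -> bool:
        return len(self.followers.get(author, ())) > self.limit

    def on_post(self, author: str, post_id: str, ts: int) -> str:
        self.outboxes[author].append((ts, post_id))
        if self.is_large(author):
            return "fan-in on read"         # no write storm for big accounts
        for reader in self.followers.get(author, ()):
            self.inboxes[reader].append((ts, post_id))
        return "fan-out on write"

    def timeline(self, reader: str, follows: set[str], limit: int = 50) -> list[str]:
        entries = list(self.inboxes.get(reader, []))
        for author in follows:
            if self.is_large(author):
                entries.extend(self.outboxes.get(author, []))  # pulled at read time
        seen, result = set(), []
        for _, post_id in sorted(entries, reverse=True):       # newest first
            if post_id not in seen:                             # an author may cross the cutoff
                seen.add(post_id)
                result.append(post_id)
        return result[:limit]
```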
There's a second simplification hiding in the timeline-length limit. Because timelines are bounded, a user who follows a very large number of accounts will never see all of them in their timeline anyway. So it's entirely legitimate to limit distribution into any single timeline. You are not obligated to deliver every post to every follower; you're obligated to deliver a good, recent, bounded timeline.
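A bounded timeline can be sketched as a small sorted buffer that keeps only the newest N entries. Because fan-out is deferred, posts may arrive out of order, so entries are inserted by timestamp; a post older than everything in a full timeline is refused, since it would be evicted immediately anyway. The class name and capacity are illustrative assumptions.

```python
import bisect


class BoundedTimeline:
    """Keeps only the newest `capacity` entries (capacity >= 1), stored oldest -> newest."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: list[tuple[int, str]] = []   # sorted (timestamp, post_id) pairs

    def offer(self, ts: int, post_id: str) -> bool:
        item = (ts, post_id)
        if len(self.entries) >= self.capacity and item <= self.entries[0]:
            return False                           # would be evicted immediately: skip the write
        bisect.insort(self.entries, item)          # deferred fan-out may arrive out of order
        if len(self.entries) > self.capacity:
            self.entries.pop(0)                    # evict the oldest entry
        return True

    def newest(self, n: int) -> list[str]:
        if n <= 0:
            return []
        return [post_id for _, post_id in reversed(self.entries[-n:])]
```

Refusing writes that can never be seen is exactly the "limit distribution into any single timeline" freedom: it costs nothing visible to the reader.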
Put those together and the hot path stops looking like a database problem at all:
- Recent posts - the overwhelming majority of what timelines are made of - can live and be served from memory.
- Fan-out can be deferred: as long as posts land in followers' timelines within minutes, and the backlog of fan-out jobs doesn't outgrow available resources, nobody notices the delay.
- Older content, when it's genuinely needed, is safe to fetch from the database on demand.
This is the move that changes everything downstream. The dominant request - timeline reads - shifts from I/O-bound (wait on a giant join) to memory-and-compute-bound (manipulate in-memory structures quickly). That single inversion is what reopens the language question, because it changes what “fast” even means here.
It also suggests abstracting the concrete database behind an interface, so the backing store for the cold path can be swapped without touching the rest of the system. The hot path barely touches the database; the cold path is the only place that really leans on it. Keeping that boundary clean keeps your options open.
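Here is one way to express that split in Python: a hypothetical `ColdStore` protocol that any backend could implement, a trivial in-memory stand-in for tests, and a read function that serves a page from the in-memory hot window and only touches the store when the page runs past it. All names are illustrative, and for brevity the sketch assumes timestamps are unique; a real cursor would also need a tie-breaker.

```python
from typing import Protocol

Entry = tuple[int, str]  # (timestamp, post_id)


class ColdStore(Protocol):
    """Hypothetical cold-path interface; Postgres, ScyllaDB or anything else could sit behind it."""
    def posts_by_authors_before(self, authors: set[str], before_ts: int, limit: int) -> list[Entry]: ...


class InMemoryColdStore:
    """Test double for a real database; counts how often it is hit."""
    def __init__(self, posts: list[tuple[int, str, str]]):  # (timestamp, author, post_id)
        self.posts = posts
        self.calls = 0

    def posts_by_authors_before(self, authors: set[str], before_ts: int, limit: int) -> list[Entry]:
        self.calls += 1
        hits = sorted(((ts, pid) for ts, author, pid in self.posts
                       if author in authors and ts < before_ts), reverse=True)
        return hits[:limit]


def read_timeline(hot: list[Entry], follows: set[str], store: ColdStore,
                  before_ts: int, limit: int) -> list[Entry]:
    """Serve the page from memory if possible; go to the store only for what's older."""
    page = [e for e in sorted(hot, reverse=True) if e[0] < before_ts][:limit]
    if len(page) == limit:
        return page                                   # hot path: no database round-trip
    oldest = page[-1][0] if page else before_ts
    return page + store.posts_by_authors_before(follows, oldest, limit - len(page))
```

Because `read_timeline` depends only on the protocol, swapping the backing store means writing one new class, not touching the hot path.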
The follower graph: in memory, but not naively
There's a catch. Fan-out and timeline assembly both need to answer questions about the follow graph - who follows whom, and the intersections and unions of those sets. Holding hundreds of millions of follow relationships in memory the naive way would be ruinously wasteful.
This is well-trodden ground. Jaz's “GraphD” series documents the journey directly: an in-memory graph store that originally used hash maps and hash sets to track each user's followers and follows, supporting the bidirectional lookups, intersections and unions the workload needs. The breakthrough was switching to Roaring Bitmaps, a compressed bitmap structure built for exactly this. The result is striking - the entire Bluesky follow graph fits in roughly 6.5 GB of RAM, sits at about 1.6 GB on disk and loads in around 20 seconds.
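To see why Roaring Bitmaps compress a follow graph so well, here is a deliberately small, pure-Python toy of the layout. It is not the API of any real Roaring library, and it omits run containers and every real optimisation; it only shows the core idea. A 32-bit ID is split into its high 16 bits, which pick a container, and its low 16 bits, which are stored inside it; a container with at most 4096 values is a sorted array, a denser one becomes a 2^16-bit bitmap.

```python
ARRAY_MAX = 4096  # Roaring's threshold: chunks with <= 4096 values use a sorted array


def _bits_to_set(bits: int) -> set[int]:
    # Bit i of the integer represents the 16-bit value i.
    return {i for i, c in enumerate(reversed(bin(bits)[2:])) if c == "1"}


def _make_container(lows: set[int]):
    """Choose the compact representation for one 2^16-wide chunk."""
    if len(lows) <= ARRAY_MAX:
        return ("array", sorted(lows))
    bits = 0
    for v in lows:
        bits |= 1 << v
    return ("bitmap", bits)


def _to_set(container) -> set[int]:
    kind, data = container
    return set(data) if kind == "array" else _bits_to_set(data)


class ToyRoaring:
    """Educational sketch of the Roaring layout, not a production implementation."""

    def __init__(self, values=()):
        chunks: dict[int, set[int]] = {}
        for v in values:
            # High 16 bits pick the container, low 16 bits are stored inside it.
            chunks.setdefault(v >> 16, set()).add(v & 0xFFFF)
        self.containers = {hi: _make_container(lows) for hi, lows in chunks.items()}

    @classmethod
    def _from_chunks(cls, chunks: dict[int, set[int]]) -> "ToyRoaring":
        out = cls()
        out.containers = {hi: _make_container(lows) for hi, lows in chunks.items() if lows}
        return out

    def __and__(self, other: "ToyRoaring") -> "ToyRoaring":
        chunks = {}
        for hi in self.containers.keys() & other.containers.keys():  # skip chunks absent on either side
            a, b = self.containers[hi], other.containers[hi]
            if a[0] == "bitmap" and b[0] == "bitmap":
                chunks[hi] = _bits_to_set(a[1] & b[1])  # word-parallel AND on the raw bits
            else:
                chunks[hi] = _to_set(a) & _to_set(b)
        return ToyRoaring._from_chunks(chunks)

    def __or__(self, other: "ToyRoaring") -> "ToyRoaring":
        chunks = {}
        for hi in self.containers.keys() | other.containers.keys():
            lows: set[int] = set()
            for side in (self, other):
                if hi in side.containers:
                    lows |= _to_set(side.containers[hi])
            chunks[hi] = lows
        return ToyRoaring._from_chunks(chunks)

    def __iter__(self):
        for hi in sorted(self.containers):
            for lo in sorted(_to_set(self.containers[hi])):
                yield (hi << 16) | lo

    def kinds(self) -> dict[int, str]:
        return {hi: c[0] for hi, c in self.containers.items()}
```

For example, `list(ToyRoaring([3, 70_000, 70_001]) & ToyRoaring([70_001, 5]))` is `[70001]`: only the chunk keyed `1` exists on both sides, so the chunk keyed `0` is never even compared.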
Crucially, Jaz also names the two cost modes that map precisely onto our hot/cold split: paging over all of a user's follows is the expensive operation, done in a paginated way or as an async job during fan-out; whereas on-demand set intersection - “which people I follow also follow this person” - has to run at interactive speed. The architecture isn't an invention so much as a recognition of a distinction the data was already making.
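The two cost modes look quite different in code. In this hypothetical sketch, which does not reproduce GraphD's actual API, the expensive mode walks a user's sorted follow list with a keyset cursor (the last ID seen), as an async fan-out job might; the interactive mode intersects two sets by probing the smaller one against the larger, and caps the result.

```python
import bisect
from typing import Iterator, Optional


def page_follows(sorted_follows: list[int], after: Optional[int],
                 page_size: int) -> tuple[list[int], Optional[int]]:
    """Expensive mode: one page of a user's follows, resuming after the last ID seen."""
    start = 0 if after is None else bisect.bisect_right(sorted_follows, after)
    page = sorted_follows[start:start + page_size]
    more = start + page_size < len(sorted_follows)
    return page, (page[-1] if more else None)


def iter_all_follows(sorted_follows: list[int], page_size: int) -> Iterator[int]:
    """Drain every page, as a background fan-out job might."""
    after: Optional[int] = None
    while True:
        page, after = page_follows(sorted_follows, after, page_size)
        yield from page
        if after is None:
            return


def followed_who_follow(my_follows: set[int], their_followers: set[int], limit: int) -> list[int]:
    """Interactive mode: which accounts I follow also follow this person (capped)."""
    small, large = sorted((my_follows, their_followers), key=len)
    return sorted(x for x in small if x in large)[:limit]
```

The first function can take as long as it needs because nobody is waiting on it; the second sits on a request path, so its cost is bounded by the smaller set and its output by `limit`.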
So what do we actually need from a language?
With the workload pinned down, we can finally state the requirements honestly - and they pull in two directions:
- Fast, compute-bound set operations over a large, long-lived, in-memory graph (roaring-bitmap intersections and unions). This is per-core compute: byte-level work where cache behaviour matters.
- High-concurrency, memory-resident request serving - the timeline reads - with bounded, predictable response sizes and tight tail-latency expectations.
- A deferrable, burst-absorbing fan-out queue that delivers within minutes, applies backpressure and degrades gracefully rather than falling over when traffic spikes.
- A clean datastore boundary for the cold path: individual records, threads, profiles - a comparatively ordinary I/O-bound workload.
The first three are the hot path; the fourth is the cold path. No single language is the obvious winner across all four - so this is the scorecard we judged each candidate against.
The four candidates
At bitcrowd we work with Elixir, Go and Rust day to day, with the occasional Node project on the side. So this wasn't a contest between a favourite and a lineup of strangers - we had hands-on experience to weigh on every side of the comparison.
Go
Go is the natural incumbent - it's what Bluesky's own production DataPlane is written in, and the fit is genuinely strong. Goroutines and channels map cleanly onto “fan out concurrent work, gather results, respond.” The in-memory and bitmap work is perfectly comfortable in Go (GraphD itself is Go). gRPC support is first-class, deployment is a single static binary, and the operational tooling is mature. In-process fan-out is very doable: worker pools draining buffered channels.
The costs are at the edges. At extreme request rates Go's garbage collector can start stealing CPU under heavy allocation, and its network backend can bottleneck on syscalls when juggling huge numbers of sockets - both are solvable with runtime-level tuning, but they're real work at the top end. And the in-process fan-out you build lacks a supervision-and-isolation layer out of the box: a panicking worker can take the process down, and you hand-roll the backpressure and lifecycle management yourself. Go's in-process story is closer to the BEAM's than people assume; the gap is in framework-provided safety, not raw capability.
Rust
Rust is the answer if you want the highest ceiling and the tightest control. No garbage collector means no collector competing for CPU and no GC pauses in the tail latency. Tokio handles enormous concurrency with low per-task overhead, tonic gives you solid gRPC, and the roaring-bitmap and datastore libraries are excellent. For the compute-bound half of our workload, nothing beats it.
The cost is development velocity and the rope to hang yourself with. For a service this thin on business logic, you pay Rust's tax - the borrow checker, async Rust's sharp edges, longer iteration - on every line, while capturing relatively little of its safety upside, because there's so little logic to protect. In-process fan-out is achievable with Tokio tasks and channels and performs beautifully, but you build the lifecycle, backpressure and supervision entirely yourself. It's the most powerful and the most “you're on your own” of the options.
Node / TypeScript
Node deserves real consideration because it's the language of the reference DataPlane and the rest of the atproto stack - PDS, AppView frontend, lexicons. One language across the codebase, shared types from lexicon definitions, the biggest hiring pool, the fastest iteration. For a reference implementation and for modest-scale self-hosting, it's a sensible default, and for the cold path's ordinary I/O-bound record fetches it's perfectly fine.
But once the dominant workload turns memory-and-concurrency-bound, Node becomes the odd one out. A single event loop per process means in-memory queues and request serving compete for the same loop, and any CPU-bound work blocks it. Using multiple cores means running multiple processes with no shared memory - which reintroduces exactly the cross-process coordination the in-process design was meant to eliminate, and makes a large shared in-memory graph awkward. It's the right tool for the reference implementation and a strained one for a throughput-oriented production server.
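Node itself can't be demonstrated in Python, but `asyncio` has the same single-threaded, cooperative event-loop shape, so it serves as an analogy. In this sketch the timeline read is scheduled right behind a CPU-bound handler and cannot even start until the compute finishes, because nothing in the compute yields to the loop. The handler names are made up for illustration.

```python
import asyncio


async def cpu_heavy_request(log: list[str], n: int) -> int:
    log.append("cpu: start")
    total = sum(i * i for i in range(n))   # pure computation: never yields to the loop
    log.append("cpu: end")
    return total


async def timeline_read(log: list[str]) -> str:
    log.append("read: start")
    await asyncio.sleep(0)                 # yield point, standing in for awaiting I/O
    log.append("read: end")
    return "ok"


async def main() -> list[str]:
    log: list[str] = []
    # Both handlers are scheduled on the same single-threaded loop.
    await asyncio.gather(cpu_heavy_request(log, 200_000), timeline_read(log))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Running `asyncio.run(main())` returns `['cpu: start', 'cpu: end', 'read: start', 'read: end']`: the read's latency is the full duration of someone else's computation, which is the scheduling behaviour a preemptive runtime like the BEAM avoids.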
Elixir
Elixir runs on the BEAM, a runtime built specifically for massive numbers of cheap, isolated, preemptively-scheduled processes handling concurrent work. Per-process garbage collection means no global stop-the-world pauses, so tail latency stays consistent under load. Supervision trees give fault isolation and self-healing essentially for free. For high-concurrency request serving and for a deferrable, backpressured, burst-absorbing fan-out queue, it's arguably the most naturally suited runtime of the four.
It has one well-known weakness, and we have to be honest about it: raw per-core compute. The BEAM optimises for concurrency and consistency, not single-threaded number-crunching, and byte-level work - like the set operations over a large follower graph - is exactly where it's slowest. Taken at face value, that's a serious strike against Elixir for a workload that includes heavy bitmap operations.
Every candidate has a flaw - which ones can you fix?
At this point we had four reasonable options, each with one thing standing between it and a clean fit. So rather than argue it on paper, we took each candidate's flaw and started building toward the remedy - far enough to see whether it could be engineered away or whether it was structural, baked into the runtime in a way no amount of cleverness escapes. That question turned out to be the whole decision.
Go has a GC that steals CPU under heavy allocation and a network backend that bottlenecks on syscalls at extreme socket counts. We started tuning around it - runtime flags, tighter allocation discipline - and it worked, up to a point; Bluesky's own production DataPlane proves you can push it far. But every gain was a knob we'd have to keep turning forever. The flaw recedes; it never leaves.
Node has a single event loop per process. We looked at the only fix available - run more processes and coordinate across them - and hit the wall immediately: that reintroduces exactly the cross-process overhead and awkward shared state we were trying to design out. The cure is the disease; you can't escape it without leaving the language.
Rust has no performance flaw to fix - but we felt the other cost the moment we started building. For a service this thin on logic, we were hand-rolling concurrency, lifecycle and backpressure machinery on every path, paying Rust's tax on each line while capturing little of its safety payoff. That's not a flaw you patch; it's engineering time you spend indefinitely.
And then Elixir. Its flaw - per-core compute on tight loops - is real, and we hit it right where you'd expect: the follower-graph set operations. But when we went looking for the remedy, the problem stayed put. It wasn't smeared across the service; it sat in one well-defined place we could draw a box around - and a flaw you can box is a flaw you can lift out. That's what sent us back to Elixir for a second, harder look - not because its weakness was smaller, but because it was the only one shaped like something we could excise without rewriting the program around it.
Resolving the Elixir paradox
So we looked harder at whether that box could actually be lifted out - and it can, more cleanly than we expected. What makes it work isn't a fact about Elixir; it's the shape of our own architecture.
First, a clarification that defuses half the concern. “The BEAM is slow at CPU work” is a statement about per-core throughput on tight loops - not about real-world resource efficiency. In production, much of the cost for a service like this isn't a tight loop; it's the coordination around keeping many concurrent requests flowing smoothly. The BEAM spends little CPU on that coordination, its per-process GC avoids global pauses, and its predictable tail latency lets you run nodes hotter for the same latency target. That is why teams have reported Elixir services running on less hardware than their Go or Node equivalents, occasionally even rivalling Rust, despite losing most microbenchmarks. Resource efficiency under real concurrency and peak per-core throughput are different things, and production rewards the former.
But our workload genuinely does contain the one thing the BEAM is bad at: the roaring-bitmap set operations. So we don't hand-wave it - we route around it.
The follower graph lives in a Rust implementation of Roaring Bitmaps, called from Elixir as a NIF. And the reason this is the right NIF, rather than a generic “wrap the slow part in Rust” patch, is that it fits the data-flow asymmetry our architecture already has:
- The input to the boundary is tiny - a user ID, or a small set of IDs. Bytes.
- The heavy data never crosses it. The entire graph - hundreds of millions of edges - stays on the Rust side, in native memory, in compact bitmaps. The BEAM never holds it, never copies it, never garbage-collects it.
- The expensive compute happens entirely inside Rust - the intersections and unions, at native speed, exactly the per-core work the BEAM is worst at.
- Only the result crosses back - and because timelines are length-limited, that result set is small and bounded by construction.
The usual objection to NIFs is the copy cost at the boundary. Our design satisfies the ideal condition for avoiding it: small data crosses in both directions, while the big structure and the heavy computation stay native. That isn't a lucky accident - it falls directly out of the timeline-length limit we'd already established. The architecture's natural funnel is the NIF's efficiency condition.
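The boundary shape can be modelled in a few lines. In this Python stand-in, the hypothetical `NativeGraph` object plays the Rust side: it owns the follow graph and exposes only a call that takes two IDs and a cap and returns a short list. `MAX_RESULT` is a hypothetical bound standing in for the timeline-length limit, and the plain sets stand in for roaring bitmaps. In an actual Rustler NIF, the Elixir side would typically hold only an opaque resource handle (for example a `ResourceArc`) to the native structure.

```python
MAX_RESULT = 100  # hypothetical cap standing in for the timeline-length limit


class NativeGraph:
    """Plays the Rust side of the NIF: owns the whole graph and never returns it."""

    def __init__(self) -> None:
        # Plain sets stand in for roaring bitmaps here.
        self._follows: dict[int, set[int]] = {}
        self._followers: dict[int, set[int]] = {}

    def add_follow(self, src: int, dst: int) -> None:
        self._follows.setdefault(src, set()).add(dst)
        self._followers.setdefault(dst, set()).add(src)

    def mutual_follows(self, viewer: int, target: int, limit: int) -> list[int]:
        """Small in (two IDs and a cap), heavy work inside, small bounded list out."""
        mine = self._follows.get(viewer, set())
        theirs = self._followers.get(target, set())
        return sorted(mine & theirs)[:min(limit, MAX_RESULT)]
```

Whatever the caller asks for, at most `MAX_RESULT` integers come back across the boundary, so the copy cost stays constant no matter how large the graph grows.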
The net effect on the evaluation: the NIF subtracts the BEAM's weakness from the hot path while keeping its strengths. The compute-bound half runs in Rust, regardless of host language. What's left for Elixir is the orchestration-and-concurrency half - high-concurrency request serving, and the fan-out queue - which is precisely the regime where the BEAM shines and where our own production experience says it's most efficient.
We're honest about the trade this NIF brings: a NIF runs inside the BEAM's memory space, so a crash in the Rust code can take down the VM, and a long-running call can stall a scheduler - the price is some of the BEAM's “let it crash” isolation, exactly at the native boundary. The mitigations fit us well: the calls are short (small in, bounded compute, small out), and the Rust surface is small, stable and the kind of code that, once written, rarely changes. Yes, it's two languages - but it's a clean, minimal split, not a rewrite, and Rustler makes the boundary about as ergonomic as native interop gets. Of the available evils, maintaining a small bounded Rust core is the least.
The deciding factor: fan-out as code, not infrastructure
If the NIF makes Elixir competitive, the fan-out requirement is what tipped it from competitive to chosen.
Recall what we need: a deferrable queue that absorbs bursts, applies backpressure, delivers within minutes and degrades gracefully when the backlog grows. In the Go, Rust and Node worlds, that almost always becomes an external system - Redis, NATS, RabbitMQ, Kafka - because the runtime doesn't give you supervision and backpressure for in-process work out of the box. The moment you need backpressure, supervision and “don't fall over under a spike,” people reach for infrastructure.
On the BEAM, those primitives are the language. Processes are the workers, mailboxes are the queues, supervisors handle recovery, and libraries like GenStage and Broadway provide explicit backpressure - all inside the same VM, no network hop, nothing extra to deploy. The list of costs that simply don't exist is what sold us:
- No serialisation across a queue boundary. This ties straight back to the workload analysis. An external queue means serialising every fan-out job to push it and deserialising to pop it - reintroducing exactly the per-event (de)serialisation cost we worked to keep off the hot path. In-process, a job is just a message passed between processes. We removed a whole serialisation surface, not optimised it.
- No second system to operate - no separate scaling story, no “is Redis the bottleneck now,” no split-brain between the service's view of the backlog and the queue's view.
- A unified failure model. Supervision covers the fan-out workers the same way it covers everything else. There's no seam between “the service crashed” and “the queue is in a weird state.”
- In-band backpressure. Producer and consumer share a runtime, so the producer can actually feel the consumer falling behind and respond - rather than discovering it via queue-depth metrics after the fact.
Our fan-out spec - deferrable within minutes, degrade gracefully under bursts, never pile up beyond resources - reads almost like the design brief for GenStage-style backpressure. The requirement and the tool are unusually well matched, and the tool runs in-VM.
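The demand-driven idea behind GenStage can be sketched without the BEAM. In this Python model, consumers pull work by stating how much they can handle, and the producer buffers everything else up to a limit. The class name and the shedding policy (drop the oldest pending job once the buffer is full) are assumptions chosen to match the best-effort fan-out described here, not GenStage's own API or defaults.

```python
from collections import deque


class DemandDrivenQueue:
    """GenStage-flavoured sketch: work is only handed out against explicit consumer demand."""

    def __init__(self, max_buffer: int):
        self.buffer: deque = deque()
        self.max_buffer = max_buffer
        self.dropped = 0

    def push(self, job) -> None:
        # Producer side: buffer instead of forcing work onto consumers.
        if len(self.buffer) >= self.max_buffer:
            self.buffer.popleft()   # shed the oldest job rather than grow without bound
            self.dropped += 1
        self.buffer.append(job)

    def ask(self, demand: int) -> list:
        # Consumer side: take at most `demand` jobs; an idle consumer simply asks again later.
        n = min(demand, len(self.buffer))
        return [self.buffer.popleft() for _ in range(n)]

    def backlog(self) -> int:
        return len(self.buffer)
```

Because producer and consumer share one process here, `backlog()` and `dropped` are the in-band backpressure signal: no external queue-depth metric is needed to notice consumers falling behind.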
The honest counter-weight: in-process means in-memory means lost on crash. If a node dies with a fan-out backlog pending, that backlog dies with it. For us that's acceptable - fan-out is best-effort timeline population, timelines are bounded and ageing, and the fan-in/pull path covers anything that goes missing for a window. But it's a deliberate choice of “fast, simple, lossy-on-crash” over “durable, external, heavier,” and it's the right call only because the rest of the architecture makes the loss cheap. If you needed durability here, much of the in-process advantage would narrow - so this is the assumption to check against your own tolerances.
Why Elixir, in one paragraph
We didn't set out to pick Elixir, and it doesn't win every category. We landed on it because, for this specific workload, it sits at a sweet spot we didn't see coming. The one axis it loses on - raw per-core compute for graph operations - is the one axis we cleanly offload to a small Rust NIF, with a data-flow shape that makes the offload nearly free. Everything that remains is concurrency, coordination, predictable tail latency under load and a burst-absorbing fan-out queue that the BEAM lets us build in the program itself instead of bolting on as infrastructure. Go would have been the pragmatic, proven middle; Rust would have given us the highest ceiling at the cost of velocity and a lot of hand-rolled concurrency machinery; Node was right for the reference implementation and wrong for a memory-bound production server. Elixir plus a thin Rust core gave us the best of both halves of a workload that genuinely has two halves.
What next?
Most of this is architecture-level reasoning informed by the workload's shape and each runtime's known characteristics - not head-to-head benchmarks against a live network. The pieces we lean on hardest are well-sourced: Jaz's GraphD work for the in-memory graph and Roaring Bitmaps, and the documented distinction between expensive paging over follows and interactive-speed set intersection. The pieces we're least certain about - the exact timeline window, the precise per-request CPU breakdown - we've flagged as such.
The right next step isn't more reasoning; it's building a skeleton DataPlane that implements a handful of the real RPCs, wiring up the Rust-NIF graph and an in-process fan-out queue, and load-testing the burst-and-drain and crash-recovery behaviours directly.
Measurements beat priors. But as a starting hypothesis for where to place our bet, Elixir-plus-Rust is the configuration the workload kept pointing us toward.
