Thread-Coordinated Ractors
The Pattern That Delivers
Maciej Mensfeld
RubyKaigi 2026
Welcome everyone. Today I want to show you a pattern I have been refining for over a year - using Ractors for real, production parallelism in Ruby. The key idea is coordinating them with a thread, not trying to make your whole app Ractor-safe.
The word "delivers" in the title is intentional. Most Ractor talks end with "promising, but not yet." This one ends with shipping code and real benchmark numbers.
This is not a Ractor tutorial - I will assume you roughly know what they are.
Maciej Mensfeld
Karafka, Shoryuken, PGMQ-Ruby, COI (AI Sandbox)
RubyGems Security Team
Mend.io
@maciejmensfeld
Quick intro - I work at Mend.io and I build background processing tools for Ruby. Karafka for Kafka, Shoryuken for SQS, PGMQ-Ruby for Postgres queues, and COI which is an AI sandbox for containerized development. I am also part of the RubyGems security team.
A Long Wait
Waiting for Ractors since they were called Guilds.
I have been waiting for Ractors since Koichi proposed Guilds back in 2016. I was excited then and I am still excited now.
It makes me genuinely happy to be standing here in Japan talking about them - not as a future feature, but as something I have actually shipped.
Today's Plan
The Problem - why Ractors?
The Investigation - what works, what doesn't
Patterns - building the Ractor pool
Production - benchmarks, limits, takeaways
Here is what we will cover today. Four parts, roughly seven minutes each. We will start with the problem, then investigate what works and what does not, look at the patterns I built, and finish with production results and takeaways.
10+ Years of Background Processing
Fibers. Threads. Processes.
All useful. None of them parallelize CPU-bound work in a single process.
I have spent over a decade building background processing engines and I have used every Ruby concurrency primitive in production.
Fibers are great for cooperative I/O. Threads work well for blocking I/O. Forks distribute load but each process stays single-threaded.
The gap is CPU-bound work - parsing, validation, transformation. Threads cannot parallelize it because of the GVL, and forks do not speed up a single process. That gap is what brought me here.
I Hit This Wall in Karafka
Raw bytes → Deserialize (10,000 messages) → Filter (keep 50) → Business logic (50 messages)
Deserialization runs on every message. Business logic runs on a fraction.
Karafka is a Kafka processing framework for Ruby and Rails - think Sidekiq, but for Kafka.
Look at this pipeline. 10,000 messages hit the deserializer, then the filter drops that to just 50, and business logic only runs on those 50.
So you are spending 99 percent of your CPU on data you will throw away. That is why deserialization is the natural target for parallelization.
Where Does Time Actually Go?
Msg building - 5.7%
Filtering - 1.0%
Deserialization - 86.0%
Validation - 3.5%
User consume - 1.1%
Offset mgmt - 2.6%
CPU-only pipeline. No DB calls, no network I/O.
I profiled a real Karafka pipeline with a 5KB JSON payload, and this is where the time actually goes.
86 percent is in deserialization. Everything else combined - message construction, filtering, validation, user code, offset management - is only 14 percent.
Important caveat - this is a CPU-only pipeline. If your consumer hits a database, these proportions change dramatically. We will come back to that.
But the point is - deserialization is not just a part of the work. It IS the work.
Bigger Payloads = More Time in Deserialization
Deserialization share of total time by payload size:
524B - 42%
5.1KB - 86%
18.8KB - 95%
78KB - 99%
CPU-only pipeline. When your consumer hits a DB for 100ms, deserialization drops to <2%.
Same chart but with four payload sizes. As you can see, as payloads grow, deserialization eats everything.
Again, this is for CPU-bound consumers. If yours waits on a database, the proportions look different.
If I can speed up deserialization...
...I speed up everything.
When one stage is 86 to 99 percent of the cost, optimizing it is the same as optimizing the whole pipeline.
I've Tried Everything
Threads - useless for CPU
Async - useless for parsing
Forking - per-process throughput unchanged
The GVL blocks parallelism for pure-Ruby CPU work.
I have tried every Ruby tool over the years for this exact problem.
Threads are fine for I/O, but parsing JSON is pure-Ruby CPU work - they queue up on the GVL and you get zero parallelism.
Async solves a different problem - non-blocking I/O. It does not help with CPU work.
Forking scales horizontally but does nothing for the CPU profile of a single process.
I needed something that bypasses the GVL entirely while staying inside one Ruby process.
Ractors
Each Ractor has its own GVL. True parallel execution.
Ractors are Ruby's answer to the GVL problem. Each Ractor gets its own GVL, so multiple Ractors really do run pure-Ruby code in parallel on multiple cores.
The catch - and this is the whole reason Ractors have not taken off - is that everything crossing a Ractor boundary has to be shareable. Frozen, immutable, or explicitly moved.
Ractors look like threads, but the rules are completely different.
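As a tiny sketch of that rule, using the classic Ractor.receive / take API from Ruby 3.x:

# Each Ractor has its own GVL; whatever crosses the boundary must be shareable
# (deeply frozen / immutable), explicitly moved, or it gets copied.
r = Ractor.new { Ractor.receive.sum }

r.send([1, 2, 3].freeze) # frozen Array of Integers - shareable, crosses for free
r.take                   # => 6 (Integer results are shareable too)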
It's Not That Simple
You need the right use-case
You need the right data shape
You need to account for framework overhead
You need realistic expectations
Making Ractors deliver is an engineering challenge, not a flag you flip.
But making Ractors work in practice is genuinely complex. You need the right use-case, the right data shape, and realistic expectations - we are talking 2 to 3x on the right workload, not magic 10x.
Framework overhead is real. Coordination, allocations, boundary crossings - they all eat into your gains.
In this talk I will show you the failures honestly alongside the wins.
A Library Author's Perspective
I build middleman layers - between infrastructure and your code.
I control the plumbing , not the business logic.
I can't tell users to switch runtimes, gems, or architectures.
~95% run CRuby. I have to solve it here .
I want to be clear about my perspective here. I am not looking at Ractors from a Ruby core team angle. I am a library author. My view is utilitarian - I need them to solve a real problem for real users, today.
I build Karafka - it is middleware between raw Kafka bytes and your code. I can isolate pipeline stages without touching user code at all.
But I cannot demand architectural changes. "Just use JRuby" is an evacuation plan, not a solution.
About 95 percent of my users run CRuby. I have to solve it within their constraints.
The Unlock
We can't assume user code is Ractor-safe
We can't require them to make it so
Library authors can isolate pure input/output operations
You don't need a Ractor-safe app. You need Ractor-safe building blocks .
This is the key idea of the whole talk. If you remember one slide, let it be this one.
Almost no user code is Ractor-safe today. We cannot force people to rewrite their apps. But as library authors, we can find isolated operations - pure input, pure output, no shared state - and run those in Ractors.
You do not need a Ractor-safe app. You need Ractor-safe building blocks.
Deserialization is the perfect candidate - bytes in, parsed object out, no global state.
"Ractors are slow. Everyone knows that."
This is the conventional wisdom. And it is exactly what this talk is going to dismantle. Let that sit for a moment.
This is a real screenshot from Claude Code. Even the AI confidently told me - no, Ractors will not work well with Karafka.
Well... let us see about that.
3x slower
JSON parsing with Ractors was painfully slow
A year ago, if you tried the obvious thing - parse JSON in a Ractor, send the result back - you got code that was three times slower than just doing it on one thread.
Three times slower, with all the added complexity of Ractor coordination. No wonder nobody used them.
Bug #19288
I reported that bug myself.
Bug 19288 on the Ruby tracker - I filed it myself. It documented exactly the slowdown I just described.
I was not quiet about it. I posted on social media, I brought it up everywhere. I had been waiting for Ractors for a decade and the first real test failed.
But complaining productively, with reproducible benchmarks, is how the language gets better.
RubyKaigi 2025
I talked with the core team about Ractor pain points
And then they went and fixed things
Last year at RubyKaigi I had hallway conversations with Aaron Patterson, Koichi Sasada, Jean Boussier, and others about Ractor pain points.
I expected the usual "interesting, we will think about it." Instead, in the months that followed, several of those exact pain points got addressed in the language.
This is the part of Ruby I love - community feedback going straight into the runtime.
What Changed in 2025
A year of serious investment into Ractors:
Ractors no longer crash the VM
Ractor::Port - targeted messaging
move: true - zero-copy transfer
JSON.parse(..., freeze: true) - no longer crashes in Ractors
And more. Time to try again.
These are the key changes that made the pattern in this talk possible.
Ractor::Port gives us targeted messaging. Before, Ractor.receive was a broadcast. Port lets you address a specific receiver, which is what you need to coordinate a worker pool.
move: true gives us zero-copy ownership transfer. Instead of deep-copying, you just transfer ownership.
And freeze: true on JSON.parse - this option has existed since Ruby 3.0, but it used to crash when used inside Ractors. Now it works reliably. Parsed objects come out already Ractor-shareable. We will dig into why this is huge in a few slides.
With these three changes, I was finally ready to try again.
The Investigation
Now we shift from "why bother" to "how I actually made it work." This is the longest section - the meat of the talk.
I will walk through the failed attempts honestly, because the failures explain why the final pattern looks the way it does.
Attempt 1: Naive Deep Copy
result = Ractor.new(payload) { |p| JSON.parse(p) }.take
Parse JSON in Ractors, send results back
The most obvious approach - spawn a Ractor, parse JSON inside, take the result back.
The problem is that the parsed Hash is not shareable, so .take deep-copies the entire object graph. Thousands of cloned objects per message. The copy cost eats the parallelism gain completely.
This is the version that gives Ractors their bad reputation.
0.29x
3.4x slower than sequential. Ouch.
0.29x. Three and a half times slower than just doing it on one thread. This is exactly the result that bug 19288 documented. It is not subtle - it is a face-plant.
Attempt 2: make_shareable
results = payloads.map { |p| JSON.parse(p) }
Ractor.make_shareable(results)
Deep-freeze everything so it can be shared
Second attempt - parse normally, then deep-freeze the result with Ractor.make_shareable. Frozen things are shareable, so this should avoid the deep-copy problem.
But make_shareable still has to traverse the whole object graph - every Hash, every Array, every String - and freeze each one. You have replaced one full traversal with another.
It is faster than deep copy, but you are still paying a Ruby-level walk over every object you just created in C.
0.42x
Still slower than sequential.
0.42x. Better than 0.29x, but still slower than sequential. Two attempts, two failures.
But the pattern starts to emerge - the parse itself is not the bottleneck. Moving data across the Ractor boundary is.
JSON.parse is a C extension .
What if it could freeze objects while creating them ?
So here is the key insight. JSON.parse is a C extension. It is already allocating every Hash, Array, and String during the parse. It already touches every object exactly once.
What if freezing happened during that single C-level pass? Then there would be no second Ruby-level traversal. You get shareability for free.
This is the insight that drove the JSON change.
The Breakthrough
JSON.parse(payload, freeze: true)
Returns objects that are already Ractor-shareable
The freeze: true option has been in the JSON gem since Ruby 3.0, but the breakthrough is realizing what it gives us for Ractors. It freezes every object as it is allocated, in C, during the parse. That means the result is immediately Ractor-shareable - no Ruby-level traversal needed.
The cost is 1 to 15 percent more parse time depending on payload shape. But that small cost unlocked the entire pattern I am about to show you.
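Here is what the flag buys us, as a minimal sketch (the inline payload is just for illustration):

require 'json'

payload = '{"id": 1, "tags": ["a", "b"]}'

plain  = JSON.parse(payload)
frozen = JSON.parse(payload, freeze: true)

Ractor.shareable?(plain)  # => false - would need a copy or a make_shareable walk
Ractor.shareable?(frozen) # => true  - every Hash, Array, and String frozen during the parse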
Transfer Strategy Comparison (65KB payload)
Speedup vs the sequential baseline:
Deep copy - 0.29x
make_shareable - 0.42x
freeze: true - 2.3x
Three transfer strategies, same Ractors, same workload, same payload.
Deep copy: 0.29x - the catastrophe. make_shareable: 0.42x - slightly less catastrophic. freeze: true: 2.3x - actual parallelism, finally.
The whole difference between "Ractors are slow" and "Ractors deliver" is the strategy you use to cross the boundary. Same primitive, completely different outcome.
2.3x
From 0.29x to 2.3x . Same Ractors. One flag.
One keyword argument took us from 3.4x slower to 2.3x faster. Same hardware, same Ractors, same workload. Just a different way of crossing the boundary.
Frozen Input = Zero-Copy In
payload.freeze at the source
Frozen strings are Ractor-shareable for free
Frozen in, freeze: true out. Zero-copy both ways.
Now the input side. Kafka payloads are byte strings - read-only by nature. payload.freeze makes them Ractor-shareable for free. No copy, no move needed.
Combine the two - frozen input goes in, freeze: true output comes back. Zero-copy in both directions. This is where the whole pattern clicks together.
But Which Ractor Pattern?
There's more than one way to coordinate Ractors
Now we know how to move data efficiently. But "use Ractors" is not a single design - there are several plausible coordination patterns. I tested four, and the next slides show you the data that picked the winner.
4 Patterns Tested
A) Thread-Coordinated - readiness signaling, Mutex dispatch
B) Pre-Partitioned - blast-send N chunks, no protocol
C) Ephemeral - fresh Ractors per batch
D) Ractor Supervisor - coordinator Ractor in the middle
Let me quickly walk you through the four patterns I tested.
Thread-Coordinated - what I ended up shipping. A regular Ruby thread waits for both a ready Ractor and available work, then matches them up.
Pre-Partitioned - split the batch into N chunks, send each to a fixed Ractor, no runtime coordination. Simple and fast in the easy case.
Ephemeral - spawn a fresh Ractor for each batch. Easiest mental model, worst startup cost.
Ractor Supervisor - same as A but the coordinator is itself a Ractor instead of a thread. Adds one extra boundary crossing.
Spoiler - A wins, and the why is the rest of this section.
Single Producer Results
Average speedup across 524B, 5.1KB, 18.8KB, and 78KB payloads:
Thread-Coordinated - 1.5x
Pre-Partitioned - 1.6x
Ephemeral - 1.6x
Ractor Supervisor - 1.6x
All patterns win. But which one handles contention ?
With a single producer feeding the pool, all four patterns clear the 1.0x bar. Average speedups are in the 1.5 to 1.6x range.
If this were the only benchmark, you could pick any of them. But single-producer is the easy case - it hides the failure modes.
The real question is - what happens when multiple consumers share the pool?
Narrowing Down
Ephemeral - 0.88x on small payloads
Ractor Supervisor - 10% overhead on large payloads
Two remain: Thread-Coordinated vs Pre-Partitioned
Two patterns drop out before we even get to contention.
Ephemeral pays Ractor creation cost on every batch. Spawning a Ractor is not free - it takes tens of milliseconds. On small payloads that cost dominates.
Ractor Supervisor adds an extra boundary crossing because the coordinator is itself a Ractor. On large payloads you eat about 10 percent overhead.
That leaves two contenders - Thread-Coordinated and Pre-Partitioned.
In Production, It's Never Just One
Multiple consumers share a single Ractor pool
Single-producer benchmarks hide this. Production doesn't.
In Karafka, you have many topics, many partitions, many consumers - all running in the same process. They all want to deserialize messages. They share a single pool.
When all the workers are busy, the next caller waits. That waiting is contention. Microbenchmarks hide this. Production does not.
Under Contention
Throughput (msgs/s) by concurrent consumers:
1 consumer - Thread-Coordinated 48K, Pre-Partitioned 76K
3 consumers - Thread-Coordinated 93K, Pre-Partitioned 100K
5 consumers - Thread-Coordinated 78K, Pre-Partitioned 62K
8 consumers - Thread-Coordinated 76K, Pre-Partitioned 42K
Here is throughput as we scale from 1 to 8 concurrent consumers.
With one consumer, Pre-Partitioned wins - 76K versus 48K. With three, both peak and they are close.
But at five and eight consumers, Pre-Partitioned collapses - 62K, then 42K. Thread-Coordinated holds steady around 76K.
Pre-Partitioned is faster when uncontended but falls apart under contention. Thread-Coordinated queues work instead of blocking.
Tail Latency Tells the Story
Thread-Coordinated - p50 11ms, p99 27ms
Pre-Partitioned - p50 2.5ms, p99 100ms
Thread-Coord: predictable. Pre-Part: 40x tail spike.
Latency is the metric that wakes you up at 3am.
Thread-Coordinated: p50 is 11 milliseconds, p99 is 27 milliseconds. A 2.5x spread. Predictable.
Pre-Partitioned: p50 is 2.5 milliseconds - beautiful - but p99 is 100 milliseconds. A 40x spread. Sometimes you just get stuck behind a giant batch.
For a streaming system, p99 is what matters. Thread-Coordinated wins on the metric that counts.
Separation of Concerns
Callers never block
Coordinator matches work to available Ractors
Saturated? Work queues up; callers keep going
Why does Thread-Coordinated handle contention so well? Because the caller and the dispatch decision are decoupled.
The caller just pushes work onto a Ruby Queue and moves on. A separate coordinator thread sits in a loop - wait for work, wait for a ready Ractor, hand it off.
If the pool is saturated, the queue grows. Callers do not notice. Backpressure is implicit and graceful.
Coordinator Thread Overhead?
~0%
The natural worry is - you added an extra thread. Does that thread cost something?
Measured overhead is essentially zero. The coordinator spends 99.9 percent of its time blocked on a queue or a Ractor port. It only wakes up when there is actual work to dispatch. The bottleneck is always the deserialization itself.
The Pattern
Now let me show you what the winning pattern actually looks like - diagram first, then code. If you only stay for ten minutes, stay for these slides.
Thread-Coordinated Ractor Pool
Listeners 1..N → non-blocking push → Work Queue
Work Queue → Coordinator Thread (waits for a ready worker + work)
Coordinator Thread → Ractor Workers 1..N (parse + validate), dispatched with move: true
Workers signal ready when done; results return via Future (already shareable - freeze: true)
Let me walk you through this diagram left to right.
On the left, listener threads push raw payloads onto a Ruby Queue. Non-blocking - they never wait on the pool.
In the center, the coordinator thread waits for both work and a ready Ractor, then matches them up.
On the right, the worker Ractors parse, validate, signal ready, and repeat. Results go back via Ractor::Port - already shareable, no copy needed.
The key point is that only the coordinator and the workers talk to each other. Callers stay out entirely.
Design Principles
Persistent pool - create once, reuse forever
Non-blocking dispatch - callers never wait
Zero-copy both ways - move: true in, freeze: true out
Per-message rescue - errors never kill workers
Four design principles.
Persistent pool - Ractor creation is expensive, tens of milliseconds. You create the pool once at app boot and reuse it forever.
Non-blocking dispatch - the caller pushes and moves on. They never block on pool capacity.
Zero-copy both ways - move: true gets the payload in without a copy, freeze: true gets the result out without a copy.
Per-message rescue - if one payload is malformed, the worker rescues, marks it as a failure, and continues. The Ractor never dies.
Start Work Early
# Step 1: Dispatch to Ractors immediately
future = pool.dispatch_async(messages, deserializer)
# Step 2: Job sits in thread pool queue...
# Ractors are already working
# Step 3: When a thread picks it up, results are waiting
results = future.retrieve
Ractors parse while the job waits in the thread pool queue.
Here is a subtle but important optimization. We dispatch to the Ractors as soon as we receive the messages, before the job even goes into the worker thread queue.
So the flow is - poll Kafka, fire off dispatch_async to Ractors, then the job goes into the thread queue. By the time a thread picks it up, the Ractors have already been parsing for several milliseconds.
You are getting parallelism from the dead time between "we have data" and "we have a thread free to process it."
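A hypothetical sketch of such a future - not Karafka's actual class - assuming the results arrive on a Ractor::Port owned by the caller:

class DeserializationFuture
  def initialize(result_port)
    @result_port = result_port
  end

  def retrieve
    # Blocks only if the Ractor pool has not pushed the parsed batch yet;
    # usually the results are already waiting by the time a thread calls this.
    @results ||= @result_port.receive[:results]
  end
end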
Crossing the Ractor Boundary
# Mutable objects are rejected
ractor.send(message) # => Ractor::IsolationError!
# Extract only what's needed into a frozen Data class
MessageProxy = Data.define(:raw_payload)
proxy = MessageProxy.new(raw_payload: payload)
ractor.send(proxy) # works
Your existing Message object has methods, mutable state, references back to the consumer. Ractors reject all of that.
The fix is to extract only what the Ractor needs into a Data.define class. These are frozen, value-typed, and automatically shareable since Ruby 3.2.
At the boundary, you package what you need into a Data and send that. Clean separation without making Message Ractor-safe.
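One subtlety worth knowing: Data instances are frozen, but they are only shareable when their members are too - which is exactly why freezing the payload at the source matters. A quick check, as a sketch:

proxy = MessageProxy.new(raw_payload: payload) # payload was frozen at the source
Ractor.shareable?(proxy)                       # => true
Ractor.shareable?(MessageProxy.new(raw_payload: +"raw")) # => false - mutable member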
The Worker (inside the Ractor)
loop do
  coordinator_port.send({ worker_id: wid, port: my_port })
  msg = my_port.receive
  results = msg[:data].map do |payload|
    parsed = JSON.parse(payload, freeze: true)
    validate(parsed)
    parsed
  rescue StandardError => e
    error_marker # never crash the worker
  end
  msg[:result_port].send({ batch_index: msg[:bi], results: results })
end
Let me walk through the highlighted lines.
Line 2 - the worker announces readiness. It tells the coordinator "I am free, here is where to send my next job."
Line 3 - it blocks waiting for work to arrive on its private port.
Lines 4 through 8 - the actual work. Parse with freeze: true, validate, return the parsed object.
Lines 9 and 10 - per-message rescue. If something fails, we swap in a frozen error marker. The worker keeps going - one bad payload does not poison the whole batch.
Line 11 - send results back via the result port. And the loop repeats.
Dispatch (coordinator thread)
# Caller side - non-blocking push
@work_queue.push({ data: batch, result_port: rp, batch_index: idx })
# Coordinator thread - match work to a ready worker
loop do
  work = @work_queue.pop # blocks on work
  ready = @coordinator_port.receive # blocks on worker
  ready[:port].send(work, move: true) # zero-copy dispatch
end
This is the other half, and it is a tiny amount of code for what it does.
At the top - the caller side. One line. Push the job onto a queue and return immediately.
At the bottom - the coordinator loop. Two blocking pops - one for work, one for a ready worker. Whichever comes second unblocks the dispatch.
Notice move: true on the send - that is the zero-copy ownership transfer. The coordinator no longer owns the work hash, the worker does.
The whole pattern is maybe 30 lines of Ruby. The complexity is in the design, not the code.
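For reference, here is how those pieces could fit together in one place. This is a minimal sketch under assumptions - the class name RactorPool, the hash keys, and Ractor::Port.new as the port constructor are illustrative, not Karafka's actual internals:

require 'json'

# Minimal sketch of a thread-coordinated Ractor pool. Illustrative, not
# Karafka's real code. Assumes Ractor::Port is available (Ruby 4.0+).
class RactorPool
  def initialize(size: 5)
    @work_queue       = Queue.new
    @coordinator_port = Ractor::Port.new

    # Persistent pool: workers are created once at boot and reused forever.
    @workers = Array.new(size) { |i| build_worker(i, @coordinator_port) }

    # Coordinator thread: waits for work and for a ready worker, then matches them.
    @coordinator = Thread.new do
      loop do
        work  = @work_queue.pop             # blocks on work
        ready = @coordinator_port.receive   # blocks on a free worker
        ready[:port].send(work, move: true) # zero-copy ownership transfer
      end
    end
  end

  # Non-blocking dispatch: the caller pushes and returns immediately.
  def dispatch_async(batch, result_port:, batch_index: 0)
    @work_queue.push({ data: batch, result_port: result_port, batch_index: batch_index })
  end

  private

  def build_worker(id, coordinator_port)
    Ractor.new(id, coordinator_port) do |wid, coord|
      my_port = Ractor::Port.new
      loop do
        coord.send({ worker_id: wid, port: my_port }) # announce readiness
        msg = my_port.receive
        results = msg[:data].map do |payload|
          JSON.parse(payload, freeze: true)
        rescue StandardError
          :error # per-message rescue: a bad payload never kills the worker
        end
        msg[:result_port].send({ batch_index: msg[:batch_index], results: results })
      end
    end
  end
end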
Parse + Validate (500 msgs, 5 Ractors)
Speedup vs sequential (1.0x baseline):
524B - 1.8x
5.1KB - 2.6x
18.8KB - 1.9x
78KB - 1.9x
Five Ractors, 500 messages, parse plus validate. Here are the speedups across four payload sizes.
524 bytes: 1.8x. Small messages, but still a solid win.
5.1 KB: 2.6x - this is the sweet spot. A typical event payload.
18.8 KB and 78 KB: both 1.9x. Holds up at large sizes too.
Meaningful speedups at every size we benchmarked.
Best Result
~3.5x
5.1KB at 8+ Ractors
With more Ractors and the right payload size, we hit about 3.5x. That is the headline number. Compare this to threads, which top out at 1.0x for the same workload because of the GVL.
Move More Into the Ractor
Any isolated, CPU-bound work can move inside
More work per message = bigger gains
Once the Ractor pool exists, deserialization is not the only thing you can run inside it. Anything that is CPU-bound and operates on isolated data is a candidate - validation, transformation, filtering, schema checks.
The more work you push into the Ractor per message, the better the amortization. Coordination cost is fixed, useful work grows.
And the benchmarks I am showing already include validation, so the numbers are realistic.
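Inside the worker that just means growing the per-payload block. A sketch - validate!, transform, and error_marker are stand-ins for your own isolated, CPU-bound steps, not Karafka APIs:

results = msg[:data].map do |payload|
  parsed = JSON.parse(payload, freeze: true)
  validate!(parsed)                        # e.g. schema or required-field checks
  Ractor.make_shareable(transform(parsed)) # derived values must stay shareable too
rescue StandardError
  error_marker
end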
Mixed Realistic Workload
524B–78KB shuffled, parse + validate
Workers Speedup
5 threads 1.16x
1 ractor 0.69x
2 ractors 1.33x
5 ractors 2.59x
8 ractors 2.82x
This is a mixed workload with shuffled payload sizes - what production actually looks like.
Five threads: 1.16x. The GVL serializes the work. One Ractor: 0.69x - overhead without benefit.
But two Ractors: 1.33x - already beats five threads. Five Ractors: 2.59x. Eight: 2.82x.
Even a tiny Ractor pool outperforms a respectable thread pool for this kind of work.
Why Not Just Threads?
I know what you are thinking - threads are simpler, why bother with Ractors? The next few slides answer that directly. For this class of work, threads literally do not parallelize at all.
Threads vs Ractors
Speedup vs sequential (1.0x), threads vs Ractors at equal worker count:
Parse+validate, 5 workers - threads 1.09x, Ractors 2.43x
Scaling, 8 workers - threads 0.81x, Ractors 3.17x
Sustained, continuous load - threads 0.79x, Ractors 2.27x
Contention, 5 consumers - threads 1.02x, Ractors 2.44x
Four real workloads, threads versus Ractors at the same worker count.
Parse and validate with 5 workers: threads 1.09x, Ractors 2.43x.
Scaling with 8 workers: threads 0.81x - actually slower than sequential. Ractors 3.17x.
Sustained load: threads 0.79x, Ractors 2.27x. Contention: threads 1.02x, Ractors 2.44x.
The pattern is consistent - threads contribute nothing for pure-Ruby CPU work. Ractors deliver real parallelism every time.
Threads vs Ractors: Scaling
Speedup by worker count (1, 3, 5, 8 workers): threads stay flat near 1.0x (0.81x at 8 workers); Ractors climb steadily to 3.17x at 8 workers.
Same metric, scaled across worker count.
Threads are flat. 1, 3, 5, 8 workers - barely above 1.0x. More workers do not help and sometimes they hurt.
Ractors show a clean upward curve. 1 worker: 0.79x. 3 workers: 2.07x. 5: 2.5x. 8: 3.17x. Each added worker adds real throughput.
This is the chart that justifies the entire pattern. The shapes of these two curves are the answer to "why not just threads."
Threads plateau at ~1x .
Ractors bypass the GVL : 3.2x at 8 workers.
Threads plateau at 1x. Ractors scale linearly until you run out of cores. No clever code - the difference is purely in which primitive you choose.
CPU Efficiency
Wall speedup vs total CPU cost:
524B - 1.8x wall, 1.4x CPU
5.1KB - 2.6x wall, 1.5x CPU
18.8KB - 1.9x wall, 1.3x CPU
78KB - 1.9x wall, 1.7x CPU
Real parallelism, not just overhead.
Important sanity check - are we just burning more CPU for the same work?
Wall speedup should be much higher than total CPU cost. And at every payload size, it is.
We are not just trading wall time for CPU time - we are getting more done per CPU-second.
Beyond Microbenchmarks
Microbenchmarks lie. Production doesn't.
Everything I have shown so far was an in-process benchmark. Generated payloads, in-memory work, no broker.
Microbenchmarks are great for isolating effects but they are famous for not surviving contact with reality. So I wired the whole pattern up to an actual Kafka broker and re-ran everything. The next slides are those results.
End-to-End Setup
Karafka → broker → Karafka → Ractor pool
6 topics: 1, 5, 10, 25, 50, 100 KB
Payload shapes collected from real Karafka users
4 Ractors
The setup - real producer, real local broker, real consumer, then into the Ractor pool.
Six topics, each with 2000 messages of a different size - 1, 5, 10, 25, 50, and 100 KB. Five partitions per topic, four Ractor workers.
No in-process payload generation, no fake messages. Real bytes off the wire.
JSON Crossover (End-to-End)
Speedup vs sequential:
1 KB - 1.44x
5 KB - 1.57x
10 KB - 1.35x
25 KB - 1.75x
50 KB - 1.55x
100 KB - 1.36x
500 msgs per topic, 4 Ractors
Wins 1.35x–1.75x across every size.
JSON parse plus validate, end-to-end through Kafka. You can see the numbers on the chart.
These are smaller than the in-process benchmarks because broker round-trip and Karafka overhead are now part of the wall time. That is exactly what we want - realistic numbers.
The key takeaway is that the pattern holds at every payload size. No cliff anywhere.
More Threads Don't Help (25 KB)
Throughput (msgs/s) at concurrency 1, 2, 4, 8:
Threads, inline deserialization (GVL-bound) - 509, 509, 549, 584
Ractor pool - 736, 834, 898, 840
Inline: GVL-flat. Pool: scales and holds.
25 KB payloads, scaling Karafka's worker thread count from 1 to 8.
The red dashed line is inline deserialization in the worker threads. Flat - 509, 509, 549, 584 messages per second. Adding threads barely moves the needle. Classic GVL behavior.
The green line is the same workload but with the Ractor pool. It climbs from 736 to 898, then holds at about 840 with 8 workers.
Two takeaways - the pool delivers real scaling, and the slight dip at 8 workers shows there is a sweet spot.
Avro Crossover (End-to-End, Patched Gem)
Speedup vs sequential:
1 KB - 1.87x
5 KB - 1.64x
10 KB - 1.69x
25 KB - 1.62x
50 KB - 1.36x
100 KB - 0.99x
500 msgs, 4 Ractors, Avro gem + ~20-line Ractor-compat patch
Same pattern, inverted shape.
Same end-to-end setup, but with Avro instead of JSON. The Avro gem is not Ractor-safe out of the box - I had to apply about a 20-line patch which I plan to upstream.
Notice the shape on the chart - best speedups at small payloads, dropping at large. That is the opposite of JSON. The next slide explains why.
Why the Different Shape?
Parse cost per byte - JSON: high (text scanning); Avro: low (binary, schema-driven)
Sweet spot - JSON: mid-range (5-25 KB); Avro: small (1-10 KB)
Drops off at - JSON: 100 KB+; Avro: 50 KB+
Speedup tracks parse-to-overhead ratio , not payload size alone.
Avro decode is much cheaper per byte - it is binary, schema-driven, no delimiter scanning.
With JSON, the parse cost dominates and grows with size - bigger payloads mean bigger wins.
With Avro, the parse is so cheap that Hash construction dominates at large sizes. That part is not parallelized.
The key lesson is that Ractor speedup tracks the ratio of expensive isolated work to fixed output construction. Different formats give you different ratios.
Ractors vs "Just Use More Threads"
Best of 5-10 threads vs 1 thread + 8 Ractors:
small (1.7 KB) - 6,755 msg/s vs 15,049 msg/s (2.2x)
medium (6 KB) - 1,680 msg/s vs 9,553 msg/s (5.7x)
large (64 KB) - 188 msg/s vs 255 msg/s (1.4x)
xlarge (163 KB) - 47 msg/s vs 139 msg/s (3.0x)
1 thread + Ractors beats 5-10 threads at every size.
These are the two configurations Karafka users actually choose between - more threads, or one thread plus a Ractor pool.
Small payloads: 2.2x faster. Medium: 5.7x - the most dramatic. Large: 1.4x. Extra large: 3x.
In every row, one thread plus Ractors beats 5 to 10 threads. Adding threads to GVL-bound work actively hurts at large payloads.
The GVL Contention Trap: Threads + Ractors
Throughput (msgs/s) with 4 Ractors at concurrency 1, 5, 10:
small (1.7 KB) - 13,072 / 12,683 / 10,339 (−3% → −21%)
medium (6 KB) - 3,826 / 1,495 / 1,665 (−61%)
large (64 KB) - 248 / 93 / 59 (−62% → −76%)
Sweet spot: concurrency=1 + Ractors.
The natural follow-up question - what if I add more threads on top of the Ractor pool? The answer is no, do not do that.
Small payloads: 3 to 21 percent degradation. Medium: 61 percent drop. Large: 62 to 76 percent drop.
Every thread contends on the coordinator and the GVL for dispatch bookkeeping. More threads multiplies contention.
My strong recommendation - set concurrency to 1 for Ractor-pool topics.
The Honest Takeaway
Ractors: 1.4x - 5.7x faster than adding threads.
1.35x - 1.75x faster than sequential baseline.
The GVL bottleneck is gone. Framework overhead remains - separate problem, separate fix.
Across all payload sizes we tested end-to-end, Ractors deliver 1.4x to 5.7x speedup over threads. Adding more threads without Ractors never helps - pure-Ruby CPU work serializes on the GVL.
The Ractor pool fixes exactly the GVL problem. Parsing moves out of the main VM. Each Ractor has its own GVL.
Framework overhead is still paid - that is separate optimization work. But the big win, removing GVL contention, is shipped.
Production Concerns
The part skeptics care about
Now I want to switch from benchmarks to the questions skeptical engineers will ask. What about errors? What about gem compatibility? What is the cost for users who do not enable this? Let me address them directly.
Memory Overhead
5 workers - idle ΔRSS ~0.2 MB, active ΔRSS ~1.5 MB, ~0.3 MB per worker
10 workers - idle ~0.3 MB, active ~3.0 MB, ~0.3 MB per worker
20 workers - idle ~0.4 MB, active ~6.0 MB, ~0.3 MB per worker
~0.3 MB per active worker. Linear, not multiplicative.
A Karafka consumer typically uses hundreds of MB. The pool is a rounding error.
"Is not every Ractor a whole new VM?" No - they share bytecode and constants. They get their own heap and stack, but those start tiny.
About 0.3 MB per active worker, linear. 20 workers is 6 MB. A typical Karafka process uses 200 to 400 MB - the pool is a rounding error.
Even 8 processes times 16 Ractors is still under 50 MB extra.
Ractor Compatibility
Library Works?
JSON (stdlib) YES
CSV (stdlib) YES
YAML/Psych YES
ERB (stdlib) YES
MessagePack NO - UnsafeError
Avro ALMOST - with patches
dry-validation NO - Proc closures
json_schemer NO - "not supported yet"
Only stdlib is Ractor-safe today.
Here is the ecosystem reality check. Stdlib works - JSON, CSV, YAML, ERB - all out of the box.
MessagePack gives you an UnsafeError. Avro works with about 20 lines patched. dry-validation is blocked by Proc closures. json_schemer literally returns "not supported yet."
Stdlib is the safe path today. Third-party support is the number one barrier to wider adoption.
Cost for Non-Ractor Users?
Extra work per batch:
parallel? cached ivar check - ~17 ns
Immediate#retrieve → nil - ~39 ns
BatchMetadata extra field - ~0 ns
Total overhead per batch - ~56 ns
0.075% of a typical batch. Effectively zero.
The other big concern - am I paying for it if I do not use it?
Barely. 56 nanoseconds per batch total. A 100-message batch takes 74,000 nanoseconds to deserialize - that is 0.075 percent. Round it to zero.
Safe to ship as default. Non-Ractor users pay nothing meaningful.
Decision Table
Minimum messages per batch to win, by payload size:
18KB+ - 25 messages
5KB+ - 50 messages
500B - 100 messages
< 500B and < 50 msgs - don't bother
Here is a quick-reference table - when does the pool actually pay off?
At 18KB and above, 25 messages per batch is enough to see gains. As payloads shrink, you need bigger batches to amortize coordination cost.
The bottom row is the negative case - tiny payloads in tiny batches, do not bother. The min_payloads config option enforces this automatically.
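Conceptually the guard is just a size check before dispatch - a sketch of the idea, not Karafka's exact code:

# Below the threshold, inline parsing is cheaper than coordinating the pool.
if batch.size < min_payloads
  batch.map { |payload| JSON.parse(payload) }           # inline, zero coordination cost
else
  pool.dispatch_async(batch, result_port: result_port)  # worth the pool's overhead
end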
Enabling It
Karafka::App.setup do |config|
  config.deserializing.parallel.active = true
  config.deserializing.parallel.concurrency = 4
  config.deserializing.parallel.min_payloads = 50
end
Per-topic opt-in:
topic :events do
  deserializer MyDeserializer.new.freeze
  deserializing(parallel: true)
end
This is what turning it on looks like.
At the top - app-level config. Three lines. Activate the pool, pick how many Ractors, set the min batch threshold.
At the bottom - per-topic opt-in. The deserializer must be frozen. Add deserializing parallel: true to the topic block.
That is it. No consumer code changes. message.payload still works exactly the same way.
Beyond Deserialization: ERB Templates
ERB is 100% pure Ruby .
Threads: GVL. Ractors: parallel.
Now let me pivot to a totally different domain. This pattern is not Kafka-specific. Anywhere you have pure-Ruby CPU work on independent inputs, it applies.
ERB is a great example. Template rendering is 100 percent Ruby - string interpolation, loops, conditionals. There is no C extension that releases the GVL. Threads literally cannot help.
One practical note - ERB.result with binding does not work inside Ractors. The workaround is ERB#def_method which compiles a template into a real instance method. That works perfectly.
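A minimal sketch of that workaround - the template string and the ReportRenderer class are just examples:

require 'erb'

class ReportRenderer; end

# def_method compiles the template into a real instance method,
# so no binding is needed inside the Ractor.
template = ERB.new('<h1><%= data[:title] %></h1><p><%= data[:rows].size %> rows</p>')
template.def_method(ReportRenderer, 'render(data)')

input = Ractor.make_shareable({ title: 'Q3 Report', rows: [1, 2, 3] })

Ractor.new(input) { |data| ReportRenderer.new.render(data) }.take # .take: Ruby 3.x API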
ERB: 5 Complex Templates in Parallel
Linear (sequential) - 2.28 ms (1.00x)
Thread pool (5 workers) - 2.66 ms (0.86x, slower!)
Ractor pool (5 workers) - 1.24 ms (1.84x)
Ractors 1.84x faster. Threads make it slower.
Five real templates - a 35KB product page, 25KB dashboard, 8KB email, 68KB report, 60KB XML API response.
Linear baseline is 2.28 milliseconds.
Thread pool: 2.66 milliseconds - 14 percent slower than sequential. Threads add overhead and provide zero parallelism because of the GVL.
Ractor pool: 1.24 milliseconds - 1.84x faster. Frozen input hashes go in, frozen output strings come out.
ERB Scaling: N Report Templates
N templates: linear / threads (5w) / Ractors (5w) - speedup
5 - 3.1 ms / 3.3 ms / 1.4 ms - 2.16x
10 - 6.2 ms / 6.6 ms / 2.2 ms - 2.82x
20 - 11.7 ms / 12.9 ms / 4.3 ms - 2.69x
50 - 31.2 ms / 33.8 ms / 13.5 ms - 2.32x
2–3x. Threads never help.
Same 68KB report template rendered N times - simulating batch email generation, scheduled reports, page pre-rendering.
Look at the threads column - at every N, threads are slower than sequential. Always.
Ractors hold a 2 to 3x speedup at every scale, from 5 templates to 50.
You do not need to switch template engines. Just parallelize ERB with Ractors and you get 2 to 3x.
The Pattern Generalizes
Template rendering (ERB, Haml, Slim)
Report generation
Data transformation pipelines
Batch email rendering
Frozen in → Ractor → Frozen out
The pattern generalizes. Any work that is pure Ruby, operates on independent data items, and can freeze its input and output is a candidate for Ractor parallelism.
The list on the slide is not exhaustive. Anywhere you wish you could parallelize on threads but the GVL stops you - that is a Ractor candidate.
The mantra is simple - frozen in, Ractor, frozen out.
But: I/O-Heavy Consumers
One 100ms DB call → deserialization is <2%
Use threads/fibers there instead
Complementary, not competing.
Let me be honest about scope. If your consumer hits a database on every message, that I/O dominates everything.
A single 100 millisecond query makes deserialization less than 2 percent of total time. Speeding up 2 percent by 3x gives you 6 percent overall. Probably not worth the complexity.
For I/O-heavy consumers, use threads or fibers instead. These are complementary tools - Ractors fix CPU-bound, threads fix I/O-bound.
When NOT to Use Ractors
Small batches (< 50 msgs)
Tiny payloads (< 500B)
I/O-heavy consumers
Non-stdlib deserializers
Mutation-heavy consumer code
Let me also be clear about when not to use this.
Small batches - coordination overhead exceeds the gains. The min_payloads option handles this automatically.
Tiny payloads - the boundary crossing dominates. I/O-heavy consumers - covered on the previous slide.
Non-stdlib deserializers are not Ractor-safe yet. And if your code mutates parsed payloads everywhere, freeze: true will break it.
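That last point is easy to check, as a two-line sketch:

require 'json'

parsed = JSON.parse('{"total": 1}', freeze: true)
parsed["total"] += 1 # => FrozenError (can't modify frozen Hash)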
The One Slide Summary
GVL - threads: blocked; Ractors: bypassed
Scaling (5 workers) - threads: 0.82x; Ractors: 2.5x
Scaling (8 workers) - threads: 0.81x; Ractors: 3.17x
Memory overhead - threads: ~0; Ractors: ~1.5 MB
Error recovery - same on both (~6%)
Ecosystem - threads: everything works; Ractors: stdlib only (today)
Here is the honest comparison on one slide.
GVL - threads are blocked, Ractors bypass it. Fundamental difference.
Scaling at 5 and 8 workers - threads stuck under 1x, Ractors at 2.5x and 3.17x.
Memory - about 1.5 MB of extra RSS for an active pool, roughly 0.3 MB per worker. Real but trivial for any production process.
Error recovery works the same way either way.
Ecosystem - this is where threads still win. Everything works on threads. Ractors only safely run stdlib today.
If someone asks "should I use Ractors?" - the technical answer is yes. The practical answer depends on whether your dependencies cooperate.
What's Next
Karafka ships this as opt-in
Exploring lazy result consumption for better overlap
Making Avro, MessagePack Ractor-safe
Gem authors: make just your hot path Ractor-safe
Here is what is next.
Karafka already ships the parallel deserialization pool as opt-in. It is in the latest release, anyone can try it today.
I am also exploring lazy result consumption - starting to use partial results before the full batch finishes.
I am working on patches for Avro, MessagePack, and others to make them Ractor-safe.
And here is my direct ask to gem authors in the room - you do not have to make your entire library Ractor-safe. Just the hot path. The one operation users will want to parallelize. That is the unlock for the rest of the ecosystem.
The Takeaway
Ractors work. Today.
Frozen in → Ractor → Frozen out.
Ractors work today. For the right workload, not someday.
You do not need a Ractor-safe app. You need Ractor-safe building blocks.
The whole pattern in five words - frozen in, Ractor, frozen out.
Where to Find This
Karafka - karafka gem ≥ 2.5
Source - github.com/karafka/karafka
JSON freeze: true - json gem ≥ 2.7
Ractor::Port - Ruby 4.0+
Here is everything you need to try this yourself.
The parallel pool ships in Karafka 2.5 and later as opt-in.
JSON freeze: true needs the json gem 2.7 or newer. Ractor::Port requires Ruby 4.0 or later.
All the benchmark code lives in the rubykaigi folder of the Karafka repo - anyone can re-run it.
Thanks
Jean Boussier
Luke Gruber
John Hawthorn
Aaron Patterson
Koichi Sasada
Peter Zhu
The Ruby core team
Karafka supporters
None of this exists without their work.
I want to sincerely thank these people. None of this work exists without them.
Koichi for the entire Ractor design and Ractor::Port. Jean for JSON.parse freeze: true and stability fixes. John for the Port internals. Peter for fixing deadlocks and memory leaks. Luke for fixing concurrency bugs. Aaron for the hallway conversations at RubyKaigi 2025.
Thank you to the Ruby core team and to all the Karafka supporters who tested early versions.
Talk to Me About
Karafka, Kafka, Ruby, Ractors
Stream processing at scale
Making your gems Ractor-safe
If you are working on Karafka, stream processing, or running into CPU-parallelism walls in Ruby - come talk to me.
Especially if you are a gem author who wants help making your library Ractor-safe. That is how the ecosystem grows.
Find me in the hallway, at the after-party, anywhere.
THX
@maciejmensfeld
github.com/karafka
github.com/mensfeld
Thank you everyone. You can find all my projects at github.com/karafka and github.com/mensfeld. I am happy to take questions.