Thread-Coordinated Ractors

The Pattern That Delivers

Maciej Mensfeld

RubyKaigi 2026

Maciej Mensfeld

Karafka, Shoryuken, PGMQ-Ruby, COI (AI Sandbox)

RubyGems Security Team

Mend.io

@maciejmensfeld

A Long Wait

Waiting for Ractors since they were called Guilds.

Today's Plan

  • The Problem - why Ractors?
  • The Investigation - what works, what doesn't
  • Patterns - building the Ractor pool
  • Production - benchmarks, limits, takeaways

10+ Years of Background Processing

Fibers. Threads. Processes.

All useful. None of them parallelize CPU-bound work in a single process.

I Hit This Wall in Karafka

Raw bytes → Deserialize (10,000 messages) → Filter (keep 50) → Business logic (50 messages)

Deserialization runs on every message. Business logic runs on a fraction.

Where Does Time Actually Go?

Deserialization 86.0%, Msg building 5.7%, Validation 3.5%, Offset mgmt 2.6%, User consume 1.1%, Filtering 1.0%.

CPU-only pipeline. No DB calls, no network I/O.

Bigger Payloads = More Time in Deserialization

Deserialization share of CPU time by payload size: 524B → 42%, 5.1KB → 86%, 18.8KB → 95%, 78KB → 99%.

CPU-only pipeline. When your consumer hits a DB for 100ms, deserialization drops to <2%.

If I can speed up deserialization...

...I speed up everything.

I've Tried Everything

  • Threads - useless for CPU
  • Async - useless for parsing
  • Forking - per-process throughput unchanged

The GVL blocks parallelism for pure-Ruby CPU work.

Ractors

Look at me, I'm the thread now

Each Ractor has its own GVL. True parallel execution.
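
A minimal sketch of that claim: two Ractors each run CPU-bound Ruby under their own GVL, so the work overlaps instead of serializing.

# Minimal sketch: each Ractor burns CPU under its own GVL, so the two sums overlap
r1 = Ractor.new { (1..20_000_000).reduce(:+) }
r2 = Ractor.new { (1..20_000_000).reduce(:+) }
r1.take + r2.take   # roughly half the wall time of running both sequentially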

It's Not That Simple

  • You need the right use-case
  • You need the right data shape
  • You need to account for framework overhead
  • You need realistic expectations

Making Ractors deliver is an engineering challenge, not a flag you flip.

A Library Author's Perspective

I build middleman layers - between infrastructure and your code.

I control the plumbing, not the business logic.

I can't tell users to switch runtimes, gems, or architectures.

~95% run CRuby. I have to solve it here.

The Unlock

  • We can't assume user code is Ractor-safe
  • We can't require them to make it so
  • Library authors can isolate pure input/output operations

You don't need a Ractor-safe app.
You need Ractor-safe building blocks.

"Ractors are slow. Everyone knows that."

Claude Code says Ractors won't work with Karafka

3x slower

JSON parsing with Ractors was painfully slow

Bug #19288

I reported that bug myself.

RubyKaigi 2025

I talked with the core team about Ractor pain points

And then they went and fixed things

What Changed in 2025

A year of serious investment into Ractors:

  • Ractors no longer crash the VM
  • Ractor::Port - targeted messaging
  • move: true - zero-copy transfer
  • JSON.parse(..., freeze: true) - no longer crashes in Ractors

And more. Time to try again.
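
A minimal sketch tying these primitives together, assuming the Ractor::Port usage shown later in this deck (the receiver creates a port; any Ractor holding it can send to it). Names and values are illustrative only.

require "json"

results = Ractor::Port.new                      # reply channel owned by the caller

worker = Ractor.new(results, '{"id":1}'.freeze) do |port, raw|
  port.send(JSON.parse(raw, freeze: true))      # result is already shareable, no copy back
end

results.receive["id"]                           # => 1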

The Investigation

Attempt 1: Naive Deep Copy

result = Ractor.new(payload) { |p| JSON.parse(p) }.take

Parse JSON in Ractors, send results back

0.29x

3.4x slower than sequential. Ouch.

Attempt 2: make_shareable

results = payloads.map { |p| JSON.parse(p) }
Ractor.make_shareable(results)

Deep-freeze everything so it can be shared

0.42x

Still slower than sequential.

JSON.parse is a C extension.

What if it could freeze objects while creating them?

The Breakthrough

JSON.parse(payload, freeze: true)

Returns objects that are already Ractor-shareable
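
A quick way to see it, assuming json ≥ 2.7 for the freeze: option:

require "json"

parsed = JSON.parse('{"id": 1, "tags": ["a", "b"]}', freeze: true)
Ractor.shareable?(parsed)          # => true, the whole tree is frozen
Ractor.shareable?(parsed["tags"])  # => true, nested values included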

Transfer Strategy Comparison (65KB payload)

Speedup vs sequential (1.0x): Deep copy 0.29x, make_shareable 0.42x, freeze: true 2.3x.

2.3x

From 0.29x to 2.3x. Same Ractors. One flag.

Frozen Input = Zero-Copy In

  • payload.freeze at the source
  • Frozen strings are Ractor-shareable for free

Frozen in, freeze: true out. Zero-copy both ways.
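
The input side is just as cheap to check:

payload = '{"id": 1}'.freeze
Ractor.shareable?(payload)  # => true, so send() shares the reference instead of deep-copying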

But Which Ractor Pattern?

There's more than one way to coordinate Ractors

4 Patterns Tested

Pattern | How it works
A) Thread-Coordinated | Readiness signaling, Mutex dispatch
B) Pre-Partitioned | Blast-send N chunks, no protocol
C) Ephemeral | Fresh Ractors per batch
D) Ractor Supervisor | Coordinator Ractor in the middle

Single Producer Results

Average speedup vs sequential across 524B–78KB payloads: Thread-Coordinated 1.5x, Pre-Partitioned 1.6x, Ephemeral 1.6x, Ractor Supervisor 1.6x.

All patterns win. But which one handles contention?

Narrowing Down

  • Ephemeral - 0.88x on small payloads
  • Ractor Supervisor - 10% overhead on large payloads

Two remain: Thread-Coordinated vs Pre-Partitioned

In Production, It's Never Just One

Multiple consumers share a single Ractor pool

Single-producer benchmarks hide this. Production doesn't.

Under Contention

Throughput (msgs/s) at 1, 3, 5, 8 concurrent consumers:

Thread-Coordinated: 48K, 93K, 78K, 76K
Pre-Partitioned: 76K, 100K, 62K, 42K

Tail Latency Tells the Story

Pattern | p50 | p99
Thread-Coordinated | 11ms | 27ms
Pre-Partitioned | 2.5ms | 100ms!

Thread-Coord: predictable. Pre-Part: 40x tail spike.

Separation of Concerns

  • Callers never block
  • Coordinator matches work to available Ractors
  • Saturated? Work queues up; callers keep going

Coordinator Thread Overhead?

~0%

The Pattern

Thread-Coordinated Ractor Pool

Listener 1 … Listener N → Work Queue (non-blocking push) → Coordinator Thread (waits for a ready worker + work) → Ractor Workers 1..N (parse + validate)

Work is dispatched with move: true; workers signal ready back to the coordinator; results come back via a Future, already shareable thanks to freeze: true.

Design Principles

Principle | How
Persistent pool | Create once, reuse forever
Non-blocking dispatch | Callers never wait
Zero-copy both ways | move: true in, freeze: true out
Per-message rescue | Errors never kill workers

Start Work Early

# Step 1: Dispatch to Ractors immediately
future = pool.dispatch_async(messages, deserializer)

# Step 2: Job sits in thread pool queue...
#         Ractors are already working

# Step 3: When a thread picks it up, results are waiting
results = future.retrieve

Ractors parse while the job waits in the thread pool queue.

Crossing the Ractor Boundary

# Rich message objects can't cross the boundary as-is
ractor.send(message)  # => Ractor::IsolationError!

# Extract only what's needed into a frozen Data class
MessageProxy = Data.define(:raw_payload)
proxy = MessageProxy.new(raw_payload: payload)
ractor.send(proxy)    # works

The Worker (inside the Ractor)

loop do
  coordinator_port.send({ worker_id: wid, port: my_port })
  msg = my_port.receive
  results = msg[:data].map do |payload|
    parsed = JSON.parse(payload, freeze: true)
    validate(parsed)
    parsed
  rescue StandardError => e
    error_marker  # never crash the worker
  end
  msg[:result_port].send({ batch_index: msg[:bi], results: results })
end

Dispatch (coordinator thread)

# Caller side - non-blocking push
@work_queue.push({ data: batch, result_port: rp, batch_index: idx })

# Coordinator thread - match work to a ready worker
loop do
  work  = @work_queue.pop               # blocks on work
  ready = @coordinator_port.receive     # blocks on worker
  ready[:port].send(work, move: true)   # zero-copy dispatch
end

Parse + Validate (500 msgs, 5 Ractors)

Speedup vs sequential (1.0x): 524B → 1.8x, 5.1KB → 2.6x, 18.8KB → 1.9x, 78KB → 1.9x.

Best Result

~3.5x

5.1KB at 8+ Ractors

Move More Into the Ractor

Any isolated, CPU-bound work can move inside

More work per message = bigger gains
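
A hedged sketch of what "more work" can look like per message; Enriched is an illustrative placeholder, not Karafka API.

require "json"

Enriched = Data.define(:id, :checksum)  # Data instances are frozen, so results stay shareable

payloads = ['{"id": 1}'.freeze, '{"id": 2}'.freeze]

results = payloads.map do |raw|
  parsed = JSON.parse(raw, freeze: true)
  Enriched.new(id: parsed["id"], checksum: raw.sum)  # extra CPU-only step, still no I/O
end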

Mixed Realistic Workload

524B–78KB shuffled, parse + validate

Workers | Speedup
5 threads | 1.16x
1 ractor | 0.69x
2 ractors | 1.33x
5 ractors | 2.59x
8 ractors | 2.82x

Why Not Just Threads?

Threads vs Ractors

Benchmark (vs sequential 1.0x) | Threads | Ractors
Parse + validate, 5 workers | 1.09x | 2.43x
Scaling, 8 workers | 0.81x | 3.17x
Sustained continuous load | 0.79x | 2.27x
Contention, 5 consumers | 1.02x | 2.44x

Threads vs Ractors: Scaling

Speedup by worker count (1, 3, 5, 8): Threads stay near the 1.0x baseline (0.81x at 8 workers); Ractors climb to 3.17x at 8 workers.

Threads plateau at ~1x.

Ractors bypass the GVL: 3.2x at 8 workers.

CPU Efficiency

Payload | Wall speedup | Total CPU cost
524B | 1.8x | 1.4x
5.1KB | 2.6x | 1.5x
18.8KB | 1.9x | 1.3x
78KB | 1.9x | 1.7x

Real parallelism, not just overhead.

Beyond Microbenchmarks

Microbenchmarks lie. Production doesn't.

End-to-End Setup

  • Karafka → broker → Karafka → Ractor pool
  • 6 topics: 1, 5, 10, 25, 50, 100 KB
  • Payload shapes collected from real Karafka users
  • 4 Ractors

JSON Crossover (End-to-End)

Speedup vs sequential: 1 KB → 1.44x, 5 KB → 1.57x, 10 KB → 1.35x, 25 KB → 1.75x, 50 KB → 1.55x, 100 KB → 1.36x.

500 msgs per topic, 4 Ractors.

Wins 1.35x–1.75x across every size.

More Threads Don't Help (25 KB)

Throughput (msgs/s) at concurrency 1, 2, 4, 8:

Threads (GVL-bound): 509, 509, 549, 584
Ractors: 736, 834, 898, 840

Inline: GVL-flat. Pool: scales and holds.

Avro Crossover (End-to-End, Patched Gem)

Speedup vs sequential: 1 KB → 1.87x, 5 KB → 1.64x, 10 KB → 1.69x, 25 KB → 1.62x, 50 KB → 1.36x, 100 KB → 0.99x.

500 msgs, 4 Ractors, Avro gem + ~20-line Ractor-compat patch.

Same pattern, inverted shape.

Why the Different Shape?

 | JSON | Avro
Parse cost / byte | High (text scanning) | Low (binary, schema-driven)
Sweet spot | Mid-range (5-25 KB) | Small (1-10 KB)
Drops off at | 100 KB+ | 50 KB+

Speedup tracks parse-to-overhead ratio, not payload size alone.

Ractors vs "Just Use More Threads"

Payload | Best of 5-10 threads | 1 thread + 8 Ractors | Speedup
small (1.7 KB) | 6,755 msg/s | 15,049 msg/s | 2.2x
medium (6 KB) | 1,680 msg/s | 9,553 msg/s | 5.7x
large (64 KB) | 188 msg/s | 255 msg/s | 1.4x
xlarge (163 KB) | 47 msg/s | 139 msg/s | 3.0x

1 thread + Ractors beats 5-10 threads at every size.

The GVL Contention Trap: Threads + Ractors

Payload | R=4, c=1 | R=4, c=5 | R=4, c=10 | Degradation
small (1.7 KB) | 13,072 | 12,683 | 10,339 | −3% → −21%
medium (6 KB) | 3,826 | 1,495 | 1,665 | −61%
large (64 KB) | 248 | 93 | 59 | −62% → −76%

Throughput in msgs/s; R = Ractor pool size, c = consumer thread concurrency.

Sweet spot: concurrency=1 + Ractors.

The Honest Takeaway

Ractors: 1.4x - 5.7x faster than adding threads.

1.35x - 1.75x faster than sequential baseline.

The GVL bottleneck is gone. Framework overhead remains - separate problem, separate fix.

Production Concerns

The part skeptics care about

Memory Overhead

Pool size | Idle ΔRSS | Active ΔRSS | Per worker
5 workers | ~0.2 MB | ~1.5 MB | ~0.3 MB
10 workers | ~0.3 MB | ~3.0 MB | ~0.3 MB
20 workers | ~0.4 MB | ~6.0 MB | ~0.3 MB

~0.3 MB per active worker. Linear, not multiplicative.

A Karafka consumer typically uses hundreds of MB. The pool is a rounding error.

Ractor Compatibility

Library | Works?
JSON (stdlib) | YES
CSV (stdlib) | YES
YAML/Psych | YES
ERB (stdlib) | YES
MessagePack | NO - UnsafeError
Avro | ALMOST - with patches
dry-validation | NO - Proc closures
json_schemer | NO - "not supported yet"

Only stdlib is Ractor-safe today.
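
An illustrative smoke test for your own deserializer: run its hot path once inside a throwaway Ractor and see whether it survives.

require "json"

probe = Ractor.new('{"id": 1}'.freeze) { |raw| JSON.parse(raw, freeze: true) }
probe.take  # a Ractor-unsafe call inside the block would surface here as a Ractor::RemoteError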

Cost for Non-Ractor Users?

Extra work per batch | Cost
parallel? cached ivar check | ~17 ns
Immediate#retrieve → nil | ~39 ns
BatchMetadata extra field | ~0 ns
Total overhead per batch | ~56 ns

0.075% of a typical batch. Effectively zero.

Decision Table

Payload | Min messages to win
18KB+ | 25
5KB+ | 50
500B | 100
< 500B, < 50 msgs | Don't bother
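
A hypothetical guard that encodes those thresholds (not a Karafka API):

def parallelize?(message_count, avg_payload_bytes)
  return true if avg_payload_bytes >= 18_000 && message_count >= 25
  return true if avg_payload_bytes >= 5_000  && message_count >= 50
  return true if avg_payload_bytes >= 500    && message_count >= 100
  false
end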

Enabling It

Karafka::App.setup do |config|
  config.deserializing.parallel.active = true
  config.deserializing.parallel.concurrency = 4
  config.deserializing.parallel.min_payloads = 50
end

Per-topic opt-in:

topic :events do
  deserializer MyDeserializer.new.freeze
  deserializing(parallel: true)
end

Beyond Deserialization: ERB Templates

ERB is 100% pure Ruby.

Threads: GVL. Ractors: parallel.
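
One possible shape (a hedged sketch, not the exact benchmark code): compile each template once on the main Ractor and hand only the frozen compiled source to workers.

require "erb"

compiled = ERB.new("<h1>Report <%= id %></h1>").src.freeze  # compiled Ruby source, frozen

workers = 3.times.map do |i|
  Ractor.new(compiled, i) do |src, id|
    eval(src, binding).freeze  # pure-CPU rendering against a local binding, no shared state
  end
end

pages = workers.map(&:take)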

ERB: 5 Complex Templates in Parallel

Approach | Time | vs Linear
Linear (sequential) | 2.28 ms | 1.00x
Thread pool (5 workers) | 2.66 ms | 0.86x (slower!)
Ractor pool (5 workers) | 1.24 ms | 1.84x

Ractors 1.84x faster. Threads make it slower.

ERB Scaling: N Report Templates

N | Linear | Threads (5w) | Ractors (5w) | Speedup
5 | 3.1 ms | 3.3 ms | 1.4 ms | 2.16x
10 | 6.2 ms | 6.6 ms | 2.2 ms | 2.82x
20 | 11.7 ms | 12.9 ms | 4.3 ms | 2.69x
50 | 31.2 ms | 33.8 ms | 13.5 ms | 2.32x

2–3x. Threads never help.

The Pattern Generalizes

  • Template rendering (ERB, Haml, Slim)
  • Report generation
  • Data transformation pipelines
  • Batch email rendering

Frozen in → Ractor → Frozen out

But: I/O-Heavy Consumers

  • One 100ms DB call → deserialization is <2%
  • Use threads/fibers there instead

Complementary, not competing.

When NOT to Use Ractors

  • Small batches (< 50 msgs)
  • Tiny payloads (< 500B)
  • I/O-heavy consumers
  • Non-stdlib deserializers
  • Mutation-heavy consumer code

The One Slide Summary

 | Threads | Ractors
GVL | Blocked | Bypassed
Scaling (5 workers) | 0.82x | 2.5x
Scaling (8 workers) | 0.81x | 3.17x
Memory overhead | ~0 | ~1.5 MB
Error recovery | Same | Same (~6%)
Ecosystem | Everything works | Stdlib only (today)

What's Next

  • Karafka ships this as opt-in
  • Exploring lazy result consumption for better overlap
  • Making Avro, MessagePack Ractor-safe
  • Gem authors: make just your hot path Ractor-safe

The Takeaway

Ractors work. Today.

Frozen in → Ractor → Frozen out.

Where to Find This

Karafka | karafka gem ≥ 2.5
Source | github.com/karafka/karafka
JSON freeze: true | json gem ≥ 2.7
Ractor::Port | Ruby 4.0+

Thanks

  • Jean Boussier
  • Luke Gruber
  • John Hawthorn
  • Aaron Patterson
  • Koichi Sasada
  • Peter Zhu
  • The Ruby core team
  • Karafka supporters

None of this exists without their work.

Talk to Me About

  • Karafka, Kafka, Ruby, Ractors
  • Stream processing at scale
  • Making your gems Ractor-safe

THX

@maciejmensfeld

github.com/karafka

github.com/mensfeld

QR code to Ractor PR