Karafka – Ruby framework for building Kafka message based applications

Maciej Mensfeld
twitter: @maciejmensfeld
www: mensfeld.pl
e-mail: maciej@mensfeld.pl

A bit about me

  • I'm from Poland (Europe)
  • 9 years of commercial exp in IT
  • 7 years with Ruby and Rails
  • I find Ruby quite useful
  • I love open-source
  • Interested in quality-assurance automation tools
  • Running a blog mostly about Ruby related stuff

Please notify me if...

  • I speak 2 fast
  • I should repeat something
  • I should explain something better
  • You have any questions

What is Apache Kafka?

  • Kafka is a high-throughput distributed messaging system
  • Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization
  • It can be elastically and transparently expanded without downtime
  • It provides broadcasting to many applications
  • Allows to build systems that are event based

Who uses Apache Kafka?

  • Linkedin
  • Yahoo
  • Twitter
  • Netflix
  • Square
  • Spotify
  • Pinterest
  • Uber
  • Tumblr
  • Cisco
  • Foursquare
  • Shopify
  • Oracle
  • Urban Airship
  • OVH
  • And many more...

What is Karafka?

  • Karafka = Kafka + Ruby => KaR(uby)afka
  • It is a microframework
  • It was designed to simplify Kafka based applications development
  • It allows developers to build "Rails like" apps that consume and produce messages

Why we developed Karafka?

  • We've needed a tool that would allow us to build applications faster
  • We've needed a tool that would allow us to process faster
  • We've needed a tool that would allow us to handle events and messages from many sources and process them the same way
  • Because single message can be automatically delivered to many Karafka applications

Why even bother with messaging when there is HTTP and REST?

  • HTTP does not provide broadcasting
  • We often need to trigger many actions based on a single event
  • We don't want to maintain internal API clients
  • With a message broker you can replace microservices transparently
  • You can obtain better microservices isolation
  • Because you can create new microservices that use multiple different events from many sources

It really is about messaging

Real life is asynchronous

Microservices without broadcasting

Without a broker you need to add code to both ends of your SOA system

Microservices with broadcasting

With a broker all you need to know is topic on which you want to listen and a message format

Karafka uses goods that are already well known

  • Ruby-Kafka
  • Celluloid to introduce sockets clustering inside threads
  • Sidekiq to support background data processing
  • Rails app structure concept for bigger apps
  • Sinatra app structure concept for small apps

Karafka ecosystem

Each part can be used independently

  • Karafka Framework - Engine to process incoming messages
  • WaterDrop - Ruby-Kafka based library for outgoing messages
  • Worker Glass - Worker wrapper that provides optional timeout and after failure (reentrancy)

Karafka framework components

Apart of the implementation details, Karafka is combined from few logical parts:

  • Messages Consumer (Karafka::Connection::Consumer)
  • Router (Karafka::Routing::Router)
  • Base Controller (Karafka::BaseController)
  • Base Worker (Karafka::BaseWorker)
  • CLI (Karafka::Cli)

Karafka framework components

How can I start using it?


# Gemfile
source 'https://rubygems.org'

gem 'karafka', github: 'karafka/karafka'

bundle install
bundle exec karafka install

Then open app.rb and update configuration settings

All the configutation options are described here:
github.com/karafka/karafka

Karafka conventions and features

Karafka conventions and features

Karafka has a routing engine similar to the Rails one (just much smaller)


App.routes.draw do
  topic :incoming_messages do
    group :composed_application
    controller Videos::DetailsController
    worker Workers::DetailsWorker
    parser Parsers::BinaryToJson
    interchanger Interchangers::Binary
  end

  # If you work with JSON data, only controller is required
  topic :new_videos do
    controller Videos::NewVideosController
  end
end

Karafka conventions and features


NewVideosController  #=> NewVideosWorker
Users::PaymentsController #=> Users::PaymentsWorker

By default Karafka builds a worker class per controller based on a controller name. This will allow you to prioritize (if needed) Sidekiq workers

Karafka conventions and features

You can overwrite all of the default behaviours


  # If you work with JSON data, only controller is required
  topic :new_videos do
    controller Videos::NewVideosController
    # Instead of a default Videos::NewVideosWorker
    worker Videos::DifferentWorker
  end
end

Karafka conventions and features

Karafka controllers are simple. All you need is a #perform method that will be executed asynchronously in response to an incoming message


class CreateVideosController < Karafka::BaseController
  def perform
    Video.create!(params[:video])
  end
end

Karafka conventions and features

#before_enqueue filter that acts in a similar way to Rails #before_action


class CreateVideosController < Karafka::BaseController
  before_enqueue -> {
    # Reject old incoming messages
    # When before_enqueue throws false,
    # task won't be send to Sidekiq
    throw(:abort) if params[:sent_at] < 1.minute.ago
  }
end

It can be used to provide first layer data filtering. If it returns false, Sidekiq task won't be scheduled

Karafka conventions and features

There are also few usefull CLI commands available:

bundle exec karafka [COMMAND]

console # Start the Karafka console (short-cut alias: "c")
flow    # Print application data flow (incoming => outgoing)
help    # Describe available commands or one specific command
info    # Print configuration details and other options
install # Install all required things for Karafka application
routes  # Print out all defined routes in alphabetical order
server  # Start the Karafka server (short-cut alias: "s")
worker  # Start the Karafka Sidekiq worker (short-cut alias: "w")

Karafka performance

  • Is strongly dependent on what you do in your code
  • Redis performance (for Sidekiq) is a factor as well
  • Message size is a factor
  • Single process can handle around 30 000 messages/sec
  • Less than a ms to send a message with the slowest (secure) mode (Kafka request.required.acks -1)
  • Less than 1/10 of a ms to send a message with in the 0 mode (Kafka request.required.acks 0)

Karafka framework scalability

Each scaling strategy targets a different problem

Scaling strategies can be combined

Following strategies are available:

  • Scaling using multiple Karafka threads
  • Scaling using Kafka partitions
  • Scaling using Karafka clusterization (in progress)

Scaling using multiple threads

  • Good when you have multiple topics that are not 100% utilized
  • Good when you want to provide paralleism but still have a single process running
  • Generally the easiest way to have multiple controllers listening at the same time

Scaling using multiple threads

Scaling using Kafka partitions

  • Topic partition is the unit of parallelism in Kafka
  • Partitions are an answer to heavy duty topics
  • Karafka processes automatically rebalances between available partitions
  • Karafka requires topics partitioning when you want to handle more than 30 000 messages per second per topic

Scaling using Kafka partitions

Scaling using Karafka processes clustering

  • Single Karafka process can handle up to 30 000 messages per second (total)
  • It means that the bigger your application is, the slower it gets (per controller)
  • Thanks to process clustering, each Karafka process will listen only to a selected part of topics
  • That way with a 10 process cluster, we can increase throughput to more than 300 000 messages per second

Scaling using Karafka processes clustering

I want to integrate it with my Rails/Sinatra app

  • The best approach is to start generating messages from your current applications via WaterDrop
  • With WaterDrop you can tell your Karafka apps what your other Ruby components are doing

def create
  video = Video.create!(params[:video])
  WaterDrop::Message.new(:video_created, video.to_json).send!
  respond_with video
end
  • Once you start sending messages, you can extract functionalities and responsibilities and move them to Karafka based applications

WANT TO CONTRIBUTE?

github.com/karafka

  • The more people star it, the more people use it
  • The more people use it, the more people star it
  • There are many issues you can help us fix
  • We use Code Climate and Travis with many QA tools to maintain the quality

READ MORE

THE END - Q & A