Before we go beyond something, it might be good to understand what it is
“The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web.”
I <3 HTTP
HTTP is simple and flexible
We all use it all the time
It even powers the HTTP Status Cats API website
http://http.cat
We could not live without it as we do
If it is that good and convenient, why go beyond that?
Familiarity can blind you
And this is what happened to me...
Ladies and gentlemen: the mess
First you get the feeling, then you figure out why
Why should I even care?
(apart from the obvious business reasons)
Quality matters
“Care and Quality are internal and external aspects of the same thing. A person who sees Quality and feels it as he works is a person who cares. A person who cares about what he sees and does is a person who’s bound to have some characteristic of quality.” — Robert M. Pirsig
But this was meant to be a technical talk, not a philosophical one...
But programming is a form of art and before you fix something, you need to know what the problems are
Performance
Eventual consistency (that might never happen)
Operational complexity
Multiple API clients overhead
There is an infinite number of solutions for each problem
But you need to pick one
and sometimes picking the proper solution to one problem can solve many more
Many problems with SOA come from relying heavily on HTTP and on synchronous requests that...
aren't really synchronous
I would not call a chain of requests that gets buffered somewhere or that triggers background jobs synchronous
Often HTTP in SOA is like some talks we have in our lives
Life is not synchronous
So why should our software be?
Let's clean this mess by going beyond HTTP
Finally...
It really is about messaging
What is Apache Kafka?
Kafka is a high-throughput distributed messaging system
Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization
It can be elastically and transparently expanded without downtime
App.routes.draw do
  topic :incoming_messages do
    group :composed_application
    controller Videos::DetailsController
    worker Workers::DetailsWorker
  end

  # If you work with JSON data, only controller is required
  topic :new_videos do
    controller Videos::NewVideosController
  end
end
By default, Karafka builds a worker class per controller, based on the controller name. This makes it possible to prioritize Sidekiq workers (if needed)
You can override the default behaviour
App.routes.draw do
  # If you work with JSON data, only controller is required
  topic :new_videos do
    controller Videos::NewVideosController
    # Instead of the default Videos::NewVideosWorker
    worker Videos::DifferentWorker
  end
end
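The controller-to-worker naming convention can be sketched in plain Ruby. This is illustrative only (the method name `default_worker_name` is an assumption, not Karafka's internal code):

```ruby
# Illustrative sketch of the convention: strip the trailing
# "Controller" from the class name and append "Worker".
def default_worker_name(controller_name)
  controller_name.sub(/Controller\z/, 'Worker')
end

default_worker_name('Videos::NewVideosController') # => "Videos::NewVideosWorker"
```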
#perform
Karafka controllers are simple. All you need is a #perform method that will be executed asynchronously in response to an incoming message
class CreateVideosController < Karafka::BaseController
  def perform
    Video.create!(params[:video])
  end
end
#before_enqueue filter
Acts in a similar way to Rails #before_action
class CreateVideosController < Karafka::BaseController
  before_enqueue -> {
    # Reject old incoming messages
    # When before_enqueue returns false,
    # the task won't be sent to Sidekiq
    return false if params[:sent_at] < 1.minute.ago
  }
end
It can be used as a first layer of data filtering. If it returns false, the Sidekiq task won't be scheduled
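The freshness check above can be sketched in plain Ruby, without ActiveSupport's `1.minute.ago`. The helper name `fresh_message?` and the 60-second `max_age` are assumptions for illustration:

```ruby
# Plain-Ruby sketch of the before_enqueue freshness check:
# a message is fresh if it was sent less than max_age seconds ago.
def fresh_message?(sent_at, now: Time.now, max_age: 60)
  (now - sent_at) < max_age
end

fresh_message?(Time.now - 10)  # => true  (recent enough to enqueue)
fresh_message?(Time.now - 120) # => false (stale, filtered out)
```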
CLI
There are also a few useful CLI commands available:
bundle exec karafka [COMMAND]
help # Describe available commands or one specific command
console # Start the Karafka console (short-cut alias: "c")
info # Print configuration details and other options
install # Install all required things for Karafka application
routes # Print out all defined routes in alphabetical order
server # Start the Karafka server (short-cut alias: "s")
topics # List all topics available on Karafka server
worker # Start the Karafka Sidekiq worker
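A typical workflow with these commands might look like this (assuming a bootstrapped Karafka application with Kafka and Redis running):

```shell
bundle exec karafka install   # generate the application skeleton
bundle exec karafka routes    # verify the defined topics and controllers
bundle exec karafka server    # start consuming Kafka messages
bundle exec karafka worker    # start Sidekiq-based message processing
```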
Karafka performance
Is strongly dependent on what you do in your code
Redis performance (for Sidekiq) is a factor as well
Message size is a factor
A single process can handle around 30,000 messages/sec
Less than a ms to send a message in the slowest (secure) mode (Kafka request.required.acks -1)
Less than 1/10 of a ms to send a message in the 0 mode (Kafka request.required.acks 0)
Karafka framework scalability
Each scaling strategy targets a different problem
Scaling strategies can be combined
The following strategies are available:
Scaling using multiple Karafka threads
Scaling using Kafka partitions
Scaling using Karafka clusterization (in progress)
I want to integrate it with my Rails/Sinatra app
The best approach is to start generating messages from your current applications
Messaging allows you to tell your Karafka apps what your other Ruby components are doing
def create
  video = Video.create!(params[:video])
  WaterDrop::Message.new(:video_created, video.to_json).send!
  respond_with video
end
Once you start sending messages, you can extract functionalities and responsibilities and move them to Karafka based applications
so back to the big question:
Why go beyond?
“One geometry cannot be more true than another; it can only be more convenient. Geometry is not true, it is advantageous.” — Henri Poincaré
and the same applies to IT technologies
Going beyond your comfort zone can provide you with better solutions for current and upcoming problems.