
Scalable Ruby Processing with EventMachine
Mike Perham
And you are...?

Developer at OneSpot
memcache-client maintainer
data_fabric author
Scalable Processing?

Map/Reduce (Hadoop)
Message Queues
Efficient Processing!

Focus on maximizing machine utilization


Google tries for ~80% utilization
Status Quo

Typical Message Queue processing in Ruby:


Single Threaded
200MB (or more!) to process one message/sec?!
Load Average: 0.10 0.12 0.09
Blocking IO sucks
Rule of Thumb

Your code will spend 90% of its time waiting for IO and 10% doing actual work
Blocking

Why do you add indexes to a database table?


Why do you put data in memcached?
Blocking IO
File
Database
memcached
Net::HTTP
DNS lookups
system()
Solutions?

How do we maximize the blue?


Threading?

Create 10 threads, each processing a message concurrently
10% CPU * 10 = 100% CPU!
Java: good at threading
Ruby: not so much...
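The threading idea above can be sketched with plain Ruby threads. This is only an illustration: `process_message` is a hypothetical stand-in, and `sleep` simulates the blocking IO wait. Sleeping or IO-blocked threads can overlap even under a global lock, which is the whole point of the "10% CPU * 10" arithmetic; CPU-bound work would not overlap this way.

```ruby
# Hypothetical message handler: the sleep stands in for blocking IO
# (a database call, an HTTP request, and so on).
def process_message(id)
  sleep 0.1
  id * 2
end

# Start one thread per message, then collect each thread's return value.
# Thread#value joins the thread and returns the block's result.
def process_concurrently(ids)
  threads = ids.map { |id| Thread.new { process_message(id) } }
  threads.map(&:value)
end

results = process_concurrently((1..10).to_a)
puts results.inspect
```

Ten messages are handled in roughly the time of one wait rather than ten, as long as the work really is dominated by waiting.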
Threading?

Thread-unsafe extensions / libraries


Poor thread implementation
Ruby 1.8: Green Threads
Ruby 1.9: GIL
JRuby: the only good threading solution
Alternative?

What if we could...
Have Ruby work on one operation while another
waited on I/O?
Fill in the green gaps?
Without threads?
Evented IO rules
EventMachine

Reactor pattern: a concurrent programming pattern for handling service requests delivered concurrently to a service handler by one or more inputs. The service handler then demultiplexes the incoming requests and dispatches them synchronously to the associated request handlers.

Ruby implementation of the Reactor pattern


Single threaded by default
Allows us to keep multiple IO ops in flight while a single CPU op runs
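To make the Reactor pattern concrete, here is a minimal sketch of a single-threaded reactor built on Ruby's standard library only. It is not EventMachine's implementation; `MiniReactor`, `register` and `reactor_demo` are names invented for this illustration, and pipes stand in for network sockets.

```ruby
# A toy reactor: one thread, many IO objects, one handler per IO.
class MiniReactor
  def initialize
    @handlers = {} # IO object => block to call with incoming data
  end

  # Associate a request handler with an input source.
  def register(io, &handler)
    @handlers[io] = handler
  end

  # Demultiplex: wait until any registered IO is readable, then
  # dispatch its data synchronously to the matching handler.
  def run
    until @handlers.empty?
      readable, = IO.select(@handlers.keys)
      readable.each do |io|
        begin
          @handlers[io].call(io.read_nonblock(4096))
        rescue IO::WaitReadable
          next # spurious wakeup, try again on the next loop
        rescue EOFError
          @handlers.delete(io) # the other end closed, stop watching
          io.close
        end
      end
    end
  end
end

def reactor_demo
  log = []
  reactor = MiniReactor.new
  %w[alpha beta].each do |name|
    reader, writer = IO.pipe
    reactor.register(reader) { |data| log << "#{name}: #{data}" }
    writer.write("hello")
    writer.close
  end
  reactor.run
  log.sort
end

puts reactor_demo.inspect
```

Handlers run one at a time on the reactor thread, so a handler that blocks stalls every other connection.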
How does it work?
IO.select(rd, wr, ex)
select
epoll on Linux 2.6
kqueue on BSD
/dev/poll on Solaris
All bets are off on Windows
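A small sketch of what `IO.select` itself reports, using only core Ruby. The method name `ready_readers` and the pipe names are made up for the example. `IO.select` returns arrays of the IO objects that are ready, or `nil` if the timeout expires first.

```ruby
def ready_readers
  idle_r, idle_w = IO.pipe # nothing is ever written here
  busy_r, busy_w = IO.pipe
  busy_w.write("x")        # this pipe has data waiting

  # Ask which of the two read ends can be read without blocking.
  readable, = IO.select([idle_r, busy_r], nil, nil, 0.1)
  ready = readable.map { |io| io.equal?(busy_r) ? :busy : :idle }

  # With only the idle pipe, select gives up after the timeout.
  timed_out = IO.select([idle_r], nil, nil, 0.05)

  [idle_r, idle_w, busy_r, busy_w].each(&:close)
  [ready, timed_out]
end

p ready_readers
```

Only the pipe with pending data is reported as ready, and the second call returns `nil` because nothing arrives before the timeout.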
Issues
Inversion of Control

Application code becomes callbacks


makes error handling difficult
somewhat solved by Fibers
Inversion of Control
[Code comparison: without Fibers vs. with Fibers]
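The difference between the two styles can be sketched without EventMachine. Everything here is hypothetical scaffolding: `async_fetch` imitates a callback-based IO API, `PENDING` plus `run_pending` imitate the reactor delivering results later, and `sync_fetch` shows one common way to wrap a callback API in a Fiber so the calling code reads top to bottom.

```ruby
require 'fiber'

PENDING = [] # callbacks waiting for their simulated IO to finish

# Callback-style API: returns immediately, calls the block later.
def async_fetch(key, &callback)
  PENDING << -> { callback.call("value-for-#{key}") }
end

# Stand-in for the reactor loop: deliver every pending result.
def run_pending
  PENDING.shift.call until PENDING.empty?
end

# Without Fibers: each dependent step nests inside the previous callback.
def callback_style
  out = []
  async_fetch("a") do |a|
    async_fetch("b") do |b|
      out << a << b
    end
  end
  run_pending
  out
end

# Pause the current Fiber until the callback fires, then return its value.
def sync_fetch(key)
  fiber = Fiber.current
  async_fetch(key) { |value| fiber.resume(value) }
  Fiber.yield
end

# With Fibers: the same two steps read like ordinary sequential code.
def fiber_style
  out = []
  Fiber.new do
    a = sync_fetch("a")
    b = sync_fetch("b")
    out << a << b
  end.resume
  run_pending
  out
end

p callback_style
p fiber_style
```

Both versions produce the same result. In the Fiber version an exception raised by a fetch could be rescued with an ordinary begin/rescue around the call, which is the error-handling benefit mentioned above.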
Coding

Difficult to understand
Little, poor documentation
Learning curve for newbies
Testing

Global context: reactor


Each test must set up and tear down a reactor
Whack-A-Mole

Blocking IO is everywhere
Easy to lose parallelism
Code
Evented

My EventMachine sample code repository


http://github.com/mperham/evented
Thumbnailer

Rack middleware to dynamically create thumbnails


Thin, EventMachine, ImageScience, em-http-request
Qanat

SQS processing daemon


Event-based S3, SimpleDB and SQS APIs
Uses Fibers with Ruby 1.9
EventMagick

system ==> EM.system


Execute ‘identify <JPEG>’ 640 times
system: 10 sec
EM.system: 5 sec
Example: system()
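The slide's code is not preserved, so the following is a stand-in sketch rather than the original example, and it uses the standard library's `Process.spawn` instead of `EM.system`. It shows the underlying idea: `system` blocks until each child exits, while spawning lets several children run at once. The child command (a Ruby one-liner that sleeps) and the method names are assumptions chosen so the example runs anywhere Ruby does.

```ruby
require 'rbconfig'

# Child command: the current Ruby interpreter running a short sleep.
CHILD = [RbConfig.ruby, "-e", "sleep 0.2"].freeze

def now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Blocking: each system call waits for its child before starting the next.
def run_sequentially(count)
  started = now
  count.times { system(*CHILD) }
  now - started
end

# Non-blocking start: launch every child first, then wait for all of them.
def run_overlapped(count)
  started = now
  pids = Array.new(count) { Process.spawn(*CHILD) }
  pids.each { |pid| Process.wait(pid) }
  now - started
end

puts format("sequential: %.2fs", run_sequentially(3))
puts format("overlapped: %.2fs", run_overlapped(3))
```

The overlapped run should finish noticeably sooner because the children's waits overlap; exact timings depend on the machine.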
em_postgresql

ActiveRecord driver for PostgreSQL with EM


http://github.com/mperham/em_postgresql
Requires Ruby 1.9
MySQL? Use mysqlplus.
Conclusions

Threading sucks
Blocking IO is everywhere
Use EM for IO to peg a single core
Use multiple processes for multi-core
Ruby 1.9 makes evented code nicer
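A minimal sketch of the "multiple processes for multi-core" point, using `fork` and pipes from core Ruby. It assumes a platform where `fork` is available (not Windows or JRuby). `parallel_sum` is a made-up example task; in practice each process would run its own reactor instead of summing numbers.

```ruby
# Split the work into slices, hand each slice to a forked worker,
# and read each worker's partial result back through a pipe.
def parallel_sum(numbers, workers)
  slice_size = [(numbers.size / workers.to_f).ceil, 1].max
  jobs = numbers.each_slice(slice_size).map do |slice|
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      writer.write(slice.sum.to_s)
      writer.close
      exit!(0) # leave the child without running the parent's at_exit hooks
    end
    writer.close # the parent only reads
    [pid, reader]
  end

  jobs.sum do |pid, reader|
    partial = reader.read.to_i
    reader.close
    Process.wait(pid)
    partial
  end
end

puts parallel_sum((1..100).to_a, 4)
```

Each worker is a separate OS process, so the kernel is free to schedule them on different cores.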
Thank you!
Questions?
@mperham

mperham@gmail.com
