Valued Lessons

I've learned. I'll share.

June 27, 2008

Message Passing Conccurrency (Actor Model) in Python

As I wrote previously, I'm a fan of message-passing concurrency. But to use it in python, I had to write my own infrastructure. It's far from perfect, but it's working for me. I've simplified it a bit and prepared some code to share with you.

The basic idea of my framework is to be and simple and easy to use as possible. There's no Erlang-style pattern matching or process linking. There isn't even any built-in error handling. But, there's a lot of support for the typical "react loop" style of message handling.

The architecture is very simple. There's one thread per actor. An actor is just an object you can send a message to. The actor receives the messages in the order they were sent. You always know the sender of the message; it's built right in. There's also "Bridge" that allows you to interact with actors from non-actor code.

The only tricky thing you'll see is a "handles" decorator. This is used to help you identify how you want an actor to handle a given message type. It helps you avoid re-creating a message loop and dispatcher yourself for every actor.

I've included a little example of how to use the framework. It obviously quite simple, but keep in mind that I used little more for the foundation of a rather complex file synchronization application. A little bit goes a long way.

So, here it is. Cut, paste, and run.


from threading import Thread, Event
from Queue import Queue, Empty

class IActor:
pass
# send(message, sender)

class HandlerRegistry(dict):
# should be used as a decorator
def __call__(self, typ):
def register(func):
self[typ] = func
return func
return register

OtherMessage = "Other Message"

class Stop(Exception):
def __repr__(self):
return "Stop()"

class Stopped:
def __repr__(self):
return "Stopped()"

class ActorNotStartedError(Exception):
def __init__(self):
Exception.__init__(self, "actor not started")

class Actor(IActor):
@classmethod
def spawn(cls, *args, **kargs):
self = cls(*args, **kargs)
self.mailbox = Mailbox()
start_thread(target = self.act, as_daemon = True)
return self

def send(self, message, sender):
if self.mailbox is None:
raise ActorNotStartedError()
else:
self.mailbox.send(message, sender)

def receive(self):
if self.mailbox is None:
raise ActorNotStartedError()
else:
return self.mailbox.receive()

# override if necessary
def act(self):
self.handleMessages()



handles = HandlerRegistry()

@classmethod
def makeHandles(*classes):
return HandlerRegistry((typ, handler) for cls in classes for (typ, handler) in cls.handles.iteritems())

def handleMessages(self):
try:
while True:
message, sender = self.receive()
self.handleMessageWithRegistry(message, sender)
except Stop:
pass

def handleMessageWithRegistry(self, message, sender):
registry = self.__class__.handles
handler = registry.get(message.__class__) or registry.get(OtherMessage)
if handler is not None:
handler(self, message, sender)

@handles(OtherMessage)
def onOther(self, message, sender):
pass

@handles(Stop)
def onStop(self, message, sender):
sender.send(Stopped(), self)
raise message

def start_thread(target, as_daemon, name = None):
thread = Thread(target = target)
if name:
thread.setName(name)
thread.setDaemon(as_daemon)
thread.start()
return thread

class Mailbox:
def __init__(self):
self.mailbox = Queue()

def send(self, message, sender):
self.mailbox.put((message, sender), block = False)

def receive(self, timeout = None):
return self.mailbox.get(block = True, timeout = timeout)

class Bridge(IActor):
def __init__(self):
self.mailbox = Mailbox()

def send(self, message, sender):
self.mailbox.send(message, sender)

def call(self, target, request, timeout, default = None):
self.sendRequest(target, request)
return self.receiveResponse(timeout, default)

# targeted_requests can be an iterator
def multiCall(self, targeted_requests, timeout, default = None):
count = 0
for target, request in targeted_requests:
self.sendRequest(target, request)
count += 1

for _ in xrange(count):
yield self.receiveResponse(timeout, default)

def stop(self, actors, timeout):
stop = Stop()
return list(self.multiCall(((actor, stop) for actor in actors), timeout, default = None))

def sendRequest(self, target, request):
target.send(request, self)

def receiveResponse(self, timeout, default):
try:
message, sender = self.mailbox.receive(timeout = timeout)
return message
except Empty:
return default


if __name__ == "__main__":
import time

class GetInventory:
pass

class Task:
def __init__(self, input, destination):
self.input = input
self.destination = destination

class Worker(Actor):
handles = Actor.makeHandles()

def __init__(self, skill):
self.skill = skill

@handles(Task)
def onTask(self, task, sender):
output = self.skill(task.input)
task.destination.send(output, self)

class Warehouse(Actor):
handles = Actor.makeHandles()

def __init__(self):
self.inventory = []

@handles(GetInventory)
def onGetInventory(self, message, sender):
# copy the inventory to avoid anyone mutating it
sender.send(list(self.inventory), self)

@handles(OtherMessage)
def onTaskResult(self, result, sender):
self.inventory.append(result)

worker = Worker.spawn(lambda x : x * 2)
positives = Warehouse.spawn()
negatives = Warehouse.spawn()
bridge = Bridge()

for val in [1, 2, 3, -2, -4, -6]:
warehouse = positives if val >= 0 else negatives
worker.send(Task(val, warehouse), sender = None)

print bridge.call(positives, GetInventory(), 1.0) #should be [ 2, 4, 6]
print bridge.call(negatives, GetInventory(), 1.0) #should be [-4, -8, -12]
print bridge.stop([worker, positives, negatives], 1.0) #should be [Stopped(), Stopped(), Stopped()]


class Start:
def __init__(self, target):
self.target = target

class Ping:
def __repr__(self):
return "Ping()"

class Pong:
def __repr__(self):
return "Pong()"

class Pinger(Actor):
handles = Actor.makeHandles()

@handles(Start)
def onStart(self, start, sender):
start.target.send(Ping(), self)

@handles(Pong)
def onPong(self, pong, sender):
print "-",
sender.send(Ping(), self)

class Ponger(Actor):
handles = Actor.makeHandles()

@handles(Ping)
def onPing(self, ping, sender):
print "+",
sender.send(Pong(), self)

# should print lots of +-+-+-
pinger = Pinger.spawn()
ponger = Ponger.spawn()
pinger.send(Start(ponger), sender = None)
time.sleep(0.1)
bridge.stop([pinger, ponger], 1.0)

My Experience with Message Passing Concurrency

I'm working on a peer-to-peer file synchronization program. It's really concurrent and distributed, and it's forced me to
learn a thing or two about the concurrency models that we use as programmers. Over the past few years, I've tried both the shared-state model common to C, C++, Java, C#, Python, etc, and the message-passing model unique to Erlang and Scala. In my experience, the message-passing model (aka Actor Model) is far superior to the shared-state model for writing distributed applications. After having used both extensively, I'd go so far as to say that for distributed programming, shared-state concurrency is upside down and backwards.

Here's my informal proof that message-passing concurrency is necessary in a distributed system:


  1. Since the system in distributed, real shared state is impossible.

  2. The only way for the distributed components of an application to
    communicate is by sending a message from one to another.

  3. Everything else is an abstraction on top of message passing.



HTTP is an abstraction on top of message-passing. AJAX is an abstraction on top of HTTP on top of message-passing. XML-RPC is an abstraction on top of HTTP on top of message-passing. SOAP is an abstraction on top of an abstraction on top of an abstraction on top of HTTP on top of message-passing. Any RPC mechanism is an abstraction on top of message-passing. SQL queries to a remote database are an abstraction on top of message-passing. CORBA is a nasty, tangled mess on top of a foundation of message-passing.

See a pattern?

Not only are all of these abstractions, but they are leaky abstractions. Just about all RPC (Remote Procedure Call) frameworks try to pretend that remote objects or local. But the facade is impossible to keep from leaking. Calls to a remote object might fail or take arbitrarily long. If you want to make the same call to two different remote nodes, those calls must be made synchronously and sequentially; in order to call them in parallel on the remote nodes, they most be called in parallel on the local node. If a call to a remote node triggers a call back to the local node, which may trigger a call to a third node, you end up with a huge spaghetti mess of calls and threads.

I've been down that road. It wasn't pretty. There is a better way.

I think AJAX has opened our eyes a bit. It's a lot closer to message-passing than it is to RPC. In a REST architecture, you "send" messages by POSTing to a URL and you "receive" messages by GETing a URL. All messages are clearly local copies that were serialized and deserialized from the remote data. There isn't any leaky abstraction of data or classes pretending to be in two places at once. Data, timeouts, and failures are in your face and you have to deal with them. So, AJAX is a lot less leaky.

But AJAX is far from perfect. I can't help but think of the Greenspun's Tenth Rule of Programming with a twist on distributed programming: "Any sufficiently complicated distributed platform contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of a message-passing system (Erlang)". Once you get message-passing in your head, you can't help but think of AJAX in that way.

I've been down a better road. It was much more pleasant.

About six months ago, I rewrote major portions of our application with message-passing concurrency. I took a long time. It was tricky. It hurt my head. But it worked. It's a success. It's capable of things that the shared-state system could never do.

Having done it, I can emphatically say that if I could start all over, I would go with message-passing. Most of the work was building the infrastructure and wrapping my head around a new way of thinking. But now that I've done both of those, I've paid the costs and I'm reaping the rewards. We're moving the application in directions that would have been nearly in possible with the old concurrency model.

In my experience, message-passing concurrency is the best way to write distributed applications. But it isn't an easy road to go down. Support for it just isn't there in most programming languages and environments. So far, Erlang has been the pioneer. I would humbly agree that the way I've implemented message-passing in Python, compared to Erlang, looks like an ad-hoc, bug-ridden half-implementation. But for various reasons, I can't use Erlang. I'm hoping for someone to create Erlang++ or E# or Eython or something that combines the concurrency model of Erlang with a modern programming language. Until then, I'll just keep on cobbling together what I can onto the programming language I happen to be using.

More on that in another article.

May 23, 2008

Wii Fit From A Programmer's Perspective (after one day)

I bought Wii Fit. It came in the mail yesterday and I had my first try at it last night and my first workout this morning. Although I haven't had a lot of time with it so far, I think I've had enough that I can share some observations.

I have read a lot about Wii Fit from a video game or fitness perspective, but I've never read about if from the perspective of a software developer. I think as software developers, we should have extra interest in it for three reasons:


  1. Most software developers are physically weak.

  2. The Wii Fit Balance Board is a new and unique user interface device.

  3. There might be some money to be made!



Wii Fit as an antidote to programmer physical weakness



I talked myself into buying Wii Fit because of #1. As a programmer, I sit all day with little or no physical activity. Most of my life, in fact, has involved sitting with little or no physical activity. If you're reading this, I'm guessing that you probably don't get much physical activity either. Well guess what: that makes us weak!

I know not all programmers are weak. My good friend Wes is a programmer, and he's pretty buff. But I'm weak, and I'm sure I'm not the only one. I didn't understand how weak until I played Wii Fit the first time. Just about every activity (and there are many) involved a seldom-used muscle. After a short while, I realized that I have a lot of unused muscles.

Will Wii Fit make me strong? I don't know. I suppose that depends on how much I use it. But if it pushes me to use lots of muscles that I never did, then it can only make me stronger, right? In other words, if I keep using it, my strength will increase monotonically. Eventually it will plateau, and it's limit is probably lower than going to a gym. But I'm not going to a gym, and this is sure better than nothing.

Plus, it's surprisingly enjoyable! I enjoyed my first workout this morning so much that an hour sped by before I looked at the time. Part of that is novelty, I'm sure, but there's something more to it, and I'm optimistic that I'll keep it up. I'm already looking forward to my next round tomorrow.

Wow. Did you hear what I just said? I said I'm looking forward to getting up early to have time to excersize. Either I got hit in the head with a coconut or Wii Fit is really on to something.


Wii Fit Balance Board as a user interface device.



Although I bought for the exercise, I'm interested in the potential of the Balance Board that comes with Wii Fit. It's one of the most
unique user interface devices I've ever used.

Nintendo seems to be on a role with user interface devices. It's success at turning a gyroscope into sports simulation led to the huge success of the Wii. The availability of wiimotes led to an explosion of interesting
uses
and interest in hacking it. It's certainly an interface device success story.

The big question is: how useful with the Balance Board be? In Wii Fit, there seem to be two kinds of uses:


  1. A measuring device.

  2. A control device.



As a measure device, I think it's very useful. By "measure", I mean that the board measures how well you do a task. For example, I was doing "lunges", and it noticed that I was bending my leg too far, and told me so. That's pretty impressive. When I doing push-ups, it had a rhythm to follow and it really helped to know that someone was "watching"; I couldn't wimp out. Even just standing on it let me know that my whole life I've been standing too much on my heals. How's that for some realization!

Just like Wii Sports was the perfect application for the Wiimote, I think Wii Fit is the perfect application for the Balance Board. I can't think of an activity better suited to it's strong point: measuring.

But what else can it do? When we think of a user interface device, we usually think of controlling something, whether it be a web browser or a game character. As a developer, this is the possibility I was most excited about. After having used it, how well does the Balance Board work as a controlling device? I'm not so sure.

Wii Fit comes with a couple of "balance games" showing various ways of using the board to control something. There's leaning left and right to ski, crouching and standing to ski jump, and leaning in all
directions to roll a ball into a hole. Some of them are actually quite impressive. But I still doubt the board's effectiveness for precise control.

The problem is I find it's really hard to control my balance. I'm not kidding. Just rolling a little ball down into the hole was
really hard. My first time on the ski slope was a disaster. I read that a skateboard game is going to support the board. That might end up being so realistic that I can't control it, just like in real life!

I think a big part of it is that controlling my balancing is something I've never really done before. Imagine handing someone who has never played video games an Xbox controller and watching them play Halo. It won't be pretty. Similarly, I remember when I was a kid and the mouse was a new input device. My teacher at school couldn't handle it. She'd never had to control anything like it. Like her, I am facing a new control device, and there's a learning curve.

Long story short: The Wii Fit Balance Board is hard to use for precise control, but maybe it's just me. Hopefully, my skill with it will improve, and we as software developers will think of
interesting ways of using it.

Profit



There's definetely some money to be made writing software for the Balance Board. How so?


  • Millions of households already have a Balance Board

  • Millions more will soon.

  • They will show it to everyone they know.

  • Many will want more beyond Wii Fit.



Even though it's only one player at a time, it a hilarious game to play in a group. Some family happened to be over last night, and they couldn't get enough of it. It was a surprisingly fun watching each other try and hula hoop and dodge shoes. The first decent, cheap party game featuring compatibility with the Balance Board is going to sell like hot cakes (which...sell fast, right?).

I'm not saying who is going to make a lot of money. Nintendo might make it all. But there's probably room for little guys too.
With WiiWare, games like LostWinds have shown that a small developer can take a simple idea for a new interface, add a little bit of style, and make a great game. If someone made a game like LostWinds for the Balance Board for $10, I
wouldn't hesitate to buy it.

I'd be excited just to have some Linux drivers and an interesting open source hack to control my computer somehow, even if it's something silly like that MacBook-turned-into-lightsaber trick. Someone might even think of way to really "surf the web". I don't think there's much many in that, though :).



So far, I'm glad I bought Wii Fit. I hope to keep it up and strengthen my muscles. It's a very fun party game, which should lead to widespread adoption and an opportunity to create interesting uses for the Balance Board. It's hard to control precisely, but I hope it will get easier as we learn to control our balance (something that some of us have never done before).

April 28, 2008

Events in Python

I'm a huge fan of the Actor Model . I think that for most applications, it's the best way to do concurrency. As CPUs get more cores, concurrency becomes more important for programmers, and I think the Actor Model will become more important, too.

But, sometimes you need something more light-weight for handling "events". For example, imagine you have some code listening for file changes in a particular directory. What you'd like to do is make an "event" that is "fired" whenever a file change is detected. When fired, there may be a "handler" or "listener" which is notified of the event. That "handler" is ultimately just some code which is executed when the event occurs.

A while ago, I wanted an event system like this for Python. I didn't see anything builtin or any library available, so I decided to write my own. I'd like to share what I created with you.

But first, I want to follow an example through other programming languages to give you an idea of what I was trying to accomplish and how it compares with what's out there. After that, I'll give you my implemenation in Python. Our example will be listening for changes on the file system. We want to keep the "watcher" code decoupled from the rest of the code, so we use events.

The best implementation of events that I've used is in C#, so we'll start there. In C# 1.0, our file watcher would looks something like this:


public delegate void FileChangeHandler(string source_path);

class FileWatcher
{
public event FileChangeHandler FileChanged;

public void WatchForChanges()
{
...
if(FileChanged != null)
{
FileChanged(source_path);
}
...
}
}

class FileChangeLogger
{
public void OnFileChanged(string source_path)
{
Console.WriteLine(String.Format("{0} changed.", source_path));
}
}

watcher = FileWatcher();
logger = FileChangeLogger();
watcher.FileChanged += new FileChangeHandler(logger.OnFileChanged);
watcher.WatchForChanges();




It's pretty nice. The best part is at the end, where you can write "watcher.FileChange += ...". But, you need to type the completely useless "new FileChangeHandler", and you also need to wrap it all in a separate FileChangeLogger class. Luckily, in C# 2.0, they added Anonymous Delegates, which makes this much nicer:


watcher.FileChanged += delegate(string source_path)
{
Console.WriteLine(String.Format("{0} changed.", source_path));
}


And in C# 3.0, they've made it even nicer!


watcher.FileChanged += (source_path => Console.WriteLine(String.Format("{0} changed.", source_path)));


C# 3.0 has an event system that's downright slick, with type-inferencing and everything. I don't think it gets much better than that.

Actually, there is one thing. If there's no handler registered with the event, calling "FileChanged()" will blow up because it sees FileChanged == null until a handler is registered. This means that you have to write "if(Event != null){Event(...):}" every single time you fire the event. Every single time. If you forget, your code will seem to work fine until there's no handler, in which case it will blow up and you'll smack your forhead because you forgot that you have to repeat that line of code every single time you fire an event. I really mean every single time. It's by far the worst wart on an otherwise elgant system. I have no idea why the designers of C# thought this was a good idea. When would you ever want to fire and event and have it blow up in your face?


Anyway, let's try a different programming language, perhaps Java. It has the worst implementation of events I've ever seen. Here's our FileWatcher example:

...

...

Ok, nevermind, I don't have the heart. I can imagine the code full of IListeners, and implementations, and keeping an array of them, and iterating over them, etc, and I just can't do it. I got seriously upset at one useless line in C#. In Java, it's at least 10 times worse. If you really have the stomach for it, go look at http://java.sun.com/docs/books/tutorial/uiswing/events/index.html. To me, it appears that in Java, events are one giant hack around the lack of first-class functions. If Java had first-class functions, none of that nonsense would be necessary.


Now that we've seen a good implementation of events in C# and avoided a bad one in Java, let's make one for Python. I'd rather it be like the C# event system, so let's see what our example would look like:


from Event import Event

class FileWatcher:
def __init__(self):
self.fileChanged = Event()

def watchFiles(self):
...
self.fileChanged(source_path)
...

def log_file_change(source_path):
print "%r changed." % (source_path,)

watcher = FileWatcher()
watcher.fileChanged += log_file_change



I think that looks pretty good. So what does the implementation of Event look like?



class Event(IEvent):
def __init__(self):
self.handlers = set()

def handle(self, handler):
self.handler.add(handler)
return self

def unhandle(self, handler):
try:
self.handlers.remove(handler)
except:
raise ValueError("Handler is not handling this event, so cannot unhandle it.")
return self

def fire(self, *args, **kargs):
for handler in self.handlers:
handler(*args, **kargs)

def getHandlerCount(self):
return len(self.handlers)

__iadd__ = handle
__isub__ = unhandle
__call__ = fire
__len__ = getHandlerCount


Wow. That was pretty short. Actually, this is one of the reasons I love Python. If the language doesn't have a feature, we can probably add it. We just added one of C#'s best features to Python in 26 lines of code. More importantly, we now have a nice, light-weight, easy-to-use event system for Python.

For all of you how like full examples that you can cut and paste, here is one that you can run. Enjoy!



class Event:
def __init__(self):
self.handlers = set()

def handle(self, handler):
self.handlers.add(handler)
return self

def unhandle(self, handler):
try:
self.handlers.remove(handler)
except:
raise ValueError("Handler is not handling this event, so cannot unhandle it.")
return self

def fire(self, *args, **kargs):
for handler in self.handlers:
handler(*args, **kargs)

def getHandlerCount(self):
return len(self.handlers)

__iadd__ = handle
__isub__ = unhandle
__call__ = fire
__len__ = getHandlerCount

class MockFileWatcher:
def __init__(self):
self.fileChanged = Event()

def watchFiles(self):
source_path = "foo"
self.fileChanged(source_path)

def log_file_change(source_path):
print "%r changed." % (source_path,)

def log_file_change2(source_path):
print "%r changed!" % (source_path,)

watcher = MockFileWatcher()
watcher.fileChanged += log_file_change2
watcher.fileChanged += log_file_change
watcher.fileChanged -= log_file_change2
watcher.watchFiles()

April 2, 2008

Banana Cream Pie Shake in 30 Seconds

I don't intend this blog to be purely about programming matters. I intend it to be about anything I've learned and would like to share. Today I'd like to share with you an incredibly delicious and easy desert recipe that I've never seen before recently but stumbled upon. You'll love it. But if you are really only interested in programming topics, then I suggest you change your bookmark to a programming-label-only link or your feed reader to a programming-label-only feed to enjoy just my posts about programming (at least I hope you enjoy them :).

No about that recipe... If you like banana cream pie or a banana shake, this recipe is right up your alley. It's a banana cream pie shake. It tastes exactly like a banana cream pie, except that you can drink it, it's healthy for you, and it only takes 30 seconds. Here's the recipe:


2 frozen bananas (I chop them a bit to blend easier)
1 cup of liquid (I use soy milk)
blend until smooth


I'm still astounded that something that costs so little (clearly less than $1) and takes so little time (a few minutes if you are slow) is so delicious and healthy.

Finally, a little FAQ:


Q: Does it really only takes 30 seconds?

A: It only takes 30 seconds to blend, but obviously bananas take a few hours to freeze, and you may spend a minute or two cutting, pouring, and cleaning up. Not counting the freezing, the whole process is just a few minutes.


Q: What kind of blender do I need?

I have a Blendtec TotalBlender. Have you seen them blend an iPhone? That's my blender. I'll hopefully review it for you someday; it really is that good. It's some of the best $400 I've ever spent. Frozen bananas are no challenge for it.

If you have a lesser blender and it doesn't work at first, I'd recommend that you let the bananas thaw for a little while.


Q: Does it really taste that good?

Yes. It's so good I'm about to stop writing this to make some more. I drank the last one so fast it didn't even make it through this whole post.


Q: Is it really healthy?

If the ingredients are healthy, it's healthy. Bananas are healthy. Just use healthy liquid. You can even toss in extra healthy ingredients like flax seed and make it even healthier without altering the taste much. Adding chocolate might make it more tasty, but it will also make it less healthy. I prefer it without chocolate anyway.


Q: I tried it and it's too thick and creamy.

A: Add ice. It dilutes it while keeping it cold and smooth.



Enjoy!


Update: I made it again with a different batch of bananas, and it tasted different. It appears that the ripeness of the bananas makes a big difference. More ripe is more tasty. So, wait until the bananas are very ripe before freezing them.

Update 2: I sprinkled on a little cinnamon, and it tastes like a mix between a shake, banana bread, and even egg nog. It's delicious!

April 1, 2008

You Could Have Invented Monadic Parsing

Even though Pysec (and functional programming in python in general) turned out to be 2-3x slower, monadic parsing is still great for certain tasks. After I wrote about Pysec the first time, I received comments regarding pyparsing, which is a great parsing library that looks very similar. Today, I'd like to show you a very simple example where monadic parsing really shines above all other parsing techniques, including pyparsing. This is a real-world need that I had, not just an academic thought-experiment.

The situation is that I'm writing a peer-to-peer file synchronizer and I need to serialize binary data in an efficient way. I've choosen to use what I call "sized binary", which is basically netstrings without the comma or bencode's byte string. The byte-string "ABC", for example, would be encoded as "3:ABC", or "Hello World!" would be "12:Hello World!". This is very efficient because "ABC" or "Hello World!" can be any bytes \x00-\xFF, without any sort of encoding or decoding like uuencode.

Stop and think for a minute how you would define a grammer to parse "3:ABC". It's so simple, right? Well, try and write a grammar for it. Try using BNF, EBNF, yacc/bison, pyparsing, SimpleParse, or any parsing libarary you know of, in any programming language. If you manage, please write me an email and let me know how you did it, because I'm not so sure it's even possible. Maybe it is in the Perl6 grammars; they're throwing the kitchen sink into that thing :). The best I've seen is that the creator of pyparsing hacked a grammar object to change how it parses the "ABC" part after a different object parsed the "3" part. This basically uses the grammar object as a global variable. It works in a single-core world, but it won't work in the many-core world we're headed for.

But now you're getting bored with me because it's silly ringing my hands over such a simple format. I bet you could parse this format in just 4 lines of code:


def parse_sized_binary(stream):
size_bytes, stream = stream.readUntil(":")
size = int(size_bytes)
bytes, stream = stream.read(size)
return bytes, stream


You're right, it's really simple, at least until you have to embed this little "sized binary language" inside of a bigger language. Perhaps, like me, you need to embed binary data inside of a bigger data structure, and so you need to define a serialization for things like ints, floats, lists, and hash tables. For that stuff, you can define a grammer for a language like JSON or YAML and hand it over to a grammar-based parser library. But now you need a way to tell the library, "OK, when you see my binary data, hand control over to my four lines of code, and then I'll hand control back". You might even say "You know, why don't you just let me hand you any function of the form stream -> (value, stream) and let me control the parsing for a while".

Congratulations, you just invented monadic parsing. The function you wrote, parse_sized_binary, of the form stream -> (bytes, stream), is a Parser Monad. Monadic parsing is just combining lots of these little parsers together. And so, it's incredibly easy to insert arbitrary parsing code, like parse_sized_binary, into any parser.

As a complete example, using Pysec, here's a full parser for a language just like bencode but with support for floats and unicode. Notice that our parse_size_binary is renamed to bencode_sized and is given a more functional definition.


from Pysec import Stub, until, lift, read, many_until, branch

bencode_any = Stub()
bencode_sized = until(":") >> lift(int) >> read
bencode_unsized = until("e")
bencode_list = many_until(bencode_any, "e")
bencode_any.set(branch({"s" : bencode_sized,
"u" : bencode_sized >> lift(lambda bytes : bytes.decode("utf8")),
"i" : bencode_unsized >> lift(int),
"f" : bencode_unsized >> lift(float),
"l" : bencode_list,
"d" : bencode_list >> lift(dict)}))


I think this is sort of like Greenspun's Tenth Rule. Any implementation of netstrings/sized-binary-data probably has an ad-hoc implementation of monadic parsing in it. Even my own non-monadic version of this parser (which I use for performance reasons) has an ad-hoc implementation of monadic parsing. I'm convinced that if you write a parser for a similar format, you'll implement/invent monadic parsing as well, even if you don't know it.

March 19, 2008

Why are my monads so slow?

If you've read the last few posts, you know that I choose to use monadic parsing for some application code I am writing. At an abstract level, it turned out beautifully. I was able to write the parser very elgantly and compactly despite the serialization format having some unique requirements. But at a practical level, it failed pretty miserably. It was horribly slow. With a deadline looming, I had to abandon my monadic parser and go back to using an older serialization format. But not content to give up so easily, I decided to figure out why are my monads so slow? I hope that my investigation may be of help to you if you choose to use similar techniques in your code.

First, I tried running a profiler like cProfile. While such profilers normally work quite well, they don't work so well with lots of first-class functions. And since monadic code is full of first-class functions, the profiler didn't give much valuable information.

So, I had to do optimizing the "old fashioned" way. I wrote a benchmark and ran both of my serializers against it. To start, I had two data points: one in traditional imperitave code and one in fully monadic code, and this is what I got:


imperative: 0.465 seconds
monadic: 3.893 seconds


Monadic code was 8x slower! But why? To narrow it down, I added some more data points by implementing the serializer in gradually more monadic ways. For example, I wrote a serializer in a functional/immutable style that passes around a "stream" rather than a string, and then another that passed around "state" in a very monadic way, but withoug using the monad class that I used in Pysec. Then I got this:


imperative: 0.465 seconds
functional with stream: 1.018 seconds
almost monadic with state: 2.450 seconds
monadic: 3.893 seconds


This gave some answers. One answer it gave was that passing around an immutable "stream" object is twice as expensive as merely passing around a string. Also, passing around a state object on top of that is even more expensive. With these clues, I was able to optimize in certain ways and reduce it to this:


imperative: 0.465 seconds
functional with stream: 1.018 seconds
almost monadic with state: 1.098 seconds
monadic: 1.283 seconds


That's a nice 3x improvement. What things were I able to do? By far the biggest cost that I was able to eliminate was object creation. I flattened two layers of abstraction into one, and thus split the number of objects created in half. The second thing I did was I created a "read until" operation that could be used more efficiently in the "grammar" of the parser. This is a form of pushing performance-critical code down the layers of abstraction. Finally, I didn't used decorators, for some often-called functions.

In the end, it looks like monadic code is about 2-3x slower in Python. Almost all of that is actually the result of just doing things in a functional/immutable way. In particular, creating data structures appears to be the main bottleneck. In functional languagues, these types of operations are very common and optimized heavily, and so are typically very fast. It looks like it's not the same in Python. Just like you have to avoid recursion because there's no tail recursion, it looks like you have to avoid a functional/immutable coding style if you care about performance because object creation is so slow. On the other hand, if you don't mind the peformance hit, it makes the code much more elegant, just like recursion usually does.

One thing of interest is that there is essentially no performance penatly for using monads over using a functional/immutable code style. The 20% penatly seen between "almost monadic" and "monadic" is only because I'm wrapping the monad in a Parser class which allows nice operator overloading.

Here's a summary of what you can do to speed up any functional/immutable-style code, including monadic code when writing in Python:


  • Make object creation as fast as possible. Don't do anything fancy in __init__.

  • Use as few layers of abstraction as possible, especially when there is an object created in each layer.

  • Push common or expensive operations down the layers of abstaction, especially if it avoids creating objects.

  • Avoid using decorators for heavily used functions.

  • Don't use wrapper classes if you don't have to.



As a final thought, I'd like to mention that while there's currently a substantial performance penalty for using immutable data structures, that style is going to become increasingly important as we enter the many-core era. No matter what style of concurrency you like, immutable data is always easier to work with when concurrency is involved. Concurrency and mutable data are just a bad combo. I think that it's going to be very important for language designers to address this when working on the peformance of their languages. I certainly hope future versions Python are much faster with immutable data. If they are, then the peformance penatly of using Monads will almost disappear.

Blog Archive