I've learned. I'll share.

April 28, 2008

Events in Python

I'm a huge fan of the Actor Model . I think that for most applications, it's the best way to do concurrency. As CPUs get more cores, concurrency becomes more important for programmers, and I think the Actor Model will become more important, too.

But, sometimes you need something more light-weight for handling "events". For example, imagine you have some code listening for file changes in a particular directory. What you'd like to do is make an "event" that is "fired" whenever a file change is detected. When fired, there may be a "handler" or "listener" which is notified of the event. That "handler" is ultimately just some code which is executed when the event occurs.

A while ago, I wanted an event system like this for Python. I didn't see anything builtin or any library available, so I decided to write my own. I'd like to share what I created with you.

But first, I want to follow an example through other programming languages to give you an idea of what I was trying to accomplish and how it compares with what's out there. After that, I'll give you my implemenation in Python. Our example will be listening for changes on the file system. We want to keep the "watcher" code decoupled from the rest of the code, so we use events.

The best implementation of events that I've used is in C#, so we'll start there. In C# 1.0, our file watcher would looks something like this:

public delegate void FileChangeHandler(string source_path);

class FileWatcher
{
    public event FileChangeHandler FileChanged;

    public void WatchForChanges()
    {
       ...
       if(FileChanged != null)
       {
          FileChanged(source_path);
       }
       ...
    }
}

class FileChangeLogger
{
    public void OnFileChanged(string source_path)
    {
        Console.WriteLine(String.Format("{0} changed.", source_path));
    }
}

watcher = FileWatcher();
logger  = FileChangeLogger();
watcher.FileChanged += new FileChangeHandler(logger.OnFileChanged);
watcher.WatchForChanges();


It's pretty nice. The best part is at the end, where you can write "watcher.FileChange += ...". But, you need to type the completely useless "new FileChangeHandler", and you also need to wrap it all in a separate FileChangeLogger class. Luckily, in C# 2.0, they added Anonymous Delegates, which makes this much nicer:

watcher.FileChanged += delegate(string source_path)
{
    Console.WriteLine(String.Format("{0} changed.", source_path));
}
And in C# 3.0, they've made it even nicer!
watcher.FileChanged += (source_path => Console.WriteLine(String.Format("{0} changed.", source_path)));
C# 3.0 has an event system that's downright slick, with type-inferencing and everything. I don't think it gets much better than that.

Actually, there is one thing. If there's no handler registered with the event, calling "FileChanged()" will blow up because it sees FileChanged == null until a handler is registered. This means that you have to write "if(Event != null){Event(...):}" every single time you fire the event. Every single time. If you forget, your code will seem to work fine until there's no handler, in which case it will blow up and you'll smack your forhead because you forgot that you have to repeat that line of code every single time you fire an event. I really mean every single time. It's by far the worst wart on an otherwise elgant system. I have no idea why the designers of C# thought this was a good idea. When would you ever want to fire and event and have it blow up in your face?

Anyway, let's try a different programming language, perhaps Java. It has the worst implementation of events I've ever seen. Here's our FileWatcher example:

...

...

Ok, nevermind, I don't have the heart. I can imagine the code full of IListeners, and implementations, and keeping an array of them, and iterating over them, etc, and I just can't do it. I got seriously upset at one useless line in C#. In Java, it's at least 10 times worse. If you really have the stomach for it, go look at http://java.sun.com/docs/books/tutorial/uiswing/events/index.html. To me, it appears that in Java, events are one giant hack around the lack of first-class functions. If Java had first-class functions, none of that nonsense would be necessary.

Now that we've seen a good implementation of events in C# and avoided a bad one in Java, let's make one for Python. I'd rather it be like the C# event system, so let's see what our example would look like:

from Event import Event

class FileWatcher:
    def __init__(self):
        self.fileChanged = Event()

    def watchFiles(self):
        ...
        self.fileChanged(source_path)
        ...

def log_file_change(source_path):
    print "%r changed." % (source_path,)

watcher              = FileWatcher()
watcher.fileChanged += log_file_change

I think that looks pretty good. So what does the implementation of Event look like?


class Event(IEvent):
    def __init__(self):
        self.handlers = set()

    def handle(self, handler):
        self.handler.add(handler)
        return self

    def unhandle(self, handler):
        try:
            self.handlers.remove(handler)
        except:
            raise ValueError("Handler is not handling this event, so cannot unhandle it.")
        return self

    def fire(self, *args, **kargs):
        for handler in self.handlers:
            handler(*args, **kargs)

    def getHandlerCount(self):
        return len(self.handlers)

    __iadd__ = handle
    __isub__ = unhandle
    __call__ = fire
    __len__  = getHandlerCount
Wow. That was pretty short. Actually, this is one of the reasons I love Python. If the language doesn't have a feature, we can probably add it. We just added one of C#'s best features to Python in 26 lines of code. More importantly, we now have a nice, light-weight, easy-to-use event system for Python.

For all of you how like full examples that you can cut and paste, here is one that you can run. Enjoy!


class Event:
    def __init__(self):
        self.handlers = set()

    def handle(self, handler):
        self.handlers.add(handler)
        return self

    def unhandle(self, handler):
        try:
            self.handlers.remove(handler)
        except:
            raise ValueError("Handler is not handling this event, so cannot unhandle it.")
        return self

    def fire(self, *args, **kargs):
        for handler in self.handlers:
            handler(*args, **kargs)

    def getHandlerCount(self):
        return len(self.handlers)

    __iadd__ = handle
    __isub__ = unhandle
    __call__ = fire
    __len__  = getHandlerCount

class MockFileWatcher:
    def __init__(self):
        self.fileChanged = Event()

    def watchFiles(self):
        source_path = "foo"
        self.fileChanged(source_path)

def log_file_change(source_path):
    print "%r changed." % (source_path,)

def log_file_change2(source_path):
    print "%r changed!" % (source_path,)

watcher              = MockFileWatcher()
watcher.fileChanged += log_file_change2
watcher.fileChanged += log_file_change
watcher.fileChanged -= log_file_change2
watcher.watchFiles()

April 1, 2008

You Could Have Invented Monadic Parsing

Even though Pysec (and functional programming in python in general) turned out to be 2-3x slower, monadic parsing is still great for certain tasks. After I wrote about Pysec the first time, I received comments regarding pyparsing, which is a great parsing library that looks very similar. Today, I'd like to show you a very simple example where monadic parsing really shines above all other parsing techniques, including pyparsing. This is a real-world need that I had, not just an academic thought-experiment.

The situation is that I'm writing a peer-to-peer file synchronizer and I need to serialize binary data in an efficient way. I've choosen to use what I call "sized binary", which is basically netstrings without the comma or bencode's byte string. The byte-string "ABC", for example, would be encoded as "3:ABC", or "Hello World!" would be "12:Hello World!". This is very efficient because "ABC" or "Hello World!" can be any bytes \x00-\xFF, without any sort of encoding or decoding like uuencode.

Stop and think for a minute how you would define a grammer to parse "3:ABC". It's so simple, right? Well, try and write a grammar for it. Try using BNF, EBNF, yacc/bison, pyparsing, SimpleParse, or any parsing libarary you know of, in any programming language. If you manage, please write me an email and let me know how you did it, because I'm not so sure it's even possible. Maybe it is in the Perl6 grammars; they're throwing the kitchen sink into that thing :). The best I've seen is that the creator of pyparsing hacked a grammar object to change how it parses the "ABC" part after a different object parsed the "3" part. This basically uses the grammar object as a global variable. It works in a single-core world, but it won't work in the many-core world we're headed for.

But now you're getting bored with me because it's silly ringing my hands over such a simple format. I bet you could parse this format in just 4 lines of code:

def parse_sized_binary(stream):
    size_bytes, stream = stream.readUntil(":")
    size               = int(size_bytes)
    bytes, stream      = stream.read(size)
    return bytes, stream
You're right, it's really simple, at least until you have to embed this little "sized binary language" inside of a bigger language. Perhaps, like me, you need to embed binary data inside of a bigger data structure, and so you need to define a serialization for things like ints, floats, lists, and hash tables. For that stuff, you can define a grammer for a language like JSON or YAML and hand it over to a grammar-based parser library. But now you need a way to tell the library, "OK, when you see my binary data, hand control over to my four lines of code, and then I'll hand control back". You might even say "You know, why don't you just let me hand you any function of the form stream -> (value, stream) and let me control the parsing for a while".

Congratulations, you just invented monadic parsing. The function you wrote, parse_sized_binary, of the form stream -> (bytes, stream), is a Parser Monad. Monadic parsing is just combining lots of these little parsers together. And so, it's incredibly easy to insert arbitrary parsing code, like parse_sized_binary, into any parser.

As a complete example, using Pysec, here's a full parser for a language just like bencode but with support for floats and unicode. Notice that our parse_size_binary is renamed to bencode_sized and is given a more functional definition.

from Pysec import Stub, until, lift, read, many_until, branch

bencode_any     = Stub()
bencode_sized   = until(":") >> lift(int) >> read
bencode_unsized = until("e")
bencode_list    = many_until(bencode_any, "e")
bencode_any.set(branch({"s" : bencode_sized,
                        "u" : bencode_sized >> lift(lambda bytes : bytes.decode("utf8")),
                        "i" : bencode_unsized >> lift(int),
                        "f" : bencode_unsized >> lift(float),
                        "l" : bencode_list,
                        "d" : bencode_list >> lift(dict)}))
I think this is sort of like Greenspun's Tenth Rule. Any implementation of netstrings/sized-binary-data probably has an ad-hoc implementation of monadic parsing in it. Even my own non-monadic version of this parser (which I use for performance reasons) has an ad-hoc implementation of monadic parsing. I'm convinced that if you write a parser for a similar format, you'll implement/invent monadic parsing as well, even if you don't know it.

Blog Archive

Google Analytics