
March 19, 2008

Why are my monads so slow?

If you've read the last few posts, you know that I chose to use monadic parsing for some application code I'm writing. At an abstract level, it turned out beautifully: I was able to write the parser very elegantly and compactly despite the serialization format having some unique requirements. But at a practical level, it failed pretty miserably. It was horribly slow. With a deadline looming, I had to abandon my monadic parser and go back to using an older serialization format. Not content to give up so easily, though, I decided to figure out: why are my monads so slow? I hope my investigation will be of help if you choose to use similar techniques in your own code.

First, I tried running a profiler like cProfile. Such profilers normally work quite well, but they don't cope well with lots of first-class functions. And since monadic code is full of first-class functions, the profiler didn't give me much useful information.
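
For reference, here's roughly how one might invoke the profiler; run_benchmark is just a stand-in name for the real serializer code, which isn't shown. The trouble is that nearly all the time shows up under a few generic names like bind or <lambda>, which says almost nothing about which part of the parser is actually slow:

    import cProfile
    import pstats

    def run_benchmark():
        pass  # stand-in: parse/serialize the test data here

    # Profile the run and dump the raw stats to a file.
    cProfile.run("run_benchmark()", "parser.prof")

    # Show the 20 most expensive calls by cumulative time.
    stats = pstats.Stats("parser.prof")
    stats.sort_stats("cumulative").print_stats(20)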

So, I had to do the optimizing the "old-fashioned" way: I wrote a benchmark and ran both of my serializers against it. To start, I had two data points, one for traditional imperative code and one for fully monadic code, and this is what I got:

imperative: 0.465 seconds
monadic:    3.893 seconds
Monadic code was 8x slower! But why? To narrow it down, I added more data points by implementing the serializer in gradually more monadic ways. For example, I wrote one serializer in a functional/immutable style that passes around a "stream" rather than a string, and another that passes around "state" in a very monadic way but without using the monad class from Pysec. Then I got this:

imperative:                0.465 seconds
functional with stream:    1.018 seconds
almost monadic with state: 2.450 seconds
monadic:                   3.893 seconds
This gave some answers. For one, passing around an immutable "stream" object is twice as expensive as merely passing around a string, and passing around a state object on top of that is more expensive still.
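
To make that concrete, here's a minimal sketch of the difference between the two styles (illustrative names, not the actual Pysec code). The key point is that the stream version allocates a brand-new object on every single read:

    # Imperative style: index into a string directly, no allocation.
    def read_char(text, index):
        return text[index], index + 1

    # Functional/immutable style: every read allocates a new Stream.
    class Stream(object):
        def __init__(self, text, index=0):
            self.text = text
            self.index = index
        def read_char(self):
            return self.text[self.index], Stream(self.text, self.index + 1)

With these clues, I was able to optimize in certain ways and reduce it to this: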

imperative:                0.465 seconds
functional with stream:    1.018 seconds
almost monadic with state: 1.098 seconds
monadic:                   1.283 seconds

That's a nice 3x improvement. What was I able to do? By far the biggest cost I was able to eliminate was object creation: I flattened two layers of abstraction into one, which cut the number of objects created in half. Second, I created a "read until" operation that the "grammar" of the parser could use more efficiently, a form of pushing performance-critical code down the layers of abstraction (see the sketch below). Finally, I stopped using decorators on some often-called functions.
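
Here's a rough sketch of the "read until" idea, reusing the illustrative Stream class from above (again, not the actual Pysec code). Instead of running the monadic machinery once per character, it finds the terminator with a single native string operation and allocates a single new stream:

    def read_until(stream, terminator):
        # One native find() call instead of one monadic step per character.
        end = stream.text.find(terminator, stream.index)
        if end == -1:
            raise ValueError("terminator not found")
        token = stream.text[stream.index:end]
        return token, Stream(stream.text, end + len(terminator))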

In the end, it looks like monadic code is about 2-3x slower in Python, and almost all of that is really the cost of doing things in a functional/immutable way. In particular, creating data structures appears to be the main bottleneck. In functional languages, these operations are very common and heavily optimized, so they're typically very fast. It's not the same in Python. Just as you have to avoid recursion because there's no tail-call optimization, it looks like you have to avoid a functional/immutable coding style if you care about performance, because object creation is so slow. On the other hand, if you don't mind the performance hit, it makes the code much more elegant, just as recursion usually does.

One thing of interest is that there's essentially no performance penalty for using monads over a plain functional/immutable code style. The 20% penalty seen between "almost monadic" and "monadic" exists only because I wrap the monad in a Parser class, which allows nice operator overloading.
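
For the curious, that overhead looks something like this sketch (illustrative, not the actual Pysec class): every time two parsers are combined, an extra Parser object gets allocated purely so that operators can be overloaded. Here ValueError stands in for whatever failure signal the parser uses:

    class Parser(object):
        def __init__(self, run):
            self.run = run  # run(state) -> (value, new_state)
        def __or__(self, other):
            # The choice combinator itself is just a function...
            def choice(state):
                try:
                    return self.run(state)
                except ValueError:
                    return other.run(state)
            # ...but we pay for one more wrapper object to get the | syntax.
            return Parser(choice)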

Here's a summary of what you can do to speed up any functional/immutable-style code in Python, including monadic code:

  • Make object creation as fast as possible. Don't do anything fancy in __init__ (see the sketch after this list).
  • Use as few layers of abstraction as possible, especially when an object is created in each layer.
  • Push common or expensive operations down the layers of abstraction, especially if doing so avoids creating objects.
  • Avoid using decorators for heavily used functions.
  • Don't use wrapper classes if you don't have to.
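
As an example of the first two tips, here's a sketch (hypothetical names) of a state object with a bare-bones __init__. Using __slots__ also makes each instance cheaper to create in CPython:

    class ParseState(object):
        __slots__ = ("text", "index")
        def __init__(self, text, index):
            # No validation, no derived fields: just store the arguments.
            self.text = text
            self.index = index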

As a final thought, I'd like to mention that while there's currently a substantial performance penalty for using immutable data structures, that style is going to become increasingly important as we enter the many-core era. No matter what style of concurrency you prefer, immutable data is always easier to work with when concurrency is involved; concurrency and mutable data are just a bad combination. I think it's going to be very important for language designers to address this when working on the performance of their languages. I certainly hope future versions of Python are much faster with immutable data. If they are, the performance penalty of using monads will almost disappear.

6 comments:

  1. I'd be curious to know if the JIT optimization performed by Psyco reduces the relative penalty for working with immutable objects.

    Seeing what PyPy does to this code would be even more interesting, but that project is so bleeding edge, I have no idea what other issues you would stumble on.

  2. One thing worth keeping in mind about Python is that function calls are expensive in time, more so than in most other languages. So to get the fastest Python code, write in an imperative style and rely heavily on builtins.

    Not good news for someone who wants to write fast functional code in Python. :D

  3. Very interesting. The D language, version 2.x, has true closures; I'd like to see how your code works there.

  4. Python doesn't do any optimization for them at all. Monads are incredibly flexible, but as any decent C.S. major knows, everything comes at a price. Monads give a high level of abstraction and can model a huge variety of computations. There's no reason to expect them to be on par with simpler, to-the-point styles of programming.

    Monads have their place in programming, but that place is Haskell and other pure languages.

  5. AFAIK, decorators only work at function creation time and do not penalize actual calls, except for an extra function call.
    Function calls are relatively expensive in Python, though.

  6. When I was optimizing pyparsing, I found the @profile decorator described on the Python wiki to be very helpful.

    The most primitive monads (such as pyparsing's Literal and Word) are those that benefit most from code optimization.

    Function calls are performance death in Python, but some operations are more expensive than function calls. When optimizing Literal, I found that replacing

    if instring[curLoc:curLoc+literalLength]==literalstring:

    with

    if instring.startswith(literalstring, curLoc):

    was quite a healthy win. I also apparently invented a caching system for exceptions that I haven't seen elsewhere in Python. Exceptions are used everywhere to signal when one of several parsing options doesn't match, which means that thousands of Exception instances are created and almost immediately discarded. Each monad ends up raising the same exception (with some minor, er, exceptions), so I create the Exception at monad creation time, and then just keep raising the cached Exception instead of creating a new one.
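
    A simplified sketch of the pattern (not the actual pyparsing code):

    class ParseException(Exception):
        pass

    class Literal(object):
        def __init__(self, literalstring):
            self.literalstring = literalstring
            # Created once, at monad creation time...
            self.exc = ParseException("Expected " + literalstring)
        def parse(self, instring, curLoc):
            if instring.startswith(self.literalstring, curLoc):
                return curLoc + len(self.literalstring)
            # ...and re-raised every time, with no new allocation.
            raise self.exc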

