Valued Lessons

I've learned. I'll share.

October 22, 2009

Introducing pyrec: The cure to the bane of __init__

I finally discovered how cool github is, and have started putting some code up there. My first entry is Record.py. I'm calling it the cure to the bane of __init__" because

  1. Mutable data structures are a bane to concurrent (multi-threaded) code.
  2. Writing self.foo = foo, self.bar = bar, etc, is a huge waste of time.
  3. When you have lots of data structures in memory, tuple-based data uses 1/4 the memory of class-based data.
As you can probably tell from my blog, I've experimented with a lot of wacky programming ideas and bending Python in ways it was never intended. But if there's one experiment that's been a success, it's Record.py. I use it for almost all of my classes. It's just so easy to use. I'm calling it "pyrec" because it's easier to write, google, etc.

So, go to the github repo or use it by following the really easy steps:

  1. Download Record.py from http://github.com/pthatcher/pyrec/blob/master/Record.py
  2. put from Record import Record at the top of your code.
  3. make a class by saying something like class Person(Record("name", "age"))
  4. Never write __init__ again (unless you want mutability).
Enjoy!

Update: A commenter (thanks Dan!) pointed out that this is a lot like namedtuple, added in python 2.6. He asked why use this instead of namedtuple. Well, I have to admit that I probably would have never created pyrec if namedtuple existed 3 years ago. I try to avoid NIH syndrome. But it didn't exist, so I wrote pyrec. But I've been using pyrec for three years, so I have some experience on some little things that make a big difference (to me). Here are a few advantages pyrec has over namedtuple:

  1. It has a nicer interface. I prefer new(val1, val2) to _make([val1, val2]), alter to _update, and class Person(Record("name", "age")) to Person = namedtuple("Person", "name, age")
  2. I added the setField methods. That's what I use 90% of the time. Only about 10% of the time do I use alter. setField is a lot more convenient.
  3. With pyrec, you can safely override __iter__ and __getitem__. For example, in Record.py, you'll see the implementation of a LinkedList. I tried doing that with namedtuple, but the overidden __getitem__ clobers the name lookup and __iter__ the tuple unpacking.
  4. You can use tuple.__iter__(rec) to get around the latter, but pyrec's .values is a lot nicer.
  5. pyrec has .namedValues for ordered (field, value) pairs, unlike _asdict() which throws out the order. For many things I use pyrec for, this matters.
  6. You can improve it! Have looked at the code for namedtuple? Ugly. This is pretty clean, so you can improve it very easily if you need additional functionality which will work with all of your records.
If you don't care about those things, use namedtuple. It's still way better than mutable classes. But having used pyrec for three years, these little things matter to me, and so I'm still going to use pyrec. But if you want most of both worlds, I added NamedTuple to pyrec, which is a subclass of namedtuple which adds most of the pyrec goodness (everything but safe __getitem__ overloading). Thanks for update, Dan.

August 26, 2009

SQL is now turing complete!

David Fetter at Oscon 2009 just made a stunning claim: With CTE and Windowing, SQL is Turing Complete. He even offers a proof and an amazing example (although I'm not sure if the example requires turing completeness, it's still amazing).

This could be a big deal. Why? It means that you can do anything in an SQL query. While drawing the mandlebrot set is a nice demo, the first "killer app" will be tree traversal. Lots of relational tables end up looking like trees and are a royal pain to deal with in normal SQL. After that, only our imagination (and performance) is the limit. You can theoretically create any data view that runs in the DB all in one shot, without any of the troubles of stored procedures.

I see a similarity between turing-complete SQL in the DB and JavaScript on the browser. For years, we used JavaScript to do silly stuff, but then a few bright minds created amazing tools like gmail and google maps and our view of JavaScript was forever changed. Now JavaScript is everywhere and there's a race to make the best development framework and engines fastest engines.

Might the same thing happen with turning-complete SQL? is the race on to be the first programming language to compile down to turing-complete SQL?.

Honestly, if I were a betting man, I'd say it won't come to anything significant. But I'll also guess that the lure of compiling down to SQL will eventually capture someone, somewhere. It's only a matter of time. In fact, if you're a programming language geek, the sirens are probably calling to you right now :). When it does happen, I expect the first language to be a functional one, especially one with a small core, like a variant of lisp. Once we get clojure in clojure, maybe it will be a candidate. Or maybe C# will end up with LINQ-to-turning-complete-SQL (SQLINQ?).

What do you think?

August 14, 2009

The Code for Reactive Programming in Python

I packaged some python code that you can run for each of the steps I've shown in my articles about Reactive Programming, how you could have invented it and more ways to do it. I apologize that it's not in a more usable form. Go ahead and copy and paste to wherever you like. If you put it online somewhere more convenient, just put a link in the comments. I put the control of which example runs at the very end. Just comment the appropriate line for the example you want to run. And, here it is:

import time

## simplified event stuff 
class Event:
    def __init__(self):
        self.handlers = []

    def handle(self, handler):
        self.handlers.append(handler)
        return self #so += will work

    def fire(self, val = None):
        for handler in self.handlers:
            handler(val)

def echo(val):
    print val
    return val

def simple_click_event_example():
    click = Event()
    click.handle(echo)
    click.fire("left") #prints "left"

def click_event_manipulation_example():
    def left_double_click_from_click(click, threshold):
        dlclick = Event()
        last_lclick_time = [0] #closure hack

        def click_handler(click_value):
            if click_value == "left":
                lclick_time = time.time()
                if (lclick_time - last_lclick_time[0]) < threshold:
                    dlclick.fire("double left")
                last_lclick_time[0] = lclick_time
        
        click.handle(click_handler)
        return dlclick


    click   = Event()
    dlclick = left_double_click_from_click(click, .01)
    dlclick.handle(echo)

    click.fire("left")
    time.sleep(.02)
    click.fire("left")
    click.fire("right")
    click.fire("left") #prints "double left"
    
class EventFireRecord:
    def __init__(self, value, time):
        self.value = value
        self.time  = time

def click_event_maniuplation_refactored_example():
    def doubleize_event(evt, threshold, combine):
        double_evt = Event()

        last_fire  = EventFireRecord(None, 0)
        def evt_handler(value):
            fire_time = time.time()
            if (fire_time - last_fire.time) < threshold:
                double_evt.fire(combine(last_fire.value, value))
            last_fire.__init__(value, fire_time)
        
        evt.handle(evt_handler)
        return double_evt

    def filter_event(evt, predicate):
        filtered_evt = Event()

        def evt_handler(value):
            if predicate(value):
                filtered_evt.fire(value)

        evt.handle(evt_handler)
        return filtered_evt

    click  = Event()
    lclick = filter_event(click, lambda value : value == "left")
    dlclick = doubleize_event(lclick, .01, lambda click1, click2 : "double left")
    dlclick.handle(echo)

    click.fire("left")
    time.sleep(.02)
    click.fire("left")
    click.fire("right")
    click.fire("left") #prints "double click"


def click_event_handle_with_result_example():
    def handle_with_result(evt, handler_with_result):
        evt_out = Event()

        def handler(value):
            result = handler_with_result(value)
            if result is not None:
                evt_out.fire(result)

        evt.handle(handler)
        return evt_out

    def doubleize_r(evt, threshold):
        last_fire = EventFireRecord(None, 0)

        def handler(value):
            fire_time = time.time()
            try:
                if (fire_time - last_fire.time) < threshold:
                    return (last_fire.value, value)
            finally:
                last_fire.__init__(value, fire_time)

        return handle_with_result(evt, handler)

    def filter_r(evt, predicate):
        def handler(value):
            if predicate(value):
                return value

        return handle_with_result(evt, handler)


    clicks   = Event()
    dlclicks = doubleize_r(filter_r(click, lambda value : value == "left"), .01)
    dlclicks.handle(echo)

    clicks.fire("left")
    time.sleep(.02)
    clicks.fire("left")
    clicks.fire("right")
    clicks.fire("left") #prints ("left", "left")


def click_event_choosing_by_returning_event_example():
    def handle_with_result(evt, handler_with_result):
        evt_out = Event()

        def handler(value):
            result = handler_with_result(value)
            if result is None:
                pass #ignore
            elif isinstance(result, Event):
                result.handle(evt_out.fire)
            elif isinstance(result, list):
                for value_out in result:
                    evt_out.fire(value_out)
            else:
                evt_out.fire(result)
                    

        evt.handle(handler)
        return evt_out

    def filter_r(evt, predicate):
        def handler(value):
            if predicate(value):
                return value

        return handle_with_result(evt, handler)

    def value_filter_r(evt, value):
        return filter_r(evt, lambda val : val == value)

    def click_choose_r(keys, clicks):
        def key_handler(char):
            #TODO: unsubscribe from event after either "l" or "r"
            if char == "l":
                return value_filter_r(clicks, "left")
            elif char == "r":
                return value_filter_r(clicks, "right")
            elif char == "f":
                return ["fake", "fake"]

        return handle_with_result(keys, key_handler)

    keys   = Event()
    clicks = Event()
    choosen_clicks = click_choose_r(keys, clicks)

def click_event_looks_like_streams_example():
    class Event:
        def __init__(self):
            self.handlers = []

        def handle(self, handler):
            self.handlers.append(handler)
            return self #so += will work

        def fire(self, val = None):
            for handler in self.handlers:
                handler(val)

        def bind(evt, handler_with_result):
            evt_out = Event()

            def handler(value):
                result = handler_with_result(value)
                if result is not None:
                    Event.unit(result).handle(evt_out.fire)

            evt.handle(handler)
            return evt_out

        @classmethod
        def unit(cls, val):
            if isinstance(val, cls):
                return val
            elif isinstance(val, list):
                return MockEvent(*val)
            else:
                return MockEvent(val)

        __rshift__ = bind

    class MockEvent:
        def __init__(self, *vals):
            self.vals = vals

        def handle(self, handler):
            for val in self.vals:
                handler(val)  

    def doublize_r(threshold, combine):
        last_fire = EventFireRecord(None, 0)

        def handler(value):
            fire_time = time.time()
            try:
                if (fire_time - last_fire.time) < threshold:
                    return combine(last_fire.value, value)
            finally:
                last_fire.__init__(value, fire_time)

        return handler

    def filter_r(predicate):
        def handler(value):
            if predicate(value):
                return value

        return handler

    def value_filter_r(value):
        return filter_r(lambda val : val == value)

    def click_choose_r(**clicks_by_char):
        def key_handler(char):
            return clicks_by_char.get(char)

        return key_handler


    clicks   = Event()
    keys     = Event()
    dlclicks = clicks >> value_filter_r("left") >> doublize_r(.01, lambda l1, l2: "double left")
    keys >> click_choose_r(d = dlclicks, f = ["fake", "fake"]) >> echo

    clicks.fire("left")
    clicks.fire("left")
    keys.fire("f") #prints "fake" and then "fake" again
    keys.fire("d")
    clicks.fire("right")
    clicks.fire("right")
    time.sleep(.02)
    clicks.fire("left")
    clicks.fire("left") #print ("double left")


## basic consumer (receiver) using generators

receive = object()

def receiver_example():
    def receiver(gen_rcvr):
        def gen_and_start_rcvr(*args, **kargs):
            rcvr = gen_rcvr(*args, **kargs)
            rcvr.send(None)
            return rcvr
        return gen_and_start_rcvr

    @receiver
    def sum_r(title):
        total = 0
        while True:
            total += yield receive
            print "%s: %d" % (title, total)

    @receiver
    def count_r(title):
        count = 0
        while True:
            yield receive
            count += 1
            print "%s: %d" % (title, count)

    num_key      = Event()
    sum_nums     = sum_r("total nums")
    num_key.handle(sum_nums.send)

    num_key.fire(1) #prints "total nums: 1"
    num_key.fire(2) #prints "total nums: 3" 
    num_key.fire(3) #prints "total nums: 6"

## make retiterators that can also output values via an event fire
def remitter_example():
    class Remitter:
        def __init__(self, receiver_from_event_out):
            self.receiverFromEventOut = receiver_from_event_out

        def __rrshift__(self, event_in):
            event_out = Event()
            rcvr      = self.receiverFromEventOut(event_out)
            event_in.handle(rcvr.send)
            return event_out

    def remitter(gen_rcvr):
        def gen_remitter(*args, **kargs):
            def receiver_from_event_out(event_out):
                rcvr = gen_rcvr(event_out, *args, **kargs)
                rcvr.send(None)
                return rcvr
            return Remitter(receiver_from_event_out)
        return gen_remitter

    @remitter
    def double_detect_r(double_click_event, threshold):
        last_click_time = 0
        while True:
            yield receive
            current_click_time = time.time()
            if (current_click_time - last_click_time) < threshold:
                double_click_event.fire()
            last_click_time = current_click_time

    @remitter
    def print_r(_, message):
        while True:
           val = yield receive
           print message

    mouse_click  = Event()

    mouse_click >> print_r("left")
    mouse_click >> double_detect_r(.01) >> print_r("double left")

    mouse_click.fire() #prints "left"
    time.sleep(.02)
    mouse_click.fire() #prints "left"
    mouse_click.fire() #prints "left" and "double left"

## make retiterators out of generators that can send and receive

def remitter_example_yield_out():
    from collections import defaultdict

    class Remitter:
      def __init__(self, ritr):
          self.ritr     = ritr
          self.eventOut = Event()

      def send(self, val_in):
          ritr      = self.ritr
          event_out = self.eventOut

          while True:
              val_out = ritr.send(val_in)
              if val_out is receive:
                  break
              else:
                  event_out.fire(val_out)            

      def handle(self, handler):
          self.eventOut.handle(handler)

      def handlein(self, *events):
          for event in events:
              event.handle(self.send)

      def __rrshift__(self, event_in):
          try:
              self.handlein(*event_in)
          except:
              self.handlein(event_in)
          return self


    def remitter(gen_rcvr):
        def gen_remitter(*args, **kargs):
            ritr = gen_rcvr(*args, **kargs)
            ritr.send(None)
            return Remitter(ritr)
        return gen_remitter


    @remitter
    def double_detect_r(threshold):
        last_click_time = 0
        while True:
            yield receive
            current_click_time = time.time()
            if (current_click_time - last_click_time) < threshold:
                yield
            last_click_time = current_click_time

    @remitter
    def map_r(f, *args, **kargs):
        while True:
           val = yield receive
           yield f(val, *args, **kargs)

    @remitter
    def print_r(format):
        while True:
           val = yield receive
           print message % val

    def label_r(label):
        return map_r(lambda val : (label, val))

    @remitter
    def label_count_r():
        count_by_label = defaultdict(int)
        while True:
            (label, val) = yield receive
            count_by_label[label] += 1
            yield count_by_label.copy()

    def fix_click_counts(count_by_label, single_label, double_label):
        count_by_label[single_label] -= (count_by_label[double_label] * 2) #every double left "cancels" a single click
        return count_by_label.copy()

    def print_label_counts(count_by_label, *labels):
        print ", ".join("%d %s" % (count, label) for (label, count) in count_by_label.iteritems())


    mouse_clicks = Event()

    ([mouse_clicks >> label_r("single"),
      mouse_clicks >> double_detect_r(.01) >> label_r("double")] 
     >> label_count_r() >> map_r(fix_click_counts, "single", "double") >> map_r(print_label_counts))

    #prints
    #0 double, 1 single
    #0 double, 2 single
    #0 double, 3 single
    #1 double, 1 single
    mouse_clicks.fire() 
    time.sleep(.02)
    mouse_clicks.fire() 
    mouse_clicks.fire()


def remitter_without_yield_in_hack_example():
    class Receive:
        def __init__(self, val = None):
            self.d = val

    class Remitter:
      def __init__(self, receive, ritr):
          self.receive  = receive
          self.ritr     = ritr
          self.eventOut = Event()

      def send(self, val_in):
          self.receive.d = val_in

          ritr      = self.ritr
          event_out = self.eventOut
          while True:
              val_out = ritr.next()
              if isinstance(val_out, Receive):
                  break
              else:
                  event_out.fire(val_out)

      def handle(self, handler):
          self.eventOut.handle(handler)

      def handlein(self, *events):
          for event in events:
              event.handle(self.send)

      def __rrshift__(self, event_in):
          try:
              self.handlein(*event_in)
          except:
              self.handlein(event_in)
          return self

    def remitter(gen_rcvr):
        def gen_remitter(*args, **kargs):
            receive = Receive()
            ritr = gen_rcvr(receive, *args, **kargs)
            ritr.send(None)
            return Remitter(receive, ritr)
        return gen_remitter

    @remitter
    def double_detect_r(receive, threshold):
        last_click_time = 0
        while True:
            yield receive
            current_click_time = time.time()
            gap                = current_click_time - last_click_time
            if gap < threshold:
                yield gap
            last_click_time = current_click_time

    @remitter
    def average_r(receive):
        total   = 0.0
        count   = 0
        while True:
            yield receive
            total += receive.d
            count += 1
            yield total/count

    @remitter
    def print_r(receive, format):
        while True:
            yield receive
            print format % (receive.d)

    mouse_clicks = Event()
    mouse_clicks >> double_detect_r(.05) >> average_r() >> print_r("double left; average gap is %s seconds")

    mouse_clicks.fire() 
    time.sleep(.1)
    mouse_clicks.fire() 
    time.sleep(.01)
    mouse_clicks.fire() #prints #double left; average gap is 0.01... seconds
    time.sleep(.02) 
    mouse_clicks.fire() #double left; average gap is 0.015... seconds

if __name__ == "__main__":
    #simple_click_event_example()
    #click_event_manipulation_example()
    #click_event_maniuplation_refactored_example()
    #click_event_handle_with_result_example()
    #click_event_choosing_by_returning_event_example()
    #click_event_looks_like_streams_example()
    #remitter_example()
    #remitter_example_yield_out()
    remitter_without_yield_in_hack_example()

August 12, 2009

More ways to do Reactive Programming in Python

In my last post, I covered a little bit of Rx and how you could a have invented it. But you might invent a different way of doing the same thing. And since most languages don't have anything like LINQ, you might be interested in ways to do things in your programming language that don't require monads.

Let's explore some other ways to do Reactive Programming (Rx).

What is Rx?

Just to remind you what we're trying to accomplish, Rx builds event handlers. The LINQ version of Rx works by making an event look like a query (or stream).

It makes sense if you think about it. An event is a stream, of event occurences. A list or enumerator or iterator is also a stream, of values. So if you squint hard enough, you see that events and enumerators are sort of the same thing. In fact, lists, streams, enumerators, events, channels, pipes, sockets, file handles, actors sending you messages are all pretty much the same thing: they are all producers of values.

Consumers and Receivers

Now what do you do with producers of values? You consume them, of course! Usually with something that looks like this (in python):
sum = 0
for val in vals:
   sum += val
   print sum
What we've created here is a consumer of vals. We can write it this way, as ordinary code, because vals is very flexible: it's anything that's iterable/enumerable. But what if instead of forcing the producer to be flexible, we forced the consumer to be flexible? What if we could write the consumer like this:
total = 0
while True:
  total += receive
  print total
Hmm... it sort of looks like the opposite of an iterator/generator/enumerator block. A mathematician might say something about "duals" at this point, but I'm not mathematician, so let's just go ahead and try and implement it. In fact, we'll use python generators to do just that. We'll call this a "receiver" and we'll spell "receive" as "yield receive":
class Event:
    def __init__(self):
        self.handlers = []

    def handle(self, handler):
        self.handlers.append(handler)

    def fire(self, val = None):
        for handler in self.handlers:
            handler(val)

receive = object()

def receiver(gen_rcvr):
    def gen_and_start_rcvr(*args, **kargs):
        rcvr = gen_rcvr(*args, **kargs)
        rcvr.send(None)
        return rcvr
    return gen_and_start_rcvr

@receiver
def sum_r(title):
    total = 0
    while True:
        total += yield receive
        print "%s: %d" % (title, total)

@receiver
def count_r(title):
    count = 0
    while True:
        yield receive
        count += 1
        print "%s: %d" % (title, count)


num_key  = Event()
sum_nums = sum_r("total nums")
num_key.handle(sum_nums.send)

num_key.fire(1) #prints "total nums: 1"
num_key.fire(2) #prints "total nums: 3" 
num_key.fire(3) #prints "total nums: 6"
It actually works! And because our consumer is very flexible, any producer, like an event, can use it. In fact, it's just a fancy event callback, just like everyrthing else in Rx land.

Remitters

But if we take this one step further and make a receiver wrap an event, we can make a receiver that's also a producer. We'll call it a "remitter", which is sort of like a receiver and an emitter.
class Remitter:
    def __init__(self, receiver_from_event_out):
        self.receiverFromEventOut = receiver_from_event_out

    def __rrshift__(self, event_in):
        event_out = Event()
        rcvr      = self.receiverFromEventOut(event_out)
        event_in.handle(rcvr.send)
        return event_out

def remitter(gen_rcvr):
    def gen_remitter(*args, **kargs):
        def receiver_from_event_out(event_out):
            rcvr = gen_rcvr(event_out, *args, **kargs)
            rcvr.send(None)
            return rcvr
        return Remitter(receiver_from_event_out)
    return gen_remitter

@remitter
def double_detect_r(double_click_event, threshold):
    last_click_time = 0
    while True:
        (yield)
        current_click_time = time.time()
        if (current_click_time - last_click_time) < threshold:
            double_click_event.fire()
        last_click_time = current_click_time

@remitter
def print_r(_, message):
    while True:
       val = (yield)
       print message

mouse_click  = Event()

mouse_click >> print_r("left")
mouse_click >> double_detect_r(.01) >> print_r("double left")

mouse_click.fire() #prints "left"
time.sleep(.02)
mouse_click.fire() #prints "left"
mouse_click.fire() #prints "left" and "double left"
Great. But it is kind of annoying passing in the event like that. What if we had the remitter yield values out and yield values in?

Remitters that yield out and in

We could do that using little state machines built from python generators. "yield receive" will mean receive and "yield" of anything else will mean "emit".
from collections import defaultdict

class Remitter:
  def __init__(self, ritr):
      self.ritr     = ritr
      self.eventOut = Event()

  def send(self, val_in):
      ritr      = self.ritr
      event_out = self.eventOut

      while True:
          val_out = ritr.send(val_in)
          if val_out is receive:
              break
          else:
              event_out.fire(val_out)            

  def handle(self, handler):
      self.eventOut.handle(handler)

  def handlein(self, *events):
      for event in events:
          event.handle(self.send)

  def __rrshift__(self, event_in):
      try:
          self.handlein(*event_in)
      except:
          self.handlein(event_in)
      return self


def remitter(gen_rcvr):
    def gen_remitter(*args, **kargs):
        ritr = gen_rcvr(*args, **kargs)
        ritr.send(None)
        return Remitter(ritr)
    return gen_remitter


@remitter
def double_detect_r(threshold):
    last_click_time = 0
    while True:
        yield receive
        current_click_time = time.time()
        if (current_click_time - last_click_time) < threshold:
            yield
        last_click_time = current_click_time

@remitter
def map_r(f, *args, **kargs):
    while True:
       val = yield receive
       yield f(val, *args, **kargs)

@remitter
def print_r(format):
    while True:
       val = yield receive
       print message % val

def label_r(label):
    return map_r(lambda val : (label, val))
        
@remitter
def label_count_r():
    count_by_label = defaultdict(int)
    while True:
        (label, val) = yield receive
        count_by_label[label] += 1
        yield count_by_label.copy()

def fix_click_counts(count_by_label, single_label, double_label):
    count_by_label[single_label] -= (count_by_label[double_label] * 2) #every double click "cancels" a single click
    return count_by_label.copy()

def print_label_counts(count_by_label, *labels):
    print ", ".join("%d %s" % (count, label) for (label, count) in count_by_label.iteritems())


mouse_clicks = Event()

([mouse_clicks >> label_r("single"),
  mouse_clicks >> double_detect_r(.01) >> label_r("double")] 
 >> label_count_r() >> map_r(fix_click_counts, "single", "double") >> map_r(print_label_counts))
 
#prints
#0 double, 1 single
#0 double, 2 single
#0 double, 3 single
#1 double, 1 single
mouse_clicks.fire() 
time.sleep(.02)
mouse_clicks.fire() 
mouse_clicks.fire()
Sweet. That looks pretty nice. But, it relies on the fact that Python allows you to yield values in to a generator. What if we have a programming language that only allows yielding values out (like any enumerator)?

Remitters that yield in by yielding out

We'll introduce a simple hack to work around that. We'll yield out a mutable "receive" value that "receives" in the value for us.
class Receive:
    def __init__(self, val = None):
        self.d = val

class Remitter:
  def __init__(self, receive, ritr):
      self.receive  = receive
      self.ritr     = ritr
      self.eventOut = Event()

  def send(self, val_in):
      self.receive.d = val_in

      ritr      = self.ritr
      event_out = self.eventOut
      while True:
          val_out = ritr.next()
          if isinstance(val_out, Receive):
              break
          else:
              event_out.fire(val_out)

  def handle(self, handler):
      self.eventOut.handle(handler)

  def handlein(self, *events):
      for event in events:
          event.handle(self.send)

  def __rrshift__(self, event_in):
      try:
          self.handlein(*event_in)
      except:
          self.handlein(event_in)
      return self

def remitter(gen_rcvr):
    def gen_remitter(*args, **kargs):
        receive = Receive()
        ritr = gen_rcvr(receive, *args, **kargs)
        ritr.send(None)
        return Remitter(receive, ritr)
    return gen_remitter

@remitter
def double_detect_r(receive, threshold):
    last_click_time = 0
    while True:
        yield receive
        current_click_time = time.time()
        gap                = current_click_time - last_click_time
        if gap < threshold:
            yield gap
        last_click_time = current_click_time

@remitter
def average_r(receive):
    total   = 0.0
    count   = 0
    while True:
        yield receive
        total += receive.d
        count += 1
        yield total/count

@remitter
def print_r(receive, format):
    while True:
        yield receive
        print format % (receive.d)
    
mouse_clicks = Event()
mouse_clicks >> double_detect_r(.05) >> average_r() >> print_r("double click; average gap is %s seconds")
        
mouse_clicks.fire() 
time.sleep(.1)
mouse_clicks.fire() 
time.sleep(.01)
mouse_clicks.fire() #prints #double click; average gap is 0.01... seconds
time.sleep(.02) 
mouse_clicks.fire() #double click; average gap is 0.015... seconds
It works! And it should work in any language with iterator blocks. You could even use this C# instead of using LINQ Rx, but then you'll have to type "yield return receive" :(.

Conclusion

Rx is all about making flexible consumers of values, which basically amounts to making event callbacks. LINQ Rx does so with monads, but that's not the only way. Here, we have shown how we can turn a generator or iterator block inside out and make it consume values rather than produce values. Using these is an alternative to LINQ Rx that might be more appropriate for your programming language. There are lots of other things to work out, like unhandling an event, error handling, catching the end of a stream, etc. But this is pretty good, simple foundation to show that the essense of reactive programming is making it easy to make flexible value consumers (basically event handlers). In both the case of Rx, and the code above, we've done so by making a little DSL in the host language.

Next time...

There are still other ways of making flexible consumers. If we had continuations or used CPS we could just use the current continuation as our consumer. So, scheme and Ruby ought to have easy solutions to this problem. We can do a similar thing with macros in any Lisp language that doesn't have continuations, like Clojure. In fact, I'd like to explore how to do Rx in clojure next time. And at some point, we need to figure out how concurrency fits into all of this.

P.S.

While I was researching all of this stuff, I was surprised to find that my friend, Wes Dyer, is right at the heart of it. You can see a video of him here. That was a surprise because I've never talked with him about this. In fact, I've only heard from him once in the last year because he's "gone dark" . I'm sure his work on Rx has something to do with that :). I just want to make it clear that all of my thoughts are mine alone, and not based on any communication with him. Don't blame him for my crazy stuff :).

Rx Simplified (Reactive Programming in Python)

Lately, there's been interest in "reactive programming", especially with Rx. What is Rx? I've seen descriptions like "reactive programming with LINQ", "the dual of the enumerator/iterator" and even "a variation of the continuation monad". Oh right...uh...monad? dual? what's going on?

If you like things like "monads", "duals", and category theory, go watch this video, especially until the end. It's pretty funny.

But if those things make your eyes glaze over and you're left wondering what Rx really is, I want to give you a simple explanation of what Rx is all about. In fact, I'll show how you could have invented it yourself. We'll do so with simple event-based code written in Python.

Step 1: write simple event handlers

Imagine we have a mouse click event that fires either "left" or "right", and we want to make a new event that fires "double left" when there's a double left click. We might write something like this (including a simple Event class).

import time

class Event:
    def __init__(self):
        self.handlers = []

    def handle(self, handler):
        self.handlers.append(handler)

    def fire(self, val = None):
        for handler in self.handlers:
            handler(val)

def echo(val):
    print val
    return val

def left_double_click_from_click(click, threshold):
    dlclick = Event()
    last_lclick_time = [0] #closure hack

    def click_handler(click_value):
        if click_value == "left":
            lclick_time = time.time()
            if (lclick_time - last_lclick_time[0]) < threshold:
                dlclick.fire("double left")
            last_lclick_time[0] = lclick_time

    click.handle(click_handler)
    return dlclick


click   = Event()
dlclick = left_double_click_from_click(click, .01)
dlclick.handle(echo)

click.fire("left")
time.sleep(.02)
click.fire("left")
click.fire("right")
click.fire("left") #prints "double click"

Step 2: refactor event handlers

It works and it's pretty simple. But, we could refactor quite a bit. If we do so, we might write something like this (notice I like the suffix "_r" for "reactive"):

class EventFireRecord:
    def __init__(self, value, time):
        self.value = value
        self.time  = time

def doubleize_r(evt, threshold, combine):
    double_evt = Event()

    last_fire  = EventFireRecord(None, 0)
    def evt_handler(value):
        fire_time = time.time()
        if (fire_time - last_fire.time) < threshold:
            double_evt.fire(combine(last_fire.value, value))
        last_fire.__init__(value, fire_time)

    evt.handle(evt_handler)
    return double_evt

def filter_r(evt, predicate):
    filtered_evt = Event()

    def evt_handler(value):
        if predicate(value):
            filtered_evt.fire(value)

    evt.handle(evt_handler)
    return filtered_evt

click   = Event()
dlclick = doubleize_r(filter_r(click, lambda value : value == "left"), .01, lambda l1, l2: "double left")
dlclick.handle(echo)

click.fire("left")
time.sleep(.02)
click.fire("left")
click.fire("right")
click.fire("left") #prints "double left"

That looks better and is more generic. The logic of "double click" is now contained all on one line. But, we could do even better. Notice that we repeat ourselves a little with filter_r and doublize_r. The pattern looks like this:
evt_out = Event()

def handler(value):
   ...
   evt_out.fire(value)
   ...

evt_in.handle(handler)
return evt_out
What if we refactor to pull out that common pattern by making a special handler that returns a value and a special "handle_with_result" that looks like this pattern?
def handler(value):
    ...
    return value

evt_out = handle_with_result(evt_in, handler)

Step 3: make a higher-level "handle" function

If we do that, our code might look like this:
def handle_with_result(evt, handler_with_result):
    evt_out = Event()

    def handler(value):
        result = handler_with_result(value)
        if result is not None:
            evt_out.fire(result)

    evt.handle(handler)
    return evt_out

def doubleize_event(evt, threshold, combine):
    last_fire = EventFireRecord(None, 0)

    def handler(value):
        fire_time = time.time()
        try:
            if (fire_time - last_fire.time) < threshold:
                return combine(last_fire.value, value)
        finally:
            last_fire.__init__(value, fire_time)

    return handle_with_result(evt, handler)

def filter_event(evt, predicate):
    def handler(value):
        if predicate(value):
            return value

    return handle_with_result(evt, handler)


click  = Event()
dlclick = doubleize_event(filter_event(click, lambda value : value == "left"), .01, lambda l1, l2 : "double left")
dlclick.handle(echo)

click.fire("left")
time.sleep(.02)
click.fire("left")
click.fire("right")
click.fire("left") #prints "double left"

It works, and our code looks better than ever. handle_with_result is very useful.

But, we are now missing something: what if we want to return multiple values? Or do something more interesting, like listen to an keyboard event and return left-clicks if the user clicks "l" and right clicks if they type "r" and two "fake" clicks if they type "f". We'd like to write something like this:

def choose_clicks(keys, clicks):
    def key_handler(char):
        if char == "l":
            return filter_event("left", clicks)
        elif char == "r":
            return filter_event("right", clicks)
        elif char == "f":
            return ["fake", "fake"]

    retrn handle_with_result(keys, key_handler)
If we change handle_with_result to handle this, we might get something like this:

def handle_with_result(evt, handler_with_result):
    evt_out = Event()

    def handler(value):
        result = handler_with_result(value)
        if result is None:
            pass #ignore
        elif isinstance(result, Event):
            result.handle(evt_out.fire)
        elif isinstance(result, list):
            for value_out in result:
                evt_out.fire(value_out)
        else:
            evt_out.fire(result)


    evt.handle(handler)
    return evt_out

def filter_r(evt, predicate):
    def handler(value):
        if predicate(value):
            return value

    return handle_with_result(evt, handler)

def value_filter_r(evt, value):
    return filter_r(evt, lambda val : val == value)

def choose_clicks(keys, clicks):
    def key_handler(char):
        #TODO: unsubscribe from event after either "l" or "r"
        if char == "l":
            return value_filter_r(clicks, "left")
        elif char == "r":
            return value_filter_r(clicks, "right")
        elif char == "f":
            return ["fake", "fake"]

    return handle_with_result(keys, key_handler)

keys   = Event()
clicks = Event()
choosen_clicks = choose_clicks(keys, clicks)

choosen_clicks.handle(echo)
clicks.fire("left")
keys.fire("a")
clicks.fire("right")
keys.fire("l")
clicks.fire("left") # print "left"
clicks.fire("right")
clicks.fire("left") # print "left"
keys.fire("f") #prints "fake" and then "fake" again

Great. Now if we just add a little bit of syntax sugar to this, we can make events look like streams:

Step 4: add some syntax sugar

class Event:
    def __init__(self):
        self.handlers = []

    def handle(self, handler):
        self.handlers.append(handler)
        return self #so += will work

    def fire(self, val = None):
        for handler in self.handlers:
            handler(val)

    def bind(evt, handler_with_result):
        evt_out = Event()

        def handler(value):
            result = handler_with_result(value)
            if result is not None:
                Event.unit(result).handle(evt_out.fire)

        evt.handle(handler)
        return evt_out

    @classmethod
    def unit(cls, val):
        if isinstance(val, cls):
            return val
        elif isinstance(val, list):
            return MockEvent(*val)
        else:
            return MockEvent(val)

    __rshift__ = bind

class MockEvent:
    def __init__(self, *vals):
        self.vals = vals

    def handle(self, handler):
        for val in self.vals:
            handler(val)  

def doublize_r(threshold, combine):
    last_fire = EventFireRecord(None, 0)

    def handler(value):
        fire_time = time.time()
        try:
            if (fire_time - last_fire.time) < threshold:
                return combine(last_fire.value, value)
        finally:
            last_fire.__init__(value, fire_time)

    return handler

def filter_r(predicate):
    def handler(value):
        if predicate(value):
            return value

    return handler

def value_filter_r(value):
    return filter_r(lambda val : val == value)

def click_choose_r(**clicks_by_char):
    def key_handler(char):
        return clicks_by_char.get(char)

    return key_handler


clicks   = Event()
keys     = Event()
dlclicks = clicks >> value_filter_r("left") >> doublize_r(.01, lambda l1, l2: "double click")
keys >> click_choose_r(d = dlclicks, f = ["fake", "fake"]) >> echo

clicks.fire("left")
clicks.fire("left")
keys.fire("f") #prints "fake" and then "fake" again
keys.fire("d")
clicks.fire("right")
clicks.fire("right")
time.sleep(.02)
clicks.fire("left")
clicks.fire("left") #print ("left", "left")

So what have we made?

Wonderful. We've made events look like streams by making a slick way of creating event handlers. In fact, if you look closely at what I did in that last step, you'll notice that I renamed "handle_with_result" to "bind" and moved some code into a method called "unit". That's all it takes to turn Event into a monad, which is exactly what Rx does. Congratulations, we just reinvented monads and Rx, just by refactoring our event handler code and in the process we've discovered what Rx really is: a fancy way of writing event handlers, specifically event handlers that fire events that trigger other event handlers that fire events, and so on in big chain that looks like a query. So when your eyes glaze over about "duals" and "monads" and "reactive programming", just say to yourself: I'm making a fancy event handler. Because in the end, that's all you're really doing.

In fact, if you want to do so in Python, now you have a basic implementation to start with! Of course, this is just a toy implementation. It lack error handling, unsubscribing, end-of-stream, and concurrency. But it ain't bad for just 50 lines of code. And it lets you see the essence of Rx fairly easily.

Oh, and what's the big deal with monads, you ask? Nothing much. It's just that if you provide "bind" and "unit" (called "select many" and "select" in LINQ, I think), LINQ gives you some nice syntax sugar that makes your event handler look like a query. It's really pretty slick, especially now that they've added "extension methods".

And next time...

In future posts, I'll explore different ways of making slick event handlers, but without monads. And hopefully we'll get make this handle concurrency, which is what asynchronous programming is all about. In fact, I expect we'll start to see a serious blurring of lines between Rx, message passing, and data flow programming.

For now, when you start working with Rx, just remember: I'm making a big, fancy event handler. An "Observable" is just like and event and an "Observer" is just like an event handler.

P.S.

What's with the lame names? They started off with cool names like "Reactive" and "Rx" and then give us Observable, Observer, Subscribe, OnNext, OnDone, and OnError. Yuck. Think what an opportunity they missed! We could have had names like Emitter, Reactor, Chain, Emit, Extinguish, and Explode. Judge for yourself:
observable.Subscribe(observer)
observer.OnNext(val)
observer.OnDone()
observer.OnError(e)
or
emitter.chain(reactor)
reactor.emit("foo")
reactor.extinguish()
reactor.explode(e)

January 14, 2009

Popcount in Python (with benchmarks)

Purpose

A slightly uncommon bit-twiddling operation is "popcount", which counts the number high bits. While uncommon, it's crucial for implementing Array Mapped Tries, which is a way to implement immutable maps and vectors. I'm trying to implement immutable maps and vectors, so I needed to implement popcount.

I found A TON of ways, but had no idea which was fastest. So, I implemented them all and tested them. The short answer is to do this:

#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
POPCOUNT_TABLE16 = [0] * 2**16
for index in xrange(len(POPCOUNT_TABLE16)):
    POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]

def popcount32_table16(v):
    return (POPCOUNT_TABLE16[ v        & 0xffff] +
            POPCOUNT_TABLE16[(v >> 16) & 0xffff])
And here's the long answer:

Results

I ran popcount on 30,000 random ints between 0 and 231-1. Here are the results from my Linux Desktop with a 3.2Ghz Core 2 Quad:

Name TimeMax BitsURL
c_bagwell 0.003032http://lamp.epfl.ch/papers/idealhashtrees.pdf
c_bsdmacro 0.003032http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html
c_parallel 0.003132http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
c_ops64 0.003632http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSet64
table16 0.009832http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
c_table8 0.011032http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive
table+kern 0.0132 -http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
table8 0.016332http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
bsdmacro 0.019032http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html
parallel 0.019932http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html
ops64 0.020732http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSet64
bagwell 0.024232http://lamp.epfl.ch/papers/idealhashtrees.pdf
freebsd 0.025732http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html
timpeters3 0.027732http://mail.python.org/pipermail/python-list/1999-July/007696.html
timpeters2 0.058032http://mail.python.org/pipermail/python-list/1999-July/007696.html
timpeters 0.0724 ?http://mail.python.org/pipermail/python-list/1999-July/007696.html
kernighan 0.0745 -http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan
elklund 0.0889 -http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html
naive 0.1519 -http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive
seistrup 0.2447 ?http://mail.python.org/pipermail/python-list/1999-July/007696.html

And here are the results for my MacBook with a 2.0Ghz Core 2 Duo:

Name TimeMax BitsURL
table16 0.027932http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
table+kern 0.0372 -http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
table8 0.041732http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
bsdmacro 0.048132http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html
bagwell 0.059632http://lamp.epfl.ch/papers/idealhashtrees.pdf
timpeters3 0.074432http://mail.python.org/pipermail/python-list/1999-July/007696.html
parallel 0.080732http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html
ops64 0.129032http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSet64
timpeters2 0.143932http://mail.python.org/pipermail/python-list/1999-July/007696.html
kernighan 0.1527 -http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan
timpeters 0.1668 ?http://mail.python.org/pipermail/python-list/1999-July/007696.html
freebsd 0.1772 -http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html
elklund 0.1913 -http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html
naive 0.2889 -http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive
seistrup 0.6106 ?http://mail.python.org/pipermail/python-list/1999-July/007696.html

Observations

  1. Implementations written in C (using Pyrex) are 5-6 times faster than Python.
  2. My 3.2 Ghz Linux Desktop is 200% faster than my 2.0 Ghz MacBook even though it only has a 60% clockspeed advantage. I'm not sure what to make of that. Python doesn't use multiple cores well, so I'm sure it's not that.
  3. If you want to run popcount on 32-bit values in pure Python, table16 is the fastest by far, and only 3 times slower than implementations in C.
  4. If you need to run popcount on arbitrarily large integers, kernighan is the best, but doing a hybrid of table16 and kernighan is probably better.

Conclusion

If you don't mind using about 64KB of memory, here's is the fastest popcount in pure python:
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
POPCOUNT_TABLE16 = [0] * 2**16
for index in xrange(len(POPCOUNT_TABLE16)):
    POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]

def popcount32_table16(v):
    return (POPCOUNT_TABLE16[ v        & 0xffff] +
            POPCOUNT_TABLE16[(v >> 16) & 0xffff])
If you do mind the memory usage, here's a slighly slower version:
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
POPCOUNT_TABLE8 = [0] * 2**8
for index in xrange(len(POPCOUNT_TABLE8)):
    POPCOUNT_TABLE8[index] = (index & 1) + POPCOUNT_TABLE8[index >> 1]

def popcount32_table8(v):
    return (POPCOUNT_TABLE8[ v        & 0xff] +
            POPCOUNT_TABLE8[(v >> 8)  & 0xff] +
            POPCOUNT_TABLE8[(v >> 16) & 0xff] +
            POPCOUNT_TABLE8[ v >> 24        ])
And if you need to handle values of more than 32 bits, one of these two are the best:
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan
def popcount_kernighan(v):
    c = 0
    while v:
        v &= v - 1
        c += 1
    return c

POPCOUNT32_LIMIT = 2**32-1
POPCOUNT_TABLE8 = [0] * 2**8
for index in xrange(len(POPCOUNT_TABLE8)):
    POPCOUNT_TABLE8[index] = (index & 1) + POPCOUNT_TABLE8[index >> 1]

def popcount_hybrid(v):
    if v <= POPCOUNT32_LIMIT:
        return (POPCOUNT_TABLE16[ v        & 0xffff] +
                POPCOUNT_TABLE16[(v >> 16) & 0xffff])
    else:
        c = 0
        while v:
            v &= v - 1
            c += 1
        return c

If it needs to be faster than that, write in C!

Test For Yourself

If you want to test these yourself, here's some code you can run.
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive
def popcount_naive(v):
    c = 0
    while v:
        c += (v & 1)
        v >>= 1
    return c

#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan
def popcount_kernighan(v):
    c = 0
    while v:
        v &= v - 1
        c += 1
    return c

#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
POPCOUNT_TABLE8 = [0] * 2**8
for index in xrange(len(POPCOUNT_TABLE8)):
    POPCOUNT_TABLE8[index] = (index & 1) + POPCOUNT_TABLE8[index >> 1]

def popcount32_table8(v):
    return (POPCOUNT_TABLE8[ v        & 0xff] +
            POPCOUNT_TABLE8[(v >> 8)  & 0xff] +
            POPCOUNT_TABLE8[(v >> 16) & 0xff] +
            POPCOUNT_TABLE8[ v >> 24        ])

POPCOUNT_TABLE16 = [0] * 2**16
for index in xrange(len(POPCOUNT_TABLE16)):
    POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]

def popcount32_table16(v):
    return (POPCOUNT_TABLE16[ v        & 0xffff] +
            POPCOUNT_TABLE16[(v >> 16) & 0xffff])

POPCOUNT32_LIMIT = 2**32-1
def popcount_table16_kernighan(v):
    if v <= POPCOUNT32_LIMIT:
        return (POPCOUNT_TABLE16[ v        & 0xffff] +
                POPCOUNT_TABLE16[(v >> 16) & 0xffff])
    else:
        c = 0
        while v:
            v &= v - 1
            c += 1
        return c


#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSet64
def popcount32_ops64(v):
    return ((((v & 0xfff) * 0x1001001001001 & 0x84210842108421)            % 0x1f) +
            ((((v & 0xfff000) >> 12) * 0x1001001001001 & 0x84210842108421) % 0x1f) +
            (((v >> 24) * 0x1001001001001 & 0x84210842108421)              % 0x1f))
   

#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
#also http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html
def popcount32_parallel(v):
    v = v                - ((v >> 1) & 0x55555555)
    v = (v & 0x33333333) + ((v >> 2) & 0x33333333)
    #v = (v & 0x0F0F0F0F) + ((v >> 4) & 0x0F0F0F0F)
    #v = v                + ((v >> 4) & 0x0F0F0F0F)
    v = (v + (v >> 4)) & 0x0F0F0F0F
    v = (v * 0x1010101) >> 24
    return v % 256 #I added %256. I'm not sure why it's needed. It's probably because of signed ints in Python

def popcount32_bagwell(v):
    v = v                - ((v >> 1) & 0x55555555)
    v = (v & 0x33333333) + ((v >> 2) & 0x33333333)
    v = (v & 0x0F0F0F0F) + ((v >> 4) & 0x0F0F0F0F)
    v = v                +  (v >> 8)
    v = (v + (v >> 16)) & 0x3F
    return v

#http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html
def popcount_elklund(v):
    c = 0
    while v:
        v ^= v & -v
        c += 1
    return c

#http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html
def popcount32_freebsd(v):
    v = (v & 0x55555555) + ((v & 0xaaaaaaaa) >> 1);
    v = (v & 0x33333333) + ((v & 0xcccccccc) >> 2);
    v = (v & 0x0f0f0f0f) + ((v & 0xf0f0f0f0) >> 4);
    v = (v & 0x00ff00ff) + ((v & 0xff00ff00) >> 8);
    v = (v & 0x0000ffff) + ((v & 0xffff0000) >> 16);
    return v

#http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html
def popcount32_bsdmacro(v):
    #define BITCOUNT(x) (((BX_(x)+(BX_(x)>>4)) & 0x0F0F0F0F) % 255)
    #define BX_(x) ((x) - (((x)>>1)&) - (((x)>>2)&0x33333333) - (((x)>>3)&0x11111111))
    #
    #def bx(v): (v - ((v >> 1) & 0x77777777) - ((v >> 2) & 0x33333333) - ((v >> 3) & 0x11111111))
    #def bc(v): ((bx(v) + (bx(v) >> 4)) & 0x0F0F0F0F) % 255

    v = (v - ((v >> 1) & 0x77777777) - ((v >> 2) & 0x33333333) - ((v >> 3) & 0x11111111))
    v = ((v + (v >> 4)) & 0x0F0F0F0F)
    return v % 255



#http://mail.python.org/pipermail/python-list/1999-July/007696.html
import marshal, array, struct, string
POPCOUNT8_TRANS_TABLE = "".join(map(chr, POPCOUNT_TABLE8))

#changed by me to match new dumps() and use sum()
def popcount_timpeters(v):
    counts = array.array("B", string.translate(marshal.dumps(v), POPCOUNT8_TRANS_TABLE))
    # overwrite type code
    counts[0] = 0
    return sum(counts)

#changed by me to add loop unrolling and not setting digitcounts[0]
def popcount32_timpeters2(v):
    counts = array.array("B", string.translate(marshal.dumps(v), POPCOUNT8_TRANS_TABLE))
    return counts[1] + counts[2] + counts[3] + counts[4]

#improved by me: no need to translate type char
def popcount32_timpeters3(v):
    dumped = marshal.dumps(v)
    return POPCOUNT_TABLE8[ord(dumped[1])] + POPCOUNT_TABLE8[ord(dumped[2])] + POPCOUNT_TABLE8[ord(dumped[3])] + POPCOUNT_TABLE8[ord(dumped[4])]

#http://mail.python.org/pipermail/python-list/1999-July/007696.html
_run2mask = {1: 0x5555555555555555L,
             2: 0x3333333333333333L,
             4: 0x0F0F0F0F0F0F0F0FL,
             8: 0x00FF00FF00FF00FFL}

def buildmask2(run, n):
    run2 = run + run
    k = (n + run2 - 1) / run2
    n = k * run2
    try:
        answer = _run2mask[run]
        k2 = 64 / run2
    except KeyError:
        answer = (1L << run) - 1
        k2 = 1
    while k > k2:
        k2 = k2 + k2
        if k >= k2:
            answer = answer | (answer << (run * k2))
        else:
            answer = answer | (answer << (run2 * (k - k2/2)))
            k2 = k
    if k2 > k:
        answer = answer >> (run2 * (k2 - k))
    return answer, n

def nbits(v):
    return 32 #???

def popcount_seistrup(v):
    lomask, n = buildmask2(1, nbits(v))
    v = v - ((v >> 1) & lomask)
    target = 2
    while n > target:
        lomask, n = buildmask2(target, n)
        v = (v & lomask) + ((v >> target) & lomask)
        target = target + target
        for i in range(1, target/2):
            if n <= target:
                break
            n = n >> 1
            n = (n + target - 1) / target * target
            v = (v & ((1L <<>> n)
    return int(v)
   

if __name__ == "__main__":
    import time, random

    def time_func(func):
        before = time.time()
        result = func()
        after  = time.time()
        span   = after - before
        return result, span

    popcounts = [popcount_naive,       
                 popcount32_table8,    
                 popcount32_table16,   
                 popcount_table16_kernighan,
                 popcount_kernighan,   
                 popcount32_ops64,     
                 popcount32_parallel,  
                 popcount32_bagwell,   
                 popcount_elklund,     
                 popcount32_freebsd,   
                 popcount32_bsdmacro,  
                 popcount_seistrup,    
                 popcount_timpeters,   
                 popcount32_timpeters2,
                 popcount32_timpeters3,
                 ]

    test_count      = 30000
    max_int         = 2**31 - 2
    ints            = range(0, 257) + [random.randint(0, max_int) for _ in xrange(test_count)] + range(max_int - 100, max_int+1)
    expected_counts = map(popcount_naive, ints)
    for popcount in popcounts:
        counts, timespan = time_func(lambda : map(popcount, ints))
        print "%5.5f %s" % ((timespan if (counts == expected_counts) else -1), popcount.__name__) #-1 means failure

November 11, 2008

Easy Python Decorators with the decorator decorator

Decorators in python are very powerful, but are often a pain to get right, especially when you want to pass arguments to the decorator. Typically, it involves defining a function in a function in a function, etc, up to 4 layers deep. Can we make it easier? What if we even made a decorator to make decorators?

Perhaps we could use it catch and log errors:

@decorator
def log_error(func, *args, **kargs):
   try:
       return func(*args, **kargs)
   except Exception, error:
       print "error occurred: %r" % error

@log_error
def blowup():
   raise Exception("blew up")

blowup() # this gets caught and prints the error

Or maybe we'd like to synchronize a function to make it thread-safe:

@decorator
def synchronized(lock, func, *args, **kargs):
    lock.acquire()
    try:
        return func(self, *args, **kargs)
    finally:
        lock.release()

missle_lock = thread.RLock()

@synchronized(missle_lock)
def launch_missiles():
    pass

Or we could even use arguments to do do some object __init__ trickery (something I've had to do when working with wxpython):

@decorator
def inherits_format(method, *args, **kargs):
    self, parent = args[:2]
    self.format = parent.format
    return method(*args, **kargs)

class Child:
    @inherits_format
    def __init__(self, parent):
        pass

class Parent:
    def __init__(self, format):
        self.format = format

format = object()
child  = Child(Parent(format))
assert child.format is format

If you've never worked python decorators, these are few examples of how powerful they are. But you'll probably never want to write your own because it can be a pain. If that's the case, then the decorator decorator is for you! Here's the code for it. It's a little tricky, but all you have to do is import it, slap @decorator before your decorator, make sure you call func(*args, **kargs), and you're all set. This is as easy as decorators can get.

Update: Dave left a comment and notified me that there is another, much more complete, implementation of the same idea at http://www.phyast.pitt.edu/~micheles/python/documentation.html. So, if you want a very complete module, go there. If you want something small and simple to cut and paste into your own code, use the following.

def decorator(decorator_func):
    decorator_expected_arg_count = decorator_func.func_code.co_argcount - 1
    if decorator_expected_arg_count:
        def decorator_maker(*decorator_args):
            assert len(decorator_args) == decorator_expected_arg_count, "%s expects %d args" % (decorator_func.__name__, decorator_expected_arg_count)

            def _decorator(func):
                assert callable(func), "Decorator not given a function.  Did you forget to give %r arguments?" % (decorator_func.__name__)

                def decorated_func(*args, **kargs):
                    full_args = decorator_args + (func,) + args
                    return decorator_func(*full_args, **kargs)

                decorated_func.__name__ = func.__name__
                return decorated_func
            return _decorator
        return decorator_maker
    else:
        def _decorator(func):
            def decorated_func(*args, **kargs):
                return decorator_func(func, *args, **kargs)

            decorated_func.__name__ = func.__name__
            return decorated_func
        return _decorator
    

Blog Archive

Google Analytics