I've learned. I'll share.

October 22, 2009

Introducing pyrec: The cure to the bane of __init__

I finally discovered how cool GitHub is and have started putting some code up there. My first entry is Record.py. I'm calling it "the cure to the bane of __init__" because:

  1. Mutable data structures are a bane to concurrent (multi-threaded) code.
  2. Writing self.foo = foo, self.bar = bar, etc, is a huge waste of time.
  3. When you have lots of data structures in memory, tuple-based data uses 1/4 the memory of class-based data.
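
To get a feel for point 3, here is a rough sketch that compares the shallow size of a plain tuple with the size of an ordinary class instance plus its attribute dictionary. The class name PersonClass is just for illustration, and sys.getsizeof only counts each container itself, not the objects it refers to. The numbers depend heavily on the Python version and platform (recent CPython releases have shrunk instance dictionaries considerably), so the ratio you see may differ from the 1/4 quoted above.

```python
import sys


class PersonClass(object):
    """An ordinary mutable class: attributes live in a per-instance __dict__."""

    def __init__(self, name, age):
        self.name = name
        self.age = age


def instance_size(obj):
    # Shallow size of the instance itself plus its attribute dictionary.
    return sys.getsizeof(obj) + sys.getsizeof(obj.__dict__)


as_tuple = ("Bob", 30)
as_instance = PersonClass("Bob", 30)

tuple_bytes = sys.getsizeof(as_tuple)
instance_bytes = instance_size(as_instance)
print("tuple: %d bytes, class instance + __dict__: %d bytes" % (tuple_bytes, instance_bytes))
```
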
As you can probably tell from my blog, I've experimented with a lot of wacky programming ideas and bent Python in ways it was never meant to bend. But if there's one experiment that's been a success, it's Record.py. I use it for almost all of my classes; it's just so easy to use. I'm calling it "pyrec" because that's easier to write, google, etc.

So, go to the GitHub repo, or start using it by following these really easy steps:

  1. Download Record.py from http://github.com/pthatcher/pyrec/blob/master/Record.py
  2. Put from Record import Record at the top of your code.
  3. Make a class by saying something like class Person(Record("name", "age")).
  4. Never write __init__ again (unless you want mutability).
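
pyrec's own Record.py isn't reproduced here, but the idea behind step 3 can be sketched in a few lines: a factory that takes field names and returns a tuple subclass with one read-only property per field. This is a hypothetical, simplified reimplementation for illustration, not pyrec's actual code; it only covers construction, field access, and repr.

```python
def Record(*fields):
    """Hypothetical sketch: build an immutable, tuple-based base class.

    Not pyrec's implementation; it only shows the general technique.
    """

    class _Record(tuple):
        __slots__ = ()  # no per-instance __dict__ on the base class
        _fields = fields

        def __new__(cls, *values):
            if len(values) != len(fields):
                raise TypeError("%s expects %d values, got %d"
                                % (cls.__name__, len(fields), len(values)))
            return tuple.__new__(cls, values)

        def __repr__(self):
            items = ", ".join("%s=%r" % (name, tuple.__getitem__(self, i))
                              for i, name in enumerate(fields))
            return "%s(%s)" % (type(self).__name__, items)

    # One read-only property per field. tuple.__getitem__ is called directly,
    # so a subclass that overrides __getitem__ does not break name lookup.
    for index, name in enumerate(fields):
        setattr(_Record, name,
                property(lambda self, _i=index: tuple.__getitem__(self, _i)))

    return _Record


class Person(Record("name", "age")):
    pass  # no __init__ needed


bob = Person("Bob", 30)
print(bob)       # Person(name='Bob', age=30)
print(bob.name)  # Bob
```

Because the fields are plain tuple slots, unpacking works as usual (name, age = bob), and assigning to bob.age raises AttributeError. Adding __slots__ = () to a subclass such as Person keeps its instances free of a per-instance __dict__.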
Enjoy!

Update: A commenter (thanks, Dan!) pointed out that this is a lot like namedtuple, added in Python 2.6, and asked why anyone would use this instead of namedtuple. Well, I have to admit that I probably would never have created pyrec if namedtuple had existed three years ago. I try to avoid NIH syndrome. But it didn't exist, so I wrote pyrec. And after three years of using pyrec, I've learned that some little things make a big difference (to me). Here are a few advantages pyrec has over namedtuple:

  1. It has a nicer interface. I prefer new(val1, val2) to _make([val1, val2]), alter to _replace, and class Person(Record("name", "age")) to Person = namedtuple("Person", "name, age").
  2. I added the setField methods (one per field, such as setName). That's what I use 90% of the time; I use alter only about 10% of the time. setField is a lot more convenient.
  3. With pyrec, you can safely override __iter__ and __getitem__. For example, Record.py includes an implementation of a LinkedList. I tried doing that with namedtuple, but the overridden __getitem__ clobbered the field-name lookup, and the overridden __iter__ clobbered tuple unpacking.
  4. You can use tuple.__iter__(rec) to get around the latter, but pyrec's .values is a lot nicer.
  5. pyrec has .namedValues for ordered (field, value) pairs, unlike _asdict(), which throws out the order. For many of the things I use pyrec for, this matters.
  6. You can improve it! Have you looked at the code for namedtuple? Ugly. pyrec's code is pretty clean, so you can easily improve it if you need additional functionality that will work with all of your records.
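
For comparison, here is roughly what the namedtuple side of points 1, 4 and 5 looks like with the standard library's collections.namedtuple. It uses only documented namedtuple members (_make, _replace, _fields); the pyrec names mentioned in the comments (new, alter, .values, .namedValues) come from the list above.

```python
from collections import namedtuple

Person = namedtuple("Person", "name, age")

# Point 1: _make takes a single iterable (pyrec's new takes positional values).
bob = Person._make(["Bob", 30])

# Point 1: _replace returns a modified copy (pyrec calls this alter).
older = bob._replace(age=31)

# Point 4: tuple.__iter__ yields the raw values even if a subclass
# overrides __iter__ (roughly what pyrec's .values provides).
raw_values = tuple(tuple.__iter__(older))

# Point 5: ordered (field, value) pairs (roughly pyrec's .namedValues).
pairs = list(zip(Person._fields, raw_values))

print(older)  # Person(name='Bob', age=31)
print(pairs)  # [('name', 'Bob'), ('age', 31)]
```

Note that _asdict() has changed since this was written: it returned a regular dict in Python 2.6, an OrderedDict in 2.7 and 3.1 through 3.7, and a regular (insertion-ordered) dict from 3.8 on, so point 5 describes the 2.6-era behavior.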
If you don't care about those things, use namedtuple. It's still way better than mutable classes. But having used pyrec for three years, I find these little things matter to me, so I'm still going to use pyrec. And if you want most of both worlds, I added NamedTuple to pyrec: a subclass of namedtuple that adds most of the pyrec goodness (everything but safe __getitem__ overriding). Thanks for the update, Dan.
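
A NamedTuple along those lines can be approximated by subclassing the class that namedtuple returns and attaching the extra helpers. The sketch below is hypothetical and is not pyrec's code: the helper names (new, alter, values, named_values, and one set_<field> method per field) are modeled on the post, and the underscore_case spellings are an assumption.

```python
from collections import namedtuple


def NamedTuple(typename, *fields):
    """Hypothetical sketch: a namedtuple subclass with pyrec-style helpers.

    The helper names below are assumptions modeled on the post,
    not pyrec's actual NamedTuple.
    """
    base = namedtuple(typename, fields)

    class _Rec(base):
        __slots__ = ()  # stay as small as the plain namedtuple

        @classmethod
        def new(cls, *values):
            # Positional constructor; equivalent to cls._make(values).
            return cls._make(values)

        def alter(self, **changes):
            # Copy with some fields replaced; self is left unchanged.
            return self._replace(**changes)

        @property
        def values(self):
            # Raw tuple contents, bypassing any overridden __iter__.
            return tuple(tuple.__iter__(self))

        @property
        def named_values(self):
            # Ordered (field, value) pairs.
            return list(zip(self._fields, self.values))

    # Attach one set_<field> method per field; each returns a new record.
    for field in fields:
        def setter(self, value, _field=field):
            return self._replace(**{_field: value})
        setter.__name__ = "set_" + field
        setattr(_Rec, setter.__name__, setter)

    _Rec.__name__ = _Rec.__qualname__ = typename
    return _Rec


Person = NamedTuple("Person", "name", "age")
bob = Person.new("Bob", 30)
older = bob.set_age(31)
print(older)               # Person(name='Bob', age=31)
print(older.named_values)  # [('name', 'Bob'), ('age', 31)]
```

The setters are thin wrappers around _replace, so each call returns a new record and leaves the original untouched.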

5 comments:

  1. Is there a compelling reason to prefer pyrec to collections.namedtuple? The latter is part of the standard library in versions 2.6 and above, and works just fine in 2.5.

    Either way, death to mutability-by-default!

  2. Dan, I updated the post to answer your question. Thanks for bringing it up. I'm still in Python 2.5 land and had forgotten about additions to 2.6 like namedtuple.

  3. Thanks for the comprehensive reply. pyrec's interface definitely looks nicer, particularly the support for overriding __getitem__; namedtuple fails rather ungracefully in that department. I'll have to give pyrec a spin.

  4. Interesting.

    One point: setField instead of set_field:
    1) conflicts with PEP 008
    2) breaks capitalization, breaks consistency. You know, attribute name is 'foo', setter is 'setFoo'.

    I'm using this:

    class Record(dict): pass

    foo = Record(g=50)

    If you change dict to FrozenDict, it will become immutable. But tuples take less memory, yeah.

    P.S.: "Your HTML cannot be accepted: Tag is not allowed: CODE" What's wrong with the most useful tag? And it doesn't accept PRE either. Jesus, even Q is not allowed.

  5. Temoto, good point about the method names. PEP008 specifically says:

    "mixedCase is allowed only in contexts where that's already the prevailing style (e.g. threading.py), to retain backwards compatibility."

    I've been using mixedCase for all of my code for years because of "prevailing style", and I forgot that most Python developers use underscore_case.

    So, to fix this, I've changed pyrec from mixedCase to underscore_case; go see it at http://github.com/pthatcher/pyrec. Thanks for the heads-up.

