tag:blogger.com,1999:blog-17001572362062005972009-05-18T03:30:52.639-07:00Valued LessonsI've learned. I'll share.Peter Thatcherhttp://www.blogger.com/profile/01092342988993218446noreply@blogger.comBlogger12125tag:blogger.com,1999:blog-1700157236206200597.post-31956528507004566682009-01-14T12:06:00.000-08:002009-01-14T12:07:03.222-08:00Popcount in Python (with benchmarks)<h3>Purpose</h3>
A slightly uncommon bit-twiddling operation is "popcount", which
counts the number high bits. While uncommon, it's crucial for
implementing Array Mapped Tries, which is a way to implement immutable
maps and vectors. I'm trying to implement immutable maps and vectors,
so I needed to implement popcount.
<p>
I found A TON of ways, but had no idea which was fastest. So, I
implemented them all and tested them.
The short answer is to do this:
<pre class="code">
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
POPCOUNT_TABLE16 = [0] * 2**16
for index in xrange(len(POPCOUNT_TABLE16)):
POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]
def popcount32_table16(v):
return (POPCOUNT_TABLE16[ v & 0xffff] +
POPCOUNT_TABLE16[(v >> 16) & 0xffff])
</pre>
And here's the long answer:
<h3>Results</h3>
I ran popcount on 30,000 random ints between 0 and 2<super>31</super>-1. Here are the results from my
Linux Desktop with a 3.2Ghz Core 2 Quad:
<p>
<div align="center">
<table border="1" cellpadding="4">
<tr><td>Name </td><td>Time</td><td>Max Bits</td><td>URL</td></tr>
<tr><td>c_bagwell </td><td>0.0030</td><td>32</td><td>http://lamp.epfl.ch/papers/idealhashtrees.pdf</td></tr>
<tr><td>c_bsdmacro </td><td>0.0030</td><td>32</td><td>http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html</td></tr>
<tr><td>c_parallel </td><td>0.0031</td><td>32</td><td>http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel</td></tr>
<tr><td>c_ops64 </td><td>0.0036</td><td>32</td><td>http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSet64</td></tr>
<tr><td>table16 </td><td>0.0098</td><td>32</td><td>http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable</td></tr>
<tr><td>c_table8 </td><td>0.0110</td><td>32</td><td>http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive</td></tr>
<tr><td>table+kern </td><td>0.0132</td><td> -</td><td>http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable</td></tr>
<tr><td>table8 </td><td>0.0163</td><td>32</td><td>http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable</td></tr>
<tr><td>bsdmacro </td><td>0.0190</td><td>32</td><td>http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html</td></tr>
<tr><td>parallel </td><td>0.0199</td><td>32</td><td>http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html</td></tr>
<tr><td>ops64 </td><td>0.0207</td><td>32</td><td>http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSet64</td></tr>
<tr><td>bagwell </td><td>0.0242</td><td>32</td><td>http://lamp.epfl.ch/papers/idealhashtrees.pdf</td></tr>
<tr><td>freebsd </td><td>0.0257</td><td>32</td><td>http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html</td></tr>
<tr><td>timpeters3 </td><td>0.0277</td><td>32</td><td>http://mail.python.org/pipermail/python-list/1999-July/007696.html</td></tr>
<tr><td>timpeters2 </td><td>0.0580</td><td>32</td><td>http://mail.python.org/pipermail/python-list/1999-July/007696.html</td></tr>
<tr><td>timpeters </td><td>0.0724</td><td> ?</td><td>http://mail.python.org/pipermail/python-list/1999-July/007696.html</td></tr>
<tr><td>kernighan </td><td>0.0745</td><td> -</td><td>http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan</td></tr>
<tr><td>elklund </td><td>0.0889</td><td> -</td><td>http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html</td></tr>
<tr><td>naive </td><td>0.1519</td><td> -</td><td>http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive</td></tr>
<tr><td>seistrup </td><td>0.2447</td><td> ?</td><td>http://mail.python.org/pipermail/python-list/1999-July/007696.html</td></tr>
</table>
</div>
<p>
And here are the results for my MacBook with a 2.0Ghz Core 2 Duo:
<p>
<div align="center">
<table border="1" cellpadding="4">
<tr><td>Name </td><td>Time</td><td>Max Bits</td><td>URL</td></tr>
<tr><td>table16 </td><td>0.0279</td><td>32</td><td>http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable</td></tr>
<tr><td>table+kern </td><td>0.0372</td><td> -</td><td>http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable</td></tr>
<tr><td>table8 </td><td>0.0417</td><td>32</td><td>http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable</td></tr>
<tr><td>bsdmacro </td><td>0.0481</td><td>32</td><td>http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html</td></tr>
<tr><td>bagwell </td><td>0.0596</td><td>32</td><td>http://lamp.epfl.ch/papers/idealhashtrees.pdf</td></tr>
<tr><td>timpeters3 </td><td>0.0744</td><td>32</td><td>http://mail.python.org/pipermail/python-list/1999-July/007696.html</td></tr>
<tr><td>parallel </td><td>0.0807</td><td>32</td><td>http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html</td></tr>
<tr><td>ops64 </td><td>0.1290</td><td>32</td><td>http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSet64</td></tr>
<tr><td>timpeters2 </td><td>0.1439</td><td>32</td><td>http://mail.python.org/pipermail/python-list/1999-July/007696.html</td></tr>
<tr><td>kernighan </td><td>0.1527</td><td> -</td><td>http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan</td></tr>
<tr><td>timpeters </td><td>0.1668</td><td> ?</td><td>http://mail.python.org/pipermail/python-list/1999-July/007696.html</td></tr>
<tr><td>freebsd </td><td>0.1772</td><td> -</td><td>http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html</td></tr>
<tr><td>elklund </td><td>0.1913</td><td> -</td><td>http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html</td></tr>
<tr><td>naive </td><td>0.2889</td><td> -</td><td>http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive</td></tr>
<tr><td>seistrup </td><td>0.6106</td><td> ?</td><td>http://mail.python.org/pipermail/python-list/1999-July/007696.html</td></tr>
</table>
</div>
<p>
<h3>Observations</h3>
<ol>
<li>Implementations written in C (using <a href="http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/">Pyrex</a>) are 5-6 times faster than Python.</li>
<li>My 3.2 Ghz Linux Desktop is 200% faster than my 2.0 Ghz MacBook even though it only has a 60% clockspeed advantage. I'm not sure what to make of that. Python doesn't use multiple cores well, so I'm sure it's not that.</li>
<li>If you want to run popcount on 32-bit values in pure Python, table16 is the fastest by far, and only 3 times slower than implementations in C.</li>
<li>If you need to run popcount on arbitrarily large integers, kernighan is the best, but doing a hybrid of table16 and kernighan is probably better.</li>
</ol>
<p>
<h3>Conclusion</h3>
If you don't mind using about 64KB of memory, here's is the fastest popcount in pure python:
<pre class="code">
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
POPCOUNT_TABLE16 = [0] * 2**16
for index in xrange(len(POPCOUNT_TABLE16)):
POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]
def popcount32_table16(v):
return (POPCOUNT_TABLE16[ v & 0xffff] +
POPCOUNT_TABLE16[(v >> 16) & 0xffff])
</pre>
If you do mind the memory usage, here's a slighly slower version:
<pre class="code">
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
POPCOUNT_TABLE8 = [0] * 2**8
for index in xrange(len(POPCOUNT_TABLE8)):
POPCOUNT_TABLE8[index] = (index & 1) + POPCOUNT_TABLE8[index >> 1]
def popcount32_table8(v):
return (POPCOUNT_TABLE8[ v & 0xff] +
POPCOUNT_TABLE8[(v >> 8) & 0xff] +
POPCOUNT_TABLE8[(v >> 16) & 0xff] +
POPCOUNT_TABLE8[ v >> 24 ])
</pre>
And if you need to handle values of more than 32 bits, one of these two are the best:
<pre class="code">
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan
def popcount_kernighan(v):
c = 0
while v:
v &= v - 1
c += 1
return c
POPCOUNT32_LIMIT = 2**32-1
POPCOUNT_TABLE8 = [0] * 2**8
for index in xrange(len(POPCOUNT_TABLE8)):
POPCOUNT_TABLE8[index] = (index & 1) + POPCOUNT_TABLE8[index >> 1]
def popcount_hybrid(v):
if v <= POPCOUNT32_LIMIT:
return (POPCOUNT_TABLE16[ v & 0xffff] +
POPCOUNT_TABLE16[(v >> 16) & 0xffff])
else:
c = 0
while v:
v &= v - 1
c += 1
return c
</pre>
If it needs to be faster than that, write in C!
<h3>Test For Yourself</h3>
If you want to test these yourself, here's some code you can run.
<pre class="code">
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive
def popcount_naive(v):
c = 0
while v:
c += (v & 1)
v >>= 1
return c
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan
def popcount_kernighan(v):
c = 0
while v:
v &= v - 1
c += 1
return c
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
POPCOUNT_TABLE8 = [0] * 2**8
for index in xrange(len(POPCOUNT_TABLE8)):
POPCOUNT_TABLE8[index] = (index & 1) + POPCOUNT_TABLE8[index >> 1]
def popcount32_table8(v):
return (POPCOUNT_TABLE8[ v & 0xff] +
POPCOUNT_TABLE8[(v >> 8) & 0xff] +
POPCOUNT_TABLE8[(v >> 16) & 0xff] +
POPCOUNT_TABLE8[ v >> 24 ])
POPCOUNT_TABLE16 = [0] * 2**16
for index in xrange(len(POPCOUNT_TABLE16)):
POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]
def popcount32_table16(v):
return (POPCOUNT_TABLE16[ v & 0xffff] +
POPCOUNT_TABLE16[(v >> 16) & 0xffff])
POPCOUNT32_LIMIT = 2**32-1
def popcount_table16_kernighan(v):
if v <= POPCOUNT32_LIMIT:
return (POPCOUNT_TABLE16[ v & 0xffff] +
POPCOUNT_TABLE16[(v >> 16) & 0xffff])
else:
c = 0
while v:
v &= v - 1
c += 1
return c
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSet64
def popcount32_ops64(v):
return ((((v & 0xfff) * 0x1001001001001 & 0x84210842108421) % 0x1f) +
((((v & 0xfff000) >> 12) * 0x1001001001001 & 0x84210842108421) % 0x1f) +
(((v >> 24) * 0x1001001001001 & 0x84210842108421) % 0x1f))
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
#also http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html
def popcount32_parallel(v):
v = v - ((v >> 1) & 0x55555555)
v = (v & 0x33333333) + ((v >> 2) & 0x33333333)
#v = (v & 0x0F0F0F0F) + ((v >> 4) & 0x0F0F0F0F)
#v = v + ((v >> 4) & 0x0F0F0F0F)
v = (v + (v >> 4)) & 0x0F0F0F0F
v = (v * 0x1010101) >> 24
return v % 256 #I added %256. I'm not sure why it's needed. It's probably because of signed ints in Python
def popcount32_bagwell(v):
v = v - ((v >> 1) & 0x55555555)
v = (v & 0x33333333) + ((v >> 2) & 0x33333333)
v = (v & 0x0F0F0F0F) + ((v >> 4) & 0x0F0F0F0F)
v = v + (v >> 8)
v = (v + (v >> 16)) & 0x3F
return v
#http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html
def popcount_elklund(v):
c = 0
while v:
v ^= v & -v
c += 1
return c
#http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html
def popcount32_freebsd(v):
v = (v & 0x55555555) + ((v & 0xaaaaaaaa) >> 1);
v = (v & 0x33333333) + ((v & 0xcccccccc) >> 2);
v = (v & 0x0f0f0f0f) + ((v & 0xf0f0f0f0) >> 4);
v = (v & 0x00ff00ff) + ((v & 0xff00ff00) >> 8);
v = (v & 0x0000ffff) + ((v & 0xffff0000) >> 16);
return v
#http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html
def popcount32_bsdmacro(v):
#define BITCOUNT(x) (((BX_(x)+(BX_(x)>>4)) & 0x0F0F0F0F) % 255)
#define BX_(x) ((x) - (((x)>>1)&) - (((x)>>2)&0x33333333) - (((x)>>3)&0x11111111))
#
#def bx(v): (v - ((v >> 1) & 0x77777777) - ((v >> 2) & 0x33333333) - ((v >> 3) & 0x11111111))
#def bc(v): ((bx(v) + (bx(v) >> 4)) & 0x0F0F0F0F) % 255
v = (v - ((v >> 1) & 0x77777777) - ((v >> 2) & 0x33333333) - ((v >> 3) & 0x11111111))
v = ((v + (v >> 4)) & 0x0F0F0F0F)
return v % 255
#http://mail.python.org/pipermail/python-list/1999-July/007696.html
import marshal, array, struct, string
POPCOUNT8_TRANS_TABLE = "".join(map(chr, POPCOUNT_TABLE8))
#changed by me to match new dumps() and use sum()
def popcount_timpeters(v):
counts = array.array("B", string.translate(marshal.dumps(v), POPCOUNT8_TRANS_TABLE))
# overwrite type code
counts[0] = 0
return sum(counts)
#changed by me to add loop unrolling and not setting digitcounts[0]
def popcount32_timpeters2(v):
counts = array.array("B", string.translate(marshal.dumps(v), POPCOUNT8_TRANS_TABLE))
return counts[1] + counts[2] + counts[3] + counts[4]
#improved by me: no need to translate type char
def popcount32_timpeters3(v):
dumped = marshal.dumps(v)
return POPCOUNT_TABLE8[ord(dumped[1])] + POPCOUNT_TABLE8[ord(dumped[2])] + POPCOUNT_TABLE8[ord(dumped[3])] + POPCOUNT_TABLE8[ord(dumped[4])]
#http://mail.python.org/pipermail/python-list/1999-July/007696.html
_run2mask = {1: 0x5555555555555555L,
2: 0x3333333333333333L,
4: 0x0F0F0F0F0F0F0F0FL,
8: 0x00FF00FF00FF00FFL}
def buildmask2(run, n):
run2 = run + run
k = (n + run2 - 1) / run2
n = k * run2
try:
answer = _run2mask[run]
k2 = 64 / run2
except KeyError:
answer = (1L << run) - 1
k2 = 1
while k > k2:
k2 = k2 + k2
if k >= k2:
answer = answer | (answer << (run * k2))
else:
answer = answer | (answer << (run2 * (k - k2/2)))
k2 = k
if k2 > k:
answer = answer >> (run2 * (k2 - k))
return answer, n
def nbits(v):
return 32 #???
def popcount_seistrup(v):
lomask, n = buildmask2(1, nbits(v))
v = v - ((v >> 1) & lomask)
target = 2
while n > target:
lomask, n = buildmask2(target, n)
v = (v & lomask) + ((v >> target) & lomask)
target = target + target
for i in range(1, target/2):
if n <= target:
break
n = n >> 1
n = (n + target - 1) / target * target
v = (v & ((1L <<>> n)
return int(v)
if __name__ == "__main__":
import time, random
def time_func(func):
before = time.time()
result = func()
after = time.time()
span = after - before
return result, span
popcounts = [popcount_naive,
popcount32_table8,
popcount32_table16,
popcount_table16_kernighan,
popcount_kernighan,
popcount32_ops64,
popcount32_parallel,
popcount32_bagwell,
popcount_elklund,
popcount32_freebsd,
popcount32_bsdmacro,
popcount_seistrup,
popcount_timpeters,
popcount32_timpeters2,
popcount32_timpeters3,
]
test_count = 30000
max_int = 2**31 - 2
ints = range(0, 257) + [random.randint(0, max_int) for _ in xrange(test_count)] + range(max_int - 100, max_int+1)
expected_counts = map(popcount_naive, ints)
for popcount in popcounts:
counts, timespan = time_func(lambda : map(popcount, ints))
print "%5.5f %s" % ((timespan if (counts == expected_counts) else -1), popcount.__name__) #-1 means failure
</pre><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1700157236206200597-3195652850700456668?l=www.valuedlessons.com'/></div>Peter Thatcherhttp://www.blogger.com/profile/01092342988993218446noreply@blogger.com0tag:blogger.com,1999:blog-1700157236206200597.post-72040226907486692322008-11-11T11:05:00.000-08:002008-12-09T09:53:20.668-08:00Easy Python Decorators with the decorator decorator<p>
Decorators in python are very powerful, but are often a pain to get
right, especially when you want to pass arguments to the decorator.
Typically, it involves defining a function in a function in a
function, etc, up to 4 layers deep. Can we make it easier? What if
we even made a decorator to make decorators?
<p>
Perhaps we could use it catch and log errors:
<pre class="code">
@decorator
def log_error(func, *args, **kargs):
try:
return func(*args, **kargs)
except Exception, error:
print "error occurred: %r" % error
@log_error
def blowup():
raise Exception("blew up")
blowup() # this gets caught and prints the error
</pre>
<p>
Or maybe we'd like to synchronize a function to make it
thread-safe:
<pre class="code">
@decorator
def synchronized(lock, func, *args, **kargs):
lock.acquire()
try:
return func(self, *args, **kargs)
finally:
lock.release()
missle_lock = thread.RLock()
@synchronized(missle_lock)
def launch_missiles():
pass
</pre>
<p>
Or we could even use arguments to do do some object __init__ trickery
(something I've had to do when working with wxpython):
<pre class="code">
@decorator
def inherits_format(method, *args, **kargs):
self, parent = args[:2]
self.format = parent.format
return method(*args, **kargs)
class Child:
@inherits_format
def __init__(self, parent):
pass
class Parent:
def __init__(self, format):
self.format = format
format = object()
child = Child(Parent(format))
assert child.format is format
</pre>
<p>
If you've never worked python decorators, these are few examples of
how powerful they are. But you'll probably never want to write your
own because it can be a pain. If that's the case, then <b>the decorator
decorator is for you!</b> Here's the code for it. It's a little
tricky, but all you have to do is import it, slap @decorator before your
decorator, make sure you call func(*args, **kargs), and you're all
set. <b>This is as easy as decorators can get.</b>
<p><b>
Update</b>: <a href="http://ndanger.org/blog/">Dave</a> left a comment and notified me that there is another, much more complete, implementation of the same idea at <a href="http://www.phyast.pitt.edu/~micheles/python/documentation.html">http://www.phyast.pitt.edu/~micheles/python/documentation.html</a>. So, if you want a very complete module, go there. If you want something small and simple to cut and paste into your own code, use the following.
<pre class="code">
def decorator(decorator_func):
decorator_expected_arg_count = decorator_func.func_code.co_argcount - 1
if decorator_expected_arg_count:
def decorator_maker(*decorator_args):
assert len(decorator_args) == decorator_expected_arg_count, "%s expects %d args" % (decorator_func.__name__, decorator_expected_arg_count)
def _decorator(func):
assert callable(func), "Decorator not given a function. Did you forget to give %r arguments?" % (decorator_func.__name__)
def decorated_func(*args, **kargs):
full_args = decorator_args + (func,) + args
return decorator_func(*full_args, **kargs)
decorated_func.__name__ = func.__name__
return decorated_func
return _decorator
return decorator_maker
else:
def _decorator(func):
def decorated_func(*args, **kargs):
return decorator_func(func, *args, **kargs)
decorated_func.__name__ = func.__name__
return decorated_func
return _decorator
</pre><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1700157236206200597-7204022690748669232?l=www.valuedlessons.com'/></div>Peter Thatcherhttp://www.blogger.com/profile/01092342988993218446noreply@blogger.com3tag:blogger.com,1999:blog-1700157236206200597.post-26638893667450004712008-10-14T20:30:00.000-07:002008-10-14T20:33:19.242-07:00How to DTrace Python in OSX<p>
<a href="http://en.wikipedia.org/wiki/DTrace">DTrace</a> is an
incredible tool. It basically lets you do profiling of a live
application with no performance penatly. I'm writing a Python that
needed some profiling, and I found the "normal" techniques like the
profile/cProfile module very lacking. Luckily, Mac OSX comes with
DTrace and it even works with Python. The only snag is that it's hard
to find how to use the darn thing. I finally figured it out, so I
figured it pass on the knowledge.
<p>
So, here's how you use dtrace on your python application in Mac OSX:
<ol>
<li>Get <a href="http://opensolaris.org/os/community/dtrace/dtracetoolkit/">DTraceToolkit</a>.</li>
<li>Edit Python/py_cputime.d by replacing "function-entry" with "entry" and "function-return" with "exit".</li>
<li>Call "sudo dtrace -s Python/py_cputime.d"</li>
<li>Let it sit there a while and hit ctrl-c.</li>
<li>Enjoy the results</li>
</ol>
<p>
I can only assume you have to edit the file because of some difference
between Solaris and OSX. You can try files other than py_cputime.d,
but you might have to edit them too. Not all of them work, but most
do.
<p>
The last thing to know is that you have to use the python that comes
with OSX. A custom-built python doesn't seem to work.
<p>
Hope that helps!<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1700157236206200597-2663889366745000471?l=www.valuedlessons.com'/></div>Peter Thatcherhttp://www.blogger.com/profile/01092342988993218446noreply@blogger.com1tag:blogger.com,1999:blog-1700157236206200597.post-18730441104388959602008-10-14T17:09:00.000-07:002008-12-09T09:54:26.651-08:00Python Memory Usage: What values are taking up so much memory?<style type="text/css">
.nobr br { display: none }
</style>
<p>
Python seems to use a lot of memory. So what exactly is the overhead
of each type of value? Short answer:
</p>
<div align="center">
<table border="1" cellpadding="4">
<tr><td>int</td> <td> 24</td></tr>
<tr><td>float</td> <td> 24</td></tr>
<tr><td>tuple</td> <td> 63</td></tr>
<tr><td>list</td> <td>101</td></tr>
<tr><td>dict</td> <td>298</td></tr>
<tr><td>old-style class</td> <td>345</td></tr>
<tr><td>new-style class</td> <td>336</td></tr>
<tr><td>subclassed tuple</td> <td> 79</td></tr>
<tr><td>Record</td> <td> 79</td></tr>
<tr><td>Record with old class mixin</td> <td> 79</td></tr>
<tr><td>Record with new class mixin</td> <td> 79</td></tr>
</table>
Measured in bytes using Python 2.5 in 64-bit Ubuntu Linux
</div>
<p>
I measured these by running a simple program that loaded up 1,000,000 values in a list and then did time.sleep(1000). I ran that for different value types and then ran "top" to see how much memory was being used. I took that value, substracted the memory usage for a list of all the same value (14 bytes each), subtracted the value of a child value (usually an int, 24 bytes each), and then divided by 1,000,000. I'll include the code I ran at the end if you want to cut and paste.
So what lessons do we learn from this?
<p>
<ul>
<li>Python objects are very expensive at over 300 bytes each.</li>
<li>Tuples have 1/5 as much overhead.</li>
<li>Records are almost as good as tuples, even when a mixin is added.</li>
</ul>
<p>
So, <b>if you want to have lots of values in memory without using lots of memory, use <a href="http://www.valuedlessons.com/2007/12/immutable-data-in-python-record-or.html">Record</a></b>.
</p>
<p>
If you want to run the test for yourself, here's the code. Just comment out the "make_val" that you want to test.
</p>
<pre class="code">
import time
from Record import Record
class TupleClass(tuple):
pass
class RecordClass(Record("val")):
pass
class OldClass:
def __init__(self, val):
self.val = val
def method(self):
pass
class NewClass(object):
def __init__(self, val):
self.val = val
def method(self):
pass
class RecordWithOldClass(Record("val"), OldClass):
pass
class RecordWithNewClass(Record("val"), NewClass):
pass
make_val = lambda i : 1 #nothing (base overhead)
#make_val = lambda i : i
#make_val = float
#make_val = lambda i : (i,)
#make_val = lambda i : [i]
#make_val = lambda i : {i:i}
#make_val = TupleClass
#make_val = RecordClass
#make_val = OldClass
#make_val = NewClass
#make_val = RecordWithOldClass
#make_val = RecordWithNewClass
count = 1000000
lst = [make_val(i) for i in xrange(count)]
time.sleep(100000)
</pre><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1700157236206200597-1873044110438895960?l=www.valuedlessons.com'/></div>Peter Thatcherhttp://www.blogger.com/profile/01092342988993218446noreply@blogger.com1tag:blogger.com,1999:blog-1700157236206200597.post-9377274089626625342008-06-27T12:43:00.001-07:002008-10-14T19:51:32.585-07:00My Experience with Message Passing ConcurrencyI'm working on a peer-to-peer file synchronization program. It's <i>really</i> concurrent and distributed, and it's forced me to
learn a thing or two about the concurrency models that we use as programmers. Over the past few years, I've tried both the shared-state model common to C, C++, Java, C#, Python, etc, and the message-passing model unique to Erlang and Scala. In my experience, <b>the message-passing model (aka Actor Model) is far superior to the shared-state model for writing distributed applications</b>. After having used both extensively, I'd go so far as to say that <b>for distributed programming, shared-state concurrency is upside down and backwards</b>.
<p>
Here's my informal proof that message-passing concurrency is necessary in a distributed system:
<p>
<ol>
<li>Since the system in distributed, real shared state is impossible.</li>
<li>The only way for the distributed components of an application to
communicate is by sending a message from one to another.</li>
<li>Everything else is an abstraction on top of message passing.</li>
</ol>
HTTP is an abstraction on top of message-passing. AJAX is an abstraction on top of HTTP on top of message-passing. XML-RPC is an abstraction on top of HTTP on top of message-passing. SOAP is an abstraction on top of an abstraction on top of an abstraction on top of HTTP on top of message-passing. Any RPC mechanism is an abstraction on top of message-passing. SQL queries to a remote database are an abstraction on top of message-passing. CORBA is a nasty, tangled mess on top of a foundation of message-passing.
<p>
See a pattern?
<p>
Not only are all of these abstractions, but they are <i>leaky</i> abstractions. Just about all RPC (Remote Procedure Call) frameworks try to pretend that remote objects or local. But the facade is impossible to keep from leaking. Calls to a remote object might fail or take arbitrarily long. If you want to make the same call to two different remote nodes, those calls must be made synchronously and sequentially; in order to call them in parallel on the remote nodes, they most be called in parallel on the local node. If a call to a remote node triggers a call back to the local node, which may trigger a call to a third node, you end up with a huge spaghetti mess of calls and threads.
<p>
I've been down that road. It wasn't pretty. There is a better way.
<p>
I think AJAX has opened our eyes a bit. It's a lot closer to message-passing than it is to RPC. In a REST architecture, you "send" messages by POSTing to a URL and you "receive" messages by GETing a URL. All messages are clearly local copies that were serialized and deserialized from the remote data. There isn't any leaky abstraction of data or classes pretending to be in two places at once. Data, timeouts, and failures are in your face and you have to deal with them. So, AJAX is a lot less leaky.
<p>
But AJAX is far from perfect. I can't help but think of the Greenspun's Tenth Rule of Programming with a twist on distributed programming: "Any sufficiently complicated distributed platform contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of a message-passing system (Erlang)". Once you get message-passing in your head, you can't help but think of AJAX in that way.
<p>
I've been down a better road. It was much more pleasant.
<p>
About six months ago, I rewrote major portions of our application with message-passing concurrency. I took a long time. It was tricky. It hurt my head. But it worked. It's a success. It's capable of things that the shared-state system could never do.
<p>
Having done it, I can emphatically say that if I could start all over, I would go with message-passing. Most of the work was building the infrastructure and wrapping my head around a new way of thinking. But now that I've done both of those, I've paid the costs and I'm reaping the rewards. We're moving the application in directions that would have been nearly in possible with the old concurrency model.
<p>
<b>In my experience, message-passing concurrency is the best way to write distributed applications.</b> But it isn't an easy road to go down. Support for it just isn't there in most programming languages and environments. So far, Erlang has been the pioneer. I would humbly agree that the way I've implemented message-passing in Python, compared to Erlang, looks like an ad-hoc, bug-ridden half-implementation. But for various reasons, I can't use Erlang. I'm hoping for someone to create Erlang++ or E# or Eython or something that combines the concurrency model of Erlang with a modern programming language. Until then, I'll just keep on cobbling together what I can onto the programming language I happen to be using.
<p>
More on that in another article.<p><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1700157236206200597-937727408962662534?l=www.valuedlessons.com'/></div>Peter Thatcherhttp://www.blogger.com/profile/01092342988993218446noreply@blogger.com7tag:blogger.com,1999:blog-1700157236206200597.post-73461415379297560462008-04-28T15:10:00.000-07:002008-12-09T09:56:01.904-08:00Events in PythonI'm a huge fan of the <a href="http://en.wikipedia.org/wiki/Actor_model"> Actor Model </a>. I think that for most applications, it's the best way to do concurrency. As CPUs get more cores, concurrency becomes more important for programmers, and I think the Actor Model will become more important, too.
<p>
But, sometimes you need something more light-weight for handling "events". For example, imagine you have some code listening for file changes in a particular directory. What you'd like to do is make an "event" that is "fired" whenever a file change is detected. When fired, there may be a "handler" or "listener" which is notified of the event. That "handler" is ultimately just some code which is executed when the event occurs.
<p>
A while ago, <b>I wanted an event system like this for Python</b>. I didn't see anything builtin or any library available, so I decided to write my own. I'd like to share what I created with you.
<p>
But first, I want to follow an example through other programming languages to give you an idea of what I was trying to accomplish and how it compares with what's out there. After that, I'll give you my implemenation in Python. Our example will be listening for changes on the file system. We want to keep the "watcher" code decoupled from the rest of the code, so we use events.
<p>
The best implementation of events that I've used is in C#, so we'll start there. In C# 1.0, our file watcher would looks something like this:
<p>
<pre class="code">
public delegate void FileChangeHandler(string source_path);
class FileWatcher
{
public event FileChangeHandler FileChanged;
public void WatchForChanges()
{
...
if(FileChanged != null)
{
FileChanged(source_path);
}
...
}
}
class FileChangeLogger
{
public void OnFileChanged(string source_path)
{
Console.WriteLine(String.Format("{0} changed.", source_path));
}
}
watcher = FileWatcher();
logger = FileChangeLogger();
watcher.FileChanged += new FileChangeHandler(logger.OnFileChanged);
watcher.WatchForChanges();
</pre>
It's pretty nice. The best part is at the end, where you can write "watcher.FileChange += ...". But, you need to type the completely useless "new FileChangeHandler", and you also need to wrap it all in a separate FileChangeLogger class. Luckily, in C# 2.0, they added <a href="http://msdn2.microsoft.com/en-us/magazine/cc163970.aspx">Anonymous Delegates</a>, which makes this much nicer:
<p>
<pre class="code">
watcher.FileChanged += delegate(string source_path)
{
Console.WriteLine(String.Format("{0} changed.", source_path));
}
</pre>
And in C# 3.0, they've made it even nicer!
<pre class="code">
watcher.FileChanged += (source_path => Console.WriteLine(String.Format("{0} changed.", source_path)));
</pre>
<b>C# 3.0 has an event system that's downright slick, with type-inferencing and everything</b>. I don't think it gets much better than that.
<p>
Actually, there is one thing. If there's no handler registered with the event, calling "FileChanged()" will blow up because it sees FileChanged == null until a handler is registered. This means that you have to write "if(Event != null){Event(...):}" every single time you fire the event. Every single time. If you forget, your code will seem to work fine until there's no handler, in which case it will blow up and you'll smack your forhead because you forgot that you have to repeat that line of code every single time you fire an event. I really mean every single time. It's by far the worst wart on an otherwise elgant system. I have no idea why the designers of C# thought this was a good idea. When would you ever want to fire and event and have it blow up in your face?
<p>
Anyway, let's try a different programming language, perhaps Java. It has the worst implementation of events I've ever seen. Here's our FileWatcher example:
<p>
...
<p>
...
<p>
Ok, nevermind, I don't have the heart. I can imagine the code full of IListeners, and implementations, and keeping an array of them, and iterating over them, etc, and I just can't do it. I got seriously upset at one useless line in C#. In Java, it's at least 10 times worse. If you really have the stomach for it, go look at http://java.sun.com/docs/books/tutorial/uiswing/events/index.html. To me, it appears that <b>in Java, events are one giant hack around the lack of first-class functions.</b> If Java had first-class functions, none of that nonsense would be necessary.
<p>
Now that we've seen a good implementation of events in C# and avoided a bad one in Java, let's make one for Python. I'd rather it be like the C# event system, so let's see what our example would look like:
<p>
<pre class="code">
from Event import Event
class FileWatcher:
def __init__(self):
self.fileChanged = Event()
def watchFiles(self):
...
self.fileChanged(source_path)
...
def log_file_change(source_path):
print "%r changed." % (source_path,)
watcher = FileWatcher()
watcher.fileChanged += log_file_change
</pre>
I think that looks pretty good. So what does the implementation of Event look like?
<p>
<pre class="code">
class Event(IEvent):
def __init__(self):
self.handlers = set()
def handle(self, handler):
self.handler.add(handler)
return self
def unhandle(self, handler):
try:
self.handlers.remove(handler)
except:
raise ValueError("Handler is not handling this event, so cannot unhandle it.")
return self
def fire(self, *args, **kargs):
for handler in self.handlers:
handler(*args, **kargs)
def getHandlerCount(self):
return len(self.handlers)
__iadd__ = handle
__isub__ = unhandle
__call__ = fire
__len__ = getHandlerCount
</pre>
Wow. That was pretty short. Actually, this is one of the reasons I love Python. If the language doesn't have a feature, we can probably add it. <b>We just added one of C#'s best features to Python in 26 lines of code</b>. More importantly, we now have a nice, light-weight, easy-to-use event system for Python.
<p>
For all of you how like full examples that you can cut and paste, here is one that you can run. Enjoy!
<p>
<pre class="code">
class Event:
def __init__(self):
self.handlers = set()
def handle(self, handler):
self.handlers.add(handler)
return self
def unhandle(self, handler):
try:
self.handlers.remove(handler)
except:
raise ValueError("Handler is not handling this event, so cannot unhandle it.")
return self
def fire(self, *args, **kargs):
for handler in self.handlers:
handler(*args, **kargs)
def getHandlerCount(self):
return len(self.handlers)
__iadd__ = handle
__isub__ = unhandle
__call__ = fire
__len__ = getHandlerCount
class MockFileWatcher:
def __init__(self):
self.fileChanged = Event()
def watchFiles(self):
source_path = "foo"
self.fileChanged(source_path)
def log_file_change(source_path):
print "%r changed." % (source_path,)
def log_file_change2(source_path):
print "%r changed!" % (source_path,)
watcher = MockFileWatcher()
watcher.fileChanged += log_file_change2
watcher.fileChanged += log_file_change
watcher.fileChanged -= log_file_change2
watcher.watchFiles()
</pre><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1700157236206200597-7346141537929756046?l=www.valuedlessons.com'/></div>Peter Thatcherhttp://www.blogger.com/profile/01092342988993218446noreply@blogger.com8tag:blogger.com,1999:blog-1700157236206200597.post-30579403945616660402008-04-01T21:37:00.000-07:002008-12-09T09:56:53.451-08:00You Could Have Invented Monadic ParsingEven though Pysec (and functional programming in python in general) <a href="http://www.valuedlessons.com/2008/03/why-are-my-monads-so-slow.html">turned out to be 2-3x slower</a>, monadic parsing is still great for certain tasks. After I wrote about Pysec <a href="http://www.valuedlessons.com/2008/02/pysec-monadic-combinatoric-parsing-in.html">the first time</a>, I received comments regarding <a href="http://pyparsing.wikispaces.com/">pyparsing</a>, which is a great parsing library that looks very similar. Today, I'd like to show you a very simple example where monadic parsing really shines above all other parsing techniques, including pyparsing. This is a real-world need that I had, not just an academic thought-experiment.
<p>
The situation is that I'm writing a peer-to-peer file synchronizer and I need to serialize binary data in an efficient way. I've choosen to use what I call "sized binary", which is basically <a href="http://en.wikipedia.org/wiki/Netstrings">netstrings</a> without the comma or <a href="http://en.wikipedia.org/wiki/Bencode">bencode's byte string</a>. The byte-string "ABC", for example, would be encoded as "3:ABC", or "Hello World!" would be "12:Hello World!". This is very efficient because "ABC" or "Hello World!" can be <b>any</b> bytes \x00-\xFF, without any sort of encoding or decoding like <a href="http://en.wikipedia.org/wiki/Uuencode">uuencode</a>.
<p>
Stop and think for a minute how you would define a grammer to parse "3:ABC". It's so simple, right? Well, try and write a grammar for it. Try using BNF, EBNF, yacc/bison, pyparsing, SimpleParse, or any parsing libarary you know of, in any programming language. If you manage, please write me an email and let me know how you did it, because I'm not so sure it's even possible. Maybe it is in the <a href="http://www.ibm.com/developerworks/linux/library/l-cpregex.html">Perl6 grammars</a>; they're throwing the kitchen sink into that thing :). The best I've seen is that the creator of pyparsing hacked a grammar object to change how it parses the "ABC" part after a different object parsed the "3" part. This basically uses the grammar object as a global variable. It works in a single-core world, but it won't work in the many-core world we're headed for.
<p>
But now you're getting bored with me because it's silly ringing my hands over such a simple format. I bet you could parse this format in just 4 lines of code:
<p>
<pre class="code">
def parse_sized_binary(stream):
size_bytes, stream = stream.readUntil(":")
size = int(size_bytes)
bytes, stream = stream.read(size)
return bytes, stream
</pre>
You're right, it's really simple, at least until you have to embed this little "sized binary language" inside of a bigger language. Perhaps, like me, you need to embed binary data inside of a bigger data structure, and so you need to define a serialization for things like ints, floats, lists, and hash tables. For that stuff, you can define a grammer for a language like <a href="http://en.wikipedia.org/wiki/JSON">JSON</a> or <a href="http://en.wikipedia.org/wiki/YAML">YAML</a> and hand it over to a grammar-based parser library. But now you need a way to tell the library, "OK, when you see my binary data, hand control over to my four lines of code, and then I'll hand control back". You might even say "You know, why don't you just let me hand you any function of the form stream -> (value, stream) and let me control the parsing for a while".
<p>
<b>Congratulations, you just invented monadic parsing</b>. The function you wrote, parse_sized_binary, of the form stream -> (bytes, stream), is a Parser Monad. Monadic parsing is just combining lots of these little parsers together. And so, it's incredibly easy to insert arbitrary parsing code, like parse_sized_binary, into any parser.
<p>
As a complete example, using Pysec, here's a full parser for a language just like <a href="http://en.wikipedia.org/wiki/Bencode">bencode</a> but with support for floats and unicode. Notice that our parse_size_binary is renamed to bencode_sized and is given a more functional definition.
<p>
<pre class="code">
from Pysec import Stub, until, lift, read, many_until, branch
bencode_any = Stub()
bencode_sized = until(":") >> lift(int) >> read
bencode_unsized = until("e")
bencode_list = many_until(bencode_any, "e")
bencode_any.set(branch({"s" : bencode_sized,
"u" : bencode_sized >> lift(lambda bytes : bytes.decode("utf8")),
"i" : bencode_unsized >> lift(int),
"f" : bencode_unsized >> lift(float),
"l" : bencode_list,
"d" : bencode_list >> lift(dict)}))
</pre>
I think this is sort of like <a href="http://en.wikipedia.org/wiki/Greenspun's_Tenth_Rule">Greenspun's Tenth Rule</a>. Any implementation of netstrings/sized-binary-data probably has an ad-hoc implementation of monadic parsing in it. Even my own non-monadic version of this parser (which I use for performance reasons) has an ad-hoc implementation of monadic parsing. I'm convinced that if you write a parser for a similar format, you'll implement/invent monadic parsing as well, even if you don't know it.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1700157236206200597-3057940394561666040?l=www.valuedlessons.com'/></div>Peter Thatcherhttp://www.blogger.com/profile/01092342988993218446noreply@blogger.com5tag:blogger.com,1999:blog-1700157236206200597.post-25567463109265830472008-03-19T14:10:00.000-07:002008-10-14T19:57:58.801-07:00Why are my monads so slow?If you've read the last few posts, you know that I choose to use monadic parsing for some application code I am writing. At an abstract level, it turned out beautifully. I was able to write the parser very elgantly and compactly despite the serialization format having some unique requirements. But at a practical level, it failed pretty miserably. <b>It was horribly slow</b>. With a deadline looming, I had to abandon my monadic parser and go back to using an older serialization format. But not content to give up so easily, I decided to figure out <b>why are my monads so slow?</b> I hope that my investigation may be of help to you if you choose to use similar techniques in your code.
<p>
First, I tried running a profiler like <a href="http://docs.python.org/lib/module-profile.html">cProfile</a>. While such profilers normally work quite well, they don't work so well with lots of first-class functions. And since monadic code is full of first-class functions, the profiler didn't give much valuable information.
<p>
So, I had to do optimizing the "old fashioned" way. I wrote a benchmark and ran both of my serializers against it. To start, I had two data points: one in traditional imperitave code and one in fully monadic code, and this is what I got:
<p>
<pre>
imperative: 0.465 seconds
monadic: 3.893 seconds
</pre>
Monadic code was 8x slower! But why? To narrow it down, I added some more data points by implementing the serializer in gradually more monadic ways. For example, I wrote a serializer in a functional/immutable style that passes around a "stream" rather than a string, and then another that passed around "state" in a very monadic way, but withoug using the monad class that I used in Pysec. Then I got this:
<p>
<pre>
imperative: 0.465 seconds
functional with stream: 1.018 seconds
almost monadic with state: 2.450 seconds
monadic: 3.893 seconds
</pre>
This gave some answers. One answer it gave was that passing around an immutable "stream" object is twice as expensive as merely passing around a string. Also, passing around a state object on top of that is even more expensive. With these clues, I was able to optimize in certain ways and reduce it to this:
<p>
<pre>
imperative: 0.465 seconds
functional with stream: 1.018 seconds
almost monadic with state: 1.098 seconds
monadic: 1.283 seconds
</pre>
<p>
That's a nice 3x improvement. What things were I able to do? By far the biggest cost that I was able to eliminate was object creation. I flattened two layers of abstraction into one, and thus split the number of objects created in half. The second thing I did was I created a "read until" operation that could be used more efficiently in the "grammar" of the parser. This is a form of pushing performance-critical code down the layers of abstraction. Finally, I didn't used decorators, for some often-called functions.
<p>
In the end, <b>it looks like monadic code is about 2-3x slower in Python</b>. Almost all of that is actually the result of just doing things in a functional/immutable way. In particular, creating data structures appears to be the main bottleneck. In functional languagues, these types of operations are very common and optimized heavily, and so are typically very fast. It looks like it's not the same in Python. Just like you have to avoid recursion because there's no tail recursion, it looks like you have to avoid a functional/immutable coding style if you care about performance because object creation is so slow. On the other hand, if you don't mind the peformance hit, it makes the code much more elegant, just like recursion usually does.
<p>
One thing of interest is that there is essentially no performance penatly for using monads over using a functional/immutable code style. The 20% penatly seen between "almost monadic" and "monadic" is only because I'm wrapping the monad in a Parser class which allows nice operator overloading.
<p>
Here's a summary of what you can do to speed up any functional/immutable-style code, including monadic code when writing in Python:
<ul>
<li>Make object creation as fast as possible. Don't do anything fancy in __init__.</li>
<li>Use as few layers of abstraction as possible, especially when there is an object created in each layer.</li>
<li>Push common or expensive operations down the layers of abstaction, especially if it avoids creating objects.</li>
<li>Avoid using decorators for heavily used functions.</li>
<li>Don't use wrapper classes if you don't have to.</li>
</ul>
<p>
As a final thought, I'd like to mention that while there's currently a substantial performance penalty for using immutable data structures, that style is going to become increasingly important as we enter the many-core era. No matter what style of concurrency you like, immutable data is always easier to work with when concurrency is involved. Concurrency and mutable data are just a bad combo. I think that it's going to be very important for language designers to address this when working on the peformance of their languages. I certainly hope future versions Python are much faster with immutable data. If they are, then the peformance penatly of using Monads will almost disappear.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1700157236206200597-2556746310926583047?l=www.valuedlessons.com'/></div>Peter Thatcherhttp://www.blogger.com/profile/01092342988993218446noreply@blogger.com6tag:blogger.com,1999:blog-1700157236206200597.post-3911760103285407252008-01-21T11:30:00.000-08:002008-10-14T20:22:38.629-07:00Garlic Programmers for Silver Code?<p>
I just read Larry O'Brien's article about <a href="http://www.knowing.net/PermaLink,guid,fde0f610-3773-47b8-9be6-d6e5a8a76858.aspx"> there being no silver programmers</a>. He makes the case that although there may be great variance in programmer productivity, "the significant thing is not that some professional programmers are awesome, it's that some professional programmers suck". I admire Larry O'Brien greatly, and I think there is much wisdom in what he says here.
<p>
But I think he's missing something.
<p>
I have noticed a trend in my own productivity: <b>my productivity varies greatly with the code base I'm working on</b>. This variance may be greater than that between individual programmers. While many are rightly concerned with individual and team productivity, no ever talks about code base productivity. If we do, where does it lead us?
<p>
A few years ago, I was working for a small medical software company. One project was a new venture with a new code base. Though new, the code was already difficult to work with. Its design forced the developer to duplicate work. In my first meeting about the project, I was told we had to do lots of "fat fingering". I didn't even know what that meant. It turned out to mean typing almost the exact same thing over and over. I was horrified and ultimately wrote a code generation tool to automate the work. Though creating and maintaining it was extra work in itself, without it, <b>doing many common tasks in this code easily took 3x longer than necessary</b>.
<p>
After that project was "shelved", I was moved to the company's main product, which was older and had a much more "mature" code base. Actually, half was new-style (similar to what I had been working on) and half was old-style. The old-style code was even worse to work with. It was so bad that I was almost giddy whenever I could work on the new-style code. I'm sure you know what kind of code I'm talking about. <b>Doing anything in this code easily took 3x even longer</b>.
<p>
Combined, that's 9x slower. That seems too high. Is it even possible? How can a code base reduce productivity so much? Here are a few possibilities I thought of; I'm sure you can think of more:
<ul>
<li>Bad code leads to more code, and more code is slower to work with</li>
<li>Bad code is hard to understand</li>
<li>Bad code is dangerous to change</li>
<li>Bad code leads duplication of effort</li>
<li>Bad code leads to a slow change-compile-test cycle, leading to wasted time and loss of focus</li>
<li>Bad code from third party libraries can drive you crazy by crashing, malfunctioning randomly, or having little documentation</li>
<li>Bad code saps energy, interest, and morale</li>
</ul>
<p>
<b>A bad code base makes any programmer less productive.</b> Have you ever noticed that when you work from scratch with no code base to weigh you down, you can be much more productive? Imagine that as your baseline productivity. Now think of how long it takes you to accomplish something similar on a project that you're working on right now. How does it compare? I bet you're slower on the existing project.
<p>
But wait just a minute. Imagine if my list of the effects of bad code were reversed. What if, instead, it read:
<ul>
<li>Good code leads to less code, and less code is easier to work with</li>
<li>Good code is easy to understand</li>
<li>Good code is safe to change</li>
<li>Good code avoids duplication of effort</li>
<li>Good code leads to a quick change-compile-test cycle</li>
<li>Good code uses libraries which remove the need for "dirty work" and let you focus on the important problems</li>
<li>Good code is fun to work with!</li>
</ul>
<p>
<b>A good code base makes any programmer more productive.</b> Have you ever worked on a project that was so wonderful that you could crank out magnificent results almost effortlessly? Perhaps <a href="http://www.gnu.org/software/emacs/">it has a great DSL embedded in it</a> or has <a href="http://www.loudthinking.com/about.html">powerful infrastructure</a>.
<p>
Even with conservative estimates, <b>if bad code slows us down 5x and good code speeds us up just 2x faster, that's easily a 10x difference. </b>
<p>
Maybe silver programmers don't exist. But is there "silver code"? What if there are programmers who are more adept at writing silver code? If they can make a code base that's just 2x easier to work with long-term, then that makes everyone that works on that code 2x more productive, possibly forever. If such programmers exist, then they must be extremely valuable, not so much because of their own productivity, but because of the effect they will have on the productivity of others in the future. Perhaps rather than "silver bullet" programmers that kill werewolves, these are just "garlic programmers" that ward of vampires.
<p>
Given that productivity can vary so much with quality of existing code, <b>perhaps its worth looking for "garlic programmers" who will write code that will keep the rest us of productive long-term.</b> I leave it as an exercise for the reader to figure out how to measure a programmer's garlic-ness :).<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1700157236206200597-391176010328540725?l=www.valuedlessons.com'/></div>Peter Thatcherhttp://www.blogger.com/profile/01092342988993218446noreply@blogger.com7tag:blogger.com,1999:blog-1700157236206200597.post-33503541366842393292008-01-09T10:34:00.000-08:002008-12-09T09:59:59.612-08:00Monads in Ruby (with nice syntax!)<p>
My last <a href="http://www.valuedlessons.com/2008/01/monads-in-python-with-nice-syntax.html">article</a> was about how you can do monads in python with nice syntax. Now I'd like to present nice monad syntax in Ruby. I'm not explaining what monads are or how you can use them. For now, I'm expecting you to be familiar with monads. If you aren't, go read a <a href="http://sigfpe.blogspot.com/2006/08/you-could-have-invented-monads-and.html">nice article</a> on them.
<p>
Imagine you have a Monad base class like this:
<pre class="code">
class Monad
def bind(func)
raise "not implemented"
end
def self.unit(val)
raise "not implemented"
end
# bind to block
def bindb(&func)
bind(func)
end
end
</pre>
<p>
As an aside, Please excuse there being a "bind" and a "bindb". It's silly that Ruby differentiates between a block and Proc like that. I
also think it's silly that I have to keep typing "lambda {|val| func(val)}" to turn a method into a Proc and "proc.call(val)" to call a Proc. In python, methods, functions, and lambdas are all the same thing, and it's a lot easier to code with. In this sense, python is way better. But Ruby has really slick syntax for passing in the block, and python's lambda restrictions are annoying. Why can't we have the best of both? I guess then I'd need Scala or Perl6. End aside.
<p>
Let's implement the Either monad, which I call it Failable:
<pre class="code">
class Failable < Monad
def initialize(value, success)
@value = value
@success = success
end
def bind(bindee)
if @success
bindee.call(@value)
else
self
end
end
def self.unit(val)
self.new(val, true)
end
def to_s
if @success
"Success(#{@value})"
else
"Failure(#{@value})"
end
end
end
def success(val)
Failable.new(val, true)
end
def failure(val)
Failable.new(val, false)
end
</pre>
<p>
Now we can write some code that safely handles divide by zero without using exceptions (actually, exceptions essentiatlly are the Either monad, but never mind that for now):
<pre class="code">
def fdiv(a, b)
if b == 0
failure("cannot divide by zero")
else
success(a / b)
end
end
def fdiv_with_binding(first_divisor)
fdiv(2.0, first_divisor).bindb do |val1|
fdiv(3.0, 1.0) .bindb do |val2|
fdiv(val1, val2) .bindb do |val3|
success(val3)
end
end
end
end
puts fdiv_with_binding(1.0)
puts fdiv_with_binding(0.0)
</pre>
<p>
Which prints:
<pre class="code">
Success(0.666666666666667)
Failure(cannot divide by zero)
</pre>
<p>
But it's not very pretty. Luckily, ruby has callcc, which makes it very easy to define a function which I'll call "rbind" which can do this:
<pre class="code">
def fdiv_with_rbinding(first_divisor)
with_monad Failable do
val1 = rbind fdiv(2.0, first_divisor)
val2 = rbind fdiv(3.0, 1.0)
val3 = rbind fdiv(val1, val2)
val3
end
end
def with_monad(monad_class, & block)
begin
val = block.call()
if val.class == monad_class
val
else
monad_class.unit(val)
end
rescue MonadEscape => escape
escape.monad
end
end
def rbind(monad)
begin
checked_callcc {|cc| monad.bind(cc)}
rescue ContinuationUnused => unused
raise MonadEscape.new(unused.result)
end
end
class MonadEscape < RuntimeError
attr :monad
def initialize(monad)
@monad = monad
end
end
def checked_callcc(&with_cc)
callcc do |cont|
val = with_cc.call(lambda {|val| cont.call(val)})
raise ContinuationUnused.new(val) if cont
val
end
end
class ContinuationUnused < RuntimeError
attr :result
def initialize(result)
@result = result
end
end
</pre>
<p>
Ruby's block syntax makes it very nice to say "with_monad Monad do", which I like. But I don't really like seeing "= rbind". I'd really like it if we could override "<<" and read that as "=<<", but I think we're stuck with unary operators ("+", "-", "~", all of which look bad here) or words. At least we can choose our name, unlike in python where we have to use "yield". Does anyone have any idea for a better name?
<p>
Maybe "xx" would work:
<pre class="code">
def fdiv_with_rbinding(first_divisor)
with_monad Failable do
val1 =xx fdiv(2.0, first_divisor)
val2 =xx fdiv(3.0, 1.0)
val3 =xx fdiv(val1, val2)
val3
end
end
</pre>
<p>
So there you have it. You can have nice monad syntax in Ruby using callcc. Unfortunately, I've read that not all implementations of Ruby have continuatoins, which means JRuby and IronRuby may be left out in the cold. Keep in mind that Ruby's "yield" syntax is NOT a true generator, which means that we can't use the same trick that we used in python. It's callcc or nothing.
<p>
Here's the complete code in case you want to cut and paste it:
<pre class="code">
############# Monad Base ##############
class Monad
def bind(func)
raise "not implemented"
end
def self.unit(val)
raise "not implemented"
end
# bind to block
def bindb(&func)
bind(func)
end
end
def with_monad(monad_class, &block)
begin
val = block.call()
if val.class == monad_class
val
else
monad_class.unit(val)
end
rescue MonadEscape => escape
escape.monad
end
end
# "reverse" bind, or bind to the return value, or bind to the continuation
def rbind(monad)
begin
mycallcc {|cc| monad.bind(cc)}
rescue ContinuationUnused => unused
raise MonadEscape.new(unused.result)
end
end
class MonadEscape < RuntimeError
attr :monad
def initialize(monad)
@monad = monad
end
end
def mycallcc(&with_cc)
used = false
val = callcc do |cc|
fake_cc = lambda do |val|
used = true
cc.call(val)
end
with_cc.call(fake_cc)
end
raise ContinuationUnused.new(val) unless used
val
end
class ContinuationUnused < RuntimeError
attr :result
def initialize(result)
@result = result
end
end
############# Failable Monad ##################
class Failable < Monad
def initialize(value, success)
@value = value
@success = success
end
def bind(bindee)
if @success
bindee.call(@value)
else
self
end
end
def self.unit(val)
self.new(val, true)
end
def to_s
if @success
"Success(#{@value})"
else
"Failure(#{@value})"
end
end
end
def success(val)
Failable.new(val, true)
end
def failure(val)
Failable.new(val, false)
end
######## Failable Monad Example ##########
def fdiv(a, b)
if b == 0
failure("cannot divide by zero")
else
success(a / b)
end
end
def with_failable_binding(first_divisor)
fdiv(2.0, first_divisor).bindb { |val1|
fdiv(3.0, 1.0) .bindb { |val2|
fdiv(val1, val2) .bindb { |val3|
success(val3)
}
}
}
end
def xx(monad)
rbind(monad)
end
def with_failable_rbinding(first_divisor)
with_monad Failable do
val1 =xx fdiv(2.0, first_divisor)
val2 =xx fdiv(3.0, 1.0)
val3 =xx fdiv(val1, val2)
val3
end
end
puts with_failable_binding(1.0)
puts with_failable_binding(0.0)
puts with_failable_rbinding(1.0)
puts with_failable_rbinding(0.0)
</pre><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1700157236206200597-3350354136684239329?l=www.valuedlessons.com'/></div>Peter Thatcherhttp://www.blogger.com/profile/01092342988993218446noreply@blogger.com9tag:blogger.com,1999:blog-1700157236206200597.post-67994722606644094622008-01-07T11:21:00.000-08:002008-12-09T10:00:50.432-08:00Monads in Python (with nice syntax!)<p>
<span style="font-weight: bold;">Update:</span> I've been informed that I didn't clearly explain what monads are and what the code examples are supposed to do. Sorry. I guess I assumed too much preexposure to monads. If you're no familiar with them, there are already so many tutorials on monads that I'll simply direct you to two of my favorites: <a href="http://www.loria.fr/%7Ekow/monads/index.html">spacesuits</a> and <a href="http://en.wikipedia.org/wiki/Monads_in_functional_programming">wikipedia</a>. I've also added more explanations of what the code snippets are supposed to do. I hope that helps.
<p>
<span style="font-weight: bold;">Update 2: </span>By popular demand, I'm including some code that you can actually run :). See the bottom of the article.
<p>
Recently, I used monads in production code for a soon-to-be-publically-released application. Although many think they are strange, estoric, and perhaps useless, monads were the best way to solve the problem. My code is not in Haskell; it's in Python. I'm not doing anything wierd like IO in a purely functional way; I'm just parsing a file.
<p>
The crazy, unexpected conclusion I came to is that <span style="font-weight: bold;">you can and should use monads in your code in almost any programming language. </span>There are two parts to this: "can" and "should". I think I'll save "should" for another article. Right now, I'm excited to show you how you can.
<p>
As a preview for "should", please consider that you may be using monads already without knowing it. <a href="http://www.google.com/url?sa=t&ct=res&cd=1&url=http%3A%2F%2Fresearch.microsoft.com%2F%7Eemeijer%2FPapers%2FLINQ20.pdf&ei=RYiCR8DJO5HEgwPei8GCBw&usg=AFQjCNE8AG_VXzu_tHYeXDy7Jw85eWUcig&sig2=SLBcZbWmgDnnAwaEygDrCA">LINQ in C# is a monad (pdf)</a>, so if you've ever used it, you've used a monad. The C# guys used monads for queries for the same reason I'm using for them for parsing: they're the right tool for the job. But unlike them, I can't change the syntax of the programming language.
<p>
The biggest challange with using monads in a "normal" programming language is that monads involve lots of closures. This is exactly the same problem you run into with <a href="http://blogs.msdn.com/wesdyer/archive/2007/12/22/continuation-passing-style.aspx">CPS</a>, which isn't surprising since a monad's "bind" operator is CPS and since continuations can be implemented with monads. By the way, if your programming laungage doesn't have closures (meaning you are stuck programming in C, C++, or Java), then monads are probably out of the question. Assuming you have closures and use them with monads directly, you end up with code like the following. It's python using the <a href="http://en.wikipedia.org/wiki/Monads_in_functional_programming#Maybe_monad">Maybe monad</a> to handle divide by zero errors. I'm using ">>" (__rshift__) overloaded to mean "bind".
<pre class="code">
def mdiv(a, b):
if b == 0:
return Nothing
else:
return Something(a / b)
def with_maybe():
return \
mdiv(2.0, 2.0) >> (lambda val1 :
mdiv(3.0, 0.0) >> (lambda val2 :
mdiv(val1, val2) >> (lambda val3 :
Something(val3))))
</pre>
<p>
That's not very pretty. We need a way to clean that up. How we can do so depends on the programming language. Haskell has "do" syntax built-in, which makes monadic code look like an impertive language or even a list comprehension. Ruby and Scheme have call/cc which makes it trivial to wrap a bind call with a continuation to make any monadic code look "normal". C# has LINQ, which is practically Haskell's do notation with funny names.
<p>
But I'm using python. What does python have? Luckily for me, in python 2.5 they added bidirectional generators, and I found a way to use them to make something like "do" notation. Now we can write the above code like this (@do and mreturn are defined later):
<pre class="code">
@do(Maybe)
def with_maybe(first_divisor):
val1 = yield mdiv(2.0, 2.0)
val2 = yield mdiv(3.0, 0.0)
val3 = yield mdiv(val1, val2)
mreturn(val3)
</pre>
<p>
I even copied the names "do" and "return" from Haskell, although I had to spell "return" as "mreturn". All you really have to remember is that "yield" means "bind" and that the end result is a monad. There are limitations to this technique, but it's working very well for me so far. I've implemented the Maybe monad, the Error monad, the StateChanger monad, and the Continuation monad (which will require another article to explain). I particularly like the continuation monad because it allows me to write callcc, which lets me do threadless actors (message passing) in python:
<pre class="code">
from collections import deque
class Mailbox:
def __init__(self):
self.messages = deque()
self.handlers = deque()
def send(self, message):
if self.handlers:
handler = self.handlers.popleft()
handler(message)()
else:
self.messages.append(message)
def receive(self):
return callcc(self.react)
@do(ContinuationMonad)
def react(self, handler):
if self.messages:
message = self.messages.popleft()
yield handler(message)
else:
self.handlers.append(handler)
done(ContinuationMonad.zero())
@do(ContinuationMonad)
def insert(mb, values):
for val in values:
mb.send(val)
@do(ContinuationMonad)
def multiply(mbin, mbout, factor):
while True:
val = (yield mbin.receive())
mbout.send(val * factor)
@do(ContinuationMonad)
def print_all(mb):
while True:
print (yield mb.receive())
original = Mailbox()
multiplied = Mailbox()
print_all(multiplied)()
multiply(original, multiplied, 2)()
insert(original, [1, 2, 3])()
</pre>
<p>
A few months ago, I wrote a similar implementation of threadless actors in python. It used generators in a similar way, but it was 10 times as much code. I was shocked at how short this implementation ended up being. You might think that it's because the continuation monad implementation is big. Nope. It's just as short (Monad defined later, and Record defined <a href="http://www.valuedlessons.com/2007/12/immutable-data-in-python-record-or.html">here</a>):
<pre class="code">
class ContinuationMonad(Record("run"), Monad):
def __call__(self, cont = lambda a : a):
return self.run(cont)
def bind(self, bindee):
return ContinuationMonad(lambda cont : self.run(lambda val : bindee(val).run(cont)))
@classmethod
def unit(cls, val):
return cls(lambda cont : cont(val))
@classmethod
def zero(cls):
return cls(lambda cont : None)
def callcc(usecc):
return ContinuationMonad(lambda cont : usecc(lambda val : ContinuationMonad(lambda _ : cont(val))).run(cont))
</pre>So, you can use monads with elegant syntax in any language that has closures and any of the following:
<ul>
<li>do syntax (Haskell, C#)</li>
<li>call/cc (Scheme, Ruby)</li>
<li>bidirectional generators (Python 2.5, a future JavaScript?)</li>
<li>coroutines (Lua, Io)</li>
</ul>
<p>
The only think I haven't shown you is the implementation of Monad, @do, mreturn, and done. It has a few nasty details related to using generators and decorators in python, but here's the gist of it:
<pre class="code">
class Monad:
def bind(self, func):
raise NotImplementedError
def __rshift__(self, bindee):
return self.bind(bindee)
def __add__(self, bindee_without_arg):
return self.bind(lambda _ : bindee_without_arg())
@decorator_with_args
def do(func, func_args, func_kargs, Monad):
itr = func(*func_args, **func_kargs)
def send(val):
try:
monad = itr.send(val)
return monad.bind(send)
except MonadReturn, ret:
return Monad.unit(ret.value)
except Done, done:
return done.monad
return send(None)
def mreturn(val):
raise MonadReturn(val)
def done(val):
raise Done(val)
</pre>
<p>That's it. If you've made it all the way to the end of this long article, I hope you've found inspiration for using monads in your own applications, especially if you are coding in python. If so, here's some code that you can run:
<pre class="code">
import types
###### Base Monad and @do syntax#########
class Monad:
def bind(self, func):
raise NotImplementedError
def __rshift__(self, bindee):
return self.bind(bindee)
def __add__(self, bindee_without_arg):
return self.bind(lambda _ : bindee_without_arg())
def make_decorator(func, *dec_args):
def decorator(undecorated):
def decorated(*args, **kargs):
return func(undecorated, args, kargs, *dec_args)
decorated.__name__ = undecorated.__name__
return decorated
decorator.__name__ = func.__name__
return decorator
def make_decorator_with_args(func):
def decorator_with_args(*dec_args):
return make_decorator(func, *dec_args)
return decorator_with_args
decorator = make_decorator
decorator_with_args = make_decorator_with_args
@decorator_with_args
def do(func, func_args, func_kargs, Monad):
@handle_monadic_throws(Monad)
def run_maybe_iterator():
itr = func(*func_args, **func_kargs)
if isinstance(itr, types.GeneratorType):
@handle_monadic_throws(Monad)
def send(val):
try:
# here's the real magic
monad = itr.send(val)
return monad.bind(send)
except StopIteration:
return Monad.unit(None)
return send(None)
else:
#not really a generator
if itr is None:
return Monad.unit(None)
else:
return itr
return run_maybe_iterator()
@decorator_with_args
def handle_monadic_throws(func, func_args, func_kargs, Monad):
try:
return func(*func_args, **func_kargs)
except MonadReturn, ret:
return Monad.unit(ret.value)
except Done, done:
assert isinstance(done.monad, Monad)
return done.monad
class MonadReturn(Exception):
def __init__(self, value):
self.value = value
Exception.__init__(self, value)
class Done(Exception):
def __init__(self, monad):
self.monad = monad
Exception.__init__(self, monad)
def mreturn(val):
raise MonadReturn(val)
def done(val):
raise Done(val)
def fid(val):
return val
##### Failable Monad ######
class Failable(Monad):
def __init__(self, value, success):
self.value = value
self.success = success
def __repr__(self):
if self.success:
return "Success(%r)" % (self.value,)
else:
return "Failure(%r)" % (self.value,)
def bind(self, bindee):
if self.success:
return bindee(self.value)
else:
return self
@classmethod
def unit(cls, val):
return cls(val, True)
class Success(Failable):
def __init__(self, value):
Failable.__init__(self, value, True)
class Failure(Failable):
def __init__(self, value):
Failable.__init__(self, value, False)
def failable_monad_examle():
def fdiv(a, b):
if b == 0:
return Failure("cannot divide by zero")
else:
return Success(a / b)
@do(Failable)
def with_failable(first_divisor):
val1 = yield fdiv(2.0, first_divisor)
val2 = yield fdiv(3.0, 1.0)
val3 = yield fdiv(val1, val2)
mreturn(val3)
print with_failable(0.0)
print with_failable(1.0)
###### StateChanger Monad #########
class StateChanger(Monad):
def __init__(self, run):
self.run = run
def bind(self, bindee):
run0 = self.run
def run1(state0):
(result, state1) = run0(state0)
return bindee(result).run(state1)
return StateChanger(run1)
@classmethod
def unit(cls, val):
return cls(lambda state : (val, state))
def get_state(view = fid):
return change_state(fid, view)
def change_state(changer, view = fid):
def make_new_state(old_state):
new_state = changer(old_state)
viewed_state = view(old_state)
return (viewed_state, new_state)
return StateChanger(make_new_state)
def state_changer_monad_example():
@do(StateChanger)
def dict_state_copy(key1, key2):
val = yield dict_state_get(key1)
yield dict_state_set(key2, val)
mreturn(val)
@do(StateChanger)
def dict_state_get(key, default = None):
dct = yield get_state()
val = dct.get(key, default)
mreturn(val)
@do(StateChanger)
def dict_state_set(key, val):
def dict_set(dct, key, val):
dct[key] = val
return dct
new_state = yield change_state(lambda dct: dict_set(dct, key, val))
mreturn(val)
@do(StateChanger)
def with_dict_state():
val2 = yield dict_state_set("a", 2)
yield dict_state_copy("a", "b")
state = yield get_state()
mreturn(val2)
print with_dict_state().run({}) # (2, {"a" : 2, "b" : 2})
###### Continuation Monad #########
class ContinuationMonad(Monad):
def __init__(self, run):
self.run = run
def __call__(self, cont = fid):
return self.run(cont)
def bind(self, bindee):
return ContinuationMonad(lambda cont : self.run(lambda val : bindee(val).run(cont)))
@classmethod
def unit(cls, val):
return cls(lambda cont : cont(val))
@classmethod
def zero(cls):
return cls(lambda cont : None)
def callcc(usecc):
return ContinuationMonad(lambda cont : usecc(lambda val : ContinuationMonad(lambda _ : cont(val))).run(cont))
def continuation_monad_example():
from collections import deque
class Mailbox:
def __init__(self):
self.messages = deque()
self.handlers = deque()
def send(self, message):
if self.handlers:
handler = self.handlers.popleft()
handler(message)()
else:
self.messages.append(message)
def receive(self):
return callcc(self.react)
@do(ContinuationMonad)
def react(self, handler):
if self.messages:
message = self.messages.popleft()
yield handler(message)
else:
self.handlers.append(handler)
done(ContinuationMonad.zero())
@do(ContinuationMonad)
def insert(mb, values):
for val in values:
mb.send(val)
@do(ContinuationMonad)
def multiply(mbin, mbout, factor):
while True:
val = (yield mbin.receive())
mbout.send(val * factor)
@do(ContinuationMonad)
def print_all(mb):
while True:
print (yield mb.receive())
original = Mailbox()
multiplied = Mailbox()
print_all(multiplied)()
multiply(original, multiplied, 2)()
insert(original, [1, 2, 3])()
if __name__ == "__main__":
failable_monad_examle()
state_changer_monad_example()
continuation_monad_example()
</pre><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1700157236206200597-6799472260664409462?l=www.valuedlessons.com'/></div>Peter Thatcherhttp://www.blogger.com/profile/01092342988993218446noreply@blogger.com13tag:blogger.com,1999:blog-1700157236206200597.post-37590622156809561562007-12-31T14:29:00.000-08:002008-12-09T10:01:33.147-08:00Immutable Data in Python (Record or Named Tuple)<p>
A valued lesson that I have learned the hard way is that mutable data can get nasty, and that it's really nice to have immutable data structures. I've been writing a lot of python recently, and after a while I realized that I was writing a lof of this:
<pre class="code">
class SomeDataStructure:
def __init__(self, arg1, arg2):
self.prop1 = arg1
self.prop2 = arg2
</pre>
<p>
Not only was this repetitive, but I found that little mutability bugs were cropping up on me. It's just too easy to trip over them when the default is mutability, as it is in python. In fact, immutable data structures aren't even a built-in option in python.
<p>
<span style="font-weight: bold;">Finally the pain of using mutable data structures grew too large. </span><span>I looked for a way and couldn't find anything, so I created my own. </span>It supports getters, setters, inheritance, default values, altering several values at once, and I think the syntax is nice. I've been using it for everything the last few months and it's been great. It's helped with code repetition, concurrency, and serialization.
<p>
I hope you can learn from my valued lesson and not repeat my mistakes. Here's how you use it.
<pre class="code">
class Person(Record("name", "age")):
pass
class OldPerson(Person):
@classmethod
def prepare(cls, name, age = None):
return (name, age)
peter = Person("Peter", 26)
wes = Person("Wes", 28)
grandpa = OldPerson("Bubba")
wes2 = wes.setAge(29)
wes3 = wes.alter(name = "Grandpa", age = 57)
print peter, grandpa, wes.name, wes2.age
</pre>
<p>
Here is the code. This is actually a simplified version of the original that I rewrote on my own time. I use a more comlete/complex version in the code I'm writing for my employer.
<pre class="code">
def Record(*props):
class cls(RecordBase):
pass
cls.setProps(props)
return cls
class RecordBase(tuple):
PROPS = ()
def __new__(cls, *values):
if cls.prepare != RecordBase.prepare:
values = cls.prepare(*values)
return cls.fromValues(values)
@classmethod
def fromValues(cls, values):
return tuple.__new__(cls, values)
def __repr__(self):
return self.__class__.__name__ + tuple.__repr__(self)
## overridable
@classmethod
def prepare(cls, *args):
return args
## setting up getters and setters
@classmethod
def setProps(cls, props):
for index, prop in enumerate(props):
cls.setProp(index, prop)
cls.PROPS = props
@classmethod
def setProp(cls, index, prop):
getter_name = prop
setter_name = "set" + prop[0].upper() + prop[1:]
setattr(cls, getter_name, cls.makeGetter(index, prop))
setattr(cls, setter_name, cls.makeSetter(index, prop))
@classmethod
def makeGetter(cls, index, prop):
return property(fget = lambda self : self[index])
@classmethod
def makeSetter(cls, index, prop):
def setter(self, value):
values = (value if current_index == index
else current_value
for current_index, current_value
in enumerate(self))
return self.fromValues(values)
return setter
</pre>
<p>
For comparison, I've seen similar ideas at http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/500261 and http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/303439 but I don't like their implementation or design as much, especially since they lack proper setters.
<p>
I've also read that future versions of Python will have NamedTuples, which is something I wish it had already.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1700157236206200597-3759062215680956156?l=www.valuedlessons.com'/></div>Peter Thatcherhttp://www.blogger.com/profile/01092342988993218446noreply@blogger.com4