I've learned. I'll share.

October 14, 2008

The manycore era is already upon us (and python isn't keeping up)

For many months, perhaps years, we programmers have seen trouble coming with the "manycore era". The attitude seems to be "someday we're really going to have to figure this concurrency thing out". But this month I was hit with a terrible realization: the manycore era is already upon us. Technically, it's the "multicore" era, not the "manycore" era, but I think that's a small distinction.

Why the sudden epiphany? A few weeks ago, I built myself a new computer and put in an Intel quad-core CPU. My computer is much faster now, and I'm very happy with it, but I have realized something: I never use more than 1/4 of my CPU. I have a little graph on my screen at all times showing me the CPU usage. It hits 25% all the time, but I have never seen it go higher. Ever.

On one hand this is a good thing. The new computer is so fast that everything I do is instant, and only uses a tiny bit of power. If it isn't instant, it's usually disk or network bound, not CPU bound.

On the other hand, even my own software never uses more than 25%, even though it is CPU bound at times. I'd like to fix it, but it's written in python, and the short version of a long, boring story is that python has a thing called the GIL (Global Interpreter Lock), which lets only one thread execute Python bytecode at a time. As a result, python as currently implemented cannot use more than one core (25% of a quad-core CPU) except in rare special cases.
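You can see this for yourself with a minimal sketch like the one below. It's written for a modern Python 3 interpreter rather than the python of 2008, and the function names and loop size are arbitrary choices for illustration. On a standard (GIL-enabled) CPython build, running the same pure-python loop in four threads typically takes about as long as running it four times in a row, and the CPU graph stays at roughly one core's worth.

```python
import threading
import time

def count_down(n):
    # Pure-python, CPU-bound loop: the thread holds the GIL while it runs.
    while n > 0:
        n -= 1
    return n

def run_sequential(n, workers):
    # Run the loop `workers` times, one after another, on one thread.
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start

def run_threaded(n, workers):
    # Run the same loops concurrently, one per thread.
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    N, WORKERS = 2_000_000, 4
    print("sequential: %.2fs" % run_sequential(N, WORKERS))
    print("4 threads:  %.2fs" % run_threaded(N, WORKERS))
```

Watch your CPU graph while it runs: with the GIL, the threaded run doesn't spread the work across cores.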

It seems that programming languages take a long time to be adopted, but I think concurrency is a big enough game changer to shake up which programming languages are dominant. In my specific case, if python doesn't fix its concurrency problems soon, I'm going to have to stop considering it, because I'll never be able to get it to use more than one little piece of the CPU. Right now, that's 25% (one core out of four), but in a few years, with 32 cores, it will be only about 3%. Either python is going to have to change, or I'm going to have to change programming languages (or use something like Jython or IronPython, I suppose).

A few weeks ago, sitting behind my single-core computer, I was in the "someday, we'll have to tackle concurrency" camp. Now, sitting in front of my quad-core machine, never using more than 25% of its power, I see that everything has changed. Concurrency is no longer a question of if or when; it's here right now. If you don't believe me, get yourself a quad-core machine and watch the CPU usage graph. I think you'll be surprised.

How to DTrace Python in OSX

DTrace is an incredible tool. It basically lets you profile a live application with little or no performance penalty. I'm writing a Python application that needed some profiling, and I found the "normal" techniques, like the profile/cProfile modules, very lacking. Luckily, Mac OSX comes with DTrace, and it even works with Python. The only snag is that it's hard to find out how to use the darn thing. I finally figured it out, so I figured I'd pass on the knowledge.

So, here's how you use dtrace on your python application in Mac OSX:

  1. Get DTraceToolkit.
  2. Edit Python/py_cputime.d by replacing "function-entry" with "entry" and "function-return" with "exit".
  3. Run "sudo dtrace -s Python/py_cputime.d"
  4. Let it sit there a while and hit ctrl-c.
  5. Enjoy the results!

I can only assume you have to edit the file because of some difference between Solaris and OSX. You can try files other than py_cputime.d, but you might have to edit them too. Not all of them work, but most do.
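If you end up editing several scripts, a small helper can do the renaming for you. This is a minimal sketch, assuming the same two renames from step 2 ("function-entry" to "entry", "function-return" to "exit") are the only change each script needs; the post only says other files "might" need editing, so check each result. The helper names are made up for illustration, and it assumes you run it from the top of the DTraceToolkit directory.

```python
import glob
import shutil

# Probe names used by the DTraceToolkit scripts -> names that worked with
# the python that ships with OSX (the renames from step 2).
PROBE_RENAMES = [
    ("function-entry", "entry"),
    ("function-return", "exit"),
]

def patch_probe_names(script_text):
    """Return the D script text with the probe names swapped out."""
    for old, new in PROBE_RENAMES:
        script_text = script_text.replace(old, new)
    return script_text

def patch_file(path):
    """Rewrite one .d file in place, keeping a .bak copy of the original."""
    shutil.copyfile(path, path + ".bak")
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(patch_probe_names(text))

if __name__ == "__main__":
    # Assumption: run from the DTraceToolkit directory, which has a Python/ subfolder.
    for path in glob.glob("Python/*.d"):
        patch_file(path)
        print("patched", path)
```

Running it a second time would overwrite the .bak copies with already-patched files, so run it once on a fresh copy of the toolkit.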

The last thing to know is that you have to use the python that comes with OSX. A custom-built python doesn't seem to work, presumably because it isn't built with the DTrace probes that Apple added to its own python.
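To confirm which interpreter your application is actually running under, you can check `sys.executable`. This is a minimal sketch; the two path prefixes below are my assumption about where the Apple-supplied python lives, not something stated in this post, so adjust them for your OSX version.

```python
import sys

# Assumed locations of the Apple-supplied interpreter on OSX (illustrative).
SYSTEM_PYTHON_PREFIXES = (
    "/usr/bin/python",
    "/System/Library/Frameworks/Python.framework/",
)

def is_apple_python(executable=None):
    """Return True if the interpreter path looks like the OSX system python."""
    path = executable or sys.executable
    return path.startswith(SYSTEM_PYTHON_PREFIXES)

if __name__ == "__main__":
    print(sys.executable, "-> system python?", is_apple_python())
```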

Hope that helps!

Python Memory Usage: What values are taking up so much memory?

Python seems to use a lot of memory. So what exactly is the overhead of each type of value? Short answer:

Value type                      Bytes per value
int                             24
float                           24
tuple                           63
list                            101
dict                            298
old-style class                 345
new-style class                 336
subclassed tuple                79
Record                          79
Record with old class mixin     79
Record with new class mixin     79

(Measured in bytes using Python 2.5 on 64-bit Ubuntu Linux.)

I measured these by running a simple program that loaded 1,000,000 values into a list and then called time.sleep(1000). I ran that for each value type and then ran "top" to see how much memory was being used. I took that value, subtracted the memory usage of a list holding 1,000,000 copies of the same value (14 bytes each), subtracted the size of any child value (usually an int, 24 bytes each), and then divided by 1,000,000. I'll include the code I ran at the end if you want to cut and paste. So what lessons do we learn from this?

  • Instances of Python classes are very expensive, at over 300 bytes each.
  • Tuples have about 1/5 as much overhead.
  • Records are almost as good as tuples, even when a mixin is added.

So, if you want to have lots of values in memory without using lots of memory, use Record.
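The Record module isn't shown here and isn't part of the standard library. Judging by the numbers, it appears to be tuple-based, so if you don't have it, `collections.namedtuple` (new in Python 2.6) builds the same kind of thing: a tuple subclass with named fields. The sketch below is a stand-in under that assumption, not the Record module itself. It's written for Python 3, and the class and method names are made up for illustration. Setting `__slots__ = ()` on the subclass and the mixin keeps instances from getting a per-instance `__dict__`.

```python
from collections import namedtuple

class Doubler(object):
    # Mixin that adds behaviour but no per-instance storage.
    __slots__ = ()

    def doubled(self):
        return self.val * 2

class ValRecord(namedtuple("ValRecord", "val"), Doubler):
    # A tuple subclass with one named field, "val", like Record("val").
    __slots__ = ()  # no per-instance __dict__

r = ValRecord(21)
print(r.val, r.doubled(), isinstance(r, tuple))
```

This mirrors the "Record with new class mixin" case from the table: the methods live on the class, and each instance stores only its tuple of field values.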

If you want to run the test for yourself, here's the code. Just uncomment the "make_val" line you want to test (and comment out the others).

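import time
from Record import Record  # the Record module used in this post; not in the standard library

class TupleClass(tuple):
    pass

class RecordClass(Record("val")):
    pass

class OldClass:  # old-style class (Python 2)
    def __init__(self, val):
        self.val = val

    def method(self):
        pass

class NewClass(object):  # new-style class
    def __init__(self, val):
        self.val = val

    def method(self):
        pass

class RecordWithOldClass(Record("val"), OldClass):
    pass

class RecordWithNewClass(Record("val"), NewClass):
    pass

# Uncomment exactly one make_val line.
make_val = lambda i : 1                    # nothing (base overhead of the list itself)
#make_val = lambda i : i                   # int
#make_val = float                          # float
#make_val = lambda i : (i,)                # tuple
#make_val = lambda i : [i]                 # list
#make_val = lambda i : {i:i}               # dict
#make_val = lambda i : TupleClass((i,))    # subclassed tuple (tuple() needs an iterable)
#make_val = RecordClass                    # Record
#make_val = OldClass                       # old-style class
#make_val = NewClass                       # new-style class
#make_val = RecordWithOldClass             # Record with old class mixin
#make_val = RecordWithNewClass             # Record with new class mixin

count = 1000000
lst = [make_val(i) for i in xrange(count)]

# Keep the process alive so "top" can report its memory usage.
time.sleep(1000)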
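Reading "top" by hand works, but on Python 3.4 or later the standard tracemalloc module can do the same bookkeeping inside the process. The sketch below swaps in that technique for "top"; it is not how the numbers above were measured, and it will report different values, since it runs on Python 3 rather than Python 2.5. Like the procedure described above, it subtracts a baseline list that holds the same value in every slot. The child int is still included, so subtract it yourself if you want numbers comparable to the table. The helper names and the smaller default count are my own choices.

```python
import tracemalloc

def traced_bytes(make_val, count):
    """Bytes still allocated after building a list of `count` values."""
    tracemalloc.start()
    try:
        values = [make_val(i) for i in range(count)]
        current, _peak = tracemalloc.get_traced_memory()
        del values
    finally:
        tracemalloc.stop()
    return current

def bytes_per_value(make_val, count=100_000):
    """Approximate memory per value, minus the list's own per-slot cost."""
    baseline = traced_bytes(lambda i: 1, count)  # same value in every slot
    return (traced_bytes(make_val, count) - baseline) / count

if __name__ == "__main__":
    for name, make_val in [
        ("int", lambda i: i),
        ("float", float),
        ("tuple", lambda i: (i,)),
        ("list", lambda i: [i]),
        ("dict", lambda i: {i: i}),
    ]:
        print("%-6s %6.1f bytes" % (name, bytes_per_value(make_val)))
```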
