I've learned. I'll share.

October 14, 2008

Python Memory Usage: What values are taking up so much memory?

Python seems to use a lot of memory. So what exactly is the overhead of each type of value? Short answer:

int 24
float 24
tuple 63
list 101
dict 298
old-style class 345
new-style class 336
subclassed tuple 79
Record 79
Record with old class mixin 79
Record with new class mixin 79
Measured in bytes using Python 2.5 in 64-bit Ubuntu Linux

I measured these by running a simple program that loaded 1,000,000 values into a list and then called time.sleep so the process stayed alive. I ran it once for each value type and used "top" to see how much memory the process was using. From that figure I subtracted the memory used by a list holding the same single value 1,000,000 times (which works out to 14 bytes per entry), divided by 1,000,000, and subtracted the size of the child value (usually an int, 24 bytes). I'll include the code I ran at the end if you want to cut and paste. So what lessons do we learn from this?

  • Class instances are very expensive at over 300 bytes each, whether old-style or new-style.
  • Tuples have about 1/5 as much overhead (63 bytes).
  • Records (79 bytes) are almost as good as tuples, even when a mixin is added.

So, if you want to have lots of values in memory without using lots of memory, use Record.
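The subtraction behind the table can be written out as a small function. This is only a sketch of the arithmetic described earlier: the function name, the parameter names, and the sample readings are made up for illustration (round numbers chosen to land on the tuple row), not the readings from the original runs.

```python
def per_value_overhead(total_bytes, baseline_bytes, count, child_bytes=0):
    """Estimate the overhead in bytes of one value.

    total_bytes    -- process memory with `count` values loaded (read from top)
    baseline_bytes -- process memory for a list of `count` references to one value
    count          -- number of values in the list
    child_bytes    -- size of the value stored inside each container, if any
    """
    per_entry = (total_bytes - baseline_bytes) / float(count)
    return per_entry - child_bytes

# Hypothetical readings, not taken from the original measurements.
total = 101 * 1000000      # a list of 1,000,000 one-element tuples
baseline = 14 * 1000000    # a list of 1,000,000 references to the same value
print(per_value_overhead(total, baseline, 1000000, child_bytes=24))  # 63.0
```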

If you want to run the test for yourself, here's the code. Just uncomment the "make_val" that you want to test and comment out the others.

import time
from Record import Record

class TupleClass(tuple):
    pass

class RecordClass(Record("val")):
    pass

class OldClass:
    def __init__(self, val):
        self.val = val

    def method(self):
        pass

class NewClass(object):
    def __init__(self, val):
        self.val = val

    def method(self):
        pass

class RecordWithOldClass(Record("val"), OldClass):
    pass

class RecordWithNewClass(Record("val"), NewClass):
    pass

make_val = lambda i : 1 #nothing (base overhead)
#make_val = lambda i : i    
#make_val = float           
#make_val = lambda i : (i,) 
#make_val = lambda i : [i]  
#make_val = lambda i : {i:i}
#make_val = TupleClass
#make_val = RecordClass
#make_val = OldClass
#make_val = NewClass
#make_val = RecordWithOldClass
#make_val = RecordWithNewClass

count = 1000000
# Build the list, then sleep so the process stays alive while you read "top".
lst = [make_val(i) for i in xrange(count)]
time.sleep(100000)
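
On a current Python 3, a similar comparison can be run in-process with the standard-library tracemalloc module instead of watching "top". This is a different technique from the one above: it counts bytes requested through Python's allocators, not the process size that top reports, and it runs on a much newer interpreter than 2.5, so the numbers should not be expected to match the table. The helper name bytes_per_value is made up for this sketch, and collections.namedtuple is only a stand-in for Record, whose code is not shown here.

```python
import collections
import tracemalloc

def bytes_per_value(make_val, count=100000):
    """Average traced bytes per entry of [make_val(i) for i in range(count)]."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    lst = [make_val(i) for i in range(count)]
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del lst
    return (after - before) / float(count)

# Stand-in for Record (an assumption: Record's own code is not shown here).
Rec = collections.namedtuple("Rec", "val")

class NewClass(object):
    def __init__(self, val):
        self.val = val

# Cost of the list slots alone: every entry refers to the same object.
baseline = bytes_per_value(lambda i: None)
# Cost of one int, used below as the "child value" to subtract.
int_cost = bytes_per_value(lambda i: i) - baseline

containers = [
    ("tuple", lambda i: (i,)),
    ("list", lambda i: [i]),
    ("dict", lambda i: {i: i}),
    ("namedtuple", Rec),
    ("class instance", NewClass),
]
results = {}
for name, make_val in containers:
    # Subtract the list slot and the int held inside each container.
    results[name] = bytes_per_value(make_val) - baseline - int_cost
    print("%-15s %6.1f" % (name, results[name]))
```

A smaller count than the original 1,000,000 is used because tracing slows allocation down; the per-value averages should not depend much on it.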

5 comments:

  1. Fun - I was just playing with the same sort of thing and came to a similar conclusion. One item that's useful is a slotted new style object:

    class Foo(object):
      __slots__ = []

    These come out to be (on average) slightly greater than 16 bytes / each.

    If you add values:

    class Foo(object):
      __slots__ = ['a','b']
      def __init__(self):
        self.a = None
        self.b = None

    They come out to be ~65 bytes each.

  2. Wow Gary! I tried that too and it came to about a quarter the size of virtual memory. This is way over my head but clearly indicates that __slots__ might be the way forward when memory matters.

  3. It would be interesting to see the exact same test run with a new-style class with __slots__ to eliminate the __dict__ attribute. According to your chart, the instance takes up 345 bytes, but the __dict__ should be taking up 298 of that, and that is eliminated when you use __slots__.

  4. Oops, I should have read the preceding comments before posting mine! Sorry.

  5. I am getting wildly bigger values. Try this:

    #!/bin/sh
    (
    exec >test.py
    echo "\
    #!/usr/bin/python
    import time"
    i=300000
    while test $i != 0; do
    echo "i$i=$i"
    i=$((i - 1))
    done
    echo "time.sleep(3)"
    )

    chmod 755 test.py
    echo "Before: `grep ^MemFree: /proc/meminfo`"
    ./test.py &
    sleep 1
    echo "During: `grep ^MemFree: /proc/meminfo`"
    sleep 1
    echo "During: `grep ^MemFree: /proc/meminfo`"
    sleep 1
    echo "During: `grep ^MemFree: /proc/meminfo`"
    sleep 1
    echo "During: `grep ^MemFree: /proc/meminfo`"
    sleep 1
    echo "During: `grep ^MemFree: /proc/meminfo`"
    sleep 1
    echo "During: `grep ^MemFree: /proc/meminfo`"
    sleep 1
    echo "During: `grep ^MemFree: /proc/meminfo`"
    sleep 1
    echo "During: `grep ^MemFree: /proc/meminfo`"
    sleep 1
    echo "During: `grep ^MemFree: /proc/meminfo`"
    wait
    echo "After: `grep ^MemFree: /proc/meminfo`"

    # i=100000, output on my machine (x86-64):
    #Before: MemFree: 308688 kB
    #During: MemFree: 252144 kB
    #During: MemFree: 227460 kB
    #During: MemFree: 227468 kB
    #During: MemFree: 227460 kB
    #After: MemFree: 308200 kB
    # Thus, (308200-227460)/100 = 807 bytes per each int variable

    # i=300000
    #Before: MemFree: 1007572 kB
    #During: MemFree: 695952 kB
    #During: MemFree: 421548 kB
    #During: MemFree: 851084 kB
    #During: MemFree: 835708 kB
    #During: MemFree: 808800 kB
    #During: MemFree: 795656 kB
    #During: MemFree: 795656 kB
    #During: MemFree: 795648 kB
    #During: MemFree: 795656 kB
    #After: MemFree: 1007448 kB
    # Thus, (1007448-795648)/300 = 706 bytes per each int variable

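
The __slots__ suggestion from the comments can be checked directly on a current Python 3. The sketch below uses sys.getsizeof, which did not exist in Python 2.5, so it illustrates the effect on a modern interpreter and does not reproduce the byte counts quoted above. The class names are made up for the example.

```python
import sys

class WithDict(object):
    def __init__(self):
        self.a = None
        self.b = None

class WithSlots(object):
    __slots__ = ['a', 'b']

    def __init__(self):
        self.a = None
        self.b = None

plain = WithDict()
slotted = WithSlots()

# A slotted instance has no per-instance __dict__ to pay for.
print(hasattr(plain, '__dict__'))    # True
print(hasattr(slotted, '__dict__'))  # False

# Shallow sizes: the plain instance also owns its attribute dict.
plain_total = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_total = sys.getsizeof(slotted)
print(plain_total, slotted_total)

# The trade-off: attributes not listed in __slots__ cannot be added later.
try:
    slotted.c = 1
except AttributeError:
    print("cannot add attribute 'c' to a slotted instance")
```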
