Purpose
A slightly uncommon bit-twiddling operation is "popcount", which counts the number high bits. While uncommon, it's crucial for implementing Array Mapped Tries, which is a way to implement immutable maps and vectors. I'm trying to implement immutable maps and vectors, so I needed to implement popcount.I found A TON of ways, but had no idea which was fastest. So, I implemented them all and tested them. The short answer is to do this:
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable POPCOUNT_TABLE16 = [0] * 2**16 for index in xrange(len(POPCOUNT_TABLE16)): POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1] def popcount32_table16(v): return (POPCOUNT_TABLE16[ v & 0xffff] + POPCOUNT_TABLE16[(v >> 16) & 0xffff])And here's the long answer:
Results
I ran popcount on 30,000 random ints between 0 and 2
Name | Time | Max Bits | URL |
c_bagwell | 0.0030 | 32 | http://lamp.epfl.ch/papers/idealhashtrees.pdf |
c_bsdmacro | 0.0030 | 32 | http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html |
c_parallel | 0.0031 | 32 | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel |
c_ops64 | 0.0036 | 32 | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSet64 |
table16 | 0.0098 | 32 | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable |
c_table8 | 0.0110 | 32 | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive |
table+kern | 0.0132 | - | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable |
table8 | 0.0163 | 32 | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable |
bsdmacro | 0.0190 | 32 | http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html |
parallel | 0.0199 | 32 | http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html |
ops64 | 0.0207 | 32 | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSet64 |
bagwell | 0.0242 | 32 | http://lamp.epfl.ch/papers/idealhashtrees.pdf |
freebsd | 0.0257 | 32 | http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html |
timpeters3 | 0.0277 | 32 | http://mail.python.org/pipermail/python-list/1999-July/007696.html |
timpeters2 | 0.0580 | 32 | http://mail.python.org/pipermail/python-list/1999-July/007696.html |
timpeters | 0.0724 | ? | http://mail.python.org/pipermail/python-list/1999-July/007696.html |
kernighan | 0.0745 | - | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan |
elklund | 0.0889 | - | http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html |
naive | 0.1519 | - | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive |
seistrup | 0.2447 | ? | http://mail.python.org/pipermail/python-list/1999-July/007696.html |
And here are the results for my MacBook with a 2.0Ghz Core 2 Duo:
Name | Time | Max Bits | URL |
table16 | 0.0279 | 32 | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable |
table+kern | 0.0372 | - | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable |
table8 | 0.0417 | 32 | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable |
bsdmacro | 0.0481 | 32 | http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html |
bagwell | 0.0596 | 32 | http://lamp.epfl.ch/papers/idealhashtrees.pdf |
timpeters3 | 0.0744 | 32 | http://mail.python.org/pipermail/python-list/1999-July/007696.html |
parallel | 0.0807 | 32 | http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html |
ops64 | 0.1290 | 32 | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSet64 |
timpeters2 | 0.1439 | 32 | http://mail.python.org/pipermail/python-list/1999-July/007696.html |
kernighan | 0.1527 | - | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan |
timpeters | 0.1668 | ? | http://mail.python.org/pipermail/python-list/1999-July/007696.html |
freebsd | 0.1772 | - | http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html |
elklund | 0.1913 | - | http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html |
naive | 0.2889 | - | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive |
seistrup | 0.6106 | ? | http://mail.python.org/pipermail/python-list/1999-July/007696.html |
Observations
- Implementations written in C (using Pyrex) are 5-6 times faster than Python.
- My 3.2 Ghz Linux Desktop is 200% faster than my 2.0 Ghz MacBook even though it only has a 60% clockspeed advantage. I'm not sure what to make of that. Python doesn't use multiple cores well, so I'm sure it's not that.
- If you want to run popcount on 32-bit values in pure Python, table16 is the fastest by far, and only 3 times slower than implementations in C.
- If you need to run popcount on arbitrarily large integers, kernighan is the best, but doing a hybrid of table16 and kernighan is probably better.
Conclusion
If you don't mind using about 64KB of memory, here's is the fastest popcount in pure python:#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable POPCOUNT_TABLE16 = [0] * 2**16 for index in xrange(len(POPCOUNT_TABLE16)): POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1] def popcount32_table16(v): return (POPCOUNT_TABLE16[ v & 0xffff] + POPCOUNT_TABLE16[(v >> 16) & 0xffff])If you do mind the memory usage, here's a slighly slower version:
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable POPCOUNT_TABLE8 = [0] * 2**8 for index in xrange(len(POPCOUNT_TABLE8)): POPCOUNT_TABLE8[index] = (index & 1) + POPCOUNT_TABLE8[index >> 1] def popcount32_table8(v): return (POPCOUNT_TABLE8[ v & 0xff] + POPCOUNT_TABLE8[(v >> 8) & 0xff] + POPCOUNT_TABLE8[(v >> 16) & 0xff] + POPCOUNT_TABLE8[ v >> 24 ])And if you need to handle values of more than 32 bits, one of these two are the best:
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan def popcount_kernighan(v): c = 0 while v: v &= v - 1 c += 1 return c POPCOUNT32_LIMIT = 2**32-1 POPCOUNT_TABLE8 = [0] * 2**8 for index in xrange(len(POPCOUNT_TABLE8)): POPCOUNT_TABLE8[index] = (index & 1) + POPCOUNT_TABLE8[index >> 1] def popcount_hybrid(v): if v <= POPCOUNT32_LIMIT: return (POPCOUNT_TABLE16[ v & 0xffff] + POPCOUNT_TABLE16[(v >> 16) & 0xffff]) else: c = 0 while v: v &= v - 1 c += 1 return cIf it needs to be faster than that, write in C!
Test For Yourself
If you want to test these yourself, here's some code you can run.#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive def popcount_naive(v): c = 0 while v: c += (v & 1) v >>= 1 return c #http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan def popcount_kernighan(v): c = 0 while v: v &= v - 1 c += 1 return c #http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable POPCOUNT_TABLE8 = [0] * 2**8 for index in xrange(len(POPCOUNT_TABLE8)): POPCOUNT_TABLE8[index] = (index & 1) + POPCOUNT_TABLE8[index >> 1] def popcount32_table8(v): return (POPCOUNT_TABLE8[ v & 0xff] + POPCOUNT_TABLE8[(v >> 8) & 0xff] + POPCOUNT_TABLE8[(v >> 16) & 0xff] + POPCOUNT_TABLE8[ v >> 24 ]) POPCOUNT_TABLE16 = [0] * 2**16 for index in xrange(len(POPCOUNT_TABLE16)): POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1] def popcount32_table16(v): return (POPCOUNT_TABLE16[ v & 0xffff] + POPCOUNT_TABLE16[(v >> 16) & 0xffff]) POPCOUNT32_LIMIT = 2**32-1 def popcount_table16_kernighan(v): if v <= POPCOUNT32_LIMIT: return (POPCOUNT_TABLE16[ v & 0xffff] + POPCOUNT_TABLE16[(v >> 16) & 0xffff]) else: c = 0 while v: v &= v - 1 c += 1 return c #http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSet64 def popcount32_ops64(v): return ((((v & 0xfff) * 0x1001001001001 & 0x84210842108421) % 0x1f) + ((((v & 0xfff000) >> 12) * 0x1001001001001 & 0x84210842108421) % 0x1f) + (((v >> 24) * 0x1001001001001 & 0x84210842108421) % 0x1f)) #http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel #also http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html def popcount32_parallel(v): v = v - ((v >> 1) & 0x55555555) v = (v & 0x33333333) + ((v >> 2) & 0x33333333) #v = (v & 0x0F0F0F0F) + ((v >> 4) & 0x0F0F0F0F) #v = v + ((v >> 4) & 0x0F0F0F0F) v = (v + (v >> 4)) & 0x0F0F0F0F v = (v * 0x1010101) >> 24 return v % 256 #I added %256. I'm not sure why it's needed. It's probably because of signed ints in Python def popcount32_bagwell(v): v = v - ((v >> 1) & 0x55555555) v = (v & 0x33333333) + ((v >> 2) & 0x33333333) v = (v & 0x0F0F0F0F) + ((v >> 4) & 0x0F0F0F0F) v = v + (v >> 8) v = (v + (v >> 16)) & 0x3F return v #http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html def popcount_elklund(v): c = 0 while v: v ^= v & -v c += 1 return c #http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html def popcount32_freebsd(v): v = (v & 0x55555555) + ((v & 0xaaaaaaaa) >> 1); v = (v & 0x33333333) + ((v & 0xcccccccc) >> 2); v = (v & 0x0f0f0f0f) + ((v & 0xf0f0f0f0) >> 4); v = (v & 0x00ff00ff) + ((v & 0xff00ff00) >> 8); v = (v & 0x0000ffff) + ((v & 0xffff0000) >> 16); return v #http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html def popcount32_bsdmacro(v): #define BITCOUNT(x) (((BX_(x)+(BX_(x)>>4)) & 0x0F0F0F0F) % 255) #define BX_(x) ((x) - (((x)>>1)&) - (((x)>>2)&0x33333333) - (((x)>>3)&0x11111111)) # #def bx(v): (v - ((v >> 1) & 0x77777777) - ((v >> 2) & 0x33333333) - ((v >> 3) & 0x11111111)) #def bc(v): ((bx(v) + (bx(v) >> 4)) & 0x0F0F0F0F) % 255 v = (v - ((v >> 1) & 0x77777777) - ((v >> 2) & 0x33333333) - ((v >> 3) & 0x11111111)) v = ((v + (v >> 4)) & 0x0F0F0F0F) return v % 255 #http://mail.python.org/pipermail/python-list/1999-July/007696.html import marshal, array, struct, string POPCOUNT8_TRANS_TABLE = "".join(map(chr, POPCOUNT_TABLE8)) #changed by me to match new dumps() and use sum() def popcount_timpeters(v): counts = array.array("B", string.translate(marshal.dumps(v), POPCOUNT8_TRANS_TABLE)) # overwrite type code counts[0] = 0 return sum(counts) #changed by me to add loop unrolling and not setting digitcounts[0] def popcount32_timpeters2(v): counts = array.array("B", string.translate(marshal.dumps(v), POPCOUNT8_TRANS_TABLE)) return counts[1] + counts[2] + counts[3] + counts[4] #improved by me: no need to translate type char def popcount32_timpeters3(v): dumped = marshal.dumps(v) return POPCOUNT_TABLE8[ord(dumped[1])] + POPCOUNT_TABLE8[ord(dumped[2])] + POPCOUNT_TABLE8[ord(dumped[3])] + POPCOUNT_TABLE8[ord(dumped[4])] #http://mail.python.org/pipermail/python-list/1999-July/007696.html _run2mask = {1: 0x5555555555555555L, 2: 0x3333333333333333L, 4: 0x0F0F0F0F0F0F0F0FL, 8: 0x00FF00FF00FF00FFL} def buildmask2(run, n): run2 = run + run k = (n + run2 - 1) / run2 n = k * run2 try: answer = _run2mask[run] k2 = 64 / run2 except KeyError: answer = (1L << run) - 1 k2 = 1 while k > k2: k2 = k2 + k2 if k >= k2: answer = answer | (answer << (run * k2)) else: answer = answer | (answer << (run2 * (k - k2/2))) k2 = k if k2 > k: answer = answer >> (run2 * (k2 - k)) return answer, n def nbits(v): return 32 #??? def popcount_seistrup(v): lomask, n = buildmask2(1, nbits(v)) v = v - ((v >> 1) & lomask) target = 2 while n > target: lomask, n = buildmask2(target, n) v = (v & lomask) + ((v >> target) & lomask) target = target + target for i in range(1, target/2): if n <= target: break n = n >> 1 n = (n + target - 1) / target * target v = (v & ((1L <<>> n) return int(v) if __name__ == "__main__": import time, random def time_func(func): before = time.time() result = func() after = time.time() span = after - before return result, span popcounts = [popcount_naive, popcount32_table8, popcount32_table16, popcount_table16_kernighan, popcount_kernighan, popcount32_ops64, popcount32_parallel, popcount32_bagwell, popcount_elklund, popcount32_freebsd, popcount32_bsdmacro, popcount_seistrup, popcount_timpeters, popcount32_timpeters2, popcount32_timpeters3, ] test_count = 30000 max_int = 2**31 - 2 ints = range(0, 257) + [random.randint(0, max_int) for _ in xrange(test_count)] + range(max_int - 100, max_int+1) expected_counts = map(popcount_naive, ints) for popcount in popcounts: counts, timespan = time_func(lambda : map(popcount, ints)) print "%5.5f %s" % ((timespan if (counts == expected_counts) else -1), popcount.__name__) #-1 means failure
In python 3 I've found (to my disappointment) that a simple bin(num).count('1') is faster than the table16 algorithm you've posted. As I was trying to replace the gmpy version of popcount I was using, this was not good enough. Thankfully, I was interested in a special case of the problem, namely I was doing a test for
ReplyDeletepopcount(a) > 1
so, inspired by the posted algorithm, I was able to replace that with:
a not in onebit
where onebit is a tuple constructed as follows:
onebit = [0]
for i in range(32): onebit.append(2**i)
onebit = tuple(onebit)
another benefit is that you get to set the size you are interested in just by changing the range of the loop. (in my case, 17 was sufficient.
Interesting results, thanks for compiling this.
ReplyDeleteOne option that has been over-looked, perhaps, is an 11-bit look-up table:
POPCOUNT_TABLE11 = [0] * (1 << 11)
for index in xrange(len(POPCOUNT_TABLE11)):
POPCOUNT_TABLE11[index] = (index & 1) + POPCOUNT_TABLE11[index >> 1]
def popcount32_table11(v):
return (POPCOUNT_TABLE11[ v & 0x7ff] +
POPCOUNT_TABLE11[(v >> 11) & 0x7ff] +
POPCOUNT_TABLE11[ v >> 22 ])
This actually runs the fastest on my system, I imagine due to caching better than the 16-bit table, being 32x smaller.
The problem is that your benchmarks assume your aren't doing anything OTHER than popcount. On newer x86 processors, the 64K table will fit in (and fill) the L1 data cache, making it very fast, but only as long as it can hog the entire L1 cache! In real code, you'll likely be doing other things that require L1 cache as well, so this will run effectively 3x slower (L2 cache speed)...
ReplyDeletevé tết 2019 đi hà nội
ReplyDeletevé tết 2019 đi đồng hới
vé tết 2019 đi buôn mê thuột
vé tết 2019 đi nha trang
vé tết 2019 đi hải phòng
I am read your article and it’s helped me a lot, thank you so much for share this wonderful article, I really like to read this type of blog, keep sharing this type of useful information, I am suggest to my all contact to visit this blog and collect this helpful information, If any one searching website designing & development company in Delhi please visit my website, we are do for you any think like website build PPC service and many more so don’t waste your time and come on my website and grow more your business, Thank you so much for read my comment. Ecommerce Website Designing Company in Delhii
ReplyDeleteThis was be great I will read your blog properly, thank you so much for share this valuable information, I am enjoy to read this, it’s provided me lot of information, if any one grow up looking so please visit me website and I am share for you some tips so you will go and grow up your looks, thank you so much for read my comment.
ReplyDeleteLifestyle Magazine India
trung tâm tư vấn du học canada vnsava
ReplyDeletecông ty tư vấn du học canada vnsava
trung tâm tư vấn du học canada vnsava uy tín
công ty tư vấn du học canada vnsava uy tín
trung tâm tư vấn du học canada vnsava tại tphcm
công ty tư vấn du học canada vnsava tại tphcm
điều kiện du học canada vnsava
chi phí du học canada vnsava
#vnsava
@vnsava
ReplyDeleteThanks for sharing this informative content.,
Leanpitch provides online training in Scrum Master Certification during this lockdown period everyone can use it wisely.
Join Leanpitch 2 Days CSM Certification Workshop in different cities.
CSM training online
Scrum master training online
Thanks for sharing this.,
ReplyDeleteLeanpitch provides online training in Scrum Master during this lockdown period everyone can use it wisely.
Join Leanpitch 2 Days CSM Certification Workshop in different cities.
Scrum master certification
csm certification
certified scrum master certification
certified scrum master
agile scrum master certification
scrum master certification cost
csm certification cost
Scrum master Training
Scrum master
Best Scrum master certification
Very informative post and really appreciable. Keep doing great work like this.
ReplyDeleteMy web site - 부산오피
(freaky)
You provide such a wonderful article, the way of expressing the word is excellent. Fix the Common Verizon Email Problem like Change Verizon Password, Verizon email is not Working, Forgot Verizon Password etc.
ReplyDeleteAfter I originally left a comment I appear to have clicked on the -Notify me when new comments are added- checkbox and from now on every time a comment is added I receive 4 emails with the exact same comment. Perhaps there is an easy method you are able to remove me from that service? Thank you!
ReplyDelete대딸방
Thanks , I have just been looking for information approximately this subject for a while and yours is the greatest I've found out till now. However, what concerning the conclusion? Are you sure about the source?
ReplyDelete타이마사지
Pretty section of content. I just stumbled upon your web blog and in accession capital to assert that I get actually enjoyed account your blog posts. Any way I will be subscribing to your feeds and even I achievement you access consistently quickly.
ReplyDelete안마
Hi there to every one, the contents existing at this web site are truly amazing for people knowledge, well, keep up the good work fellows.
ReplyDelete성인웹툰
Say no to paying tons of cash for overpriced Facebook advertising! Let me show you a platform that charges only a very small payment and provides an almost endless volume of web traffic to your website 안전놀이터
ReplyDeleteThere is something wrong with the following line in function popcount_seistrup in your "Try For Yourself" code :
ReplyDeletev = (v & ((1L <<>> n)
Python reports syntax error on this line! Seems there is something missing?
Care to fix it?
Everything is very open with a clear clarification of the issues. It was truly informative. Your site is useful. Thank you for sharing!data analytics course in jodhpur
ReplyDeleteI always like finding a site that gives you very good ideas because I like learning new stuff. Happy that I found your site because I greatly liked it and I anticipate your following post. A fantastic blog and i’ll come back again for more useful content. ufabet thai vip
ReplyDelete바카라사이트
ReplyDelete바카라게임사이트
바카라
This magnificent perform I love this website
The style of this site is great the aspect is truly incredible
ReplyDelete스포츠토토
프로토
먹튀검증
스포츠중계
ReplyDelete토토사이트
토토
You make so many great points here that I read your article a couple of times. Your views are in accordance with my own for the most part. This is great content for your readers.
토토사이트
ReplyDelete배트맨토토
Just pure classic stuff from you here. I have never seen such a brilliantly written article in a long time. I am thankful to you that you produced this!
Nice content, please checkout my website Vauld Referral Code
ReplyDeleteThanks for sharing this article, what you posted is very useful for us and everyone, for about movies and web series provide more information...
ReplyDeleteUpcoming Bollywood Movie
Can I attach this article? I want to tell my friends. My blog is pretty interesting too.
ReplyDelete먹튀온라인
ReplyDeletewill be praised anywhere. I am a columnist This post is really the best on this valuable topic 검증카지노
thanks for the realistic critique. 먹튀검증백과커뮤니티
ReplyDeleteGreat Post
ReplyDeleteMyAssignmenthelp.sg provides Dissertation Help Singapore for university students. Our experienced writers provide quality dissertation help so that you can get good grades. Our team of experts has years of experience in writing dissertations and has the ability to draft a perfect dissertation for you. We provide personalized service for dissertation help in Singapore to make sure that you get the best out of it. We also provide editing and proofreading services for dissertation help in Singapore. We are available 24X7 for you to help you with your dissertation queries.
Nice Post
ReplyDeleteNeed top-quality assignment help in Malta? Look no further than maltaassignmenthelp.com! Our expert team of writers can assist with any assignment, big or small.
I am really thankful to the valuable information for us. Keep continue to share such an amazing blog. Visit this site to know more about this file uncontested divorce virginia
ReplyDeleteGreat blog
ReplyDeleteGreeceassignmenthelp.com understands that students in Greece have busy schedules and often struggle to balance their coursework with other responsibilities. That's why we offer a convenient and affordable way to buy Greece assignment services online.
convert youtube to mp3 online high quality
ReplyDeletehttps www.youtube.coyoutube to mp3
download youtube videos to mp3 on your phone
compare youtube to mp3 online converter
how to convert a youtube to mp3
https://yttomp3.pro/
audio file converter to mp3 youtube
mp3 skull youtube to mp3 converter
use youtube to mp3 on ipad
youtube to mp3 converter program online
Great Post
ReplyDelete"Elevate your home decor with the alluring charm of resin clocks from ClassyArtz.com. Discover a symphony of colors and textures in our online collection, where resin takes center stage in the art of timekeeping. Buy resin clocks online and create a striking focal point that reflects your sophisticated taste."
Capture the beauty and love of your wedding day by preserving your garland in resin. Our dedicated artisans use high-quality resin to encapsulate the delicate flowers, securing their charm and colors. The resin forms a protective layer, preserving the garland's essence for years. Choose our Wedding Garland Preservation In Resin to relive the magic of your special day for a lifetime.
ReplyDeleteI'm continually amazed by the excellence of your blog. Your dedication to delivering top-notch content truly shines.Ley de Divorcio de Nueva York Propiedad Matrimonial
ReplyDeleteYou guys are following impressive trends. I like the look of your website too. Winter Sale Jackets
ReplyDelete"Welcome to Gupta Homeo Clinic, your trusted destination for Vitiligo treatment Clinic in Jaipur. Our clinic stands as a beacon of hope, employing advanced homeopathic methodologies to treat Vitiligo. With a dedicated team and personalized protocols, we strive to re-pigment affected areas naturally. Gupta Homeo Clinic in Jaipur is committed to delivering comprehensive and reliable care, ensuring patients receive the best possible treatment for Vitiligo."
ReplyDeleteAppeal Denial Protective Order Virginia Denied a protective order? Our skilled team will help you.
ReplyDelete