I've learned. I'll share.

April 1, 2008

You Could Have Invented Monadic Parsing

Even though Pysec (and functional programming in python in general) turned out to be 2-3x slower, monadic parsing is still great for certain tasks. After I wrote about Pysec the first time, I received comments regarding pyparsing, which is a great parsing library that looks very similar. Today, I'd like to show you a very simple example where monadic parsing really shines above all other parsing techniques, including pyparsing. This is a real-world need that I had, not just an academic thought-experiment.

The situation is that I'm writing a peer-to-peer file synchronizer and I need to serialize binary data in an efficient way. I've choosen to use what I call "sized binary", which is basically netstrings without the comma or bencode's byte string. The byte-string "ABC", for example, would be encoded as "3:ABC", or "Hello World!" would be "12:Hello World!". This is very efficient because "ABC" or "Hello World!" can be any bytes \x00-\xFF, without any sort of encoding or decoding like uuencode.

Stop and think for a minute how you would define a grammer to parse "3:ABC". It's so simple, right? Well, try and write a grammar for it. Try using BNF, EBNF, yacc/bison, pyparsing, SimpleParse, or any parsing libarary you know of, in any programming language. If you manage, please write me an email and let me know how you did it, because I'm not so sure it's even possible. Maybe it is in the Perl6 grammars; they're throwing the kitchen sink into that thing :). The best I've seen is that the creator of pyparsing hacked a grammar object to change how it parses the "ABC" part after a different object parsed the "3" part. This basically uses the grammar object as a global variable. It works in a single-core world, but it won't work in the many-core world we're headed for.

But now you're getting bored with me because it's silly ringing my hands over such a simple format. I bet you could parse this format in just 4 lines of code:

def parse_sized_binary(stream):
    size_bytes, stream = stream.readUntil(":")
    size               = int(size_bytes)
    bytes, stream      = stream.read(size)
    return bytes, stream
You're right, it's really simple, at least until you have to embed this little "sized binary language" inside of a bigger language. Perhaps, like me, you need to embed binary data inside of a bigger data structure, and so you need to define a serialization for things like ints, floats, lists, and hash tables. For that stuff, you can define a grammer for a language like JSON or YAML and hand it over to a grammar-based parser library. But now you need a way to tell the library, "OK, when you see my binary data, hand control over to my four lines of code, and then I'll hand control back". You might even say "You know, why don't you just let me hand you any function of the form stream -> (value, stream) and let me control the parsing for a while".

Congratulations, you just invented monadic parsing. The function you wrote, parse_sized_binary, of the form stream -> (bytes, stream), is a Parser Monad. Monadic parsing is just combining lots of these little parsers together. And so, it's incredibly easy to insert arbitrary parsing code, like parse_sized_binary, into any parser.

As a complete example, using Pysec, here's a full parser for a language just like bencode but with support for floats and unicode. Notice that our parse_size_binary is renamed to bencode_sized and is given a more functional definition.

from Pysec import Stub, until, lift, read, many_until, branch

bencode_any     = Stub()
bencode_sized   = until(":") >> lift(int) >> read
bencode_unsized = until("e")
bencode_list    = many_until(bencode_any, "e")
bencode_any.set(branch({"s" : bencode_sized,
                        "u" : bencode_sized >> lift(lambda bytes : bytes.decode("utf8")),
                        "i" : bencode_unsized >> lift(int),
                        "f" : bencode_unsized >> lift(float),
                        "l" : bencode_list,
                        "d" : bencode_list >> lift(dict)}))
I think this is sort of like Greenspun's Tenth Rule. Any implementation of netstrings/sized-binary-data probably has an ad-hoc implementation of monadic parsing in it. Even my own non-monadic version of this parser (which I use for performance reasons) has an ad-hoc implementation of monadic parsing. I'm convinced that if you write a parser for a similar format, you'll implement/invent monadic parsing as well, even if you don't know it.

32 comments:

  1. Using this bencode example and the javascript example, combined with the rest of the source you posted earlier (the Pysec/Parsec in Python post), I tried to create something that can parse something that could otherwise be decoded as follows:

    def tlv(data):
    # data is a string for this example
      import struct
    # two shorts in network byte order
      type, len = struct.unpack('!HH', data[:4])
      value, data = data[4:len+4], data[len+4:]
      return (type,value), data

    I succeeded in using your monad classes and parser decorator to create something to read a short, but passing the length into the next parser (I tried to use read, like in the bencode example) didn't work. It seems likely that the source provided for the javascript example isn't exactly what it is inside the Pysec module you import from in the bencode example.

    Any tips?

    ReplyDelete
  2. Pysec has changed a bit since I posted about it the first time, mostly for performance reasons. If you'd like, I can post somewhere a more complete example that you can actually run. I'm never sure how popular a post will be, so I don't always go to the trouble to make a full, runnable example.

    ReplyDelete
  3. That's understandable. Posting all the code over and over again when the changes are minor could clutter up the posts pretty quickly. I don't know what your plans are for releasing Pysec are but I was wondering, is the code publically accessible anywhere? Those of us keeping up with your blog can keep up with the code too =)

    ReplyDelete
  4. Sorry about not getting to you sooner.

    My code isn't anywhere publicly, other than this blog. Do you know of a good place to host code informally? I guess Google Code looks pretty good, but what do I call the project? "Peter's Stuff"? I might want to release thing other than Pysec.

    Also, I find it hard enough making regular blog entries. I'm not sure how much time I'd have to dedicate to any kind of "project" right now. But perhaps if there's sufficient interest for it, I could find a little time.

    ReplyDelete
  5. Well, a formal "project" isn't really necessary, I was thinking just a pub directory somewhere ;-)

    Or, if you have a revision control system and a webhost, you could have a source browser (such as trac, or viewcvs).

    Just a suggestion, though. Thanks for the full source on the most recent post, by the way =)

    ReplyDelete
  6. Now that you have discovered github, please post post your work on this parser there. I love the simplicity and ease for defining a grammar.

    ReplyDelete
  7. Bencode isn’t context-free, so of course this isn’t possible. One of the reasons I think it’s a terrible format. If you restrict the length of Bytestrings you could probably describe it with a gigantic grammar, but I didn’t feel like figuring it out.

    ReplyDelete
  8. There's also an issue with the parser you posted; viz. it allows you to have things like this 00000000000000000005:hello which would be bad. :-) I found a lot of buffer overflows this way... most people aren't subtle enough in writing these parsers. I ended up writing one by hand for Bencode, but it took a long time. — What a total crap format.

    ReplyDelete

  9. Good post it's amazing the knowledge you have in your Niche

    xiaomi so popular


    ReplyDelete
  10. In my opinion, on https://stjosephpublicschoolsfoundation.org/essay-writing-rubrics-for-teachers/ you can get useful essay writing rubrics for teachers. I had such experience and it was great idea

    ReplyDelete
  11. This is my first time i visit here. I found so many entertaining stuff in your blog, especially its discussion. From the tons of comments on your articles, I guess I am not the only one having all the leisure here! Keep up the good work. I have been meaning to write something like this on my website and you have given me an idea.

    data science course in India

    ReplyDelete
  12. I just got to this amazing site not long ago. I was actually captured with the piece of resources you have got here. Big thumbs up for making such wonderful blog page!
    Artificial Intelligence Course

    ReplyDelete
  13. Thanks for sharing this informative content.,
    Leanpitch provides online training in Scrum Master Certification during this lockdown period everyone can use it wisely.
    Join Leanpitch 2 Days CSM Certification Workshop in different cities.

    Scrum master certification online

    CSM certification online

    CSM online

    CSM online certification

    ReplyDelete
  14. Thanks for sharing this informative content.,
    Leanpitch provides online training in Scrum Master Certification during this lockdown period everyone can use it wisely.
    Join Leanpitch 2 Days CSM Certification Workshop in different cities.

    Scrum master certification online
    CSM online training

    CSM training online

    Scrum master training online

    ReplyDelete
  15. I truly like your style of blogging. I added it to my preferred's blog webpage list and will return soon…
    Click Here
    Jackets on Aspire Leather

    ReplyDelete
  16. If your looking for Online Illinois license plate sticker renewals then you have need to come to the right place.We offer the fastest Illinois license plate sticker renewals in the state.Please Visit:

    Jackets on Genuine Shopping Store

    ReplyDelete
  17. I want to always read your blogs. I love them Are you also searching for Accounting Homework Help? we are the best solution for you. Contact us today and get the best. +1-(951)-468-9855 info@cheapassignmentwriters.com

    ReplyDelete
  18. This is how we do it! I know that sometime we all necessity assistance with essay nad you can be the best and first who learn! Check this help writing dissertation proposal and go for it! Have a kind day my friend!

    ReplyDelete
  19. It's very simple to find out any topic on net as compared to books, as I found this post at this site.
    온라인카지노
    온라인카지노사이트
    카지노

    ReplyDelete
  20. Hi there, I enjoy reading all of your post. I wanted to write a little comment to support you.

    토토사이트
    사설토토
    토토사이트추천

    ReplyDelete
  21. I have read so many articles or reviews regarding the blogger lovers except this paragraph is truly a good post, keep it up.

    스포츠토토
    먹튀검증
    스포츠토토티비

    ReplyDelete
  22. 토토
    배트맨토토
    스포츠토토

    This web site truly has all the info I wanted about this subject and didn't know who to ask.

    ReplyDelete
  23. The first thing you should do is to find a reputable homework assignment help provider. Make sure that the company you are considering is a legitimate company and not a scam. A legitimate company will provide you with all the necessary information and a free consultation to make sure that you are satisfied with their service.

    ReplyDelete
  24. A very good article. I am very happy to read this article. This article is too informative and helpful. I will share with my friends. Thanks for sharing this article and good knowledge. Now it's time to avail Locksmith in Leeds for more information.

    ReplyDelete
  25. Amr Helmy Company provides the best kitchen for sale with high quality and reasonable price using the finest materials and the best wardrobes design. Amr Helmy’s kitchen is covered by a ten-year warranty and immediate maintenance service,

    ReplyDelete
  26. قد يُلحق النمل في بعض الأحيان ضرراً في المنزل، أو أنّه ينقل البكتيريا ولكي تتخلص منه لابد أن تفهم طريقة عمل مستعمرة النمل وملكتها؛ لأنّ الملكة تستمر بوضع البيض، وترسل النمل العامل لجلب الطعام دائماً. مراقبة المسارات التي يمشي فيها النمل. تعامل مع المركز الالماني لانه يعتبر من افضل شركات مبيدات حشرية.

    ReplyDelete
  27. Students will find this site especially interesting because they can learn a lot without ever getting in touch with the healthcare assignment help. Come here to quickly and conveniently complete your task. Why do you need more now? hence, hurry.

    ReplyDelete
  28. Your blogs are really good and interesting. It is very great and informative. This basically uses the grammar object as a global variable. It works in a single-core world, but it won't work in the many-core world we're headed for bankruptcy lawyers in virginia beach. I got a lots of useful information in your blog. Keeps sharing more useful blogs..

    ReplyDelete
  29. The article is informative and helpful, this article is important for programming language. I hope you share more blog on this type. Now it's time to avail same day computer repair in Essex, MD for more information.

    ReplyDelete
  30. We are grateful that you have such high-quality information on your blog.divorce in new jersey laws.

    ReplyDelete
  31. This comment has been removed by the author.

    ReplyDelete

Blog Archive

Google Analytics