Programming

Best In Class: Python vs Clojure - Evolving

steloflute 2012. 9. 6. 00:22

http://www.bestinclass.dk/index.clj/2009/10/python-vs-clojure-evolving.html


This is an interesting time to review programming languages! On the one hand I see a general tendency that companies and individuals are leaving the old (un)safe paths and are embracing 'newer' technologies and on the other we see some of these new-comers now moving into maturity. Python has been with us since the late 1980's and this articles take a peek at how Python has handled stepping into the new millennium.

Pythonic overview

Python boasts of being a multi-paradigm  programming language, suitable for Object Oriented Programming (OOP), Functional programming, Imperative programming and even concurrency. When you run your python code it's actually compiled to intermediate bytecode which is then interpreted by the VM. That's similar to Clojure code, compiling to Java Bytecode and being run by the JVM.

Python has a unique syntax and comes in a few different flavors. Finally it's made by Guido Van Rossum who still oversees development.

A word on Syntax

When you are reviewing Python code you'll notice that it all looks very much the same. That's because Python enforces certain amounts of whitespace, 4 units when indenting. This comes with certain benefits, uniform code, no braces/delimiters needed, etc. But on the downside, you don't get to decide what your code looks like, Guido does. This isn't a case of right/wrong, everything is a trade-off but personally I want to be the one who decides what my code looks like.

Not directly related to Syntax comes the issue of scope, which is understood by your level of indenting:

x = 5
if (x == 5):
    y = 10
    print x  # print 5
print y # print 10

The last print statement outputs "10" which is surprising to me, since y has never been declared in that given scope. I don't quite know how Pythonists handle this, but I can imagine many situations where this will be the cause of much confusion. To resolve it you have to manually call 'del y', which is a type of manual garbage collection. I don't have the numbers for Python, but I can't count the number of C app's that have suffered from memory-leaks and what not, based on the lack of automated garbage collection. With Pythons use of scope, I imagine quite a few bugs will follow.

Functional Python

Someone once remarked, that a truly functional programming language is only good for heating your computer. You need at least to generate some output, which can be considered a side-effect. The majority of the Python code that I've seen, has been object oriented and imperative in structure, but there has been written a lot of material on Pythons functional side, some of which I'll look at here. Clojure isn't what you would call a purely functional language, but it comes so close that you really have to concentrate to start defying the main principles. Python, wanting to cater to both sides, doesn't provide the same level of cover, although it does sport immutable collections internally this is of little value when the user representation is still subject to change. To give an example of what I mean, I'll first show a Clojure example and then the corresponding Python code:

(let [x {:foo "bar" :baz 5}]
   (assoc x :quux 10)   ; this returns {:foo "bar", :baz 5, :quux 10}
   x)
{:foo "bar", :baz 5}
    === PYTHON ===

>>> x = {"foo":"bar", "baz":5}
>>> x["quux"] = 10
>>> x
{'quux': 10, 'foo': 'bar', 'baz': 5}

These 2 programs do basically the same thing. In the Clojure version I let 'x' be a hash-map and then associate into that the key/val pair :quux 10. Same thing goes on in Python, just using a 'dict' instead of a hash-map. The only noticeable difference in the definitions is the syntax of course, where Python does not support keyword. Clojure doesn't care about whitespace and regards commas as being such.

Then in the last line of both programs we evaluate 'x' to see which value it holds. The key difference is of course, that Clojures 'x' didn't change while Pythons did. Now in one Python doc I found they actually refer to this type of data as immutable, because if I alias x as y and change x, y remains the same. But this is very different from Clojures concept where immutable means: 'x' never changes. I can make decisions based on my perception of 'x' since it won't change. As we will see later, this makes all the difference in concurrent programming.

Lambda

Python supports anonymous functions called lambdas, these are limited to being 1-liners. For people likeCGrand that might not be a problem, but for me it limits functionality. If you combine Pythons First-Class approach to functions with lambdas, you get to do some nice functional programming. Firstly, lets try to implement Pythons counter to Clojure's predicate even?

Clojure:

>> (filter even? (range 10))
( 0 2 4 6 8 )

Python:

>>> even? = 2
  File "", line 1
    even? = 2
        ^
SyntaxError: invalid syntax

Ok, so that was a no go and the reason is I named my Python predicate as I would name a Clojure predicate "even?". Python doesn't support non-alphanumerical characters in definition names, so I'll have to switch to a Common Lisp inspired style instead. To Pythons credit it actually pinpointed the exact character which caused the exception - You shouldn't except that much help from Clojures backtraces....yet.

>>> evenp = (lambda n: (n % 2) == 0)
>>> filter(evenp,range(10))
[0, 2, 4, 6, 8]

Selecting the name evenp lets me save the anonymous function as a predicate. Using the built-in 'filter' I can then interact with a range of numbers just like in Clojure. On both accounts this is purely functional.

Taking it out for a spin

Let's take a look at the Great Funtional Stomping Ground: Project Euler. I think the first 40 or so problems, have been solved using a functional style of Python, so lets compare notes on a couple of them. The Python code is taken from: here.

Euler 1: Find all numbers dividable by 3 or 5 in 0 < x < 1000:

(reduce +
       (filter #(or (zero? (rem % 3))
                    (zero? (rem % 5)))
            (range 1000)))
multiple = lambda x, y: x % y == 0

print sum(x for x in xrange(1, 1000+1)
         if multiple(x, 3) or multiple(x, 5))

On the left we have Clojure, which reads: Take the sum (reduce using addition), of filtering on my predicate from the list (range 1000). My predicate being a normal 'or' statement asking if the modulus value % (our argument) of 3 and 5 respectively is zero.

On the right we have Python who uses a lambda to test for candidates, then uses list comprehension to go through a for loop and pass the result to sum(). I wonder why they didn't use filter. For such a simple task, both languages are almost identical. They don't impose, they don't complicate and they don't add ceremony.

Euler 2: Get the sum of all even Fibonaccis below 4 mill.

So we have the same elements as before

  • getting a sum
  • filtering on a predicate

Compared to E1 the only new thing here is getting the Fibonnaci numbers. The Python crew resolved to generating a stream of these fibonnaci numbers, so I'll do the same.

(def fib-seq (lazy-cat [0 1]
     (map + fib-seq (rest fib-seq))))

(reduce +
      (filter even?
            (take-while #(< % 4000000)
                 fib-seq)))
from itertools import takewhile, count

def ireduce(func, iterable, init=None):
    """Like reduce() but using iterators (also
    known also scanl). Non-functional version"""
    if init is None:
        iterable = iter(iterable)
        curr = iterable.next()
    else:
        curr = init
        yield init
    for x in iterable:
        curr = func(curr, x)
        yield curr

def fibonacci():
    get_next = lambda (a, b), _: (b, a+b)
    return (b for a, b in ireduce(get_next, count(), (0, 1)))

candidates = takewhile(lambda n: n < 4000000, fibonacci())
print sum(x for x in candidates if x % 2)

On the left side we read the Clojure code like this: Take the sum of filtering the even numbers, from taking Fibonacci numbers until the item is >= 4 million. I'll admit the fib-seq definition is not intuitively understood for non functional programmers, but that just makes for a good challenge :)

On the right hand side, Python is showing some teeth. Basically what they're doing is defining an ireduce function, which iterates a collection accumulating the result of applying the first arg to every item (the result is similar to using Clojures reduce). Using this function they then define the fibbonacci sequence using a very cryptic statement - and from that point, it's comparable to Clojure.

Alright, last one.

Euler 4: Finding Palindroms

Almost the same components as the above. Instead of getting the sum we get the max, but we're still filtering on a predicate from some list. So we should expect to see almost similar programs to those above, right...?

(reduce max
    (filter #(let [s (str %)]
               (= (seq s) (reverse s)))
            (for [x (range 100 1000)
                  y (range 100 1000)]
              (* x y))))
from itertools import ifilter
import operator

def mul(nums):
    return reduce(operator.mul, nums)

def icross(*sequences):
    if sequences:
        for x in sequences[0]:
            for y in icross(*sequences[1:]):
                yield (x,)+y
    else: yield ()

def digits_from_num(num, base=10):
    def recursive(num, base):
        if num < base:
            return [num]
        return [num%base] + recursive(num/base, base)
    return list(reversed(recursive(num, base)))

def is_palindrome(num, base=10):
    digitslst = digits_from_num(num, base)
    return (digitslst == list(reversed(digitslst)))

def euler4(lstlst):
    canditates = (mul(ns) for ns in icross(*lstlst))
    return max(ifilter(is_palindrome, canditates))

print euler4(2*[range(111, 1000)])

Clojure version reads like this: Take the max from the result of filtering all items where the string representation is equal the the reverse string representation from the sequence which is formed by multiplying all ints i [100-1000] with [100-1000]. When you read and write these stream functions fluently, Euler suddenly becomes a lot easier, as you can see several problems conform to that method. Notice how Clojures lambdas blend into the rest of the code, not being restricted to one-liners.

On the right hand side we have Python. The Python team opted to reverse sequences of ints instead of translating to a string representation, this is handled in digits_from_num, the result is later passed to is_palindrome which basically does the same as my predicate. There are 2 thing which seem to be common for most of the Python attempts and thats nested 'for loops' and reliance on the yield statement. This helps them get to an amazing 29 lines for this simple problem and also makes room for quite a bit of error, although to their credit there is none.

And although it can be a bit premature to speak about effenciency, their very verbose solution runs at almost15 seconds, which is about 300% slower than the Clojure solution.

Concurrency

It is common knowledge we've hit the Gigahertz wall! When the hardware vendors realized that they couldn't squeak more Ghz out of your processor they decided to put more processors in your system. Our programming languages now need to enable us hammer away at multiple cores, sharing data and dividing labor across many processes, basically: Concurrency. As I've demonstrated in times past, Clojure does this using the STM, Refs, Atoms and Agents. Python doesn't have an STM but it has thread-support and does asynchronous development as well. But before we move on, we need to clarify the term concurrency even further because, like immutability, it's actually not understood the same way by Python programmers as it is by Clojurians.

Concurrency defined

One way to illustrate a property of concurrency is by the allowed activity of worker-threads (T) in your system. It's desirable to a relatively high count of worker-threads working simultaneously to allow for better distribution of labor. That of course comes with quite a few challenges in terms of controlling access to shared data. Here's an illustration

p-vs-c

For Clojure this means, that if you are computing a problem in parallel on 2 cores you get 2x performance minus a scheduling penalty, ditto for 3+ cores.

For Python this means, that no matter how cleverly you distribute your labor Python will enforce sequential execution, giving you no added performance + a 'scheduling' penalty.....?

Meet Gil. Gil is Pythons "Global Interpreter Lock". Basically all threads in Python work under a contract: He who holds the lock, gets to work. And in case you're wondering, this isn't  joke nor a bug in Python per say. Some years ago the core developers realized that this wouldn't fly as core-counts increased, so they removed the GIL and (as I recall) introduced some type of fine grain locking, but the challenge proved too difficult, so this was the result:

[we tried this] once before (in the late ’90s) and the resulting interpreter ran twice as slow

This isn't good considering we're coming up on 2010 and the amount of cores continues to rise. But ok, how much are we paying? David Beazly did a thorough decompilation of the Python VMs handling of concurrent requests and I'll share a few of his findings here.

An example

He made a small countdown function and ran it twice

def count(n):
    while n > 0:
        n -= 1

count(100000000)
count(100000000)

That takes about 30 seconds and thats a 100% serial execution, meaning you count down once, when done you count down again. So what would we expect if we multi thread this operation, so we have both count downs starting in direct succession of one another? 16-17 seconds perhaps? Lets try

import threading

t1 = threading.Thread(target=count,args=(100000000,))
t1.start()
t2 = threading.Thread(target=count,args=(100000000,))
t2.start()
t1.join(); t2.join()

That runs at 43 seconds! That means if you distribute labor in Python, you get a scheduling penalty of 143%.

For comparison I'll show you how to do a comparable benchmark in Clojure:

(defn countdown [n] (when (pos? n) (recur (dec n))))

(time (dotimes [i 2] (countdown 100000000)))
"Elapsed time: 20228.413955 msecs"

(time (doall (pmap countdown (repeat 2 100000000))))
"Elapsed time: 10656.830477 msecs"

The definition of countdown is self-explanatory I hope. First I called the countdown twice using (dotimes [i 2] ...), this is sequential and takes about 20.2 seconds. Then I call it in a pmap (parallel-map) using 'doall'  to force the evaluation, utilizing both of my cores. The results are as you would expect: Much faster than python when run sequentially, much much faster when run in parallel.

Based on my explanation thus far you'd think Python has almost the same speed in parallel as when run sequentially. The last percentages of performance that go missing are based on the way the GIL is implemented. If you want to dig deeper into that, check out the link to Dave Beazlys presentation above.

So Python isn't geared for concurrent programming, it doesn't support you in any meaningful way, so the question is this: Will that change anytime soon? I'll quote Guido:

…you just have to undo the brainwashing you got from Windows and Java proponents who seem to consider threads as the only way to approach concurrent activities.

Just because Java was once aimed at a set-top box OS that didn’t support multiple address spaces, and just because process creation in Windows used to be slow as a dog, doesn’t mean that multiple processes (with judicious use of IPC) aren’t a much better approach to writing apps for multi-CPU boxes than threads.

Just Say No to the combined evils of locking, deadlocks, lock granularity, livelocks, nondeterminism and race conditions.

So what's he saying? He's basically saying that instead of providing solutions to the challenges inherent to concurrent programming, he would rather that you chop up your program into several instances who then communicate using some type of IPC (inter-process-communication method). Leaving Pythonists with IPC is basically saying: "You deal with it, I'm not going to", because this can be anything from from pipes, sockets, you name it. To me the deciding blow is not that Python is unable to meet the requirements of 2009, but rather that its founder is unwilling to improve on this position.

A man with a thousand knives...

...of which none are sharp. It's a chinese proverb which is typically used about martial artists who hold 5+ black belts, but can also be used to describe Python. It seems that in their attempt to support multiple paradigms, they've over complicated things.

In an earlier unrelated blogpost I commented on Pythons poor multi-threading capabilities and aligned with its imperative nature. Some commentators found that to be unreasonable but it's not quite so. The definition of an imperative language goes something like this

In computer science, imperative programming is a programming paradigm that describes computation in terms of statements that change a program state.

When all your data-containers (call them variables for instance) are subject to change (mutable) how can you guarantee a consistent view of the world? If you have 10 threads working on the same data and that data is constantly being altered nobody can make good decisions looking at that data, because the grounds for decision making are all volatile. So you need some way to control this change or to control the workers who can make the change. Pythons GIL is an example of how (not) to handle it, by freezing all workers except one. Try starting the above multi-threaded Python example and hit Ctrl-C, you'll see that Python doesn't even react because the 2 threads are now battling for the attention of the Gil which is constantly context-switching and delays the processing of events.

Easy Access

So since Python has gained so much traction lately, what's the great attraction? What's drawing in the crowds? My best guess would be

  1. libraries
  2. no learning curve
  3. huge community

There are library for everything (hence the slogan 'batteries included'), it reads almost like english and you can google your way out of anything. These are by no means bad things and only on point #1 can Clojure compete. The thing you should notice however, is that all of these are related to 'ease' and not 'quality'.

Conclusions

Generally I think nobody would argue that Python has a very clear syntax, but I truly dislike the fact that Guido gets to decide how I manage my white-space - I'm not free to arrange/align my data the way I want to. This is a style/taste point and I've known skilled competent programmers who prefer this type of rigid syntax. And rightfully so. In a workplace situation, it might not be Guido but a manager who needs to step in and declare some conventions.

I think as an introductory language I might expose a young student to Python for a brief period, then move on to more well-grounded languages before too many bad habits are picked up. Guido is not preparing to transition into modern programming, rather he says that it is to be avoided. With that in mind, Python isn't likely to let you have some fun with all coming challenges within concurrent programming and in my oppinion that disqualifies it for a lot of the serious programming challenges we'll be facing soon and are now facing. It also means that when you've coded your prototype and find out it's performance heavy, your next step is to buy another server chop the input in half and feed each it's portion. I don't dare estimate how many resources are being wasted on this already.

Having mutability be the default, you're teaching yourself a treacherous way of programming, which I guarantee will lead you into many bugs along the way. Again consider how the attraction is 'ease'. It's not difficult to go from only using immutable values to using mutable ones, but it's quite another story to go the other way. I personally don't mind a steep learning curve as long as the payoffs are worth it. Consider it an investment.

The thing that drew hard on me when I was taking on my first Lisp was this question:

"Are you smarter than your programming language?"

If you are, you'll know it because as soon as you can imagine some outline of your programs behavior you're hammering away at the keyboard and 25 - 50 lines later you see that behavior being implemented. With Lisp/Clojure you can sit and ponder your next expression for quite some time, but once you write it down it'll pack a ton of functionality. To me at least, Python falls in the first category and that comes with a huge price-tag: Your horizon isn't being expanded as it should. I read somewhere a staggering statistic, that theseniority and experience of developers wasn't directly related to the quality of their code! How can this be? Well consider a C programmer who has spent 10 years manipulation pointers. What insights has he gained into architectural challenges, concurrency pitfalls, etc? Very little, he spends his days speaking baby-language (low-level) to the compiler in order to get the behavior he wants, at the cost of personal development.

And don't get me wrong, I'm not saying this to put anybody down but rather to draw some attention to the fact that while we're sharpening our skills within one specific language, there's also a personal/mental side that's needs developing. And to do that we need to be challenged in more ways than how to find functions in libraries. For me, Clojure is a big part of that but of course your mileage will vary.

Hope you enjoyed the read,

/Lau

Note: This post has a sequel: here

Update: In version 1.0 of this post (which is now a few hours old) there was an incorrect assumption about Pythons whitespace policy, which has since been corrected. Whitespace policy is enforced only in regards to indenting.


'Programming' 카테고리의 다른 글

Clojure의 장단점  (0) 2012.09.07
Try Clojure  (0) 2012.09.06
Performance in DrRacket  (0) 2012.09.05
Inferior Emacs Lisp Mode (ielm)  (0) 2012.09.05
(Go) Structs and Interfaces  (0) 2012.09.04