Using Coconuts - a Pythonic Blog

Username:

Password:


Don't have an account? Get one!

Python Is Faster Than C++: An Objection

http://blog.opensourcenerd.com/upload/picard-facepalm

Boy, was I wrong. Or right, I suppose. One of my readers did know what I did wrong in my previous post about how Python is faster than C++ at bubble-sorting a list of files. I was also wrong about the execution time overall of such a task, which should have tipped me off. A proper bubble sort of 150000 items would have taken almost an hour to accomplish. The fact mine was only taking a few seconds should have been a cause for worry.

The problem was in my while loop. It should have been:

while bubble(names):
    pass

Because otherwise it would end because it did a swap, and wouldn't really sort. So, okay, I had two programs that did the same stupid thing, except Python did the stupid thing faster than C++ did. That was because I used endl to end my lines in C++. For my more computer-savvy readers, endl does a flush after the line, every time.

For those less computer-savvy, imagine you were being dictated something to write down on paper. Wouldn't it be faster to listen to the dictation then write it down sentence by sentence, rather than being told: "'The', write it down. 'Quick' write it down. 'Brown' write it down. 'Fox' write it down." ... et cetera.

http://blog.opensourcenerd.com/upload/doing-it-wrong

You are doing it very wrong.

So, the question is "what now?" Being a good programmer, I went back and fixed my code to actually sort. Upon running it, I noticed it would not finish running for a while, since bubble sort is so inefficient, so I got a different list of files, this time only about 1000 long. As a result, I got something to this tune:

Method Time
Python - Bubble Sort 2.856s
C++ - Bubble Sort 0.742s

Now that is the performance I was talking about. But wait, it gets better.

http://blog.opensourcenerd.com/upload/ohai-i-upgraded-your-ram

The GNU C compiler (aka GCC) has a command line option (-O2) at compilation time to optimize the generated bytecode of your program. This is risky on very complicated programs, where it may break some under the hood things, but for simple programs like this, it's great. Let's add that optimization to the table.

Method Time
Python - Bubble Sort 2.856s
C++ - Bubble Sort 0.742s
C++ - Bubble Sort Opt 0.249s

That's almost a three-fold increase in performance! I'm not sure quite what the optimizer does, but I don't think I'm leaving it off of any of my future programs!

Now, that friend of mine also sent me a C program to do the same thing, which was even faster because it didn't use slow-ish C++ features such as vectors. Here is his code:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

char *names[155000];
int nnames;

bubble(nsort) {
   int switched = 0;
   int i;
   for (i = 0; i < nsort; i++) {
       if (strcmp (names[i], names[i+1]) > 0 ) {
           char *t = names[i];
           names[i] = names[i+1];
           names[i+1] = t;
           switched = 1;
       }
   }
   return switched;
}

int main() {
   char buf[512];
   FILE *fin = fopen ("files.txt", "r");
   nnames = 0;
   while (fgets (buf, 512, fin)) {
       names[nnames] = malloc (strlen (buf) + 1);
       strcpy (names[nnames++], buf);
   }
   int nsort = nnames;
   while (nsort && bubble(--nsort))
       ;
   FILE *fout = fopen ("files.out", "w");
   int i;
   for (i = 0; i < nnames; i++)
       fputs (names[i], fout);
   exit (0);
}

Then I measured the time of that, plus the time with it optimized, and added it to my table. For fun, I also whipped up two quick solutions in C++ and Python using the respective built-in sort algorithms - Quicksort based, lightning fast sorting methods. This is the final tally:

Method Time (s)
Python – Bubble Sort 2.856
C++ - Bubble Sort 0.742
C++ - Bubble Sort Opt 0.249
C – Bubble Sort 0.130
C – Bubble Sort Opt 0.105
Python – Quick Sort 0.086
C++ - Quick Sort 0.049
C++ - Quick Sort Opt 0.042
http://blog.opensourcenerd.com/upload/shocked-cat

And, for those who can't quite grasp the difference in performance here, have a graph:

http://blog.opensourcenerd.com/upload/python-vs-c-performance

So, yeah. My intuition was correct that C++ is faster than Python. By far. So then, nothing new. Move along, now. Nothing to see here.

t3h_sood says... source permalink

Lullz. :-P

On a slightly unrelated note:

IIRC, on a properly inlining compiler, the only reason why a std::vector should be slower than a good ol' C-style array is because it dynamically allocates its memory (and by extension due to the increased possibility of cache misses, but we can overlook that :-P). If you used the 'reserve' method prior to repeatedly calling 'push_back' over and over again, you'd close the gap between C++ and C by preemptively allocating your necessary memory, avoiding unnecessary re-allocations and memory copies.

on 2010-01-25 00:25:08
bvargo says... source permalink

Nitpicking here, but I thought I should mention this nonetheless: C++ and Python are languages, not particular implementations of said languages, so saying C++ is faster than Python can never, in principle, be correct. What you really are saying here is that the GNU C++ compiler version x.y.z produces code that sorts some data faster than a python program running under CPython version a.b.c that uses the same algorithm on the same data. There are some awful C++ compilers out there, and there are better ones. Similarly, for the python world, there is Jython, IronPython, etc. In theory I could write a python compiler that produces native code, just as a GNU C compiler produces native code for my system. Lisp is a perfect example here. CLSIP can compile Lisp code to C, which is then compiled to native code. It can also interpret Lisp code directly. Both methods produce different performance results, even though both methods involve the language Common Lisp.

What versions of g++ and python are you using? Adding icc and clang/llvm to the mix would be interesting too. icc is really good with floating point numbers, though I have heard it is not as good as gcc with integers in some cases (parts of the Linux kernel, for example).

on 2010-01-25 05:39:27
t3h_sood says... source permalink

*ignores the nitpicking*

IIRC, LLVM's strength is interprocedural analysis over run-time and link-time, which doesn't appear to have much of a place in this simple application of bubble sort. But, yeah... it'd be interesting to read about, I guess.

That aside, say NO to language wars! Say YES to text-editor wars!

Emacs > Vim.

on 2010-01-25 07:01:46

t3h_sood, I approve.

And, bvargo, I just knew someone was going to nitpick me apart because it's the interpreter/compiler that's really determining the speed, and not the language itself. So, for purposes of clarification:

$ python --version
Python 2.6.4
$ gcc --version
gcc (Ubuntu 4.4.1-4ubuntu9) 4.4.1
$ g++ --version
g++ (Ubuntu 4.4.1-4ubuntu9) 4.4.1

That is, CPython, since it is the most common implementation.

So yes, GCC 4.4.1 makes my code run faster than Python 2.6.4 on similar tasks using similar methods.

on 2010-01-25 07:25:45
bvargo says... source permalink

Wow, Ubuntu is using GCC version 4.4 by default already? That surprises me. I've run into some problems with some big-name software (e.g. boost), so I'm still on GCC 4.3 on my desktop (and 3.4 on my ARM box, but we won't talk about that).

My nitpicking came after a few weeks of working with Visual Studio for one of my classes. It is not the latest version, I admit, but some of the code it produces is quite awful, so I have a feeling it actually could be slower than Java in some cases. Ruby is also quite fun in this area; simply using a different version of Ruby (MRI) can result in runtimes that are orders of magnitude apart in some cases.

If you do run some comparisons with LLVM, I'd be interested in hearing about it. My Clang/LLVM FreeBSD box is really out of date at the moment, so I haven't tried it myself. Yes, LLVM is known for cross-file optimization where GCC cannot optimize, among other things, but it is still pretty good in other areas.

Editor wars: http://xkcd.com/378/. [xkcd.com] Enough said (with the possible addition that emacs supports tetris too).

on 2010-01-25 21:22:07
Scott says... source permalink

For what it's worth, here's a C version of this program using the qsort() standard library function and a little bit of smart memory management to decrease the number of malloc() calls and increase spatial locality. It also works with input files of an arbitrary number of lines, not hard coded like the C version in the original post. The result is a program that is over twice as fast as the C++ quick sort version (with the same compiler optimization levels).

$ time ./sort-c++

real 0m2.650s

user 0m2.495s

sys 0m0.155s

$ time ./sort-c

real 0m1.139s

user 0m1.024s

sys 0m0.114s

Results above were with a larger input file, 633510 lines, and gcc version 4.1.2 20080704 (Red Hat 4.1.2-48). I get similar results with gcc version 4.4.5 (Ubuntu/Linaro 4.4.4-14ubuntu5).

#include <stdio.h>

#include <string.h>

#include <stdlib.h>

int strpcmp(const void *a, const void *b) {

return strcmp(*(char const **)a, *(char const **)b);

}

int main() {

FILE *fin = fopen("files.txt", "r");

int c = 8192;

int i = 0;

int n = 0;

char *buf = malloc(c);

while(fgets(buf + i, c - i, fin)) {

n++;

i += strlen(buf + i) + 1;

if(i + 512 > c) {

c <<= 1;

buf = realloc(buf, c);

}

}

char **names = malloc(n * sizeof(char*));

int m;

i = 0;

for(m = 0; m < n; ++m) {

names = buf + i;

i += strlen(buf + i) + 1;

}

qsort(names, n, sizeof(char*), &strpcmp);

FILE *fout = fopen("files.out", "w");

for(i = 0; i < n; ++i) {

fputs(names, fout);

}

return 0;

}

on 2011-01-29 06:08:35.062316
New Comment
You're not logged in! Log in to be awesome!
Format: BBCode ReStructured Text

Author (max. 20 characters):