Python Is Faster Than C++
Some of my friends here at college have started taking the introductory C++ class, and one of them summed up the general opinion: "C++ makes me love Python." Syntax-wise, the jump from Python to C++ is pretty terrible, indeed. Still, I gave him the usual line about how C++ is faster and more efficient. Now I'm not sure that argument holds anymore.
I know, right? After telling him this, I wanted some conclusive evidence that C++ is indeed faster, to better prove my point. So I rigged up a quick test. First, I got the names of all the files and directories under /usr on my Ubuntu machine and put them in "files.txt":
```
$ find /usr > files.txt
$ wc -l files.txt
151246 files.txt
```
151246 lines, that's a fair number. Next, I ran this Python script to shuffle them:
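```python
from random import shuffle

# Read all the names; split() breaks on any whitespace.
with open('files.txt') as f:
    fnames = f.read().split()

shuffle(fnames)  # randomize the order in place

# Write the shuffled names back, one per line.
with open('files.txt', 'w') as out:
    for x in fnames:
        out.write(x + '\n')
```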
Then, I wrote two versions of a bubble sort, one of the slowest "standard" sorting algorithms, but easy to code. It loops over the list of items to be sorted, and whenever it finds two adjacent items that are out of order, it swaps them. If a whole pass makes no swaps, the list is sorted and it stops. I implemented this in Python:
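```python
def bubble(names):
    # One pass: swap adjacent out-of-order names; return True if anything was swapped.
    swapped = False
    for i in range(len(names) - 1):
        if names[i] > names[i + 1]:
            names[i], names[i + 1] = names[i + 1], names[i]
            swapped = True
    return swapped

names = open('files.txt').read().split()

# Note: bubble() returns True when it swapped something, so this loop stops
# after the first pass that makes a swap. A complete bubble sort would loop
# `while bubble(names): pass` instead.
while not bubble(names):
    pass

out = open('sorted.txt', 'w')
for x in names:
    out.write(str(x) + '\n')
out.close()
```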
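For comparison, here is a minimal sketch of the complete algorithm as described above, which keeps making passes until one makes no swaps. The names `bubble_pass` and `bubble_sort` are made up for this example; it is not the code that was timed below.

```python
def bubble_pass(names):
    """One pass over the list; return True if any adjacent pair was swapped."""
    swapped = False
    for i in range(len(names) - 1):
        if names[i] > names[i + 1]:
            names[i], names[i + 1] = names[i + 1], names[i]
            swapped = True
    return swapped


def bubble_sort(names):
    """Sort names in place, repeating passes until a pass makes no swaps."""
    while bubble_pass(names):
        pass
    return names


# Usage with a few names instead of all 151246.
print(bubble_sort(["/usr/lib", "/usr/bin", "/usr", "/usr/share"]))
# prints ['/usr', '/usr/bin', '/usr/lib', '/usr/share']
```

Each pass costs O(n), and a shuffled list needs close to n passes, so a complete sort of all 151246 names this way takes on the order of 10^10 comparisons.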
... and in C++:
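```cpp
#include <fstream>
#include <vector>
#include <string>
using namespace std;

// One pass: swap adjacent out-of-order strings (by swapping the pointers)
// and report whether any swap happened.
// Assumes at least one name: names.end() - 1 is invalid for an empty vector.
bool bubble(vector<string*>& names)
{
    bool switched = false;
    for (vector<string*>::iterator it = names.begin(); it < names.end() - 1; ++it)
        if (**it > **(it + 1)) {
            string* tmp = *it;
            *it = *(it + 1);
            *(it + 1) = tmp;
            switched = true;
        }
    return switched;
}

int main()
{
    ifstream in("files.txt");
    vector<string*> names;
    string x;
    while (in >> x)                 // read one whitespace-separated name at a time
        names.push_back(new string(x));

    // bubble() returns true when it swapped, so this loop exits after the
    // first pass that swaps anything; a full sort would loop while (bubble(names)).
    while (!bubble(names))
        continue;

    ofstream out("sorted.txt");
    for (vector<string*>::iterator it = names.begin(); it < names.end(); ++it)
        out << **it << endl;        // endl writes '\n' and also flushes the stream
    out.close();
}
```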
Well, there's the code/syntax complexity everyone talks about. But really, it's not that bad. Next, I compiled the C++ program:
```
$ g++ sort.cpp -o sort
```
Finally, I ran both programs under the "time" command, which measures how long a program takes to run; here, that indicates how fast each one sorted the list of file names. Here is the result for Python:
```
$ time python sort.py
real    0m1.355s
user    0m1.236s
sys     0m0.084s
```
... and the one for C++:
```
$ time ./sort
Segmentation fault
```
Oops! I should note that I spent a lot more time debugging my C++ code because of segfaults like this one. But in the end, it worked:
```
$ time ./sort
real    0m2.040s
user    0m0.708s
sys     0m1.332s
```
Here is a little breakdown of what these numbers mean:
- real: the wall-clock time from start to finish. It is not simply user + sys: time spent waiting (on the disk, or for other processes to get off the CPU) counts here but in neither of the other two, although in these two runs the numbers happen to come out close.
- user: the CPU time the program spent running its own code. When looping, this is what a program accumulates.
- sys: the CPU time the kernel spent working on the program's behalf, in system calls. For example, reading or writing a file counts here.
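To see that real time is measured separately from user + sys, here is a small Python sketch using `os.times()` (POSIX only: on Windows its `elapsed` field is always 0). Sleeping adds to wall-clock time without using the CPU, so real comes out larger than user + sys.

```python
import os
import time

start = os.times()

time.sleep(0.5)  # waiting: counts toward real time but uses (almost) no CPU

total = sum(i * i for i in range(2_000_000))  # computing: counts as user time

end = os.times()
real = end.elapsed - start.elapsed
user = end.user - start.user
sys_time = end.system - start.system

print(f"real {real:.2f}s  user {user:.2f}s  sys {sys_time:.2f}s")
```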
So it would appear Python spent more time working itself than C++ did (1.236s vs 0.708s of user time), but whatever C++ did under the surface was far slower than what Python did (1.332s vs 0.084s of sys time), and that is what made its "real" time longer.
So here, at least, is a case where Python is faster than C++.
Now I'm confused. Is Python really that efficient? No. I have seen firsthand how my Mandelbrot fractal generator written in Python was far outpaced by one written in C, and I know that C++ is not any slower than C. So what happened? Was the test I devised wrong? Is Python more efficient at string sorting than C++? Are there really cases where Python can be faster?
Maybe my readers know...
^ agreed.
In this case, I think the problem is that your Python I/O reads in the WHOLE file into working memory before operating on it, while in your C++ version, you're reading in the file piecemeal, resulting in more I/O calls. Maybe (in the C++ version) reading the whole file into a string and then using an istringstream would make the I/O times more comparable?
Also, the '<' comparison for iterators is only defined for iterators that model the RandomAccessIterator concept. Admittedly, you are using a random-access container (std::vector), but it's more conventional to use != as the loop condition, since that works for any iterator.
I would compare the running time of open('files.txt').read().split() on its own with the running time of the whole ifstream loop.
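A minimal Python sketch of that measurement, timing only the read-and-split step with `time.perf_counter()`. The helper name `timed_read` is hypothetical, and a small temporary file stands in for files.txt so the example runs anywhere; the C++ side could be timed the same way around the ifstream loop with `std::chrono::steady_clock`.

```python
import os
import tempfile
import time


def timed_read(path):
    """Read the whole file at once and split it; return (names, seconds taken)."""
    start = time.perf_counter()
    with open(path) as f:
        names = f.read().split()
    return names, time.perf_counter() - start


# A tiny stand-in for files.txt; point timed_read at the real file to measure it.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("/usr\n/usr/bin\n/usr/lib\n")
    sample_path = tmp.name

names, secs = timed_read(sample_path)
os.remove(sample_path)
print(f"read + split: {len(names)} names in {secs:.6f}s")
```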
Looking at the sys times of the two programs, it's clear the C++ version is making far more system calls than the Python version, and that's probably because file I/O in Python (at the C layer) is written much more efficiently than the method you chose in your C++ version. As a rule, system calls are slow, and the fewer you make, the faster your program runs.
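One concrete source of extra system calls in the C++ version is `endl`, which writes a newline and also flushes the stream, so each output line can become its own write. A rough Python analogue (a sketch; the helper `write_lines` is hypothetical) writes the same lines once with a flush after every line and once with normal buffering. The output files are identical, but under `time` the per-line flushing is likely to show up mostly as extra sys time; exact numbers depend on the system.

```python
import os
import tempfile
import time


def write_lines(path, lines, flush_each_line):
    """Write lines to path and return the seconds taken.

    With flush_each_line=True, every line is handed to the OS immediately
    (typically one write() system call per line), similar to C++'s endl.
    """
    start = time.perf_counter()
    with open(path, "w") as out:
        for line in lines:
            out.write(line + "\n")
            if flush_each_line:
                out.flush()
    return time.perf_counter() - start


lines = [f"/usr/share/doc/file{i}" for i in range(100_000)]

with tempfile.TemporaryDirectory() as tmpdir:
    flushed_path = os.path.join(tmpdir, "flushed.txt")
    buffered_path = os.path.join(tmpdir, "buffered.txt")
    flushed = write_lines(flushed_path, lines, flush_each_line=True)
    buffered = write_lines(buffered_path, lines, flush_each_line=False)
    with open(flushed_path) as a, open(buffered_path) as b:
        same_output = a.read() == b.read()

print(f"flush per line: {flushed:.3f}s  buffered: {buffered:.3f}s")
```

In the C++ program, writing `'\n'` instead of `endl` gives the buffered behavior.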
Another rule is that I/O is slow regardless of the language you are using. This is why writing a typical web application in C++ is usually not worth it: it would be only marginally faster than the Python version, because most of the time is spent doing I/O.
For applications that are processor-bound rather than I/O-bound (like generating fractals), C++ is always going to kick Python's butt.
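For contrast, here is what processor-bound work looks like: a minimal Python sketch of the escape-time loop at the heart of a Mandelbrot renderer (not the generator mentioned in the post, which isn't shown). It does nothing but complex arithmetic, so nearly all of its running time is user time, which is the kind of loop where a compiled C or C++ version typically pulls far ahead.

```python
def escape_time(c, max_iter=100):
    """Count iterations of z -> z*z + c (starting at z = 0) until |z| > 2.

    Points that never escape within max_iter steps are treated as inside the set.
    """
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2:
            return n
        z = z * z + c
    return max_iter


# Render a coarse ASCII picture: '#' for points that stay bounded, '.' otherwise.
for row in range(11):
    y = 1.0 - row * 0.2
    print("".join(
        "#" if escape_time(complex(-2.0 + col * 0.05, y)) == 100 else "."
        for col in range(61)
    ))
```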