I recently gave a talk on the Sage Notebook server that we recently constructed for members of our university community. In that talk, I gave a short introduction to Cython. I’m going to post some of the examples from that portion of the talk here, and explain a bit about what’s going on behind the scenes. Let me start by saying that I think Cython is amazing. The main idea is to create compiled C objects from Python code, callable from Python code.

Why would you want to do such a thing? In as few words as possible: Sage’s Python balances a weakly typed environment with an environment that is friendly to code in. The problem is, the type casts and function calls that are made can cause Sage’s Python (or Python in general) to slow down considerably. By compiling C code with strongly typed variables, we push the important calculations much closer to efficient machine code.

I’m doing these specific examples in the Sage Notebook. Cython was originally developed as part of Sage. If you use your Google foo, you can find info about the history of Cython and so on. I really just want to look at a few examples here.

## Recursively Generated Fibonacci Numbers

Let’s say that you wanted to create a simple function in Python to calculate Fibonacci numbers. I wouldn’t recommend doing the following, because it is incredibly recursive and inefficient. However, sometimes when I need to peg the processing power of a machine and it has Python, the following function does get the job done.

def fibonacci(n): if n == 0: return 0 elif n == 1: return 1 else: return fibonacci(n-1) + fibonacci(n-2)

If I run this in Sage and ask for the runtime, I get the following.

time fibonacci(33) 3524578 Time: CPU 6.53 s, Wall: 6.53 s

So, we’ve found the 33rd Fibonacci number in roughly 6.53 seconds. (The CPU time in essence measures the actual computation time. The Wall time measures the overall time. It could happen that the machine becomes overloaded and your process sits in a queue before or while being run, in which case the wall time will increase while the CPU time shouldn’t change much.)

How can we improve this using Cython? From the standpoint of using the Sage Notebook, the only thing I really have to do in this instance is to change two very small details. The function will now look like this.

%cython def fibonacci(int n): if n == 0: return 0 elif n == 1: return 1 else: return fibonacci(n-1) + fibonacci(n-2)

The first thing I’ve changed is I’ve added “%cython” to the top line of the cell in the notebook worksheet where my code is located. This simply tells Sage that I want to use Cython instead of the usual Sage Python. The second thing I have done is that I’ve declared the variable “n” as “int n”. I’ve told Sage that the specific C datatype that I want to use for “n” is an int. The complication here is that you need to know something about C datatypes. C is strongly typed and, unlike Python, being competent in C means knowing what the types are and knowing precisely how those types get stored and accessed in system memory. More on that, and why it matters, in a bit. Let’s see how our Cython code runs in the same calculation as before.

time fibonacci(33) 3524578 Time: CPU 1.60 s, Wall: 1.61 s

Great. We went from 6.53 seconds to 1.60 seconds of computation time. Given that we did very little work to modify our code, that seems like a good deal. Now, I’ve glossed over a bit of a detail here. When I execute the cell containing my Cython function in the Sage Notebook, what actually happens is as follows. Cython converts chunks of my code to C, compiles that C code, and makes it accessible to Sage’s Python. When this happens, a link to the C code becomes available. In this case, the C code can be found here exactly as it was produced by Cython:

__home_sageserver__sage_sage_notebook_sagenb_home_hilljb_37_code_sage5_spyx.c

If you read that, you’ll notice that it’s considerably different than any reasonable human would write the function in C. But, it compiled and obviously sped up our calculation.

## The Sum of Integers From 0 to n

Here’s an example of where Cython can give much greater speedup, but where one also much be very careful. Let’s consider a basic Python function to sum all the integers from 0 to some positive integer n. We have something like the following.

def sum_all(n): sum = 0 for i in range(n+1): sum = sum + i return sum

Running this function on a suitably large integer goes something like this.

time sum_all(10^8) 5000000050000000 Time: CPU 19.11 s, Wall: 19.12 s

You’d be tempted to write the same function in Cython like this:

%cython def sum_all(int n): cdef int sum = 0 cdef int i = 0 while i <= n: sum = sum + i i = i + 1 return sum

The input value for the function is again a C int. Inside the function, we define the variable “sum” as a C int set initially to zero. We also define a C int variable named “i”, on which we will perform our recursion. When we execute this code, it compiles fine. When we run it, we get the following.

time sum_all(10^8) 987459712 Time: CPU 0.16 s, Wall: 0.16 s

There are two things of note here: (1) It ran MUCH faster than the Python version. (2) The answer returned is wrong. The reason why the answer is wrong is because I haven’t used the correct C datatypes. Any C programmer would know that a C int (depending on the system architecture and compiler) won’t be capable of holding the large integers in question. The data will simply overflow the available memory space and the program won’t care at all. We need to make sure that the C datatypes we use are actually capable of holding the data we’re going to throw at them. We need an integer type in this situation that will hold a larger integer. So, I’ll go all out and use unsigned long long ints instead. That looks like this:

%cython def sum_all(long long unsigned int n): cdef long long unsigned int sum = 0 cdef long long unsigned int i = 0 while i <= n: sum = sum + i i = i + 1 return sum

And, when we run the calculation, we get the speed-up that we want (roughly 174 times over the original Python version) and the correct answer, in this case returned as a Python long:

time sum_all(10^8) 5000000050000000L Time: CPU 0.11 s, Wall: 0.11 s