Problem: The following iterative sequence is defined for the set of positive integers:

\[n\rightarrow\begin{cases}n/2 & n \text{ even}\\ 3n+1 & n \text{ odd}\end{cases}\]

Using the rule above and starting with 13, we generate the following sequence:

\[13\rightarrow 40\rightarrow 20\rightarrow 10\rightarrow 5\rightarrow 16\rightarrow 8\rightarrow 4\rightarrow 2\rightarrow 1\]

It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1. Which starting number, under one million, produces the longest chain? Note: Once the chain starts the terms are allowed to go above one million.

Idea Behind a Solution

I’ll refer to the “Collatz length of $n$” as the number of terms in the chain from an integer $n$ down to 1 under the rule described above. If we were to calculate the Collatz length of each integer separately, from scratch, that would be incredibly inefficient. In Python, that brute-force approach would look something like this.

First Python Solution

import time
 
start = time.time()
 
def collatz(n, count=1):
    while n > 1:
        count += 1
        if n % 2 == 0:
            n = n/2
        else:
            n = 3*n + 1
    return count
 
best = [0,0] # [longest length found so far, its starting number]
for i in range(1,1000000):
    c = collatz(i)
    if c > best[0]:
        best[0] = c
        best[1] = i
 
elapsed = (time.time() - start)
print "found %s at length %s in %s seconds" % (max[1],max[0],elapsed)

Now, this will actually determine the solution, but it is going to take a while, as shown when we run the code.

found 837799 at length 525 in 46.6846499443 seconds

A Better Python Solution

What I’m going to do is cache the Collatz length of every integer below one million. The idea is that the cached values make calculating new Collatz lengths much cheaper. But we don’t want to record a length for every single number that appears in the chains we walk, because some of the chains climb well past one million — into the billions, in fact. Instead we’ll keep a list called TO_ADD, populated only with the numbers whose Collatz lengths aren’t yet known; as soon as a chain reaches a number with a known length, we can fill in the lengths of everything in TO_ADD that falls below one million and store them for repeated use.
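
For example, by the time the loop reaches $i = 6$, a single step $6 \rightarrow 3$ lands on a value whose length is already cached (collatz_length[3] is 8, filled in back when $i = 3$), so the length of 6 is simply $8 + 1 = 9$ and the rest of the chain never has to be walked.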

import time
 
start = time.time()
 
limit = 1000000
collatz_length = [0] * limit
collatz_length[1] = 1
max_length = [1,1]
 
for i in range(1,limit):
    n,s = i,0
    TO_ADD = [] # collatz_length not yet known
    while n > limit - 1 or collatz_length[n] < 1:
        TO_ADD.append(n)
        if n % 2 == 0: n = n/2
        else: n = 3*n + 1
        s += 1
    # collatz_length now known from previous calculations
    p = collatz_length[n]
    for j in range(s):
        m = TO_ADD[j]
        if m < limit:
            new_length = p + s - j
            collatz_length[m] = new_length
            if new_length > max_length[1]: max_length = [i,new_length]
 
elapsed = (time.time() - start)
print "found %s at length %s in %s seconds" % (max_length[0],max_length[1],elapsed)

This should return the same result, but in significantly less time.

found 837799 at length 525 in 5.96128201485 seconds
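
As a quick sanity check, the cached table can be compared against the brute-force collatz function from the first solution; this is just a sketch, and it assumes both snippets have been run in the same session.

# Consistency check: every cached length should match the direct computation.
# Assumes collatz() (first solution) and collatz_length (this solution) are
# both defined in the current session.
for k in range(1, 10000):
    assert collatz_length[k] == collatz(k)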

A First Cython Solution

If we take our original approach of computing each Collatz length from scratch and simply recompile it with Cython, it might already beat our cached Python code.

%cython
 
import time
 
cdef unsigned int collatz(unsigned long long n):
    # chain values can climb well past 2**32, so n needs a 64-bit type
    cdef unsigned int count = 1
    while n > 1:
        count += 1
        if n % 2 == 0:
            n = n/2
        else:
            n = 3*n + 1
    return count
 
cdef find_max_collatz(unsigned int min, unsigned int max):
    cdef unsigned int m = 1
    cdef unsigned long num = 1
    cdef unsigned int count = 1
    cdef unsigned long iter = min
    while iter < max:
        count = collatz(iter)
        if count > m:
            m = count
            num = iter
        iter += 1
    return num
 
start = time.time()
max_found = find_max_collatz(1,1000000)   
elapsed = (time.time() - start)
print "found %s in %s seconds" % (max_found,elapsed)

In fact, when executed, we find that it is significantly better than our efficient Python code.

found 837799 in 0.604798078537 seconds

This just goes to show that even a naive algorithm, once compiled to machine code, can drastically outperform efficient interpreted Python. But how far can we take this Cython refinement? What if we were to recode our more efficient algorithm in Cython? It might look something like this.

A Better Cython Solution

%cython
 
import time
from libc.stdlib cimport malloc, free
 
cdef find_max_collatz(unsigned long int max):
    cdef int *collatz_length = <int *>malloc(max * sizeof(int))
    cdef list TO_ADD # holds numbers of unknown collatz length
    cdef unsigned long iter, j, m, n, s, p, ind, new_length, max_length = 0
 
    # set initial collatz lengths
    iter = 0
    while iter < max:
        collatz_length[iter] = 0
        iter += 1
    collatz_length[1] = 1
 
    # iterate to max and find collatz lengths
    iter = 1
    while iter < max:
        n,s = iter,0
        TO_ADD = []
        while n > max - 1 or collatz_length[n] < 1:
            TO_ADD.append(n)
            if n % 2 == 0: n = n/2
            else: n = 3*n + 1
            s += 1
        # collatz length now known from previous calculations
        p = collatz_length[n]
        j = 0
        while j < s:
            m = TO_ADD[j]
            if m < max:
                new_length = collatz_length[n] + s - j
                collatz_length[m] = new_length
                if new_length > max_length:
                    max_length = new_length
                    ind = m
            j += 1
        iter += 1
 
    free(collatz_length)
    return ind
 
start = time.time()
max_collatz = find_max_collatz(1000000)
elapsed = (time.time() - start)
print "found %s in %s seconds" % (max_collatz,elapsed)

This gives us some relatively good results:

found 837799 in 0.46523308754 seconds

Still, it isn’t a great improvement over the naive Cython code. What’s going on? I bet the TO_ADD data structure could be changed from a Python list (notice the “cdef list” declaration) to a malloc’d C array. That will be a bit more work, but my gut instinct tells me this is probably the bottleneck in our current Cython code. Let’s rewrite it a bit.
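
(One way to test a hunch like this, though I won’t bother here, is Cython’s annotation mode: running cython --annotate on the module produces an HTML report that highlights every line interacting with the Python C API, and the list appends and indexing in the inner loops are exactly that kind of line.)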

%cython
 
import time
from libc.stdlib cimport malloc, free
 
cdef find_max_collatz(unsigned long int max):
    cdef int *collatz_length = <int *>malloc(max * sizeof(int))
    # chain values can overflow a 32-bit int, so store them at the same width as n
    cdef unsigned long *TO_ADD = <unsigned long *>malloc(600 * sizeof(unsigned long))
    cdef unsigned long iter, j, m, n, s, p, ind, new_length, max_length = 0
 
    # set initial collatz lengths and TO_ADD numbers
    iter = 0
    while iter < max:
        collatz_length[iter] = 0
        iter += 1
    collatz_length[1] = 1
    iter = 0
    while iter < 600:
        TO_ADD[iter] = 0
        iter += 1
 
    # iterate to max and find collatz lengths
    iter = 1
    while iter < max:
        n,s = iter,0
        while n > max - 1 or collatz_length[n] < 1:
            TO_ADD[s] = n
            if n % 2 == 0: n = n/2
            else: n = 3*n + 1
            s += 1
        # collatz length now known from previous calculations
        p = collatz_length[n]
        j = 0
        while j < s:
            m = TO_ADD[j]
            TO_ADD[j] = 0
            if m < max:
                new_length = collatz_length[n] + s - j
                collatz_length[m] = new_length
                if new_length > max_length:
                    max_length = new_length
                    ind = m
            j += 1
        iter += 1
 
    free(collatz_length)
    free(TO_ADD)
    return ind
 
start = time.time()
max_collatz = find_max_collatz(1000000)
elapsed = (time.time() - start)
print "found %s in %s seconds" % (max_collatz,elapsed)

Now, when we execute this code we get the following.

found 837799 in 0.0465848445892 seconds

That’s much better. So, by using Cython and writing things a bit more efficiently, the code runs roughly 1,000 times as fast as the original Python solution (46.68 seconds down to about 0.047 seconds).

C Solution

If I structure the algorithm in the same way, I don’t expect to gain much by rewriting things in C, but I’ll see what happens.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
 
int find_max_collatz(unsigned long max) {
    unsigned int collatz_length[max];   /* lengths stay small, but this VLA is ~4 MB of stack */
    unsigned long TO_ADD[600];          /* chain values overflow a 32-bit int */
    unsigned long iter, j, m, n, s, p, ind, new_length, max_length = 0;
 
    // set initial collatz lengths and TO_ADD numbers
    iter = 0;
    while(iter < max) {
        collatz_length[iter] = 0;
        iter++;
    }
    collatz_length[1] = 1;
    iter = 0;
    while(iter < 600) {
        TO_ADD[iter] = 0;
        iter++;
    }
    // iterate to max and find collatz lengths
    iter = 1;
    while(iter < max) {
        n = iter;
        s = 0;
        while(n > max - 1 || collatz_length[n] < 1) {
            TO_ADD[s] = n;
            if(n % 2 == 0) n = n/2;
            else n = 3*n + 1;
            s++;
        }
        // collatz length now known from previous calculations
        p = collatz_length[n];
        j = 0;
        while(j < s) {
            m = TO_ADD[j];
            TO_ADD[j] = 0;
            if(m < max) {
                new_length = collatz_length[n] + s - j;
                collatz_length[m] = new_length;
                if(new_length > max_length) {
                    max_length = new_length;
                    ind = m;
                }
            }
            j++;
        }
        iter++;
    } 
    return ind;
}
 
 
int main(int argc, char **argv) {
    unsigned int max, i;
    time_t start, end;
    double total_time;
 
    start = time(NULL);
 
    for(i=0;i<1000;i++) max = find_max_collatz(1000000);
 
    end = time(NULL);
    total_time = difftime(end,start);
 
    printf("%d found in %lf seconds.\n",max,total_time);
 
    return 0;
}
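
To build and time this, something along the lines of gcc -O2 problem-14.c -o problem-14 works; take the source file name and optimization level as illustrative rather than exact.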

We’re running the search 1,000 times to get a good estimate of how long a single run takes.

$ time ./problem-14
837799 found in 38.000000 seconds.
 
real	0m38.871s
user	0m38.120s

So, in C we’re getting each run in roughly 0.038 seconds, which is slightly faster than the 0.047 seconds from our best Cython code.
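
To recap the whole progression: brute-force Python took 46.68 seconds, cached Python 5.96 seconds, brute-force Cython 0.60 seconds, cached Cython with a Python list 0.47 seconds, cached Cython with C arrays 0.047 seconds, and the C version roughly 0.038 seconds per run.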

