A triangular pyramid is constructed using spherical balls so that each ball rests on exactly three balls of the next lower level.

Then, we calculate the number of paths leading from the apex to each position: A path starts at the apex and progresses downwards to any of the three spheres directly below the current position. Consequently, the number of paths to reach a certain position is the sum of the numbers immediately above it (depending on the position, there are up to three numbers above it).

The result is Pascal’s pyramid and the numbers at each level n are the coefficients of the trinomial expansion $(x + y + z)^n$. How many coefficients in the expansion of $(x + y + z)^{200000}$ are multiples of $10^{12}$?

Solution Using the Multinomial Theorem

The generalization of the binomial theorem is the multinomial theorem. It says that a multinomial raised to a power $n$ can be expanded using the formula
\[
(x_1+x_2+\cdots+x_m)^n=\sum_{{k_1+k_2+\cdots+k_m=n}\atop{0\le k_i\le n}}\left({n}\atop{k_1,k_2,\ldots,k_m}\right)\prod_{1\le t\le m}x_t^{k_t}
\]
where
\[
\left({n}\atop{k_1,k_2,\ldots,k_m}\right)=\frac{n!}{k_1!k_2!\cdots k_m!}.
\]
Of course, when $m=2$ this reduces to the binomial theorem. The sum is taken over all decompositions $k_1+k_2+\cdots+k_m=n$ into nonnegative integers $k_i$. If $n=200000$ and $m=3$, then the terms in the expansion are given by
\[
\left({200000}\atop{k_1,k_2,k_3}\right)x_1^{k_1}x_2^{k_2}x_3^{k_3}=\frac{200000!}{k_1!k_2!k_3!}x_1^{k_1}x_2^{k_2}x_3^{k_3}
\]
where $k_1+k_2+k_3=200000$. It’s worth pointing out that each of the coefficients is an integer, and thus has a unique factorization into products of prime integers. Of course, there’s no way that we’re going to calculate these coefficients. We only need to know when they’re divisible by $10^{12}$. Thus, we only need to consider how many factors of 2 and 5 are involved.

First, we’ll create a function $p(n,d)$ that outputs how many factors of $d$ are included in $n!$. We have that
\[
p(n,d)=\left\lfloor\frac{n}{d}\right\rfloor+\left\lfloor\frac{n}{d^2}\right\rfloor+\left\lfloor\frac{n}{d^3}\right\rfloor+ \cdots+\left\lfloor\frac{n}{d^r}\right\rfloor,
\]
where $d^r$ is the largest power of $d$ not exceeding $n$ (this is Legendre's formula, valid for prime $d$). For instance, there are 199994 factors of 2 in $200000!$. Since we’re wondering when our coefficients are divisible by $10^{12}=2^{12}5^{12}$, we’ll be using the values provided by $p(n,d)$ quite a bit for $d=2$ and $d=5$. We’ll store two lists:
\[
p2=[p(i,2)\text{ for }0\le i\le 200000]\quad\text{and}\quad p5=[p(i,5)\text{ for }0\le i\le 200000].
\]
For a given $k_1,k_2,k_3$, the corresponding coefficient is divisible by $10^{12}$ precisely when
\[
p2[k_1]+p2[k_2]+p2[k_3]<199983\ \text{and}\ p5[k_1]+p5[k_2]+p5[k_3]<49987.
\]
That is, these conditions ensure that the numerator $200000!$ carries at least 12 more factors of 2, and at least 12 more factors of 5, than the denominator $k_1!\,k_2!\,k_3!$ does, since $p(200000,2)=199994$ and $p(200000,5)=49998$.
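As a quick sanity check (a throwaway sketch, not part of the solution code below), Legendre's formula and the two thresholds can be verified directly:

```python
def p(n, d):
    """Number of factors of the prime d in n! (Legendre's formula)."""
    count, power = 0, d
    while power <= n:
        count += n // power
        power *= d
    return count

# The coefficient 200000!/(k1! k2! k3!) is divisible by 2^12 exactly when
# p(k1,2) + p(k2,2) + p(k3,2) <= p(200000,2) - 12, and similarly for 5.
print(p(200000, 2))  # 199994, giving the strict bound 199994 - 12 + 1 = 199983
print(p(200000, 5))  # 49998,  giving the strict bound 49998 - 12 + 1 = 49987
```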

Now, we know that $k_1+k_2+k_3=200000$, and we can exploit symmetry and avoid redundant computations if we assume $k_1\le k_2\le k_3$. Under this assumption, we always have
\[
k_1\le\left\lfloor\frac{200000}{3}\right\rfloor=66666.
\]
We know that $k_1=k_2=k_3$ is impossible, since 200000 isn't divisible by 3. It follows that we can only have (case 1) $k_1=k_2 < k_3$, or (case 2) $k_1 < k_2=k_3$, or (case 3) $k_1 < k_2 < k_3$.

In case 1, we iterate $0\le k_1\le 66666$, setting $k_2=k_1$ and $k_3=200000-k_1-k_2$. We check the condition, and when it is satisfied we record 3 new instances of coefficients (since we may permute the $k_i$ in 3 ways).

In case 2, we iterate $0\le k_1\le 66666$, and when $k_1$ is even (so that $200000-k_1$ is also even) we set $k_2=k_3=\frac{200000-k_1}{2}$. When the condition holds, we again record 3 new instances.

In case 3, we iterate $0\le k_1\le 66666$ and, for each $k_1$, over $k_2=k_1+a$ where $a\ge 1$ and $2a < 200000-3k_1$ (which is exactly the condition $k_2<k_3$). Then $k_3=200000-k_1-k_2$. When the condition holds, we record 6 instances (since there are 6 permutations of 3 distinct objects).
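The three cases really do account for every ordered triple: for a small even $n$ not divisible by 3 (I'll use $n=20$ as a stand-in for 200000), the 3/3/6 weighting recovers all $\binom{n+2}{2}$ ordered solutions of $k_1+k_2+k_3=n$. A quick sketch:

```python
def count_triples(n):
    """Count ordered (k1,k2,k3) with k1+k2+k3 = n via the three-case scheme.
    Assumes n is even and not divisible by 3, as with n = 200000."""
    total = 0
    for k1 in range(n // 3 + 1):
        total += 3                 # case 1: k1 = k2 < k3 (3 permutations)
        if k1 % 2 == 0:            # case 2: k1 < k2 = k3 (needs n - k1 even)
            total += 3
        a = 1
        while 2 * a < n - 3 * k1:  # case 3: k1 < k2 < k3 (6 permutations)
            total += 6
            a += 1
    return total

n = 20
print(count_triples(n), (n + 1) * (n + 2) // 2)  # 231 231
```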

Cython Solution

I’ll provide two implementations, the first written in Cython inside Sage. Then, I’ll write a parallel solution in C.

%cython
 
import time
from libc.stdlib cimport malloc, free
 
head_time = time.time()
 
cdef unsigned long p(unsigned long k, unsigned long d):
    cdef unsigned long power = d
    cdef unsigned long exp = 0
    while power <= k:
        exp += k / power
        power *= d
    return exp
 
cdef unsigned long * p_list(unsigned long n, unsigned long d):
    cdef unsigned long i = 0
    cdef unsigned long * powers = <unsigned long *>malloc((n+1)*sizeof(unsigned long))
    while i <= n:
        powers[i] = p(i,d)
        i += 1
    return powers
 
 
run_time = time.time()
 
# form a list of number of times each n! is divisible by 2.
cdef unsigned long * p2 = p_list(200000,2)
 
# form a list of number of times each n! is divisible by 5.
cdef unsigned long * p5 = p_list(200000,5)
 
cdef unsigned long k1, k2, k3, a
cdef unsigned long long result = 0
 
k1 = 0
while k1 <= 66666:
    # case 1: k1 = k2 < k3
    k2 = k1
    k3 = 200000 - k1 - k2
    if 199982 >= (p2[k1]+p2[k2]+p2[k3]) and 49986 >= (p5[k1]+p5[k2]+p5[k3]):
        result += 3
    # case 2: k1 < k2 = k3
    if k1 % 2 == 0:
        k2 = (200000 - k1)/2
        k3 = k2
        if 199982 >= (p2[k1]+p2[k2]+p2[k3]) and 49986 >= (p5[k1]+p5[k2]+p5[k3]):
            result += 3
    # case 3: k1 < k2 < k3
    a = 1
    while 2*a < (200000 - 3*k1):
        k2 = k1 + a
        k3 = 200000 - k1 - k2
        if 199982 >= (p2[k1]+p2[k2]+p2[k3]) and 49986 >= (p5[k1]+p5[k2]+p5[k3]):
            result += 6
        a += 1
    k1 += 1
 
 
free(p2)
free(p5)
 
 
elapsed_run = round(time.time() - run_time, 5)
elapsed_head = round(time.time() - head_time, 5)
 
print "Result: %s" % result
print "Runtime: %s seconds (total time: %s seconds)" % (elapsed_run, elapsed_head)

When executed, we find the correct result relatively quickly.

Result: 479742450
Runtime: 14.62538 seconds (total time: 14.62543 seconds)

C with OpenMP Solution

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
 
/*****************************************************************************/
/* function to determine how many factors of 'd' are in 'k!'                 */
/*****************************************************************************/
unsigned long p(unsigned long k, unsigned long d) {
    unsigned long power = d;
    unsigned long exp = 0;
    while (power <= k) {
        exp += k/power;
        power *= d;
    }
    return exp;
}
 
/*****************************************************************************/
/* create a list [p(0,d),p(1,d),p(2,d), ... ,p(n,d)] and return pointer      */
/*****************************************************************************/
unsigned long * p_list(unsigned long n, unsigned long d) {
    unsigned long i;
    unsigned long * powers = malloc((n+1)*sizeof(unsigned long));
    for (i=0;i<=n;i++) powers[i] = p(i,d);
    return powers;
}
 
/*****************************************************************************/
/* main                                                                      */
/*****************************************************************************/
int main(int argc, char **argv) {
    unsigned long k1, k2, k3, a;
    unsigned long long result = 0;
 
    unsigned long * p2 = p_list(200000, 2);
    unsigned long * p5 = p_list(200000, 5);
 
 
    #pragma omp parallel for private(k1,k2,k3,a) reduction(+ : result)
    for (k1=0;k1<66667;k1++) {
        // case 1: k1 = k2 < k3
        k2 = k1;
        k3 = 200000 - k1 - k2;
        if (p2[k1]+p2[k2]+p2[k3]<199983 && p5[k1]+p5[k2]+p5[k3]<49987) {
            result += 3;
        } 
        // case 2: k1 < k2 = k3
        if (k1 % 2 == 0) {
            k2 = (200000 - k1)/2;
            k3 = k2;
            if (p2[k1]+p2[k2]+p2[k3]<199983 && p5[k1]+p5[k2]+p5[k3]<49987) {
                result += 3;
            }
        }
        // case 3: k1 < k2 < k3
        for (a=1;2*a<(200000-3*k1);a++) {
            k2 = k1 + a;
            k3 = 200000 - k1 - k2;
            if (p2[k1]+p2[k2]+p2[k3]<199983 && p5[k1]+p5[k2]+p5[k3]<49987) {
                result += 6;
            }
        }
    }
 
    free(p2);
    free(p5);
 
    printf("result: %llu\n", result);
 
    return 0;
}

This can be compiled and optimized using GCC as follows.

$ gcc -O3 -fopenmp -o problem-154-omp problem-154-omp.c

When executed on a 16-core machine, we get the following result.

$ time ./problem-154-omp 
result: 479742450
 
real    0m1.487s

This appears to be the fastest solution currently known, according to the forum of solutions on Project Euler. The CPUs on the 16-core machine are pretty weak by modern standards. When running on a single core of a newer Intel Core i7, the result is returned in about 4.7 seconds.


Problem

Euler published the remarkable quadratic formula:

\[n^2+n+41\]

It turns out that the formula produces 40 primes for the consecutive values $n=0$ to $39$. However, when $n=40$, $40^2+40+41=40(40+1)+41=41\cdot 41$ is divisible by 41, and when $n=41$, $41^2+41+41$ is clearly divisible by 41.

Using computers, the incredible formula $n^2-79n+1601$ was discovered, which produces 80 primes for the consecutive values $n=0$ to $79$. The product of the coefficients, $-79$ and $1601$, is $-126479$.
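Both claims are easy to verify with a few throwaway lines (plain trial division, not the machinery used in the solutions below):

```python
def is_prime(n):
    """Trial-division primality test; fine for the small values here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

assert all(is_prime(n * n + n + 41) for n in range(40))
assert not is_prime(40 * 40 + 40 + 41)          # n = 40 fails
assert all(is_prime(n * n - 79 * n + 1601) for n in range(80))
assert not is_prime(80 * 80 - 79 * 80 + 1601)   # n = 80 gives 41^2
```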

Considering quadratics of the form:

$n^2+an+b$, where $|a|<1000$ and $|b|\le 1000$,

where $|n|$ is the modulus/absolute value of $n$, e.g., $|11|=11$ and $|-4|=4$.

Find the product of the coefficients $a$ and $b$, for the quadratic expression that produces the maximum number of primes for consecutive values of $n$, starting with $n=0$.

Sage Solution

Some observations:

  • Clearly, $b$ must itself be prime, since the value at $n=0$ is $b$ and must be prime.
  • We can refine a lower bound on $b$ as follows: If a given combination $(a,b)$ results in $m$ consecutive primes, then $b\ge m$, since at $n=b$ the value $b^2+ab+b$ is divisible by $b$, so the run of primes must end by then. Also, when $a$ is negative, we must have $b>-(n^2+an)$ for each $n$ in the run, since the prime values $n^2+an+b$ must be positive. Since we know that $n^2+n+41$ returns 40 primes, any interesting values for $a$ and $b$ must satisfy $b>-(40^2+40a)$.

Given these observations, I’ll write a routine in Sage.

import time
 
start = time.time()
 
P = prime_range(1000)
a_max, b_max, c_max = 0, 0, 0
for a in range(-1000,1001):
    for b in P:
        if b < -1600-40*a or b < c_max: continue
        c, n = 0, 0
        while is_prime(n**2 + a*n + b):
            c += 1
            n += 1
        if c > c_max:
            a_max, b_max, c_max = a, b, c
 
prod = a_max * b_max
 
elapsed = time.time() - start
 
print "a=%s, b=%s, longest sequence = %s, prod=%s\nfound in %s seconds"\
% (a_max, b_max, c_max, prod, elapsed)

When executed, we obtain the following.

a=-61, b=971, longest sequence = 71, prod=-59231
found in 8.53578400612 seconds
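The reported pair can be double-checked independently of Sage with plain trial division (a quick sketch):

```python
def is_prime(n):
    """Simple trial-division primality test."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

# count the consecutive primes produced by n^2 - 61n + 971, starting at n = 0
n = 0
while is_prime(n * n - 61 * n + 971):
    n += 1
print(n)  # 71
```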

It’s entirely possible that we’re being a bit redundant with the integer values that we’re throwing at Sage’s “is_prime” function. We’ll cache a list of primes within the range that we’re considering. (I came up with the number 751000 below in the Python section.)

import time
 
start = time.time()
 
P = prime_range(751000)
L = [False] * 751000
for p in P: L[p] = True
 
a_max, b_max, c_max = 0, 0, 0
for a in range(-1000,1001):
    for b in range(1,1001):
        if L[b] is False: continue
        if b < -1600-40*a or b < c_max: continue
        c, n = 0, 0
        while L[n**2 + a*n + b] is True:
            c += 1
            n += 1
        if c > c_max:
            a_max, b_max, c_max = a, b, c
 
prod = a_max * b_max
 
elapsed = time.time() - start
 
print "a=%s, b=%s, longest sequence = %s, prod=%s\nfound in %s seconds"\
% (a_max, b_max, c_max, prod, elapsed)

This indeed executes more quickly, as is seen in the output.

a=-61, b=971, longest sequence = 71, prod=-59231
found in 1.39028286934 seconds

Python Solution

Of course, Sage’s prime/factorization functionality isn’t available in Python, and so we’re going to need to create a prime sieve and store primes in some form. One of the things I need to know is how large the values $n^2+an+b$ can be. If we were to assume that the maximum value of the length of a prime sequence is 500 (which is an overestimate as we now know), then we’d have the maximum value of the quadratic given as:

500**2 + 1000 * 500 + 1000
751000

Let’s create a prime sieve that will return a list of all primes below a given number.

def primes_xrange(stop):
    primes = [True] * stop
    primes[0], primes[1] = [False] * 2
    L = []
    for ind, val in enumerate(primes):
        if val is True:
            primes[ind*2::ind] = [False] * (((stop - 1)//ind) - 1)
            L.append(ind)
    return L

Let’s verify that it works.

primes_xrange(20)
[2, 3, 5, 7, 11, 13, 17, 19]

How long does this function take to form all of the primes under 751000?

time P = primes_xrange(751000)
Time: CPU 0.39 s, Wall: 0.39 s

Now, we’ll take the Sage routine from above and write it completely in Python. Here, I’m using only the sieve list portion of the primes_xrange function I just created. As above, I’m just checking to see if values in that list are True or False, corresponding to primes and composite numbers. This should take a bit longer to run than the Sage version above.

import time
 
def primes_xrange(stop):
    primes = [True] * stop
    primes[0], primes[1] = [False] * 2
    for ind, val in enumerate(primes):
        if val is True:
            primes[ind*2::ind] = [False] * (((stop - 1)//ind) - 1)
    return primes
 
start = time.time()
 
P = primes_xrange(751000)
a_max, b_max, c_max = 0, 0, 0
for a in range(-1000,1001):
    for b in range(1,1001):
        if P[b] is False: continue
        if b < -1600-40*a or b < c_max: continue
        c, n = 0, 0
        while P[n**2 + a*n + b] is True:
            c += 1
            n += 1
        if c > c_max:
            a_max, b_max, c_max = a, b, c
 
prod = a_max * b_max
 
elapsed = time.time() - start
 
print "a=%s, b=%s, longest sequence = %s, prod=%s\nfound in %s seconds"\
% (a_max, b_max, c_max, prod, elapsed)

And it does run slightly slow compared to the Sage version, as expected.

a=-61, b=971, longest sequence = 71, prod=-59231
found in 1.73107814789 seconds

Cython Solution

I’m going to take roughly the same approach as with the Python solution, but I’m going to malloc a C array instead of using the Python list. That should make things a bit more efficient. Before we do that, let’s see how much faster our routine runs if we just slap a “%cython” on the first line.

%cython
 
import time
 
def primes_xrange(stop):
    primes = [True] * stop
    primes[0], primes[1] = [False] * 2
    for ind, val in enumerate(primes):
        if val is True:
            primes[ind*2::ind] = [False] * (((stop - 1)//ind) - 1)
    return primes
 
start = time.time()
 
P = primes_xrange(751000)
a_max, b_max, c_max = 0, 0, 0
for a in range(-1000,1001):
    for b in range(1,1001):
        if P[b] is False: continue
        if b < -1600-40*a or b < c_max: continue
        c, n = 0, 0
        while P[n**2 + a*n + b] is True:
            c += 1
            n += 1
        if c > c_max:
            a_max, b_max, c_max = a, b, c
 
prod = a_max * b_max
 
elapsed = time.time() - start
 
print "a=%s, b=%s, longest sequence = %s, prod=%s\nfound in %s seconds"\
% (a_max, b_max, c_max, prod, elapsed)

We find that this does improve the timing, and our Python code (which hasn’t been modified at all) now runs more quickly than the Sage code.

a=-61, b=971, longest sequence = 71, prod=-59231
found in 1.02033305168 seconds

Rewriting to take advantage of Cython’s C datatypes, how much faster can we obtain the result?

%cython
 
import time
from libc.stdlib cimport malloc, free
 
start = time.time()
 
cdef long i, j
cdef bint *primes = malloc(751000 * sizeof(bint))
primes[0], primes[1] = 0, 0
i = 2
while i < 751000:
    primes[i] = 1
    i += 1
i = 2
while i < 751000:
    if primes[i] == 1:
        j = 2 * i
        while j < 751000:
            primes[j] = 0
            j += i
    i += 1
 
cdef long long a, b, c, t
cdef long long pol, n
cdef long a_max, b_max, c_max, prod
a_max, b_max, c_max = 0, 0, 0
a = -1000
while a < 1001:
    b = 2
    while b < 1001:
        if primes[b] == 0:
            b += 1
            continue
        t = -1600-40*a
        if b < t or b < c_max:
            b += 1
            continue
        c, n = 0, 0
        pol = n**2 + a*n + b
        while primes[pol] == 1:
            c += 1
            n += 1
            pol = n**2 + a*n + b
        if c > c_max:
            a_max, b_max, c_max = a, b, c
        b += 1
    a += 1
 
prod = a_max * b_max
 
free(primes)
 
elapsed = time.time() - start
 
print "a=%s, b=%s, longest sequence = %s, prod=%s\nfound in %s seconds"\
% (a_max, b_max, c_max, prod, elapsed)

Now, this should run much more quickly.

a=-61, b=971, longest sequence = 71, prod=-59231
found in 0.0381400585175 seconds

Thus, the Sage version runs 1.245 times as fast as the Python version. The Cython version runs 45.387 times as fast as the Python version and 36.452 times as fast as the Sage version.


Problem

Let $d(n)$ be defined as the sum of proper divisors of $n$ (numbers less than $n$ which divide evenly into $n$).
If $d(a) = b$ and $d(b) = a$, where $a\neq b$, then $a$ and $b$ are an amicable pair and each of $a$ and $b$ is called an amicable number.

For example, the proper divisors of 220 are 1, 2, 4, 5, 10, 11, 20, 22, 44, 55 and 110; therefore $d(220) = 284$. The proper divisors of 284 are 1, 2, 4, 71 and 142; so $d(284) = 220$.

Evaluate the sum of all the amicable numbers under 10000.
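The worked example above can be confirmed with a two-line sketch of the same $d(n)$ that the solutions below implement:

```python
def d(n):
    """Sum of the proper divisors of n (naive O(n) loop)."""
    return sum(i for i in range(1, n) if n % i == 0)

print(d(220), d(284))  # 284 220
```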

Python Solution

I used the idea of creating a list populated with the sum of proper divisors of each number in the range, and then walking through the list doing some index and value checking.

import time
 
def sum_divisors(n):
    s = 0
    for i in range(1,n):
        if n % i == 0: s += i
    return s
 
def amicable_pairs_xrange(low,high):
    L = [sum_divisors(i) for i in range(low,high + 1)]
    pairs = []
    for i in range(high - low + 1):
        ind = L[i]
        if i + low < ind and low <= ind and ind <= high and L[ind - low] == i + low:
            pairs.append([i+low,ind])
    return pairs
 
def sum_pairs(pairs):
    return sum([sum(pair) for pair in pairs])
 
start = time.time()
 
ans = sum_pairs(amicable_pairs_xrange(1,10000))
 
elapsed = time.time() - start
 
print("%s found in %s seconds") % (ans,elapsed)

This returns the correct result.

31626 found in 7.86725997925 seconds

Cython Solution

One of the things I love about using Cython inside the Sage Notebook environment is that one can simply place a Cython statement inside Python code and often achieve some sort of speedup. All I’ve done in this first example is just that. I haven’t changed the Python code at all yet.

%cython
 
import time
 
def sum_divisors(n):
    s = 0
    for i in range(1,n):
        if n % i == 0: s += i
    return s
 
def amicable_pairs_xrange(low,high):
    L = [sum_divisors(i) for i in range(low,high + 1)]
    pairs = []
    for i in range(high - low + 1):
        ind = L[i]
        if i + low < ind and low <= ind and ind <= high and L[ind - low] == i + low:
            pairs.append([i+low,ind])
    return pairs
 
def sum_pairs(pairs):
    return sum([sum(pair) for pair in pairs])
 
start = time.time()
 
ans = sum_pairs(amicable_pairs_xrange(1,10000))
 
elapsed = time.time() - start
 
print("%s found in %s seconds") % (ans,elapsed)

This runs faster than the original, and that blows my mind.

31626 found in 4.98208904266 seconds

We can obviously improve on this by rewriting the involved functions using C datatypes.

%cython
 
import time
 
cdef sum_divisors(unsigned int n):
    cdef unsigned int s = 0, i = 1
    while i < n:
        if n % i == 0: s += i
        i += 1
    return s
 
cdef sum_amicable_pairs(unsigned int low, unsigned int high):
    cdef unsigned int a = low, b, sum = 0
    while a <= high:
        b = sum_divisors(a)
        if b > a and sum_divisors(b) == a:
            sum += a + b
        a += 1
 
    return sum
 
start = time.time()
 
ans = sum_amicable_pairs(1,10000)
 
elapsed = time.time() - start
 
print("%s found in %s seconds") % (ans,elapsed)

We do achieve somewhat significant speedup now.

31626 found in 1.06677889824 seconds

C Solution

If I take the fast Cython version and rewrite it in C, I don’t get much better performance.

#include <stdio.h>
#include <stdlib.h>
 
unsigned int sum_divisors(unsigned int n) {
    unsigned int s = 0, i = 1;
    while(i < n) {
        if(n % i == 0) s = s + i;
        i++;
    }
    return s;
}
 
unsigned int sum_amicable_pairs(unsigned int low, unsigned int high) {
    unsigned int a = low, b, sum = 0;
    while(a <= high) {
        b = sum_divisors(a);
        if(b > a && sum_divisors(b) == a) sum = sum + a + b;
        a++;
    }
    return sum;
}
 
int main(int argc, char **argv) {
    unsigned int ans = sum_amicable_pairs(1,10000);
    printf("result: %u\n",ans);
    return 0;
}

The result, compiled with heavy optimization, is only slightly faster than the Cython code.

$ gcc -O3 -o problem-21 problem-21.c 
$ time ./problem-21
result: 31626
 
real	0m1.032s

About the only way to really improve this, beyond restructuring the algorithm, is to use OpenMP to run the process in parallel.

C with OpenMP

A parallel version using shared memory via OpenMP looks something like this. (I’ve rewritten it slightly.)

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
 
unsigned int sum_divisors(unsigned int n) {
    unsigned int s = 0, i = 1;
    while(i < n) {
        if(n % i == 0) s = s + i;
        i++;
    }
    return s;
}
 
int main(int argc, char **argv) {
    unsigned int a, b, nthreads, result = 0;
 
    #pragma omp parallel
    nthreads = omp_get_num_threads();
    printf("no of threads %d\n", nthreads);
 
    #pragma omp parallel for reduction(+:result) private(b)
    for(a=0; a < 10000; a++) {
        b = sum_divisors(a);
        if(b > a && sum_divisors(b) == a) result = result + a + b;
    }
 
    printf("result: %u\n",result);
    return 0;
}

Now, we get much better performance. I’m running this on a 16-core machine, where 3 of the cores are currently being used.

$ gcc -fopenmp -O3 -o problem-21-openmp problem-21-openmp.c 
$ time ./problem-21-openmp 
no of threads 16
result: 31626
 
real	0m0.140s

Problem: You are given the following information, but you may prefer to do some research for yourself.

  • 1 Jan 1900 was a Monday.
  • Thirty days has September,
    April, June and November.
    All the rest have thirty-one,
    Saving February alone,
    Which has twenty-eight, rain or shine.
    And on leap years, twenty-nine.
  • A leap year occurs on any year evenly divisible by 4, but not on a century unless it is divisible by 400.

How many Sundays fell on the first of the month during the twentieth century (1 Jan 1901 to 31 Dec 2000)?

Approach

There are several different ways to approach this. The easiest, I think, is to use the Gaussian formula for day of week. It is a purely mathematical formula that I have encoded in the following Python code.

Python Solution

import time
from math import floor
 
"""
Gaussian algorithm to determine day of week
"""
def day_of_week(year, month, day):
    """
    w = (d+floor(2.6*m-0.2)+y+floor(y/4)+floor(c/4)-2*c) mod 7
 
    Y = year - 1 for January or February
    Y = year for other months
    d = day (1 to 31)
    m = shifted month (March = 1, February = 12)
    y = last two digits of Y
    c = first two digits of Y
    w = day of week (Sunday = 0, Saturday = 6)
    """
 
    d = day
    m = (month - 3) % 12 + 1
    if m > 10: Y = year - 1
    else: Y = year
    y = Y % 100
    c = (Y - (Y % 100)) / 100
 
    w = (d + floor(2.6 * m - 0.2) + y + floor(y/4) + floor(c/4) - 2*c) % 7
 
    return int(w)
 
"""
Compute the number of months starting on a given day of the week in a century
"""
def months_start_range(day,year_start,year_end):
    total = 0
    for year in range(year_start, year_end + 1):
        for month in range(1,13):
            if day_of_week(year, month, 1) == day: total += 1
    return total
 
start = time.time()
 
total = months_start_range(0,1901,2000)
 
elapsed = time.time() - start
 
print("%s found in %s seconds") % (total,elapsed)

This returns the correct result.

171 found in 0.0681998729706 seconds
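As an independent check (not in the original write-up), the Gaussian formula can be compared date-by-date against Python's standard `datetime` module. The formula uses Sunday = 0 while `date.weekday()` uses Monday = 0, so the two outputs are related by a shift of one:

```python
from datetime import date, timedelta
from math import floor

def day_of_week(year, month, day):
    # same Gaussian algorithm as above
    m = (month - 3) % 12 + 1
    Y = year - 1 if m > 10 else year
    y, c = Y % 100, Y // 100
    return int((day + floor(2.6 * m - 0.2) + y + y // 4 + c // 4 - 2 * c) % 7)

mismatches = 0
dt = date(1901, 1, 1)
while dt <= date(2000, 12, 31):
    # weekday() has Monday = 0, ..., Sunday = 6; shift to Sunday = 0
    if day_of_week(dt.year, dt.month, dt.day) != (dt.weekday() + 1) % 7:
        mismatches += 1
    dt += timedelta(days=1)
print(mismatches)  # 0
```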

That will run faster if executed directly in Python. Remember, I’m using the Sage notebook environment to execute most of the Python here. I’m also writing Cython in the same environment.

Cython Solution

There is a nearly trivial rewriting to Cython of the above Python code.

%cython
 
import time
from math import floor
 
"""
Gaussian algorithm to determine day of week
"""
cdef day_of_week(int year, int month, int day):
    """
    w = (d+floor(2.6*m-0.2)+y+floor(y/4)+floor(c/4)-2*c) mod 7
 
    Y = year - 1 for January or February
    Y = year for other months
    d = day (1 to 31)
    m = shifted month (March = 1, February = 12)
    y = last two digits of Y
    c = first two digits of Y
    w = day of week (Sunday = 0, Saturday = 6)
    """
 
    cdef int d = day
    cdef int m = (month - 3) % 12 + 1
    cdef int Y
    if m > 10: Y = year - 1
    else: Y = year
    y = Y % 100
    cdef int c = (Y - (Y % 100)) / 100
 
    cdef double w
    w = (d + floor(2.6 * m - 0.2) + y + floor(y/4) + floor(c/4) - 2*c) % 7
 
    return int(w)
 
"""
Compute the number of months starting on a given day of the week in range of years
"""
cdef months_start_range(int day, int year_start,int year_end):
    cdef unsigned int total = 0
    cdef int year, month
    for year in range(year_start, year_end + 1):
        for month in range(1,13):
            if day_of_week(year, month, 1) == day: total += 1
    return total
 
start = time.time()
 
total = months_start_range(0,1901,2000)
 
elapsed = time.time() - start
 
print("%s found in %s seconds") % (total,elapsed)

The code is a bit longer, but it executes much faster.

171 found in 0.00387215614319 seconds

The Cython code runs roughly 18 times faster.

C Solution

The Cython code was used as a model to create more efficient C code. The only issue here is in maintaining the correct datatypes (not too hard, but compared to Cython it is a pain).

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
 
int day_of_week(int year, int month, int day) {
    // Using the Gaussian algorithm
    int d = day;
 
    double m = (double) ((month - 3) % 12 + 1);
    int Y;
    if(m > 10) Y = year - 1;
    else Y = year;
    int y = Y % 100;
    int c = (Y - (Y % 100)) / 100;
 
    int w = ((d+(int)floor(2.6*m-0.2)+y+ y/4 + c/4 -2*c))%7;
 
    return w; 
}
 
long months_start_range(int day, int year_start, int year_end) {
    unsigned long total = 0;
    int year, month;
    for(year = year_start; year <= year_end; year++) {
        for(month = 1; month <= 12; month++) {
            if(day_of_week(year, month, 1)==day) total++;
        }
    }
    return total;
 
}
 
int main(int argc, char **argv) {
 
    int iter = 0;
    long total;
    while(iter < 100000) {
        total = months_start_range(0,1901,2000);
        iter++;
    }
    printf("Solution: %ld\n",total);
 
    return 0;
}

Notice that this executes the loop 100,000 times, as I’m trying to get a good idea of what the average runtime is. We compile with optimization and the -lm math-library option. We get the following result.

gcc -O3 -o problem-19 problem-19.c -lm
$ time ./problem-19
Solution: 171
 
real	0m6.240s

The C code runs roughly 62 times as fast as the Cython and roughly 1124 times as fast as the Python. Each iteration executes in about 6.2000e-5 seconds.


Problem: $2^{15}=32768$ and the sum of its digits is $3+2+7+6+8=26$.
What is the sum of the digits of the number $2^{1000}$?

Sage Solution

Sage’s built-in arbitrary-precision integers and digit functions make this easy.

import time
 
start = time.time()
a = 2^1000
s = sum(a.digits())
elapsed = time.time() - start
 
print "%s found in %s seconds" % (s,elapsed)

It executes pretty quickly too.

1366 found in 0.000343084335327 seconds

An Easy Python Solution

Python itself also makes this problem too easy, due to string functions.

import time
 
def pow2sum(exp):
    digits = list(str(2**exp))
    return sum([int(i) for i in digits])
 
start = time.time()
n = pow2sum(1000)
elapsed = (time.time() - start)
print "%s found in %s seconds" % (n,elapsed)

And it is fast:

1366 found in 0.000911951065063 seconds

Python Without Strings Attached

Let’s do our arithmetic without string functionality. Then, I’d note that $2^{1000}<10^{1000}$, and so we know that we’re dealing with at most 1000 digits. So, we can create a routine that starts from 1 and doubles 1000 times, maintaining the result at each step in a list of digits instead of a single integer. (This is how you'd be forced to do things in C, where arbitrary-length integers are pure fiction.) Since we're only multiplying by 2 at each iteration, we know that we'll either carry a zero or a one to the next digit, which makes this routine a bit simpler than your typical multiply-a-digit-list routine.

import time
 
def pow2sum(exp):
    L = [0] * exp # make a list exp long
    L[0] = 1
    for power in range(exp):
        carry = 0
        for index in range(exp):
            prod = L[index] * 2 + carry
            if prod > 9:
                carry = 1
                prod = prod % 10
            else: carry = 0
            L[index] = prod
    return sum(L)
 
start = time.time()
n = pow2sum(1000)
elapsed = (time.time() - start)
print "%s found in %s seconds" % (n,elapsed)

It runs relatively quickly, although not as quickly as the string version runs.

1366 found in 0.361020803452 seconds

Cython Solutions

We can first trivially rewrite the string Python version.

%cython
 
import time
 
cdef pow2sum(unsigned int exp):
    cdef list pow = list(str(2**exp))
    return sum([int(i) for i in pow])
 
start = time.time()
n = pow2sum(1000)
elapsed = (time.time() - start)
print "%s found in %s seconds" % (n,elapsed)

When executed, we find that the string Cython code runs about 1.14 times as fast as the string Python code.

1366 found in 0.000799894332886 seconds

Of course, I’m more interested in seeing how much faster the arithmetic version runs.

%cython
 
import time
from libc.stdlib cimport malloc, free
 
cdef pow2sum(unsigned int exp):
    cdef int *L = <int *>malloc(exp * sizeof(int))
    cdef unsigned int power = 0,index = 0
    cdef unsigned int prod, carry
    while index < exp:
        L[index] = 0
        index += 1
    L[0] = 1
    while power < exp:
        carry, index = 0, 0
        while index < exp:
            prod = L[index] * 2 + carry
            if prod > 9:
                carry = 1
                prod = prod % 10
            else: carry = 0
            L[index] = prod
            index += 1
        power += 1
    cdef int sum = 0
    index = 0
    while index < exp:
        sum += L[index]
        index += 1
    free(L)
    return sum
 
start = time.time()
n = pow2sum(1000)
elapsed = (time.time() - start)
print "%s found in %s seconds" % (n,elapsed)

When executed, we get the following result.

1366 found in 0.00858902931213 seconds

Problem: Starting in the top left corner of a $2\times 2$ grid and moving only down and right, there are 6 routes to the bottom right corner.

How many routes are there through a $20\times 20$ grid?

A Great Interview Problem

This is precisely the sort of problem I expect to see in a technical coding interview, and knowing the various ways of solving a problem such as this will help you get far in those situations. But, even if you’re an experienced coder, the solutions may not be obvious at first. I want to walk through a few potential solutions and see how they perform.

Recursive Python Solution

I think the easiest solution, and one you should know (but possibly don’t), is the recursive approach. The main idea here is to develop a function that will call itself, inching along to the right and down in all possible combinations, returning a value of 1 whenever it reaches the bottom-right and summing all of those 1s along the way. You should convince yourself that the procedure actually terminates (at what we call “the base case”) and returns the correct solution. Try doing it on paper on the 2×2 grid, or a 3×3 grid, and see what happens. Here’s what the Python code looks like.

#!/usr/bin/python
 
import time
 
 
gridSize = [20,20]
 
def recPath(gridSize):
    """
    Recursive solution to grid problem. Input is a list of x,y moves remaining.
    """
    # base case, no moves left
    if gridSize == [0,0]: return 1
    # recursive calls
    paths = 0
    # move right when possible
    if gridSize[0] > 0:
        paths += recPath([gridSize[0]-1,gridSize[1]])
    # move down when possible
    if gridSize[1] > 0:
        paths += recPath([gridSize[0],gridSize[1]-1])
 
    return paths
 
start = time.time()
result = recPath(gridSize)
elapsed = time.time() - start
 
print "result %s found in %s seconds" % (result, elapsed)

That’s great, and it will actually work, but it may take some time. Actually, it takes a lot of time. By that, I mean it really, really takes a lot of time. When we run it on the 2×2 input, we get the following.

result 6 found in 9.05990600586e-06 seconds

When we run it on the 20×20 input, as the problem requires, it runs for about 4 hours before I kill it. Python is OK at recursive function calls, and it can handle/collapse the memory required moderately well, but what we’re doing here is manually constructing ALL possible paths to a solution, which isn’t incredibly efficient. Still, during a technical interview, this is definitely the first idea for a solution that should pop into your head. It’s quick and easy to write, and many problems can be solved like this.
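As an aside (not part of the original timings), the recursive approach becomes entirely viable once repeated subproblems are cached, since the same (x, y) positions are reached over and over. A minimal sketch using `functools.lru_cache` — modern Python 3 syntax, unlike the Python 2 prints used elsewhere in this post:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def rec_path(right, down):
    # number of monotone paths with `right` and `down` moves remaining
    if right == 0 or down == 0:
        return 1
    return rec_path(right - 1, down) + rec_path(right, down - 1)

print(rec_path(20, 20))  # 137846528820
```

The cache turns the exponential blow-up into a quadratic number of distinct calls, which is why it finishes instantly where the uncached version runs for hours.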

Dynamic Python Solution

Our recursive approach suffers from the problem that we’re doing a lot of similar operations over and over. Can we learn anything from the smaller cases and build up from there? In this case, the answer is yes, but it requires us to build up some mathematics a bit more. This is actually quite easy if approached correctly. The idea is to construct the solution recursively, but differently this time.

Claim: Let $n$ be any natural number and consider the 2-dimensional sequence $S_{i,j}$ defined by

\[S_{i,j}=\begin{cases}1 &\text{ if }j=0\\ S_{i,j-1}+S_{i-1,j} &\text{ if }0<j<i\\ 2S_{i,j-1} &\text{ if }i=j\end{cases}\]

where $0\le i\le n$ and $0\le j\le i$. Then the number of non-backtracking paths from top-left to bottom-right through an $n\times n$ grid is $S_{n,n}$.

Proof: Consider a grid of $m$ rows and $n$ columns (we do not need to assume that the grid is square). Counting from the upper-left and starting at zero, denote the intersection/node in the $i$-th row and $j$-th column by $N_{i,j}$. Thus, the upper-left node is $N_{0,0}$, the bottom-left is $N_{m,0}$ and the bottom-right is $N_{m,n}$. Clearly, the number of paths from $N_{0,0}$ to any node along the far left or far top of the grid is only 1 (since we may only proceed down or right). Now, consider how many paths there are to $N_{1,1}$. We must first travel through $N_{0,1}$ or $N_{1,0}$. This yields only two paths to $N_{1,1}$. We can continue this process. In order to determine the total number of paths to any node $N_{i,j}$, we only need to sum together the total number of paths to $N_{i,j-1}$ and $N_{i-1,j}$. The process is understood graphically in the following diagram, where each new integer represents the number of paths to a node.

Thus, in a $4\times 4$ grid, there are 70 non-backtracking paths. How does this relate to the sequence $S_{i,j}$? Simply put, $S_{i,j}$ is the number of paths to node $N_{i,j}$. If we write out the sequence $S_{i,j}$ for $0\le i\le 4$ and $0\le j\le i$, we simply obtain the lower diagonal sequence embedded in the diagram above.

\[\begin{array}{ccccc} S_{0,0} = 1 & & & & \cr S_{1,0} = 1 & S_{1,1}=2 & & & \cr S_{2,0} = 1 & S_{2,1}=3 & S_{2,2}=6 & & \cr S_{3,0} = 1 & S_{3,1}=4 & S_{3,2}=10 & S_{3,3}=20 & \cr S_{4,0} = 1 & S_{4,1}=5 & S_{4,2}=15 & S_{4,3}=35 & S_{4,4}=70\end{array}\]

That completes the proof.
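The recurrence in the claim can be tabulated directly to reproduce the triangle above — a quick sketch, separate from the solution code that follows:

```python
def s_table(n):
    # build S[i][j] for 0 <= j <= i <= n from the recurrence in the claim
    S = [[0] * (i + 1) for i in range(n + 1)]
    for i in range(n + 1):
        for j in range(i + 1):
            if j == 0:
                S[i][j] = 1
            elif j == i:
                S[i][j] = 2 * S[i][j - 1]
            else:
                S[i][j] = S[i][j - 1] + S[i - 1][j]
    return S

print(s_table(4)[4])  # [1, 5, 15, 35, 70]
```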

Let’s code that into a Python solution and see how fast it runs. My bet is that this will be considerably faster than our initial recursive solution. We can record the sequence $S_{i,j}$ as a single dimensional list that is simply rewritten at each iteration.

#!/usr/bin/python
 
import time
 
def route_num(cube_size):
    L = [1] * cube_size
    for i in range(cube_size):
        for j in range(i):
            L[j] = L[j]+L[j-1]
        L[i] = 2 * L[i - 1]
    return L[cube_size - 1]
 
start = time.time()
n = route_num(20)
elapsed = (time.time() - start)
print "%s found in %s seconds" % (n,elapsed)

When executed, we get the following.

137846528820 found in 0.000205039978027 seconds

Cython Solution

We’ll recode things in Cython and see how much faster we can get the result returned.

%cython
 
import time
from libc.stdlib cimport malloc, free
 
cdef route_num(short cube_size):
    cdef unsigned long *L = <unsigned long *>malloc((cube_size + 1) * sizeof(unsigned long))
    cdef short j,i = 0
    while i <= cube_size:
        L[i] = 1
        i += 1
    i = 1
    while i <= cube_size:
        j = 1
        while j < i:
            L[j] = L[j]+L[j-1]
            j += 1
        L[i] = 2 * L[i - 1]
        i += 1
    cdef unsigned long c = L[cube_size]
    free(L)
    return c
 
start = time.time()
cdef unsigned long n = route_num(20)
elapsed = (time.time() - start)
print "%s found in %s seconds" % (n,elapsed)

We now get the result a bit more quickly.

137846528820 found in 2.21729278564e-05 seconds

The Cython code executes roughly 9 times as fast as the Python.
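It’s worth noting a closed form as well: $S_{n,n}$ is the central binomial coefficient $\binom{2n}{n}$, since every path through an $n\times n$ grid is a sequence of $2n$ moves of which we choose $n$ to go right. A one-liner (not in the original post) confirms the result:

```python
from math import factorial

def routes(n):
    # C(2n, n): choose which n of the 2n moves go right
    return factorial(2 * n) // factorial(n) ** 2

print(routes(20))  # 137846528820
```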


Problem: The following iterative sequence is defined for the set of positive integers:

\[n\rightarrow\begin{cases}n/2 & n \text{ even}\\ 3n+1 & n \text{ odd}\end{cases}\]

Using the rule above and starting with 13, we generate the following sequence:

\[13\rightarrow 40\rightarrow 20\rightarrow 10\rightarrow 5\rightarrow 16\rightarrow 8\rightarrow 4\rightarrow 2\rightarrow 1\]

It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1. Which starting number, under one million, produces the longest chain? Note: Once the chain starts the terms are allowed to go above one million.

Idea Behind a Solution

I’ll refer to the “Collatz length of $n$” as the length of the chain from an integer $n$ to 1 using the above described sequence. If we were to calculate the Collatz length of each integer separately, that would be incredibly inefficient. In Python, that would look something like this.

First Python Solution

import time
 
start = time.time()
 
def collatz(n, count=1):
    while n > 1:
        count += 1
        if n % 2 == 0:
            n = n/2
        else:
            n = 3*n + 1
    return count
 
max = [0,0]
for i in range(1000000):
    c = collatz(i)
    if c > max[0]:
        max[0] = c
        max[1] = i
 
elapsed = (time.time() - start)
print "found %s at length %s in %s seconds" % (max[1],max[0],elapsed)

Now, this will actually determine the solution, but it is going to take a while, as shown when we run the code.

found 837799 at length 525 in 46.6846499443 seconds
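As a sanity check on the chain arithmetic, a direct count reproduces the 10-term example for 13 given in the problem statement — a standalone sketch, not the solution code:

```python
def collatz_length(n):
    # number of terms in the chain from n down to 1, counting both endpoints
    count = 1
    while n > 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        count += 1
    return count

print(collatz_length(13))  # 10
```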

A Better Python Solution

What I’m going to do is cache the Collatz lengths of integers below one million. The idea is that we can use the cached values to make calculations of new Collatz lengths more efficient. But, we don’t want to record every single number in the Collatz sequences that we’ll be using, because some of the sequences actually reach up into the hundreds of millions. We’ll make a list called TO_ADD, and we’ll only populate that with numbers whose Collatz lengths are unknown. Once known, the Collatz lengths will be stored for repeated use.

import time
 
start = time.time()
 
limit = 1000000
collatz_length = [0] * limit
collatz_length[1] = 1
max_length = [1,1]
 
for i in range(1,limit):
    n,s = i,0
    TO_ADD = [] # collatz_length not yet known
    while n > limit - 1 or collatz_length[n] < 1:
        TO_ADD.append(n)
        if n % 2 == 0: n = n/2
        else: n = 3*n + 1
        s += 1
    # collatz_length now known from previous calculations
    p = collatz_length[n]
    for j in range(s):
        m = TO_ADD[j]
        if m < limit:
            new_length = collatz_length[n] + s - j
            collatz_length[m] = new_length
            if new_length > max_length[1]: max_length = [i,new_length]
 
elapsed = (time.time() - start)
print "found %s at length %s in %s seconds" % (max_length[0],max_length[1],elapsed)

This should return the same result, but in significantly less time.

found 837799 at length 525 in 5.96128201485 seconds

A First Cython Solution

If we take our original approach of computing each Collatz length from scratch, this might actually work slightly better in Cython.

%cython
 
import time
 
cdef collatz(unsigned int n):
    cdef unsigned count = 1
    while n > 1:
        count += 1
        if n % 2 == 0:
            n = n/2
        else:
            n = 3*n + 1
    return count
 
cdef find_max_collatz(unsigned int min, unsigned int max):
    cdef unsigned int m = 1
    cdef unsigned long num = 1
    cdef unsigned int count = 1
    cdef unsigned long iter = min
    while iter < max:
        count = collatz(iter)
        if count > m:
            m = count
            num = iter
        iter += 1
    return num
 
start = time.time()
max_found = find_max_collatz(1,1000000)   
elapsed = (time.time() - start)
print "found %s in %s seconds" % (max_found,elapsed)

In fact, when executed, we find that it is significantly better than our efficient Python code.

found 837799 in 0.604798078537 seconds

This just goes to show that even low efficiency machine/compiled code can drastically outperform efficient Python. But, how far can we take this Cython refinement? What if we were to recode our more efficient algorithm in Cython? It may look something like this.

A Better Cython Solution

%cython
 
import time
from libc.stdlib cimport malloc, free
 
cdef find_max_collatz(unsigned long int max):
    cdef int *collatz_length = <int *>malloc(max * sizeof(int))
    cdef list TO_ADD # holds numbers of unknown collatz length
    cdef unsigned long iter, j, m, n, s, p, ind, new_length, max_length = 0
 
    # set initial collatz lengths
    iter = 0
    while iter < max:
        collatz_length[iter] = 0
        iter += 1
    collatz_length[1] = 1
 
    # iterate to max and find collatz lengths
    iter = 1
    while iter < max:
        n,s = iter,0
        TO_ADD = []
        while n > max - 1 or collatz_length[n] < 1:
            TO_ADD.append(n)
            if n % 2 == 0: n = n/2
            else: n = 3*n + 1
            s += 1
        # collatz length now known from previous calculations
        p = collatz_length[n]
        j = 0
        while j < s:
            m = TO_ADD[j]
            if m < max:
                new_length = collatz_length[n] + s - j
                collatz_length[m] = new_length
                if new_length > max_length:
                    max_length = new_length
                    ind = m
            j += 1
        iter += 1
 
    free(collatz_length)
    return ind
 
start = time.time()
max_collatz = find_max_collatz(1000000)
elapsed = (time.time() - start)
print "found %s in %s seconds" % (max_collatz,elapsed)

This gives us some relatively good results:

found 837799 in 0.46523308754 seconds

Still, it isn’t a great improvement over the naive Cython code. What’s going on? I bet that the TO_ADD data structure could be changed from a Python list (notice the “cdef list” definition) to a malloc’d C array. That will be a bit more work, but my gut instincts tell me that this is probably the bottleneck in our current Cython code. Let’s rewrite it a bit.

%cython
 
import time
from libc.stdlib cimport malloc, free
 
cdef find_max_collatz(unsigned long int max):
    cdef int *collatz_length = <int *>malloc(max * sizeof(int))
    cdef int *TO_ADD = <int *>malloc(600 * sizeof(int))  # 600 exceeds the longest chain length below one million (525)
    cdef unsigned long iter, j, m, n, s, p, ind, new_length, max_length = 0
 
    # set initial collatz lengths and TO_ADD numbers
    iter = 0
    while iter < max:
        collatz_length[iter] = 0
        iter += 1
    collatz_length[1] = 1
    iter = 0
    while iter < 600:
        TO_ADD[iter] = 0
        iter += 1
 
    # iterate to max and find collatz lengths
    iter = 1
    while iter < max:
        n,s = iter,0
        while n > max - 1 or collatz_length[n] < 1:
            TO_ADD[s] = n
            if n % 2 == 0: n = n/2
            else: n = 3*n + 1
            s += 1
        # collatz length now known from previous calculations
        p = collatz_length[n]
        j = 0
        while j < s:
            m = TO_ADD[j]
            TO_ADD[j] = 0
            if m < max:
                new_length = collatz_length[n] + s - j
                collatz_length[m] = new_length
                if new_length > max_length:
                    max_length = new_length
                    ind = m
            j += 1
        iter += 1
 
    free(collatz_length)
    free(TO_ADD)
    return ind
 
start = time.time()
max_collatz = find_max_collatz(1000000)
elapsed = (time.time() - start)
print "found %s in %s seconds" % (max_collatz,elapsed)

Now, when we execute this code we get the following.

found 837799 in 0.0465848445892 seconds

That’s much better. So, by using Cython and writing things a bit more efficiently, the code executes roughly 1,000 times as fast as the original Python version.

C Solution

If I structure the algorithm in the same way, I don’t expect to gain much by rewriting things in C, but I’ll see what happens.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
 
int find_max_collatz(unsigned long max) {
    unsigned int collatz_length[max];
    unsigned int TO_ADD[600];
    unsigned long iter, j, m, n, s, p, ind, new_length, max_length = 0;
 
    // set initial collatz lengths and TO_ADD numbers
    iter = 0;
    while(iter < max) {
        collatz_length[iter] = 0;
        iter++;
    }
    collatz_length[1] = 1;
    iter = 0;
    while(iter < 600) {
        TO_ADD[iter] = 0;
        iter++;
    }
    // iterate to max and find collatz lengths
    iter = 1;
    while(iter < max) {
        n = iter;
        s = 0;
        while(n > max - 1 || collatz_length[n] < 1) {
            TO_ADD[s] = n;
            if(n % 2 == 0) n = n/2;
            else n = 3*n + 1;
            s++;
        }
        // collatz length now known from previous calculations
        p = collatz_length[n];
        j = 0;
        while(j < s) {
            m = TO_ADD[j];
            TO_ADD[j] = 0;
            if(m < max) {
                new_length = collatz_length[n] + s - j;
                collatz_length[m] = new_length;
                if(new_length > max_length) {
                    max_length = new_length;
                    ind = m;
                }
            }
            j++;
        }
        iter++;
    } 
    return ind;
}
 
 
int main(int argc, char **argv) {
    unsigned int max, i;
    time_t start, end;
    double total_time;
 
    start = time(NULL);
 
    for(i=0;i<1000;i++) max = find_max_collatz(1000000);
 
    end = time(NULL);
    total_time = difftime(end,start);
 
    printf("%d found in %lf seconds.\n",max,total_time);
 
    return 0;
}

We’re using 1,000 iterations to try and get a good idea of how this runs.

$ time ./problem-14
837799 found in 38.000000 seconds.
 
real	0m38.871s
user	0m38.120s

So, in C we’re getting each iteration in roughly 0.038 seconds, which is ever so slightly faster than the Cython code.


The sequence of triangle numbers is generated by adding the natural numbers. So the 7th triangle number would be $1 + 2 + 3 + 4 + 5 + 6 + 7 = 28$. The first ten terms would be:

\[1, 3, 6, 10, 15, 21, 28, 36, 45, 55, \ldots\]

Let us list the factors of the first seven triangle numbers:

 1: 1
 3: 1,3
 6: 1,2,3,6
10: 1,2,5,10
15: 1,3,5,15
21: 1,3,7,21
28: 1,2,4,7,14,28

We can see that 28 is the first triangle number to have over five divisors.

What is the value of the first triangle number to have over five hundred divisors?

Mathematics Behind a Solution

If $T(n)$ is the n-th triangular number, then we know that

\[T(n)=1+2+\cdots+n=\sum_{i=1}^n i=\frac{n(n+1)}{2}.\]

Let’s assume that we know the prime factorizations of both $n$ and $n+1$, and we’ll write

\[n = p_1^{e_1}p_2^{e_2}\cdots p_s^{e_s}\quad\text{and}\quad n+1 = q_1^{f_1}q_2^{f_2}\cdots q_t^{f_t}.\]

Notice that $n$ and $n+1$ cannot share any prime factors (as they are consecutive numbers), and so we know that all the $p_i$ and $q_i$ are distinct primes. Also, one and only one of $n$ and $n+1$ is divisible by 2. The exponents $e_i$ and $f_i$ are therefore the only things we really need to consider in order to determine the number of divisors of $T(n)$. The fact that $T(n)=(n(n+1))/2$ means that we’ll need to neglect a single power of two in the factorization of $n$ or $n+1$ (remember, only one is even). Let’s assume, without loss of generality, that $n$ is even and that $p_1=2$. Then, some quick combinatorics tell us that the total number of factors of $T(n)$ will be

\[(e_1)(e_2+1)\cdots(e_s+1)(f_1+1)(f_2+1)\cdots(f_t+1)=e_1\prod_{i=2}^s(e_i+1)\prod_{i=1}^t(f_i+1)\]

Even better, as we’re increasing our triangular numbers looking for the first one to satisfy the property in question, we only need to calculate the factorization of $n+1$ at each step (we already know the factorization of $n$, since it was the previous step’s $n+1$). This should decrease the runtime substantially.
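To make the combinatorics concrete, consider $T(7)=28$: here $n=7$ and $n+1=8=2^3$, so dropping one power of two leaves $28=2^2\cdot 7$, and the formula gives $3\cdot(1+1)=6$ divisors, matching the factor list above. A small trial-division counter (a sketch, separate from the solution code below) confirms this:

```python
def divisor_count(n):
    # count divisors of n via its prime factorization, found by trial division
    count, d = 1, 2
    while d * d <= n:
        e = 0
        while n % d == 0:
            n //= d
            e += 1
        count *= e + 1
        d += 1
    if n > 1:
        count *= 2  # one leftover prime factor
    return count

print(divisor_count(28))  # 6
```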

Python Solution

The mathematics described above can be coded in Python as follows.

import time
 
def num_divisors(n):
    if n % 2 == 0: n = n/2
    divisors = 1
    count = 0
    while n % 2 == 0:
        count += 1
        n = n/2
    divisors = divisors * (count + 1)
    p = 3
    while n != 1:
        count = 0
        while n % p == 0:
            count += 1
            n = n/p
        divisors = divisors * (count + 1)
        p += 2
    return divisors
 
def find_triangular_index(factor_limit):
    n = 1
    lnum, rnum = num_divisors(n), num_divisors(n+1)
    while lnum * rnum <= factor_limit:
        n += 1
        lnum, rnum = rnum, num_divisors(n+1)
    return n
 
start = time.time()
index = find_triangular_index(500)
triangle = (index * (index + 1)) / 2
elapsed = (time.time() - start)
 
print "result %s returned in %s seconds." % (triangle,elapsed)

When executed, we get the following result.

result 76576500 returned in 3.48622322083 seconds.

Cython Solution

We can recode the Python to Cython and achieve a significant speedup.

%cython
 
import time
 
cdef num_divisors(unsigned long int n):
    if n % 2 == 0: n = n/2
    cdef unsigned long divisors = 1
    cdef unsigned int count = 0
    while n % 2 == 0:
        count += 1
        n = n/2
    divisors = divisors * (count + 1)
    cdef unsigned int p = 3
    while n != 1:
        count = 0
        while n % p == 0:
            count += 1
            n = n/p
        divisors = divisors * (count + 1)
        p += 2
    return divisors
 
cdef find_triangular_index(unsigned int factor_limit):
    cdef unsigned long n = 1
    cdef unsigned long lnum, rnum
    lnum, rnum = num_divisors(n), num_divisors(n+1)
    while lnum * rnum <= factor_limit:
        n += 1
        lnum, rnum = rnum, num_divisors(n+1)
    return n
 
start = time.time()
index = find_triangular_index(500)
triangle = (index * (index + 1)) / 2
elapsed = (time.time() - start)
 
print "result %s returned in %s seconds." % (triangle,elapsed)

When executed, we obtain the following.

result 76576500 returned in 0.122308969498 seconds.

Thus, the Cython executes roughly 28.5 times faster than the Python version.

Sage Solution

We could also solve the problem using Sage’s built-in functions.

import time
 
start = time.time()
 
i = 1
triangle = 1
while True:
    if len(triangle.divisors()) > 500: break
    i += 1
    triangle += i
 
elapsed = (time.time() - start)
print "found %s in %s seconds" % (triangle,elapsed)

This yields the following.

found 76576500 in 0.597285985947 seconds

Problem: The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17. Find the sum of all the primes below two million.

Sage Solution

There is a direct Sage solution here. We’ll create a list of all primes under two million and sum over that list. We’ll also throw a time component in front so we know how long Sage takes to do the computations.

time sum(prime_range(2000000))

Executing this gives the following result.

142913828922
Time: CPU 0.04 s, Wall: 0.05 s

Python Solution

We’ll use our prime sieve from Project Euler Problem 7 and simply rewrite things a bit.

import time
 
def prime_sum(n):
    if n < 2: return 0
    if n == 2: return 2
    if n % 2 == 0: n += 1
    primes = [True] * n
    primes[0],primes[1] = [None] * 2
    sum = 0
    for ind,val in enumerate(primes):
        if val is True and ind > n ** 0.5 + 1:
            sum += ind
        elif val is True:
            primes[ind*2::ind] = [False] * (((n - 1)//ind) - 1)
            sum += ind
    return sum
 
start = time.time()
sum = prime_sum(2000000)
elapsed = (time.time() - start)
 
print "found %s in %s seconds" % (sum,elapsed)

This executes in under a second:

found 142913828922 in 0.838065862656 seconds
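As a cross-check, a vanilla Sieve of Eratosthenes (written independently of the code above) reproduces both the toy case from the problem statement and the full sum:

```python
def sieve_sum(n):
    # sum all primes strictly below n using the Sieve of Eratosthenes
    is_prime = [True] * n
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, n, i):
                is_prime[j] = False
    return sum(i for i, p in enumerate(is_prime) if p)

print(sieve_sum(10))       # 17
print(sieve_sum(2000000))  # 142913828922
```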

Cython Solution

What is a bit amusing is that we can do a trivial rewriting to Cython and achieve a speedup.

%cython
 
import time
 
cdef prime_sum(unsigned long n):
    if n < 2: return 0
    if n == 2: return 2
    if n % 2 == 0: n -= 1
    primes = [True] * n
    primes[0],primes[1] = [None] * 2
    sum = 0
    for ind,val in enumerate(primes):
        if val is True and ind > n ** 0.5 + 1:
            sum += ind
        elif val is True:
            primes[ind*2::ind] = [False] * (((n - 1)//ind) - 1)
            sum += ind
    return sum
 
start = time.time()
sum = prime_sum(2000000)
elapsed = (time.time() - start)
 
print "found %s in %s seconds" % (sum,elapsed)

When executed, we get:

found 142913828922 in 0.277922153473 seconds

The Cython code (which has minimal modifications) is about 3 times as fast as the Python code. Let’s see how much we can improve the Cython version. We’ll rewrite the list of integers as a pointer-based C array of Boolean values. This should speed things up a bit.

%cython
 
import time
from libc.stdlib cimport malloc, free
 
# a function to sum all prime numbers below a given number
cdef prime_sum(unsigned long n):
 
    if n < 2: return 0
    if n == 2: return 2
    # if n is even, use next lowest number
    if n % 2 == 0: n -= 1
 
    cdef bint *is_prime = <bint *>malloc((n + 1) * sizeof(bint))

    # set all to prime and then sieve; we allocate n + 1 entries since the
    # loops below index up to and including n
    cdef unsigned long int i,j,sum = 0
    i = 0
    while i <= n:
        is_prime[i] = 1
        i += 1
    is_prime[0],is_prime[1] = 0,0
    i = 2
    while i < n ** 0.5 + 1:
        if is_prime[i] == 1:
            # add this to the sum
            sum += i
            # remove all multiples of i from prime list
            j = i ** 2
            while j <= n:
                is_prime[j] = 0
                j += i
        i += 1
    while i <= n:
        if is_prime[i]: sum += i
        i += 1
    free(is_prime)
 
    return sum
 
start = time.time()
sum = prime_sum(2000000)
elapsed = (time.time() - start)
 
print "found %s in %s seconds" % (sum,elapsed)

When executed, we now get the following.

found 142913828922 in 0.0436608791351 seconds

The Cython executes roughly 19 times faster than the original Python code.


Problem: A Pythagorean triplet is a set of three natural numbers, $a<b<c$ such that

\[a^2+b^2=c^2\]

For example, $3^2+4^2=9+16=25=5^2$. There exists exactly one Pythagorean triplet for which $a+b+c=1000$. Find the product $abc$.

Python Solution

import time
 
def prod_triplet_w_sum(n):
    for i in range(1,n,1):
        for j in range(1,n-i,1):
            k = n-i-j
            if i**2+j**2==k**2:
                return i*j*k
    return 0
 
start = time.time()
product = prod_triplet_w_sum(1000)
elapsed = (time.time() - start)
 
print "found %s in %s seconds" % (product,elapsed)

When executed, this code gives the following result.

found 31875000 in 0.101438999176 seconds

Cython Solution

We take the same approach in Cython, but remove the Python iterators in an attempt to improve performance.

%cython
 
import time
 
cdef prod_triplet_w_sum(unsigned int n):
    cdef unsigned int i,j,k
    i = 1
    while i < n:
        j = 1
        while j < n - i:
            k = n - i - j
            if i**2 + j**2 == k**2:
                return i*j*k
            j += 1
        i += 1
    return 0
 
start = time.time()
product = prod_triplet_w_sum(1000)
elapsed = (time.time() - start)
 
print "found %s in %s seconds" % (product,elapsed)

This gives us the following.

found 31875000 in 0.00186085700989 seconds

Thus, the Cython version is roughly 55 times as fast as the Python code.
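As a closing algorithmic note (not part of the timings above), the inner loop can be eliminated entirely: substituting $c=n-a-b$ into $a^2+b^2=c^2$ and solving for $b$ gives $b=\dfrac{n(n-2a)}{2(n-a)}$, so each candidate $a$ determines $b$ directly. A sketch:

```python
def triplet_product(n):
    # b = n(n - 2a) / (2(n - a)) follows from a + b + c = n and a^2 + b^2 = c^2;
    # a valid triplet requires the division to be exact and a < b < c
    for a in range(1, n // 3):
        num = n * (n - 2 * a)
        den = 2 * (n - a)
        if num % den == 0:
            b = num // den
            c = n - a - b
            if a < b < c:
                return a * b * c
    return 0

print(triplet_product(1000))  # 31875000
```

This reduces the search from quadratic to linear in $n$, since $b$ no longer needs to be scanned.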