Introduction

A while ago in this post, I proved that the sum of divisors of an integer can be computed as a multiplicative function on primes. Thus, knowing the prime factorization of an integer results in a direct algorithmic method for computing the sum of divisors of the integer without any combinatorial hassle.

This little trick, as well as many other factorization techniques (e.g., greatest common divisor of two integers), pops up quite a bit in the Project Euler problems. They pop up so often that I eventually found myself wanting to collect the routines in a Python module instead of copying and pasting the code every time I needed to do factorizations in Python.

So, today I pushed this little module. (There’s no setup script yet. Just place it in your working directory and ‘import primes’ for now.)

There are other ways we could go about this (using Sage to coordinate factorization methods, for instance). And, of course, for the more complicated problems (e.g., higher numbered Project Euler problems), such an approach in Python won’t do. But, for a surprising number of Project Euler problems below 150 or so, this module should help a ton.

Factoring a Range of Integers

First, you should understand that there’s no elliptic curve method and no quadratic sieve. This module is designed to provide factorization tools (sum of divisors, gcd, lcm, etc.) for a large range or list of consecutive integers. While certain fancy methods are fast at factoring large random integers, they aren’t the best approach when factoring a big list of consecutive integers. A typical sieve and some careful planning works better in such a situation, as strange as that may feel.

Examples: Prime Sieve Class

Using IPython, we’ll explore this module a bit and see what we can do with it. Then, we’ll show how it can be used to solve a Project Euler problem. For now, let’s start by creating a prime sieve. We’ll ask Python to compute all primes below one million.

In [1]: import primes
 
In [2]: time S = primes.Sieve(1000000)
CPU times: user 0.12 s, sys: 0.02 s, total: 0.13 s
Wall time: 0.12 s

How many primes are now included in this sieve? That is, how many primes are there below one million?

In [3]: S.numPrimes
Out[3]: 78498

We can use the sieve to test any integer in the given range to see if it is a prime.

In [4]: S.isPrime(873)
Out[4]: False

We can make a list of primes from the given sieve, in case we wish to use an iterable over primes.

In [5]: time L = S.listPrimes()
CPU times: user 0.06 s, sys: 0.00 s, total: 0.06 s
Wall time: 0.05 s

Now, L is a list containing all prime values less than one million. We can also make a set. The set will be slightly more expensive to create, but will provide faster membership access once created.

In [6]: time M = S.setPrimes()
CPU times: user 0.09 s, sys: 0.01 s, total: 0.10 s
Wall time: 0.07 s

If you forget what the limit point of the sieve is, it is contained as a variable inside the Sieve class.

In [7]: S.limit
Out[7]: 1000000

Examples: RangeFactor Class

The RangeFactor class uses a prime sieve to factor a list of integers. It then makes various methods available that use this collection of factorization records. Let’s start by factoring all natural numbers less than one million.

In [8]: time F = primes.RangeFactor(1000000)
CPU times: user 2.13 s, sys: 0.06 s, total: 2.19 s
Wall time: 2.18 s

We can get a more detailed time analysis by asking for verbose output.

In [9]: time F = primes.RangeFactor(1000000, verbose=True)
primes : creating sieve
primes : sieve created in 0.140506982803 seconds
primes : creating factor records
primes : factor records created in 0.232191085815 seconds
primes : populating factor records
primes : factor records filled in 1.8158788681 seconds
CPU times: user 2.22 s, sys: 0.08 s, total: 2.29 s
Wall time: 2.27 s

One of the most simple and obvious things to do now is to ask for the factorization of an integer in the given range. Currently, the factors are in a list with each integer having a dictionary of prime keys and exponent value pairs. This example should make that relationship a bit more clear.

In [10]: F.factors[5]
Out[10]: {5: 1}
 
In [11]: F.factors[50]
Out[11]: {2: 1, 5: 2}

The next most obvious thing we could do is to examine the greatest common divisor and least common multiple of two integers. We could also simply ask if two integers are coprime (which is slightly more efficient than computing the gcd or lcm).

In [12]: F.gcd(100,29754)
Out[12]: 2
 
In [13]: F.gcd(100,6634)
Out[13]: 2
 
In [14]: F.gcd(100,500)
Out[14]: 100
 
In [15]: F.lcm(774,384)
Out[15]: 49536
 
In [16]: F.areCoprime(774,384)
Out[16]: False

Nothing here so far is too special. Really, we’re just making standard prime tools available more easily. But, now we can use the multiplicative function on primes to compute the sum of divisors of any integers in the given range. (Recall that a perfect number is a number that is the sum of all of its proper divisors.)

In [17]: F.sumDivisors(496)
Out[17]: 992
 
In [18]: F.sumProperDivisors(496)
Out[18]: 496
 
In [19]: F.isPerfect(496)
Out[19]: True

Now, we can do some slightly more complicated computations. Let’s sum all the perfect numbers under one million. First, let’s see what they are. Then, we’ll take the sum.

In [20]: for i in range(2,1000000):
   ....:     if F.isPerfect(i): print i
   ....:     
6
28
496
8128

What’s a bit crazy here is that we’re computing the sum of divisors for ALL integers less than one million to check if each integer is perfect or not. All things considered, that’s going on pretty quickly due to our use of that nice multiplicative function.

In [28]: time sum([i for i in range(2,F.sieve.limit) if F.isPerfect(i)])
CPU times: user 1.60 s, sys: 0.03 s, total: 1.62 s
Wall time: 1.58 s
Out[28]: 8658

A Sample Project Euler Problem

Problem 21 is stated as follows:

Let d(n) be defined as the sum of proper divisors of n (numbers less than n which divide evenly into n).
If d(a) = b and d(b) = a, where a ≠ b, then a and b are an amicable pair and each of a and b are called amicable numbers.

For example, the proper divisors of 220 are 1, 2, 4, 5, 10, 11, 20, 22, 44, 55 and 110; therefore d(220) = 284. The proper divisors of 284 are 1, 2, 4, 71 and 142; so d(284) = 220.

Evaluate the sum of all the amicable numbers under 10000.

Python Solution

This is solved using the just discussed ‘primes’ module with the following code.

#!/usr/bin/python2
 
import time
import primes
 
start = time.time()
 
# factor all numbers from 1 to 10000
F = primes.RangeFactor(10000)
 
# compute the sum of all proper divisors for integers from 1 to 10000
S = [F.sumProperDivisors(i) for i in range(10000)]
 
# find amicable pairs and add to a set
pairs = set()
for i in range(2,10000):
    if i in pairs:
        pass
    else:
        if S[i] < 10000 and S[i]!=i and S[S[i]]==i:
            pairs.add(i)
            pairs.add(S[i])
 
elapsed = time.time() - start
 
print "Found result %s in %s seconds" % (sum(pairs), elapsed)

When executed, we get the following output.

Found result 31626 in 0.0326828956604 seconds

Problem

The fraction $\frac{49}{98}$ is a curious fraction, as an inexperienced mathematician in attempting to simplify it may incorrectly believe that $\frac{49}{98}=\frac{4}{8}$, which is correct, is obtained by cancelling the 9s.

We shall consider fractions like $\frac{30}{50}=\frac{3}{5}$ to be trivial examples.

There are exactly four non-trivial examples of this type of fraction, less than one in value, and containing two digits in the numerator and denominator.

If the product of these four fractions is given in its lowest common terms, find the value of the denominator.

Solution in Python

We really only need to iterate from 10 to 99 for the numberator and from the numerator to 99 for the denominator, since we require that the fraction be less than one. Also, we can ignore any denominators that have zeros (since the only other possible integer to match in the numerator is nonzero). We form a list from the numerator and denominator, check for redundancy, remove the redundancies when found, then cross multiply to find if the fractions are equivalent to their redundant-removed siblings.

import time
import fractions
 
start = time.time()
 
p = fractions.Fraction(1,1)
 
for a in range(10, 100, 1):
    for b in range(a+1, 100, 1):
        if b % 10 == 0 or a == b: continue
        La, Lb = [a/10, a%10], [b/10, b%10]
        if any(i in Lb for i in La) and not all(i in Lb for i in La):
            if La[0] in Lb: x = La[0]
            else: x = La[1]
            La.remove(x)
            Lb.remove(x)
            if a*Lb[0] == b*La[0]: p *= fractions.Fraction(La[0],Lb[0])
 
elapsed = time.time() - start
 
print "result %s found in %s seconds" % (p, elapsed)

We’ve used Python’s fractions module to make things a bit easier. When executed, we get the following.

result 1/100 found in 0.00531482696533 seconds

Problem

Consider all integer combinations of $ab$ for $2\le a\le 5$ and $2\le b\le 5$:

22=4, 23=8, 24=16, 25=32
32=9, 33=27, 34=81, 35=243
42=16, 43=64, 44=256, 45=1024
52=25, 53=125, 54=625, 55=3125

If they are then placed in numerical order, with any repeats removed, we get the following sequence of 15 distinct terms:

4, 8, 9, 16, 25, 27, 32, 64, 81, 125, 243, 256, 625, 1024, 3125

How many distinct terms are in the sequence generated by $ab$ for $2\le a\le 100$ and $2\le b\le 100$?

Solution in Python

Update: After a conversation with my friend and co-worker Larry, I’ve changed just two lines of this code – using a Python set instead of a list. That has taken the runtime down considerably. The set version runs in about 2% the time of the list version. This is because checking if an element is in a set is a constant time operation, while checking membership in a list is a linear time operation.

This is one of those problems that can be over-engineered. If you simply code the basic happy-path Python version, you find that it runs very quickly and returns the result. You could consider unique factorizations and so on … and that was my initial idea, but coding the “brute-force” Python version is all you really need here.

#!/usr/bin/python
 
import time
 
# L = [] # old version with a list
L = set() # new version with a set
 
limit = 101
 
start = time.time()
 
for a in range(2,limit):
    for b in range(2,limit):
        c = a**b
        if c in L: pass
        # else: L.append(c) # old list version
        L.add(c) # new version with set
 
elapsed = time.time() - start
 
print "found %s ints in %s seconds" % (len(L), elapsed)

When executed, we get the correct result in under a second.

$ python 29.py 
# found 9183 ints in 0.66696190834 seconds # old list version
found 9183 ints in 0.0150249004364 seconds

Problem

The $n^\text{th}$ term of the sequence of triangle numbers is given by, $t_n=\frac{1}{2}n(n+1)$; so the first ten triangle numbers are: 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, …

By converting each letter in a word to a number corresponding to its alphabetical position and adding these values we form a word value. For example, the word value for SKY is 19 + 11 + 25 = 55 = $t_{10}$. If the word value is a triangle number then we shall call the word a triangle word.

Using words.txt (right click and ‘Save Link/Target As…’), a 16K text file containing nearly two-thousand common English words, how many are triangle words?

Solution in Python

#!/usr/bin/python
 
import time
 
# start the clock
start = time.time()
 
# turn the string of words into a list of python strings
with open('words.txt','r') as f:
    words = f.read().split(',')
    words = [list(word.strip('\"')) for word in words]
    f.close()
 
# we should have an idea of how long the longest word is,
# giving us an idea of the magnitude of the triangle words
m = max([len(word) for word in words])
# form triangle numbers up to the given range
triangles = [n*(n + 1)/2 for n in range(1,2*m)]
 
# make a dictionary map for character values
vals = {}
s = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
for c in s:
    vals[c] = s.index(c) + 1
 
# count triangle words
triangle_words = 0
for word in words:
    if sum([vals[c] for c in word]) in triangles:
        triangle_words += 1
 
# terminate the clock
elapsed = time.time() - start
 
print "found %s triangle words in %s seconds" % (triangle_words, elapsed)

When executed, this returns as follows.

found 162 triangle words in 0.00315809249878 seconds

A triangular pyramid is constructed using spherical balls so that each ball rests on exactly three balls of the next lower level.

Then, we calculate the number of paths leading from the apex to each position: A path starts at the apex and progresses downwards to any of the three spheres directly below the current position. Consequently, the number of paths to reach a certain position is the sum of the numbers immediately above it (depending on the position, there are up to three numbers above it).

The result is Pascal’s pyramid and the numbers at each level n are the coefficients of the trinomial expansion $(x + y + z)^n$. How many coefficients in the expansion of $(x + y + z)^{200000}$ are multiples of $10^{12}$?

Solution Using the Multinomial Theorem

The generalization of the binomial theorem is the multinomial theorem. It says that multinomials raised to exponents can be expanded using the formula
\[
(x_1+x_2+\cdots+x_m)^n=\sum_{{k_1+k_2+\cdots+k_m=n}\atop{0\le k_i\le n}}\left({n}\atop{k_1,k_2,\ldots,k_m}\right)\prod_{1\le t\le m}x_t^{k_t}
\]
where
\[
\left({n}\atop{k_1,k_2,\ldots,k_m}\right)=\frac{n!}{k_1!k_2!\cdots k_m!}.
\]
Of course, when m=2 this gives the binomial theorem. The sum is taken over all partitions $k_1+k_2+\cdots+k_m=n$ for integers $k_i$. If n=200000 abd m=3, then the terms in the expansion are given by
\[
\left({200000}\atop{k_1,k_2,k_3}\right)x_1^{k_1}x_2^{k_2}x_3^{k_3}=\frac{200000!}{k_1!k_2!k_3!}x_1^{k_1}x_2^{k_2}x_3^{k_3}
\]
where $k_1+k_2+k_3=200000$. It’s worth pointing out that each of the coefficients is an integer, and thus has a unique factorization into products of prime integers. Of course, there’s no way that we’re going to calculate these coefficients. We only need to know when they’re divisible by $10^{12}$. Thus, we only need to consider how many factors of 2 and 5 are involved.

First, we’ll create a function $p(n,d)$ that outputs how many factors of $d$ are included in $n!$. We have that
\[
p(n,d)=\left\lfloor\frac{n}{d}\right\rfloor+\left\lfloor\frac{n}{d^2}\right\rfloor+\left\lfloor\frac{n}{d^3}\right\rfloor+ \cdots+\left\lfloor\frac{n}{d^r}\right\rfloor,
\]
where $d^r$ is the highest power of $d$ dividing $n$. For instance, there are 199994 factors of 2 in 200000!. Since we’re wondering when our coefficients are divisible by $10^{12}=2^{12}5^{12}$, we’ll be using the values provided by $p(n,d)$ quite a bit for $d=2$ and $d=5$. We’ll store two lists:
\[
p2=[p(i,2)\text{ for }1\le i\le 200000]\quad\text{and}\quad p5=[p(i,5)\text{ for }1\le i\le 200000].
\]
For a given $k_1,k_2,k_3$, the corresponding coefficient is divisible by $10^{12}$ precisely when
\[
p2[k_1]+p2[k_2]+p2[k_3]<199983\ \text{and}\ p5[k_1]+p5[k_2]+p5[k_3]<49987.
\]
That is, this condition ensures that there are at least 12 more factors of 2 and 5 in the numerator of the fraction defining the coefficients.

Now, we know that $k_1+k_2+k_3=200000$, and we can exploit symmetry and avoid redundant computations if we assume $k_1\le k_2\le k_3$. Under this assumption, we always have
\[
k_1\le\left\lfloor\frac{200000}{3}\right\rfloor=66666.
\]
We know that $k_1+k_2+k_3=200000$ is impossible since 200000 isn't divisible by 3. It follows that we can only have (case 1) $k_1=k_2 < k_3$, or (case 2) $k_1 < k_2=k_3$, or (case 3) $k_1 < k_2 < k_3$.

In case 1, we iterate $0\le k_1\le 66666$, setting $k_2=k_1$ and $k_3=200000-k_1-k_2$. We check the condition, and when it is satisfied we record 3 new instances of coefficients (since we may permute the $k_i$ in 3 ways).

In case 2, we iterate $0\le k_1\le 66666$, and when $k_1$ is divisible by 2 we set $k_2=k_3=\frac{200000-k_1}{2}$. When the condition holds, we again record 3 new instance.

In case 3, we iterate $0\le k_1\le 66666$, and we iterate over $k_2=k_1+a$ where $1\le a < \left\lfloor\frac{200000-3k_1}{2}\right\rfloor$. Then $k_3=200000-k_1-k_2$. When the condition holds, we record 6 instances (since there are 6 permutations of 3 objects).

Cython Solution

I’ll provide two implementations, the first written in Cython inside Sage. Then, I’ll write a parallel solution in C.

%cython
 
import time
from libc.stdlib cimport malloc, free
 
head_time = time.time()
 
cdef unsigned long p(unsigned long k, unsigned long d):
    cdef unsigned long power = d
    cdef unsigned long exp = 0
    while power <= k:
        exp += k / power
        power *= d
    return exp
 
cdef unsigned long * p_list(unsigned long n, unsigned long d):
    cdef unsigned long i = 0
    cdef unsigned long * powers = <unsigned long *>malloc((n+1)*sizeof(unsigned long))
    while i <= n:
        powers[i] = p(i,d)
        i += 1
    return powers
 
 
run_time = time.time()
 
# form a list of number of times each n! is divisible by 2.
cdef unsigned long * p2 = p_list(200000,2)
 
# form a list of number of times each n! is divisible by 5.
cdef unsigned long * p5 = p_list(200000,5)
 
cdef unsigned long k1, k2, k3, a
cdef unsigned long long result = 0
 
k1 = 0
while k1 <= 66666:
    # case 1: k1 = k2 < k3
    k2 = k1
    k3 = 200000 - k1 - k2
    if 199982 >= (p2[k1]+p2[k2]+p2[k3]) and 49986 >= (p5[k1]+p5[k2]+p5[k3]):
        result += 3
    # case 2: k1 < k2 = k3
    if k1 % 2 == 0:
        k2 = (200000 - k1)/2
        k3 = k2
        if 199982 >= (p2[k1]+p2[k2]+p2[k3]) and 49986 >= (p5[k1]+p5[k2]+p5[k3]):
            result += 3
    # case 3: k1 < k2 < k3
    a = 1
    while 2*a < (200000 - 3*k1):
        k2 = k1 + a
        k3 = 200000 - k1 - k2
        if 199982 >= (p2[k1]+p2[k2]+p2[k3]) and 49986 >= (p5[k1]+p5[k2]+p5[k3]):
            result += 6
        a += 1
    k1 += 1
 
 
free(p2)
free(p5)
 
 
elapsed_run = round(time.time() - run_time, 5)
elapsed_head = round(time.time() - head_time, 5)
 
print "Result: %s" % result
print "Runtime: %s seconds (total time: %s seconds)" % (elapsed_run, elapsed_head)

When executed, we find the correct result relatively quickly.

Result: 479742450
Runtime: 14.62538 seconds (total time: 14.62543 seconds)

C with OpenMP Solution

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <omp.h>
 
/*****************************************************************************/
/* function to determine how many factors of 'd' are in 'k!'                 */
/*****************************************************************************/
unsigned long p(unsigned long k, unsigned long d) {
    unsigned long power = d;
    unsigned long exp = 0;
    while (power <= k) {
        exp += k/power;
        power *= d;
    }
    return exp;
}
 
/*****************************************************************************/
/* create a list [p(0,d),p(1,d),p(2,d), ... ,p(n,d)] and return pointer      */
/*****************************************************************************/
unsigned long * p_list(unsigned long n, unsigned long d) {
    unsigned long i;
    unsigned long * powers = malloc((n+1)*sizeof(unsigned long));
    for (i=0;i<=n;i++) powers[i] = p(i,d);
    return powers;
}
 
/*****************************************************************************/
/* main                                                                      */
/*****************************************************************************/
int main(int argc, char **argv) {
    unsigned long k1, k2, k3, a;
    unsigned long long result = 0;
 
    unsigned long * p2 = p_list(200000, 2);
    unsigned long * p5 = p_list(200000, 5);
 
 
    #pragma omp parallel for private(k1,k2,k3,a) reduction(+ : result)
    for (k1=0;k1<66667;k1++) {
        // case 1: k1 = k2 < k3
        k2 = k1;
        k3 = 200000 - k1 - k2;
        if (p2[k1]+p2[k2]+p2[k3]<199983 && p5[k1]+p5[k2]+p5[k3]<49987) {
            result += 3;
        } 
        // case 2: k1 < k2 = k3
        if (k1 % 2 == 0) {
            k2 = (200000 - k1)/2;
            k3 = k2;
            if (p2[k1]+p2[k2]+p2[k3]<199983 && p5[k1]+p5[k2]+p5[k3]<49987) {
                result += 3;
            }
        }
        // case 3: k1 < k2 < k3
        for (a=1;2*a<(200000-3*k1);a++) {
            k2 = k1 + a;
            k3 = 200000 - k1 - k2;
            if (p2[k1]+p2[k2]+p2[k3]<199983 && p5[k1]+p5[k2]+p5[k3]<49987) {
                result += 6;
            }
        }
    }
 
    free(p2);
    free(p5);
 
    printf("result: %lld\n", result);
 
    return 0;
}

This can be compiled and optimized using GCC as follows.

$ gcc -O3 -fopenmp -o problem-154-omp problem-154-omp.c

When executed on a 16-core machine, we get the following result.

$ time ./problem-154-omp 
result: 479742450
 
real    0m1.487s

This appears to be the fastest solution currently known, according to the forum of solutions on Project Euler. The CPUs on the 16-core machine are pretty weak compared to modern standards. When running on a single core on a new Intel Core i7, the result is returned in about 4.7 seconds.


The arithmetic sequence, 1487, 4817, 8147, in which each of the terms increases by 3330, is unusual in two ways: (i) each of the three terms are prime, and, (ii) each of the 4-digit numbers are permutations of one another.

There are no arithmetic sequences made up of three 1-, 2-, or 3-digit primes, exhibiting this property, but there is one other 4-digit increasing sequence.

What 12-digit number do you form by concatenating the three terms in this sequence?

Sage Solution

There are really only a handful of ways to go about this, and the timing isn’t going to be too substantial. So, I’m simply going to write an in-line Sage code block and execute it. The idea is to get a list of primes between 1,000 and 10,000, sort their digit decompositions, and then (after seeing which primes have the same digit decompositions), see which prime subsets have an arithmetic sequence underlying the distances between them. No too horribly interesting, but the code below does the job.

import time
 
start = time.time()
 
# make a list of primes
P = prime_range(1000,10000)
 
# for each prime, make a list of digits, and sort that list
L = [p.digits() for p in P]
for i in range(len(L)):
    L[i].sort()
L.sort()
 
# make a list of digits that appear for at least 3 primes
M = []
for i in range(len(L)-3):
    if L[i]==L[i+1] and L[i+1]==L[i+2] and L[i+2]==L[i+3]:
        if L[i] not in M: M.append(L[i])
 
# N is a list of lists of at least 3 primes with the same 4 digits
N = []
for m in M:
    N.append([])
for p in P:
    s = p.digits()
    s.sort()
    if s in M:
        r = M.index(s)
        N[r].append(p)
 
# Use a Sage combinations iterator to examine sets of 3 primes
for n in N:
    X = combinations_iterator(n, 3)
    for x in X:
        if x[1]-x[0] == x[2]-x[1]: print x
elapsed = time.time() - start
 
print "time elapsed: %s seconds" % elapsed

When executed, we get the following result.

[1487, 4817, 8147]
[2969, 6299, 9629]
time elapsed: 0.0485339164734 seconds

Problem

Euler published the remarkable quadratic formula:

\[n^2+n+41\]

It turns out that the formula will produce 40 primes for the consecutive values $n=0$ to $39$. However, when $n=40$, $40^2+40+41=40(40+1)+41$ is divisible by 41, and certainly when $n=41$, $41^2+41+41$ is clearly divisible by 41.

Using computers, the incredible formula $n^2-79n+1601$ was discovered, which produces 80 primes for the consecutive values $n=0$ to $79$. The produce of the coefficients, $-79$ and $1601$ is $-126479$.

Considering quadratics of the form:

$n^2+an+b$ where $|a|

where $|n|$ is the modulus/absolute value of $n$, e.g., $|11|=11$ and $|-4|=4$.

Find the product of the coefficients $a$ and $b$, for the quadratic expression that produces the maximum number of primes for consecutive values of $n$, starting with $n=0$.

Sage Solution

Some observations:

  • Clearly, $b$ must be a prime or a negative prime, since $n=0$ must result in a prime.
  • We can refine a lower bound on $b$ as follows: If a given combination $(a,b)$ results in $m$ consecutive primes, then clearly $b>m$, since otherwise we would obtain a factor of $m$ in the given polynomial. Also, when $a$ is negative, we must have $b>-(n^2+an)$, since the prime values $n^2+an+b$ must be positive. Since we know that $n^2+n+41$ returns 40 primes, we know that any interesting values for $a$ and $b$ must then satisfy $b>-(40^2+40a)$.

Given these observations, I’ll write a routine in Sage.

import time
 
start = time.time()
 
P = prime_range(1000)
a_max, b_max, c_max = 0, 0, 0
for a in range(-1000,1001):
    for b in P:
        if b &lt; -1600-40*a or b &lt; c_max: continue
        c, n = 0, 0
        while is_prime(n**2 + a*n + b):
            c += 1
            n += 1
        if c &gt; c_max:
            a_max, b_max, c_max = a, b, c
 
prod = a_max * b_max
 
elapsed = time.time() - start
 
print "a=%s, b=%s, longest sequence = %s, prod=%s\nfound in %s seconds"\
% (a_max, b_max, c_max, prod, elapsed)

When executed, we obtain the following.

a=-61, b=971, longest sequence = 71, prod=-59231
found in 8.53578400612 seconds

It’s entirely possible that we’re being a bit redundant with the integer values that we’re throwing at Sage’s “is_prime” function. We’ll cache a list of primes within the range that we’re considering. (I came up with the number 751000 below in the Python section.)

import time
 
start = time.time()
 
P = prime_range(751000)
L = [False] * 751000
for p in P: L[p] = True
 
a_max, b_max, c_max = 0, 0, 0
for a in range(-1000,1001):
    for b in range(1,1001):
        if L[b] is False: continue
        if b &lt; -1600-40*a or b &lt; c_max: continue
        c, n = 0, 0
        while L[n**2 + a*n + b] is True:
            c += 1
            n += 1
        if c &gt; c_max:
            a_max, b_max, c_max = a, b, c
 
prod = a_max * b_max
 
elapsed = time.time() - start
 
print "a=%s, b=%s, longest sequence = %s, prod=%s\nfound in %s seconds"\
% (a_max, b_max, c_max, prod, elapsed)

This indeed executes more quickly, as is seen in the output.

a=-61, b=971, longest sequence = 71, prod=-59231
found in 1.39028286934 seconds

Python Solution

Of course, Sage’s prime/factorization functionality isn’t available in Python, and so we’re going to need to create a prime sieve and store primes in some form. One of the things I need to know is how large the values $n^2+an+b$ can be. If we were to assume that the maximum value of the length of a prime sequence is 500 (which is an overestimate as we now know), then we’d have the maximum value of the quadratic given as:

500**2 + 1000 * 500 + 1000
751000

Let’s create a prime sieve that will return a list of all primes below a given number.

def primes_xrange(stop):
    primes = [True] * stop
    primes[0], primes[1] = [False] * 2
    L = []
    for ind, val in enumerate(primes):
        if val is True:
            primes[ind*2::ind] = [False] * (((stop - 1)//ind) - 1)
            L.append(ind)
    return L

Let’s verify that it works.

primes_xrange(20)
[2, 3, 5, 7, 11, 13, 17, 19]

How long does this function take to form all of the primes under 751000?

time P = primes_xrange(751000)
Time: CPU 0.39 s, Wall: 0.39 s

Now, we’ll take the Sage routine from above and write it completely in Python. Here, I’m using only the sieve list portion of the primes_xrange function I just created. As above, I’m just checking to see if values in that list are True or False, corresponding to primes and composite numbers. This should take a bit longer to run than the Sage version above.

import time
 
def primes_xrange(stop):
    primes = [True] * stop
    primes[0], primes[1] = [False] * 2
    for ind, val in enumerate(primes):
        if val is True:
            primes[ind*2::ind] = [False] * (((stop - 1)//ind) - 1)
    return primes
 
start = time.time()
 
P = primes_xrange(751000)
a_max, b_max, c_max = 0, 0, 0
for a in range(-1000,1001):
    for b in range(1,1001):
        if P[b] is False: continue
        if b &lt; -1600-40*a or b &lt; c_max: continue
        c, n = 0, 0
        while P[n**2 + a*n + b] is True:
            c += 1
            n += 1
        if c &gt; c_max:
            a_max, b_max, c_max = a, b, c
 
prod = a_max * b_max
 
elapsed = time.time() - start
 
print "a=%s, b=%s, longest sequence = %s, prod=%s\nfound in %s seconds"\
% (a_max, b_max, c_max, prod, elapsed)

And it does run slightly slow compared to the Sage version, as expected.

a=-61, b=971, longest sequence = 71, prod=-59231
found in 1.73107814789 seconds

Cython Solution

I’m going to take roughly the same approach as with the Python solution, but I’m going to malloc a C array instead of using the Python list. That should make things a bit more efficient. Before we do that, let’s see how much faster our routine runs if we just slap a “%cython” on the first line.

%cython
 
import time
 
def primes_xrange(stop):
    primes = [True] * stop
    primes[0], primes[1] = [False] * 2
    for ind, val in enumerate(primes):
        if val is True:
            primes[ind*2::ind] = [False] * (((stop - 1)//ind) - 1)
    return primes
 
start = time.time()
 
P = primes_xrange(751000)
a_max, b_max, c_max = 0, 0, 0
for a in range(-1000,1001):
    for b in range(1,1001):
        if P[b] is False: continue
        if b &lt; -1600-40*a or b &lt; c_max: continue
        c, n = 0, 0
        while P[n**2 + a*n + b] is True:
            c += 1
            n += 1
        if c &gt; c_max:
            a_max, b_max, c_max = a, b, c
 
prod = a_max * b_max
 
elapsed = time.time() - start
 
print "a=%s, b=%s, longest sequence = %s, prod=%s\nfound in %s seconds"\
% (a_max, b_max, c_max, prod, elapsed)

We find that this does improve the timing, and our Python code (which hasn’t been modified at all) now runs more quickly than the Sage code.

a=-61, b=971, longest sequence = 71, prod=-59231
found in 1.02033305168 seconds

Rewriting to take advantage of Cython’s C datatypes, how much faster can we obtain the result?

%cython
 
import time
from libc.stdlib cimport malloc, free
 
start = time.time()
 
cdef long i, j
cdef bint *primes = malloc(751000 * sizeof(bint))
primes[0], primes[1] = 0, 0
i = 2
while i &lt; 751000:
    primes[i] = 1
    i += 1
i = 2
while i &lt; 751000:
    if primes[i] == 1:
        j = 2 * i
        while j &lt; 751000:
            primes[j] = 0
            j += i
    i += 1
 
cdef long long a, b, c, t
cdef long long pol, n
cdef long a_max, b_max, c_max, prod
a_max, b_max, c_max = 0, 0, 0
a = -1000
while a &lt; 1001:
    b = 2
    while b &lt; 1001:
        if primes[b] == 0:
            b += 1
            continue
        t = -1600-40*a
        if b &lt; t or b &lt; c_max:
            b += 1
            continue
        c, n = 0, 0
        pol = n**2 + a*n + b
        while primes[pol] == 1:
            c += 1
            n += 1
            pol = n**2 + a*n + b
        if c &gt; c_max:
            a_max, b_max, c_max = a, b, c
        b += 1
    a += 1
 
prod = a_max * b_max
 
free(primes)
 
elapsed = time.time() - start
 
print "a=%s, b=%s, longest sequence = %s, prod=%s\nfound in %s seconds"\
% (a_max, b_max, c_max, prod, elapsed)

Now, this should run much more quickly.

a=-61, b=971, longest sequence = 71, prod=-59231
found in 0.0381400585175 seconds

Thus, the Sage version runs 1.245 times as fast as the Python version. The Cython version runs 45.387 times as fast as the Python version and 36.452 times as fast as the Sage version.


Problem

Let $d(n)$ be defined as the sum of proper divisors of $n$ (numbers less than $n$ which divide evenly into $n$).
If $d(a) = b$ and $d(b) = a$, where $a\neq b$, then $a$ and $b$ are an amicable pair and each of a and b are called amicable numbers.

For example, the proper divisors of 220 are 1, 2, 4, 5, 10, 11, 20, 22, 44, 55 and 110; therefore $d(220) = 284$. The proper divisors of 284 are 1, 2, 4, 71 and 142; so $d(284) = 220$.

Evaluate the sum of all the amicable numbers under 10000.

Python Solution

I used the idea of creating a list and populating that list with values corresponding to the sum of divisors of numbers, and then I walk through the list and do some index and value checking.

import time
 
def sum_divisors(n):
    s = 0
    for i in range(1,n):
        if n % i == 0: s += i
    return s
 
def amicable_pairs_xrange(low,high):
    L = [sum_divisors(i) for i in range(low,high + 1)]
    pairs = []
    for i in range(high - low + 1):
        ind = L[i]
        if i + low < ind and low <= ind and ind <= high and L[ind - low] == i + low:
            pairs.append([i+low,ind])
    return pairs
 
def sum_pairs(pairs):
    return sum([sum(pair) for pair in pairs])
 
start = time.time()
 
ans = sum_pairs(amicable_pairs_xrange(1,10000))
 
elapsed = time.time() - start
 
print("%s found in %s seconds") % (ans,elapsed)

This returns the correct result.

31626 found in 7.86725997925 seconds

Cython Solution

One of the things I love about using Cython inside the Sage Notebook environment is that one can simply place a Cython statement inside Python code and often achieve some sort of speedup. All I’ve done in this first example is to do just that. I haven’t change the Python code at all yet.

%cython
 
import time
 
def sum_divisors(n):
    s = 0
    for i in range(1,n):
        if n % i == 0: s += i
    return s
 
def amicable_pairs_xrange(low,high):
    L = [sum_divisors(i) for i in range(low,high + 1)]
    pairs = []
    for i in range(high - low + 1):
        ind = L[i]
        if i + low < ind and low <= ind and ind <= high and L[ind - low] == i + low:
            pairs.append([i+low,ind])
    return pairs
 
def sum_pairs(pairs):
    return sum([sum(pair) for pair in pairs])
 
start = time.time()
 
ans = sum_pairs(amicable_pairs_xrange(1,10000))
 
elapsed = time.time() - start
 
print("%s found in %s seconds") % (ans,elapsed)

This runs faster than the original, and that blows my mind.

31626 found in 4.98208904266 seconds

We can improve on this obviously, by rewriting the involved functions using C datatypes.

%cython
 
import time
 
cdef sum_divisors(unsigned int n):
    cdef unsigned int s = 0, i = 1
    while i < n:
        if n % i == 0: s += i
        i += 1
    return s
 
cdef sum_amicable_pairs(unsigned int low, unsigned int high):
    cdef unsigned int a = low, b, sum = 0
    while a <= high:
        b = sum_divisors(a)
        if b > a and sum_divisors(b) == a:
            sum += a + b
        a += 1
 
    return sum
 
start = time.time()
 
ans = sum_amicable_pairs(1,10000)
 
elapsed = time.time() - start
 
print("%s found in %s seconds") % (ans,elapsed)

We do achieve somewhat significant speedup now.

31626 found in 1.06677889824 seconds

C Solution

If I take the fast Cython version and rewrite it in C, I don’t get much better performance.

#include <stdio.h>
#include <stdlib.h>
 
unsigned int sum_divisors(unsigned int n) {
    unsigned int s = 0, i = 1;
    while(i < n) {
        if(n % i == 0) s = s + i;
        i++;
    }
    return s;
}
 
unsigned int sum_amicable_pairs(unsigned int low, unsigned int high) {
    unsigned int a = low, b, sum = 0;
    while(a <= high) {
        b = sum_divisors(a);
        if(b > a && sum_divisors(b) == a) sum = sum + a + b;
        a++;
    }
    return sum;
}
 
int main(int argc, char **argv) {
    unsigned int ans = sum_amicable_pairs(1,10000);
    printf("result: %d\n",ans);
    return 0;
}

The result, compiled with heavy optimization, is only slightly faster than the Cython code.

$ gcc -O3 -o problem-21 problem-21.c 
$ time ./problem-21
result: 31626
 
real	0m1.032s

About the only way to really improve this, beyond restructuring the algorithm, is to use OpenMP to run the process in parallel.

C with OpenMP

A parallel version using shared memory via OpenMP looks something like this. (I’ve rewritten it slightly.)

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
 
unsigned int sum_divisors(unsigned int n) {
    unsigned int s = 0, i = 1;
    while(i < n) {
        if(n % i == 0) s = s + i;
        i++;
    }
    return s;
}
 
int main(int argc, char **argv) {
    unsigned int a, b, nthreads, result = 0;
 
    #pragma omp parallel
    nthreads = omp_get_num_threads();
    printf("no of threads %d\n", nthreads);
 
    #pragma omp parallel for reduction(+:result) private(b)
    for(a=0; a < 10000; a++) {
        b = sum_divisors(a);
        if(b > a && sum_divisors(b) == a) result = result + a + b;
    }
 
    printf("result: %d\n",result);
    return 0;
}

Now, we get much better performance. I’m running this on a 16-core machine, where 3 of the cores are currently being used.

$ gcc -fopenmp -O3 -o problem-21-openmp problem-21-openmp.c 
$ time ./problem-21-openmp 
no of threads 16
result: 31626
 
real	0m0.140s

$n!$ means $n\times(n-1)\times\cdots\times 3\times 2\times 1$. For example, $10!=10\times 9\times\cdots\times 2\times 1=3628800$ and the sum of the digits in the number $10!$ is 27. Find the sum of the digits in $100!$.

Sage Solution

There is a single-line solution in Sage. I’ve added a bit more here just to determine the timing.

import time
 
start = time.time()
 
s = sum(factorial(100).digits())
 
elapsed = time.time() - start
 
print "Solution %s found after %s seconds" % (s,elapsed)

When executed, we find the following.

Solution 648 found after 0.000201940536499 seconds

Python Solution

Without the built-in functions of Sage, we can build a factorial function and use some string functionality in Python to achieve the same result.

import time
 
def factorial(n):
    if n == 0: return 1
    else: return n * factorial(n - 1)
 
def sum_digits(n):
    return sum([int(i) for i in str(n)])
 
start = time.time()
 
sumfac = sum_digits(factorial(100))
 
elapsed = time.time() - start
 
print("%s found in %s seconds") % (sumfac,elapsed)

When executed, the same result is returned.

648 found in 0.000312805175781 seconds

We could also compute the factorial using a reduced lambda function as in the following code.

#!/usr/bin/python
 
import time
 
 
def factorial(n):
    return reduce(lambda x,y: x*y, range(1,n+1,1))
 
def sum_digits(n):
    return sum([int(i) for i in str(n)])
 
start = time.time()
 
sumfac = sum_digits(factorial(100))
 
elapsed = time.time() - start
 
print("%s found in %s seconds") % (sumfac,elapsed)

Problem: You are given the following information, but you may prefer to do some research for yourself.

  • 1 Jan 1900 was a Monday.
  • Thirty days has September,
    April, June and November.
    All the rest have thirty-one,
    Saving February alone,
    Which has twenty-eight, rain or shine.
    And on leap years, twenty-nine.
  • A leap year occurs on any year evenly divisible by 4, but not on a century unless it is divisible by 400.

How many Sundays fell on the first of the month during the twentieth century (1 Jan 1901 to 31 Dec 2000)?

Approach

There are several different ways to approach this. The easiest, I think, is to use the Gaussian formula for day of week. It is a purely mathematical formula that I have encoded in the following Python code.

Python Solution

import time
from math import floor
 
"""
Gaussian algorithm to determine day of week
"""
def day_of_week(year, month, day):
    """
    w = (d+floor(2.6*m-0.2)+y+floor(y/4)+floor(c/4)-2*c) mod 7
 
    Y = year - 1 for January or February
    Y = year for other months
    d = day (1 to 31)
    m = shifted month (March = 1, February = 12)
    y = last two digits of Y
    c = first two digits of Y
    w = day of week (Sunday = 0, Saturday = 6)
    """
 
    d = day
    m = (month - 3) % 12 + 1
    if m &gt; 10: Y = year - 1
    else: Y = year
    y = Y % 100
    c = (Y - (Y % 100)) / 100
 
    w = (d + floor(2.6 * m - 0.2) + y + floor(y/4) + floor(c/4) - 2*c) % 7
 
    return int(w)
 
"""
Compute the number of months starting on a given day of the week in a century
"""
def months_start_range(day,year_start,year_end):
    total = 0
    for year in range(year_start, year_end + 1):
        for month in range(1,13):
            if day_of_week(year, month, 1) == day: total += 1
    return total
 
start = time.time()
 
total = months_start_range(0,1901,2000)
 
elapsed = time.time() - start
 
print("%s found in %s seconds") % (total,elapsed)

This returns the correct result.

171 found in 0.0681998729706 seconds

That will run faster if executed directed in Python. Remember, I’m using the Sage notebook environment to execute most Python here. I’m also writing Cython in the same environment.

Cython Solution

There is a nearly trivial rewriting to Cython of the above Python code.

%cython
 
import time
from math import floor
 
"""
Gaussian algorithm to determine day of week
"""
cdef day_of_week(int year, int month, int day):
    """
    w = (d+floor(2.6*m-0.2)+y+floor(y/4)+floor(c/4)-2*c) mod 7
 
    Y = year - 1 for January or February
    Y = year for other months
    d = day (1 to 31)
    m = shifted month (March = 1, February = 12)
    y = last two digits of Y
    c = first two digits of Y
    w = day of week (Sunday = 0, Saturday = 6)
    """
 
    cdef int d = day
    cdef int m = (month - 3) % 12 + 1
    cdef int Y
    if m &gt; 10: Y = year - 1
    else: Y = year
    y = Y % 100
    cdef int c = (Y - (Y % 100)) / 100
 
    cdef double w
    w = (d + floor(2.6 * m - 0.2) + y + floor(y/4) + floor(c/4) - 2*c) % 7
 
    return int(w)
 
"""
Compute the number of months starting on a given day of the week in range of years
"""
cdef months_start_range(int day, int year_start,int year_end):
    cdef unsigned int total = 0
    cdef int year, month
    for year in range(year_start, year_end + 1):
        for month in range(1,13):
            if day_of_week(year, month, 1) == day: total += 1
    return total
 
start = time.time()
 
total = months_start_range(0,1901,2000)
 
elapsed = time.time() - start
 
print("%s found in %s seconds") % (total,elapsed)

The code is a bit longer, but it executes much faster.

171 found in 0.00387215614319 seconds

The Cython code runs roughly 18 times faster.

C Solution

The Cython code was used as a model to create more efficient C code. The only issue here is in maintaining the correct datatypes (not too hard, but compared to Cython it is a pain).

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
 
int day_of_week(int year, int month, int day) {
    // Using the Gaussian algorithm
    int d = day;
 
    double m = (double) ((month - 3) % 12 + 1);
    int Y;
    if(m > 10) Y = year - 1;
    else Y = year;
    int y = Y % 100;
    int c = (Y - (Y % 100)) / 100;
 
    int w = ((d+(int)floor(2.6*m-0.2)+y+ y/4 + c/4 -2*c))%7;
 
    return w; 
}
 
long months_start_range(int day, int year_start, int year_end) {
    unsigned long total = 0;
    int year, month;
    for(year = year_start; year < year_end; year++) {
        for(month = 1; month <= 12; month++) {
            if(day_of_week(year, month, 1)==day) total++;
        }
    }
    return total;
 
}
 
int main(int argc, char **argv) {
 
    int iter = 0;
    long total;
    while(iter < 100000) {
        total = months_start_range(0,1901,2000);
        iter++;
    }
    printf("Solution: %ld\n",total);
 
    return 0;
}

Notice that this executes the loop 100,000 times, as I’m trying to get a good idea of what the average runtime is. We compile with optimization and the [[[-lm]]] math option. We get the following result.

gcc -O3 -o problem-19 problem-19.c -lm
$ time ./problem-19
Solution: 171
 
real	0m6.240s

The C code runs roughly 62 times as fast as the Cython and roughly 1124 times as fast as the Python. Each iteration executes in about 6.2000e-5 seconds.