Problem

Let $d(n)$ be defined as the sum of proper divisors of $n$ (numbers less than $n$ which divide evenly into $n$).
If $d(a) = b$ and $d(b) = a$, where $a\neq b$, then $a$ and $b$ are an amicable pair and each of a and b are called amicable numbers.

For example, the proper divisors of 220 are 1, 2, 4, 5, 10, 11, 20, 22, 44, 55 and 110; therefore $d(220) = 284$. The proper divisors of 284 are 1, 2, 4, 71 and 142; so $d(284) = 220$.

Evaluate the sum of all the amicable numbers under 10000.

Python Solution

I used the idea of creating a list and populating that list with values corresponding to the sum of divisors of numbers, and then I walk through the list and do some index and value checking.

import time
 
def sum_divisors(n):
    s = 0
    for i in range(1,n):
        if n % i == 0: s += i
    return s
 
def amicable_pairs_xrange(low,high):
    L = [sum_divisors(i) for i in range(low,high + 1)]
    pairs = []
    for i in range(high - low + 1):
        ind = L[i]
        if i + low < ind and low <= ind and ind <= high and L[ind - low] == i + low:
            pairs.append([i+low,ind])
    return pairs
 
def sum_pairs(pairs):
    return sum([sum(pair) for pair in pairs])
 
start = time.time()
 
ans = sum_pairs(amicable_pairs_xrange(1,10000))
 
elapsed = time.time() - start
 
print("%s found in %s seconds") % (ans,elapsed)

This returns the correct result.

31626 found in 7.86725997925 seconds

Cython Solution

One of the things I love about using Cython inside the Sage Notebook environment is that one can simply place a Cython statement inside Python code and often achieve some sort of speedup. All I’ve done in this first example is to do just that. I haven’t change the Python code at all yet.

%cython
 
import time
 
def sum_divisors(n):
    s = 0
    for i in range(1,n):
        if n % i == 0: s += i
    return s
 
def amicable_pairs_xrange(low,high):
    L = [sum_divisors(i) for i in range(low,high + 1)]
    pairs = []
    for i in range(high - low + 1):
        ind = L[i]
        if i + low < ind and low <= ind and ind <= high and L[ind - low] == i + low:
            pairs.append([i+low,ind])
    return pairs
 
def sum_pairs(pairs):
    return sum([sum(pair) for pair in pairs])
 
start = time.time()
 
ans = sum_pairs(amicable_pairs_xrange(1,10000))
 
elapsed = time.time() - start
 
print("%s found in %s seconds") % (ans,elapsed)

This runs faster than the original, and that blows my mind.

31626 found in 4.98208904266 seconds

We can improve on this obviously, by rewriting the involved functions using C datatypes.

%cython
 
import time
 
cdef sum_divisors(unsigned int n):
    cdef unsigned int s = 0, i = 1
    while i < n:
        if n % i == 0: s += i
        i += 1
    return s
 
cdef sum_amicable_pairs(unsigned int low, unsigned int high):
    cdef unsigned int a = low, b, sum = 0
    while a <= high:
        b = sum_divisors(a)
        if b > a and sum_divisors(b) == a:
            sum += a + b
        a += 1
 
    return sum
 
start = time.time()
 
ans = sum_amicable_pairs(1,10000)
 
elapsed = time.time() - start
 
print("%s found in %s seconds") % (ans,elapsed)

We do achieve somewhat significant speedup now.

31626 found in 1.06677889824 seconds

C Solution

If I take the fast Cython version and rewrite it in C, I don’t get much better performance.

#include <stdio.h>
#include <stdlib.h>
 
unsigned int sum_divisors(unsigned int n) {
    unsigned int s = 0, i = 1;
    while(i < n) {
        if(n % i == 0) s = s + i;
        i++;
    }
    return s;
}
 
unsigned int sum_amicable_pairs(unsigned int low, unsigned int high) {
    unsigned int a = low, b, sum = 0;
    while(a <= high) {
        b = sum_divisors(a);
        if(b > a && sum_divisors(b) == a) sum = sum + a + b;
        a++;
    }
    return sum;
}
 
int main(int argc, char **argv) {
    unsigned int ans = sum_amicable_pairs(1,10000);
    printf("result: %d\n",ans);
    return 0;
}

The result, compiled with heavy optimization, is only slightly faster than the Cython code.

$ gcc -O3 -o problem-21 problem-21.c 
$ time ./problem-21
result: 31626
 
real	0m1.032s

About the only way to really improve this, beyond restructuring the algorithm, is to use OpenMP to run the process in parallel.

C with OpenMP

A parallel version using shared memory via OpenMP looks something like this. (I’ve rewritten it slightly.)

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
 
unsigned int sum_divisors(unsigned int n) {
    unsigned int s = 0, i = 1;
    while(i < n) {
        if(n % i == 0) s = s + i;
        i++;
    }
    return s;
}
 
int main(int argc, char **argv) {
    unsigned int a, b, nthreads, result = 0;
 
    #pragma omp parallel
    nthreads = omp_get_num_threads();
    printf("no of threads %d\n", nthreads);
 
    #pragma omp parallel for reduction(+:result) private(b)
    for(a=0; a < 10000; a++) {
        b = sum_divisors(a);
        if(b > a && sum_divisors(b) == a) result = result + a + b;
    }
 
    printf("result: %d\n",result);
    return 0;
}

Now, we get much better performance. I’m running this on a 16-core machine, where 3 of the cores are currently being used.

$ gcc -fopenmp -O3 -o problem-21-openmp problem-21-openmp.c 
$ time ./problem-21-openmp 
no of threads 16
result: 31626
 
real	0m0.140s

Comments are closed