### Problem

Let $d(n)$ be defined as the sum of proper divisors of $n$ (numbers less than $n$ which divide evenly into $n$).
If $d(a) = b$ and $d(b) = a$, where $a\neq b$, then $a$ and $b$ are an amicable pair and each of a and b are called amicable numbers.

For example, the proper divisors of 220 are 1, 2, 4, 5, 10, 11, 20, 22, 44, 55 and 110; therefore $d(220) = 284$. The proper divisors of 284 are 1, 2, 4, 71 and 142; so $d(284) = 220$.

Evaluate the sum of all the amicable numbers under 10000.

### Python Solution

I used the idea of creating a list and populating that list with values corresponding to the sum of divisors of numbers, and then I walk through the list and do some index and value checking.

import time   def sum_divisors(n): s = 0 for i in range(1,n): if n % i == 0: s += i return s   def amicable_pairs_xrange(low,high): L = [sum_divisors(i) for i in range(low,high + 1)] pairs = [] for i in range(high - low + 1): ind = L[i] if i + low < ind and low <= ind and ind <= high and L[ind - low] == i + low: pairs.append([i+low,ind]) return pairs   def sum_pairs(pairs): return sum([sum(pair) for pair in pairs])   start = time.time()   ans = sum_pairs(amicable_pairs_xrange(1,10000))   elapsed = time.time() - start   print("%s found in %s seconds") % (ans,elapsed)

This returns the correct result.

31626 found in 7.86725997925 seconds

### Cython Solution

One of the things I love about using Cython inside the Sage Notebook environment is that one can simply place a Cython statement inside Python code and often achieve some sort of speedup. All I’ve done in this first example is to do just that. I haven’t change the Python code at all yet.

%cython   import time   def sum_divisors(n): s = 0 for i in range(1,n): if n % i == 0: s += i return s   def amicable_pairs_xrange(low,high): L = [sum_divisors(i) for i in range(low,high + 1)] pairs = [] for i in range(high - low + 1): ind = L[i] if i + low < ind and low <= ind and ind <= high and L[ind - low] == i + low: pairs.append([i+low,ind]) return pairs   def sum_pairs(pairs): return sum([sum(pair) for pair in pairs])   start = time.time()   ans = sum_pairs(amicable_pairs_xrange(1,10000))   elapsed = time.time() - start   print("%s found in %s seconds") % (ans,elapsed)

This runs faster than the original, and that blows my mind.

31626 found in 4.98208904266 seconds

We can improve on this obviously, by rewriting the involved functions using C datatypes.

%cython   import time   cdef sum_divisors(unsigned int n): cdef unsigned int s = 0, i = 1 while i < n: if n % i == 0: s += i i += 1 return s   cdef sum_amicable_pairs(unsigned int low, unsigned int high): cdef unsigned int a = low, b, sum = 0 while a <= high: b = sum_divisors(a) if b > a and sum_divisors(b) == a: sum += a + b a += 1   return sum   start = time.time()   ans = sum_amicable_pairs(1,10000)   elapsed = time.time() - start   print("%s found in %s seconds") % (ans,elapsed)

We do achieve somewhat significant speedup now.

31626 found in 1.06677889824 seconds

### C Solution

If I take the fast Cython version and rewrite it in C, I don’t get much better performance.

#include <stdio.h> #include <stdlib.h>   unsigned int sum_divisors(unsigned int n) { unsigned int s = 0, i = 1; while(i < n) { if(n % i == 0) s = s + i; i++; } return s; }   unsigned int sum_amicable_pairs(unsigned int low, unsigned int high) { unsigned int a = low, b, sum = 0; while(a <= high) { b = sum_divisors(a); if(b > a && sum_divisors(b) == a) sum = sum + a + b; a++; } return sum; }   int main(int argc, char **argv) { unsigned int ans = sum_amicable_pairs(1,10000); printf("result: %d\n",ans); return 0; }

The result, compiled with heavy optimization, is only slightly faster than the Cython code.

$gcc -O3 -o problem-21 problem-21.c$ time ./problem-21 result: 31626   real 0m1.032s

About the only way to really improve this, beyond restructuring the algorithm, is to use OpenMP to run the process in parallel.

### C with OpenMP

A parallel version using shared memory via OpenMP looks something like this. (I’ve rewritten it slightly.)

#include <stdio.h> #include <stdlib.h> #include <omp.h>   unsigned int sum_divisors(unsigned int n) { unsigned int s = 0, i = 1; while(i < n) { if(n % i == 0) s = s + i; i++; } return s; }   int main(int argc, char **argv) { unsigned int a, b, nthreads, result = 0;   #pragma omp parallel nthreads = omp_get_num_threads(); printf("no of threads %d\n", nthreads);   #pragma omp parallel for reduction(+:result) private(b) for(a=0; a < 10000; a++) { b = sum_divisors(a); if(b > a && sum_divisors(b) == a) result = result + a + b; }   printf("result: %d\n",result); return 0; }

Now, we get much better performance. I’m running this on a 16-core machine, where 3 of the cores are currently being used.

$gcc -fopenmp -O3 -o problem-21-openmp problem-21-openmp.c$ time ./problem-21-openmp no of threads 16 result: 31626   real 0m0.140s