A triangular pyramid is constructed using spherical balls so that each ball rests on exactly three balls of the next lower level.

Then, we calculate the number of paths leading from the apex to each position: A path starts at the apex and progresses downwards to any of the three spheres directly below the current position. Consequently, the number of paths to reach a certain position is the sum of the numbers immediately above it (depending on the position, there are up to three numbers above it).

The result is Pascal’s pyramid and the numbers at each level n are the coefficients of the trinomial expansion $(x + y + z)^n$. How many coefficients in the expansion of $(x + y + z)^{200000}$ are multiples of $10^{12}$?

## Solution Using the Multinomial Theorem

The generalization of the binomial theorem is the multinomial theorem. It says that multinomials raised to exponents can be expanded using the formula
$(x_1+x_2+\cdots+x_m)^n=\sum_{{k_1+k_2+\cdots+k_m=n}\atop{0\le k_i\le n}}\left({n}\atop{k_1,k_2,\ldots,k_m}\right)\prod_{1\le t\le m}x_t^{k_t}$
where
$\left({n}\atop{k_1,k_2,\ldots,k_m}\right)=\frac{n!}{k_1!k_2!\cdots k_m!}.$
Of course, when m=2 this gives the binomial theorem. The sum is taken over all partitions $k_1+k_2+\cdots+k_m=n$ for integers $k_i$. If n=200000 abd m=3, then the terms in the expansion are given by
$\left({200000}\atop{k_1,k_2,k_3}\right)x_1^{k_1}x_2^{k_2}x_3^{k_3}=\frac{200000!}{k_1!k_2!k_3!}x_1^{k_1}x_2^{k_2}x_3^{k_3}$
where $k_1+k_2+k_3=200000$. It’s worth pointing out that each of the coefficients is an integer, and thus has a unique factorization into products of prime integers. Of course, there’s no way that we’re going to calculate these coefficients. We only need to know when they’re divisible by $10^{12}$. Thus, we only need to consider how many factors of 2 and 5 are involved.

First, we’ll create a function $p(n,d)$ that outputs how many factors of $d$ are included in $n!$. We have that
$p(n,d)=\left\lfloor\frac{n}{d}\right\rfloor+\left\lfloor\frac{n}{d^2}\right\rfloor+\left\lfloor\frac{n}{d^3}\right\rfloor+ \cdots+\left\lfloor\frac{n}{d^r}\right\rfloor,$
where $d^r$ is the highest power of $d$ dividing $n$. For instance, there are 199994 factors of 2 in 200000!. Since we’re wondering when our coefficients are divisible by $10^{12}=2^{12}5^{12}$, we’ll be using the values provided by $p(n,d)$ quite a bit for $d=2$ and $d=5$. We’ll store two lists:
$p2=[p(i,2)\text{ for }1\le i\le 200000]\quad\text{and}\quad p5=[p(i,5)\text{ for }1\le i\le 200000].$
For a given $k_1,k_2,k_3$, the corresponding coefficient is divisible by $10^{12}$ precisely when
$p2[k_1]+p2[k_2]+p2[k_3]<199983\ \text{and}\ p5[k_1]+p5[k_2]+p5[k_3]<49987.$
That is, this condition ensures that there are at least 12 more factors of 2 and 5 in the numerator of the fraction defining the coefficients.

Now, we know that $k_1+k_2+k_3=200000$, and we can exploit symmetry and avoid redundant computations if we assume $k_1\le k_2\le k_3$. Under this assumption, we always have
$k_1\le\left\lfloor\frac{200000}{3}\right\rfloor=66666.$
We know that $k_1+k_2+k_3=200000$ is impossible since 200000 isn't divisible by 3. It follows that we can only have (case 1) $k_1=k_2 < k_3$, or (case 2) $k_1 < k_2=k_3$, or (case 3) $k_1 < k_2 < k_3$.

In case 1, we iterate $0\le k_1\le 66666$, setting $k_2=k_1$ and $k_3=200000-k_1-k_2$. We check the condition, and when it is satisfied we record 3 new instances of coefficients (since we may permute the $k_i$ in 3 ways).

In case 2, we iterate $0\le k_1\le 66666$, and when $k_1$ is divisible by 2 we set $k_2=k_3=\frac{200000-k_1}{2}$. When the condition holds, we again record 3 new instance.

In case 3, we iterate $0\le k_1\le 66666$, and we iterate over $k_2=k_1+a$ where $1\le a < \left\lfloor\frac{200000-3k_1}{2}\right\rfloor$. Then $k_3=200000-k_1-k_2$. When the condition holds, we record 6 instances (since there are 6 permutations of 3 objects).

## Cython Solution

I’ll provide two implementations, the first written in Cython inside Sage. Then, I’ll write a parallel solution in C.

%cython

import time
from libc.stdlib cimport malloc, free

cdef unsigned long p(unsigned long k, unsigned long d):
cdef unsigned long power = d
cdef unsigned long exp = 0
while power <= k:
exp += k / power
power *= d
return exp

cdef unsigned long * p_list(unsigned long n, unsigned long d):
cdef unsigned long i = 0
cdef unsigned long * powers = <unsigned long *>malloc((n+1)*sizeof(unsigned long))
while i <= n:
powers[i] = p(i,d)
i += 1
return powers

run_time = time.time()

# form a list of number of times each n! is divisible by 2.
cdef unsigned long * p2 = p_list(200000,2)

# form a list of number of times each n! is divisible by 5.
cdef unsigned long * p5 = p_list(200000,5)

cdef unsigned long k1, k2, k3, a
cdef unsigned long long result = 0

k1 = 0
while k1 <= 66666:
# case 1: k1 = k2 < k3
k2 = k1
k3 = 200000 - k1 - k2
if 199982 >= (p2[k1]+p2[k2]+p2[k3]) and 49986 >= (p5[k1]+p5[k2]+p5[k3]):
result += 3
# case 2: k1 < k2 = k3
if k1 % 2 == 0:
k2 = (200000 - k1)/2
k3 = k2
if 199982 >= (p2[k1]+p2[k2]+p2[k3]) and 49986 >= (p5[k1]+p5[k2]+p5[k3]):
result += 3
# case 3: k1 < k2 < k3
a = 1
while 2*a < (200000 - 3*k1):
k2 = k1 + a
k3 = 200000 - k1 - k2
if 199982 >= (p2[k1]+p2[k2]+p2[k3]) and 49986 >= (p5[k1]+p5[k2]+p5[k3]):
result += 6
a += 1
k1 += 1

free(p2)
free(p5)

elapsed_run = round(time.time() - run_time, 5)

print "Result: %s" % result
print "Runtime: %s seconds (total time: %s seconds)" % (elapsed_run, elapsed_head)

When executed, we find the correct result relatively quickly.

Result: 479742450
Runtime: 14.62538 seconds (total time: 14.62543 seconds)

## C with OpenMP Solution

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <omp.h>

/*****************************************************************************/
/* function to determine how many factors of 'd' are in 'k!'                 */
/*****************************************************************************/
unsigned long p(unsigned long k, unsigned long d) {
unsigned long power = d;
unsigned long exp = 0;
while (power <= k) {
exp += k/power;
power *= d;
}
return exp;
}

/*****************************************************************************/
/* create a list [p(0,d),p(1,d),p(2,d), ... ,p(n,d)] and return pointer      */
/*****************************************************************************/
unsigned long * p_list(unsigned long n, unsigned long d) {
unsigned long i;
unsigned long * powers = malloc((n+1)*sizeof(unsigned long));
for (i=0;i<=n;i++) powers[i] = p(i,d);
return powers;
}

/*****************************************************************************/
/* main                                                                      */
/*****************************************************************************/
int main(int argc, char **argv) {
unsigned long k1, k2, k3, a;
unsigned long long result = 0;

unsigned long * p2 = p_list(200000, 2);
unsigned long * p5 = p_list(200000, 5);

#pragma omp parallel for private(k1,k2,k3,a) reduction(+ : result)
for (k1=0;k1<66667;k1++) {
// case 1: k1 = k2 < k3
k2 = k1;
k3 = 200000 - k1 - k2;
if (p2[k1]+p2[k2]+p2[k3]<199983 && p5[k1]+p5[k2]+p5[k3]<49987) {
result += 3;
}
// case 2: k1 < k2 = k3
if (k1 % 2 == 0) {
k2 = (200000 - k1)/2;
k3 = k2;
if (p2[k1]+p2[k2]+p2[k3]<199983 && p5[k1]+p5[k2]+p5[k3]<49987) {
result += 3;
}
}
// case 3: k1 < k2 < k3
for (a=1;2*a<(200000-3*k1);a++) {
k2 = k1 + a;
k3 = 200000 - k1 - k2;
if (p2[k1]+p2[k2]+p2[k3]<199983 && p5[k1]+p5[k2]+p5[k3]<49987) {
result += 6;
}
}
}

free(p2);
free(p5);

printf("result: %lld\n", result);

return 0;
}

This can be compiled and optimized using GCC as follows.

$gcc -O3 -fopenmp -o problem-154-omp problem-154-omp.c When executed on a 16-core machine, we get the following result.$ time ./problem-154-omp
result: 479742450

real    0m1.487s

This appears to be the fastest solution currently known, according to the forum of solutions on Project Euler. The CPUs on the 16-core machine are pretty weak compared to modern standards. When running on a single core on a new Intel Core i7, the result is returned in about 4.7 seconds.

### Motivation

Many interesting computational problems, such as those on Project Euler require that one find the sum of proper divisors of a given integer. I had a fairly crude brute-force method for doing this, and was subsequently emailed a comment by Bjarki Ágúst Guðmundsson who runs the site www.mathblog.dk. He pointed me in the direction of this page and provided some sample code illustrating how such an approach runs asymptotically faster than the approach I had been taking. Awesome! I’m going to expand on that a bit here, providing some mathematical proofs behind the claims and providing code for those who may want to take advantage of this.

### The Mathematics Behind It All

Let the function $\sigma(n)$ be the sum of divisors for a positive integer $n$. For example,
$\sigma(6)=1+2+3+6=12.$

It should seem obvious that for any prime $p$ we have $\sigma(p)=1+p$. What about powers of primes? Let $\alpha\in\mathbb{Z}_+$, and then
$\sigma(p^\alpha)=1+p+p^2+\cdots+p^\alpha.$

We’d like to write that in a closed form, i.e., without the “$+\cdots+$”. We use a standard series trick to do that.

\begin{align} \sigma(p^\alpha) &= 1+p+p^2+\cdots+p^\alpha\cr p\sigma(p^\alpha) &= p+p^2+p^3+\cdots+p^{\alpha+1}\cr p\sigma(p^\alpha)-\sigma(p^\alpha) &= (p+p^2+\cdots+p^{\alpha+1})-(1+p+\cdots+p^\alpha)\cr p\sigma(p^\alpha)-\sigma(p^\alpha) &= p^{\alpha+1}-1\cr (p-1)\sigma(p^\alpha) &= p^{\alpha+1}-1\cr \sigma(p^\alpha) &=\frac{p^{\alpha+1}-1}{p-1}.\end{align}

That solves the problem of finding the sum of divisors for powers of primes. It would be nice if we could show that $\sigma$ is multiplicative on powers of primes, i.e., that $\sigma(p_1^{\alpha_1}p_2^{\alpha_2})=\sigma(p_1^{\alpha_1})\sigma(p_2^{\alpha_2})$. We’ll prove that this is the case, and solve the problem in general along the way.

Proposition: The function $\sigma$ is multiplicative on powers of primes.

Proof: Let $n$ be a positive integer written (uniquely, by the fundamental theorem of arithmetic) as
$n=\prod_{i=1}^m p_i^{\alpha_i}$
for $m$ distinct primes $p_i$ with $\alpha_i\in\mathbb{Z}_+$. Any divisor $k$ of $n$ then has the form
$k=\prod_{i=1}^m p_i^{\beta_i}$
where each $\beta_i$ satisfies $0\le\beta_i\le\alpha_i$. Then
$\sigma(n)=\sigma\left(\prod_{i=1}^m p_i^{\alpha_i}\right)$
is the sum of all divisors $k$ of $n$ and can be written by summing over all possible combinations of the exponents $\beta_i$. There are $\prod_{i=1}^m \alpha_i$ combinations, and we can form their sum and simplify it as follows.
\begin{align}\sigma(n) &= \sum_{1\le i\le m,\ 0\le\beta_i\le\alpha_i}p_1^{\beta_i}p_2^{\beta_2}\cdots p_m^{\beta_m}\cr &= \sum_{\beta_1=0}^{\alpha_1}p_1^{\beta_1}\left(\sum_{2\le i\le m,\ 0\le\beta_i\le\alpha_i}p_2^{\beta_2}p_3^{\beta_3}\cdots p_m^{\beta_m}\right)\cr &= \sum_{\beta_1=0}^{\alpha_1}p_1^{\beta_1}\sum_{\beta_2=0}^{\alpha_2}p_2^{\beta_2}\left(\sum_{3\le i\le m,\ 0\le\beta_i\le\alpha_i}p_3^{\beta_3}p_4^{\beta_4}\cdots p_m^{\beta_m}\right) \cr &= \vdots\cr &=\sum_{\beta_1=0}^{\alpha_1}p_1^{\beta_1}\sum_{\beta_2=0}^{\alpha_2}p_2^{\beta_2}\sum_{\beta_3=0}^{\alpha_3}p_3^{\beta_3}\ \cdots\ \sum_{\beta_m=0}^{\alpha_m}p_m^{\beta_m}\cr &=\sigma(p_1^{\alpha_1})\sigma(p_2^{\alpha_2})\cdots\sigma(p_m^{\alpha_m}).\end{align}
This completes the proof. Q.E.D.

Thus, we now have a formula for the sum of divisors of an arbitrary positive integer $n$ using the factorization of $n$. Namely,
$\sigma(n)=\sigma\left(\prod_{i=1}^m p_i^{\alpha_i}\right)=\prod_{i=1}^m\left(\frac{p_i^{\alpha_i+1}-1}{p_i-1}\right).$

This is something I use quite a bit for various problems and programming exercises, so I figured I could post it here. It’s a basic post that isn’t advanced at all, but that doesn’t mean that the implementation given below won’t save work for others. The idea is to create a list of primes in C by malloc’ing a sieve, then malloc’ing a list of specific length based on that sieve. The resulting list contains all the primes below a given limit (defined in the code). The first member of the list is an integer representing the length of the list.

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>

#define                 bool    _Bool

static unsigned long    prime_limit = 1000000;

unsigned long sqrtld(unsigned long N) {
int                 b = 1;
unsigned long       res,s;
while(1<<b<N) b+= 1;
res = 1<<(b/2 + 1);
for(;;) {
s = (N/res + res)/2;
if(s>=res) return res;
res = s;
}
}

unsigned long * make_primes(unsigned long limit) {
unsigned long      *primes;
unsigned long       i,j;
unsigned long       s = sqrtld(prime_limit);
unsigned long       n = 0;
bool *sieve = malloc((prime_limit + 1) * sizeof(bool));
sieve[0] = 0;
sieve[1] = 0;
for(i=2; i<=prime_limit; i++) sieve[i] = 1;
j = 4;
while(j<=prime_limit) {
sieve[j] = 0;
j += 2;
}
for(i=3; i<=s; i+=2) {
if(sieve[i] == 1) {
j = i * 3;
while(j<=prime_limit) {
sieve[j] = 0;
j += 2 * i;
}
}
}
for(i=2;i<=prime_limit;i++) if(sieve[i]==1) n += 1;
primes = malloc((n + 1) * sizeof(unsigned long));
primes[0] = n;
j = 1;
for(i=2;i<=prime_limit;i++) if(sieve[i]==1) {
primes[j] = i;
j++;
}
free(sieve);
return primes;
}

int main(void) {

unsigned long * primes = make_primes(prime_limit);

printf("There are %ld primes <= %ld\n",primes[0],prime_limit);

free(primes);
return 0;
}

Say one wanted to form a list of all primes below 1,000,000. That’s what the above program does by default, since “prime_limit = 1000000.” If one compiles this and executes, you would get something like what follows. The timing is relatively respectable.

$gcc -O3 -o prime-sieve prime-sieve.c$ time ./prime-sieve
There are 78498 primes <= 1000000

real	0m0.008s
user	0m0.004s
sys	0m0.000s

The code is linked here: prime-sieve.c