A triangular pyramid is constructed using spherical balls so that each ball rests on exactly three balls of the next lower level.

Then, we calculate the number of paths leading from the apex to each position: A path starts at the apex and progresses downwards to any of the three spheres directly below the current position. Consequently, the number of paths to reach a certain position is the sum of the numbers immediately above it (depending on the position, there are up to three numbers above it).

The result is Pascal’s pyramid and the numbers at each level n are the coefficients of the trinomial expansion $(x + y + z)^n$. How many coefficients in the expansion of $(x + y + z)^{200000}$ are multiples of $10^{12}$?

## Solution Using the Multinomial Theorem

The generalization of the binomial theorem is the multinomial theorem. It says that multinomials raised to exponents can be expanded using the formula

\[

(x_1+x_2+\cdots+x_m)^n=\sum_{{k_1+k_2+\cdots+k_m=n}\atop{0\le k_i\le n}}\left({n}\atop{k_1,k_2,\ldots,k_m}\right)\prod_{1\le t\le m}x_t^{k_t}

\]

where

\[

\left({n}\atop{k_1,k_2,\ldots,k_m}\right)=\frac{n!}{k_1!k_2!\cdots k_m!}.

\]

Of course, when m=2 this gives the binomial theorem. The sum is taken over all partitions $k_1+k_2+\cdots+k_m=n$ for integers $k_i$. If n=200000 abd m=3, then the terms in the expansion are given by

\[

\left({200000}\atop{k_1,k_2,k_3}\right)x_1^{k_1}x_2^{k_2}x_3^{k_3}=\frac{200000!}{k_1!k_2!k_3!}x_1^{k_1}x_2^{k_2}x_3^{k_3}

\]

where $k_1+k_2+k_3=200000$. It’s worth pointing out that each of the coefficients is an integer, and thus has a unique factorization into products of prime integers. Of course, there’s no way that we’re going to calculate these coefficients. We only need to know when they’re divisible by $10^{12}$. Thus, we only need to consider how many factors of 2 and 5 are involved.

First, we’ll create a function $p(n,d)$ that outputs how many factors of $d$ are included in $n!$. We have that

\[

p(n,d)=\left\lfloor\frac{n}{d}\right\rfloor+\left\lfloor\frac{n}{d^2}\right\rfloor+\left\lfloor\frac{n}{d^3}\right\rfloor+ \cdots+\left\lfloor\frac{n}{d^r}\right\rfloor,

\]

where $d^r$ is the highest power of $d$ dividing $n$. For instance, there are 199994 factors of 2 in 200000!. Since we’re wondering when our coefficients are divisible by $10^{12}=2^{12}5^{12}$, we’ll be using the values provided by $p(n,d)$ quite a bit for $d=2$ and $d=5$. We’ll store two lists:

\[

p2=[p(i,2)\text{ for }1\le i\le 200000]\quad\text{and}\quad p5=[p(i,5)\text{ for }1\le i\le 200000].

\]

For a given $k_1,k_2,k_3$, the corresponding coefficient is divisible by $10^{12}$ precisely when

\[

p2[k_1]+p2[k_2]+p2[k_3]<199983\ \text{and}\ p5[k_1]+p5[k_2]+p5[k_3]<49987.

\]

That is, this condition ensures that there are at least 12 more factors of 2 and 5 in the numerator of the fraction defining the coefficients.

Now, we know that $k_1+k_2+k_3=200000$, and we can exploit symmetry and avoid redundant computations if we assume $k_1\le k_2\le k_3$. Under this assumption, we always have

\[

k_1\le\left\lfloor\frac{200000}{3}\right\rfloor=66666.

\]

We know that $k_1+k_2+k_3=200000$ is impossible since 200000 isn't divisible by 3. It follows that we can only have (case 1) $k_1=k_2 < k_3$, or (case 2) $k_1 < k_2=k_3$, or (case 3) $k_1 < k_2 < k_3$.

In case 1, we iterate $0\le k_1\le 66666$, setting $k_2=k_1$ and $k_3=200000-k_1-k_2$. We check the condition, and when it is satisfied we record 3 new instances of coefficients (since we may permute the $k_i$ in 3 ways).

In case 2, we iterate $0\le k_1\le 66666$, and when $k_1$ is divisible by 2 we set $k_2=k_3=\frac{200000-k_1}{2}$. When the condition holds, we again record 3 new instance.

In case 3, we iterate $0\le k_1\le 66666$, and we iterate over $k_2=k_1+a$ where $1\le a < \left\lfloor\frac{200000-3k_1}{2}\right\rfloor$. Then $k_3=200000-k_1-k_2$. When the condition holds, we record 6 instances (since there are 6 permutations of 3 objects).

## Cython Solution

I’ll provide two implementations, the first written in Cython inside Sage. Then, I’ll write a parallel solution in C.

%cython
import time
from libc.stdlib cimport malloc, free
head_time = time.time()
cdef unsigned long p(unsigned long k, unsigned long d):
cdef unsigned long power = d
cdef unsigned long exp = 0
while power <= k:
exp += k / power
power *= d
return exp
cdef unsigned long * p_list(unsigned long n, unsigned long d):
cdef unsigned long i = 0
cdef unsigned long * powers = <unsigned long *>malloc((n+1)*sizeof(unsigned long))
while i <= n:
powers[i] = p(i,d)
i += 1
return powers
run_time = time.time()
# form a list of number of times each n! is divisible by 2.
cdef unsigned long * p2 = p_list(200000,2)
# form a list of number of times each n! is divisible by 5.
cdef unsigned long * p5 = p_list(200000,5)
cdef unsigned long k1, k2, k3, a
cdef unsigned long long result = 0
k1 = 0
while k1 <= 66666:
# case 1: k1 = k2 < k3
k2 = k1
k3 = 200000 - k1 - k2
if 199982 >= (p2[k1]+p2[k2]+p2[k3]) and 49986 >= (p5[k1]+p5[k2]+p5[k3]):
result += 3
# case 2: k1 < k2 = k3
if k1 % 2 == 0:
k2 = (200000 - k1)/2
k3 = k2
if 199982 >= (p2[k1]+p2[k2]+p2[k3]) and 49986 >= (p5[k1]+p5[k2]+p5[k3]):
result += 3
# case 3: k1 < k2 < k3
a = 1
while 2*a < (200000 - 3*k1):
k2 = k1 + a
k3 = 200000 - k1 - k2
if 199982 >= (p2[k1]+p2[k2]+p2[k3]) and 49986 >= (p5[k1]+p5[k2]+p5[k3]):
result += 6
a += 1
k1 += 1
free(p2)
free(p5)
elapsed_run = round(time.time() - run_time, 5)
elapsed_head = round(time.time() - head_time, 5)
print "Result: %s" % result
print "Runtime: %s seconds (total time: %s seconds)" % (elapsed_run, elapsed_head)

When executed, we find the correct result relatively quickly.

Result: 479742450
Runtime: 14.62538 seconds (total time: 14.62543 seconds)

## C with OpenMP Solution

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <omp.h>
/*****************************************************************************/
/* function to determine how many factors of 'd' are in 'k!' */
/*****************************************************************************/
unsigned long p(unsigned long k, unsigned long d) {
unsigned long power = d;
unsigned long exp = 0;
while (power <= k) {
exp += k/power;
power *= d;
}
return exp;
}
/*****************************************************************************/
/* create a list [p(0,d),p(1,d),p(2,d), ... ,p(n,d)] and return pointer */
/*****************************************************************************/
unsigned long * p_list(unsigned long n, unsigned long d) {
unsigned long i;
unsigned long * powers = malloc((n+1)*sizeof(unsigned long));
for (i=0;i<=n;i++) powers[i] = p(i,d);
return powers;
}
/*****************************************************************************/
/* main */
/*****************************************************************************/
int main(int argc, char **argv) {
unsigned long k1, k2, k3, a;
unsigned long long result = 0;
unsigned long * p2 = p_list(200000, 2);
unsigned long * p5 = p_list(200000, 5);
#pragma omp parallel for private(k1,k2,k3,a) reduction(+ : result)
for (k1=0;k1<66667;k1++) {
// case 1: k1 = k2 < k3
k2 = k1;
k3 = 200000 - k1 - k2;
if (p2[k1]+p2[k2]+p2[k3]<199983 && p5[k1]+p5[k2]+p5[k3]<49987) {
result += 3;
}
// case 2: k1 < k2 = k3
if (k1 % 2 == 0) {
k2 = (200000 - k1)/2;
k3 = k2;
if (p2[k1]+p2[k2]+p2[k3]<199983 && p5[k1]+p5[k2]+p5[k3]<49987) {
result += 3;
}
}
// case 3: k1 < k2 < k3
for (a=1;2*a<(200000-3*k1);a++) {
k2 = k1 + a;
k3 = 200000 - k1 - k2;
if (p2[k1]+p2[k2]+p2[k3]<199983 && p5[k1]+p5[k2]+p5[k3]<49987) {
result += 6;
}
}
}
free(p2);
free(p5);
printf("result: %lld\n", result);
return 0;
}

This can be compiled and optimized using GCC as follows.

$ gcc -O3 -fopenmp -o problem-154-omp problem-154-omp.c

When executed on a 16-core machine, we get the following result.

$ time ./problem-154-omp
result: 479742450
real 0m1.487s

This appears to be the fastest solution currently known, according to the forum of solutions on Project Euler. The CPUs on the 16-core machine are pretty weak compared to modern standards. When running on a single core on a new Intel Core i7, the result is returned in about 4.7 seconds.