If you run Debian Squeeze, you could simply add backport repositories to your apt sources if you want to force an upgrade to a new kernel. At the time of writing this, Squeeze is using Linux-image-2.6.32, while Wheezy (testing) and Sid (unstable) are both using Linux-image-3.2. However, on kernel.org, the latest stable release is Linux-image-3.8.2.

One of the few packages that can have multiple versions installed simultaneously, in apt or otherwise on the system, is the Linux kernel. This can be useful, for instance, if you find that your new hardware isn’t supported properly by a slightly old kernel.


Since we’re going to be compiling our own kernel, we need a few things. These are:

  • build-essential
  • module-assistant

Make sure those packages are installed (through apt or aptitude) before continuing. I’m guessing that most of those reading this post will already have the first installed, but perhaps not the second.

Downloading the Kernel Source

Visit kernel.org and decide on a kernel. There probably is a “latest stable release” version that is clearly highlighted, and that’s what I’d recommend. There are also some older, yet still maintained, “longterm” releases of the kernel that may provide updates that suite your needs.

Copy the link. In my case, I’m using the link to the “latest stable” 3.8.2.

Make yourself root, or otherwise superuser, for the rest of this process.

$ sudo -i

Download and unpack the kernel source.

$ cd /usr/src/
$ wget https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.8.2.tar.bz2
$ tar -jxf linux-3.8.2.tar.bz2
$ cd linux-3.8.2/

Configuring Old to New

Running the following command will check your current kernel configuration and prompt you when there is a new option available in your new kernel. Most of the time, accepting the preselected values is fine. Sometimes, this is a short process and sometimes it may take a while. If you already know that you want to accept the recommended values, holding the “enter” button down will work just fine and the script will exit back to your working directory.

$ make oldconfig

Build and Install the Kernel

Now, the standard build routine with a module install sandwiched inside.

$ make
$ make modules_install
$ make install

A triangular pyramid is constructed using spherical balls so that each ball rests on exactly three balls of the next lower level.

Then, we calculate the number of paths leading from the apex to each position: A path starts at the apex and progresses downwards to any of the three spheres directly below the current position. Consequently, the number of paths to reach a certain position is the sum of the numbers immediately above it (depending on the position, there are up to three numbers above it).

The result is Pascal’s pyramid and the numbers at each level n are the coefficients of the trinomial expansion $(x + y + z)^n$. How many coefficients in the expansion of $(x + y + z)^{200000}$ are multiples of $10^{12}$?

Solution Using the Multinomial Theorem

The generalization of the binomial theorem is the multinomial theorem. It says that multinomials raised to exponents can be expanded using the formula
(x_1+x_2+\cdots+x_m)^n=\sum_{{k_1+k_2+\cdots+k_m=n}\atop{0\le k_i\le n}}\left({n}\atop{k_1,k_2,\ldots,k_m}\right)\prod_{1\le t\le m}x_t^{k_t}
\left({n}\atop{k_1,k_2,\ldots,k_m}\right)=\frac{n!}{k_1!k_2!\cdots k_m!}.
Of course, when m=2 this gives the binomial theorem. The sum is taken over all partitions $k_1+k_2+\cdots+k_m=n$ for integers $k_i$. If n=200000 abd m=3, then the terms in the expansion are given by
where $k_1+k_2+k_3=200000$. It’s worth pointing out that each of the coefficients is an integer, and thus has a unique factorization into products of prime integers. Of course, there’s no way that we’re going to calculate these coefficients. We only need to know when they’re divisible by $10^{12}$. Thus, we only need to consider how many factors of 2 and 5 are involved.

First, we’ll create a function $p(n,d)$ that outputs how many factors of $d$ are included in $n!$. We have that
p(n,d)=\left\lfloor\frac{n}{d}\right\rfloor+\left\lfloor\frac{n}{d^2}\right\rfloor+\left\lfloor\frac{n}{d^3}\right\rfloor+ \cdots+\left\lfloor\frac{n}{d^r}\right\rfloor,
where $d^r$ is the highest power of $d$ dividing $n$. For instance, there are 199994 factors of 2 in 200000!. Since we’re wondering when our coefficients are divisible by $10^{12}=2^{12}5^{12}$, we’ll be using the values provided by $p(n,d)$ quite a bit for $d=2$ and $d=5$. We’ll store two lists:
p2=[p(i,2)\text{ for }1\le i\le 200000]\quad\text{and}\quad p5=[p(i,5)\text{ for }1\le i\le 200000].
For a given $k_1,k_2,k_3$, the corresponding coefficient is divisible by $10^{12}$ precisely when
p2[k_1]+p2[k_2]+p2[k_3]<199983\ \text{and}\ p5[k_1]+p5[k_2]+p5[k_3]<49987.
That is, this condition ensures that there are at least 12 more factors of 2 and 5 in the numerator of the fraction defining the coefficients.

Now, we know that $k_1+k_2+k_3=200000$, and we can exploit symmetry and avoid redundant computations if we assume $k_1\le k_2\le k_3$. Under this assumption, we always have
We know that $k_1+k_2+k_3=200000$ is impossible since 200000 isn't divisible by 3. It follows that we can only have (case 1) $k_1=k_2 < k_3$, or (case 2) $k_1 < k_2=k_3$, or (case 3) $k_1 < k_2 < k_3$.

In case 1, we iterate $0\le k_1\le 66666$, setting $k_2=k_1$ and $k_3=200000-k_1-k_2$. We check the condition, and when it is satisfied we record 3 new instances of coefficients (since we may permute the $k_i$ in 3 ways).

In case 2, we iterate $0\le k_1\le 66666$, and when $k_1$ is divisible by 2 we set $k_2=k_3=\frac{200000-k_1}{2}$. When the condition holds, we again record 3 new instance.

In case 3, we iterate $0\le k_1\le 66666$, and we iterate over $k_2=k_1+a$ where $1\le a < \left\lfloor\frac{200000-3k_1}{2}\right\rfloor$. Then $k_3=200000-k_1-k_2$. When the condition holds, we record 6 instances (since there are 6 permutations of 3 objects).

Cython Solution

I’ll provide two implementations, the first written in Cython inside Sage. Then, I’ll write a parallel solution in C.

import time
from libc.stdlib cimport malloc, free
head_time = time.time()
cdef unsigned long p(unsigned long k, unsigned long d):
    cdef unsigned long power = d
    cdef unsigned long exp = 0
    while power <= k:
        exp += k / power
        power *= d
    return exp
cdef unsigned long * p_list(unsigned long n, unsigned long d):
    cdef unsigned long i = 0
    cdef unsigned long * powers = <unsigned long *>malloc((n+1)*sizeof(unsigned long))
    while i <= n:
        powers[i] = p(i,d)
        i += 1
    return powers
run_time = time.time()
# form a list of number of times each n! is divisible by 2.
cdef unsigned long * p2 = p_list(200000,2)
# form a list of number of times each n! is divisible by 5.
cdef unsigned long * p5 = p_list(200000,5)
cdef unsigned long k1, k2, k3, a
cdef unsigned long long result = 0
k1 = 0
while k1 <= 66666:
    # case 1: k1 = k2 < k3
    k2 = k1
    k3 = 200000 - k1 - k2
    if 199982 >= (p2[k1]+p2[k2]+p2[k3]) and 49986 >= (p5[k1]+p5[k2]+p5[k3]):
        result += 3
    # case 2: k1 < k2 = k3
    if k1 % 2 == 0:
        k2 = (200000 - k1)/2
        k3 = k2
        if 199982 >= (p2[k1]+p2[k2]+p2[k3]) and 49986 >= (p5[k1]+p5[k2]+p5[k3]):
            result += 3
    # case 3: k1 < k2 < k3
    a = 1
    while 2*a < (200000 - 3*k1):
        k2 = k1 + a
        k3 = 200000 - k1 - k2
        if 199982 >= (p2[k1]+p2[k2]+p2[k3]) and 49986 >= (p5[k1]+p5[k2]+p5[k3]):
            result += 6
        a += 1
    k1 += 1
elapsed_run = round(time.time() - run_time, 5)
elapsed_head = round(time.time() - head_time, 5)
print "Result: %s" % result
print "Runtime: %s seconds (total time: %s seconds)" % (elapsed_run, elapsed_head)

When executed, we find the correct result relatively quickly.

Result: 479742450
Runtime: 14.62538 seconds (total time: 14.62543 seconds)

C with OpenMP Solution

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <omp.h>
/* function to determine how many factors of 'd' are in 'k!'                 */
unsigned long p(unsigned long k, unsigned long d) {
    unsigned long power = d;
    unsigned long exp = 0;
    while (power <= k) {
        exp += k/power;
        power *= d;
    return exp;
/* create a list [p(0,d),p(1,d),p(2,d), ... ,p(n,d)] and return pointer      */
unsigned long * p_list(unsigned long n, unsigned long d) {
    unsigned long i;
    unsigned long * powers = malloc((n+1)*sizeof(unsigned long));
    for (i=0;i<=n;i++) powers[i] = p(i,d);
    return powers;
/* main                                                                      */
int main(int argc, char **argv) {
    unsigned long k1, k2, k3, a;
    unsigned long long result = 0;
    unsigned long * p2 = p_list(200000, 2);
    unsigned long * p5 = p_list(200000, 5);
    #pragma omp parallel for private(k1,k2,k3,a) reduction(+ : result)
    for (k1=0;k1<66667;k1++) {
        // case 1: k1 = k2 < k3
        k2 = k1;
        k3 = 200000 - k1 - k2;
        if (p2[k1]+p2[k2]+p2[k3]<199983 && p5[k1]+p5[k2]+p5[k3]<49987) {
            result += 3;
        // case 2: k1 < k2 = k3
        if (k1 % 2 == 0) {
            k2 = (200000 - k1)/2;
            k3 = k2;
            if (p2[k1]+p2[k2]+p2[k3]<199983 && p5[k1]+p5[k2]+p5[k3]<49987) {
                result += 3;
        // case 3: k1 < k2 < k3
        for (a=1;2*a<(200000-3*k1);a++) {
            k2 = k1 + a;
            k3 = 200000 - k1 - k2;
            if (p2[k1]+p2[k2]+p2[k3]<199983 && p5[k1]+p5[k2]+p5[k3]<49987) {
                result += 6;
    printf("result: %lld\n", result);
    return 0;

This can be compiled and optimized using GCC as follows.

$ gcc -O3 -fopenmp -o problem-154-omp problem-154-omp.c

When executed on a 16-core machine, we get the following result.

$ time ./problem-154-omp 
result: 479742450
real    0m1.487s

This appears to be the fastest solution currently known, according to the forum of solutions on Project Euler. The CPUs on the 16-core machine are pretty weak compared to modern standards. When running on a single core on a new Intel Core i7, the result is returned in about 4.7 seconds.