$n!$ means $n\times(n-1)\times\cdots\times 3\times 2\times 1$. For example, $10!=10\times 9\times\cdots\times 2\times 1=3628800$ and the sum of the digits in the number $10!$ is 27. Find the sum of the digits in $100!$.

Sage Solution

There is a single-line solution in Sage. I’ve added a bit more here just to determine the timing.

import time
 
start = time.time()
 
s = sum(factorial(100).digits())
 
elapsed = time.time() - start
 
print "Solution %s found after %s seconds" % (s,elapsed)

When executed, we find the following.

Solution 648 found after 0.000201940536499 seconds

Python Solution

Without the built-in functions of Sage, we can build a factorial function and use some string functionality in Python to achieve the same result.

import time
 
def factorial(n):
    if n == 0: return 1
    else: return n * factorial(n - 1)
 
def sum_digits(n):
    return sum([int(i) for i in str(n)])
 
start = time.time()
 
sumfac = sum_digits(factorial(100))
 
elapsed = time.time() - start
 
print "%s found in %s seconds" % (sumfac,elapsed)

When executed, the same result is returned.

648 found in 0.000312805175781 seconds

We could also compute the factorial using reduce with a lambda function, as in the following code.

#!/usr/bin/python
 
import time
 
 
def factorial(n):
    return reduce(lambda x,y: x*y, range(1,n+1,1))
 
def sum_digits(n):
    return sum([int(i) for i in str(n)])
 
start = time.time()
 
sumfac = sum_digits(factorial(100))
 
elapsed = time.time() - start
 
print "%s found in %s seconds" % (sumfac,elapsed)

Problem: You are given the following information, but you may prefer to do some research for yourself.

  • 1 Jan 1900 was a Monday.
  • Thirty days has September,
    April, June and November.
    All the rest have thirty-one,
    Saving February alone,
    Which has twenty-eight, rain or shine.
    And on leap years, twenty-nine.
  • A leap year occurs on any year evenly divisible by 4, but not on a century unless it is divisible by 400.

How many Sundays fell on the first of the month during the twentieth century (1 Jan 1901 to 31 Dec 2000)?

Approach

There are several different ways to approach this. The easiest, I think, is to use the Gaussian formula for day of week. It is a purely mathematical formula that I have encoded in the following Python code.
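
Written out, the formula is

\[w=\left(d+\lfloor 2.6m-0.2\rfloor+y+\left\lfloor\frac{y}{4}\right\rfloor+\left\lfloor\frac{c}{4}\right\rfloor-2c\right)\bmod 7,\]

where $d$ is the day of the month, $m$ is the month shifted so that March is 1 and February is 12, $y$ and $c$ are the last two and first two digits of the year (using the previous year for January and February), and $w$ is the day of the week with Sunday = 0 and Saturday = 6.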

Python Solution

import time
from math import floor
 
"""
Gaussian algorithm to determine day of week
"""
def day_of_week(year, month, day):
    """
    w = (d+floor(2.6*m-0.2)+y+floor(y/4)+floor(c/4)-2*c) mod 7
 
    Y = year - 1 for January or February
    Y = year for other months
    d = day (1 to 31)
    m = shifted month (March = 1, February = 12)
    y = last two digits of Y
    c = first two digits of Y
    w = day of week (Sunday = 0, Saturday = 6)
    """
 
    d = day
    m = (month - 3) % 12 + 1
    if m > 10: Y = year - 1
    else: Y = year
    y = Y % 100
    c = (Y - (Y % 100)) / 100
 
    w = (d + floor(2.6 * m - 0.2) + y + floor(y/4) + floor(c/4) - 2*c) % 7
 
    return int(w)
 
"""
Compute the number of months starting on a given day of the week in a century
"""
def months_start_range(day,year_start,year_end):
    total = 0
    for year in range(year_start, year_end + 1):
        for month in range(1,13):
            if day_of_week(year, month, 1) == day: total += 1
    return total
 
start = time.time()
 
total = months_start_range(0,1901,2000)
 
elapsed = time.time() - start
 
print "%s found in %s seconds" % (total,elapsed)

This returns the correct result.

171 found in 0.0681998729706 seconds

That will run faster if executed directly in Python. Remember, I’m using the Sage notebook environment to execute most of the Python here. I’m also writing Cython in the same environment.
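
As a quick cross-check, here is a minimal sketch using Python’s standard datetime module (whose weekday() method returns Monday = 0 through Sunday = 6) that reproduces the same count:

from datetime import date
 
# count the first-of-the-month Sundays from 1901 through 2000
total = sum(1 for y in range(1901, 2001)
              for m in range(1, 13)
              if date(y, m, 1).weekday() == 6)
print total # 171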

Cython Solution

There is a nearly trivial rewriting to Cython of the above Python code.

%cython
 
import time
from math import floor
 
"""
Gaussian algorithm to determine day of week
"""
cdef day_of_week(int year, int month, int day):
    """
    w = (d+floor(2.6*m-0.2)+y+floor(y/4)+floor(c/4)-2*c) mod 7
 
    Y = year - 1 for January or February
    Y = year for other months
    d = day (1 to 31)
    m = shifted month (March = 1, February = 12)
    y = last two digits of Y
    c = first two digits of Y
    w = day of week (Sunday = 0, Saturday = 6)
    """
 
    cdef int d = day
    cdef int m = (month - 3) % 12 + 1
    cdef int Y
    if m > 10: Y = year - 1
    else: Y = year
    y = Y % 100
    cdef int c = (Y - (Y % 100)) / 100
 
    cdef double w
    w = (d + floor(2.6 * m - 0.2) + y + floor(y/4) + floor(c/4) - 2*c) % 7
 
    return int(w)
 
"""
Compute the number of months starting on a given day of the week in range of years
"""
cdef months_start_range(int day, int year_start,int year_end):
    cdef unsigned int total = 0
    cdef int year, month
    for year in range(year_start, year_end + 1):
        for month in range(1,13):
            if day_of_week(year, month, 1) == day: total += 1
    return total
 
start = time.time()
 
total = months_start_range(0,1901,2000)
 
elapsed = time.time() - start
 
print "%s found in %s seconds" % (total,elapsed)

The code is a bit longer, but it executes much faster.

171 found in 0.00387215614319 seconds

The Cython code runs roughly 18 times faster.

C Solution

The Cython code was used as a model to create more efficient C code. The only issue here is in maintaining the correct datatypes (not too hard, but compared to Cython it is a pain).

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
 
int day_of_week(int year, int month, int day) {
    // Using the Gaussian algorithm
    int d = day;
 
    int m = (month + 9) % 12 + 1; /* shifted month: March = 1, ..., February = 12 */
    int Y;
    if(m > 10) Y = year - 1;
    else Y = year;
    int y = Y % 100;
    int c = (Y - (Y % 100)) / 100;
 
    /* add 7 before the final % so the result is never negative in C */
    int w = ((d + (int)floor(2.6*m - 0.2) + y + y/4 + c/4 - 2*c) % 7 + 7) % 7;
 
    return w; 
}
 
long months_start_range(int day, int year_start, int year_end) {
    unsigned long total = 0;
    int year, month;
    for(year = year_start; year <= year_end; year++) {
        for(month = 1; month <= 12; month++) {
            if(day_of_week(year, month, 1)==day) total++;
        }
    }
    return total;
 
}
 
int main(int argc, char **argv) {
 
    int iter = 0;
    long total;
    while(iter < 100000) {
        total = months_start_range(0,1901,2000);
        iter++;
    }
    printf("Solution: %ld\n",total);
 
    return 0;
}

Notice that this executes the loop 100,000 times, as I’m trying to get a good idea of what the average runtime is. We compile with optimization and the -lm math option. We get the following result.

gcc -O3 -o problem-19 problem-19.c -lm
$ time ./problem-19
Solution: 171
 
real	0m6.240s

The C code runs roughly 62 times as fast as the Cython and roughly 1124 times as fast as the Python. Each iteration executes in about 6.2000e-5 seconds.


In Linux, assuming that the program xclip is installed, the following command will copy the contents of a file to the clipboard. I find this useful for doing such things as posting programs here on this site.

cat filename | xclip -selection clipboard

This problem is the same as Project Euler Problem 18, but is done on a much, much larger triangle. Problem 18 can be done with a depth-first backtrack approach, but I showed that a much easier solution solves it. I use that same approach here.

Problem:  By starting at the top of the triangle below and moving to adjacent numbers on the row below, the maximum total from top to bottom is 23.

3

7    4

2    4    6

8    5    9    3

That is, $3+7+4+9=23$. Find the maximum total from top to bottom of the triangle contained in the text file triangle.txt, which holds a triangle with one hundred rows.

Solution: Keep It Simple Stupid

The sums traveling from top to bottom will be the same as the sums traveling from bottom to top. Start at the second-to-bottom row, and for each number $N$ in that row, do the following: Take the maximum of the numbers directly down and left or down and right from $N$, and add that to $N$. Now do the same for the third-to-last row, then fourth-to-last, and so on. We’re modifying the triangle as we go to produce maximum partial sums from the bottom, and the last stage will be to replace the top number in the triangle, which will then be the maximum sum. So easy.
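
On the small example triangle, the bottom-up passes work out as follows:

2 + max(8,5) = 10    4 + max(5,9) = 13    6 + max(9,3) = 15
7 + max(10,13) = 20    4 + max(13,15) = 19
3 + max(20,19) = 23

so the top entry becomes 23, the maximum total.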

Python Solution

#!/usr/bin/env python
 
import time
 
# read file
rows = []
FILE = open("triangle.txt", "r")
for blob in FILE: rows.append([int(i) for i in blob.split(" ")])
 
start = time.time()
 
for i,j in [(i,j) for i in range(len(rows)-2,-1,-1) for j in range(i+1)]:
    rows[i][j] +=  max([rows[i+1][j],rows[i+1][j+1]])
 
elapsed = time.time() - start
 
print "%s found in %s seconds" % (rows[0][0],elapsed)

When executed, we find:

7273 found in 0.0093560218811 seconds

Problem:  By starting at the top of the triangle below and moving to adjacent numbers on the row below, the maximum total from top to bottom is 23.

3

7    4

2    4    6

8    5    9    3

That is, $3+7+4+9=23$. Find the maximum total from top to bottom of the triangle below.

75
95 64
17 47 82
18 35 87 10
20 04 82 47 65
19 01 23 75 03 34
88 02 77 73 07 63 67
99 65 04 28 06 16 70 92
41 41 26 56 83 40 80 70 33
41 48 72 33 47 32 37 16 94 29
53 71 44 65 25 43 91 52 97 51 14
70 11 33 28 77 73 17 78 39 68 17 57
91 71 52 38 17 14 91 43 58 50 27 29 48
63 66 04 68 89 53 67 30 73 16 69 87 40 31
04 62 98 27 23 09 70 98 73 93 38 53 60 04 23

NOTE: As there are only 16384 routes, it is possible to solve this problem by trying every route. However, Project Euler Problem 67 is the same challenge, with a triangle containing one-hundred rows; it cannot be solved by brute force and requires a clever method.

Solution: Keep It Simple Stupid

The sums traveling from top to bottom will be the same as the sums traveling from bottom to top. Start at the second-to-bottom row, and for each number $N$ in that row, do the following: Take the maximum of the numbers directly down and left or down and right from $N$, and add that to $N$. Now do the same for the third-to-last row, then fourth-to-last, and so on. We’re modifying the triangle as we go to produce maximum partial sums from the bottom, and the last stage will be to replace the top number in the triangle, which will then be the maximum sum.

Python Solution

I stored the triangle in the file “problem-18-data”.

#!/usr/bin/env python
 
import time
 
# define a recursive function to create partial sums by row
def recSumAtRow(rowData, rowNum):
    # iterate over the given row
    for i in range(len(rowData[rowNum])):
        # add the largest of the values below-left or below-right
        rowData[rowNum][i] += max([rowData[rowNum+1][i],rowData[rowNum+1][i+1]])
    # base case
    if len(rowData[rowNum])==1: return rowData[rowNum][0]
    # recursive case
    else: return recSumAtRow(rowData, rowNum-1)
 
# read in the data
rows = []
with open('problem-18-data') as f:
    for line in f:
        rows.append([int(i) for i in line.rstrip('\n').split(" ")])
 
 
start = time.time()
result = recSumAtRow(rows, len(rows)-2) # start at second to last row
elapsed = time.time() - start
 
 
print "%s found in %s seconds" % (result ,elapsed)

When executed, we find:

1074 found in 6.79492950439e-05 seconds

Problem: If the numbers 1 to 5 are written out in words: one, two, three, four, five, then there are $3+3+5+4+4=19$ letters used in total.

If all the numbers from 1 to 1000 (one thousand) inclusive were written out in words, how many letters would be used?

Note: Do not count spaces or hyphens. For example, 342 (three hundred and forty-two) contains 23 letters and 115 (one hundred and fifteen) contains 20 letters. The use of “and” when writing out numbers is in compliance with British usage.

Python Solution

This is a relatively straightforward problem, where I don’t see much of a point in trying to optimize it using Cython or C.

import time
 
start = time.time()
 
S = [0,3,3,5,4,4,3,5,5,4,3,6,6,8,8,7,7,9,8,8] # letter counts of "one" through "nineteen" (index 0 unused)
D = [0,3,6,6,5,5,5,7,6,6] # letter counts of "ten", "twenty", ..., "ninety"
H = 7 # letters in "hundred"
T = 8 # letters in "thousand"
 
total = 0
for i in range(1,1000):
    c = i % 10 # singles digit
    b = ((i % 100) - c) / 10 # tens digit
    a = ((i % 1000) - (b * 10) - c) / 100 # hundreds digit
 
    if a != 0:
        total += S[a] + H # word for the hundreds digit + "hundred"
        if b != 0 or c != 0: total += 3 # "and"
    if b == 0 or b == 1: total += S[b * 10 + c]
    else: total += D[b] + S[c]
 
total += S[1] + T
elapsed = time.time() - start
 
print "%s found in %s seconds" % (total,elapsed)

When executed, we get the following.

21124 found in 0.028256893158 seconds

Problem:  For any prime $p$, the number $N(p,q)$ is defined by $N(p,q)=\sum_{n=0}^q T_np^n$ with $T_n$ generated by the following random number generator:

\[S_0=290797,\quad S_{n+1}=S_n^2\text{ mod }50515093,\quad T_n=S_n\text{ mod }p\]

Let $Nfac(p,q)$ be the factorial of $N(p,q)$. Let $NF(p,q)$ be the number of factors $p$ in $Nfac(p,q)$. You are given that $NF(3,10000)\text{ mod }3^{20}=624955285$. Find $NF(61,10^7)\text{ mod }61^{10}$.

Approach

I want to make sure I understand the problem correctly. The first thing I am going to do is try to reproduce the $NF(3,10000)\text{ mod }3^{20}$ result. We need a few tools in order to do that. This may not be the fastest approach, but the sequence $S_i$ can be stored in a list and used for the generation of each $T_i$. That can be done with the following code. (We’ll first get something that we know works, and then try to make it faster.)

"""
Compute and return a list of elements in the sequence S_i
"""
def SeqGen(p,q):
    S = [290797]
    for i in range(q):
        S.append(S[i]**2 % 50515093)
    return S

Now, how long does it take to form the sequences involved? The check solution is where $p=3$ and $q=10000$.

import time
start = time.time()
S = SeqGen(3,10000)
elapsed = time.time() - start
print "%s seconds" % elapsed
0.0155670642853 seconds

The problem we’re trying to solve is where $p=61$ and $q=10^7$. That should take a bit longer to calculate.

start = time.time()
S = SeqGen(61,10**7)
elapsed = time.time() - start
print "%s seconds" % elapsed
4.84225988388 seconds

So, we have the sequence generated, and there’s really no way to get around that. (At least we’re storing the sequence for repeated use instead of computing each $S_i$ and $T_i$ from scratch when they are needed.) But, the problem here is that we’re considering a sum of $T_i$ that will end up being absolutely huge, and then we’re supposed to consider the factorial of that! Um, how big is that number that we’re going to be taking the factorial of?

start = time.time()
p = 3
q = 10000
S = SeqGen(p,q)
summand = 0
for i in range(q + 1):
    summand += (S[i] % p) * p**i
elapsed = time.time() - start
print "%s found in %s seconds" % (summand,elapsed)

That gives us the following HUGE number.

730559684346872487356308424468789381739382472243104512256910727234612146\
700534188007434920975662394367813322448421745790130109894995629026500374\
895011594186028875546881525383692190901857070383833601818290785177074579\
911743377542081623320139528346723997438895367511366325656859170069501892\
782888954053372302132596260888870616595592246062506668938053947188269010\
400553898377000621315460037720267470765385723925294615221799419878348634\
941697100134717918504949375867315531887209110886823403867771125638452655\
419963863463676818960699876392271245411287465034804904299149434829574363\
770468053954258743419442793921037036133984801241622614666871400079764805\
900359305411895238129224243327849475948701119129980806157988796433970429\
767434356953911596347357745083143620729117746678220096159021424980589161\
244984215592267871406818552160832420391632566752817475768883436736038842\
522000945336450020910807258558940388286502953874414804758436489924366403\
947302687002094708689116314943448323020960512070613984468171097011413793\
815513010319096054358875584978751325763116736267690167927829753762880555\
556492702423860742161506131781311489223910177639224622324264799978209792\
137516266242000584912257651924323790209254150744514288660959513060668035\
973225327239076026979768488390491444263297199193699839519881450073826859\
308373986510545986393307440612730124553757195296560232991330312047030365\
439140122350528090824215785078244449718422816025901223160225749799081881\
988430664995829601897673857754766692861499127643968276276623422007471468\
172908403505783118651898682362325563123083437839559958668622113095448097\
838918074208742911144089970606803829614386725740786000635368447203667950\
919203899690655183541386287564659803875910034486817250700675054831856742\
035247475944246669523244511489033077185989751177302056006622400102322784\
989845436319501972904106278352402251267862987843212251575486159493488621\
419004070699786286867072388160751058892527781090980214514736036820101984\
021575154483845837502770313982784775558983857516011444428943761280375624\
001952338205797561885398526808997109816279089774072149087781516188880702\
608233247734329861677612419964265181378869947517853199977037079229291120\
464625858155739642990433998513357423180305979877365154314842676895851795\
760576022818502562105123566673123108580786247852033516496746978362884952\
942857915868486448373175276670655594702334217465802036651469475179854123\
576334647440171805787579774605496977007910876530142161323040167528309301\
393422160543176919470220970372232485279534778999286622468261538067226561\
555097826633524553445830259964898613993557192047585752617190210902952965\
023859119496947566516318311636273280475133934551257178698148121016914217\
601635282811274292951764677139952566037328156035799548655023949268547284\
809803753032119590215375306603046215706177234216996024131857639663170520\
040443209001844694801103623963784228403949731061696509229838383141346412\
396840736097533127298186296546021026067585925153039919837286307851902378\
608106584940719518001399972148818734755824896042240388003248509954172230\
457875696005298711270591168072531988900269290984533265472277713834821552\
688627849442548426730677565761857748700949921996084459144768478980887361\
308570015798008752191153537433637864164115257325737451302878323350513806\
294637073800220289831038499134290753821824937995156685004102694409025050\
677506077860480650134181164350989085407457828530788590469734236728651105\
676262006578011380829166976653790493746638376475989405224819273383529385\
532863549938170077470214978095447081535900000723694048581893421673406131\
109533480459713318059149162064159952801277745096099653953042257676645734\
033409098349440753210352634374498292831878619296112463252781883559685421\
067134390006626690231502559111513270668170841237110979180067720007739317\
585970923377914844103905254029879477702295672447225795344081751213856674\
211906754788178078616402870316493923729160455051745371549667665511104695\
586653367386191427354784072444333174771530684870185759160433248337062693\
241539671265073012920758822756618788271313509549860869792768997274766995\
313142628609896166587869086036134423061561270679939737235411692366776742\
931830299523846661763825091147013824676353740657480723066545922030438851\
336813999896498288327780291320769916220990607522889614427866105922093737\
605505823866292382841719392400184241564923478503374222476232576395511449\
778688491299558506777311849692101013641615409477217600277465271536417044\
126580096565238246605862418216134296755489469606953685229425485095505145\
354340285520945045486011095376520333083833266407481003815982570630207580\
757117499879596617711378303148935948900974043533904686493402149617885320\
110564180214417176241073001553538422708853000908450781412877954930208499\
833275276736919712641870420034477342646070330401801478161590958515312176\
8998208978920939146 found in 0.629118919373 seconds

Clearly, we’re going to need to do things differently, since we can’t and don’t want to take the factorial of a number that big. So, let’s consider the mathematics behind this.

Organizing a Solution

First, the main point of the problem is to count the number of factors of some prime $p$ in $N!$, where $N$ is some enormous number. We don’t actually need to compute $N!$ in order to do that. Instead, we can count the factors of $p$ in $N!$ by just looking at the factors of $p$ in each $n$ for $1\le n\le N$. Moreover, we only need this count modulo some integer, which lets us trim the calculations down further. In fact, we never need to form the huge sum $N=\sum_i T_i p^i$ explicitly: each $T_i = S_i\text{ mod }p$ lies between 0 and $p-1$, so the contribution of each term $T_i p^i$ can be handled on its own. We can put this all together in a relatively efficient Python script.
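
The count we need is given by Legendre’s formula: writing $\nu_p(N!)$ for the number of factors of $p$ in $N!$,

\[\nu_p(N!)=\sum_{k\ge 1}\left\lfloor\frac{N}{p^k}\right\rfloor=\frac{N-s_p(N)}{p-1},\]

where $s_p(N)$ is the sum of the base-$p$ digits of $N$. As a small check, $\nu_3(10!)=\lfloor 10/3\rfloor+\lfloor 10/9\rfloor=3+1=4$, and indeed $10!=3628800=2^8\cdot 3^4\cdot 5^2\cdot 7$. Since the $T_i$ are exactly the base-$p$ digits of $N=\sum_i T_i p^i$, the second form is what lets us treat each term separately.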

Python Solution

Here’s a first attempt at achieving the correct solution for the test case.

import time
 
"""
Compute and return a list of elements in the sequence T_i
"""
def T(p,q):
    S = [290797]
    for i in range(q):
        S.append(S[i]**2 % 50515093)
    T = [(i % p) for i in S]
    return T
 
"""
How many factors of p does N! have modulo mod?
"""
def num_p_in_factorial(p,N,mod=None):
    quo = N//p
    num = quo
    while quo >= p:
        quo //= p; num += quo
        if mod is not None and num > mod: num = num % mod
    return num
 
def Nfac(p,q,n):
    # form the sequence S
    Tseq = T(p,q)
    iter = q
    num_factors = 0
    while iter >= 0:
        if Tseq[iter] == 0:
            iter -= 1
            continue
        elif iter >= n: num_factors += num_p_in_factorial(p,Tseq[iter]*p**n)
        else: num_factors += num_p_in_factorial(p,Tseq[iter]*p**iter)
        iter -= 1
    if num_factors >= p**n: num_factors = num_factors % p**n
    return num_factors
 
start = time.time()
N = Nfac(3,10000,20)
 
elapsed = (time.time() - start)
print "found %s in %s seconds" % (N,elapsed)

When executed, we get the following result.

found 624955285 in 0.059849023819 seconds

This would work to solve the bigger problem, so let’s see how fast an initial solution is.

start = time.time()
N = Nfac(61,10**7,10)
 
elapsed = (time.time() - start)
print "found %s in %s seconds" % (N,elapsed)

And we find the solution to the problem in around 72 seconds.

found 605857431263981935 in 72.1750190258 seconds

I’m not happy with that timing. Since almost every call to num_p_in_factorial uses an argument of the form $T_i\,p^n$ with the exponent capped at $n$ and $T_i$ below $p$, only a handful of distinct arguments ever occur, so we should be able to cache these results to limit redundant calculations. Let’s do that.

import time
 
"""
Compute and return a list of elements in the sequence T_i
"""
def T(p,q):
    S = [290797]
    for i in range(q):
        S.append(S[i]**2 % 50515093)
    T = [(i % p) for i in S]
    return T
 
"""
How many factors of p does N! have modulo mod?
"""
def num_p_in_factorial(p,N,mod=None):
    quo = N//p
    num = quo
    while quo >= p:
        quo //= p; num += quo
        if mod is not None and num > mod: num = num % mod
    return num
 
def Nfac(p,q,n):
    # form the sequence S
    Tseq = T(p,q)
    iter = q
    num_factors = 0
    num_factors_cached = [None] * p
    while iter >= 0:
        t = Tseq[iter]
        if t == 0:
            iter -= 1
            continue
        elif iter >= n:
            if num_factors_cached[t] is None:
                num_factors_cached[t] = num_p_in_factorial(p,t*p**n)
            num_factors += num_factors_cached[t]
        else: num_factors += num_p_in_factorial(p,Tseq[iter]*p**iter)
        iter -= 1
    if num_factors >= p**n: num_factors = num_factors % p**n
    return num_factors
 
start = time.time()
#N = Nfac(3,10000,20)
N = Nfac(61,10**7,10)
 
elapsed = (time.time() - start)
print "found %s in %s seconds" % (N,elapsed)

Now, we get something that is a bit more reasonable.

found 605857431263981935 in 10.3537530899 seconds

Problem: $2^{15}=32768$ and the sum of its digits is $3+2+7+6+8=26$.
What is the sum of the digits of the number $2^{1000}$?

Sage Solution

Sage’s built-in functions make this easy.

import time
 
start = time.time()
a = 2^1000
s = sum(a.digits())
elapsed = time.time() - start
 
print "%s found in %s seconds" % (s,elapsed)

It executes pretty quickly too.

1366 found in 0.000343084335327 seconds

An Easy Python Solution

Python itself also makes this problem too easy, due to string functions.

import time
 
def pow2sum(exp):
    pow = list(str(2**exp))
    return sum([int(i) for i in pow])
 
start = time.time()
n = pow2sum(1000)
elapsed = (time.time() - start)
print "%s found in %s seconds" % (n,elapsed)

And it is fast:

1366 found in 0.000911951065063 seconds

Python Without Strings Attached

Let’s do our arithmetic without string functionality. I’d note that $2^{1000}<10^{1000}$, so we know we’re dealing with at most 1,000 digits. So, we can create a routine that multiplies by 2 one thousand times, maintaining the result at each step as a list of digits instead of a single integer. (This is how you’d be forced to do things in C, where arbitrary-length integers are pure fiction.) Since we’re only multiplying by 2 at each iteration, the carry into the next digit is always either zero or one, which makes this routine a bit simpler than your typical multiply-a-digit-list routine.

import time
 
def pow2sum(exp):
    L = [0] * exp # make a list exp long
    L[0] = 1
    for power in range(exp):
        carry, add = 0, 0
        for index in range(exp):
            prod = L[index] * 2 + carry
            if prod > 9:
                carry = 1
                prod = prod % 10
            else: carry = 0
            L[index] = prod
    return sum(L)
 
start = time.time()
n = pow2sum(1000)
elapsed = (time.time() - start)
print "%s found in %s seconds" % (n,elapsed)

It runs relatively quickly, although not as quickly as the string version runs.

1366 found in 0.361020803452 seconds

Cython Solutions

We can first trivially rewrite the string Python version.

%cython
 
import time
 
cdef pow2sum(unsigned int exp):
    cdef list pow = list(str(2**exp))
    return sum([int(i) for i in pow])
 
start = time.time()
n = pow2sum(1000)
elapsed = (time.time() - start)
print "%s found in %s seconds" % (n,elapsed)

When executed, we find that the string Cython code runs about 1.14 times as fast as the string Python code.

1366 found in 0.000799894332886 seconds

Of course, I’m more interested in seeing how much faster the arithmetic version runs.

%cython
 
import time
from libc.stdlib cimport malloc, free
 
cdef pow2sum(unsigned int exp):
    cdef int *L = <int *>malloc(exp * sizeof(int))
    cdef unsigned int power = 0,index = 0
    cdef unsigned int prod, carry, add
    while index < exp:
        L[index] = 0
        index += 1
    L[0] = 1
    while power < exp:
        carry, add, index = 0, 0, 0
        while index < exp:
            prod = L[index] * 2 + carry
            if prod > 9:
                carry = 1
                prod = prod % 10
            else: carry = 0
            L[index] = prod
            index += 1
        power += 1
    cdef int sum = 0
    index = 0
    while index < exp:
        sum += L[index]
        index += 1
    free(L)
    return sum
 
start = time.time()
n = pow2sum(1000)
elapsed = (time.time() - start)
print "%s found in %s seconds" % (n,elapsed)

When executed, we get the following result.

1366 found in 0.00858902931213 seconds

Problem: Starting in the top left corner of a $2\times 2$ grid and moving only down and right, there are 6 routes to the bottom right corner.

How many routes are there through a $20\times 20$ grid?

A Great Interview Problem

This is precisely the sort of problem I expect to see in a technical coding interview, and knowing the various ways of solving a problem such as this will help you get far in those situations. But, even if you’re an experienced coder, the solutions may not be obvious at first. I want to walk through a few potential solutions and see how they perform.

Recursive Python Solution

I think the easiest solution, and one you should know (but possibly don’t), is the recursive approach. The main idea here is to develop a function that will call itself, inching along to the right and down in all possible combinations, returning a value of 1 whenever it reaches the bottom-right corner and summing all of those 1s along the way. You should convince yourself that the procedure actually terminates (at what we call “the base case”) and returns the correct solution. Try doing it on paper on the 2×2 grid, or a 3×3 grid, and see what happens. Here’s what the Python code looks like.

#!/usr/bin/python
 
import time
 
 
gridSize = [20,20]
 
def recPath(gridSize):
    """
    Recursive solution to grid problem. Input is a list of x,y moves remaining.
    """
    # base case, no moves left
    if gridSize == [0,0]: return 1
    # recursive calls
    paths = 0
    # move right when possible
    if gridSize[0] > 0:
        paths += recPath([gridSize[0]-1,gridSize[1]])
    # move down when possible
    if gridSize[1] > 0:
        paths += recPath([gridSize[0],gridSize[1]-1])
 
    return paths
 
start = time.time()
result = recPath(gridSize)
elapsed = time.time() - start
 
print "result %s found in %s seconds" % (result, elapsed)

That’s great, and it will actually work, but it may take some time. Actually it takes a lot of time. By that, I mean it really, really takes a lot of time. When we run it on the 2×2 input, we get the following.

result 6 found in 9.05990600586e-06 seconds

When we run it on the 20×20 input, as the problem requires, it runs for about 4 hours before I kill it. Python is OK at recursive function calls, and it can handle/collapse the memory required moderately well, but what we’re doing here is manually constructing ALL possible paths to a solution, which isn’t incredibly efficient. Still, during a technical interview, this is definitely the first idea for a solution that should pop into your head. It’s quick and easy to write, and many problems can be solved like this.

Dynamic Python Solution

Our recursive approach suffers from the problem that we’re doing a lot of similar operations over and over. Can we learn anything from the smaller cases and build up from there? In this case, the answer is yes, but it requires us to build up some mathematics a bit more. This is actually quite easy if approached correctly. The idea is to construct the solution recursively, but differently this time.

Claim: Let $n$ be any natural number and consider the 2-dimensional sequence $S_{i,j}$ defined by

\[S_{i,j}=\begin{cases}1 &\text{ if }j=0\\ S_{i,j-1}+S_{i-1,j} &\text{ if }0<j<i\\ 2S_{i,j-1} &\text{ if }i=j\end{cases}\]

where $0\le i\le n$ and $0\le j\le i$. Then the number of non-backtracking paths from top-left to bottom-right through an $n\times n$ grid is $S_{n,n}$.

Proof: Consider a grid of $m$ rows and $n$ columns (we do not need to assume that the grid is square). Counting from the upper-left and starting at zero, denote the intersection/node in the $i$-th row and $j$-th column by $N_{i,j}$. Thus, the upper-left node is $N_{0,0}$, the bottom-left is $N_{m,0}$ and the bottom-right is $N_{m,n}$. Clearly, the number of paths from $N_{0,0}$ to any node along the far left or far top of the grid is only 1 (since we may only proceed down or right). Now, consider how many paths there are to $N_{1,1}$. We must first travel through $N_{0,1}$ or $N_{1,0}$, which yields only two paths to $N_{1,1}$. We can continue this process: to determine the total number of paths to any node $N_{i,j}$, we only need to sum the total number of paths to $N_{i,j-1}$ and $N_{i-1,j}$. The process is shown graphically in the following diagram, where each entry gives the number of paths to that node.
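
For a $4\times 4$ grid (a $5\times 5$ array of nodes), the counts are:

1    1    1    1    1
1    2    3    4    5
1    3    6    10   15
1    4    10   20   35
1    5    15   35   70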

Thus, in a $4\times 4$ grid, there are 70 non-backtracking paths. How does this relate to the sequence $S_{i,j}$? Simply put, $S_{i,j}$ is the number of paths to node $N_{i,j}$. If we write out the sequence $S_{i,j}$ for $0\le i\le 4$ and $0\le j\le i$, we simply obtain the lower diagonal sequence embedded in the diagram above.

\[\begin{array}{ccccc} S_{0,0} = 1 & & & & \cr S_{1,0} = 1 & S_{1,1}=2 & & & \cr S_{2,0} = 1 & S_{2,1}=3 & S_{2,2}=6 & & \cr S_{3,0} = 1 & S_{3,1}=4 & S_{3,2}=10 & S_{3,3}=20 & \cr S_{4,0} = 1 & S_{4,1}=5 & S_{4,2}=15 & S_{4,3}=35 & S_{4,4}=70\end{array}\]

That completes the proof.
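
As a quick cross-check on the claim, note that every path through an $n\times n$ grid consists of $2n$ moves, exactly $n$ of which go right, so $S_{n,n}$ is the central binomial coefficient $\binom{2n}{n}$; for $n=20$ this is $\binom{40}{20}=137846528820$, which matches the value computed below.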

Let’s code that into a Python solution and see how fast it runs. My bet is that this will be considerably faster than our initial recursive solution. We can record the sequence $S_{i,j}$ as a single dimensional list that is simply rewritten at each iteration.

#!/usr/bin/python
 
import time
 
def route_num(cube_size):
    # L holds one row of partial path counts; it is rewritten in place each pass
    L = [1] * cube_size
    for i in range(cube_size):
        for j in range(i):
            # Pascal-style update: new L[j] = old L[j] + already-updated L[j-1]
            # (for j = 0, L[-1] is the last entry, which is still 1 at this point)
            L[j] = L[j] + L[j-1]
        # the diagonal entry is twice the entry just to its left
        L[i] = 2 * L[i - 1]
    return L[cube_size - 1]
 
start = time.time()
n = route_num(20)
elapsed = (time.time() - start)
print "%s found in %s seconds" % (n,elapsed)

When executed, we get the following.

137846528820 found in 0.000205039978027 seconds

Cython Solution

We’ll recode things in Cython and see how much faster we can get the result returned.

%cython
 
import time
from libc.stdlib cimport malloc, free
 
cdef route_num(short cube_size):
    cdef unsigned long *L = <unsigned long *>malloc((cube_size + 1) * sizeof(unsigned long))
    cdef short j,i = 0
    while i <= cube_size:
        L[i] = 1
        i += 1
    i = 1
    while i <= cube_size:
        j = 1
        while j < i:
            L[j] = L[j]+L[j-1]
            j += 1
        L[i] = 2 * L[i - 1]
        i += 1
    cdef unsigned long c = L[cube_size]
    free(L)
    return c
 
start = time.time()
cdef unsigned long n = route_num(20)
elapsed = (time.time() - start)
print "%s found in %s seconds" % (n,elapsed)

We now get the result a bit more quickly.

137846528820 found in 2.21729278564e-05 seconds

The Cython code executes roughly 9 times as fast as the Python.


Problem: The following iterative sequence is defined for the set of positive integers:

\[n\rightarrow\begin{cases}n/2 & n \text{ even}\\ 3n+1 & n \text{ odd}\end{cases}\]

Using the rule above and starting with 13, we generate the following sequence:

\[13\rightarrow 40\rightarrow 20\rightarrow 10\rightarrow 5\rightarrow 16\rightarrow 8\rightarrow 4\rightarrow 2\rightarrow 1\]

It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1. Which starting number, under one million, produces the longest chain? Note: Once the chain starts the terms are allowed to go above one million.

Idea Behind a Solution

I’ll refer to the “Collatz length of $n$” as the length of the chain from an integer $n$ to 1 using the above described sequence. If we were to calculate the Collatz length of each integer separately, that would be incredibly inefficient. In Python, that would look something like this.

First Python Solution

import time
 
start = time.time()
 
def collatz(n, count=1):
    while n > 1:
        count += 1
        if n % 2 == 0:
            n = n/2
        else:
            n = 3*n + 1
    return count
 
max = [0,0]
for i in range(1000000):
    c = collatz(i)
    if c > max[0]:
        max[0] = c
        max[1] = i
 
elapsed = (time.time() - start)
print "found %s at length %s in %s seconds" % (max[1],max[0],elapsed)

Now, this will actually determine the solution, but it is going to take a while, as shown when we run the code.

found 837799 at length 525 in 46.6846499443 seconds.

A Better Python Solution

What I’m going to do is cache the Collatz lengths of integers below one million. The idea is that we can use the cached values to make calculations of new Collatz lengths more efficient. But we don’t want to record every single number in the Collatz sequences we encounter, because some of the sequences reach up into the hundreds of millions. We’ll make a list called TO_ADD, and we’ll only populate that with numbers whose Collatz lengths are unknown. Once known, the Collatz lengths will be stored for repeated use.

import time
 
start = time.time()
 
limit = 1000000
collatz_length = [0] * limit
collatz_length[1] = 1
max_length = [1,1]
 
for i in range(1,1000000):
    n,s = i,0
    TO_ADD = [] # collatz_length not yet known
    while n > limit - 1 or collatz_length[n] < 1:
        TO_ADD.append(n)
        if n % 2 == 0: n = n/2
        else: n = 3*n + 1
        s += 1
    # collatz_length now known from previous calculations
    p = collatz_length[n]
    for j in range(s):
        m = TO_ADD[j]
        if m < limit:
            new_length = collatz_length[n] + s - j
            collatz_length[m] = new_length
            if new_length > max_length[1]: max_length = [i,new_length]
 
elapsed = (time.time() - start)
print "found %s at length %s in %s seconds" % (max_length[0],max_length[1],elapsed)

This should return the same result, but in significantly less time.

found 837799 at length 525 in 5.96128201485 seconds

A First Cython Solution

If we take our original approach of computing each Collatz length from scratch, this might actually work slightly better in Cython.

%cython
 
import time
 
cdef collatz(unsigned int n):
    cdef unsigned count = 1
    while n > 1:
        count += 1
        if n % 2 == 0:
            n = n/2
        else:
            n = 3*n + 1
    return count
 
cdef find_max_collatz(unsigned int min, unsigned int max):
    cdef unsigned int m = 1
    cdef unsigned long num = 1
    cdef unsigned int count = 1
    cdef unsigned long iter = min
    while iter < max:
        count = collatz(iter)
        if count > m:
            m = count
            num = iter
        iter += 1
    return num
 
start = time.time()
max_found = find_max_collatz(1,1000000)   
elapsed = (time.time() - start)
print "found %s in %s seconds" % (max_found,elapsed)

In fact, when executed, we find that it is significantly better than our efficient Python code.

found 837799 in 0.604798078537 seconds

This just goes to show that even a naive algorithm in compiled code can drastically outperform efficient Python. But how far can we take this Cython refinement? What if we were to recode our more efficient algorithm in Cython? It may look something like this.

A Better Cython Solution

%cython
 
import time
from libc.stdlib cimport malloc, free
 
cdef find_max_collatz(unsigned long int max):
    cdef int *collatz_length = <int *>malloc(max * sizeof(int))
    cdef list TO_ADD # holds numbers of unknown collatz length
    cdef unsigned long iter, j, m, n, s, p, ind, new_length, max_length = 0
 
    # set initial collatz lengths
    iter = 0
    while iter < max:
        collatz_length[iter] = 0
        iter += 1
    collatz_length[1] = 1
 
    # iterate to max and find collatz lengths
    iter = 1
    while iter < max:
        n,s = iter,0
        TO_ADD = []
        while n > max - 1 or collatz_length[n] < 1:
            TO_ADD.append(n)
            if n % 2 == 0: n = n/2
            else: n = 3*n + 1
            s += 1
        # collatz length now known from previous calculations
        p = collatz_length[n]
        j = 0
        while j < s:
            m = TO_ADD[j]
            if m < max:
                new_length = collatz_length[n] + s - j
                collatz_length[m] = new_length
                if new_length > max_length:
                    max_length = new_length
                    ind = m
            j += 1
        iter += 1
 
    free(collatz_length)
    return ind
 
start = time.time()
max_collatz = find_max_collatz(1000000)
elapsed = (time.time() - start)
print "found %s in %s seconds" % (max_collatz,elapsed)

This gives us some relatively good results:

found 837799 in 0.46523308754 seconds

Still, it isn’t a great improvement over the naive Cython code. What’s going on? I bet that the TO_ADD data structure could be changed from a Python list (notice the “cdef list” definition) to a malloc’d C array. That will be a bit more work, but my gut instincts tell me that this is probably the bottleneck in our current Cython code. Let’s rewrite it a bit.

%cython
 
import time
from libc.stdlib cimport malloc, free
 
cdef find_max_collatz(unsigned long int max):
    cdef int *collatz_length = <int *>malloc(max * sizeof(int))
    cdef int *TO_ADD = <int *>malloc(600 * sizeof(int))
    cdef unsigned long iter, j, m, n, s, p, ind, new_length, max_length = 0
 
    # set initial collatz lengths and TO_ADD numbers
    iter = 0
    while iter < max:
        collatz_length[iter] = 0
        iter += 1
    collatz_length[1] = 1
    iter = 0
    while iter < 600:
        TO_ADD[iter] = 0
        iter += 1
 
    # iterate to max and find collatz lengths
    iter = 1
    while iter < max:
        n,s = iter,0
        while n > max - 1 or collatz_length[n] < 1:
            TO_ADD[s] = n
            if n % 2 == 0: n = n/2
            else: n = 3*n + 1
            s += 1
        # collatz length now known from previous calculations
        p = collatz_length[n]
        j = 0
        while j < s:
            m = TO_ADD[j]
            TO_ADD[j] = 0
            if m < max:
                new_length = collatz_length[n] + s - j
                collatz_length[m] = new_length
                if new_length > max_length:
                    max_length = new_length
                    ind = m
            j += 1
        iter += 1
 
    free(collatz_length)
    free(TO_ADD)
    return ind
 
start = time.time()
max_collatz = find_max_collatz(1000000)
elapsed = (time.time() - start)
print "found %s in %s seconds" % (max_collatz,elapsed)

Now, when we execute this code we get the following.

found 837799 in 0.0465848445892 seconds

That’s much better. So, by using Cython and writing things a bit more efficiently, the code executes roughly 1,000 times as fast as the original Python solution.

C Solution

If I structure the algorithm in the same way, I don’t expect to gain much by rewriting things in C, but I’ll see what happens.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
 
int find_max_collatz(unsigned long max) {
    unsigned int collatz_length[max];
    unsigned int TO_ADD[600];
    unsigned long iter, j, m, n, s, p, ind, new_length, max_length = 0;
 
    // set initial collatz lengths and TO_ADD numbers
    iter = 0;
    while(iter < max) {
        collatz_length[iter] = 0;
        iter++;
    }
    collatz_length[1] = 1;
    iter = 0;
    while(iter < 600) {
        TO_ADD[iter] = 0;
        iter++;
    }
    // iterate to max and find collatz lengths
    iter = 1;
    while(iter < max) {
        n = iter;
        s = 0;
        while(n > max - 1 || collatz_length[n] < 1) {
            TO_ADD[s] = n;
            if(n % 2 == 0) n = n/2;
            else n = 3*n + 1;
            s++;
        }
        // collatz length now known from previous calculations
        p = collatz_length[n];
        j = 0;
        while(j < s) {
            m = TO_ADD[j];
            TO_ADD[j] = 0;
            if(m < max) {
                new_length = collatz_length[n] + s - j;
                collatz_length[m] = new_length;
                if(new_length > max_length) {
                    max_length = new_length;
                    ind = m;
                }
            }
            j++;
        }
        iter++;
    } 
    return ind;
}
 
 
int main(int argc, char **argv) {
    unsigned int max, i;
    time_t start, end;
    double total_time;
 
    start = time(NULL);
 
    for(i=0;i<1000;i++) max = find_max_collatz(1000000);
 
    end = time(NULL);
    total_time = difftime(end,start);
 
    printf("%d found in %lf seconds.\n",max,total_time);
 
    return 0;
}

We’re using 1,000 iterations to try and get a good idea of how this runs.

$ time ./problem-14
837799 found in 38.000000 seconds.
 
real	0m38.871s
user	0m38.120s

So, in C we’re getting each iteration in roughly 0.038 seconds, which is ever so slightly faster than the Cython code.