Let’s get right to the 3rd problem on ProjectEuler.net.

Question 3:

The prime factors of 13195 are 5, 7, 13 and 29.

What is the largest prime factor of the number 600851475143 ?

That’s a good question, and one that had me scratching my head for a second or two. I will be the first to admit that my knowledge of math revolving around primes is lacking. As such, my very first Prime algorithm was very bad. I attempted to find all of the primes between 1 and limit/2 (in this case: 300425737571). I then attempted to perform a modulus on each prime (iteratively) to find if it was a factor. Once the loop completed, I would know my largest prime factor. Sound in theory, terrible in practice. 20 minutes into the program running, I was still calculating all of the prime numbers (granted, that calculation could have used some optimization as well). 10 minutes after that, I was finally testing my numbers against my limit, but at that point it was a forgone conclusion that this was not going to work. I am not even going to list that code here as it would just be a waste of time. This leads me to my revised code.

Psuedo-Prime: After doing some research and trying a few things out, I came up with the following:

double Run()
{
    limit = 600851475143;
    double factor = 2;

    while(factor < limit)
    {
        if(fmod(limit, factor) == 0)
            limit /= factor;
        else
            factor += 1;
    }

    return limit;
}

Results:

Limit             Speed
600851475143      216 microseconds
9512811111        137 milliseconds
600851475111      95 minutes

I realize now how simple this really is. Assuming you start at 2 and divide it out as many times as possible (very important) before moving on, by the time your factor is bigger than your limit, it will be prime (you have removed all of the smaller non-prime factors already). This algorithm finds the solution to the given question in 216 microseconds. It wasn’t until I tried to gauge the efficiency for a larger number that I realized just how tricky that was going to be. You see, a bigger number does not always mean a more difficult number. For example, given this algorithm, the number 600851475143 is found quickly, but the number 9512811111 (a number ~63 times smaller than the first) takes 137 milliseconds. This is because the original number has a relatively small (4 digits) prime factor, however the prime factor of the second number is 4510579. Obviously it takes many more iterations to solve the second number than the first. Even a number like 1*10^300 would finish in 0 milliseconds because the largest prime factor is 5 (that goes for any number in the 10,100,1000,10000…. series). Imagine if I tried the same algorithm on the number 600851475111. It is still a smaller number than our first, but its largest prime factor is 200283825037. Completing that number took 95 minutes!

Optimization 1: When thinking about optimizing this problem, the first thing I considered was recursion. If I could quickly get all of the primes between 1 and limit/2, I could start at the end and work my way down. In fact, if I did that, I could solve the question for my third number (600851475111) in no time. Would this be an improvement in efficiency? The answer is yes and no. Flipping the order doesn’t do much to improve the algorithm, it just flips the order. With recursion, I would find answers to problems that had huge prime factors very quickly, but finding the prime factor of 1*10^300 would take a vast amount of time since the largest prime factor is in fact very small. This excludes the time “tax” that would be involved in getting the primes to begin with (some languages have fast built in methods, some don’t).

With this in mind, I decide to continue with iteration and try to find a way to cut the number of loops to improve performance. The first thing I notice is that we are starting with the prime 2 and incrementing by 1 from there. Since 2 is the only even prime, we can do 2 first and then start with 3 and increment by 2 thereafter. This should theoretically cut the number of iterations by half (a good number in my book). The code:

double Run()
{
    limit = 600851475143;
    double factor = 2;

    while(factor == 2)
    {
        if(fmod(limit, factor) == 0)
            limit /= factor;
        else
            factor += 1;
    }
    while(factor < limit)
    {
        if(fmod(limit, factor) == 0)
            limit /= factor;
        else
            factor += 2;
    }

    return limit;
}

Results:

Limit             Speed
600851475143      107 microseconds
9512811111        69 milliseconds
600851475111      48 minutes

While it is more code, it can be seen to have increased by speed by a ratio of about 1:2 (which is exactly what we guessed).

Optimization 2: The Project Euler website has an optimization mentioned that I touched upon at the beginning of this process. Originally, I was finding all of the prime numbers between 1 and limit / 2 (where limit is the given number). This was in the right track, but not exactly correct. The actual rule is that no number ‘n’ can have more than 1 prime factor higher than Sqrt(n). This in essence reduces the number of iterations required by a square. I believe my code also has some minor improvements over the code proposed by Project Euler in that I do not store the “last factor” and instead directly modify the variable ‘limit’ which will eventually contain the answer. The optimization is minimal, but it saves a couple extra comparisons and variable swaps. Here is my code:

double Run()
{
    limit = 600851475143;
    double factor = 2;
    double maxFactor = sqrt(limit);

    while(factor == 2)
    {
        if(fmod(limit, factor) == 0)
            limit /= factor;
        else
            factor += 1;
    }
    while(factor < limit && factor < maxFactor)
    {
        if(fmod(limit, factor) == 0){
            limit /= factor;
            maxFactor = sqrt(limit);
        }
        else
            factor += 2;
    }

    return limit;
}

Results:

Limit             Speed
600851475143      25 microseconds
9512811111        36 microseconds
600851475111      9 milliseconds

Obviously this method works very well. We shaved a ton of time off of our calculations. Comparing Optimization 2 against our Prime algorithm yields the following speed ratios (in order): 1:9, 1:3805, 1:633333

I think we can call it a day.

Optimization 3: So, it turns out we couldn’t call it a day. For whatever reason, I had just assumed that doubles would be the best variable to use since we were dealing with very large numbers. I neglected the 64-bit integer variable as a possible solution. Even though _int64 needs to be converted to double to use the sqrt function, it allows us to use the much faster % operator (as opposed to fmod). The results speak for themselves. The code:

double Run()
{
	_int64 limit = 600851475111;
	_int64 factor = 2;
	_int64 maxFactor = sqrt((double)limit);

	while(factor == 2)
	{
		if(limit % factor == 0)
			limit /= factor;
		else
			factor += 1;
	}
	while(factor < limit && factor < maxFactor)
	{
		if(limit % factor == 0){
			limit /= factor;
			maxFactor = sqrt((double)limit);
		}
		else
			factor += 2;
	}

	return limit;
}

Results:

Limit             Speed
600851475143      9 microseconds
9512811111        13 microseconds
600851475111      2 milliseconds

As you can see, we have speed things up significantly. Now, dare I say, we can call it a day (again)?