Sieve of Eratosthenes

Thursday, June 08, 2006

Sieve of Eratosthenes

At work we had a little challenge in December to see who could come up with a program to count the number of primes between any two numbers (0 to 2^32-1 inclusive) as fast as possible. To this end I wrote this optimized Sieve of Eratosthenes (algorithm) that counts all the primes from 0 to 4,294,967,295 in about 13.75 seconds on my (admittedly fast) development machine at work.

If you come up with any improvements to it, let me know!

(2/2/2007 - I'm going to go ahead and add the link for the multicore Sieve of Eratosthenes here. On a 3GHz dual core Xeon, 7.15s!)

8 comments:

Anonymous said...: Hi Paul, cool blog!

Just thought I'd add that there are several known prime counting functions, Pi(x), that return the number of primes less (and/or equal) to x. One such function is Lehmer's Formula:

http://mathworld.wolfram.com/LehmersFormula.html

These functions are pretty straightforward to implement and would be much faster than a sieve. Given an implementation of such a function, the asked for value is directly given by Pi(upper limit) - Pi(lower limit).; 2:19 AM
Paul Senzee said...: Thank you for this information, that's interesting! One question - do you know if that's exact or approximate? I know a variety of prime count estimation functions exist, but I did not believe an exact function existed.

One of my colleagues had an approximation approach that relied on long splines which was interested.; 11:50 AM
Anonymous said...: A little late but, the above formula gives an upper boundary on the number of primes less than x, so it's not an exact number of primes.

Alexander; 10:51 AM
Anonymous said...: primegen uses the Sieve of Aitken, the following is output on a P4 3.2Hgz:

203280221 primes up to 4294967295.
Timings are in ticks. Nanoseconds per tick: approximately 0.313275.
Overall seconds: approximately 7.785068.

I'm somewhat surprised that Sieve of Eratosthenes is that fast, but it won't compile under gcc, I think because of the inline assembly. After uncommenting the C before the inline asm and fixing some scope errors, I got it to compile cleanly, but it produces incorrect results (always returns 1), so I guess something is messed up. Anyway, if your work computer is comparable to mine, it looks like the Sieve of Aitken is about 2x faster, without any inline assembly.

http://cr.yp.to/primegen.html; 2:16 PM
Paul Senzee said...: I checked out the Sieve of Atkin on Wikipedia and I can't say that I really understood it. Anyhow I have a multicore version of my SofE that runs in 7.15s on my dual core work machine - kind of cheating. ;)

This Sieve of Eratosthenes implementation can be that fast in this case because:

A.) It's limited to 32 bits.
B.) It uses every trick in the book to make it run like hell.

It really is just an example of a very simple brute force algorithm being optimized to the point of absurdity.; 7:25 PM
Anonymous said...: This is a very late response, but yes, the prime counting functions are exact (despite anonymous statement to the contrary). See e.g.:

http://mathworld.wolfram.com/PrimeCountingFunction.html

These funcions have been discussed ad nauseum in sci.math in the various "JSH" threads over the last N years.

Here's one of the posts which contain a lot of useful information (if you're into this sort of thing):

http://groups.google.com/group/sci.physics/msg/fc3424e2de072253?dmode=source&hl=en&output=gplain

BTW, it would be helpful if the comments were date stamped, and listed somewhere on the front page (so you knew there was a reply). I just happened to slip on a banana peel again, ending up here for the second time! ;); 3:44 AM
Anonymous said...: Recompiled the multi-core Sleeve of Eratosthenes, with a tiny improvement and came in at 1.640 seconds on quadcore 9650 at 3.67ghz running in 64-bit mode using intel compiler. Don't have any of your blogger id's but call me Thomas; 8:07 PM
Unknown said...: I realize this thread is rather old, but I stumbled on it and while in a curious mood, modified senzee.c to compile on gcc. It wasn't hard to port the ASM code to gcc's syntax, but it wasn't necessary since gcc -O3 gives quite a bit faster code with the for loop than with using the hand-rolled ASM. It's really hard to beat optimizing compilers these days. I also made it a bit faster by using a lookup table of byte masks rather than computing 1 << (i & 7). This is a trick I learned somewhere when I wrote my own sieve. I have no problem sharing the modified code, but I don't know how to attach something to a post...

Also, while your sieve is nicely optimized and fast, there is one significant optimization lacking: small prime pre-sieving using a wheel. I use this in my sieve for a speedup of over 4x. The code isn't supremely documented, but is available as part of the YAFU integer factorization package at: http://sites.google.com/site/bbuhrow/; 12:25 PM

Senzee 5 - Paul Senzee

Thursday, June 08, 2006

Sieve of Eratosthenes

8 comments:

Links