In an earlier post we started generating prime candidates testing only numbers of the form 6k+1 or 6k-1 less than or equal to the square root of the next integer. The reason we were doing this is that all prime numbers greater than 3 are of the form 6k+1 or 6k+2 and we only have to modulo test primes. The reason we only have to test primes is that the Fundamental Theorem of Arithmetic indicates that all non-prime numbers greater than one is the product of primes.
What that means is that we only have to modulo test prime numbers. Of course in pursuing that particular optimization we actually neglected to test first whether the next integer we were testing was of the form 6k+1 or 6k-1. The point of this post is to highlight how in the course of developing SQL (or any other code for that matter), we can read our requirements in a number of ways and will need to step out of it and reconsider whether our first reading was complete or even correct.
Consider the following statements: “We only need to modulo test primes for a given integer.” and “We need to test whether or not their exists a prime factor for a given integer.” These two statement mean essentially the same thing. When combined with the form of primes being 6k+1 or 6k-1 led us to skip testing the integer itself for being of that form, but assume we have already filled a prime table to a certain number and are testing the prime candidates. Those two statements can lead us to two very different approaches to SQL.
The first statement: “We only need to modulo test primes for a given integer.” could lead us to write something like:
DECLARE @n INT
SELECT TOP 1 1 FROM prime_candidates pc INNER JOIN primes p ON pc.p % p.p = 0 WHERE pc.p=@n
Whereas the second statement: “We need to test whether or not their exists a prime factor for a given integer.” might lead us to write something like:
SELECT TOP 1 1 FROM primes p WHERE @n % p = 0
Since our requirement was to test for prime factors of a given integer and not to produce a set of prime factors performing we do not need the non-equijoin that produces a less efficient execution plan.
Seeks that lead to lookups are usually more efficient than scans. Of course unnecessary operations are inherently inefficient. In either case it is important to test that we do in fact get the results we expect.