OPTION (MAXRECURSION 32767)

Earlier this year I posted a series of posts about prime numbers to demonstrate coding principles and techniques. Recursion is typically a very efficient approach, and in my last post on the subject I stated that we could create a stored procedure to call our stored procedure recursion, which in turn pulls from our function recursion. Of course, as a parent procedure, that procedure would contribute a level toward the maximum procedure nesting of 32. That approach would look something like this:

-- =============================================
-- Author:		Thomas Marshall
-- Create date: 4/23/2015
-- Description:	
-- =============================================
CREATE PROCEDURE iteratingParent3100PrimeTestCalls 
	 @max_iterations INT = 0,
	 @demo_override INT = 2
AS
BEGIN
	SET NOCOUNT ON;
	DECLARE @n INT = 1;
WHILE @n <= @max_iterations
BEGIN
	EXECUTE [dbo].[thirtyTwoHundredPrimalityTests] @demo_override --2
	SET @n += 1; 
END	
END
GO

With the default value for the max iterations parameter, nothing will happen. If we pass the demo override parameter a value of 1 or less, we will get an error because our procedure will nest more than 32 levels. As you may know, iteration is dramatically less efficient than recursion. If you didn’t already guess from the title of this post, we are going to try to get more out of the recursive part. What’s great is just how easy it is.
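As an aside, the 32-level cap is easy to observe with @@NESTLEVEL. This throwaway sketch (the procedure name here is made up) recurses with no base case and fails once the nesting limit is reached:

```sql
-- Prints its own nesting depth, then calls itself with no base case.
-- SQL Server stops it with the nesting-limit error at level 32.
CREATE PROCEDURE dbo.showNestLevel
AS
BEGIN
	SET NOCOUNT ON;
	PRINT CONCAT('Nest level: ', @@NESTLEVEL);
	EXEC dbo.showNestLevel;
END
GO
EXEC dbo.showNestLevel;
```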

First we are going to make a variation on our recursive CTE function. You’ll notice that the only difference from the hundredMorePrimeTests function, besides the name, is that we restrict our CTE to 2 to the power of 15 rows. It’s important to note that the query inside the function will throw an error without OPTION (MAXRECURSION 32767), yet the CREATE statement will error if we include it, because the OPTION clause is not allowed inside a function. It’s tangential to note that (2^15)-1 is the largest signed 16-bit integer and the zip code for Paisley, Florida.

CREATE FUNCTION [dbo].[sixteenBitsMorePrimeTests](@n INT)
RETURNS TABLE
AS RETURN
(
	WITH recurseMore AS (
		SELECT @n AS p, 1 AS n
		UNION ALL
		SELECT p + 1, n + 1 FROM recurseMore
		WHERE n < 32768
	)
	SELECT p, n FROM recurseMore
	WHERE dbo.intIsPrime(p) = 1
)
GO

Fortunately for us, we can specify OPTION (MAXRECURSION 32767) on the query that calls the function inside the procedure, so we can avoid the recursion-limit error.

-- =============================================
-- Author:		Thomas W Marshall
-- Create date: 
-- Description:	
-- =============================================
CREATE PROCEDURE [dbo].[thirtyTwoTimeTwoToThePowerOfSixteenPrimalityTests] 
	@c INT = 1
AS
BEGIN
	SET NOCOUNT ON;
	DECLARE @n INT;
	SELECT @n = MAX(p) FROM dbo.primes;
	INSERT INTO dbo.primes
		SELECT p FROM dbo.[sixteenBitsMorePrimeTests](@n) OPTION (MAXRECURSION 32767);
	SET @n = @c + 1;
	IF @n < 32
		EXEC [dbo].[thirtyTwoTimeTwoToThePowerOfSixteenPrimalityTests] @n;
END
GO

Now if we like we can have our new procedure called iteratively to keep inserting primes into our table. The hardest part should be naming the parent procedure.
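If naming really is the hardest part, here is one sketch of such a parent, modeled on the iterating procedure above (the name is just a suggestion):

```sql
CREATE PROCEDURE dbo.iteratingParentSixteenBitPrimeTestCalls
	@max_iterations INT = 0,
	@demo_override INT = 2
AS
BEGIN
	SET NOCOUNT ON;
	DECLARE @n INT = 1;
	WHILE @n <= @max_iterations
	BEGIN
		-- Each call recurses internally, so keep @demo_override at 2 or
		-- higher to stay under the 32-level nesting limit.
		EXECUTE [dbo].[thirtyTwoTimeTwoToThePowerOfSixteenPrimalityTests] @demo_override;
		SET @n += 1;
	END
END
GO
```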

FOR XML and STUFF you might not know

If you are like me, it’s not enough for your dynamic SQL to just be executed at design time; you have to make time to design the code you are dynamically building. What am I talking about? The first part is about not executing dynamic SQL as part of a data-tier application wherever possible. That is to say, dynamic SQL is not going to be able to reuse cached plans the way static and parameterized SQL can. The second part is about including formatting in your dynamic SQL, at least enough to make it readable.

Using dynamic SQL to write your code can be a huge time saver when you have a lot of code to write. It can also be a huge time suck when you have to debug it, optimize it, or just get it all into one (N)VARCHAR string.

Dynamic SQL can make debugging challenging for a number of reasons. For starters, Management Studio will be unable to detect a syntax error until it’s executed. Even after it executes, you won’t be able to double-click on the error message and jump to the offending line in the query editor. You can, however, use GO to separate your code into batches; double-clicking the error will then jump to the beginning of the batch that raised it. It is also difficult to divine from the error message (ERROR_LINE()) which line is actually causing the error.
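A contrived sketch of those first two problems (this is not the code in the screenshot below):

```sql
-- The typo inside the string is not caught while editing; SSMS only
-- parses the dynamic batch when EXECUTE runs it, and ERROR_LINE()
-- then refers to a line inside the string, not inside this batch.
DECLARE @sql NVARCHAR(MAX) = N'SELECT name FORM sys.tables;'; -- FORM is the typo
EXECUTE (@sql); -- fails here, at execution time, with a syntax error
```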

[Screenshot: ThisIsNotStuffOrXML]

Let’s not get sidetracked. If you want to noodle on those debugging challenges, consider this screenshot from SQL Server Management Studio 2014. If I were going to go off on a tangent, it might be about the fact that the GO in the dynamic string does not behave the way it does outside of the string; that is to say, the entire string fails in spite of the GO, where only the batch containing the error fails in the static SQL. And don’t get me started on how the execution plan calls a script a batch when determining cost relative to batch. Also worth noting: the breakpoint encompasses both statements and the assignment, where with static SQL a single statement is broken on.

When it comes to optimization of dynamic SQL, I am going to simply link some reading material and comment. Jeremiah Peschka at Brent Ozar Unlimited has a great post about filtered indexes that kind of contradicts what I am saying about optimizing dynamic SQL. I say kind of because Jeremiah is using dynamic SQL to take advantage of a filtered index he created when a parameter might otherwise cause it to be skipped over. Andrew Kelly at SQL Server Magazine mentions in this article that compiling is CPU intensive; he also goes on to show how sp_executesql allows us to cache plans. Derik Hammer explains and demonstrates the need to match the SQL text exactly in order to get plan reuse. Turgay Sahtiyan echoes Derik, plus he is a Senior Premier Field Engineer (I was almost a PFE once). And last but certainly not least, Kimberly Tripp gave us some code to play with, and a vigorous discussion, with her post on multipurpose procedures.

Of course, for all that good stuff we can use to avoid dynamic SQL that is bad at execute time, our monitoring tools are probably only going to show us the generated SQL. As I tend to fall into the DBA role, I make it a point to always attempt to optimize without changing the code that executes… at least when I don’t own it. But that is a topic for a whole other discussion, perhaps on whether you can separate workloads into separate instances for ad hoc and prepared workloads.

The last obstacle is a matter of practical consideration. When we have multiple statements, we might build and execute them separately. When we have nested strings building dynamic SQL for us, we might find it easier to have parts of our code in separate columns that copy as a row and paste as a single script. But what about when we are building dynamic strings that roll up some one-to-many values, like columns and tables? Are we going to have to paste one long line and then format it? That would be another time suck to avoid. Enough teasing; the title of the post said we were going to show an example of FOR XML and STUFF.
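Something along these lines is behind that first screenshot; a sketch rolling up column names (the dbo.primes table is just a stand-in target):

```sql
-- Concatenate a table's column names into one comma-separated string.
-- STUFF(..., 1, 2, '') trims the leading ', ' from the FOR XML output.
DECLARE @cols NVARCHAR(MAX) =
	STUFF((SELECT ', ' + c.name
	       FROM sys.columns c
	       WHERE c.object_id = OBJECT_ID('dbo.primes')
	       ORDER BY c.column_id
	       FOR XML PATH('')), 1, 2, '');
SELECT @cols; -- one long single line
```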
[Screenshot: singleline]

Well, that example makes a single line, and I don’t feel like formatting it myself. Sure, I could use a tool to auto-format, but I don’t have one installed, and I really want to automate the generation of these scripts and output them directly to script files. What if we try to stick in a carriage return and a line feed?

[Screenshot: multilineAscii]

What the heck is this weird emoticon? Is it a winking fish or something? No, it’s proof enough that the output of the FOR XML PATH statement is typed as XML. That is to say, it’s an XML escape sequence; specifically, it’s the entity for a carriage return. What happened to the line feed? It isn’t paired with the carriage return in the UTF world, so SQL Server was good enough to drop it. You can try using NCHAR instead of CHAR, but the results are the same. What if we put the entity in the string ourselves?

[Screenshot: nopenotit]

Darn, now the ampersand is being escaped. Ok, we’ll just have to replace it.
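Putting the pieces together, a sketch of the fix shown in the final screenshot: build the separator with CHAR(13) + CHAR(10), then REPLACE the entity FOR XML produced with a real line break (again, the table is a stand-in):

```sql
DECLARE @script NVARCHAR(MAX);
SET @script = (SELECT ',' + CHAR(13) + CHAR(10) + CHAR(9) + c.name
               FROM sys.columns c
               WHERE c.object_id = OBJECT_ID('dbo.primes')
               ORDER BY c.column_id
               FOR XML PATH(''));
-- FOR XML escaped the carriage return as &#x0D; and dropped the line
-- feed, so restore a real CR + LF pair before trimming the separator.
SET @script = REPLACE(@script, '&#x0D;', CHAR(13) + CHAR(10));
SELECT STUFF(@script, 1, 3, ''); -- strip the leading ',' + CR + LF
```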

[Screenshot: Ok]

There you have it. FOR XML and STUFF you might not know.

#cherylsbirthday

This past week the hashtags were all chirping about how challenging this logic puzzle is. It seems that Singaporean middle school students are posed with a question that challenges their deductive reasoning as well as their ability to remain objective even when the posing of the question invites them to assume a subjective position within the problem. Technically, it is a contest question from the Singapore and Asian Schools Math Olympiad.

You the solver are shown that one of two characters in the problem possesses enough information that they can deduce that the other does not in fact possess enough information to deduce the solution. This fact when announced provides the second character with enough information that he claims to now be able to deduce the answer. That proclamation of deducibility incites the first character to then declare that he too has deduced the answer.

The problem goes something like this:

Cheryl gives Albert and Bernard a list of dates that could be her birthday. She tells Albert the month and Bernard the day. Albert declares that he does not know the date, but that he does know that Bernard does not either. Bernard declares that he didn’t know the date but now does. Albert then declares that he too knows. So what is Cheryl’s birthday? Since SQL is my hammer, let’s see if this is a nail.

For starters, we have a short list of dates so let’s put them into a table variable.

Next, we can select those dates not in months containing a day that is unique to one month. We do this because if the date fell on a day unique to a month, it would be possible for Bernard to determine the date, which Albert has declared is impossible (we are assuming these characters are not wrong in their deductions).

After that, we select those dates not sharing a day. This is because if the date shared its day with another remaining date, it would be impossible to deduce the answer knowing only the day and the fact that it is not in a month with a unique day. It was this second piece of information that enabled Bernard to deduce the date, already knowing the first.

What remains are three dates, only one of which is unique to its month. That uniqueness is what Albert needs to deduce the date knowing only the month plus the fact that the solution could be deduced from the day once the unique-day months were eliminated. Gosh, Albert and Bernard are smart.

DECLARE @candidates TABLE (rowid INT IDENTITY(1,1), candiDATE DATE);

INSERT INTO @candidates VALUES
('May 15 2015'),('May 16 2015'),('May 19 2015'),
('June 17 2015'),('June 18 2015'),
('July 14 2015'),('July 16 2015'),
('August 14 2015'),('August 15 2015'),('August 17 2015');

WITH COUNTS AS (
	SELECT candiDATE,
		DATEPART(DAY, candiDATE) [Day],
		DATEPART(MONTH, candiDATE) [Month],
		COUNT(DATEPART(DAY, candiDATE))
			OVER (PARTITION BY DATEPART(DAY, candiDATE)) [IfItWereAMonthWithAUniqueDayBernardCouldKnow]
	FROM @candidates
), NEXTCOUNT AS (
	SELECT candiDATE,
		DATEPART(DAY, candiDATE) [Day],
		DATEPART(MONTH, candiDATE) [Month],
		COUNT(DATEPART(DAY, candiDATE))
			OVER (PARTITION BY DATEPART(DAY, candiDATE)) [IfItWasNotAUniqueDayRemainingBernardWouldNotKnow]
	FROM @candidates
	WHERE DATEPART(MONTH, candiDATE) NOT IN
		(SELECT [Month] FROM COUNTS WHERE [IfItWereAMonthWithAUniqueDayBernardCouldKnow] = 1)
)
SELECT candiDATE,
	COUNT(DATEPART(DAY, candiDATE))
		OVER (PARTITION BY DATEPART(MONTH, candiDATE)) [IfBernardDidNotKnowItWouldNotBeUniqueToTheMonth]
FROM @candidates
WHERE candiDATE IN
	(SELECT candiDATE FROM NEXTCOUNT n WHERE n.[IfItWasNotAUniqueDayRemainingBernardWouldNotKnow] = 1);

Which came first? The Select or the Into?

This morning, while discussing things vaguely reminiscent of work with a colleague of mine, the topic of Russell’s paradox was broached. Russell’s paradox relies on the concept of the set of all sets that do not contain themselves as members. It is a paradox because if that set does not contain itself, it satisfies the requirement of membership and must therefore contain itself, which would be a violation of the rules required for membership. Long story short, humanity innovated, and a guy named Ernst Zermelo contributed some rules that don’t allow such paradoxes. Zermelo actually discovered the paradox first but didn’t publish. That was confirmed by a guy named David Hilbert, who also drew a really great looking space-filling curve with a few simple steps repeated infinitely. The talk of Russell’s paradox prompted me to write the following SQL:

SELECT * INTO #temp FROM tempdb.sys.tables;

SELECT * FROM #temp;

I then posed the question: Will #temp appear as a member of the rows returned from #temp?

[Screenshot: Capture]

You can see an instance of #temp does in fact appear. The suffix appended to it makes the name unique; this is what allows multiple temp tables of the same name to be created. We can see this by running the above query in another tab (which opens another session).
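You can inspect the uniquifying suffix for yourself by querying tempdb directly (run this after creating #temp):

```sql
-- Each session's #temp shows up in tempdb with a padded suffix,
-- one row per session that currently has a #temp table.
SELECT name, create_date
FROM tempdb.sys.tables
WHERE name LIKE '#temp%';
```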

[Screenshot: Capture2]

The reason that we don’t see the table excluded from the result set is that in order for the temp table to hold the data from the sys.tables catalog view, it must exist. And if it exists, it will appear in the catalog. This helps to illustrate that SQL is not (always) a procedural language, or that INTO does not logically mean CREATE and then INSERT.

SQL is usually typified as a declarative programming language. Being a declarative programming language means that we are not telling the database how to do what we want, only what we want. In this case, we told it we want a table that matches the definition of sys.tables, and we want it filled with the contents of sys.tables. Contrast this with having asked SQL to take the contents of sys.tables and put it into a table that matches the definition of sys.tables. In the first case we have to first make the table, which puts it into sys.tables; in the second we take the contents of sys.tables and then put it into our newly created table.

This might become clearer if we examine a query plan.

[Screenshot: Capture3]

We can clearly see from the plan that sys.tables is a view that draws from the inaccessible mssqlsystemresource system database. It might look like all of those branches contribute to building the table to insert into, but that is just where the data comes from. A simpler SELECT INTO plan can be seen if we select from a table-valued function (TVF).

[Screenshot: Capture4]

Here we can see that the Table Insert showplan operator occurs before the Select Into operator. That doesn’t actually create the table, though. The table is created by a Data Definition Language (DDL) operation, and DDL operations don’t produce showplan operators. So there you have it: moderately conclusive proof that the table exists before the read of the data to insert occurs, and some vague syntactic hints at why.


Did you check the registry settings?

Earlier tonight I was faced with a corrupted installation of SQL Server, perhaps a consequence of my having installed a default instance of SQL Server 2014 after uninstalling an instance of SQL Server 2005. The Engine and SQL Agent would not start. To correct this, I had to set the engine’s service account in SQL Server Configuration Manager so the engine would start. I then had to add the SQL Agent service account as a login and grant it the required permissions.

Self Driving Manual

Kevin was your normal, everyday car buyer. He wasn’t an early adopter or gadget obsessed. He worked with technology but not in it. As he sat there in the driver’s seat of the ALR-300, listening to beep after beep and staring at the flashing light, he couldn’t help but wonder if this was some sort of malicious prank. At least he wasn’t late for an important meeting.

He had bought the ALR-300 so he could just sit and ride. Self-driving cars were everywhere. There was another one zipping along, not stopped on the shoulder beeping. He wished he had signed up for the remote care and technology package so he could ask the car why it was beeping. Staring at the dashboard and that blinking light, thinking “if only it would…”, he decided to call for help. After a few minutes of checking his email and posting to Facebook about how horribly his day was starting out, he found a tow truck service and called.

The longest 43 minutes dragged by, and at least a dozen ALR-300s zipped past. How many 400s and 500s, he’d lost count. He was almost regretting buying a self-driving car. At least with a manual he could drive it himself. Ah, but how many hours would he have spent staring at brake lights instead of his tablet?

The tow truck drove him and his vehicle to the nearest dealership. He explained the problem to the receptionist. She made a face and walked off to get the service manager. He could see them both talking about him. The service manager looked concerned. He picked up the phone.

After a brief conversation he put the phone down and came out of his office. “Kevin, thank you for coming here today. I had to double check with the manufacturer but I am confident that you can drive home. You see the beeping red light that says drive indicates that you need to drive.”

My Kickstarter is Launched

I want to share with you my Kickstarter. The Crash Surveillance Drone is a UAV deployed from a vehicle by an air bag. It can alert the authorities, get real-time imagery, and signal other drivers of the accident. From multi-car pileups to people struck exiting their vehicles after an accident, I truly believe that my Kickstarter will lead to many saved lives. Please, share this with everyone you know.

 

https://www.kickstarter.com/projects/358359447/crash-surveillance-drone-csd

 

And Recursion(…)

In my last post I discussed the topic of modularity of code in the context of generating prime numbers. In this post I am going to discuss recursion in SQL Server, in the context of modular code, and subsequently in the context of generating prime numbers.

In the prior post we showed how modularizing would allow us more opportunity to improve and correct past work without redoing everything from scratch. We created a function for testing primality called intIsPrime. We call it intIsPrime because if we really want a lot of primes (as many as we can get from SQL Server without inventing a new data type, doing some column and constraint tricks, or both) we would need a bigintIsPrime function for all of those primes bigger than the integer data type, which tops out at 2,147,483,647, itself a prime. 2,147,483,647 is in fact the 3rd double Mersenne prime.

We then went on to query for the primes in the next 100 or so numbers using a technique called a recursive common table expression (CTE). In the spirit of modularity and reaching beyond a mere hundred tests through modular design consider this function:

CREATE FUNCTION dbo.hundredMorePrimeTests(@n INT)
RETURNS TABLE
AS RETURN
(
	WITH recurseMore AS (
		SELECT @n AS p, 1 AS n
		UNION ALL
		SELECT p + 1, n + 1 FROM recurseMore
		WHERE n < 101
	)
	SELECT p, n FROM recurseMore
	WHERE dbo.intIsPrime(p) = 1
)

By putting our recursive CTE into a function, we can pass it a parameter and test the next hundred integers for primality. This allows us to call it from a stored procedure that bases the call on the last prime in our primes table. We can also have that stored procedure call itself (another type of recursion), which enables us to run the stored procedure once and have it execute 32 times (an error occurs if a stored procedure nests more than 32 levels).

CREATE PROCEDURE thirtyTwoHundredPrimalityTests 
	@c INT = 1
AS
BEGIN
	SET NOCOUNT ON;
	DECLARE @n INT;
	SELECT @n = MAX(p) FROM dbo.primes;
	INSERT INTO dbo.primes
		SELECT p FROM dbo.hundredMorePrimeTests(@n);
	SET @n = @c + 1;
	IF @n < 32
		EXEC thirtyTwoHundredPrimalityTests @n;
END

There you have it: modular and recursive. If we wanted to, we could call thirtyTwoHundredPrimalityTests from another stored procedure that could serve to limit the run based on how long we want to run or how many integers we want to test. We can pass @c as any integer below 32 to control how many batches run; 32 minus @c batches of roughly a hundred integers get tested.

Generating More Prime Numbers

In a previous post I wrote about this new (I think) set of prime numbers and how I generated them with T-SQL. Now, I want to discuss a more philosophical and practical issue. No, I am not going to split hairs about how the prime numbers were there already, like all the integers, and we are just testing them for primality. I also don’t want to discuss why 1 is or is not a prime number (I am not a mathematician). I am actually looking to discuss modularity and code reuse.

Take a look at this function:

DROP FUNCTION intIsPrime;
GO
CREATE FUNCTION intIsPrime ( @n INT )
RETURNS BIT
AS
BEGIN
	DECLARE @return BIT = 1;
	IF @n IN (1, 2, 3) GOTO finished;
	IF @n % 2 = 0
	OR @n % 3 = 0
	OR FLOOR(SQRT(@n)) = SQRT(@n)
	--I can't be the first person to note that an integer squareroot means non-prime.
	BEGIN
		SET @return = 0;
		GOTO finished;
	END

	DECLARE @k INT = 5; --By starting at five we are starting at 6k-1
	WHILE @k <= FLOOR(SQRT(@n))
	BEGIN
		IF @n % @k = 0
		OR @n % (@k + 2) = 0 --which makes this 6k+1
		BEGIN
			SET @return = 0;
			GOTO finished;
		END
		SET @k = @k + 6;
	END

finished:
	RETURN @return;
END

By encapsulating the prime test in a function, we separate the operation of testing primality from that of generating integers, as well as from inserting into the table. Why is this advantageous? Consider a scenario where we have a table that we filled with prime numbers using an algorithm that we believed was the most efficient method for testing primality. Later we realized that this super-efficient algorithm used by “famous” mathematicians actually only determined that numbers were likely prime. Uh oh. Chances are that we were using some sort of special-built prime-detecting supercomputer and didn’t generate nearly enough numbers to have hit a non-prime that was merely likely prime, but that is still a question of chance rather than certainty.
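With the test isolated in a function, auditing the table becomes a single set-based statement. A sketch, assuming a dbo.primes table with an integer column p:

```sql
-- Flag any rows the improved test rejects...
SELECT p FROM dbo.primes WHERE dbo.intIsPrime(p) = 0;
-- ...and remove just those, instead of regenerating the whole table.
DELETE FROM dbo.primes WHERE dbo.intIsPrime(p) = 0;
```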

So now, if we were to modify the script from our previous post, we would have to get rid of all of the numbers and generate new ones. Because we implemented the algorithm for testing primality as a function (a better algorithm than before, which serves to highlight the advantages of modularity), we can instead use the improved algorithm to test the numbers already in the table. We can also use the more efficient approach of selecting the numbers where the prime test fails rather than iterating over all of them. If we want to put in new primes, we can select from a recursive CTE those integers that pass the primality test rather than incrementing through each number.

WITH recurseMore AS (
	SELECT MAX(p) AS p, 1 AS n FROM primes
	UNION ALL
	SELECT p + 1, n + 1 FROM recurseMore
	WHERE n < 101
)
SELECT p FROM recurseMore
WHERE dbo.intIsPrime(p) = 1

Using the recursive CTE (which is more efficient, but that’s a topic for a different post) further serves to highlight the value of modularity. Note that we are restricting the recursive portion of the CTE to n < 101. This is because a CTE can, by default, only recurse 100 levels (+1 for the anchor row). This limits us to selecting primes that are within 100 of our max prime. We can work around this by encapsulating the CTE in a stored procedure or table-valued function that accepts the max prime as a parameter. Technically, it would accept an integer and generate a list of primes less than or equal to that integer + 100. By calling this in a wrapper, we can test the return of that function, inserting new primes and increasing the input parameter. If we get no primes for a group of 100 (again, I am not a mathematician, so don’t make fun if it’s impossible to go a hundred integers without a prime), we can just pass the next number above it and carry on.