That’s not a recursion. This is a recursion.

Boxing kangaroo
I may only be an expert of sorts in computer information systems, as opposed to a computer scientist, and one specializing in algorithms at that, but I’d like to think I know the difference between iterating on a row set that diminishes with each iteration and a set-constrained self-calling operation. I am also pretty sure that iterating on a row set that diminishes with each iteration is the parent form of a for-each-style iteration. I do know that SQL Server 2012 introduces a deployment model that makes the business of building recursive SSIS packages wholly more tenable.
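The distinction can be sketched in T-SQL. This is a hedged illustration only; the table variable and the countdown values are invented for the example:

```sql
-- Hypothetical work queue; the names are illustrative.
DECLARE @Queue TABLE (PackageName NVARCHAR(128));

-- Iteration: a row set that diminishes with each pass.
WHILE EXISTS (SELECT 1 FROM @Queue)
BEGIN
    DELETE TOP (1) FROM @Queue;  -- process and remove one member per pass
END;

-- Recursion: a recursive CTE calls itself, constrained by its anchor
-- and its termination condition.
WITH Countdown (n) AS
(
    SELECT 10            -- anchor member
    UNION ALL
    SELECT n - 1         -- recursive member, referencing itself
    FROM Countdown
    WHERE n > 1          -- termination condition
)
SELECT n FROM Countdown;
```

The first shape is what most "recursive" SSIS packages actually do; the second is what the engine itself calls recursion.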

The project-based deployment model allows us to reference our packages in development. This is a welcome change, as it lets us take a more modular approach to package development. One of the ways this is manifested is in the Execute Package Task. Instead of creating a connection manager of either a SQL Server or File System type, you can use an internal reference. In SQL Server 2012 an Execute Package Task can call a package that is either internally referenced in the same project as the parent package, or externally referenced. An externally referenced package is essentially the only reference type in SQL Server 2008 R2. It requires a connection manager, which will need to change if we are developing anywhere other than our production environment; and if you are developing in production, you really shouldn’t be.

Beyond letting us call packages stored in the project we are working in, the internally referenced package call allows us to do something that is simply amazing: we can call the package we are building. That means a child package can modify the parent package’s variable values, as they are the same variables. In the example I have created a package that reads a list of packages and executes them asynchronously. I included annotations to help keep this relatively simple package easy to follow.

Package

The package begins with an Execute SQL Task. I use an expression (which we can see on the design surface, a feature new to SSDT) to set the Disable property of this task. This is simply a test of the initialization parameter. If we are making the initial call to this package, the parameter defaults to 1, which tells the populate Args SQL Task whether or not to run. For fun I pass the actual SQL to run to it through another parameter. Actually, it is quite a bit more than just fun, but we will have to discuss environments, parameters, and programmatic execution in package stores in later posts.
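The statement passed in through that parameter might be as simple as the following hedged sketch; dbo.PackageQueue and its columns are hypothetical stand-ins, since the post keeps the real statement out of the package:

```sql
-- A minimal guess at the "populate Args" statement. The table and column
-- names are placeholders, not from the original package.
SELECT PackagePath
FROM dbo.PackageQueue
WHERE IsProcessed = 0
ORDER BY Priority;
```

Because the statement arrives as a parameter, the queue source can be swapped per environment without redeploying the package.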

The next task is a Script Task in which I manipulate the array (I need to stay mum on the specifics here for legal reasons, so I err on the side of caution). This task accomplishes two primary things. The first and more obvious is popping the first argument off of the Args stack. The second is checking that there is at least one row left in the set (there are a lot of ways to skin this cat). If there isn’t, we change our initialization value to -1. Otherwise we set it to continue.

Next we use a Sequence Container to ensure that the previous steps have executed before we execute the remaining two tasks in parallel. If we had an array length of zero for our Args set, the -1 initialization parameter would disable this container and prevent the execution of its child tasks. Once it knows it is disabled, the package completes its execution.

The next task is an Execute Package Task. This simply executes the package we are in. We have access to, and set, the same variables with our actions. Since we have enforced precedence outside of the container, set our next Args, and already disabled our Args population SQL Task, this task will inevitably lap our work task, allowing us to start all of our work without waiting for any of it to complete. If any of our work items require precedence constraints between them, we can engineer some sort of messaging service for them, or build parent packages that enforce the precedence between those packages and call them instead of their children.

Our final task is the one that performs the actual work. In this case I chose an Execute Process Task that calls DTEXEC. The Args set member retrieved in our Script Task is one of the parameters for the DTEXEC calls, which can target packages in a different deployment model or project than our recursion. We could just as easily have used this approach to spawn workers operating on separate folders or servers.
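The command line the Execute Process Task builds might resemble the following. This is a hedged sketch: the package path, variable name, and value are placeholders, with the actual value coming from the Args member popped in the Script Task.

```bat
:: Illustrative only; file path and variable name are invented for the example.
DTExec /File "C:\Packages\Worker.dtsx" /Set "\Package.Variables[User::Arg].Properties[Value];SomeArgValue"
```

A package deployed to the catalog could be addressed with /ISServer instead of /File, which is part of why this pattern spans deployment models so easily.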

Because some strings are code and some code contains strings that are code.

Not to dicker on about grammar, and in particular grammatical flourishes, in a blog post ostensibly about syntactic inversions, but wouldn’t it be great if we allowed the title Because some strings are code and some codes contain strings that are code? I certainly do not intend to post about semantics proper, not to be confused with semantic web ontologies, which is likewise a different topic and in many respects a different expert credential than the one I have been alluding to this past year.

Blogs I Follow, Found, Read, or Otherwise Link To


Here I am making a blog post about adding blogs to the Blogs I Follow feature here on WordPress. With the new year only a week or so away, it dawns on me that I still don’t have the blog presence I resolved to build in at least 3 of the last 5 new years. I have started and abandoned at least 3 blogs so far. I have tried to blog on databases in general, software development, systems administration, gardening, politics, and various combinations thereof. I read as much of Blogging for Dummies as I could without buying a coffee refill at Barnes and Noble. I think I was reading at a venti level at the time. I am pretty sure it told me that one of the keys to getting good traffic to your site is following blogs and linking to articles. It might also help to review a book.

How Marketers Can Avoid Big Data Blind Spots

I find it interesting that the short-term sales gain doesn’t contribute to brand value. It would seem to me that, barring a demographic that responds only to recent campaigns, the exposure would lead to long-term brand value. I’d imagine the proportion of short-term gains to long-term growth (or leakage mitigation) might be an important metric for the valuation of large datasets. The problem that arises from this, from the point of view of many data professionals, is one of the valuation of large unstructured datasets. Big data tends to be unstructured and requires processing to extract patterns that can be used for analysis. This processing incurs costs that need to be justified.

How an Auction Can Identify Your Best Talent

As a database administrator I find that the swiftest route to the completion of development work often depends on the consumer of that work and the workflow through the departments. Such an approach could be immensely beneficial for balancing workloads between departments with relatively interchangeable skillsets (at least at the entry level). For example, software development, database administration, analytics, reporting, and systems administration will often have team members whose skills, while more advanced in their specific discipline, are sufficient for junior-level work in the others. Competing for project tasks, with valuations that would presumably climb with the scarcity of the skills required to complete them, would give other departments the opportunity to add value by providing lower-level work. That has the added benefit of strengthening the organization’s resilience, as those portable skills grow sharper with use.

When You Criticize Someone, You Make It Harder for that Person to Change

Criticism can be perceived as an attempt to undermine the self-esteem of the person criticized. In turn, attacks on self-esteem can be seen as attempts to manipulate the criticized person into believing they are unworthy of the satisfaction of their goals. This is a seed of conflict that it is natural to resist; if we did not resist, such manipulation would act contrary to our long-term survival by forcing us into deprivation of necessary resources.

EXEC sp_helptext ‘sp_addlinkedsrvlogin’

In the course of developing the expertise I have made it a point to assert in several past posts, I have encountered a number of philosophies regarding the comparative value of various areas of expertise that overlap competitively. In that time I have made it a point to refrain from passing judgement prematurely in favor of one methodology over another. It has been my experience that, often enough, opposing schools of thought on most any subject focus primarily on the strengths of their own position while neglecting its weaknesses, which as often as not are the strengths of the other. In short, thesis and antithesis give rise to synthesis.

Having said that brings us to the specific example to which this particular generalization most immediately applies. Recently I was asked by a novice SSIS developer if there is a way, using only SSIS package tasks, to display a result set that provides the final aggregates inline with the aggregated values. This is something I have developed a reflex, over time, to answer with a SQL solution. For brevity’s sake I am glad the question was not one of a running total.

First let me show you the solution I provided in SSIS:

SSIS Left Joining Aggregates

In the ETL solution we join together the original source and the aggregate run against it. In SSIS this requires an additional sorting step. This sorting step is performed implicitly in SQL by the database engine. Generally speaking, sorting is one of the most expensive operations, if not the most expensive, that can be performed on a set of data, whether it is implied or explicitly commanded.

2012-7-12 Left join for an inline total
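In case the screenshot is unreadable, the SQL version is roughly the following. This is a hedged reconstruction; dbo.SalesDetail and its columns are stand-ins for whatever source you are totaling:

```sql
-- Detail rows left-joined to the aggregate run against the same source,
-- giving the total inline with each row. Object names are placeholders.
SELECT  d.ProductID,
        d.LineTotal,
        agg.ProductTotal          -- the aggregate, inline with each detail row
FROM    dbo.SalesDetail AS d
LEFT JOIN
        (SELECT ProductID, SUM(LineTotal) AS ProductTotal
         FROM   dbo.SalesDetail
         GROUP BY ProductID) AS agg
        ON agg.ProductID = d.ProductID
ORDER BY d.ProductID;             -- the explicit sort on the output
```

The Merge Join in the SSIS version performs the same join, but only after both inputs have been explicitly sorted.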

In either case we are performing the sorting of the sets to be joined, and in the SQL statement we are explicitly sorting the output. Of course, results speak for themselves.

2012-7-12 Left join output
2012-7-12 SSIS Output

In the left image we see the output of the SQL statement, and in the right the output of the SSIS task. There are differences in the column names, primarily owing to a lack of rigor on my behalf regarding the naming. We can see from the outputs that the data values are identical for both approaches.

So why should we choose one approach over the other? What you may not have noticed earlier in this post is that the SSIS output shown is the consequence of my having added a data viewer to the data flow task’s merge join output. That is to say, what you are seeing as the “output” of the SSIS solution is the output of the extract and transformation steps of the extract, transform, load data flow, not the output of the ETL flow itself, which would be a confirmation of the data being loaded while in debug mode, or an entry in the sysjobhistory table should it be run as a SQL Agent job with logging enabled.

I have to apologize for the trickery, though I think it appropriate for demonstrating the differences between the two technologies. A proper comparison would have had a SQL solution that looked like this:

2012-7-12 Left join for an inline total ETL

You’ll note that if our only purpose was to select the data, using SSIS would be an inappropriate choice. However, since we are moving the data, it becomes a question of establishing benchmarks for performance comparison. Having said that, depending on the compatibility level of the database and the surface area configuration, we cannot always connect between instances using the OPENDATASOURCE clause. That is just a cursory way of saying that there exist numerous administrative issues related to connecting data sources, and to securing them, that often make the business of ETL more complicated than questions of SQL statements and sort algorithms.
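For reference, the ETL-style SQL variant sketched above might look like the following. Everything here is a placeholder: the server, database, and object names are invented, and the local instance would need the Ad Hoc Distributed Queries surface area option enabled before OPENDATASOURCE will work at all.

```sql
-- Same inline-total join, but pulling from a remote instance and loading
-- the result locally. Names are illustrative only.
INSERT INTO dbo.InlineTotals (ProductID, LineTotal, ProductTotal)
SELECT  d.ProductID,
        d.LineTotal,
        agg.ProductTotal
FROM    OPENDATASOURCE('SQLNCLI',
            'Data Source=RemoteServer;Integrated Security=SSPI')
        .SourceDb.dbo.SalesDetail AS d
LEFT JOIN
        (SELECT ProductID, SUM(LineTotal) AS ProductTotal
         FROM   OPENDATASOURCE('SQLNCLI',
                    'Data Source=RemoteServer;Integrated Security=SSPI')
                .SourceDb.dbo.SalesDetail
         GROUP BY ProductID) AS agg
        ON agg.ProductID = d.ProductID;
```

The double remote reference is part of the point: once movement between instances enters the picture, the comparison stops being about the join and starts being about administration.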

 

RAISERROR 2147483647 ‘Hello World!’

In my last post I made it a point to demonstrate that a ratio would better show my expertise than an absolute number. In that post I also alluded to a concept as simple as “book” being possibly vague in a larger sense than what is realized by the concretization of the term in the form of an ISBN. Of course, those in the know know that an ISBN only uniquely identifies an edition of a title, a title being that part of the cover we are judging our books by, and the cover in turn being the thing between which the pages of writing are presumed bound (or at least sorted). While my logic may at this point seem a bit meandering, please bear with me.

An ISBN uniquely identifies an edition of a title, not an instance of that edition sitting behind a particular cover. So it stands to reason that more than one book may have the same title, book in this case being less a synonym for title and more an instance of it. From the point of view of expertise this does little for our summarization, but it does broach the topic of said expertise, and it raises the question of why we might have more than one copy of Itzik Ben-Gan’s Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions (ISBN 978-0-7356-5836-3). We don’t, but we might, and we might want to know which, if any, of the other books contributing to this library of expertise we have more than one copy of, as we might want to sell them or trade them or what have you.

The phrase (we’ll call these queries going forward) that might tell us how many titles we have more than one copy of might look something like this:

2012-7-12 Count books
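If the screenshot doesn’t render, a hedged guess at the query behind it would be something like this; dbo.Book and its columns are placeholders for the library table, and grouping by ISBN counts copies of the same edition:

```sql
-- Titles (really editions) present more than once. Names are illustrative.
SELECT ISBN, Title, COUNT(*) AS Copies
FROM dbo.Book
GROUP BY ISBN, Title
HAVING COUNT(*) > 1;
```

Grouping by Title alone would instead lump distinct editions together, which is exactly the title-versus-edition distinction drawn above.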