#IoE The Internet of Everything (Things) #IoT

Posted by thomaswmarshall

If you didn’t know it tech is rampant and wonton in its generation of buzzwords for all sorts of things, or more specifically for sexing up all sorts of things that are more than the sum of their parts. The singularity, Web 2.0, e-this, i-that, social, mobile, the cloud, big data, and the Internet of Everything (Things) to name a few. Sometimes these things overlap. Sometimes they overlap in a so many ways that a Venn diagram starts to need an expert algebraic topologies to explain. Lately, the buzz has been around the cloud, big data, and the internet of things. As a technology professional and in particular a database professional it is important to me to sort through the really cool things.

One of the things we now have an is internet of the light bulb. This is real unlike the hyper text coffee pot protocol which is a 16 year April fool’s joke the IETF created, or the Linux toaster which is a malware vector. Phillips Hue is a light bulb (system) that represents the culmination of incremental improvements to lighting bringing together a number of preexisting lighting features and adding them to an API. Hue combines, timers, dimmers, color, and circuit switching to provide a system that allows us to further take for granted one of the greatest technical achievements of the last millennium. That of course is the internet of things, at least the command side of it. I wonder if people have it in them to take their beverages according to a queue, or perhaps a locational aware protocol that starts my coffee when I approach the break room during a certain period of time.

Other things there are internets of will have sensors. These will write data somewhere. Lately, it looks like some Hadoop file system will be the standard. I wish I were more abreast of the internals the way a storage engineer would be so I could form a better opinion of why HFL would be a better choice than WAFL for writing streams. I suspect it has to do primarily with Map Reduce (at least public domain Map Reduce) only being implemented on Hadoop and MongoDB (and …). Of course if we need to respond to events in real time we need to put our event triggers upstream from storage.

SQL Server Files, Drives, and “Performance”

Posted by thomaswmarshall

Last night after our monthly PSSUG meeting our presenter Jason Brimhall found himself waiting for a cab at Microsoft’s Malvern PA Office. Having waited for a taxi in Philly’s suburbs before I felt bad for his wait that might not end so I took it on myself to offer him a ride to the airport Marriot. On the drive I took it upon myself to ask his opinion on different strategies for allocating database files in SQL Server. This is a topic I find myself drawn to and I welcome the opportunities to hear other people’s (especially a Microsoft Certified Master’s) opinion on the subject.

I tend to find myself drawn to a simple approach that increases the number of files proportional to the size of the database. I admit that this is a matter of convenience and portability that has its limits. While Microsoft publishes some file numbers as the maximum (4,294,967,295) and the number above which issues may occur (50,000), a much lower number of files will tend to practical depending on needs. In one case I have encountered I found that at about 200GB I struggled to get reasonable copy and move speeds even with ESEUTIL. This was of course a single site scenario with fiber channel SAN as target and destination. Other scenarios require much smaller file sizes as well as other tools for copying (XCOPY, ROBOCOPY). Copies over a WAN (should be over a VPN\VLAN, but that is a different discussion Sebastian Meine could comment on as a segway to his presentation on new security roles in SQL Server)… copies over a WAN will traverse network segments of varying speeds, few if any of which will ever approach the throughput of fiber SAN. In these cases it is a matter of balancing the need for a manageable number of files and folders and keeping files small enough to copy. But for each of these cases it is important to note that the performance is being measured on the dimensions of speed of database file copy and file manageability (as an intersection of scripting for file and folder creation and the flexibility of the physical data model to make use of files), these are not frequently dimension of performance that our users notice.

Jason pointed out that from the point of view of user any significant query slowness will be perceived as an outage, and was a point added to his case against the separation of data and indexes into separate files specifically relevant to cases of piecemeal file based recovery. Our discussion started with my asking him his opinion on the comparative merits of creating separate database files for each processor (core) vs. using IO affinity masks to delegating IO related processing to a specific processor. I feel it important to note that some NUMA systems have separate IO channels for each NUMA node in which case IO affinity would deprive those member processors those channels (I think). Jason was supportive of allocating files per processor in the more general case. He mentioned that it wasn’t necessary (or always possible) to have separate disks for these files to realize performance.

What are you thoughts?

Are you familiar with the proportional fill algorithm?

What do you think about the impact of using solid state local disk in tempdb in 2012 FCI? or tiered storage?

Proportional fill reading:

http://sqlserver-performance-tuning.net/?p=2552

http://www.patrickkeisler.com/2013/03/t-sql-tuesday-40-proportional-fill.html

http://technet.microsoft.com/en-us/library/ms187087(v=SQL.105).aspx

Does that really need an app?

Posted by thomaswmarshall

What feature does your app provide that cannot be provided via a mobile web site?

Kiwi Puts Its All-Purpose Wearable Up For Pre-Order, Aims To Be Everything To Everyone

Posted by thomaswmarshall

Now your talking about wearables. Hype all you want about smart watches that only work with certain phones, but things like this (and perhaps cufflinks), and Martian Commander watches are what are going to make the difference. The new Mac Pro may be a revolution in design, that transforms a desktop computer into an objet d’art, but I doubt I could find a matching pair of shoes if someone called it a hat.

SQL SERVER – Copy Statistics from One Server to Another Server

Posted by thomaswmarshall

I was watching SANMAN’s most recent video DEVOPS: Mission Impossible and it reminded me of Pinal Dave’s article here. THis is a great way to avoid moving data and diminish the weight of arguments for allowing development access to production databases and data.

New Year’s Resolutions

Posted by thomaswmarshall

Like so many people I indulge in the tradition of casting off my inhibitions in preparation for casting off my bad habits as far into the New Year as can be predicted. At the very least I am going to attempt to take on or renew some good habits.

I am going to blog.
I am going to budget.
I am going to eat right.
I am going to exercise.
I am going to find ways to shoehorn SQL or NoSQL into each of those things.

It is to the effect of accomplishing the first, second, and fifth of these I am writing this post.

I bought this book from this guy who made a puzzle I was busy to solve, and now I think I know everything

Posted by thomaswmarshall

In my last post I discussed one of my favorite new features in SQL Server OVER the last few versions. I also capitalized an entire word as if I were shouting it at you. REST assured that there is more method() than madness in this. In this CASE (stop please) I was alluding to the introduction of window functions and the OVER clause.

DATEADD(HOUR,DATEDIFF(HOUR,GETUTCDATE(),GETDATE()),GETDATE())

Posted by thomaswmarshall

There were a number of things brought into SQL Server OVER the past few versions that tickle me in such a way as to beg the question Why haven’t we been doing that all along? One of these I only gave a brief glance when I first saw it, much to my detriment. Datetimeoffset is a positively marvelous datatype. Not only does it allow us to store date and time values together with a precision of as little as 100 nano-seconds, but it can also be used to determine if two datetimeoffset values match across time zones with zero coded date math.

Unfortunately, only the latest version of SQL Server contains the DATETIMEOFFSETFROMPARTS. I say this is unfortunate because I am writing this from a SQL 2008 R2 machine, and the example that follows could be much more demonstrative the time zone sensitivity of the datetimeoffset data type if I hadn’t commented that code out. Nevertheless, let’s look at some comparison of datetime types.

If we look at the output we can see the differences in the precision of the different datatypes. We can also see the time zone component to the datetimeoffset type. The standard date time data type cuts its digits off at the milliseconds’ place. This can be somewhat misleading in appearance. We’ll look at that later. But first let’s look at how fast these rows being written. We can also see from this result set that even at 100 nanoseconds precision we really can’t count on more precise time to uniqueify our rows. Perhaps, in a later post we’ll discuss TIMESTAMPS.

If we look closer at these values we’ll notice something. The right most digit in each and every value is limited to just a few numbers. In the case of the datetime values, I have come to understand that we can actually only capture 1/300th of a second as opposed to 1/1000th. As for the more precise data types it is unfortunate that in addition to leaving my SQL 2012 instance at work, I left my Cesium clock at the gym. That of course means, that I lack the tools to accurately determine how many nano-seconds precision we are realizing with our ability to store 100-nanoseconds.

Its getting close to 2013-12-26 20:30:10.6208135 -05:00 and I am trying to keep these posts short and too the point, in that meandering way that whets your whistle for more (I hope). Speaking of whetting your whistle, look at this version of the above with a TIMESTAMP, which is not to be confused with a TIMESTAMP.

That’s not a recursion. This is a recursion.

Posted by thomaswmarshall

I may only be an expert of sorts in computer information systems as opposed to a computer scientist, and one specializing in algorithms at that, but I’d like to think I know the difference between iterating on row set that diminishes with each iteration, and set constrained self calling operation. I am also pretty sure that iterating on a row set that diminishes with each iteration is parent form of a for each type iteration. I do know that SQL Server 2012 introduces for us a deployment model that makes the business of building recursive SSIS packages wholly more tenable.

The project based deployment model allows us to reference our packages in development. This is a welcome change as it allows us to make better use of a more modular approach to package development. One of the ways that this is manifested is in the Execute package task. Instead of creating a connection manager of either a SQL Server or File System type, you can use an internal reference. In SQL Server 2012 an execute package task can be used to call a package that is either internally referenced in the same project as the parent package, or externally referenced. An externally referenced package is essentially the only reference type in SQL 2008 R2. This requires the use of a connection manager which will need to change if we are developing anywhere other than our production environment, which if you do, you really shouldn’t.

Beyond being able to call packages stored in the project we are working in the internally referenced package call allows us to do something that is simply amazing. We can call the package we are building. That means that we can have a child package that can modify the parent package’s variable values as they are the same variables. In the example I have created a package that reads a list of packages and executes them asynchronously. I included annotations to help keep this relatively simple package conceptually simple.

The package begins with an Execute SQL Task. I use an expression(which we can see on the design space, new to SSDT) to set the disable parameter of this task. This is simply a test of the initialization parameter. If we are making the initial call to this package the parameter defaults to 1 which tells the populate Args SQL Task whether or not to run. For fun I pass the actual SQL to run to it through another parameter. Actually, it is quite a bit more than just fun but we will have to discuss environments, parameters and programmatic execution in package stores in later posts.

The next task is a scripting task in which I manipulate the array (I need to be mum on the specifics of this for various legal reasons I err on the side of caution for). I achieve two primary things by this task. The first and more obvious item accomplished is popping first argument off of the Args stack. The second is checking that there is at least one row left in the set (there are a lot of ways to skin this cat). If there isn’t we change our initialization value to -1. Otherwise we set it to continue.

Next we use a sequence container to assure that the previous steps have executed before we execute the remaining two in parallel. If we had an array length of zero for our Args set the -1 initialization parameter would disable this container and prevent the execution of its children members. Once it knows it’s disabled, the package completes its execution.

The next task is an execute package task. This simply executes the package we are in. We have access to and set the same variables with our actions. Since we have enforced precedence outside of the container and set our next Args and already disabled our Args population SQL Task, this task will inevitably lap our work task, allowing us to start all of our work without waiting for any of it to complete. If any of our work items are such that they require precedence constraints between them, we can engineer some sort of messaging service for them, or build parent packages that enforce the precedence between those packages and call them instead of their children.

Our final task is the one that performs the actual work. In this case I chose an execute process task that calls DTEXEC. Our Args set member retrieved in our script task is one of the parameters for the DTEXEC calls to packages that may have a different deployment model or project than our recursion. We could of just as easily used this approaches to spawn workers to operate on separate folders or servers.

Because some strings are code and some code contains strings that are code.

Posted by thomaswmarshall

Not to dicker on about grammar and in particular grammatical flourishes in blog post ostensibly about syntactic inversions, but wouldn’t it be great if we allowed the title Because some strings are code and some codes contain strings that are code. I certainly do not intend to post about semantics proper, not to be confused with semantic web ontologies which is likewise a different topic and in many respects a different expert credential than I have been alluding to this past year.

Thomas [W] Marshall

Database and Business Intelligence

Author: thomaswmarshall

#IoE The Internet of Everything (Things) #IoT

SQL Server Files, Drives, and “Performance”

Does that really need an app?

Kiwi Puts Its All-Purpose Wearable Up For Pre-Order, Aims To Be Everything To Everyone

SQL SERVER – Copy Statistics from One Server to Another Server

New Year’s Resolutions

I bought this book from this guy who made a puzzle I was busy to solve, and now I think I know everything

DATEADD(HOUR,DATEDIFF(HOUR,GETUTCDATE(),GETDATE()),GETDATE())

That’s not a recursion. This is a recursion.

Because some strings are code and some code contains strings that are code.