Recently I’ve been doing a talk about “tempdb parasites” at SQL Saturdays. The presentation has a bit about table variables, and how they’re stored in tempdb.
The discussion at that point usually turns to how they can’t be indexed and that they don’t have statistics. At the most recent event (SQL Saturday Kansas City), an attendee asked if using a clustered key on a table variable allowed statistics to be generated and used for building an execution plan.
To this, I actually had no answer, mainly because I use them so infrequently that I didn’t even know you could even define a clustered key. So, all I could do was admit that I didn’t know enough about the subject to comment on it, and that I would look into it.
Well, I did look into it, and I discovered some interesting things.
Clustered Keys on Table Variables
First off, you can indeed have a clustered key on a table variable. The syntax to define it is actually pretty simple.
To test it out I just did something like this:
DECLARE @t TABLE
(
ID INT IDENTITY(1, 1) PRIMARY KEY,
x SYSNAME,
y INT
);
INSERT INTO @t (x, y) SELECT TOP (10000) name, [object_id] FROM sys.all_columns;
SELECT x FROM @t WHERE x = 'id';
This runs fine against my 2012 instance, and in the query plan I can see the clustered index being used (albeit for a scan):
SQL Server 2012 plan, using the clustered index
So, thanks to the question that came up in the talk, I’ve now learned something new! However, we still haven’t answered the question, which was, will statistics be generated when the table variable has a clustered key?
For that, I turned to Books Online. In the documentation on the generic “table” type here http://msdn.microsoft.com/en-us/library/ms175010.aspx, under the section on “Limitations and Restrictions,” we can find the following (with a couple of minor cosmetic corrections):
Table variables do not have distribution statistics, they will not trigger recompiles. Therefore, in many cases, the optimizer will build a query plan on the assumption that the table variable has no rows. For this reason, you should be cautious about using a table variable if you expect a larger number of rows (greater than 100). Temp tables may be a better solution in this case.
The key phrase being the first one stating that table variables just do not have distribution statistics. This goes on to point out another area that we discuss during the talk, which is that the optimizer makes an assumption that the table has no rows. Now, here is another point of difference. For a while now, I’ve been hearing from several sources that the optimizer assumes that the table variable has 1 row. Which is correct? 1 row or 0 rows? So, I started digging into that.
The answer is that both are right, in a way. In this (awesome) stack exchange answer, under the “Cardinality” section, we can see that the plan will show an estimate of 1 row, but SQL Server is really assuming 0 rows. Further, Paul White (b|t) explains why that is the case here.
So now, there is the answer. No, there are no statistics, and yes, the optimizer will estimate 1 row.
Non-clustered indexes
There is yet another thing I’ve just discovered on this journey. I’ve been telling everyone that you can’t create non-clustered indexes on table variables, and apparently, in SQL Server 2014, that isn’t true anymore.
See this blog post by Klaus Aschenbrenner (b|t), in which we discover that you can indeed create non-clustered indexes on table variables in SQL Server 2014. As BOL mentions though, you still can’t create them explicitly after the fact, which makes sense. To test this out, I used a derivative query of the one Klaus used.
And in my plan, just as in his, you can see the seek operation on the index:
SQL Server 2014 plan, using the non-clustered index
Finally, just to prove I can’t do it, I try to declare and populate the table, then add the index afterward. With the two batches below, the first gives a syntax error, and the second runs just fine:
DECLARE @t TABLE
(
ID INT IDENTITY(1, 1) PRIMARY KEY,
x SYSNAME,
y INT
);
INSERT INTO @t (x, y) SELECT TOP (1000000) name, [object_id] FROM sys.all_columns;
CREATE NONCLUSTERED INDEX ix_x ON @t(x);
SELECT x FROM @t WHERE x = 'cid';
GO
CREATE TABLE #t
(
ID INT IDENTITY(1, 1) PRIMARY KEY,
x SYSNAME,
y INT
);
INSERT INTO #t (x, y) SELECT TOP (1000000) name, [object_id] FROM sys.all_columns;
CREATE NONCLUSTERED INDEX ix_x ON #t(x);
SELECT x FROM #t WHERE x = 'cid';
DROP TABLE #t;
Conclusion
So, I think I’m still not convinced to use table variables very often, but I might use them more than I have in the past, and I’ve learned several useful things just by traveling the path I was sent down by trying to answer a question from a SQL Saturday attendee.
So, the moral here I guess is two-fold. If you have a question, ask it; and, if you get asked a question that you don’t know, make sure to follow up on it. There’s plenty out there for all of us to learn, and as you can see from this post, things change all the time.
Until next time,
-JRH
Jason has worked in technology for over 20 years. He joined SentryOne in 2006 having held positions in network administration, database administration, and software engineering. During his tenure at SentryOne, Jason has served as senior software developer and founded both Client Services and Product Management. His diverse background with relevant technologies made him the perfect choice to build out both of these functions.
As SentryOne experienced explosive growth, Jason returned to lead SentryOne Client Services, where he ensures that SentryOne customers receive the best possible end to end experience in the ever-changing world of database performance and productivity.