If you ever drop columns, you probably already know that in order to truly reclaim the space, you need to rebuild the clustered index. Back in 2010, Michael Swart (@mjswart) treated this topic, focusing on the before and after wasted space aspects.
Unfortunately, wasted space is not the worst thing that can happen if you drop a column and don’t immediately rebuild the clustered index (or heap). A defect has been discovered where subsequent updates can actually delete data in a completely different column, with no warning or error message – you might also call this undetectable data loss or even – gasp – corruption.
I don’t have a repro script handy (and it would probably be irresponsible of me to provide one anyway). There are some further details in KB #3120595, but my take on what happens is basically that SQL Server simply points at the wrong part of the page, likely because offsets are wrong due to the old column slot that is still “kind of” there. This is reminiscent of a different issue involving dropped columns, KB #2504090, though that symptom is isolated to partitioned tables.
On April 2nd, the KB article added a script to check if you have any tables that are potentially vulnerable due to dropped columns. I’ve re-written it to conform to my conventions (most recently to not use a filter in the where clause for an outer joined table, which turns that into an inner join), and to generate ALTER
commands that account for both clustered indexes and heaps:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT N'ALTER ' + CASE p.index_id
WHEN 0 THEN N'TABLE ' ELSE N'INDEX ALL ON ' END
+ QUOTENAME(s.name) + N'.' + QUOTENAME(o.name)
+ N' REBUILD WITH (MAXDOP = 1'
+ N',ONLINE = ON' -- depending on edition
+ N'); /* rows affected: ' + CONVERT(VARCHAR(11), SUM(p.[rows])) + N' */'
FROM sys.objects AS o
INNER JOIN sys.schemas AS s
ON o.[schema_id] = s.[schema_id]
INNER JOIN sys.partitions AS p
ON o.[object_id] = p.[object_id]
WHERE p.index_id IN (0,1)
AND o.is_ms_shipped = 0
AND EXISTS
(
SELECT 1
FROM sys.system_internals_partition_columns AS pc
WHERE pc.is_dropped = 1
AND pc.[partition_id] = p.[partition_id]
)
AND EXISTS
(
SELECT 1
FROM sys.columns
WHERE [object_id] = o.[object_id]
AND system_type_id NOT IN (48,52,56,61,62,104,127,173)
)
GROUP BY p.index_id, s.name, o.name
ORDER BY SUM(p.[rows]) DESC;
On April 25th, Microsoft’s Pedro Lopes published a blog post with further details about repro scenarios, including the fact that they have only seen this in customers who have upgraded to 2012 or 2014 from 2005 (whether that be an in-place upgrade, backup/restore, or detach/attach).
Fixes have been published for SQL Server 2012 (SP2 and SP3) and for SQL Server 2014 (RTM and SP1). Until you’re patched – and maybe even afterward – you should plan to rebuild tables after dropping columns as a matter of course and, most importantly, as part of the same maintenance operation to minimize the risk of corruption.
Major Version | Service Pack Branch | Cumulative Update with Fix | Build # |
---|---|---|---|
SQL Server 2012 | Service Pack 2 | Cumulative Update #10 | 11.0.5644 |
Service Pack 3 | Cumulative Update #2 | 11.0.6523 | |
SQL Server 2014 | RTM | Cumulative Update #13 | 12.0.2568 |
Service Pack 1 | Cumulative Update #6 | 12.0.4449 |
If you’re on 2012 or 2014 and not on these supported branches, here’s one more kick to motivate you to get on board the current service pack and most recent cumulative update train. In any case, regardless of version, drop columns with care, especially before you get patched.
Still no word if 2008 and 2008 R2 are affected, but even if they are, it is unlikely they will get a fix.
Aaron (@AaronBertrand) is a Data Platform MVP with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com.
Aaron’s blog focuses on T-SQL bad habits and best practices, as well as coverage of updates and new features in Plan Explorer, SentryOne, and SQL Server.