Wednesday, July 3, 2024
HomeDatabasesBad Habits: Being CarELesS about cAsE

Bad Habits: Being CarELesS about cAsE

[See an index of all bad habits / best practices posts]

As a frequent reviewer of queries, and a regular contributor to our answers site and dba.stackexchange.com, I come across a lot of code that seems to have been written, shall we say, quickly. There are a lot of transgressions of course, but one that I notice frequently is inconsistency in case. People will have a table like this:

CREATE TABLE dbo.MyTable
(
  PersonID INT,
  FirstName SYSNAME
);

 

And then a query like this:

SELECT personId, FIRSTName 
  FROM dbo.myTable
  ORDER BY firstname;

 

The code compiled and the query returned results, so party time, right?

This can cause problems.

For one, being inconsistent about case can lead to situations like this, where another user executes the exact same logical query, but uses different case:

SELECT PersonId, Firstname 
  FROM dbo.Mytable
  ORDER BY firstName;

 

There are a few subtle differences you might spot right away, and if you look at the plan cache, you’ll see that you actually stored two copies of the plan (since the query_hash is case- and whitespace-sensitive). I talked about this a bit more thoroughly and showed an example late last year:

More importantly, though: neither query uses identifiers that exactly match the base table or its columns. And while the query may work on your machine, it will break if it gets deployed to a case-sensitive database or to a server using a case-sensitive collation.

PROOF.

I set up two SQL Server instances, one with the default SQL_Latin1_General_CP1_CI_AS collation, and one with its case-sensitive cousin, SQL_Latin1_General_CP1_CS_AS. On each instance, I set up two databases, one using each collation. What I wanted to do was simulate what would happen if I wrote code – sloppily, on purpose – in the best case scenario (case insensitive collation on a case insensitive instance), then tried to deploy it and run it in the other environments.

So, on that initial CI/CI instance, I created the following objects – a table and a matching TVP, along with a stored procedure that intentionally uses the wrong case for the table and column names, and even uses the wrong case for its own input parameter:

CREATE TABLE dbo.Customers
(
  CustomerID INT PRIMARY KEY,
  FirstName NVARCHAR(32) NOT NULL, 
  LastName NVARCHAR(32) NOT NULL
);
GO
 
CREATE TYPE dbo.tvpCustomer AS TABLE
(
  CustomerID INT PRIMARY KEY,
  FirstName NVARCHAR(32) NOT NULL,
  LastName NVARCHAR(32)
);
GO
 
CREATE PROCEDURE dbo.AddCustomers
  @TVP dbo.tvpcustomer READONLY
AS
BEGIN
  SET NOCOUNT ON;
  INSERT dbo.customers(customerid,firstname,lastname)
    SELECT customerid,firstname,lastname FROM @tvp;
END
GO

 

First, I tried deploying the same objects to the case sensitive database on the case insensitive server. I can’t get very far, because as soon as I try to create the procedure, I get all kinds of problems:

Msg 2715, Level 16, State 3, Procedure AddCustomers
Column, parameter, or variable #1: Cannot find data type dbo.tvpcustomer.
Parameter or variable '@TVP' has an invalid data type.
Msg 1087, Level 16, State 1, Procedure AddCustomers
Must declare the table variable "@TVP".

I next tried the case insensitive database on the case sensitive server. There was only one error this time; interesting to note, though, that the error message uses the lower-case version of the table variable name (which comes from the procedure body), unlike above where it used the upper-case version (which comes from the parameter list):

Msg 1087, Level 15, State 2, Procedure AddCustomers
Must declare the table variable "@tvp".

 

 

The case sensitive database on the case sensitive server returned to the more verbose error messages above. So then I tried partially correcting the procedure in that environment, making the parameter casing match and the type name correct:

CREATE PROCEDURE dbo.AddCustomers
  @TVP dbo.tvpCustomer READONLY
--------------^ fixed this
AS
BEGIN
  SET NOCOUNT ON;
  INSERT dbo.customers(customerid,firstname,lastname)
    SELECT customerid,firstname,lastname FROM @TVP;
    ----------------------------- and this ----^^^
END
GO

 

I was expecting this to actually get created, perhaps with that warning about deferred name resolution, because it shouldn’t have been able to find dbo.customers in the system metadata. But no, it matched the table regardless of case, but noticed that the column names didn’t match:

Msg 207, Level 16, State 1, Procedure AddCustomers
Invalid column name 'customerid'.
Msg 207, Level 16, State 1, Procedure AddCustomers
Invalid column name 'firstname'.
Msg 207, Level 16, State 1, Procedure AddCustomers
Invalid column name 'lastname'.

  

So, this made me investigate another avenue: fixing the column names, but not the table name. Surely then the procedure wouldn’t get created?

CREATE PROCEDURE dbo.AddCustomers
  @TVP dbo.tvpCustomer READONLY
AS
BEGIN
  SET NOCOUNT ON;
  INSERT dbo.customers(CustomerID,FirstName,LastName)
  --- fixed these -----^-------^^-^----^----^---^
    SELECT CustomerID,FirstName,LastName FROM @TVP;
  -- and --^-------^^-^----^----^---^
END
GO

 

Lo and behold:

Command(s) completed successfully.

 

 

Okay, so what if we call it (with code that doesn’t get any of the case wrong, of course)?

DECLARE @tt dbo.tvpCustomer;
INSERT @tt VALUES(1, N'Bob', N'Frank');
EXEC dbo.AddCustomers @TVP = @tt;

 

The engine must look in a different place now, because this yields:

Msg 208, Level 16, State 1, Procedure AddCustomers
Invalid object name 'dbo.customers'.

  

This is an even bigger problem, of course, because the code was deployed successfully, and now the error might not happen until an end user tries to add (or become) a customer…

A LITTLE SIMPLER.

Here’s an even simpler set of code that I wanted to try on each database/instance combination:

-- wrong case for variable
DECLARE @foo INT; SELECT @FOO;
GO
CREATE TABLE dbo.foo(bar INT);
GO
-- wrong case for column name
SELECT BAR FROM dbo.foo;
GO
-- wrong case for object name
SELECT bar FROM dbo.FOO;
GO
-- wrong case for database name
USE Master;
GO

 

Here’s how they worked out:

Database / Instance variable column object database
CI / CI OK OK OK OK
CS / CI OK Invalid column Invalid object OK
CI / CS Must declare variable OK OK Database does not exist
CS / CS Must declare variable Invalid column Invalid object Database does not exist

The reasons for these patterns quickly become fairly evident: variables use instance collation, object and column names use the collation of sys.objects etc. (which use database collation), and database names use sys.databases (which uses instance collation). If we look at those in each case:

SELECT collation_name 
  FROM sys.all_columns 
  WHERE name = N'name'
  AND [object_id] = OBJECT_ID(N'sys.objects');
 
SELECT collation_name
  FROM master.sys.all_columns
  WHERE name = N'name'
  AND [object_id] = OBJECT_ID(N'sys.databases');

 

This is what they look like:

Database / Instance sys.all_objects sys.all_columns
CI / CI SQL_Latin1_General_CP1_CI_AS SQL_Latin1_General_CP1_CI_AS
CS / CI SQL_Latin1_General_CP1_CS_AS SQL_Latin1_General_CP1_CI_AS
CI / CS SQL_Latin1_General_CP1_CI_AS SQL_Latin1_General_CP1_CS_AS
CS / CS SQL_Latin1_General_CP1_CS_AS SQL_Latin1_General_CP1_CS_AS

There are some peculiarities, though. For example, with system objects, you can get the case of a column name wrong no problem, even in a case sensitive database on a case sensitive instance. This works no problem:

SELECT NAME FROM sys.objects;

 

However if you get the case wrong for the view itself instead of the column:

SELECT name FROM sys.Objects;

 

Still an error message:

Msg 208, Level 16, State 1
Invalid object name 'sys.Objects'.

 

So it looks like different rules are used to determine whether case sensitivity matters for system objects, depending on whether you are looking at the object itself or its columns. Even more reason for you to just strive to have code that uses correct, matching case from the start, in all cases.

SUMMARY.

Long story short, a lot of things can go wrong when you develop in a case insensitive collation but code might ultimately get deployed to a case sensitive database or instance (or both). In most cases the errors will be caught at deployment time, but in a few, the objects could be created successfully anyway, and problems would not be detected until testing/QA time (or, worse, even later). This is one of those things that is fairly easy to get right from the start, and a lot more cumbersome to fix later.

PREVENTION.

There are some code analysis tools that can check for things like this; when I pinged George Mastros (@gmmastros), he agreed that it was a good idea, and is going to try to put it in an upcoming release of SQLCop.

But you can prevent these problems far earlier than that, if you’re willing to do it. I have long been a proponent of developing – at least for code that does not cross databases or servers – on a case sensitive collation. This makes sure these errors are caught up front, and not six months from now when a database is migrated or consolidated. “But won’t that slow me down?” Initially, maybe, a little. But in all honesty, IntelliSense (or tools like Redgate SQL Prompt) should prevent you from ever having to type this stuff correctly on your own anyway.

[See an index of all bad habits / best practices posts]

Aaron (@AaronBertrand) is a Data Platform MVP with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com.

Aaron’s blog focuses on T-SQL bad habits and best practices, as well as coverage of updates and new features in Plan Explorer, SentryOne, and SQL Server.

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Most Popular

Recent Comments