Bad Habits: Being CarELesS about cAsE

[See an index of all bad habits / best practices posts]

As a frequent reviewer of queries, and a regular contributor to our answers site and dba.stackexchange.com, I come across a lot of code that seems to have been written, shall we say, quickly. There are a lot of transgressions of course, but one that I notice frequently is inconsistency in case. People will have a table like this:

CREATE TABLE dbo.MyTable
(
  PersonID INT,
  FirstName SYSNAME
);

And then a query like this:

SELECT personId, FIRSTName 
  FROM dbo.myTable
  ORDER BY firstname;

The code compiled and the query returned results, so party time, right?

This can cause problems.

For one, being inconsistent about case can lead to situations like this, where another user executes the exact same logical query, but uses different case:

SELECT PersonId, Firstname 
  FROM dbo.Mytable
  ORDER BY firstName;

There are a few subtle differences you might spot right away, and if you look at the plan cache, you’ll see that you actually stored two copies of the plan (since the query_hash is case- and whitespace-sensitive). I talked about this a bit more thoroughly and showed an example late last year:

Multiple Plans for an “Identical” Query

More importantly, though: neither query uses identifiers that exactly match the base table or its columns. And while the query may work on your machine, it will break if it gets deployed to a case-sensitive database or to a server using a case-sensitive collation.

PROOF.

I set up two SQL Server instances, one with the default SQL_Latin1_General_CP1_CI_AS collation, and one with its case-sensitive cousin, SQL_Latin1_General_CP1_CS_AS. On each instance, I set up two databases, one using each collation. What I wanted to do was simulate what would happen if I wrote code – sloppily, on purpose – in the best case scenario (case insensitive collation on a case insensitive instance), then tried to deploy it and run it in the other environments.

So, on that initial CI/CI instance, I created the following objects – a table and a matching TVP, along with a stored procedure that intentionally uses the wrong case for the table and column names, and even uses the wrong case for its own input parameter:

CREATE TABLE dbo.Customers
(
  CustomerID INT PRIMARY KEY,
  FirstName NVARCHAR(32) NOT NULL, 
  LastName NVARCHAR(32) NOT NULL
);
GO
 
CREATE TYPE dbo.tvpCustomer AS TABLE
(
  CustomerID INT PRIMARY KEY,
  FirstName NVARCHAR(32) NOT NULL,
  LastName NVARCHAR(32)
);
GO
 
CREATE PROCEDURE dbo.AddCustomers
  @TVP dbo.tvpcustomer READONLY
AS
BEGIN
  SET NOCOUNT ON;
  INSERT dbo.customers(customerid,firstname,lastname)
    SELECT customerid,firstname,lastname FROM @tvp;
END
GO

First, I tried deploying the same objects to the case sensitive database on the case insensitive server. I can’t get very far, because as soon as I try to create the procedure, I get all kinds of problems:

Msg 2715, Level 16, State 3, Procedure AddCustomers
Column, parameter, or variable #1: Cannot find data type dbo.tvpcustomer.
Parameter or variable '@TVP' has an invalid data type.
Msg 1087, Level 16, State 1, Procedure AddCustomers
Must declare the table variable "@TVP".

I next tried the case insensitive database on the case sensitive server. There was only one error this time; interesting to note, though, that the error message uses the lower-case version of the table variable name (which comes from the procedure body), unlike above where it used the upper-case version (which comes from the parameter list):

Msg 1087, Level 15, State 2, Procedure AddCustomers
Must declare the table variable "@tvp".

The case sensitive database on the case sensitive server returned to the more verbose error messages above. So then I tried partially correcting the procedure in that environment, making the parameter casing match and the type name correct:

CREATE PROCEDURE dbo.AddCustomers
  @TVP dbo.tvpCustomer READONLY
--------------^ fixed this
AS
BEGIN
  SET NOCOUNT ON;
  INSERT dbo.customers(customerid,firstname,lastname)
    SELECT customerid,firstname,lastname FROM @TVP;
    ----------------------------- and this ----^^^
END
GO

I was expecting this to actually get created, perhaps with that warning about deferred name resolution, because it shouldn’t have been able to find dbo.customers in the system metadata. But no, it matched the table regardless of case, but noticed that the column names didn’t match:

Msg 207, Level 16, State 1, Procedure AddCustomers
Invalid column name 'customerid'.
Msg 207, Level 16, State 1, Procedure AddCustomers
Invalid column name 'firstname'.
Msg 207, Level 16, State 1, Procedure AddCustomers
Invalid column name 'lastname'.

So, this made me investigate another avenue: fixing the column names, but not the table name. Surely then the procedure wouldn’t get created?

CREATE PROCEDURE dbo.AddCustomers
  @TVP dbo.tvpCustomer READONLY
AS
BEGIN
  SET NOCOUNT ON;
  INSERT dbo.customers(CustomerID,FirstName,LastName)
  --- fixed these -----^-------^^-^----^----^---^
    SELECT CustomerID,FirstName,LastName FROM @TVP;
  -- and --^-------^^-^----^----^---^
END
GO

Lo and behold:

Command(s) completed successfully.

Okay, so what if we call it (with code that doesn’t get any of the case wrong, of course)?

DECLARE @tt dbo.tvpCustomer;
INSERT @tt VALUES(1, N'Bob', N'Frank');
EXEC dbo.AddCustomers @TVP = @tt;

The engine must look in a different place now, because this yields:

Msg 208, Level 16, State 1, Procedure AddCustomers
Invalid object name 'dbo.customers'.

This is an even bigger problem, of course, because the code was deployed successfully, and now the error might not happen until an end user tries to add (or become) a customer…

A LITTLE SIMPLER.

Here’s an even simpler set of code that I wanted to try on each database/instance combination:

-- wrong case for variable
DECLARE @foo INT; SELECT @FOO;
GO
CREATE TABLE dbo.foo(bar INT);
GO
-- wrong case for column name
SELECT BAR FROM dbo.foo;
GO
-- wrong case for object name
SELECT bar FROM dbo.FOO;
GO
-- wrong case for database name
USE Master;
GO

Here’s how they worked out:

Database / Instance	variable	column	object	database
CI / CI	OK	OK	OK	OK
CS / CI	OK	Invalid column	Invalid object	OK
CI / CS	Must declare variable	OK	OK	Database does not exist
CS / CS	Must declare variable	Invalid column	Invalid object	Database does not exist

The reasons for these patterns quickly become fairly evident: variables use instance collation, object and column names use the collation of sys.objects etc. (which use database collation), and database names use sys.databases (which uses instance collation). If we look at those in each case:

SELECT collation_name 
  FROM sys.all_columns 
  WHERE name = N'name'
  AND [object_id] = OBJECT_ID(N'sys.objects');
 
SELECT collation_name
  FROM master.sys.all_columns
  WHERE name = N'name'
  AND [object_id] = OBJECT_ID(N'sys.databases');

This is what they look like:

Database / Instance	sys.all_objects	sys.all_columns
CI / CI	SQL_Latin1_General_CP1_CI_AS	SQL_Latin1_General_CP1_CI_AS
CS / CI	SQL_Latin1_General_CP1_CS_AS	SQL_Latin1_General_CP1_CI_AS
CI / CS	SQL_Latin1_General_CP1_CI_AS	SQL_Latin1_General_CP1_CS_AS
CS / CS	SQL_Latin1_General_CP1_CS_AS	SQL_Latin1_General_CP1_CS_AS

There are some peculiarities, though. For example, with system objects, you can get the case of a column name wrong no problem, even in a case sensitive database on a case sensitive instance. This works no problem:

SELECT NAME FROM sys.objects;

However if you get the case wrong for the view itself instead of the column:

SELECT name FROM sys.Objects;

Still an error message:

Msg 208, Level 16, State 1
Invalid object name 'sys.Objects'.

So it looks like different rules are used to determine whether case sensitivity matters for system objects, depending on whether you are looking at the object itself or its columns. Even more reason for you to just strive to have code that uses correct, matching case from the start, in all cases.

SUMMARY.

Long story short, a lot of things can go wrong when you develop in a case insensitive collation but code might ultimately get deployed to a case sensitive database or instance (or both). In most cases the errors will be caught at deployment time, but in a few, the objects could be created successfully anyway, and problems would not be detected until testing/QA time (or, worse, even later). This is one of those things that is fairly easy to get right from the start, and a lot more cumbersome to fix later.

PREVENTION.

There are some code analysis tools that can check for things like this; when I pinged George Mastros (@gmmastros), he agreed that it was a good idea, and is going to try to put it in an upcoming release of SQLCop.

But you can prevent these problems far earlier than that, if you’re willing to do it. I have long been a proponent of developing – at least for code that does not cross databases or servers – on a case sensitive collation. This makes sure these errors are caught up front, and not six months from now when a database is migrated or consolidated. “But won’t that slow me down?” Initially, maybe, a little. But in all honesty, IntelliSense (or tools like Redgate SQL Prompt) should prevent you from ever having to type this stuff correctly on your own anyway.

[See an index of all bad habits / best practices posts]

Aaron Bertrand

Aaron (@AaronBertrand) is a Data Platform MVP with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com.

Aaron’s blog focuses on T-SQL bad habits and best practices, as well as coverage of updates and new features in Plan Explorer, SentryOne, and SQL Server.

Bad Habits: Being CarELesS about cAsE

This can cause problems.

PROOF.

A LITTLE SIMPLER.

SUMMARY.

PREVENTION.

Aaron Bertrand

Software Testing Tutorial

Software Testing Tutorial

Analysis Services Memory Limits

LEAVE A REPLY Cancel reply

Most Popular

7 Best Books for Learning SQL [2024 Edition]

Introduction to Web Scraping

Must Do Coding Questions for Product Based Companies

Algorithm to solve Rubik’s Cube

Recent Comments

EDITOR PICKS

Comparator comparingLong() method in Java with examples

Python – tensorflow.DeviceSpec.make_merged_spec()

Javascript Program To Find Minimum Insertions To Form A Palindrome | DP-28

POPULAR POSTS

Laravel 9 Form Validation Tutorial with Example

Maximum possible sum of squares of stack elements satisfying the given properties

Python | fabs() vs abs()

POPULAR CATEGORY

ABOUT US

FOLLOW US