One of the easier ways to improve the performance of your SQL Server and Azure SQL database queries is to ensure you choose the right data types for your data, and the data types in your application’s code match the ones in your stored procedures and queries. Choosing the right data type conserves space, because doing something like choosing a variable character type for data of fixed, regular length like a phone number or national ID number is wasteful. Other reasons for choosing your data types carefully will be addressed later in this post.
What Are Data Types and What Do They Mean?
In SQL Server, every column, variable, expression, and parameter is associated with a data type. A data type defines the characteristics and storage of data in that column. Some common data types are INT for integers, CHAR and VARCHAR for strings (character data), and DATE and DATETIME for date and time data. SQL Server also provides the ability to define custom data types in .NET, but in this blog post, you’ll learn about the built-in data types in the engine.
Different data types consume different amounts of space on disk. For example, integer data presents three different options for storage, described in Table 1.
Data type |
Range |
Storage |
TINYINT |
(0-255) |
1 Byte |
SMALLINT |
(-32,768-32,767) |
2 Bytes |
INT |
(2,147,483,6480- 2,147,483,647) |
4 Bytes |
BIGINT |
(-9,223,372,036,854,775,808- 9,223,372,036,854,775,807) |
8 Bytes |
Table 1: Integer data types
The range of storage required for each of those data types is widespread, so choosing the right data type is a trade-off in space and business requirements. It’s important to include the business decision, because changing the data type of a column (especially the key column in a clustered index) can require downtime to change the data type.Table 1: Integer data types
Fixed vs. Variable Length Datatypes
SQL Server has both fixed and variable length data types, each with unique storage requirements. Thus, strings can be stored in the CHAR, VARCHAR, and NVARCHAR data types. Storing the letter “a” in a char(1) field requires a single byte, but storing this same “a” in a varchar(1) field would require two additional bytes for overhead. The expansion of space can be even greater if you encode your strings in Unicode (international data with special characters). Unicode’s UTF-32 format requires four bytes per character.
Note: The use of varchar(1) is just an example to highlight the data storage requirements of the varchar data type. Varchar is typically used with larger values such as names or email addresses.
Fixed width data types should be used for data with a known length: phone numbers for a particular country, U.S. Social Security numbers, credit card numbers, and so on. Variable length fields should be used for values without static lengths, such as names, email addresses, and street names.
Why Did the Development Team Use All MAX Values?
A common issue DBAs may encounter when working with third-party applications is all the fields in tables use VARCHAR(MAX) or NVARCHAR(MAX) data types. While this design is easy, it’s inefficient and can lead to poor performance. The common reason for this is the way SQL Server used to handle text values that were too large for the data type. SQL Server has historically experienced difficulties that have led to many applications using vastly oversized data types. One manifestation is the dreaded 8152 error message shown in Figure 1.
Figure 1: Data overflow error message (SQL Server, pre-2016)
Figure 1 shows a very basic example of SQL Server behavior when you overflow a text column. Although in this instance the failure is obvious, you can imagine the difficulty of trying to find the record in a bulk loading process that causes this error. Thankfully, Microsoft resolved the error in the SQL Server 2016. Now, your developers can see the truncated value, as shown in Figure 2. This allows you to more easily troubleshoot potential data issues and avoid the blind oversizing of data types that happened in the past.
Figure 2: Data overflow error message (SQL Server, 2016 or later)
What’s the Impact of Using the Wrong Data Types?
You may ask why data types matter, since we’re only talking about a handful of bytes of data. And, after all, storage is cheap nowadays. So, really, who cares whether database objects have wasteful data types? There are a couple of pretty significant performance issues here. The first is called implicit conversion. I’ll use a very simple example:
This example creates a table called numbers with an INT column and inserts a record with the value 1. The subsequent SELECT command quotes the number, so SQL Server accepts the value as character data. But because the V1 column is INT, before the engine can retrieve the value from the table, it has to convert the character into an INT value. This process, known as implicit conversion, adds CPU overhead and can limit the options the query optimizer has to retrieve the data. While this basic example runs quickly, implicit conversions can dramatically impact the performance of your application. You should ensure all your application code variables match the data types of your parameters in SQL Server to avoid this performance drain.
The other negative impact on performance is the sheer size of the table. Incorrect data types make your table larger than it needs to be, which affects many downstream operations such as memory grants, page splits, and writes to the transaction log. Each of these activities changes the behavior of SQL Server in different ways, which limit resources available to other queries in your database, and thus in turn limits the concurrency of your application performance.
Size Your Objects
As you can see, properly sizing your database objects is important for storage efficiency and the overall performance of your server. When you start researching database design tools, look at SolarWinds. Its free DBA xPress tool can help provide the necessary visibility into your data, keeping your pipeline humming. Download the free tool to see what it can do for you.
Joey D’Antoni is an ActualTech media contributor and a principal consultant at Denny Cherry and Associates, Microsoft Data Platform MVP, and VMware vExpert.