Have you been tasked with making queries run faster? Indexes can often help with query performance, but do you know where to start?
Good news! I recently participated in a SentryOne webinar with SQL Server expert, Sebastian Meine, Ph.D., on “SQL Server Indexing for Developers and Accidental DBAs.” In the interactive session, Sebastian answered common index questions like:
- Which columns should I add to my index?
- How many indexes should I add to my new table?
- Does the column order in my index have to match that in my query?
- Does it hurt to have too many indexes?
- When should I consolidate Indexes?
- Are there queries that get slower after I create an index?
He also explained how indexes are organized in SQL Server and what mechanism is responsible for the amazing performance gains you can achieve with them.
If you missed the live webinar, view the on-demand version here.
During the webinar, we received questions from attendees. We’ve shared the answers below.
Webinar Q&A
Q: Will using table variable in UDF kill performance?
Sebastian: That’s not necessarily an indexing problem. Both UDFs and table variables have their own performance problems. And giving you advice without knowing more is a little difficult. In general, try to reduce copy operation (moving data from one table to another). Also, if you have more than 100 rows in the table variable, consider using a temp table instead.
Kevin: Table variables are notorious for not having useful statistics, always assuming either 100 rows (for later releases of SQL Server) or 1 row (for earlier). The more rows stored in your table variable, the worse performance might be. On the UDF side, it’s possible that the UDF may cause the query optimizer to only do scans on a given indexed column or otherwise thwart good performance. Read the execution plan to see more about your performance problem. Naturally, I recommend using Plan Explorer.
Q: If I’m currently dealing only with pages in memory, when the page is evicted from memory is the data written to the owning data table?
Sebastian: The table consists of pages. So, when we write to the page, we actually write to the table (and vice versa). When the page is evicted, it gets written to disk (assuming it was changed).
Kevin: SQL Server allows us to write to pages in memory without those pages being written to disk. Pages in this state are known as dirty pages. SQL Server has an internal process called the checkpoint, which you can also invoke manually, to write dirty pages to disk. Checkpoints also occur at the interval set by sp_configure setting recovery interval and at other times when SQL Server deems it necessary. That’s probably more than you wanted to know. But the bottom line is that SQL Server will never evict a page from memory, if it has changed, without writing those changed values to disk.
Q: What is the selective indexed column and unselective indexed column? What is the effect of performance?
Sebastian: The selectivity of an index is a measure for how unique the index key is. A highly selective index would be a unique index. An unselective index example would be an index on a binary flag. In general, the more selective, the more effective an index is (but as always, there are exceptions).
Q: We have Always On, primary node for application (heavy DMLs), and secondary node for reporting (select statements). Is it possible to add more indexes in node2 only for SELECT queries?
Sebastian: I’m not aware of a way to do that. If you use transactional replication, additional indexes on the subscriber side are possible. Always On requires the databases to be in the same state.
Q: Why are the pages limited to 8K? Wouldn’t larger pages minimize the data movement and page splits?
Sebastian: Pages are organized in blocks of 8 called an “extent.” And there’s a setting that allows you to have that be the smallest unit per table. However, in general, I’d leave it at the default setting.
Kevin: Definitely leave those at the default. For what it’s worth, other database platforms allow you to define your own page sizes. SQL Server doesn’t. Why? The answer is that SQL Server is very closely tied to the underlying Windows OS and, for many years, actually ceded control of those aspects of the system. On the other hand, other database platforms like Oracle and IBM DB2 ran on dozens or hundreds of OS’es, each with their own file control and management subsystems. Consequently, you’ll see some discussion among DBAs of those platforms about what file/block/segment size is optimal. Thankfully, we don’t have to worry about that issue because it was settled long ago as an artifact of the Windows-based heritage of SQL Server.
Side-bar: Windows OS and SQL Server support another memory model called Large Pages. It is useful for some situations on high-end, dedicated SQL Server machines with lots of RAM. It can make things a lot worse when implemented recklessly. Read more about large pages from the OS viewpoint here and about implementing in in SQL Server here. You’ll also need to enable a setting, depending on the version of Windows, called Lock Pages in Memory. Again, this is a very temperamental setting. So, don’t go here without testing to confirm its value.
Q: It would seem that one should avoid include columns within indexes. Do you agree?
Sebastian: Everything is a trade-off. So, yes, including columns in an index, particularly as part of the key (instead of using the INCLUDE clause), is expensive, but it is often still the right thing to do.
Kevin: I don’t have an aversion to using INCLUDE columns. But I do have an aversion to constructing a hypothesis without thorough testing. In cases like this, adding an INCLUDE column to an index might vastly improve the performance of a single query and the entire workload of a SQL Server. But the only way to be sure is to test.
Q: How do Index Rebuilds and Reorganize affect the pages?
Sebastian: That is the topic of another webinar. In general, SQL Server will try to clean things up as much as possible. On a rebuild, it will also take the page fill factor into consideration. And it will reduce skew (which doesn’t really have a performance impact).
Kevin: It depends. I always laugh when I answer a question with this response, because it’s always true for every technology question. 😊 There are some minor differences in the two when it comes to handling pages, with most of the differences happening under the REORGANIZE process, not the REBUILD process. Read more from SQL Server uber-genius, Paul Randal, in his blog post, “SQLskills SQL101: REBUILD vs. REORGANIZE.”
Additional Resources
I’d like to extend a special thanks to Sebastian for participating in this webinar and answering attendees’ questions.
If you haven’t already, I hope you’ll view the SQL Server Indexing for Developers and Accidental DBAs webinar on-demand at your convenience. Here are some additional resources for further information and best practices.
Kevin (@kekline) serves as Principal Program Manager at SentryOne. He is a founder and former president of PASS and the author of popular IT books like SQL in a Nutshell.
Kevin is a renowned database expert, software industry veteran, Microsoft SQL Server MVP, and long-time blogger at SentryOne. As a noted leader in the SQL Server community, Kevin blogs about Microsoft Data Platform features and best practices, SQL Server trends, and professional development for data professionals.