Friday, November 15, 2024
Google search engine
HomeData Modelling & AIA Complete Introduction to SQL Server Transactions

A Complete Introduction to SQL Server Transactions

One of the most fundamental concepts in any relational database management system (RDBMS), such as SQL Server, is the transaction. During my consulting career, I’ve seen many performance problems caused by developers not understanding how transactions work in SQL Server. In this tutorial, I’ll explain what transactions are, why they’re necessary, and how they work in SQL Server.

What Is a Transaction?

A transaction is a unit of work in the database. Every transaction has a defined starting point and ending point. The ending point may be the transaction committed (i.e., completed successfully) or transaction finished rolling back (i.e., didn’t complete successfully), and I’ll discuss the meaning of those terms a little later.

The basic syntax for transactions is as follows:

  • BEGIN TRANSACTION or BEGIN TRAN: starts the transaction.
  • COMMIT TRANSACTION or COMMIT TRAN: ends the transaction successfully.
  • ROLLBACK TRANSACTION or ROLLBACK TRAN: causes the transaction to end unsuccessfully, so all operations performed in the transaction are reversed.

You can also specify a transaction name, but this isn’t required, and I don’t often see them used.

It’s crucial to think about what a unit of work is. It means all changes to the database within the confines of the transaction, typically DML data modification operations like insert statements, update statements, and delete statements. Depending on the kind of transaction used, it might be a single T-SQL statement or multiple statements. If it’s a single statement, it doesn’t necessarily mean a single change—consider a table with 1,000 rows, and someone runs an UPDATE statement with a WHERE clause. A single statement will cause a change to all the rows in the table and at least 1,000 changes to the database within the transaction.

Even if the UPDATE statement only operates on a single row in the table, there are still at least two changes to the database: the update of the row itself on a data file page and the differential bitmap page to mark the portion of the database as changed (so the next differential backup will back up the extent the page is part of). You can read more about differential backups in the article Native SQL Server Backup Types and How-To Guide. I’ve seen plenty more examples over the years where a single statement can cause many changes to the database depending on data types like varchar, table formats, whether nonclustered indexes exist, and so on.

Why Are Transactions Necessary?

Transactions are part of how SQL Server implements the ACID properties of a database (atomicity, consistency, isolation, and durability) along with mechanisms like locking and logging.

A transaction guarantees its unit of work is either wholly present in the database or wholly not present due to the atomicity in the ACID properties (I’ll explain how this is done later). Transactions are useful for SQL Server developers to control whether a set of operations (e.g., implementing some business logic) completely succeeds. There are no partially executed sets of operations to leave the database inconsistent from a business perspective.

A classic example is moving money (a debit) from a checking account to a deposit account. Implemented as a single transaction, if the debit succeeds and the credit fails, the transaction as a whole fails, and the debit is reversed when the transaction rolls back.

In all cases, changes to the database are performed under locks held by the transaction and not released until the transaction ends. Using the default isolation level, which is called read committed, other transactions won’t see these changes until the transaction has committed (ended), hence the name of the isolation level—the isolation in the ACID properties.

For instance, a change to a row will involve the row being exclusively locked by the transaction. Another transaction wanting to read the row will usually require a share lock on the row, which will also be blocked. This behavior can be changed if the reading transaction changes to the read uncommitted isolation level (or uses the NOLOCK option on the SELECT statement), which doesn’t require share locks for reading rows but introduces the possibility of anomalies occurring.

Types of Transactions in SQL Server

There are three basic types of transactions in SQL Server:

  1. Explicit transactions, as the name suggests, must be explicitly started with a BEGIN TRANSACTION statement and explicitly ended with either a COMMIT TRANSACTION statement or a ROLLBACK TRANSACTION statement. In other words, the SQL Server developer controls when the unit of work is committed or not.
  2. Autocommit transactions are when the developer doesn’t control the starting and ending points of the transaction. Each T-SQL statement is its own transaction SQL Server begins and commits automatically under the covers. There’s no concept of making a change to a SQL Server database without a transaction starting, as SQL Server must have the ability to roll back the change if something goes wrong.
  3. Implicit transactions are when a transaction is automatically started by SQL Server as soon as a change is made to the database but remains active until it’s explicitly ended. At this point, a new transaction starts automatically. This behavior isn’t the default and must be specifically enabled using a SET IMPLICIT_TRANSACTIONS statement, which isn’t normally done except to allow behavior compatibility with another RDBMS where this is the default behavior. I’ve seen this problem when developers don’t realize implicit transactions are enabled and don’t think they need to explicitly commit the transaction (more on this in the “common mistakes” section below).

With all three of these transaction types, if SQL Server encounters a problem, the entire transaction will automatically roll back.

There are also two more advanced kinds of transactions in SQL Server, though they’re beyond the scope of this article:

  1. Batch-scoped transactions are only used during Multiple Active Result Sets sessions.
  2. Distributed transactions are used when a local transaction needs to coordinate with multiple SQL Server instances, such as running stored procedures with business logic on different servers, using a Distributed Transaction Coordinator, or by the service itself on Azure Managed Instance.

How Does Commit Work in SQL Server?

Consider a simple example of an explicit transaction inserting a record into a table using the code:

The insert statement causes some locks to be acquired, which provides the isolation portion of the ACID properties of a database. These will only be released once the transaction has committed. When the COMMIT TRANSACTION statement executes, I know the insert is now durable. But how does this actually happen?

All changes to a database are logged. Simply put, when a change is made to a data file page, a description of the change is generated (called a log record) and entered into the database transaction log. Also, when a transaction begins or commits, a log record generates. Our simple explicit transaction will now have three log records in the transaction log (all with the same transaction ID), one for each of the three statements executed. In fact, if I’d used an autocommit transaction instead of an explicit transaction (executing just the insert statement), SQL Server would’ve automatically started and committed the transaction. There would still be three log records in the transaction log for the transaction. One interesting fact you might not know is SQL Server usually names transactions it starts. In this case, the name would be “INSERT.”

When a transaction commits, SQL Server has to make sure all the log records for the transaction are in the transaction log on disk and not just in memory, so in the event of a crash, the transaction can be replayed, guaranteeing its durability. It does this by making sure the transaction log in memory up to the log record for the COMMIT TRANSACTION is flushed to disk before the commit is acknowledged back to the user or application. The sequence of operations when a commit occurs is as follows:

  • Make sure the log is flushed to disk
  • If there’s a synchronous database mirror or synchronous availability group replica, make sure the log is also written to disk for their log files on the remote servers
  • Release the locks the transaction is holding
  • Acknowledge the commit has happened

There’s no need to also flush the changed data file pages to disk at this point, as the transaction has already been made durable by making sure the descriptions of all the changes are on disk. The data file pages will be written out later by a checkpoint operation—a topic for a future article.

How Does Rollback Work in SQL Server?

When a transaction must be rolled back, all operations included in the transaction are essentially reversed, so none of the data modifications from the transaction are present in the database. As the log records for a transaction are linked together in reverse order, the transaction log allows the transaction’s changes to be undone in reverse order.

Consider another simple example:

At this point, there are three log records for the transaction. If I then decide to execute a rollback command, SQL Server does the following:

  1. Finds the most recent log record for the “forward” part of the transaction, works out what operation will undo the change described by the log record, performs the operation, and generates a log record
  2. Finds the previous log record, pointed to by the “previous log record” LSN
  3. Repeats until the begin log record is reached. At this point, the rollback is complete, so another log record is generated to indicate the transaction has successfully aborted, and three more log records will generate for my example

As you can see, rolling back a transaction takes a lot of work under the covers.

It’s also possible to define a savepoint using the SAVE TRANSACTION statement and roll back to the named point in the transaction rather than rolling the entire transaction back.

Common Mistakes Capable of Causing Transaction Log Problems

The first mistake is to forget to commit a transaction, meaning everything subsequently happening on the connection is part of the same transaction. As more changes are made, more log records are generated, and more transaction log space is required. You can’t reuse the space used to store log records from earlier in the transaction (i.e., allowing the log to truncate), as those log records must remain in case the transaction rolls back (and they’re needed for the mechanism I described above). The transaction log will likely grow… and grow… and grow, until someone finally commits the long-running transaction and allows the log to be brought back under control.

The second mistake is to inadvertently execute some code that does a lot more work than you thought—for instance, performing an update on a very large table (e.g., a billion rows) and forgetting a WHERE clause. For every row updated, there’s at least one log record generated, so there will be at least a billion log records generated for the transaction, which will likely cause explosive transaction log growth. A DBA who doesn’t know how rollback works might be tempted to cancel the update immediately. However, a knowledgeable DBA will know rolling back a long-running transaction will generate at least the same number of log records as have already been generated. Rolling back in this example could take a lot more time, so you may decide the prudent course of action is to complete the update.

If you have a transaction log seemingly growing out of control, you can see why by running this code:

If one of these two mistake scenarios is the culprit, the output will look like this:

If not, you can read about the other possible values and what they mean in the Microsoft documentation here, in the Factors that can delay log truncation section.

Importance of Understanding Transactions in SQL Server

It’s essential to understand what transactions are and design your code to make sure it appropriately implements your business logic. Not understanding the internals I’ve described can lead to mistakes and cause problems for database administrators. In my experience, database administrators need to know how transactions work and what potential mistakes exist so they can troubleshoot issues around the transaction log more effectively.

There are many more facets to using transactions, such as specifying isolation levels and designing efficient code. Still, I hope this initial primer has given you a good grounding in why transactions are needed and how they work. I know it’s a bit of a cliché, but with SQL Server, it’s definitely a case of “the more you know, the further you’ll go!”

To gain fuller visibility into SQL Sentry transactions and performance, SolarWinds® SQL Sentry® is built to provide quick access to important metrics you can use to analyze even the most challenging issues. You can request a quote or download a free trial of SQL Sentry here.

Paul S. Randal is the CEO of SQLskills.com, which he runs with his wife Kimberly L. Tripp. Both Paul and Kimberly are widely-known and respected experts in the SQL Server world, and both are long-time SQL Server MVPs. Paul was a Contributing Editor for TechNet Magazine, where he wrote the bi-monthly SQL Q&A column and feature articles. He also had #1 top-rated workshops and sessions at the PASS Summit and TechEd. Paul is active in the SQL Server community, from user groups to online forums to helping out on Twitter (@PaulRandal – check out the #sqlhelp tag). His popular and widely-referenced blog can be found at https://www.sqlskills.com/blogs/paul/ and he can be reached at paul@sqlskills.com.

RELATED ARTICLES

Most Popular

Recent Comments