
Using SqlBulkCopy in .NET for Faster Bulk Data Loading

There are times when .NET developers need to bulk load data into SQL Server. If there's one thing I've learned over the years when it comes to data access patterns, it's that we .NET developers like to make the same kinds of mistakes over and over again. That's actually great news, because once you know the alternative strategies, you can apply them time and time again to sort out those pain points!

Ever been in a situation where rumblings of “process X is too slow” suddenly build into a super-high priority ball of urgency when that next step up in data volume hits? Yeah, that can be fun. No, really, it can be fun because we have strategies to sort this stuff out, right?

In this blog post, I’m going to talk about one particular piece of functionality—SqlBulkCopy—that can help you with bulk data loading. If I had to single out my favorite .NET class, SqlBulkCopy would be at the top of the list. My goal is to introduce you to this class so that maybe it can become a part of your tool belt, too.

Time Is Money

All too often, we are developing to tight timescales and against relatively low data volumes. It can be easy to write code to load some data into SQL Server in a way that appears absolutely fine, but soon starts to show painful slowdown when data volumes increase. Did you have a process that should never have to load more than 10,000 rows of data? Yeah… now it needs to load up to 10,000,000 rows.

Time matters. In my experience, the ability to load data quickly can be the difference between winning a new client or losing them to a competitor. The time spent on an efficient approach to data loading can earn you real-world money.

Common Approaches

Some of the ways I’ve seen batches of data loaded into SQL Server from .NET include:

  • Using an ORM, adding items to the context, and then submitting the changes
  • Using a SqlDataAdapter to send a DataTable of items to the SQL Server database via the Update method
  • Passing XML to a stored procedure that shreds the data out and then inserts it into the table
  • Passing a Table-Valued Parameter of data to a stored procedure, which then inserts the data into the table

Typically, these data loads are done in a single-threaded manner. I've seen approaches like these be the cause of particularly poorly performing data loading processes (with the possible exception of the Table-Valued Parameter approach, which is a topic for another day).

Grab Your SqlBulkCopy Cape

SqlBulkCopy is a class in the System.Data.SqlClient namespace (and, in newer projects, Microsoft.Data.SqlClient) that allows you to load data into SQL Server in a very efficient manner. SqlBulkCopy gives you similar functionality from .NET to what the bcp utility provides in SQL Server.

In a nutshell, you point SqlBulkCopy at a SQL Server table in your database and then provide the WriteToServer method with a DataTable, array of DataRows, or an IDataReader, which it will then stream into SQL Server. This is what that looks like in C#:

// `connection` is a SqlConnection that must already be open.
// `myDataTable` is a DataTable containing the rows of data we want to bulk load.
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
{
    bulkCopy.DestinationTableName = "MyDestinationTable";

    // How many rows to send to the database in each batch (here, all of them)
    bulkCopy.BatchSize = myDataTable.Rows.Count;

    // Stream the data to the database; the using block closes everything up
    bulkCopy.WriteToServer(myDataTable);
}
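
If your source data comes from another database, you don't have to materialize it all in a DataTable first; WriteToServer also accepts an IDataReader, and rows are streamed into the destination as the reader is consumed. Here's a minimal sketch of that, where the connection strings, query, and table names are made-up placeholders:

using System.Data.SqlClient;

public static void StreamBetweenServers(string sourceConnectionString, string destinationConnectionString)
{
    using (SqlConnection source = new SqlConnection(sourceConnectionString))
    using (SqlConnection destination = new SqlConnection(destinationConnectionString))
    {
        source.Open();
        destination.Open();

        // Rows are streamed into the destination as they are read,
        // so the full result set never needs to be held in memory.
        using (SqlCommand command = new SqlCommand("SELECT Id, Name FROM dbo.SourceTable", source))
        using (SqlDataReader reader = command.ExecuteReader())
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(destination))
        {
            bulkCopy.DestinationTableName = "dbo.MyDestinationTable";
            bulkCopy.WriteToServer(reader);
        }
    }
}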

Maximizing Throughput

Not only is SqlBulkCopy fast in “basic” usage, but there are also specific strategies you can use with it to gain further reductions in load times. You can use multiple instances of SqlBulkCopy in parallel to load data into the same destination table.

The following are my top tips for high-throughput success with SqlBulkCopy:

  1. Create a new heap table to load into, without a primary key or indexes. It's generally recommended to add indexes after loading, as doing so reduces the amount of work needed to maintain index structures while the data is loading, which improves load times and results in significantly less fragmentation. If loading into an existing SQL Server table, consider disabling or removing any indexes before loading the data. One approach I use often is to load into a heap staging table first, and then use T-SQL to migrate the data across to the final destination table. Doing so allows you to apply insert or update logic and pre-process data within SQL Server, such as establishing foreign key identifiers.
  2. Use the SqlBulkCopyOptions.TableLock option so that the load takes a bulk update lock on the SQL Server table (rather than the default row-level locks). This is a key part of gaining high throughput when loading into a heap table. If you're loading into a table with a clustered index that you cannot remove during the load, consider skipping this option: parallel operations can be blocked in that scenario, unlike in the heap scenario, so it doesn't deliver the same throughput boost.
  3. Create multiple threads (on a single machine or across multiple machines), each with its own SqlBulkCopy instance pointing at the same table, and split the source data across those threads (see the sketch after this list). The bulk update locks taken by these parallel loads are compatible with each other.
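
To make tips 2 and 3 concrete, here's a rough sketch of fanning a set of pre-built DataTable batches out across parallel SqlBulkCopy instances with the TableLock option. The connection string, table name, and the idea of pre-split batches are assumptions for illustration, not a production-ready loader:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

public static void LoadInParallel(IEnumerable<DataTable> batches, string connectionString)
{
    // Each batch gets its own SqlBulkCopy instance (and its own connection,
    // opened by this constructor overload). With TableLock, the bulk update
    // locks taken by each load are compatible on a heap table, so the loads
    // can run side by side.
    Parallel.ForEach(batches, batch =>
    {
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock))
        {
            bulkCopy.DestinationTableName = "dbo.MyHeapStagingTable";
            bulkCopy.BatchSize = batch.Rows.Count;
            bulkCopy.WriteToServer(batch);
        }
    });
}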

Show Me Some Numbers!

I originally blogged about using SqlBulkCopy for high performance bulk loading to SQL Server all the way back in 2010 (yikes!), which goes to show just how well this data loading method has stood the test of time. I’m going to share my raw original results from back then so that you can see how much faster SqlBulkCopy is for data loading.

(Note that you can see the full context of the scenarios I was testing at the time via my blog post links below.)

Scenario: Raw speed to load 100,000 rows

  • SqlDataAdapter approach: 25.07s
  • SqlBulkCopy approach: 1.59s

Reference: High performance bulk loading to SQL Server using SqlBulkCopy 

Scenario: Parallelization comparison of SqlBulkCopy

  • 1 thread loading 20 million rows: 15.41s
  • 4 threads loading 5 million rows each: 12.34s

Reference: SqlBulkCopy to SQL Server in Parallel

Based on the duration I saw for 100,000 rows using a SqlDataAdapter approach, I didn’t want to wait around for 20 million rows, and my laptop at the time wasn’t going to be happy with me.

Takeaways

Over the past decade, I've used SqlBulkCopy many times with success to rework poorly performing data loading processes. Although it can only perform inserts, it can still be part of a strategy for bulk updates: load the data into a staging table, then run an UPDATE statement against the target table. Using SqlBulkCopy can yield massive improvements over looking up each row via an ORM, updating it, and persisting it back via the ORM.
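
As a rough sketch of that staging-table pattern (the table and column names here are invented for the example), the bulk update boils down to two steps:

using System.Data;
using System.Data.SqlClient;

public static void BulkUpdate(DataTable changedRows, string connectionString)
{
    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        connection.Open();

        // Step 1: bulk load the changed rows into a heap staging table.
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
        {
            bulkCopy.DestinationTableName = "dbo.StagingTable";
            bulkCopy.WriteToServer(changedRows);
        }

        // Step 2: apply the changes to the target table in one set-based UPDATE.
        const string updateSql = @"
UPDATE t
SET    t.Name = s.Name
FROM   dbo.TargetTable AS t
JOIN   dbo.StagingTable AS s ON s.Id = t.Id;";

        using (SqlCommand command = new SqlCommand(updateSql, connection))
        {
            command.ExecuteNonQuery();
        }
    }
}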

SqlBulkCopy has been a valuable asset in my tool belt for more than 10 years. If it’s not something you’ve used before, be sure to check it out!

Adrian (@AdaTheDev) is a Lead Development Engineer at SentryOne, focusing on the development of the Data DevOps SaaS product portfolio – namely SentryOne Document and SentryOne Test. Based in the UK, he joined SentryOne from Pragmatic Works in 2018, with 18 years' experience in software development that includes a background in developing SaaS platforms in the hospitality and digital marketing industries. Adrian is a fan of good (bad) puns, dad jokes, and proper British biscuits.
