Friday, November 29, 2024

Table-Valued Parameters vs. SqlBulkCopy for Bulk Loading Data into SQL Server

In my previous blog post, I talked about SqlBulkCopy and how it’s been a useful part of my toolbelt as a .NET developer since I first discovered it in 2010. I also touched on some other common approaches to bulk loading data into SQL Server from .NET that I’ve seen over the years, which, with perhaps one exception, have all been associated with performance issues.

The exception is Table-Valued Parameters (TVPs). I was deliberately vague regarding this approach in my previous blog post because it deserves a deeper dive and a comparison to SqlBulkCopy. Both are particularly useful approaches to know about. We’re going to look at two specific things:

  1. Do both the TVP approach and SqlBulkCopy support the same use cases?
  2. How do they perform in a side-by-side comparison for bulk loading?

What Are Table-Valued Parameters?

TVPs were introduced in SQL Server 2008 and provide a means to pass a structured data set into a SQL statement, stored procedure, or function. They make it easier to perform set-based operations by letting you pass multiple rows of data in through a single table-typed parameter and then work with that set.

Let’s cover a simple scenario that will form the basis of the load test comparison later in this blog post.

Example Usage

Step 1: Create a User-Defined Table Type

The user-defined table type defines the structure of the data, which then allows you to pass sets of data into/around your SQL statements and routines.

CREATE TYPE dbo.TestData AS TABLE
(
    [Index] INTEGER,
    [ImportantDate] DATETIME,
    [SomeText] VARCHAR(30)
);

Step 2: Define a TVP

The user-defined type is used as the data type for a parameter—in this case, a stored procedure, but it could equally be an ad hoc, parameterized SQL statement. Note that READONLY must be specified, which means you cannot modify the data in that table parameter in the routine, so INSERTs/UPDATEs/DELETEs are not allowed on it.

Here, we’re simply copying all the data from the TVP into a target SQL table.

CREATE PROCEDURE dbo.LoadTvp
    @Data dbo.TestData READONLY
AS
BEGIN
    INSERT INTO dbo.LoadTest ([Index], [ImportantDate], [SomeText])
    SELECT [Index], [ImportantDate], [SomeText]
    FROM @Data;
END
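Before moving to .NET, the procedure can be exercised directly from T-SQL by declaring a variable of the table type (a quick sanity-check sketch, assuming the dbo.LoadTest target table already exists):

```sql
DECLARE @Rows dbo.TestData;

INSERT INTO @Rows ([Index], [ImportantDate], [SomeText])
VALUES (1, '2024-01-01', 'first row'),
       (2, '2024-01-02', 'second row');

EXEC dbo.LoadTvp @Data = @Rows;
```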

Step 3: Passing a TVP from .NET

Most, if not all, ORMs support TVPs, but we’re just going to use raw ADO.NET here. 

// Define a DataTable that matches the schema of the user-defined table type.
var dataTable = new DataTable();
dataTable.Columns.Add("Index", typeof(int));
dataTable.Columns.Add("ImportantDate", typeof(DateTime));
dataTable.Columns.Add("SomeText", typeof(string));

// Populate the dataTable with some values here…

using (var connection = new SqlConnection("…..connection string…."))
{
    connection.Open();

    using (var command = new SqlCommand("dbo.LoadTvp", connection))
    {
        command.CommandType = CommandType.StoredProcedure;

        // Add the Table-Valued Parameter using the SqlDbType.Structured type.
        // TypeName is inferred for stored procedures, but must be set explicitly
        // when using an ad hoc, parameterized SQL statement.
        command.Parameters.Add(new SqlParameter("@Data", SqlDbType.Structured)
        {
            TypeName = "dbo.TestData",
            Value = dataTable
        });

        command.ExecuteNonQuery();
    }
}

We’re using a DataTable as the source to pass in as the TVP, but you can also provide a DbDataReader or an IEnumerable&lt;SqlDataRecord&gt;. These would allow us to stream data out of one data source (e.g., a database or a file) and into the TVP, rather than loading the whole set into memory, as we are doing with the DataTable for the purposes of this scenario.
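As a rough sketch of that streaming variant (the GenerateRows method and row count here are hypothetical), the same parameter can be fed from an iterator that yields SqlDataRecord instances, so only one row is materialized at a time:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using Microsoft.SqlServer.Server;

static IEnumerable<SqlDataRecord> GenerateRows(int count)
{
    // A single SqlDataRecord can be reused; ADO.NET reads each row as it is yielded.
    var record = new SqlDataRecord(
        new SqlMetaData("Index", SqlDbType.Int),
        new SqlMetaData("ImportantDate", SqlDbType.DateTime),
        new SqlMetaData("SomeText", SqlDbType.VarChar, 30));

    for (var i = 0; i < count; i++)
    {
        record.SetInt32(0, i);
        record.SetDateTime(1, DateTime.UtcNow);
        record.SetString(2, "row " + i);
        yield return record;
    }
}

// Passed exactly like the DataTable, but rows stream through as they are produced:
// command.Parameters.Add(new SqlParameter("@Data", SqlDbType.Structured)
// {
//     TypeName = "dbo.TestData",
//     Value = GenerateRows(1_000_000)
// });
```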

Use Cases: Table-Valued Parameters vs. SqlBulkCopy

One thing that might stand out immediately is the extra potential that the TVP approach can offer in terms of functionality. SqlBulkCopy is purely a bulk insert operation that can only feed data directly into a physical table in your database, whereas the TVP approach gives you the ability to pass a set of data into a SQL routine that can then use that data set how it likes.

For example, you could pass a list of IDs into a stored procedure for it to then look up and return in a single query. Or, you could implement more complex loading logic inside the SQL routine that receives the data, such as handling rows that might already exist in the target table via an insert-or-update approach. This flexibility really makes TVPs a great option.
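For instance, an insert-or-update variant of the earlier procedure might look like this (a sketch, assuming [Index] uniquely identifies a row in dbo.LoadTest):

```sql
CREATE PROCEDURE dbo.UpsertTvp
    @Data dbo.TestData READONLY
AS
BEGIN
    -- Update rows that already exist in the target table.
    UPDATE t
    SET t.[ImportantDate] = d.[ImportantDate],
        t.[SomeText] = d.[SomeText]
    FROM dbo.LoadTest t
    INNER JOIN @Data d ON d.[Index] = t.[Index];

    -- Insert rows that don't.
    INSERT INTO dbo.LoadTest ([Index], [ImportantDate], [SomeText])
    SELECT d.[Index], d.[ImportantDate], d.[SomeText]
    FROM @Data d
    WHERE NOT EXISTS (SELECT 1 FROM dbo.LoadTest t WHERE t.[Index] = d.[Index]);
END
```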

For the performance comparison we’re going to run through next, we’re focusing on a like-for-like process of bulk inserting data into a destination table with no primary key or indexes—essentially a continuation of the same scenario we looked at in my previous blog post to see how TVPs behave in the same context.

Bulk Load Performance Comparison: Table-Valued Parameters vs. SqlBulkCopy

The Microsoft documentation on TVPs mentions that when working with smaller volumes of up to around 1,000 rows, TVPs can have better performance compared to BULK INSERT operations, which have a greater startup cost. In this section, we’re going to put that to the test to see how the two approaches compare in terms of raw bulk load performance in the following scenarios:

  • As the number of rows to insert increases in orders of magnitude from 100 rows up to 10 million rows
  • As we introduce parallelism to the loading process (which we know is a benefit of SqlBulkCopy)
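The parallel scenarios follow the general shape below (a simplified sketch, not the actual test harness; it assumes the data has already been split into per-task DataTable partitions, and reuses the dbo.LoadTest table from earlier):

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

static Task BulkLoadInParallel(IReadOnlyList<DataTable> partitions, string connectionString)
{
    // One SqlBulkCopy per task. With TableLock specified, each writer takes a
    // bulk update (BU) lock, and a heap table allows multiple BU locks at once,
    // so the inserts can proceed concurrently.
    var tasks = new List<Task>();
    foreach (var partition in partitions)
    {
        var rows = partition;
        tasks.Add(Task.Run(() =>
        {
            using (var bulkCopy = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock))
            {
                bulkCopy.DestinationTableName = "dbo.LoadTest";
                bulkCopy.WriteToServer(rows);
            }
        }));
    }
    return Task.WhenAll(tasks);
}
```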

For this test, we’re using a test harness that will run a suite of scenarios for both approaches in a consistent manner by:

  1. Recreating the test database from scratch, initializing the data and log file sizes to be big enough to avoid file growth being triggered during the test.
  2. Generating the rows of data for that run up front entirely in-memory into a DataTable. The data size generated will be identical for each test scenario and is made up of a unique INTEGER field, a DATETIME, and a VARCHAR.
  3. Loading the data into an empty heap table, with no indexes.
  4. Recording the timing from the moment the loading of the in-memory data starts to when it completes—purely the time to push data through the TVP or SqlBulkCopy loading process into the final table in a SQL Server 2019 database (excluding the time taken to generate the data).

The tests were run several times to check the overall consistency of the trends. Fluctuations are inevitable across runs, especially with lower data volumes, but the trends remain consistent. The full source for the test harness can be found here.

The results are as follows, with the fastest time for each data volume marked with an asterisk. (Note that all times are in milliseconds.)

                                                    Number of Rows
Approach                       |     100 |   1,000 |  10,000 | 100,000 | 1,000,000 | 10,000,000
-------------------------------|---------|---------|---------|---------|-----------|-----------
SqlBulkCopy (Single Task)      |  59.490 |  55.835 |  86.488 | 191.372*|  1522.548 |  13895.131
TVP (Single Task)              | 26.718* | 55.249* | 57.798* | 385.262 |  3408.420 |  35243.123
SqlBulkCopy (4 Parallel Tasks) |  93.034 |  76.628 |  91.925 | 227.001 |  933.903* |  6545.935*
TVP (4 Parallel Tasks)         | 102.803 |  95.545 | 465.692 | 593.250 |  5169.282 |  49308.048

If we graph these results, we can better visualize and compare the approaches. The graphs below use the same data as the table above: the time taken to load each volume of rows into SQL Server, for each approach.

In both graphs, the X-axis for Number of Rows uses a logarithmic scale. But the difference between the graphs comes on the Y-axis. In the first graph, the Y-axis uses a linear scale, which helps to visualize just how much better SqlBulkCopy performs as the data volumes increase when compared to the TVP approach. In the second graph, the Y-axis uses a logarithmic scale, which helps to highlight the overall picture, including at the lower data volumes, and shows that there is a linear trend as data volumes increase up to 10 million rows. As the data volume increases by an order of magnitude, so does the duration.

[Graph 1: Load Times by Data Volume (linear Y-axis)]

[Graph 2: Load Times by Data Volume (logarithmic Y-axis)]

At lower volumes of data, as the Microsoft documentation indicates, we do see better performance using the TVP approach. As the data volume increases, SqlBulkCopy delivers better throughput, even more so when parallelism is introduced.

In my previous blog post, we saw that SqlBulkCopy supports multiple bulk update locks across multiple processes simultaneously when loading into a heap table. This really shows here, with SqlBulkCopy more than 3 times faster to load 1 million rows than the TVP approach and more than 5 times faster for 10 million rows. The TVP approach did not benefit from parallelization.

Does This Mean SqlBulkCopy Is Better Than Table-Valued Parameters?

No! As is so often the case, it depends. Here, we’ve looked at one specific comparison of the performance for a raw bulk load into a heap table. If you aren’t loading data into a heap table, then you’d lose some of the optimizations SqlBulkCopy has to offer. There are many ways to achieve a goal and many variables that come into play when weighing the best approach for a given situation. The key is knowing the options that are at your disposal.

TVPs can be a great way to implement data processes in a set-based/batched manner, especially when working with smaller sets of data. They offer greater flexibility and can be more in line with existing data access patterns, especially given that ORMs support them natively. 

Adrian (@AdaTheDev) is a Lead Development Engineer at SentryOne, focusing on the development of the Data DevOps SaaS product portfolio – namely SentryOne Document and SentryOne Test. Based in the UK, he joined SentryOne from Pragmatic Works in 2018, with 18 years’ experience in software development that includes a background in developing SaaS platforms in the hospitality and digital marketing industries. Adrian is a fan of good (bad) puns, dad jokes, and proper British biscuits.
