SQL Server Integration Services (SSIS) is a workflow orchestration service used mostly for data integration and transformation. It was first released as part of SQL Server 2005 and continues to receive updates and new features today. While SSIS can be installed and run on a server (on-premises or in the cloud), it’s also available in Azure Data Factory.
SSIS is commonly used to perform tasks such as:
- Moving data from one database to another
- Exporting files from a database to a file system
- Downloading files from an FTP server
- Extracting data from an operational database, transforming it, and loading it into a data warehouse
- Cleansing and standardizing address data
- Backing up SQL Server databases
- Processing Analysis Services models
While SSIS originally only ran on Windows, it has also been available for Linux since 2017.
Creating an SSIS Package
To use SSIS, you typically create an SSIS project in Visual Studio (or SQL Server Data Tools, depending on your version of SSIS). In Visual Studio 2019, there’s an SSIS extension you must install. For older versions of Visual Studio, there’s a standalone installer of SQL Server Data Tools (SSDT) with the SSIS development environment.
You can also generate SSIS packages programmatically. This is commonly done using .NET scripts or Business Intelligence Markup Language (BIML).
The SSIS extension in Visual Studio provides a visual development environment in which to create an SSIS project. Rather than looking at code, you’ll see boxes representing tasks and lines connecting them.
An SSIS project contains:
- One or more packages
- Zero or more project-level parameters
- Zero or more project-level connection managers
A package is the unit of execution in SSIS. While the technology won’t stop you from putting all your tasks in one big package, the best practice is to make each package for a single purpose. The purpose might be loading data into a certain dimension table or downloading a particular file from an FTP server. Some packages act as orchestration packages, not performing actual work themselves, but calling other packages in a specific order under specified conditions.
Parts of an SSIS Package
The main parts of SSIS package design are the control flow, the data flow, and connection managers. This is illustrated in Figure 1.
Figure 1: The main components of SSIS package design include connection managers, the control flow, and the data flow
Other objects within the package extend its functionality. These include:
- Parameters
- Variables
- Event Handlers
- Configurations
- Logging and Log Providers
The Control Flow
The control flow defines the tasks to be executed in a package and the order in which they should be run. Tasks can be linked together through constraints, which determine the conditions in which a task should execute (often, upon the success of the previously executed task). Precedence constraints include success, failure, or completion of the previous task. You can also define your own logic expression, so the task executes if the expression evaluates to true.
Common control flow tasks include Execute Package Task, Execute Process Task, File System Task, Script Task, Execute SQL Task, and Data Flow Task. Using these tasks together in a package allows you to, for example, copy data from a file server to a database, move the file to an archive folder, and send an email if the execution fails.
Third-party tasks can be added to the tasks included with SSIS. They often provide an easier way to perform actions that would normally require writing C# code in a script task or using obscure advanced configurations for a task. This might include reading the properties of a file, executing a PowerShell script, or decompressing a file.
The Data Flow
A data flow task is a special type of task designed to be executed in the control flow. It uses an in-memory pipeline to move and transform data. Because it requires extra configuration, there is a separate tab in Visual Studio for designing data flows.
There are three types of data flow components: sources, transformations, and destinations. A source component retrieves data from a source, such as a database view or a file on a server. A transformation component can be used to modify, summarize, or cleanse data. Destination components load data into a data store or create an in-memory dataset. Common transformation components include Lookup Transformation, Conditional Split Transformation, Data Conversion Transformation, Row Count Transformation, and Union All Transformation.
In addition to the data flow components included with SSIS, there are third-party data flow components designed to be helpful for development. They may be sources or destinations providing easier or better performing access to a data store. Or they may be transformations performing a commonly useful task such as hashing values, time zone conversion, or replacing unwanted characters. Often, you could create your own method to do this, but using the third-party component may save development time or increase data flow performance.
Figure 2 shows a simple control flow and data flow for an SSIS package. This package might be executed as part of the process designed to load a data warehouse. Its single purpose is to retrieve account data from Microsoft Dynamics 365 and land it in a staging table in a database.
Figure 2: The control flow for an SSIS package truncates a staging table, extracts data from Dynamics 365, and inserts it into the staging table
The control flow has four steps:
- Execute a stored procedure to log the start of package execution
- Truncate the staging table in a SQL Server database
- Execute a data flow task
- Execute a stored procedure to log the end of package execution
The control flow uses all native tasks available in SSIS. After the initial task, each task has a precedence constraint requiring the previous task to complete successfully before it runs.
The data flow task, which is executed as the third step in the control flow, has four steps:
- Retrieve data from the account entity in Microsoft Dynamics 365 (source component)
- Count the number of rows retrieved and store the result in a variable (transformation component)
- Add derived columns using the SSIS expression language (transformation component)
- Insert the resulting data into a table in a SQL Server database (destination component)
The source component in the data flow is a third-party component from SolarWinds Task Factory, and the subsequent components are native SSIS components.
Each data flow can contain multiple sources, destinations, and transformations. You can connect the output of one component to the input of the next component to create various paths. This allows you to create data flows designed to retrieve several files from different locations, combine the data from all the files, remove duplicate rows, and insert the data into a database table. The data flow might also write data to a separate destination if the data flow execution fails.
Executing an SSIS package
SSIS packages can be executed manually within Visual Studio for testing purposes. They’re generally deployed to the SSIS catalog (using a project deployment model) or a file system (using a package deployment model) for automated execution.
It’s common to use the SQL Server Agent to schedule the execution of SSIS packages. Packages can also be executed using a command-line utility, a T-SQL script, or a third-party scheduler. This allows you to trigger the execution of an SSIS package based upon a schedule (e.g., every day at 2 a.m.) or an event (e.g., a file is downloaded to a server).
Speed Up Development Time
If you’re getting started with SSIS development, check out this guide to help you create your first package. As you add more data sources and processes to your SSIS projects, you’ll likely find you need tasks or components that aren’t readily available “out of the box” in SSIS.
Be sure to look at SolarWinds Task Factory. It can save you time in building high-performance SSIS packages and eliminate tedious programming tasks for development teams using SSIS.
Meagan Longoria is an ActualTech Media Contributor, database expert, and business intelligence consultant at Denny Cherry & Associates, blogger, speaker, author, technical editor, and Microsoft Data Platform MVP.