Saturday, January 4, 2025
Google search engine
HomeData Modelling & AIFour Practical Use Cases for Grouped Concatenation

Four Practical Use Cases for Grouped Concatenation

In my two recent SQLPerformance.com posts on grouped concatenation (Grouped Concatenation in SQL Server and Grouped Concatenation : Ordering and Removing Duplicates), I dealt with some pretty frivolous use cases: flattening pet names and coffee orders. I did not mean to trivialize this, as there certainly are some real and useful scenarios where grouped concatenation can be very helpful.

Let’s forget about performance for a moment, or even user-facing processes at all. A lot of tasks are admin-related and require at least the concept of looping, if not an explicit loop or cursor (or multiple). Think about how you would do these things in T-SQL otherwise:
 

1. Search all character-based columns for a specific string
The search ultimately needs to check every string column in every table, but as a bonus, we want to scan each table only once. (Most scripts I’ve found out there generate a distinct SELECT ... WHERE col LIKE ... query for each relevant column in a table, and since it’s likely that a wildcard-like search is going to scan anyway, may as well only scan once.)

 

DECLARE 
  @SchemaName NVARCHAR(128)  = NULL,
  @TableName  NVARCHAR(128)  = NULL,
  @SearchTerm NVARCHAR(4000) = N'%foo%';
 
DECLARE @s NVARCHAR(MAX) = N'';
 
;WITH t AS
(
    SELECT 
      t.[object_id], 
      [table] = t.name, 
      [schema] = s.name
    FROM sys.tables AS t
    INNER JOIN sys.schemas AS s ON t.[schema_id] = s.[schema_id]
    WHERE (t.name = @TableName  OR @TableName  IS NULL)
      AND (s.name = @SchemaName OR @SchemaName IS NULL)
      AND EXISTS
      (
        SELECT 1 FROM sys.columns
          WHERE [object_id] = t.[object_id]
          AND system_type_id IN (35,99,167,175,231,239)
      )
)
SELECT @s = @s + N'SELECT N''' 
    + REPLACE([schema],'''','''''')
    + '.' + REPLACE([table], '''','''''') + ''',* 
 FROM ' + QUOTENAME([schema]) + '.' + QUOTENAME([table]) 
 + '
 WHERE ' + STUFF((SELECT '
 OR ' + QUOTENAME(name) + ' LIKE ' + CASE 
      WHEN system_type_id IN (99,231,239) 
      THEN 'N' ELSE '' END
      + '''' + @SearchTerm + ''''
    FROM sys.columns
    WHERE [object_id] = t.[object_id]
    AND system_type_id IN (35,99,167,175,231,239)
    ORDER BY name
    FOR XML PATH(''), TYPE
).value(N'.[1]', N'nvarchar(max)'),1,6,'') + ';
 
'
FROM t;
 
PRINT @s;
-- EXEC sp_executesql @s;

 

Now, the PRINT command is limited to 8K of output, so you may not see the entire command there. Don’t worry, it’s all there (see this tip for more information).

 

2. Remove repeated values in an existing comma-separated list
Someone might send you data in the form bob, sandy, tracy, bob, tracy, frank, tracy. If your job is to output bob, frank, sandy, tracy, I have a solution for you that does not require messy parsing or looping. Just a split function (see this post for several examples and performance comparisons) and then some ordered concatenation to put it back together. Assuming you have a split function that takes a list and a delimiter as arguments:

DECLARE @names NVARCHAR(4000) = N'bob,sandy,tracy,bob,tracy,frank,tracy';
 
;WITH x AS
(
  SELECT Item 
    FROM dbo.SplitStrings(@names, N',') 
    GROUP BY Item
)
SELECT STUFF((SELECT N',' + Item
  FROM x
  ORDER BY Item
  FOR XML PATH(''), TYPE
).value(N'.[1]',N'nvarchar(max)'),1,1,'');

 

If they can send the data to you as a table-valued parameter, that would be even better:

CREATE TYPE dbo.Names AS TABLE(Name NVARCHAR(128));
GO
 
DECLARE @table dbo.Names;
 
INSERT @table(Name) VALUES('bob'),('sandy'),('tracy'),
    ('bob'),('tracy'),('frank'),('tracy');
 
SELECT STUFF((SELECT DISTINCT N',' + Name
  FROM @table
  ORDER BY N',' + name
  FOR XML PATH(''), TYPE
).value(N'.[1]',N'nvarchar(max)'),1,1,'');

 

 

 

3. Generate a random string with a defined set of characters
Often people will want to generate a random string based on pulling characters out of a GUID or from a series of characters. What about a random password generator that’s made up of a defined set of characters, like A-Z, a-z, 0-9, and any of !@#$%^&*()?

 
If we just want characters that are valid in a GUID, that’s no problem. Let’s say we want an 8-character password; we can generate random strings like this:

SELECT LEFT(CONVERT(CHAR(36),NEWID()),8);

 

But for a wider range of characters, we need something a little more robust. Let’s say we define our character set like above, as upper and lower alpha, digits from 0 to 9, and then !@#$%^&*(). We can fill a table variable with the set of values we want to choose from, then concatenate the top 8 values ordered by NEWID(), for example:

DECLARE @c TABLE(c CHAR(1));
 
INSERT @c(c) 
SELECT CHAR(number)
  FROM [master].dbo.spt_values 
  WHERE [type] = N'P' AND
  (
    number IN (33,94) 
    OR number BETWEEN 35 AND 38
    OR number BETWEEN 40 AND 42
    OR number BETWEEN 48 AND 57
    OR number BETWEEN 64 AND 90
    OR number BETWEEN 97 AND 122
  );
 
;WITH x AS 
(
  SELECT TOP (8) c 
    FROM @c 
    ORDER BY NEWID()
)
SELECT
(
  SELECT c FROM x FOR XML PATH(''), TYPE
).value(N'.[1]',N'nvarchar(max)');

 

For the above string, each character in your allowed set can only appear once. If you want to allow characters to appear more than once, you can just break the composition up, and re-order it:

;WITH x AS 
(
  SELECT TOP (8) c FROM 
  (
    SELECT TOP (4) c FROM @c ORDER BY NEWID()
    UNION ALL
    SELECT TOP (4) c FROM @c ORDER BY NEWID()
  ) AS sub ORDER BY NEWID()
)
SELECT 
(
  SELECT c FROM x FOR XML PATH(''), TYPE
).value(N'.[1]',N'nvarchar(max)');

 

(As an aside, I would pre-generate this set and pull values off as you needed them, rather than generate them at runtime and pay the ever-increasing cost of detecting duplicate 8-string values, if you care about those. See this post over on SQLPerformance.com for more details.)

 

4. Generate drop/create scripts for every foreign key constraint
This can be useful in scenarios where you need to clean out data and you want to truncate, either because you don’t want to pay the penalty of deletes or you don’t want to have to structure your deletes in precarious order – otherwise you could just disable and re-enable the constraints. It can also be necessary for tasks that are not possible in straight DDL without dropping and re-creating the table – for example, adding or dropping the IDENTITY property from a column.

 
The drop is easy; just build a simple list of ALTER TABLE commands from sys.foreign_keys (I’ve also added commented-out filters that could be used to limit the scope to referenced tables matching a naming pattern or existing within a certain schema):

DECLARE @drop NVARCHAR(MAX) = N'';
 
SELECT @drop += N'
ALTER TABLE ' + QUOTENAME(cs.name) + '.' + QUOTENAME(ct.name) 
    + ' DROP CONSTRAINT ' + QUOTENAME(fk.name) + ';'
FROM sys.foreign_keys AS fk
INNER JOIN sys.tables AS ct ON fk.parent_object_id = ct.[object_id]
INNER JOIN sys.schemas AS cs ON ct.[schema_id] = cs.[schema_id];
-- WHERE fk.referenced_object_id IN 
-- (SELECT [object_id] FROM sys.tables WHERE name LIKE N'%some pattern%')
-- WHERE fk.referenced_schema_id IN 
-- (SELECT [schema_id] FROM sys.schemas WHERE name = N'some_schema')
 
PRINT @drop;

 

Don’t execute that yet! Of course, you need to generate the CREATE commands while they still exist. Create is a little more complex. We need to generate the list of columns on both sides of the constraint, even though in most cases there is only one column (the multi-column situation is where grouped concatenation comes in real handy):

DECLARE @create NVARCHAR(MAX) = N'';
 
SELECT @create += N'
ALTER TABLE ' 
   + QUOTENAME(cs.name) + '.' + QUOTENAME(ct.name) 
   + ' ADD CONSTRAINT ' + QUOTENAME(fk.name) 
   + ' FOREIGN KEY (' + STUFF((SELECT ',' + QUOTENAME(c.name)
    FROM sys.columns AS c 
    INNER JOIN sys.foreign_key_columns AS fkc 
    ON fkc.parent_column_id = c.column_id
    AND fkc.parent_object_id = c.[object_id]
    WHERE fkc.constraint_object_id = fk.[object_id]
    ORDER BY fkc.constraint_column_id 
    FOR XML PATH(N''), TYPE).value(N'.[1]', N'nvarchar(max)'), 1, 1, N'')
  + ') REFERENCES ' + QUOTENAME(rs.name) + '.' + QUOTENAME(rt.name)
  + '(' + STUFF((SELECT ',' + QUOTENAME(c.name)
    FROM sys.columns AS c 
    INNER JOIN sys.foreign_key_columns AS fkc 
    ON fkc.referenced_column_id = c.column_id
    AND fkc.referenced_object_id = c.[object_id]
    WHERE fkc.constraint_object_id = fk.[object_id]
    ORDER BY fkc.constraint_column_id 
    FOR XML PATH(N''), TYPE).value(N'.[1]', N'nvarchar(max)'), 1, 1, N'') + ');'
FROM sys.foreign_keys AS fk
INNER JOIN sys.tables AS rt -- referenced table
  ON fk.referenced_object_id = rt.[object_id]
INNER JOIN sys.schemas AS rs ON rt.[schema_id] = rs.[schema_id]
INNER JOIN sys.tables AS ct -- constraint table
  ON fk.parent_object_id = ct.[object_id]
INNER JOIN sys.schemas AS cs ON ct.[schema_id] = cs.[schema_id];
--WHERE rt.name LIKE N'%some pattern%'
--WHERE rs.name = N'some_schema'
 
PRINT @create;

 

When you’re happy with the output (keep in mind, again, that PRINT is limited to 8K, so it may look like the command is truncated), add this to the end, and run it again:

EXEC sp_executesql @drop;
 
-- drop and re-create the table here, whatever that needs to entail
 
EXEC sp_executesql @create;

 

 


Conclusion

While I will admit the basic thrust of the FOR XML PATH() syntax isn’t simple the first time you use it, and will state again that the goal is not performance, and that these might not be the most optimal way to solve some of these tasks. I just can’t stand writing cursors, never mind double- or triple-nested cursors… not exactly the kind of code I want to maintain (YMMV). Since these are largely admin, one-off tasks that end users aren’t tapping their fingers waiting for, solving the problems using grouped concatenation is a lot more interesting to me than writing lines and lines of boring cursors over and over again. In addition, the grouped concatenation method remains largely unchanged as you port it from problem to problem, so a large part of it is something you can memorize before too long, without even trying.

Aaron (@AaronBertrand) is a Data Platform MVP with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com.

Aaron’s blog focuses on T-SQL bad habits and best practices, as well as coverage of updates and new features in Plan Explorer, SentryOne, and SQL Server.

RELATED ARTICLES

Most Popular

Recent Comments