Friday, November 29, 2024
Google search engine
HomeData Modelling & AIVisualizing Data Compression: See it for Yourself

Visualizing Data Compression: See it for Yourself

Introduction

In my previous post on compression, “#EntryLevel : Compression & Data Types“, I demonstrated how to get a feel for compression effectiveness based on data types and repeating values. I’ll be providing some examples that expand on the information in that post, so you may want to review it, if you haven’t already. In this post, I’m going to explain how you can see the impact of compression on specific values in your data. This isn’t something you’ll necessarily ever have to do, but it’s helpful for a deeper understanding of how compression works. Visualizing data compression allows you to see how SQL Server is storing the data.

Commands to Visualize Data Compression

When I say “Visualizing Data Compression”, I mean viewing the page data through DBCC PAGE. While it’s not necessary to understand the details of DBCC PAGE to apply data compression, it’s a great thing to know about. Paul Randal’s posts, “Inside the Storage Engine: Using DBCC PAGE and DBCC IND to find out if page splits ever roll back” and “Inside the Storage Engine: Anatomy of a page” provide fantastic information for the curious.

DBCC IND

To use DBCC PAGE, you’ll need to get a PagePID to look at a specific data page. For example, I can get a list of PagePIDs for the Person.Person table in my AdvetureWorks2014PE database by executing the following:

DBCC IND ('AdventureWorks2014PE', 'Person.Person', 0);

DBCC IND ExamplePagePID values returned by DBCC IND command

DBCC PAGE

Now, I can run DBCC PAGE to explore what’s on the page with ID 1472, knowing that will correspond to a page of data from Person.Person.

--Refer to aforementioned DBCC links for parameter option details
DBCC TRACEON (3604); --Output results to messages/console
GO 
DBCC PAGE ('AdventureWorks2014PE', 1, 1472, 3); --1 is the file number, 3 the output option
GO

This returns a large text result containing the contents of that page, as shown below.

DBCC PAGE ExampleBeginning of DBCC PAGE Results

sys.fn_PhysLocCracker

You may have noticed that DBCC IND provided an index of data pages for the table, but didn’t hint at the values stored on those pages. When visualizing data compression, we want to look at specific values. The sys.fn_PhysLocCracker function returns the page_id information for all of the rows or specific rows you want to see.

SELECT * FROM Person.Person
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%);

This adds the file_id, page_id, and slot_id to the end of the results.

sys.fn_PhysLocCracker ExamplePhysical Location Information Provided by sys.fn_PhysLocCracker

Visualizing Data Compression

Now, let’s pull this together to view the contents of a page with no compression, row compression, and page compression applied.

No Compression

First, I’ll execute a query to find the page storing “Melissa Price”.

SELECT page_id, slot_id FROM Person.Person
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%)
WHERE LastName = 'Price'
AND FirstName = 'Melissa';

This returns page_id 194193 and slot_id 3, so I execute the following to view the page contents.

DBCC TRACEON (3604); 
GO 
DBCC PAGE ('AdventureWorks2014PE', 1, 194193, 3); 
GO

Then, I’ll go to the slot holding the record I’m after and view its contents. I’ve cut some of the unnecessary details from the output and highlighted the columns we’ll focus on for the remaining examples of this data. BusinessEntityID is an int, PersonType is nchar(2), and FirstName is nvarchar(50). Notice that those last two are capable of storing Unicode characters and take up twice the space of char and varchar.

Slot 3 Offset 0x10a2 Length 1388
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
Record Size = 1388
Memory Dump @0x000000392245B0A2
0000000000000000: 30002900 a61e0000 49004e00 fa020000 00c92a41 0.).¦…I.N.ú….É*A
Slot 3 Column 1 Offset 0x4 Length 4 Length (physical) 4
BusinessEntityID = 7846
Slot 3 Column 2 Offset 0x8 Length 4 Length (physical) 4
PersonType = IN
Slot 3 Column 3 Offset 0xc Length 1 (Bit position 0)
NameStyle = 0
Slot 3 Column 4 Offset 0x0 Length 0 Length (physical) 0
Title = [NULL]
Slot 3 Column 5 Offset 0x3d Length 14 Length (physical) 14
FirstName = Melissa
Slot 3 Column 6 Offset 0x4b Length 2 Length (physical) 2
MiddleName = R
Slot 3 Column 7 Offset 0x4d Length 10 Length (physical) 10
LastName = Price
Slot 3 Column 8 Offset 0x0 Length 0 Length (physical) 0
Suffix = [NULL]
Slot 3 Column 9 Offset 0xd Length 4 Length (physical) 4
EmailPromotion = 2
Slot 3 Column 10 Offset 0x0 Length 0 Length (physical) 0
AdditionalContactInfo = [NULL]
Demographics = [BLOB Inline Data] Slot 3 Column 11 Offset 0x57 Length 1301 Length (physical) 1301 ÷
Slot 3 Column 12 Offset 0x11 Length 16 Length (physical) 16
rowguid = 6e412ac9-e798-4423-80ff-697f928b67a8
Slot 3 Column 13 Offset 0x21 Length 8 Length (physical) 8
ModifiedDate = 2014-02-02 00:00:00.000
Slot 3 Offset 0x0 Length 0 Length (physical) 0
KeyHashValue = (981e4ad228ae)

Next, I’ll apply Row and Page compression, and follow the same steps to look at this data, again, cutting out much of the excess text.

Row Compression

Slot 4 Offset 0x1591 Length 1359
Record Type = (COMPRESSED) PRIMARY_RECORD Record attributes = LONG DATA REGION
Record size = 1359
CD Array
Slot 4 Column 1 Offset 0x9 Length 4 Length (physical) 2
BusinessEntityID = 7846
Slot 4 Column 2 Offset 0xb Length 4 Length (physical) 3
PersonType = IN
Slot 4 Column 5 Offset 0xe Length 14 Length (physical) 7
FirstName = Melissa

Notice that row compression was able to store those three values more effectively.

Page Compression

Slot 0 Offset 0x5ad Length 891
Record Type = (COMPRESSED) PRIMARY_RECORD Record attributes = LONG DATA REGION
Record size = 891
CD Array
Slot 0 Column 1 Offset 0x9 Length 4 Length (physical) 2
BusinessEntityID = 7846
Slot 0 Column 2 Offset 0x0 Length 4 Length (physical) 0
PersonType = IN
Slot 0 Column 5 Offset 0x0 Length 14 Length (physical) 0
FirstName = Melissa

Page compression uses additional steps to reduce redundancy (again, see my prior post for more details). It’s not able to save additional space on the int value, however, the PersonType and FirstName have repeating values on the page, so it’s able to store those once, and not have to store them in the individual slots.

In the page header, I can see some of those values stored under “CompressionInfo Raw Bytes”.

CompressionInfo Raw Bytes

0000000000000000: 06000041 054d0521 0d400008 00000a18 494e104d …A.M.!.@……IN.M
0000000000000014: 656c6973 736180a2 c6000000 00010100 1b05dfff elissa.¢Æ………ßÿ

SentryOne Database Example

For SentryOne users, I recommend applying compression to the repository database. I have a series of posts explaining how and where to do that. Previously, I noted that the IX_Unique1 index on EventSourceHistoryDetail compresses rather well. I saw an 80% savings in the SentryOne database I used.

EventSourceHistoryDetail ExamplePage Compression for EventSourceHistoryDetail

The data types for the three columns in that index are uniqueidentifier, bigint, and nvarchar(128).

EventSourceHistoryDetail Index Data TypesEventSourceHistoryDetail Index Data Types

The EventSourceID and RemoteObjectID columns contain relatively few unique values, and while RemoteSequenceID contains unique values, they are bigints (which means the value 1 eats up 8 bytes of space). Let’s look at those three values across the SQL Server Data Compression spectrum.

No Compression

Slot 0 Column 3 Offset 0x14 Length 16 Length (physical) 16
EventSourceID = 0c260691-16c3-4bc9-8457-220bc4457749
Slot 0 Column 6 Offset 0x24 Length 8 Length (physical) 8
RemoteSequenceID = 1
Slot 0 Column 7 Offset 0x9b Length 72 Length (physical) 72
RemoteObjectID = 731dfa50-9458-48b3-9640-a3aec5b0da11

Row Compression

Slot 0 Column 3 Offset 0x41 Length 16 Length (physical) 16
EventSourceID = 0c260691-16c3-4bc9-8457-220bc4457749
Slot 0 Column 6 Offset 0x12 Length 8 Length (physical) 1
RemoteSequenceID = 1
Slot 0 Column 7 Offset 0x51 Length 72 Length (physical) 37
RemoteObjectID = 731dfa50-9458-48b3-9640-a3aec5b0da11

Row compression isn’t able to do anything with the EventSourceID, but it’s able to optimize the storage for the RemoteSequenceID and RemoteObjectID.

Page Compression

Slot 0 Column 3 Offset 0x0 Length 16 Length (physical) 0
EventSourceID = 0c260691-16c3-4bc9-8457-220bc4457749
Slot 0 Column 6 Offset 0x12 Length 8 Length (physical) 1
RemoteSequenceID = 1
Slot 0 Column 7 Offset 0x0 Length 72 Length (physical) 0
RemoteObjectID = 731dfa50-9458-48b3-9640-a3aec5b0da11

Page compression saves an impressive amount of space with this data because it’s able to store the EventSourceID and RemoteObjectID once on this page instead of for each row with those same values. These particular columns had a combined size of 96; now it’s 1.

In Conclusion

As I mentioned, you don’t need to know about DBCC PAGE in order to apply compression to a SQL Server database. You can, however, use it to see how specific data is being compressed for a deeper understanding of data compression. If you’re not sure about the savings you’re seeing, you can use it to verify what SQL Server is doing at the page level. Visualizing data compression provided a sanity check for me at times. Having explored various data types at different compression levels, I feel like I just “get it” more now that I’ve seen it.

If you’re just starting out with compression or haven’t looked into it because it was an Enterprise Edition feature, you should know that it’s available beyond Enterprise Edition in SQL Server 2016 SP 1. Now is the perfect time to master data compression!

Melissa is the Product Education Manager at SentryOne. Melissa has over a decade of experience with SQL Server through software performance and scalability testing, analysis and research projects, application development, and technical support.

RELATED ARTICLES

Most Popular

Recent Comments