SentryOne Document is a great solution for documenting your data systems, providing you with up-to-date documentation at all times and the ability to dive into the history of that documentation. However, it doesn’t stop at documentation—SentryOne Document also does lineage analysis, enabling you to trace the usage and movement of data around your entire BI stack.
This blog post isn’t intended to provide an introduction to what lineage analysis or impact analysis are but rather a few tips and tricks for how to get the most out of SentryOne Document’s lineage analysis mode.
Know Your Question
The lineage analysis capabilities in SentryOne Document can be configured multiple ways and can support impact analysis (“What will be impacted if I change this object/column/entity?”), lineage analysis (“Where did the data in this column come from?”), and even foreign key analysis (“How are these tables connected?”).
When you fire up the Lineage view in SentryOne Document, you’re likely to be looking at an unfiltered view that effectively shows inbound and outbound connections for all connection types. Knowing the question you want to answer will enable you to configure lineage analysis to give you much more succinct, digestible answers.
Link Type Filters Are Your Friend
One of the most useful features in the Lineage view is the ability to filter the types of links shown. You can filter by Link Type using the following section of the toolbar:
The first button shows/hides foreign key links. These are links created between tables and columns when a foreign key reference is present between them. It can be useful for answering the question, “What related data is available given a starting table?”
The second button controls the visibility of object links. Object links are useful when asking the question “What would be affected if I changed a certain object?” These are links created between object links because one object’s definition depends on another—hence object links are created where DML scripts are used. For example, this could be in a stored procedure, execute SQL task in SQL Server Integration Services (SSIS), or a dataset in SQL Server Reporting Services (SSRS). Let’s say that a stored procedure contains an EXEC statement to execute another stored procedure—that would create an object link. Or, say that an SSRS dataset used a user-defined function and a table—that would create an object link to both of those objects.
The third button allows you to show/hide data flow links. Data flow links answer, “Where did this data come from?” and “Where does this data get used?” These are links created between entities because data is moving from one to the other. Imagine you have a procedure that inserts data from Table1 into Table2—a data flow link would be created. Another example would be that you have an SSIS package that has a data source that picks up data from a table—a data flow link would be created.
Granularity Gives You the Big Picture
The Granularity Detail option effectively allows you to look at a less complex lineage set by limiting the depth at which objects are shown. For each technology type, the low/medium/high detail levels will relate to something slightly different.
For example, for SQL Server, Low equates to database level, Medium equates to object level, and High equates to column level, whereas for SSIS Low equates to package level, Medium equates to components, and High equates to columns. Irrespective of the differences between the technologies, a lower granularity level will give you a simplified view with which you can inspect links at the system level—answering questions such as “Which packages connect to this database?” or “Which SQL Server Analysis Services servers connect to this SQL Server?”
Often, different granularities work well for answering different types of questions. If you are looking at the movement of data, high granularity will give you the most detailed information about the journey of data through your system. If you’re looking at object dependencies, high granularity often won’t add a lot of value—it’s better to choose a medium granularity.
Text Mode for Larger Data Sets
It is very common for data systems to grow in complexity over time. Often, we can end up with mega-tables that have far too many columns and end up related to everything else in the database. In these circumstances, even starting with a low Dependency Level and extending can lead to an unwieldy graph—this is where Text View comes to your rescue. Enabling the Text View will present the dependencies as a list that you can scroll through, and you can even drill down into each item to follow the path through, much like you would using Extend Mode.
Extend Works Wonders
It can be tempting to set the dependency levels high and click an object to see everything that is connected through n degrees. Sometimes, in simpler systems, doing so can yield the exact answer you wanted. Often, though, it will yield a graph that contains a lot of objects that is very hard to digest. A better approach can be to start with a Dependency Level of 1, and extend the object that you are most interested in. The effect this has is to provide more information in an iterative fashion, which allows you to understand each step more thoroughly.
Aliases to the Rescue
Often, when we have a complex environment, a single machine can be referenced by several names. If you are connecting to a SQL Server on that machine from the machine itself, then you might connect with the name localhost. If you’re in the same network, you might connect to data-server-17. Depending on your network configuration, maybe you have to connect to data-server-17.acmewidgets.local. As networks evolve, these permutations can become even more complex.
These differences can cause disconnects in lineage, especially because something like localhost will mean different things depending on where it is used from. This is where aliases come into play—they are effectively saying something like, “For this solution item, when connecting to SQL Server, localhost actually means data-server-17.” After your next snapshot, everything links up. Magic!
Entity Type Filters
A more recent addition to the Lineage view is the ability to specify the entity types that you want to look at. This has the effect of filtering not only the graph but also the object tree. For example, if we want to look at only relationships between tables and views, we can select those object types and view the dependencies between them. This filtering is especially useful when using the Environment Map, which connects SentryOne SQL Sentry to SentryOne Document. If you want to look at only the machines that are connecting to a particular server, then this type of filtering allows you to do just that.
Wrapping It Up
Hopefully, this blog post has given you a few pointers as to how to get the most out of the lineage analysis functionality in SentryOne Document. Taming the vast quantity of information provided through lineage analysis is key to leveraging your investment well and getting the answers that you and your teams need. Happy data hunting!
Additional Resources
- Webinar: Database Documentation and Lineage in the Cloud
- Blog: Documentation and Impact Analysis Redefined
- Blog: Map Your SQL Server Estate with SQL Sentry and SentryOne Document
- Blog: SentryOne Document and Your Data Protection Roadmap
- Blog: A New, On-Premises Approach to Documentation and Data Lineage: SentryOne Document Software
- Documentation: SentryOne Document Lineage
Matt is Director of Platform Delivery at SentryOne, facilitating development activities across our product portfolio. Having spent the first part of his career working in payment and loyalty systems, working with several high volume databases, Matt developed a passion for tooling around database systems. He took some time to develop the tools that eventually became DBA xPress when they were acquired by Pragmatic Works. After working with Pragmatic Works to build out their database tooling, Matt joined SentryOne where he is excited to have the opportunity to take that tooling to the next level.