Editor’s Note: Eric is a speaker for ODSC East 2022. Be sure to check out his talk, “Network Analysis Made Simple,” there!
Graphs, also known as networks, are ubiquitous in our world. But did you know that graphs are also related to matrices and linear algebra?
Graphs, at their core, are comprised of two sets:
- A node set
- An edge set
Nodes are entities in a graph, and edges are their relationships. One anchoring example you can use throughout this blog post is a social network – people are nodes, and their connections are edges. In “Network Analysis Made Simple”, we go much deeper into particular examples and the specific NetworkX implementation.
As it turns out, you can represent graphs as matrices! Let’s construct a minimal complex example to illustrate the point. If you had a social network of 4 people (A through D), such that they had the following connectivity pattern:
D
|
A - B - C
└-------┘
We can actually represent the graph as an adjacency matrix, which highlights which nodes are connected to which other nodes:
A B C D
┌------------┐
A | 0 1 1 0 |
B | 1 0 1 1 |
C | 1 1 0 0 |
D | 0 1 0 0 |
└------------┘
Here, a value of 1
in the matrix indicates a connection between the two nodes, while a value of 0
indicates no relationship.
Using NetworkX, it’s straightforward to create the graph and convert it to matrix form:
import networkx as nx import numpy as np G = nx.Graph() G.add_nodes_from( [("A", "B"), ("B", "C"), ("A", "C"), ("B", "D")] ) adj_mat = nx.to_adjacency_matrix(G) # This is what adj_mat looks like adj_mat # output: array([[0., 1., 1., 0.], [1., 0., 1., 1.], [1., 1., 0., 0.], [0., 1., 0., 0.]])
From the adjacency matrix, it’s already easy to infer some basic information about the graph.
Firstly, we can test whether the matrix is symmetric around the diagonal or not to infer whether or not the graph can be represented as an undirected graph. We do this by checking whether the lower triangle of the adjacency matrix is equal to the upper triangle of the adjacency matrix when either one of them is transposed:
(np.triu(adj_mat) == np.tril(adj_mat).T).all() # evaluates to `True`
If the graph is undirected, we can count the number of edges in the graph by summing up either the upper or lower triangle of the matrix. If the graph is directed, then summing the matrix tells us the total number of edges:
np.triu(adj_mat).sum() # evalutes to 4.0
Additionally, taking the k
-th matrix powers of the adjacency matrix tells us the number of ways to reach another node by taking k
hops on the graph. Let’s see the example of k = 2
:
np.linalg.matrix_power(adj_mat, 2) # output: array([[2., 1., 1., 1.], [1., 3., 1., 0.], [1., 1., 2., 1.], [1., 0., 1., 1.]])
As we can see, there are two ways to go from A and back to A (value = 2), by going A -> B -> A
and A -> C -> A
. There are no ways to go from B to D using 2 hops. Hence the value is 0.
And if you’ve been astute enough to pick it up, the diagonal also happens to tell us the degree of each node, that is, the number of neighbors that each node has:
np.diagonal(np.linalg.matrix_power(adj_mat, 2)) # output: array([2., 3., 2., 1.])
Knowing how to convert between object and matrix representations of graphs is a really useful skill, because being able to do so gives you access to the necessary programming language APIs that can make your life a lot easier. I know this from personal experience!
I will be at ODSC East 2022 teaching Network Analysis Made Simple. There, we will learn more cool stuff about graphs, their key underlying concepts, and awesome connections to other topics through this tutorial. Hope to see you there!
About the author/ODSC East 2022 speaker:
Eric Ma, PhD, is an Investigator at the Novartis Institutes for Biomedical Research, where he solves biological problems using machine learning. He obtained his Doctor of Science (ScD) from the Department of Biological Engineering, MIT, and was an Insight Health Data Fellow in the summer of 2017. He has taught Network Analysis at a variety of data science venues, including PyCon USA, SciPy, PyData, and ODSC, and has also co-developed the Python Network Analysis curriculum on DataCamp. As an open-source contributor, he has made contributions to PyMC3, matplotlib, and bokeh. He has also led the development of the graph visualization package nxviz, and a data cleaning package pyjanitor (a Python port of the R package).