Prerequisite: PageRank Algorithm
The more popular a webpage is, the more are linkages that other webpages tend to have to them. Weighted PageRank algorithm is an extension of the conventional PageRank algorithm based on the same concept.
Weighted PageRank algorithm assigns higher rank values to more popular (important) pages instead of dividing the rank value of a page evenly among its outlink pages. Each outlink page gets a value proportional to its popularity, i.e. its number of inlinks and outlinks.
To a webpage ‘u’, an inlink is a URL of another webpage that contains a link pointing to ‘u’. Similarly to webpage ‘u’, an outlink is a link appearing in ‘u’ which points to another webpage.
Win(v,u) is the weight of link (v, u) calculated based on the number of inlinks of page u and the number of inlinks of all reference pages of page v.
Here, Ip and Iu represent the number of inlinks of page ‘p’ and ‘u’ respectively. R(v) represents the list of all reference pages of page ‘v’.
Wout(v,u) is the weight of link (v, u) calculated based on the number of outlinks of page u and the number of outlinks of all reference pages of page v.
Here, Op and Ou represent the number of outlinks of page ‘p’ and ‘u’ respectively. R(v) represents the list of all reference pages of page ‘v’.
Based on the importance of all pages as describes by their number of inlinks and outlinks, the Weighted PageRank formula is given as:
Here, PR(x) refers to the Weighted PageRank of page x.
d refers to the damping factor. The PageRank theory holds that an imaginary surfer who is randomly clicking on links will eventually stop clicking. The probability, at any step, that the person will continue is the damping factor.
Imagine a scenario where there are 5 webpages A, B, C, D and E.
- Site A (outlinks to B, C, D)
- Site B (outlinks to A, C, D)
- Site C (outlinks to D)
- Site D (outlinks to C, E)
- Site E (outlinks to B, C, D)
The below code demonstrates how the Weighted PageRank for each webpage in the above scenario can be calculated. The input is taken in the form of an outlink matrix and is run for a total of 5 iterations.
Code:
python3
def win(matrix, m, o): k = 0 for i in range ( 0 , n): if ( int (matrix[i][m]) = = 1 ): k = k + 1 l = 0 for i in range ( 0 , n): if ( int (matrix[o][i] = = 1 )): for j in range ( 0 , n): if (matrix[j][i] = = 1 ): l = l + 1 return float (k / l) def wout(matrix, m, o): k = 0 for i in range ( 0 , n): if ( int (matrix[ 0 ][i]) = = 1 ): k = k + 1 l = 0 for i in range ( 0 , n): if ( int (matrix[o][i] = = 1 )): for j in range ( 0 , n): if (matrix[i][j] = = 1 ): l = l + 1 return float (k / l) def pagerank(matrix, o, n, p): a = 0 for i in range ( 0 , n): if ( int (matrix[i][o]) = = 1 ): k = 0 for s in range ( 0 , n): if (matrix[i][s] = = 1 ): k = k + 1 a = a + float ((p[i] / k) * win(matrix, i, o) * wout(matrix, i, o)) return a n = 5 matrix = [[ 0 , 1 , 1 , 1 , 0 ], [ 1 , 0 , 1 , 1 , 0 ], [ 0 , 0 , 0 , 1 , 0 ], [ 0 , 0 , 1 , 0 , 1 ], [ 0 , 1 , 1 , 1 , 0 ]] d = 0.25 # damping factor o = 5 print ("Number of iterations is :", o) sum = 0 p = [] for i in range ( 0 , n): p.append( 1 ) for k in range ( 0 , o): for u in range ( 0 , n): g = pagerank(matrix, u, n, p) p[u] = ( 1 - d) + d * g for i in range ( 0 , n): sum + = p[i] print ("Page rank of node ", i + 1 , " is : ", p[i]) print (" Sum of all page ranks: ", sum ) |
Output:
Number of iterations is: 5 Page rank of node 1 is : 0.7563090216610497 Page rank of node 2 is : 0.7570825988098917 Page rank of node 3 is : 1.021617613959075 Page rank of node 4 is : 0.9412927238508162 Page rank of node 5 is : 0.7735323180962704 Sum of all page ranks: 4.249834276377103