In this article, we will see how to rank NumPy arrays with tie-breakers in Python.
Ranking is an essential statistical operation used in fields such as data science and sociology. A brute-force approach is to sort the indices of the array by their corresponding values. This works well when all the values are distinct, but it gives equal values different ranks that depend only on their position in the array. This article goes one step further and explores the rankdata() function from the SciPy library, illustrating its use on arrays that contain ties.
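The index-sorting approach mentioned above can be sketched as follows. This is only a minimal illustration and not part of rankdata(): the helper name argsort_ranks is hypothetical, and a stable sort is assumed so that the result is deterministic.

```python
import numpy as np


def argsort_ranks(values):
    """Hypothetical helper: 1-based ranks obtained by sorting indices."""
    values = np.asarray(values)
    # Indices that would sort the array (stable, so ties keep input order)
    order = np.argsort(values, kind="stable")
    # Sorting those indices again gives each element's position in sorted order
    return np.argsort(order, kind="stable") + 1


distinct_ranks = argsort_ranks([30, 10, 20])
tied_ranks = argsort_ranks([10, 20, 20])

print(distinct_ranks)  # ranks for distinct values
print(tied_ranks)      # the two equal values receive different ranks
```

For distinct values the ranks are unambiguous, but the two equal values in the second call end up with ranks 2 and 3 purely because of their order in the input. This is the situation that a tie-breaking strategy is meant to handle.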
rankdata() function
To compute the ranks, we’ll use the rankdata() function from the scipy.stats module. The function offers five tie-breaking strategies, and its syntax is as follows:
Syntax: scipy.stats.rankdata(a, method='average', axis=None)
Parameters:
- a: An n-dimensional array of values to be ranked
- method: A string naming the tie-breaking strategy. There are 5 options:
  - ‘average’: The average of the ranks that would have been assigned to all the tied values is assigned to each value.
  - ‘min’: The minimum of the ranks that would have been assigned to all the tied values is assigned to each value.
  - ‘max’: The maximum of the ranks that would have been assigned to all the tied values is assigned to each value.
  - ‘dense’: Like ‘min’, but the next highest value receives the rank immediately after the one assigned to the tied values, so no ranks are skipped.
  - ‘ordinal’: All values are given a distinct rank, corresponding to the order in which the values occur in a.
- axis: Axis along which to perform the ranking. If None, the data array is first flattened.
Returns: A NumPy array of size equal to the size of a, containing rank scores.
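To make the difference between the strategies concrete, here is a small sketch on a four-element list with a single tie. The input values and the dictionary name ranks are chosen only for illustration.

```python
from scipy.stats import rankdata

values = [10, 20, 20, 30]  # illustrative data: the two 20s are tied

# Rank the same data once per tie-breaking strategy
ranks = {
    method: rankdata(values, method=method)
    for method in ("average", "min", "max", "dense", "ordinal")
}

for method, result in ranks.items():
    print(f"{method:>8}: {result}")
```

The two tied values share rank 2.5 under ‘average’, 2 under ‘min’ and ‘dense’, and 3 under ‘max’, while ‘ordinal’ gives them ranks 2 and 3. The value 30 gets rank 4 in every case except ‘dense’, where it gets rank 3.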
Example 1: Ranking on a 1-D NumPy Array
In this example, we’ll explore all the tie-breaking strategies over a 1-dimensional NumPy array.
Python3
import numpy as np
from scipy.stats import rankdata

arr = np.array([-20, -10, -10, -10, 10, 20, 20,
                50, 50, 60, 60, 60, 60, 60])

# Normal ranking; each value has distinct rank
print(f"Ordinal ranking: {rankdata(arr, method='ordinal')}")

# Average ranking; each value's
# rank is averaged over all ties
print(f"Average ranking: {rankdata(arr, method='average')}")

# Max ranking; each value's rank is the
# maximum ordinal rank for the corresponding
# tie
print(f" Max ranking: {rankdata(arr, method='max')}")

# Min ranking; each value's rank is
# the minimum ordinal rank for the corresponding
# tie
print(f" Min ranking: {rankdata(arr, method='min')}")

# Dense ranking; each value's rank
# is sequentially arranged
print(f"Dense ranking: {rankdata(arr, method='dense')}")
Output:
Example 2: Ranking on a 2-D NumPy Array along a particular axis using the ‘axis’ argument
In this example, we’ll explore all the tie-breaking strategies over a 2-dimensional NumPy array, ranking along axis 0 so that the values within each column are ranked against each other.
Python3
import numpy as np
from scipy.stats import rankdata

arr = np.array([[-20, -10, -10, -10, 10, 20, 20],
                [50, 50, 60, -20, 60, 60, 60],
                [-20, 50, -10, -30, 60, 20, 60]])

# Normal ranking; each value has distinct rank
print(f"Ordinal ranking:\n {rankdata(arr, method='ordinal', axis=0)}")

# Average ranking; each value's
# rank is averaged over all ties
print(f"Average ranking:\n {rankdata(arr, method='average', axis=0)}")

# Max ranking; each value's rank is
# the maximum ordinal rank for
# the corresponding tie
print(f" Max ranking:\n {rankdata(arr, method='max', axis=0)}")

# Min ranking; each value's rank is the
# minimum ordinal rank for the corresponding
# tie
print(f" Min ranking:\n {rankdata(arr, method='min', axis=0)}")

# Dense ranking; each value's rank
# is sequentially arranged
print(f"Dense ranking:\n {rankdata(arr, method='dense', axis=0)}")
Output:
As we can see, with axis = 0 each value in the 2-D array ‘arr’ is ranked against the other entries in the same column. Every column is therefore ranked independently, and the ranks lie between 1 and 3, the number of rows.
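The effect of the axis argument is easier to see on a small array. The sketch below uses a made-up 2 x 3 array and the ‘min’ strategy to compare axis=0, axis=1, and the default axis=None. The array and the variable names are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import rankdata

small = np.array([[1, 5, 5],
                  [3, 2, 5]])  # illustrative data

# axis=0: each column is ranked on its own (values compared down the rows)
by_column = rankdata(small, method="min", axis=0)

# axis=1: each row is ranked on its own (values compared across the columns)
by_row = rankdata(small, method="min", axis=1)

# axis=None (default): all six values are ranked together
overall = rankdata(small, method="min")

print(by_column)
print(by_row)
print(overall)
```

With axis=0 the ranks never exceed 2, the number of rows, and with axis=1 they never exceed 3, the number of columns. With axis=None the three 5s are tied for rank 4 among all six values.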