In this article, we will see how to compute the Symmetric Mean Absolute Percentage Error (SMAPE), one of the metrics used to measure forecast accuracy, in Python.
SMAPE is one of the alternatives proposed to overcome the limitations of the mean absolute percentage error (MAPE). Unlike MAPE, which has no upper bound, SMAPE has both a lower bound (0%) and an upper bound (200% for the formula used here), and its formula treats the actual and forecast values symmetrically: swapping them does not change the result. The ‘S’ in SMAPE stands for symmetric; ‘M’ stands for mean, the average taken over a series; ‘A’ stands for absolute, since absolute values keep positive and negative errors from canceling one another out; ‘P’ stands for percentage, which makes this a relative accuracy metric; and ‘E’ stands for error, since the metric quantifies how much error a forecast has.
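A quick way to see the bounds and the symmetry described above is to compute the SMAPE contribution of a single data point. The sketch below uses two hypothetical helpers, `smape_term` and `mape_term` (they are not part of any library); `mape_term` uses the usual MAPE definition |F - A| / |A| only for comparison.

```python
def smape_term(actual: float, forecast: float) -> float:
    """SMAPE contribution of one (actual, forecast) pair, in percent.

    Undefined (division by zero) when both values are 0.
    """
    return abs(forecast - actual) / ((abs(actual) + abs(forecast)) / 2) * 100


def mape_term(actual: float, forecast: float) -> float:
    """MAPE contribution of one pair, in percent, for comparison."""
    return abs(forecast - actual) / abs(actual) * 100


# Symmetry: swapping actual and forecast gives the same SMAPE value.
print(round(smape_term(100, 80), 2), round(smape_term(80, 100), 2))  # 22.22 22.22

# Lower bound: a perfect forecast contributes 0%.
print(smape_term(50, 50))  # 0.0

# Upper bound: a zero forecast for a non-zero actual contributes 200%.
print(smape_term(50, 0))  # 200.0

# MAPE is unbounded above: a large over-forecast gives 900%,
# while SMAPE for the same pair stays below 200%.
print(mape_term(10, 100), round(smape_term(10, 100), 2))  # 900.0 163.64
```

Note that SMAPE can never exceed 200% because |F - A| is at most |F| + |A|.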
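The formula for SMAPE, where $A_t$ is the actual value, $F_t$ is the forecast value at point $t$, and $n$ is the number of points:

$$\text{SMAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \frac{|F_t - A_t|}{(|A_t| + |F_t|)/2}$$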
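As a minimal sketch, the formula translates directly into a plain-Python loop over paired values, without NumPy. The function name `smape` is a hypothetical choice for illustration; it returns the result in percent.

```python
from typing import Sequence


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """SMAPE in percent, following the formula term by term (no NumPy)."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and equal in length")
    total = 0.0
    for a, f in zip(actual, forecast):
        # |F_t - A_t| divided by the average magnitude (|A_t| + |F_t|) / 2
        total += abs(f - a) / ((abs(a) + abs(f)) / 2)
    # Average over the n points and express as a percentage.
    return 100 * total / len(actual)


print(round(smape([136, 120, 138, 155, 149], [134, 124, 132, 141, 149]), 2))  # 3.73
```

The explicit loop is slower than a vectorized version, but it mirrors the summation in the formula one term at a time.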
Consider the following example, where we have the sales information of a store. The Day No. column gives the day number, the Actual Sales column gives the actual sales value for that day, and the Forecast Sales column gives the forecasted sales (for example, values produced by an ML model). Column A is the absolute error, column B is the average magnitude of the actual and forecast values, and the final column is A divided by B.
| Day No. | Actual Sales | Forecast Sales | A = \|forecast - actual\| | B = (\|actual\| + \|forecast\|) / 2 | A / B (rounded) |
|---|---|---|---|---|---|
| 1 | 136 | 134 | 2 | 135 | 0.015 |
| 2 | 120 | 124 | 4 | 122 | 0.033 |
| 3 | 138 | 132 | 6 | 135 | 0.044 |
| 4 | 155 | 141 | 14 | 148 | 0.095 |
| 5 | 149 | 149 | 0 | 149 | 0 |
The SMAPE value for the above example is the mean of the entries in the A / B column. Using the unrounded ratios, it comes out to about 0.0373, or 3.73% when expressed as a percentage.
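To check the arithmetic in the table, here is a minimal sketch that rebuilds columns A, B, and A / B with pandas and averages the last one. The DataFrame column names (`day`, `actual`, `forecast`, `A`, `B`, `A/B`) are illustrative choices, not taken from the original data source.

```python
import pandas as pd

sales_df = pd.DataFrame({
    "day": [1, 2, 3, 4, 5],
    "actual": [136, 120, 138, 155, 149],
    "forecast": [134, 124, 132, 141, 149],
})

# Column A: absolute error |forecast - actual|
sales_df["A"] = (sales_df["forecast"] - sales_df["actual"]).abs()
# Column B: average magnitude (|actual| + |forecast|) / 2
sales_df["B"] = (sales_df["actual"].abs() + sales_df["forecast"].abs()) / 2
# Column A/B: per-day ratio
sales_df["A/B"] = sales_df["A"] / sales_df["B"]

print(sales_df.round(3))

# SMAPE as a fraction is the mean of the per-day ratios.
smape_value = sales_df["A/B"].mean()
print(round(smape_value, 4))  # 0.0373
```

Averaging the unrounded ratios gives 0.0373; averaging the three-decimal values shown in the table would give a slightly different figure because of rounding.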
Calculate SMAPE in Python
```python
import numpy as np
import pandas as pd


# Define the function to return the SMAPE value (in percent, rounded to 2 decimals)
def calculate_smape(actual, predicted) -> float:
    # Convert actual and predicted to NumPy arrays if they are not already
    if not all([isinstance(actual, np.ndarray), isinstance(predicted, np.ndarray)]):
        actual, predicted = np.array(actual), np.array(predicted)

    return round(
        np.mean(
            np.abs(predicted - actual) /
            ((np.abs(predicted) + np.abs(actual)) / 2)
        ) * 100, 2
    )


if __name__ == '__main__':
    # CALCULATE SMAPE FROM PYTHON LIST
    actual = [136, 120, 138, 155, 149]
    predicted = [134, 124, 132, 141, 149]

    # Get SMAPE for Python lists as parameters
    print("py list :", calculate_smape(actual, predicted), "%")

    # CALCULATE SMAPE FROM NUMPY ARRAY
    actual = np.array([136, 120, 138, 155, 149])
    predicted = np.array([134, 124, 132, 141, 149])

    # Get SMAPE for NumPy arrays as parameters
    print("np array :", calculate_smape(actual, predicted), "%")

    # CALCULATE SMAPE FROM PANDAS DATAFRAME
    # Define the pandas DataFrame
    sales_df = pd.DataFrame({
        "actual": [136, 120, 138, 155, 149],
        "predicted": [134, 124, 132, 141, 149]
    })

    # Get SMAPE for pandas Series (DataFrame columns) as parameters
    print("pandas df:", calculate_smape(sales_df.actual, sales_df.predicted), "%")
```
Output:
```
py list : 3.73 %
np array : 3.73 %
pandas df: 3.73 %
```
Explanation:
In the program, we calculate the SMAPE value for the same dataset passed to the function in three different formats: a Python list, a NumPy array, and pandas Series taken from the columns of a DataFrame. The function works with any sequence-like input of numbers. It first converts the inputs to NumPy arrays (if they are not arrays already) so that the calculation can use vectorized NumPy operations. The return statement then computes the element-wise absolute error |predicted - actual|, divides it by the element-wise average magnitude (|predicted| + |actual|) / 2, averages these ratios with np.mean, multiplies the result by 100 to get a percentage, and rounds it to two decimal places.
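The return statement packs several steps into one expression. The sketch below unpacks it into named intermediate arrays (the variable names are illustrative, not from the original program) so each step can be inspected on the same sales data; the intermediate arrays correspond to columns A, B, and A / B of the table.

```python
import numpy as np

actual = np.array([136, 120, 138, 155, 149])
predicted = np.array([134, 124, 132, 141, 149])

# Step 1: element-wise absolute error (column A of the table)
abs_error = np.abs(predicted - actual)
# Step 2: element-wise average magnitude (column B of the table)
mean_magnitude = (np.abs(predicted) + np.abs(actual)) / 2
# Step 3: per-point ratio (column A / B)
ratios = abs_error / mean_magnitude
# Step 4: average the ratios, scale to a percentage, round to 2 decimals
smape_pct = round(np.mean(ratios) * 100, 2)

print(abs_error)       # [ 2  4  6 14  0]
print(mean_magnitude)  # [135. 122. 135. 148. 149.]
print(smape_pct)       # 3.73
```

Because every step operates on whole arrays, no explicit Python loop is needed.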