KNN Model Complexity

26 July 2024

1

KNN is a machine learning algorithm which is used for both classification (using KNearestClassifier) and Regression (using KNearestRegressor) problems.In KNN algorithm K is the Hyperparameter. Choosing the right value of K matters. A machine learning model is said to have high model complexity if the built model is having low Bias and High Variance.

We know that,

High Bias and Low Variance = Under-fitting model.
Low Bias and High Variance = Over-fitting model. [Indicated highly complex model ].
Low Bias and Low Variance = Best fitting model. [This is preferred ].
High training accuracy and Low test accuracy ( out of sample accuracy ) = High Variance = Over-fitting model = More model complexity.
Low training accuracy and Low test accuracy ( out of sample accuracy ) = High Bias = Under-fitting model.

Code: To understand how K value in KNN algorithm affects the model complexity.

# This code may not run on GFG ide 
# As required modules are not found. 
   
# Import required modules 
import matplotlib.pyplot as plt 
from sklearn.datasets import make_regression 
from sklearn.neighbors import KNeighborsRegressor 
from sklearn.model_selection import train_test_split 
import numpy as np 
   
# Synthetically Create Data Set 
plt.figure() 
plt.title('SIMPLE-LINEAR-REGRESSION') 
x, y = make_regression( 
    n_samples = 100, n_features = 1,  
    n_informative = 1, noise = 15, random_state = 3) 
plt.scatter(x, y, color ='red', marker ='o', s = 30) 
   
# Train the model. 
knn = KNeighborsRegressor(n_neighbors = 7) 
x_train, x_test, y_train, y_test = train_test_split( 
    x, y, test_size = 0.2, random_state = 0) 
knn.fit(x_train, y_train) 
predict = knn.predict(x_test) 
print('Test Accuracy:', knn.score(x_test, y_test)) 
print('Training Accuracy:', knn.score(x_train, y_train)) 
   
# Plot The Output 
x_new = np.linspace(-3, 2, 100).reshape(100, 1) 
predict_new = knn.predict(x_new) 
plt.plot( 
    x_new, predict_new, color ='blue',  
    label ="K = 7") 
plt.scatter(x_train, y_train, color ='red' ) 
plt.scatter(x_test, predict, marker ='^', s = 90) 
plt.legend() 

Output:

Test Accuracy: 0.6465919540035108
Training Accuracy: 0.8687977824212627

Now let’s vary the value of K (Hyperparameter) from Low to High and observe the model complexity
K = 1

K = 10

K = 20

K = 50

K = 70

Observations:

When K value is small i.e. K=1, The model complexity is high ( Over-fitting or High Variance).
When K value is very large i.e. K=70, The model complexity decreases ( Under-fitting or High Bias ).

Conclusion:
As K value becomes small model complexity increases and as K value becomes large the model complexity decreases.

Code: Let’s consider the below plot

# This code may not run on GFG 
# As required modules are not found. 
  
# To plot test accuracy and train accuracy Vs K value. 
p = list(range(1, 31)) 
lst_test =[] 
lst_train =[] 
for i in p: 
    knn = KNeighborsRegressor(n_neighbors = i) 
    knn.fit(x_train, y_train) 
    z = knn.score(x_test, y_test) 
    t = knn.score(x_train, y_train) 
    lst_test.append(z) 
    lst_train.append(t) 
      
plt.plot(p, lst_test, color ='red', label ='Test Accuracy') 
plt.plot(p, lst_train, color ='b', label ='Train Accuracy') 
plt.xlabel('K VALUES --->') 
plt.title('FINDING BEST VALUE FOR K') 
plt.legend() 

Output:

Observation:
From the above graph, we can conclude that when K is small i.e. K=1, Training Accuracy is High but Test Accuracy is Low which means the model is over-fitting ( High Variance or High Model Complexity). When the value of K is large i.e. K=50, Training Accuracy is Low as well as Test Accuracy is Low which means the model is under-fitting ( High Bias or Low Model Complexity ).

So Hyperparameter tuning is necessary i.e. to select the best value of K in KNN algorithm for which the model has Low Bias and Low Variance and results in a good model with high out of sample accuracy.

We can use GridSearchCV or RandomSearchCv to find the best value of hyper parameter K.

Last Updated :
05 Sep, 2020

<!–

–>

KNN Model Complexity

Java Program for Longest Common Subsequence

Maximum height of Tree when any Node can be considered as Root

Print Fibonacci sequence using 2 variables

LEAVE A REPLY Cancel reply

Most Popular

How to Unban Someone on Twitch in 2025: Quick & Easy by Penka Hristovska

Best VPNs for WhatsApp in 2025: Fast and Reliable by Danica Djokic

How to Change Facebook Location in 2025 + Marketplace & Dating by Danica Djokic

Is Akinator Safe for Kids? What Parents Need to Know in 2025 by Penka Hristovska

Recent Comments

EDITOR PICKS

How to Unban Someone on Twitch in 2025: Quick & Easy by Penka Hristovska

Best VPNs for WhatsApp in 2025: Fast and Reliable by Danica Djokic

How to Change Facebook Location in 2025 + Marketplace & Dating by Danica Djokic

POPULAR POSTS

How to Unban Someone on Twitch in 2025: Quick & Easy by Penka Hristovska

Best VPNs for WhatsApp in 2025: Fast and Reliable by Danica Djokic

How to Change Facebook Location in 2025 + Marketplace & Dating by Danica Djokic

POPULAR CATEGORY

ABOUT US

FOLLOW US

KNN Model Complexity

Please Login to comment…

LEAVE A REPLY Cancel reply

Most Popular

Recent Comments

EDITOR PICKS

POPULAR POSTS

POPULAR CATEGORY

ABOUT US

FOLLOW US