Curve fitting examines the relationship between one or more predictors (independent variables) and a response variable (dependent variable), with the goal of defining a “best fit” model of the relationship. It is the process of constructing a mathematical function that best fits a series of data points, possibly subject to constraints.
Curve fit in Python
In Python, we can perform curve fitting using the scipy.optimize library.
Syntax:
scipy.optimize.curve_fit(f, xdata, ydata, p0=None, sigma=None, absolute_sigma=False, check_finite=True, bounds=(-inf, inf), method=None, jac=None, *, full_output=False, **kwargs)
Parameters:
- f (callable): The model function, f(X, ...). It must take the independent variable as its first argument and the parameters to fit as separate remaining arguments.
- xdata (array_like or object): The independent variable where the data is measured. Usually an M-length sequence, or a (k, M)-shaped array for functions of k predictors (multiple independent variables).
- ydata (array_like): The dependent data, a length-M array, nominally f(xdata, ...).
By default, curve_fit uses the non-linear least squares method to fit a function, f, to the data points.
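Before moving to the multi-variable example below, here is a minimal one-dimensional sketch of this default least-squares fit. The exponential model, the parameter values, and the noise level are illustrative assumptions, not part of the article's example.

Python3

import numpy as np
from scipy.optimize import curve_fit

def exp_model(x, a, b):
    # Illustrative model: exponential decay y = a * exp(-b * x)
    return a * np.exp(-b * x)

# Synthetic data generated from the model plus random noise
x = np.linspace(0, 4, 50)
y = exp_model(x, 2.5, 1.3) + 0.2 * np.random.normal(size=x.size)

# curve_fit estimates a and b by non-linear least squares
popt, pcov = curve_fit(exp_model, x, y)
print(popt)  # should be close to (2.5, 1.3)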
Defining Model function
We define the function (curve) to which we want to fit our data. Here, a and b are the parameters that define the curve. In this example, we choose y = a*(x_1)^2 + b*(x_2)^2 as our model function.
Python3
def f(X, a, b):
    # Unpack the tuple of independent variables
    x_1, x_2 = X
    return a * x_1**2 + b * x_2**2
Initializing the independent (X) and dependent (y) data
In this step, we initialize the independent data x_1 and x_2 using np.linspace(0, 4, 50), which creates an evenly spaced array over a specified interval. With multiple independent variables, we pack them into a tuple X = (x_1, x_2). Then we initialize the dependent data y using the model function and add noise (np.random.random(50) * 4) to it.
Python3
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

x_1 = np.linspace(0, 4, 50)
x_2 = np.linspace(0, 4, 50)
X = (x_1, x_2)
y = f(X, 2, 4)

# Adding noise
y = y + np.random.random(50) * 4

# Plotting the data points
fig = plt.figure()
fig.set_figwidth(40)
fig.set_figheight(10)
ax = plt.axes(projection='3d')
ax.set_xlabel('x_1', fontsize=12, color='green')
ax.set_ylabel('x_2', fontsize=12, color='green')
ax.set_zlabel('y', fontsize=12, color='green')
ax.scatter3D(x_1, x_2, y, color='green')
plt.title("Plot of data points")
plt.show()
Output:
The curve_fit() method returns the following output:
- popt (array): Optimal values for the parameters, such that the sum of the squared residuals of f(xdata, *popt) - ydata is minimized.
- pcov (2-D array): The estimated covariance of popt. The diagonal provides the variance of each parameter estimate.
Python3
popt, pcov = curve_fit(f, X, y)
popt
Output:
array([-96.89634526, 103.10365474])
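A common way to read pcov is to take the square roots of its diagonal entries as the one-standard-deviation uncertainties of the fitted parameters. A brief sketch, continuing from the fit above:

Python3

# One-standard-deviation errors on the fitted parameters,
# taken from the diagonal of the covariance matrix
perr = np.sqrt(np.diag(pcov))
print(perr)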
Visualizing Results
Now, using the parameters obtained from curve_fit, we plot the fitted curve in the 3-D plane with the plot3D method.
Python3
# Plotting the data points and the fitted curve
fig = plt.figure()
fig.set_figwidth(40)
fig.set_figheight(10)
ax = plt.axes(projection='3d')
ax.set_title('Curve fit plot', fontsize=15)
ax.set_xlabel('x_1', fontsize=12, color='green')
ax.set_ylabel('x_2', fontsize=12, color='green')
ax.set_zlabel('y', fontsize=12, color='green')
ax.scatter3D(x_1, x_2, y, color='green')
ax.plot3D(x_1, x_2, popt[0] * (x_1**2) + popt[1] * (x_2**2), color='black')
plt.show()
Output:
We can refine the fit with the help of other parameters such as p0 and bounds, as sketched below. p0 is the initial guess for the parameters (length N); if None, the initial values are all 1. bounds sets lower and upper bounds on the parameters; the default is no bounds.
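A minimal sketch of passing both arguments, reusing f, X, and y from the example above; the specific guess and bound values here are arbitrary and chosen only for illustration.

Python3

# Initial guess for a and b
p0 = [1.0, 1.0]

# Constrain both parameters to the range [0, 10]
bounds = ([0, 0], [10, 10])

popt, pcov = curve_fit(f, X, y, p0=p0, bounds=bounds)
print(popt)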