Lagrange interpolation — Python

Time: 2022-5-26

Data analysis

Data cleaning: missing values can be handled in three ways: 1) deleting the records, 2) data interpolation, 3) no treatment.
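The three options can be sketched with pandas on a small made-up series. The values and the name `sales` are illustrative assumptions, not the catering data, and `interpolate()` here is pandas' built-in linear interpolation, used only as a stand-in for option 2:

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only; one entry is missing.
sales = pd.Series([10.0, 12.0, np.nan, 16.0, 18.0])

dropped = sales.dropna()        # 1) delete the record with the missing value
filled = sales.interpolate()    # 2) fill it in by (linear) interpolation
untouched = sales.copy()        # 3) leave the missing value as it is

print(len(dropped), filled[2], untouched.isnull().sum())
```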

The data and code are in the resource package at https://book.tipdm.org/jc/219, under Chapter4\demo\data\catering_sale.xls


Common interpolation methods

[image: common interpolation methods]

Lagrange interpolation

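According to mathematical knowledge, for n known points in the plane (no two of which share the same x-coordinate), a polynomial of degree n-1 can be found,
$y = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1}$,
such that the polynomial curve passes through these n points.
1) Find the polynomial of degree n-1 through the n known points:
$y = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1}$

Substitute the coordinates of the n points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ into the polynomial to get
$y_i = a_0 + a_1 x_i + a_2 x_i^2 + \cdots + a_{n-1} x_i^{n-1}, \quad i = 1, \ldots, n$
Solving this system gives the Lagrange interpolation polynomial:
$L(x) = \sum_{i=1}^{n} y_i \prod_{j=1,\, j \ne i}^{n} \frac{x - x_j}{x_i - x_j}$
2) Substitute the point x corresponding to the missing function value into the polynomial to obtain the approximate value L(x) of the missing value.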
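The Lagrange polynomial can also be evaluated directly from its definition, without solving for the coefficients. The following is a minimal sketch in plain Python; the function name `lagrange_value` and the sample points are illustrative, not part of the original code:

```python
def lagrange_value(xs, ys, x):
    """Evaluate the Lagrange interpolation polynomial through (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                # Basis factor: equals 1 at x = xi and 0 at x = xj
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Three points on y = x^2; the polynomial through them is x^2 itself.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 4.0, 9.0]
print(lagrange_value(xs, ys, 2.5))  # 6.25
```

Each basis factor is 1 at its own point and 0 at every other point, which is why the sum passes through all n points.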

# Lagrange interpolation code
import pandas as pd  # import the data analysis library pandas
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import lagrange  # import the Lagrange interpolation function

inputfile = '../data/catering_sale.xls'  # sales data path
outputfile = '../tmp/sales.xls'  # output data path

data = pd.read_excel(inputfile)  # read in the data
# The column name 'sales volume' must match the header in the spreadsheet.
# Find the values that do not meet the requirements: data[column][row condition]
temp = data['sales volume'][(data['sales volume'] < 400) | (data['sales volume'] > 5000)]
for i in range(temp.shape[0]):
    data.loc[temp.index[i], 'sales volume'] = np.nan  # change the non-conforming value to null

# Custom column vector interpolation function
# s is the column vector, n is the position to interpolate, k is the number of
# values taken before and after it (default 5)
def ployinterp_column(s, n, k=5):
    idx = list(range(n - k, n)) + list(range(n + 1, n + 1 + k))
    idx = [p for p in idx if 0 <= p < len(s)]  # keep only positions inside the column
    y = s.iloc[idx]  # take the values around position n
    y = y[y.notnull()]  # eliminate null values
    f = lagrange(y.index, list(y))
    return f(n)  # interpolate and return the result

# Check element by element whether interpolation is required
for i in data.columns:
    for j in range(len(data)):
        if data[i].isnull()[j]:  # if the value is null, interpolate it
            data.loc[j, i] = ployinterp_column(data[i], j)

# Write the result to file (pandas 2.0+ cannot write .xls; use an .xlsx path there)
data.to_excel(outputfile)
print("success")

Result: the code runs and prints "success".


Problem

However, it reports SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
I don't know how to eliminate this warning. Anyway, the code still runs if I just ignore it. It seems that you can't assign multiple values at once; you should assign the values separately.
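A minimal sketch, assuming the warning came from chained indexing of the form `data[column][rows] = value` (the post does not show the exact line that triggered it): selecting the rows and the column in a single `.loc` call assigns to the original DataFrame rather than to an intermediate copy. The frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data; the column name is only illustrative.
data = pd.DataFrame({'sales volume': [3000.0, 120.0, 2800.0, 9000.0, 3100.0]})

# Build the row condition, then select rows and column in one .loc call.
out_of_range = (data['sales volume'] < 400) | (data['sales volume'] > 5000)
data.loc[out_of_range, 'sales volume'] = np.nan

print(data['sales volume'].isnull().sum())
```

With a boolean mask, all out-of-range values are set in one statement, so no loop over the rows is needed.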

Finally

However, a closer look at the output shows a problem with the interpolated values: one of them is abnormal.

When processing the data, we changed the values below 400 and above 5000 to null and then filled them in by Lagrange interpolation. The interpolation inserted a negative number into the data, which is absurd. I checked the code and found nothing wrong, so I printed the data used and the fitted Lagrange function:
f(x) = -0.008874 x^9 + 11.53 x^8 - 6657 x^7 + 2.242e+06 x^6 - 4.854e+08 x^5 + 7.005e+10 x^4 - 6.74e+12 x^3 + 4.168e+14 x^2 - 1.504e+16 x + 2.411e+17
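For reference, the object returned by `scipy.interpolate.lagrange` is a `numpy.poly1d`, so its degree and coefficients can be inspected directly. A small sketch on made-up points (not the catering data):

```python
import numpy as np
from scipy.interpolate import lagrange

# Made-up points lying on y = x^2 + x + 1.
x = [0, 1, 2]
y = [1, 3, 7]

f = lagrange(x, y)
print(f.order)   # degree of the fitted polynomial
print(f.coef)    # coefficients, highest power first
print(f(3))      # value of the polynomial at x = 3
```

Printing `f` itself shows the polynomial with the exponents on a separate line above the coefficients, which is why they are easily lost when the output is copied as text.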
I didn't find any problem there either. Next I wondered whether the fit was simply not accurate enough, so I increased the number of points, but the results did not improve; they became even more outrageous. This is overfitting: the model fits the data it was trained on very well but performs poorly on test data.
For example, a degree-4 polynomial fitted to a data set may pass through few of the points while a degree-14 polynomial passes through many more; yet when tested, the predictions of the degree-14 model may be very unreasonable.
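The effect can be reproduced on a deterministic example. The sketch below is an illustration under assumed data, not the data in the post: it samples the smooth function 1/(1 + 25x^2) at 15 evenly spaced points and fits polynomials of degree 4 and degree 14 with `numpy.polyfit`.

```python
import numpy as np

def target(x):
    # Smooth test function chosen for illustration (an assumption, not the post's data).
    return 1.0 / (1.0 + 25.0 * x ** 2)

x_train = np.linspace(-1.0, 1.0, 15)    # 15 evenly spaced sample points
y_train = target(x_train)
x_test = np.linspace(-1.0, 1.0, 1001)   # dense grid between the samples

errors = {}
for degree in (4, 14):
    coeffs = np.polyfit(x_train, y_train, degree)
    # Largest deviation at the sample points and on the dense grid
    train_err = np.max(np.abs(np.polyval(coeffs, x_train) - y_train))
    test_err = np.max(np.abs(np.polyval(coeffs, x_test) - target(x_test)))
    errors[degree] = (train_err, test_err)
    print(degree, train_err, test_err)
```

The degree-14 polynomial passes through every sample, yet between the samples it strays further from the function than the degree-4 fit does.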

Finally, I reduced the number of points and found that taking 4 points on each side gives a good result, as does taking 3, 2 or 1 (with 1 the fit is a straight line, which is not recommended). So there is nothing wrong with fitting five points on each side; the function it fits simply takes an outrageous value at that point.
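This experiment can be sketched as a loop over the window size. The series below is made up for illustration (it is not the catering data), and `interp_at` is a hypothetical helper that mirrors `ployinterp_column` with the window size passed explicitly:

```python
import numpy as np
import pandas as pd
from scipy.interpolate import lagrange

def interp_at(s, n, k):
    """Hypothetical helper: Lagrange-interpolate position n of Series s
    from up to k non-null values on each side."""
    idx = list(range(n - k, n)) + list(range(n + 1, n + 1 + k))
    idx = [p for p in idx if 0 <= p < len(s)]   # stay inside the series
    y = s.iloc[idx]
    y = y[y.notnull()]
    return float(lagrange(y.index, list(y))(n))

# Made-up values for illustration; position 6 is the missing one.
sales = pd.Series([2600.0, 2900.0, 2500.0, 3100.0, 2700.0, 3300.0, np.nan,
                   2400.0, 3000.0, 2800.0, 3200.0, 2600.0, 2900.0])

# Interpolated value for each window size k = 1..5
results = {k: interp_at(sales, 6, k) for k in range(1, 6)}
for k, value in results.items():
    print(k, value)
```

With one point on each side the result is just the average of the two neighbours; each extra pair of points raises the degree of the polynomial by two, so comparing the printed values shows how sensitive the filled-in value is to the window size.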