Using a support vector regression (SVR) model in Python to analyze and predict power consumption

Time:2022-5-9

Original link: http://tecdat.cn/?p=23921

This article describes how to train a support vector regression model that predicts power consumption from several weather variables, the hour of the day, and whether the day is a weekend, holiday, or work-from-home day rather than an ordinary working day.

A quick description of support vector machines

A support vector machine is a machine learning method that can be used for classification or regression. Put as simply as possible, a support vector machine finds the best line or plane separating two groups of data or, in the case of regression, the best path describing the trend to within a tolerance.

For classification, the algorithm minimizes the risk of misclassifying data.

For regression, the algorithm minimizes the risk that data points fall outside an acceptable tolerance of the regression model.
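To make the "acceptable tolerance" concrete, here is a minimal sketch of the epsilon-insensitive loss that SVR-style regression is built on: errors smaller than the tolerance cost nothing, and larger ones are penalized linearly. The function name and the numbers are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-point loss: zero inside the tolerance band, linear outside it."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)

# Hypothetical hourly usage (kWh) and predictions
actual = np.array([0.30, 0.50, 1.20])
predicted = np.array([0.35, 0.50, 0.90])

# The first two points lie inside the band; only the third is penalized
losses = epsilon_insensitive_loss(actual, predicted, epsilon=0.1)
print(losses)
```

In scikit-learn's SVR, the width of this band is set with the `epsilon` parameter.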

Not covered in this article

  • Seasonality
  • Feature selection
  • Overfitting and cross-validation

Import some packages and data

import pandas as pd  # for data analysis, especially time series
import numpy as np  # matrices and linear algebra, similar to MATLAB
from matplotlib import pyplot as plt

Scikit-learn is one of the major machine learning packages for Python.

from sklearn import svm
from sklearn import model_selection  # replaces cross_validation, which was removed in scikit-learn 0.20
from sklearn import preprocessing as pre

A few colors defined up front for better data visualization.

# Set colors
graylight = '#d4d4d2'
gray = '#737373'
red = '#ff3700'

The data I use in this model is obtained from the smart meter installed in the apartment.

The “usage” field gives the number of kWh consumed during that hour.

elec.head(3)

Weather data extraction.

weather.head()

Preprocessing

Combining power and weather data

First, we need to combine power data and weather data into one data frame and remove irrelevant information.

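# Merge into one pandas DataFrame, joining on the shared time index
elec_and_weather = pd.merge(weather, elec, left_index=True, right_index=True)

# Remove unnecessary fields from the DataFrame
del elec_and_weather['tempm'], elec_and_weather['cost']

# Convert wind speed from km/h to mph (assuming wspdm is in km/h)
elec_and_weather['wspdm'] = elec_and_weather['wspdm'] * 0.62

elec_and_weather.head()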
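The merge step can be tried on a small stand-in data set. The sketch below assumes both tables share an hourly DatetimeIndex and that `wspdm` is wind speed in km/h; the values are invented for illustration and are not the article's data.

```python
import pandas as pd

# Hypothetical hourly data standing in for the smart-meter and weather files
idx = pd.date_range('2014-01-18', periods=4, freq='h')
elec = pd.DataFrame({'USAGE': [0.3, 0.4, 0.2, 0.5]}, index=idx)
weather = pd.DataFrame({'tempm': [1.0, 0.5, 0.0, -0.5],
                        'wspdm': [10.0, 12.0, 8.0, 20.0]}, index=idx)

# Join on the shared hourly index (inner join by default)
elec_and_weather = pd.merge(weather, elec, left_index=True, right_index=True)

# Drop a column that is not needed and convert wind speed from km/h to mph
elec_and_weather = elec_and_weather.drop(columns=['tempm'])
elec_and_weather['wspdm'] = elec_and_weather['wspdm'] * 0.62
print(elec_and_weather)
```

An inner join keeps only the hours present in both tables, so gaps in either source silently drop rows; comparing `len(elec)` with `len(elec_and_weather)` is a quick check.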

fig = plt.figure(figsize=[14,8])

elec_and_weather['USAGE'].plot()

I want to distinguish typical working days from weekends, holidays, and work-from-home days. All normal working days are therefore coded as 0, and all weekends, holidays, and work-from-home days as 1.

Categorical variables: weekdays versus weekends / holidays / work-from-home days

# Set weekends, holidays, and work-from-home days to 1, otherwise 0
elec_and_weather['Atypical_Day'] = np.zeros(len(elec_and_weather['USAGE']))

# Weekends (dayofweek: Saturday is 5, Sunday is 6)
elec_and_weather.loc[(elec_and_weather.index.dayofweek==5)|(elec_and_weather.index.dayofweek==6), 'Atypical_Day'] = 1

# Holidays and work-from-home days
holidays = ['2014-01-01', '2014-01-20']
workhome = ['2014-01-21','2014-02-13','2014-03-03','2014-04-04']

for i in range(len(holidays)):
    elec_and_weather.loc[elec_and_weather.index.strftime('%Y-%m-%d')==holidays[i], 'Atypical_Day'] = 1
for i in range(len(workhome)):
    elec_and_weather.loc[elec_and_weather.index.strftime('%Y-%m-%d')==workhome[i], 'Atypical_Day'] = 1

elec_and_weather.head(3)

More categorical variables: day of the week, hour

In this case, each hour of the day is treated as a categorical variable rather than a continuous one, so the analysis needs a yes/no indicator for each hour of the day.

# Create a new column for each hour of the day: 1 if index.hour matches that column's hour, otherwise 0

for i in range(0,24):
    elec_and_weather[i] = np.zeros(len(elec_and_weather['USAGE']))
    elec_and_weather.loc[elec_and_weather.index.hour==i, i] = 1

# Example: 3am
elec_and_weather[3][:6]
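The same hour-of-day indicators can also be built in one step with `pd.get_dummies`. This is an alternative sketch on invented data, not the approach used in the article; the frame name `df` is a placeholder.

```python
import pandas as pd

# Two days of hypothetical hourly readings
idx = pd.date_range('2014-01-18 00:00', periods=48, freq='h')
df = pd.DataFrame({'USAGE': 0.5}, index=idx)

# One indicator column per hour of the day (0..23), 1.0 where the row falls in that hour
hour_of_day = pd.Series(df.index.hour.to_numpy(), index=df.index)
hour_dummies = pd.get_dummies(hour_of_day).astype(float)
df = df.join(hour_dummies)

# Example: the 3am column for the first six rows
print(df[3][:6])
```

Each row has exactly one of the 24 columns set to 1, which is what lets the model learn a separate effect for every hour instead of assuming usage rises or falls steadily through the day.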

Time series: attaching a historical window of previous power demand

Because this is a time series, predicting the next hour's energy consumption requires each x vector / y target pair in the training data to hold the current hour's power consumption (the y value, or target) together with the weather data and consumption from the previous hour, or from however many past hours are used (the x vector).

# Add historical usage to each x vector

# Set forecast lead hours
hours = 1

# Set historical usage hours
hourswin = 12

usage = elec_and_weather['USAGE'].to_numpy()

for k in range(hours, hours+hourswin):
    # Column 'USAGE-k' holds the usage k hours earlier; rows without a full history window stay 0
    lagged = np.zeros(len(usage))
    lagged[hours+hourswin:] = usage[hours+hourswin-k:len(usage)-k]
    elec_and_weather['USAGE-%i' % k] = lagged

elec_and_weather.head(3)
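Lagged columns like these can also be produced with `Series.shift`. The sketch below uses a six-row toy series and a three-hour window so the result is easy to read; it is an illustrative alternative, and unlike the article's code it drops the rows that lack a full history instead of leaving zeros in them.

```python
import pandas as pd

idx = pd.date_range('2014-01-18', periods=6, freq='h')
df = pd.DataFrame({'USAGE': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]}, index=idx)

hours = 1      # forecast lead time in hours
hourswin = 3   # number of past hours to attach (12 in the article)

for k in range(hours, hours + hourswin):
    # shift(k) moves the series down k rows, so each row sees the usage k hours earlier
    df['USAGE-%i' % k] = df['USAGE'].shift(k)

# Rows without a complete history contain NaN; drop them rather than zero-fill
df = df.dropna()
print(df)
```

Dropping the incomplete rows avoids feeding the model artificial zero-usage history for the first few hours of the data set.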

Splitting into training and testing periods

Because this is time series data, it makes more sense to define a training period and a testing period than to pick scattered random data points. If it were not a time series, we could set aside a random sample as the test set.

# Define training and testing periods
train_start = '18-jan-2014'
train_end = '24-march-2014'
test_start = '25-march-2014'
test_end = '31-march-2014'

# Split into training and test sets (still pandas DataFrames)
X_train_df = elec_and_weather[train_start:train_end].copy()
del X_train_df['USAGE']
del X_train_df['time_end']

y_train_df = elec_and_weather['USAGE'][train_start:train_end]

X_test_df = elec_and_weather[test_start:test_end].copy()
del X_test_df['USAGE']
del X_test_df['time_end']

y_test_df = elec_and_weather['USAGE'][test_start:test_end]
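A period-based split can be checked on a small synthetic frame. The dates and column names below are placeholders chosen for the sketch; the point is that label slicing on a DatetimeIndex includes the whole end day and keeps the test period strictly after the training period.

```python
import numpy as np
import pandas as pd

# Twelve days of hypothetical hourly data
idx = pd.date_range('2014-03-20', '2014-03-31 23:00', freq='h')
df = pd.DataFrame({'USAGE': np.linspace(0.1, 1.0, len(idx)),
                   'tempm': np.linspace(0.0, 10.0, len(idx))}, index=idx)

train_start, train_end = '2014-03-20', '2014-03-24'
test_start, test_end = '2014-03-25', '2014-03-31'

# Features exclude the target column; the target is taken from the same rows
X_train_df = df.loc[train_start:train_end].drop(columns=['USAGE'])
y_train_df = df.loc[train_start:train_end, 'USAGE']
X_test_df = df.loc[test_start:test_end].drop(columns=['USAGE'])
y_test_df = df.loc[test_start:test_end, 'USAGE']

print(len(X_train_df), len(X_test_df))
```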

Write the training set to a CSV file to inspect it more easily.

X_train_df.to_csv('training_set.csv')

The scikit-learn estimators are given NumPy arrays here rather than pandas DataFrames, so we convert the data.

# NumPy array for sklearn

X_train = np.array(X_train_df)

Standardizing the variables

All variables need to be standardized. The algorithm does not know the scale of each variable: a value of 73 in the temperature column would appear to outweigh a value of 0.3 kWh used in the previous hour simply because the raw numbers differ so much. The StandardScaler() in sklearn's preprocessing module removes the mean of each variable and scales it to unit variance. When the model is trained on scaled data, the model itself determines which variables are more influential, instead of an arbitrary scale or order of magnitude deciding this in advance.
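The scaling code itself is not shown in the article, so the following is a minimal sketch of how the scaled arrays might be produced. The feature values are invented; the names `X_train_scaled` and `X_test_scaled` match those used in the training step, and fitting the scaler on the training data only is an assumption about good practice, not something the article states.

```python
import numpy as np
from sklearn import preprocessing as pre

# Hypothetical features: temperature (tens) and previous-hour usage (tenths of a kWh)
X_train = np.array([[73.0, 0.3], [65.0, 0.5], [80.0, 0.2], [70.0, 0.4]])
X_test = np.array([[75.0, 0.35]])

# Fit the scaler on the training data only, then reuse it for the test data
scaler = pre.StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0))  # approximately 0 for each column
print(X_train_scaled.std(axis=0))   # approximately 1 for each column
```

Reusing the training-period mean and variance for the test data keeps information from the test period out of the model's inputs.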

Training the SVR model

Fit the model to the training data!

SVR_model = svm.SVR(kernel='rbf',C=100,gamma=.001).fit(X_train_scaled,y_train)
print('Testing R^2 =', round(SVR_model.score(X_test_scaled,y_test),3))
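For readers without the smart-meter data, here is a self-contained sketch of the same fit-and-score pattern on synthetic data: ten days of a noisy daily usage cycle with the hour of day as the only feature. The data, the single feature, and the hyperparameters `C=10, gamma=1.0` are assumptions chosen for this toy problem and differ from the article's settings.

```python
import numpy as np
from sklearn import svm
from sklearn import preprocessing as pre

rng = np.random.default_rng(0)

# Synthetic stand-in: hour of day as the feature, a noisy daily cycle as usage (kWh)
hours_of_day = np.tile(np.arange(24), 10).reshape(-1, 1).astype(float)
usage = 1.0 + np.sin(2 * np.pi * hours_of_day.ravel() / 24) + rng.normal(scale=0.1, size=240)

# Chronological split: first 8 days for training, last 2 days for testing
X_train, X_test = hours_of_day[:192], hours_of_day[192:]
y_train, y_test = usage[:192], usage[192:]

scaler = pre.StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

SVR_model = svm.SVR(kernel='rbf', C=10, gamma=1.0).fit(X_train_scaled, y_train)
test_r2 = SVR_model.score(X_test_scaled, y_test)
print('Testing R^2 =', round(test_r2, 3))
```

`C` and `gamma` interact strongly with the feature scaling, so values that work on one data set generally need to be re-tuned on another.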

Prediction and testing

Compute the forecast for the next hour. We have set aside a test data set, so we use all of its input variables (appropriately scaled) to predict the “Y” target value (usage in the next hour).

# Use the SVR model to calculate the predicted usage in the next hour
predict_y_array = SVR_model.predict(X_test_scaled)

# Put it in a pandas DataFrame for ease of use
predict_y = pd.DataFrame(predict_y_array)

Plot the time series of actual and predicted power demand during the test period.

# Plot predicted and actual values

plt.plot(y_test_df.index,y_test_df,color='k')
plt.plot(y_test_df.index,predict_y)

Resampling the results to kWh per day

### Plot the total kWh per day during the test period


y_test_barplot
ax.set_ylabel('Total daily consumption (kWh)')

# The pandas / Matplotlib bar chart converts the x-axis to floating point, so the date labels have to be restored
ax.set_xticklabels([dt.strftime('%b %d') for dt in
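Because the plotting code above is incomplete in the source, here is a hedged sketch of the daily aggregation it relies on. The constant hourly values are placeholders, and the frame names mirror those in the article; only the resampling and label formatting are shown, not the article's actual chart.

```python
import numpy as np
import pandas as pd

# One week of hypothetical hourly actual and predicted usage (kWh)
idx = pd.date_range('2014-03-25', '2014-03-31 23:00', freq='h')
y_test_df = pd.DataFrame({'USAGE': np.full(len(idx), 0.5)}, index=idx)
predict_y = pd.DataFrame({'USAGE': np.full(len(idx), 0.4)}, index=idx)

# Sum the hourly kWh values within each calendar day
daily_actual = y_test_df['USAGE'].resample('D').sum()
daily_predicted = predict_y['USAGE'].resample('D').sum()
daily = pd.DataFrame({'actual': daily_actual, 'predicted': daily_predicted})

# Readable date labels such as 'Mar 25' for a bar chart's x-axis
labels = [dt.strftime('%b %d') for dt in daily.index]
print(daily)
```

From here, `daily.plot(kind='bar')` returns a Matplotlib axes object on which `set_ylabel` and `set_xticklabels(labels)` can be called.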

Error measures

Here are some accuracy measures.

len(y\_test\_df)

Root mean square error

This is effectively the standard error of the model, and it has the same units as the predicted variable (kWh here).

calcRMSE(predict_y, y_test_df)
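The body of `calcRMSE` is not included in the text, so the following is an assumed implementation based on the standard definition of root mean square error; only the function name and argument order come from the article.

```python
import numpy as np

def calcRMSE(predicted, actual):
    """Root mean square error, in the same units as the inputs."""
    predicted = np.asarray(predicted, dtype=float).ravel()
    actual = np.asarray(actual, dtype=float).ravel()
    return np.sqrt(np.mean((predicted - actual) ** 2))

# Hypothetical hourly predictions and actual usage (kWh)
rmse = calcRMSE([0.5, 0.7, 0.2], [0.4, 0.9, 0.2])
print(round(rmse, 4))
```

Squaring before averaging means a few large misses raise the RMSE more than many small ones.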

Mean absolute percentage error

This measure takes the absolute percentage error between each predicted value and the actual value and averages them; the result is a percentage. Without the absolute value, positive and negative errors from an unbiased model would cancel, giving a result close to zero and making the measure worthless.

errorsMAPE(predict_y, y_test_df)
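The definition of `errorsMAPE` is likewise missing from the text. The sketch below is an assumed implementation of the standard formula, and it also shows the cancellation problem described above: one prediction 25% too high and one 25% too low average to roughly zero when the sign is kept.

```python
import numpy as np

def errorsMAPE(predicted, actual):
    """Mean absolute percentage error; assumes no actual value is zero."""
    predicted = np.asarray(predicted, dtype=float).ravel()
    actual = np.asarray(actual, dtype=float).ravel()
    return np.mean(np.abs((actual - predicted) / actual)) * 100

predicted = np.array([0.5, 0.3])
actual = np.array([0.4, 0.4])

mape = errorsMAPE(predicted, actual)
# Same calculation without the absolute value: the two errors cancel
signed_mean = np.mean((actual - predicted) / actual) * 100
print(mape, signed_mean)
```

Hours with zero or near-zero actual usage make the percentage undefined or very large, so they may need to be excluded before this measure is computed.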

Mean bias error

The mean bias error shows whether the model overestimates or underestimates. The mean bias error of the initial SVM model is -0.02, which indicates that the model does not systematically overestimate or underestimate the hourly kWh consumption.

calcMBE(predict_y, y_test_df)
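As with the other measures, the body of `calcMBE` is not given, so this is an assumed implementation. The sign convention (predicted minus actual, so a negative value means under-prediction on average) is an assumption of this sketch and may differ from the article's.

```python
import numpy as np

def calcMBE(predicted, actual):
    """Mean bias error: positive if the model over-predicts on average."""
    predicted = np.asarray(predicted, dtype=float).ravel()
    actual = np.asarray(actual, dtype=float).ravel()
    return np.mean(predicted - actual)

# Hypothetical hourly predictions and actual usage (kWh)
mbe = calcMBE([0.5, 0.3, 0.2], [0.4, 0.4, 0.3])
print(round(mbe, 4))
```

A value near zero only rules out systematic bias; large positive and negative errors can still cancel, which is why this measure is read alongside RMSE.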

Coefficient of variation

This is similar to RMSE, except that it is normalized by the mean. It shows how large the variation is relative to the mean.
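No code for this measure appears in the text. A minimal sketch, assuming the coefficient of variation here means RMSE divided by the mean of the actual values; the function name `calcCV` is hypothetical.

```python
import numpy as np

def calcCV(predicted, actual):
    """RMSE normalized by the mean of the actual values (hypothetical helper)."""
    predicted = np.asarray(predicted, dtype=float).ravel()
    actual = np.asarray(actual, dtype=float).ravel()
    rmse = np.sqrt(np.mean((predicted - actual) ** 2))
    return rmse / np.mean(actual)

# Hypothetical hourly predictions and actual usage (kWh)
cv = calcCV([0.5, 0.7, 0.2], [0.4, 0.9, 0.2])
print(round(cv, 4))
```

Because the result is dimensionless, it allows comparison between meters or periods with very different average consumption.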

plot45 = plt.plot([0,2],[0,2],'k')
