The following article is from Data Theory, author: wpc7113
Introduction to Python data analysis
1. Standardization
Standardization rescales each feature so that its mean is 0 and its variance is 1. If a feature is normally distributed, the result follows the standard normal (Gaussian) distribution; standardization itself does not change the shape of the distribution.
Standardization is needed because a feature whose variance is much larger than that of the others can dominate the objective function, so that the estimator cannot learn from the other features correctly.
Standardization consists of two steps: centering (subtracting the mean, so that the mean becomes 0) and scaling (dividing by the standard deviation, so that the variance becomes 1).
from sklearn import preprocessing
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# Standard transformation
scaler = preprocessing.StandardScaler().fit(X)
x_scaler = scaler.transform(X)
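The two steps described above can be reproduced by hand and compared with `StandardScaler`. This is only a minimal sketch: the names `X_manual` and `X_sklearn` are illustrative, and it relies on `StandardScaler` using the population standard deviation (the same default as `np.std`).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# Step 1: center each feature (column) by subtracting its mean.
X_centered = X - X.mean(axis=0)

# Step 2: scale each feature by its standard deviation.
# np.std defaults to ddof=0 (population standard deviation).
X_manual = X_centered / X.std(axis=0)

X_sklearn = StandardScaler().fit_transform(X)

print(np.allclose(X_manual, X_sklearn))   # do both approaches agree?
print(X_sklearn.mean(axis=0))             # per-feature means, close to 0
print(X_sklearn.std(axis=0))              # per-feature standard deviations, close to 1
```

In practice the scaler is usually fitted on the training data only and then reused to transform the test data.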
2. Min-max normalization
Min-max normalization linearly rescales the original data to the interval [0, 1] (or to any other interval with fixed minimum and maximum values).
min_max_scaler = preprocessing.MinMaxScaler()
x_train_minmax = min_max_scaler.fit_transform(X)
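As a rough illustration, the linear mapping behind min-max normalization is (x - min) / (max - min), computed per feature. The sketch below checks this against `MinMaxScaler` and also shows the `feature_range` parameter for a different target interval; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

X, _ = load_iris(return_X_y=True)

# Manual min-max scaling, computed column by column.
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_manual = (X - col_min) / (col_max - col_min)

X_01 = MinMaxScaler().fit_transform(X)

# The same scaler can target another interval, here [-1, 1].
X_pm1 = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)

print(np.allclose(X_manual, X_01))
print(X_01.min(axis=0), X_01.max(axis=0))
print(X_pm1.min(axis=0), X_pm1.max(axis=0))
```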
3. MaxAbsScaler: scaling by the maximum absolute value
max_abs_scaler = preprocessing.MaxAbsScaler()
x_train_maxabs = max_abs_scaler.fit_transform(X)
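`MaxAbsScaler` divides each feature by its maximum absolute value, so the result lies in [-1, 1] and the data is not shifted. A minimal sketch of the equivalent manual computation (the names are illustrative):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import MaxAbsScaler

X, _ = load_iris(return_X_y=True)

# Divide each column by its largest absolute value; no centering is applied.
X_manual = X / np.abs(X).max(axis=0)

X_sklearn = MaxAbsScaler().fit_transform(X)

print(np.allclose(X_manual, X_sklearn))
print(np.abs(X_sklearn).max(axis=0))  # largest absolute value per feature after scaling
```

Because zeros stay zeros, this scaler is often a reasonable choice for sparse data.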
4. RobustScaler: standardization of data with outliers
transformer = preprocessing.RobustScaler().fit(X)
x_robust_scaler = transformer.transform(X)
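With its default settings, `RobustScaler` subtracts the median and divides by the interquartile range (75th minus 25th percentile), statistics that are much less sensitive to outliers than the mean and standard deviation. The sketch below reproduces this by hand, assuming the default `quantile_range=(25.0, 75.0)`; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import RobustScaler

X, _ = load_iris(return_X_y=True)

# Per-feature median and interquartile range (IQR).
median = np.median(X, axis=0)
q25, q75 = np.percentile(X, [25, 75], axis=0)
X_manual = (X - median) / (q75 - q25)

X_sklearn = RobustScaler().fit_transform(X)

print(np.allclose(X_manual, X_sklearn))
print(np.median(X_sklearn, axis=0))  # medians after scaling, close to 0
```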
5. QuantileTransformer: quantile transformation
quantile_transformer = preprocessing.QuantileTransformer(random_state=0)
X_train_trans = quantile_transformer.fit_transform(X)
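`QuantileTransformer` maps each feature through its estimated quantiles, giving an approximately uniform output on [0, 1] by default, or an approximately normal output with `output_distribution='normal'`. The following is a sketch only; `n_quantiles=100` is an arbitrary choice made here so that it does not exceed the 150 iris samples.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import QuantileTransformer

X, _ = load_iris(return_X_y=True)

# Uniform output: every feature is mapped into [0, 1].
qt_uniform = QuantileTransformer(
    n_quantiles=100, output_distribution="uniform", random_state=0
)
X_uniform = qt_uniform.fit_transform(X)

# Normal output: every feature is mapped toward a standard normal shape.
qt_normal = QuantileTransformer(
    n_quantiles=100, output_distribution="normal", random_state=0
)
X_normal = qt_normal.fit_transform(X)

print(X_uniform.min(axis=0), X_uniform.max(axis=0))
print(X_normal.mean(axis=0), X_normal.std(axis=0))
```

Since the mapping is non-linear, it changes the distances between values, which is the price paid for reducing the influence of outliers.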
6. Box-Cox transformation
The Box-Cox transformation is a generalized power transformation proposed by Box and Cox in 1964. It is commonly used in statistical modeling when a continuous response variable does not follow a normal distribution. After a Box-Cox transformation, the correlation between the unobservable error and the predictor variables can be reduced to some extent. Its main feature is that it introduces a parameter, estimates that parameter from the data itself, and thereby determines the form of the transformation. The Box-Cox transformation can noticeably improve the normality, symmetry and homogeneity of variance of the data, and it works well on many real-world datasets. It can be applied as follows:
pt = preprocessing.PowerTransformer(method='box-cox', standardize=False)
pt.fit_transform(X)
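For reference, the Box-Cox transform with parameter lambda is usually written as (x^lambda - 1) / lambda when lambda is not 0, and log(x) when lambda is 0; it requires strictly positive data. The sketch below applies that formula with the per-feature parameters estimated by `PowerTransformer` (stored in its `lambdas_` attribute) and compares the result with the transformer output. The helper `box_cox` is a hypothetical name introduced for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import PowerTransformer

X, _ = load_iris(return_X_y=True)  # all iris measurements are strictly positive

pt = PowerTransformer(method="box-cox", standardize=False)
X_boxcox = pt.fit_transform(X)


def box_cox(x, lam):
    """Apply the Box-Cox formula to a positive array x for a given lambda."""
    if np.isclose(lam, 0.0):
        return np.log(x)
    return (x ** lam - 1.0) / lam


# Apply the formula column by column with the fitted lambdas.
X_manual = np.column_stack(
    [box_cox(X[:, j], lam) for j, lam in enumerate(pt.lambdas_)]
)

print(pt.lambdas_)                       # one estimated lambda per feature
print(np.allclose(X_manual, X_boxcox))   # does the formula match the transformer?
```

For data containing zeros or negative values, `method='yeo-johnson'` is the alternative offered by the same class.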
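7. Normalization
Normalization maps values with different ranges of variation onto a common scale; the most common target range is [0, 1]. In scikit-learn, preprocessing.normalize rescales each sample (row) individually so that it has unit norm; with norm='l2' every row has Euclidean length 1.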
X_normalized = preprocessing.normalize(X, norm='l2')
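Note that `normalize` works on rows (samples), not on columns (features). A minimal sketch that checks this by dividing each row by its own norm; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import normalize

X, _ = load_iris(return_X_y=True)

# L2: divide every row by its Euclidean length.
row_l2 = np.linalg.norm(X, axis=1, keepdims=True)
X_manual_l2 = X / row_l2
X_l2 = normalize(X, norm="l2")

# L1: divide every row by the sum of its absolute values.
X_l1 = normalize(X, norm="l1")

print(np.allclose(X_manual_l2, X_l2))
print(np.linalg.norm(X_l2, axis=1)[:5])   # row lengths after L2 normalization
print(np.abs(X_l1).sum(axis=1)[:5])       # row sums after L1 normalization
```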
8. One-hot encoding
enc = preprocessing.OneHotEncoder(categories='auto')
enc.fit(y.reshape(-1, 1))  # the encoder expects a 2-D array
y_one_hot = enc.transform(y.reshape(-1, 1))
y_one_hot.toarray()  # convert the sparse result to a dense array
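To make the result of one-hot encoding more concrete, the sketch below inspects the learned categories, the shape of the encoded labels, and the round trip back to the original labels via `inverse_transform`. The variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import OneHotEncoder

_, y = load_iris(return_X_y=True)
y_col = y.reshape(-1, 1)  # one column holding the class labels 0, 1, 2

enc = OneHotEncoder()
y_dense = enc.fit_transform(y_col).toarray()  # dense 0/1 matrix

print(enc.categories_)   # the distinct labels found in the column
print(y_dense.shape)     # one output column per category
print(y_dense[:3])       # first rows: a single 1 in the column of the class

# Each row contains exactly one 1, and the encoding can be reversed.
y_back = enc.inverse_transform(y_dense).ravel()
print(np.array_equal(y_back, y))
```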
9. Binarization
binarizer = preprocessing.Binarizer(threshold=1.1)
binarizer.fit(X)
binarizer.transform(X)
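`Binarizer` sets every value strictly greater than the threshold to 1 and everything else to 0. A minimal sketch of the equivalent NumPy comparison, using the same threshold of 1.1 as above (the names are illustrative):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import Binarizer

X, _ = load_iris(return_X_y=True)

X_bin = Binarizer(threshold=1.1).fit_transform(X)

# Equivalent element-wise comparison: values > 1.1 become 1, the rest 0.
X_manual = (X > 1.1).astype(float)

print(np.array_equal(X_bin, X_manual))
print(np.unique(X_bin))  # the distinct values left after binarization
```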
10. Polynomial transformation
poly = preprocessing.PolynomialFeatures(2)
poly.fit_transform(X)
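A tiny input makes it easier to see what `PolynomialFeatures` generates. For two features a and b with degree 2, the output columns are 1, a, b, a^2, ab, b^2. The sketch below uses a made-up sample (a=2, b=3) and then reports the number of columns produced for the four iris features.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import PolynomialFeatures

# One illustrative sample with a = 2 and b = 3.
sample = np.array([[2.0, 3.0]])

full = PolynomialFeatures(degree=2).fit_transform(sample)
print(full)          # columns: 1, a, b, a^2, a*b, b^2

# interaction_only=True keeps products of distinct features and drops the squares.
interactions = PolynomialFeatures(degree=2, interaction_only=True).fit_transform(sample)
print(interactions)  # columns: 1, a, b, a*b

# On iris (4 features) degree 2 gives 1 + 4 + 10 = 15 columns.
X, _ = load_iris(return_X_y=True)
X_poly = PolynomialFeatures(degree=2).fit_transform(X)
print(X_poly.shape)
```

The number of generated columns grows quickly with the degree and the number of input features, so higher degrees are best used with care.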
11. Custom transformation
import numpy as np

# Wrap an arbitrary function (here log(1 + x)) as a transformer
transformer = preprocessing.FunctionTransformer(np.log1p, validate=True)
transformer.fit(X)
log1p_x = transformer.transform(X)
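`FunctionTransformer` also accepts an `inverse_func`, which makes the custom step reversible. The sketch below pairs `np.log1p` with its inverse `np.expm1` and checks the round trip; the pairing is chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import FunctionTransformer

X, _ = load_iris(return_X_y=True)

# Forward: log(1 + x). Inverse: exp(x) - 1.
log_transformer = FunctionTransformer(
    func=np.log1p, inverse_func=np.expm1, validate=True
)

X_log = log_transformer.fit_transform(X)
X_back = log_transformer.inverse_transform(X_log)

print(np.allclose(X_log, np.log1p(X)))  # same as calling the function directly
print(np.allclose(X_back, X))           # the inverse recovers the original data
```

Wrapping a function this way mainly pays off when the step needs to sit inside a scikit-learn `Pipeline` alongside the other transformers.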