On the basic concepts of machine learning


Hypothesis function
    The hypothesis function can be regarded as the initial model for the known data.

    In supervised learning, the hypothesis function used to fit the input samples is written as h_θ(x).
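As a concrete illustration (the linear form and the parameter values below are my own choice, not from the text), a simple hypothesis h_θ(x) = θ₀ + θ₁·x can be sketched as:

```python
# A minimal sketch of a linear hypothesis function h_theta(x) = theta0 + theta1 * x.
# The parameter values below are arbitrary placeholders, not learned values.
def hypothesis(theta0, theta1, x):
    """Predict y for input x under parameters theta = (theta0, theta1)."""
    return theta0 + theta1 * x

print(hypothesis(1.0, 2.0, 3.0))  # 1 + 2*3 = 7.0
```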

Cost function / loss function / error function
    Generally speaking, any function that measures the difference between the model's predicted value and the true value can be called a cost function.
 Name origin:
    "Loss" or "error" reflects the loss of accuracy, i.e. the gap between the prediction and the true value:
    the fitted function is not 100% accurate but an approximation,
    and in many machine learning scenarios a 100% fit is impossible,
    hence the term "error".

    To evaluate the quality of model fitting, the loss function is used to measure the degree of fit.
    The cost function is a function of the parameters theta.

    The cost function is the cost of solving the objective function;
    in other words, it is the means by which the objective function is solved, and it is what optimization minimizes.

    The cost function describes the error over the whole dataset:
    it is the cost (solution path) of estimating the data once the parameters w and b are chosen,
    computed as the average of the errors over all data points.
 Training process:
    Once the model h is determined, all we do is train its parameters theta.
    So when does training end?
    This is where the cost function comes in, because it measures the quality of the model;
    our goal, of course, is the best model (that is, the model that best fits the training samples).
    When the cost function J reaches its minimum, we have obtained the optimal parameters theta.
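The "average of all data errors" described above can be sketched for a linear model h(x) = w·x + b; the function name and sample data here are illustrative, not from the text:

```python
# Mean squared error cost: the average squared error over the whole dataset,
# used to judge how well parameters (w, b) fit the training samples.
def mse_cost(w, b, xs, ys):
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n

xs, ys = [1, 2, 3], [2, 4, 6]
print(mse_cost(2.0, 0.0, xs, ys))  # perfect fit -> cost 0.0
print(mse_cost(1.0, 0.0, xs, ys))  # worse fit -> larger cost
```

Training amounts to searching for the (w, b) that drive this value to its minimum.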

Objective function

The objective function establishes the functional relationship between the hypothesis function and the known data.


Min-max normalization

When new data is added, X_max and X_min may change and must be recomputed.

   x' = (x - X_min) / (X_max - X_min)
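The formula above can be sketched as follows (the function name is my own; it assumes the values are not all equal, so X_max − X_min is nonzero):

```python
def min_max_normalize(xs):
    """Scale values into [0, 1] via x' = (x - x_min) / (x_max - x_min).
    Note: adding new data may change x_min/x_max, which then must be
    recomputed, as the text points out."""
    x_min, x_max = min(xs), max(xs)
    return [(x - x_min) / (x_max - x_min) for x in xs]

print(min_max_normalize([10, 20, 30]))  # [0.0, 0.5, 1.0]
```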

Mean normalization

   x' = (x - μ) / (X_max - X_min)

Nonlinear normalization

   Used when the data spans a very large range; a common example is log scaling, x' = log(x) / log(X_max).



Standard deviation standardization / zero mean standardization

x' = (x - μ)/σ

Zero-centered (mean subtraction only):

  x' = x - μ
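A minimal sketch of the two transforms above, using Python's standard `statistics` module (function names are my own; `standardize` assumes the values are not all equal, so σ is nonzero):

```python
import statistics

def standardize(xs):
    """Zero-mean, unit-variance: x' = (x - mu) / sigma (population sigma)."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]

def zero_center(xs):
    """Zero-centering only: x' = x - mu."""
    mu = statistics.fmean(xs)
    return [x - mu for x in xs]

print(zero_center([1, 2, 3]))  # [-1.0, 0.0, 1.0]
```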

Feature scaling

Feature scaling is used to standardize the range of data features.

x' = (x - x̄) / σ

Why feature scaling?

1. Gradient descent follows a less zigzag path (the cost contours become circles instead of elongated ellipses).

2. It makes the ranges of different features comparable.

Which models must be normalized / standardized?

Activation function

Sigmoid function

The sigmoid function is also called the logistic function.
    Its range is (0, 1): it maps any real number into the interval (0, 1),
    so it can be used for binary classification.
    When x = 0, y = 0.5.
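The standard logistic function σ(x) = 1 / (1 + e^(−x)) matches the properties listed above, and can be sketched as:

```python
import math

def sigmoid(x):
    """Logistic function: maps any real x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))         # 0.5, the midpoint noted in the text
print(sigmoid(10) > 0.5)  # large positive x -> close to 1
```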


Softmax function

Also known as the normalized exponential function.

    Its purpose is to present multi-class results in the form of probabilities.

    It is the generalization of the binary sigmoid function to multiple classes.

    When k = 2 in the formula, it degenerates into the sigmoid function.

Why softmax has this form:
    1) The predicted probabilities are non-negative;
    2) The probabilities of all predicted classes sum to 1.

How softmax converts multi-class outputs into probabilities:
    1) Numerator: the exponential function maps each real-valued output into (0, +∞).

    2) Denominator: sum all the results and normalize.
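The two steps above can be sketched as follows (the max-subtraction is a standard numerical-stability trick, not something the text mentions; it does not change the result):

```python
import math

def softmax(zs):
    """Numerator: exp maps each score into (0, +inf); denominator: sum and
    normalize, so outputs are non-negative and sum to 1.
    Subtracting the max before exponentiating avoids overflow."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(sum(probs))  # 1.0 (up to floating-point error)
```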
