Tag: gradient

Time: 2021-1-17
Catalog: 1. Linear regression 2. Mathematical principle of the gradient descent method 3. Optimization of the gradient descent method 4. Python implementation. 1. Linear regression. For a detailed introduction to linear regression, please refer to my previous blog post, "Linear regression: the realization of the least square method". In "Linear regression: the realization of the least square method", […]

Time: 2021-1-16
Catalog: 1. Log-odds and log-odds (logistic) regression 2. Sigmoid function 3. Maximum likelihood method 4. Gradient descent method 5. Python implementation. 1. Log-odds and log-odds (logistic) regression. In log-odds regression, the model output \(y^*\) for a sample is defined as the probability that the sample is a positive example, and \(\frac{y^*}{1-y^*}\) is defined as the odds. The odds are the […]
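The quantities named in this excerpt can be sketched in a few lines. This is a minimal illustration that assumes the standard definitions (odds as \(\frac{y^*}{1-y^*}\), log-odds as its natural logarithm, and the sigmoid as the inverse of the log-odds); the function names are hypothetical and not taken from the post.

```python
import math


def odds(p):
    """Odds of a positive example, given its probability p in (0, 1)."""
    return p / (1.0 - p)


def log_odds(p):
    """Natural logarithm of the odds (the logit)."""
    return math.log(odds(p))


def sigmoid(z):
    """Map a log-odds value z back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


p = 0.8
print("odds:", odds(p))
print("log-odds:", log_odds(p))
print("round trip:", sigmoid(log_odds(p)))
```

A probability of 0.5 gives odds of 1 and log-odds of 0, which is why the sigmoid crosses 0.5 at zero.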

Time: 2021-1-13
Technical blog: a priority-based parameter propagation method for distributed training of neural networks. Author: Ni Hao. This paper is from the Parallel & Distributed Learning section of the 2019 SysML conference. Data-parallel training has been widely used in distributed computing of deep neural networks. However, the performance improvement of distributed computing is often limited by […]

Time: 2020-12-29
In this article, we will take a closer look at a gradient boosting library called CatBoost. In gradient boosting, predictions are made by a group of weak learners. Unlike random forests, which build a decision tree for each sample, gradient boosting builds its trees one after another. The previous trees in the model do […]
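The idea of building weak learners one after another can be sketched without any library. The example below is a toy illustration, not CatBoost's algorithm: it assumes squared-error loss, a single numeric feature, and depth-one trees (stumps), each fitted to the residuals left by the stumps before it. All names and the data are made up for illustration.

```python
def fit_stump(xs, residuals):
    """Pick the threshold on a 1-D feature that best fits the residuals."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue  # a split must leave samples on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return t, left_mean, right_mean


def predict(stumps, learning_rate, base, x):
    """Base value plus the shrunken contribution of every stump so far."""
    total = sum(lm if x <= t else rm for t, lm, rm in stumps)
    return base + learning_rate * total


def mse(stumps, learning_rate, base, xs, ys):
    return sum((y - predict(stumps, learning_rate, base, x)) ** 2
               for x, y in zip(xs, ys)) / len(xs)


# Toy data (illustrative only): a step between x = 2 and x = 3.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]

base = sum(ys) / len(ys)   # initial prediction: the mean target
learning_rate = 0.5        # shrinkage applied to each stump
stumps = []
initial_mse = mse(stumps, learning_rate, base, xs, ys)

for _ in range(20):
    # Each new stump is fitted to what the current ensemble still gets wrong.
    residuals = [y - predict(stumps, learning_rate, base, x)
                 for x, y in zip(xs, ys)]
    stumps.append(fit_stump(xs, residuals))

final_mse = mse(stumps, learning_rate, base, xs, ys)
print("training MSE before:", initial_mse, "after:", final_mse)
```

A random forest would instead fit each tree independently and average them; here the order matters because every stump depends on the ones before it.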

Time: 2020-12-22
By ReNu Khandelwal. What is neural machine translation? Neural machine translation is a technology that translates one language into another. One example is translating English into Hindi. Let's think about it. If you're in an Indian village, most of the people there don't know English. You plan to communicate with the villagers effortlessly. […]

Time: 2020-12-16
Gradient centralization (GC) gives the weight gradients zero mean, which makes network training more stable and improves the network's generalization ability. The algorithm is simple, and the paper's theoretical analysis is thorough and explains the principle of GC well. Source: Xiaofei's algorithm Engineering Notes official […]
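The zero-mean operation described in this excerpt can be sketched as follows. This is a minimal illustration under one assumption: the gradient of a fully connected layer is stored as a list of rows, one row per output unit, and the mean is removed per row. The function name and the numbers are hypothetical.

```python
def centralize_gradient(grad):
    """Subtract each row's mean so that every row of the gradient sums to zero."""
    centered = []
    for row in grad:
        mean = sum(row) / len(row)
        centered.append([g - mean for g in row])
    return centered


# Illustrative gradient for a layer with 2 output units and 3 inputs.
grad = [[0.5, 1.5, 1.0],
        [-2.0, 0.0, 2.0]]

centered = centralize_gradient(grad)
for row in centered:
    print(row, "sum =", sum(row))
```

The centered gradient would then be passed to the optimizer in place of the raw gradient.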

Time: 2020-12-13
Recently, Tencent engineers broke the world record for training ImageNet on 128 cards, with a time of 2 minutes and 31 seconds, a full seven seconds faster than the previous record. "We have not yet shown our full strength. If we use RoCE, this result can be further improved to 2 minutes and 2 seconds," said Tencent engineers […]

Time: 2020-11-27
[Column 2] Activation functions (1): on activation functions and their development. Activation functions are a very important part of neural networks, and throughout the history of neural networks the design of activation functions has been a research direction in its own right. When studying, we often don't think about why we use a particular function or where it comes from. Biological neural […]

Time: 2020-11-26
By ReNu Khandelwal. Let's start with the following questions. Recurrent neural networks (RNNs) can solve the problems of artificial neural networks and convolutional neural networks. Where can I use an RNN? What is an RNN and how does it work? The vanishing gradient and exploding gradient problems that challenge RNNs. How can LSTM and GRU address these challenges? Suppose we're […]

Time: 2020-11-22
1. Objectives
Fit the function $f(x) = 5.0x_1 + 4.0x_2 + 3.0x_3 + 3$.
2. Theory
The approach is similar to one-dimensional linear regression.
3. Implementation
3.0 Environment
python == 3.6
torch == 1.4
3.1 Necessary packages
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
3.2 Creating data and transforming its form
# […]
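As a rough companion to this excerpt, the same target function can be fitted with plain full-batch gradient descent. This is a dependency-free sketch, not the post's torch code: the synthetic data, learning rate, iteration count, and variable names are all assumptions chosen for illustration.

```python
import random

random.seed(0)

# Target weights and bias from the excerpt: f(x) = 5.0*x1 + 4.0*x2 + 3.0*x3 + 3
TRUE_W = [5.0, 4.0, 3.0]
TRUE_B = 3.0

# Synthetic, noise-free training data (an assumption; the post's data is elided).
xs = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(100)]
ys = [sum(w * x for w, x in zip(TRUE_W, row)) + TRUE_B for row in xs]

w = [0.0, 0.0, 0.0]
b = 0.0
lr = 0.1
n = len(xs)

for _ in range(1000):
    # Prediction error for every sample under the current parameters.
    errors = [sum(wj * xj for wj, xj in zip(w, row)) + b - y
              for row, y in zip(xs, ys)]
    # Gradient of the mean squared error with respect to each weight and the bias.
    grad_w = [2.0 / n * sum(e * row[j] for e, row in zip(errors, xs))
              for j in range(3)]
    grad_b = 2.0 / n * sum(errors)
    # Gradient descent update.
    w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
    b -= lr * grad_b

print("weights:", [round(wj, 3) for wj in w], "bias:", round(b, 3))
```

Because the data is noise-free, the learned weights and bias should approach the target values 5, 4, 3 and 3.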

Time: 2020-11-21
1. Objectives
Fit the function $f(x) = 2x_1^3 + 3x_2^2 + 4x_3 + 0.5$.
2. Theory
The principle is similar to one-dimensional and multi-dimensional linear regression, but the degree of the terms is higher.
3. Implementation
3.1 Environment
python == 3.6
torch == 1.4
3.2 Constructing the data
# This is the target weight and offset
w = torch.FloatTensor([2.0, 3.0, […]

Time: 2020-11-15
In this paper, the DBTD method is used to calculate the filtering threshold, and it is then combined with a random pruning algorithm to prune the feature gradients, which reduces the amount of computation in the backpropagation phase. Training on CPU and ARM is accelerated by 3.99 times and 5.92 times respectively. Source: Xiaofei's algorithm Engineering Notes […]