Personally, I don't like listening to one teacher repeat the same thing over and over, but I do like hearing several teachers explain the same topic, so I put together this list. I think it works better than the order I originally learned things in!
Is attention, at its core, just a set of weights?
- (optional) Prerequisite: word embeddings and representations (article): for anyone with no NLP background at all. I found it really clear.
- (optional) Understand the basic structure of the Transformer (video): judging by the cover, I really didn't expect it to be this good; stereotypes are harmful. It explains both the "why" and the core architecture in a very intuitive way.
- (suggested) Li Mu's paragraph-by-paragraph walkthrough of the paper (video): his explanation is extremely detailed and clear, adds a lot of background knowledge, and is very well delivered. Li Hongyi's version is said to be more accessible, but it runs two and a half hours, and honestly I'm too lazy for that.
- (recommended) Deriving attention by hand (video): I used to just quietly admire anyone who could push through the formulas, but this one actually made it click. It's like walking through the whole mechanism again, so you know what each part is supposed to do before you ever look at the code.
- (optional but more strongly recommended) A second look at the Transformer (article): the explanation doesn't differ much from the video in item 2, but it adds a lot of diagrams. The main draw is the comment section: the questions asked there are ones everyone runs into, and the answers are excellent.
- Code implementation 1: breaks the key components down one by one and writes each of them.
- Code implementation 2: the one above leaves you to assemble the pieces yourself; this one you can just copy and paste. It feels great.
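To give a feel for what the hand-derived attention formula from the list above looks like in code, here is a minimal sketch of single-head scaled dot-product attention in NumPy. The shapes and names are my own assumptions for illustration, not taken from any of the linked resources:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Single-head scaled dot-product attention.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    Returns: (n_queries, d_v)
    """
    d_k = Q.shape[-1]
    # Similarity score between every query and every key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Each row sums to 1: how strongly each query attends to each key
    weights = softmax(scores, axis=-1)
    # The output is a weighted average of the value vectors
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((5, 4))
V = rng.standard_normal((5, 2))
out = attention(Q, K, V)
print(out.shape)  # (3, 2)
```

This is exactly the "weights" view from the question at the top: attention is a softmax-normalized weight matrix applied to the values, which is why deriving it by hand before reading a full implementation pays off.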