Strengthen learning and understanding
Reinforcement learning is the interaction between agents and the environment (exploration and trial and error). They perceive the environment through interactive information, so as to adjust their behavior and choose the best result.
Reinforcement learning focuses more on goal oriented learning from interaction.
[map the situation to the action to maximize the numerical reward signal. Popularly understood as a simple simulation of people’s learning process, it is equivalent to people doing many explorations and expressing the final labor results in the form of state value function and action state pair value function. Use the exploration results to select appropriate actions to complete their own tasks.]
Using revenue signals to formalize goals is one of the most significant goals of reinforcement learning. Revenue signals can only be used to convey what you want, not how to achieve your goals
What must be clear before solving the problem
- What is the problem to be studied and whether it involves interaction with the environment?
- Is it appropriate to use reinforcement learning to solve this problem? (essentially an optimization problem)
- What are the states of the agent, what actions are corresponding to each state, and whether the interaction law with the environment can be expressed explicitly?
- What is the purpose of interacting with the environment? What is the goal to achieve? How to set rewards for each status?
They correspond to the environmental state, the corresponding actions in the state, the relationship between actions and state transition, the setting of goals and the reward measurement.
The elements of reinforcement learning include strategy, reward signal, value function and environment model.
Have you idealized some situations in your current consideration? If not idealized, what method should be chosen to solve this problem?
[personal understanding: reinforcement learning is like exploring the environment by specifying rules. Try many times according to the rules and output the final convergence results to guide decision-making]