Rewards in reinforcement learning?
Hi, reinforcement learning wasn't clear to me. Can you please expand on it or give some examples?
Thanks!
Hi Kristin,
Great to have you on the course and thanks for reaching out!
In supervised learning, we aim to minimize the objective function (often called loss function).
In reinforcement learning, we aim to maximize the objective function (often called reward function).
In fact, until recently, many people considered reinforcement learning a type of supervised learning. With newer advancements, however, the differences have become big enough that we treat them as two separate types.
***
If you minimize a loss function, especially one that is never negative (>= 0), you know that when it reaches 0 you have the best possible model. Quite often, the loss is simply a function of the error, so a loss of 0 means no error, that is, a perfect model. Of course, that almost never happens in practice, but you know the final goal: a loss (error) of 0.
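To make the "loss of 0" idea concrete, here is a minimal sketch using mean squared error as the loss. The function name and the sample numbers are made up for illustration; they are not from the course material.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences; it can never be negative."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(targets)


# Illustrative values only: the "correct answers" the model is trained on.
targets = [1.0, 2.0, 3.0]

rough_guess = [1.5, 1.5, 3.5]    # every prediction is off by 0.5
perfect_guess = [1.0, 2.0, 3.0]  # every prediction matches its target

rough_loss = mean_squared_error(rough_guess, targets)
perfect_loss = mean_squared_error(perfect_guess, targets)

print(rough_loss)    # 0.25
print(perfect_loss)  # 0.0
```

The key point is that the targets exist, so we can always measure how far we are from them, and 0 is the floor we are aiming for.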
In reinforcement learning, things are different. There is usually no known correct answer to compare against, so there is no error to measure. Take chess, for instance. The rules are deterministic, but the number of possible positions is far too large to enumerate, so we don't know the best move in every position and can't teach a model to make 'no error'.
What we can do instead is flip the problem around: if a move leads to a better position, we give a positive reward; if it leads to a worse position, we give a negative reward (a penalty). The model then tries to collect as much reward as possible. Unlike a loss that bottoms out at 0, the reward has no known cap that would tell us the model is perfect. So we basically train the model to be as good as possible at chess, without knowing the 'perfect game' (we don't have enough computational power to work that out yet).
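Chess is too big for a short example, so here is a much smaller sketch of the same idea, assuming a toy setup with three possible actions that each pay a noisy reward. It uses a simple epsilon-greedy strategy (mostly pick the action that looks best so far, occasionally try a random one). The function name, the reward values and the parameters are all assumptions for illustration, not something from the course.

```python
import random


def train_bandit(true_mean_rewards, steps=2000, epsilon=0.1, seed=0):
    """Learn which action pays best purely from the rewards received."""
    rng = random.Random(seed)
    n_actions = len(true_mean_rewards)
    estimates = [0.0] * n_actions  # the agent's current guess of each action's reward
    counts = [0] * n_actions       # how many times each action has been tried
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the highest estimated reward.
            action = max(range(n_actions), key=lambda a: estimates[a])

        # The environment answers with a noisy reward, never with a "correct action".
        reward = rng.gauss(true_mean_rewards[action], 0.1)

        # Move the running average for this action towards the new reward.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return estimates, total_reward


# Hypothetical average rewards; the agent never sees these numbers directly.
estimates, total_reward = train_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, estimates)
```

Notice that nothing in the loop ever compares the agent's choice with a right answer. The agent only sees rewards, and over time it should come to prefer the action with the highest average reward (the last one in this setup).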
Because of this, reinforcement learning is a popular method for teaching a robot to walk (there is no single 'correct' way to walk, only good and better ways). It is also used when teaching self-driving cars to drive (there are good and bad ways to drive, but no 'perfect' way, because every situation on the road, every second, is unlike any other).
Hope this helps!
Best,
The 365 Team