
Rewards in reinforcement learning?



Hi, reinforcement learning wasn't clear to me. Can you please expand on it or give some examples?

1 Answer

365 Team

Hi Kristin,
Great to have you on the course and thanks for reaching out!
In supervised learning, we aim to minimize the objective function (often called the loss function).
In reinforcement learning, we aim to maximize the objective function, which is built from the reward function: the agent tries to collect as much total reward as possible.
In fact, until relatively recently, many people considered reinforcement learning a type of supervised learning. However, with newer advances, the differences are big enough that we treat them as two separate types.
If you minimize a loss function, especially one that is always non-negative (≥ 0), you know for certain that when it reaches 0 you have the best model possible. Quite often, the loss is simply a function of the error, so a loss of 0 means no error at all: a perfect model. Of course, that never happens in practice, but the final goal is clear: a loss (error) of 0.
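To make the "a loss of 0 is the goal" idea concrete, here is a minimal sketch using mean squared error, one common non-negative loss. The choice of loss and the sample numbers are assumptions for illustration; the answer above does not name a specific loss function.

```python
def mean_squared_error(predictions, targets):
    """Average of squared differences: never negative, and 0 only for a perfect fit."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


targets = [2.0, 4.0, 6.0]
print(mean_squared_error([2.0, 4.0, 6.0], targets))  # 0.0 -> no error, the ideal
print(mean_squared_error([2.5, 3.0, 6.0], targets))  # roughly 0.417 -> some error remains
```

Because each term is squared, the loss can never drop below 0, so 0 is a known, fixed target for training.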
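In reinforcement learning, things are different. There is usually no known correct answer to compare against, so there is no error to measure. Take chess, for instance. The number of possible positions is far too large to enumerate, and nobody knows the best move in every position (chess has not been solved), so we can't label each move as right or wrong and teach a model to make zero errors.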
What we can do instead is turn the problem around: if a move leads to a better position, the reward goes up; if a move leads to a worse position, the reward goes down. However, unlike the loss, the reward has no known target value that signals perfect play. So we train the model to play chess as well as possible without knowing what the 'perfect game' looks like (we don't yet have the computational power to find out).
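As a minimal illustration of learning from rewards instead of from labeled correct answers, here is a sketch of tabular Q-learning, one classic reinforcement learning algorithm (the answer above does not name a specific one), on a hypothetical toy corridor. The environment, the reward values (+1 for reaching the goal, -0.01 per step) and the hyperparameters are all assumptions chosen for the example. The agent is never told which move is "correct"; it only observes rewards and gradually prefers actions that lead to more of them.

```python
import random

# Hypothetical toy environment: a corridor of N_STATES cells.
# The agent starts in cell 0; reaching the last cell ends the episode.
N_STATES = 6
ACTIONS = (-1, +1)  # step left, step right
GOAL = N_STATES - 1


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)  # walls at both ends
    if next_state == GOAL:
        return next_state, 1.0, True   # assumed reward for reaching the goal
    return next_state, -0.01, False    # assumed small penalty for every extra step


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # q[s][i] estimates the total future reward of taking ACTIONS[i] in state s.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward reward + discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q = train()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(GOAL)]
print(policy)
```

After training, the greedy policy should point "right" in every non-goal cell. Nobody supplied that answer as a label; the agent found it by maximizing reward, which mirrors the chess and walking examples on a much smaller scale.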
For this reason, reinforcement learning is a natural choice for teaching a robot to walk (there is no single 'correct' way to walk, only better and worse ways). It is also used to teach self-driving cars to drive (there are good and bad ways to drive, but no 'perfect' way, because every situation on the road, every second, is unlike any situation before it).
Hope this helps!
The 365 Team
