Resolved: Question 2 of the quiz
You said in the course that at the end of the robot's training the objective function will be minimized, because most of the shots will be close to the target. So how is the reward system used to maximize the objective function?
Isn't that the opposite of what you said?
To clarify: in supervised machine learning, we often refer to an "objective function" or "loss function" that we want to minimize. For example, in regression problems, we might want to minimize the mean squared error between our predictions and the actual outcomes.
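As a minimal sketch in plain Python (the numbers are made up for illustration), here is the mean squared error from the regression example. A better fit gives a smaller value, which is why training tries to push it down.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and actual outcomes."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

actual = [3.0, 5.0, 7.0]
print(mean_squared_error([2.5, 5.5, 6.0], actual))  # imperfect fit → 0.5
print(mean_squared_error([3.0, 5.0, 7.0], actual))  # perfect fit → 0.0
```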
In contrast, in reinforcement learning, we often talk about a "reward function" that we want to maximize. The reward function measures how well the agent (in the course example, the robot) is doing at its task. Each time the agent takes an action, it receives some reward. The goal of the agent is to learn a policy (a mapping from states to actions) that maximizes the expected sum of these rewards.
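A rough sketch of "the expected sum of rewards", assuming each episode is simply a list of the rewards the agent received. The expectation is approximated by averaging over a few sampled episodes. The `gamma` discount factor is an addition not mentioned above (many RL setups use it); with the default `gamma=1.0` it is the plain sum. The reward logs are hypothetical.

```python
def episode_return(rewards, gamma=1.0):
    """Sum of the rewards from one episode; gamma < 1 weights later rewards less."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

def estimated_expected_return(episodes, gamma=1.0):
    """Monte Carlo estimate of the expected return: average over sampled episodes."""
    return sum(episode_return(ep, gamma) for ep in episodes) / len(episodes)

# Hypothetical reward logs: three episodes for each of two policies.
policy_a = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 2.0, 0.0]]
policy_b = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

print(estimated_expected_return(policy_a))  # → 3.0
print(estimated_expected_return(policy_b))  # → 1.0
```

Because policy A collects more reward on average, an agent maximizing expected return would prefer it.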
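So these two ideas are not opposites; they are two sides of the same coin. Both involve optimizing some measure of performance: in one case we minimize a measure of error, and in the other we maximize a measure of reward. The two are interchangeable, because maximizing a reward R is exactly the same as minimizing the loss -R. A robot whose shots mostly land close to the target therefore has both a high reward and a low loss, which is why both descriptions apply at the end of training. The ultimate goal in both cases is to improve the performance of our model or agent.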
In the context of the robot shooting arrows, the reward might be a high value for hitting close to the target and a low value for missing. During training, the robot adjusts its shooting strategy to maximize this reward. This is analogous to a machine learning model adjusting its parameters to minimize its loss function.
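Here is a toy version of the arrow example, under clearly stated assumptions: the robot has a single aim parameter, each shot is deterministic and drifts by a fixed wind offset, and the reward is the negative distance from the target. `TARGET`, `WIND`, `reward`, `loss` and `train` are hypothetical names, and the simple hill-climbing search stands in for whatever learning algorithm the course actually uses. The point is that pushing the reward up is the same thing as pushing the distance-based loss down.

```python
TARGET = 5.0  # hypothetical target position
WIND = 0.3    # hypothetical constant drift added to every shot

def landing_position(aim):
    """Deterministic toy model: the arrow lands where aimed, shifted by the wind."""
    return aim + WIND

def reward(aim):
    """Higher (closer to 0) when the arrow lands near the target."""
    return -abs(landing_position(aim) - TARGET)

def loss(aim):
    """Distance from the target: exactly the negative of the reward."""
    return -reward(aim)

def train(aim=0.0, step=1.0, min_step=1e-3):
    """Hill climbing: move the aim in whichever direction raises the reward."""
    while step > min_step:
        # Current aim listed first so that ties favour staying put.
        candidates = [aim, aim - step, aim + step]
        best = max(candidates, key=reward)
        if best == aim:
            step /= 2  # no neighbour is better, so search more finely
        else:
            aim = best
    return aim

final_aim = train()
print(round(final_aim, 2))  # → 4.7 (aims left of the target to cancel the wind)
print(loss(final_aim) < 0.01, reward(final_aim) > -0.01)  # low loss and high reward at once
```

Maximizing `reward` and minimizing `loss` always pick the same aim, since one is just the negative of the other.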
Hope this helps!