In the video the objective function is associated with the error function, which we want to minimize.
In the quiz, it is said that the objective function typically represents the cumulative reward, which we want to maximize. (In both cases we're talking about reinforcement learning, correct?)
I can see why and can relate to both, but from an educational perspective it is very confusing to introduce the objective function saying it needs to be minimized, and then in the quiz say it needs to be maximized, WITHOUT mentioning why there are two different cases and when we need to differentiate between them. I miss the awareness and attention to detail here. I just started learning with 365ds and I really hope the whole program isn't built like this.
I understand your confusion regarding the objective function in reinforcement learning and how it is presented in different contexts.
1. Objective Function as Error Function (Minimization): In some reinforcement learning scenarios, especially when learning value estimates that underpin an optimal policy, the objective function can be conceptualized as an error or loss function. This is often the case with techniques like Temporal Difference (TD) Learning or Q-Learning, where the goal is to minimize the difference (the TD error) between the current value estimate and a target formed from the observed reward plus the discounted value of the next state. Minimizing this error yields more accurate value predictions and, in turn, a more effective policy.
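To make the minimization view concrete, here is a minimal sketch of a tabular Q-Learning update. The two-state environment, state names, rewards, and hyperparameters are all made up for illustration; the point is that each update shrinks the TD error toward zero.

```python
# Minimization view: the TD error is the gap between the current estimate
# Q(s, a) and the bootstrapped target r + gamma * max_a' Q(s', a').
def td_error(q, s, a, r, s_next, gamma=0.9):
    target = r + gamma * max(q[s_next].values())
    return target - q[s][a]

def q_update(q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    # Move Q(s, a) a fraction alpha of the way toward the target.
    q[s][a] += alpha * td_error(q, s, a, r, s_next, gamma)

# Toy two-state table (states, actions, and the reward 1.0 are hypothetical).
q = {"s0": {"left": 0.0, "right": 0.0},
     "s1": {"left": 0.0, "right": 0.0}}

# Repeatedly observing the same transition drives its TD error toward zero.
for _ in range(100):
    q_update(q, "s0", "right", 1.0, "s1")

print(abs(td_error(q, "s0", "right", 1.0, "s1")))  # very close to 0
```

Note that the error is only a training signal: once it is near zero, Q("s0", "right") has converged to the (discounted) reward actually obtainable from that transition.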
2. Objective Function as Cumulative Reward (Maximization): More commonly, in reinforcement learning, the objective function is defined as the cumulative reward that an agent is expected to gain over time. The goal here is to maximize this cumulative reward. This approach aligns with the core principle of reinforcement learning where an agent learns to take actions in an environment to maximize some notion of cumulative reward.
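The maximization view can be sketched just as briefly: the quantity an agent tries to maximize is the discounted cumulative reward (the return) of a trajectory. The reward sequence and discount factor below are made-up numbers for illustration.

```python
# Maximization view: the return G = r_0 + gamma*r_1 + gamma^2*r_2 + ...
# Computed backwards for numerical convenience.
def discounted_return(rewards, gamma=0.9):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Hypothetical trajectory with rewards [1, 0, 2]:
# G = 1 + 0.9*0 + 0.81*2 = 2.62
print(discounted_return([1.0, 0.0, 2.0], gamma=0.9))
```

An agent following a better policy produces trajectories with higher returns, which is exactly what "maximizing the objective" means in this formulation.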
The key to understanding this lies in the context of the problem you are solving. If the focus is on accurately predicting future rewards (as in value-based methods), minimizing error is the stated objective. If the focus is on directly maximizing rewards through actions (as in policy-based methods), then maximizing the cumulative reward is the objective. Importantly, these are two sides of the same coin: minimizing prediction error is the mechanism value-based methods use, but the ultimate goal in both cases is a policy that collects as much reward as possible.