Last answered:

03 Dec 2023

Posted on:

18 Mar 2023

1

Resolved: Possible bug with latest contract filtering

Let's look at the dept_employee subquery from the answer query. I've changed it a bit, moving from_date condition to ON block

SELECT 
    de.emp_no, de.dept_no, de.from_date, de.to_date
FROM
    dept_emp de
        JOIN
    (SELECT 
        emp_no, MAX(from_date) AS from_date
    FROM
        dept_emp
    GROUP BY emp_no) de1 ON de.emp_no = de1.emp_no AND de.from_date = de1.from_date
WHERE
    de.to_date < '2002-01-01'
        AND de.from_date > '2000-01-01'
order by de.emp_no, de.dept_no;

Now let's move the WHERE block inside the subquery

SELECT 
    de.emp_no, de.dept_no, de.from_date, de.to_date
FROM
    dept_emp de
        JOIN
    (SELECT 
        emp_no, MAX(from_date) AS from_date
    FROM
        dept_emp
    WHERE
        from_date > '2000-01-01'
            AND to_date < '2002-01-01'
    GROUP BY emp_no) de1 ON de.emp_no = de1.emp_no AND de.from_date = de1.from_date
ORDER BY de.emp_no, de.dept_no;

As a result we're getting 676 and 806 rows respectively.

Why do the results differ?

To find out, let's look at the employee 13626. They are missing in the first result, while still being present in the second one.

Selecting from dept_emp gives us following results:

select * from dept_emp where emp_no = 13626; 


Technically they had a contract "they have signed during the suggested time period". If you interpret the quoted line as "a contract started during the period", it's the second one. But it's getting picked up by the inner query and discarded by the outer WHERE. So, they do not show up in the results.

If you interpret the quoted line as "a contract started and ended during the period" (which is more likely looking at the code) , we have the first contract. But it's getting discarded by the inner MAX.

I consider this a bug since the employee actually have a contract for both cases and should not be discarded


I think you can achieve same conclusions with salaries table:

9 answers ( 1 marked as helpful)
Instructor
Posted on:

08 Apr 2023

1

Hi Nick!


Thanks for reaching out.


What you are sharing is a great observation, thank you for doing that.

I am not sure we are talking about a bug but rather about a deeper understanding of how MySQL operates. Indeed, working with subqueries is, I think it is safe to say, a delicate matter, so where we place the conditions precisely alters the output. And since the code written abides by the MySQL syntax, Workbench will return a result in both cases so it is up to us, the user, to check the obtained data and ask the relative analytical and coding questions prior to concluding about the meaning of a certain output.

What happens "behind the scenes" is as you describe it, what I would add is to not forget about the order in which the SQL Optimizer will apply the tasks we've given it upon the execution of the given query. So, if a WHERE condition is left beyond the subquery, it will be applied prior to obtaining any particular dataset. I.e., referring to the first query, this will eliminate all rows from the dept_emp table beyond the provided condition even prior to joining the data from the subquery.

Finally, yes, the second query refers to the code we've provided namely because we don't need to eliminate rows by the use of the MAX function.


Hope this helps.
Best regards,
Martin

Posted on:

08 Apr 2023

1

Actually, the first query is a slightly modified version of the subquery from the solution.

 

I meant that the solution has a bug, not MySQL or MySQL Workbench.

The second query from the thread is mine.

Instructor
Posted on:

17 May 2023

0

Hi Nick!


Thanks for your reply.


Can you please clarify what you mean by a "bug" - did the previous explanation help in understanding how MySQL functions?

You have obtained different from-dates meaning the mechanic used by SQL to retrieve the desired data is different - something you've achieved by using the MAX() function as you've described.

Also, I do obtain 676 and 806 rows as answers of the first and second query you've shared, respectively. Do you mean that you only obtain 2 or 3 rows? If that's the case, then we might be talking about a bug. If not, do you think you may have loaded a different database or be working with a shortened version of it? Or have you set any limits to the output obtained?


Looking forward to your answer.
Kind regards,
Martin

Posted on:

18 May 2023

1

I do not consider it a MySQL bug. I don't think that the solution gives a right answer to the task. I've tried explaining it at the beggining.

Posted on:

12 Oct 2023

2

Such a great annalysis @Nick Konovalchuk. It helped me a lot to deepen my understanding! Thank you


Adding on to the analysis.

If the restrictive time period/dates are placed within the outer query, we are essentially saying that get the contracts that were initiated and terminated within that time period and no further contract was signed by that employee. i.e they left the company in that given timeframe


If the restrictive time period/dates are placed within the inner query, we are essentially saying that get the contracts that were initiated and terminated within that time period and there is a possibility that there are more contracts signed by the employee in the company. ie they are still an employee of the company


By Restrictive time period/dates i mean the where clause - WHERE from_date > '2000-01-01' AND to_date < '2002-01-01'

 



Posted on:

11 Nov 2023

1

Hi Nick Konovalchuk,


I got what you mean. To get meaningful answer, filter should be moved from outer query to inner query. I got 245 rows.


Thank you 

Instructor
Posted on:

26 Nov 2023

0

Hi everyone!


Please excuse us for the delayed reply. Technically, it is now that I see the follow-up of this thread. 

I will discuss this situation with our Team and get back to you as soon as possible.

Thank you for your patience.


Kind regards,
Martin

Posted on:

28 Nov 2023

1

This is a common problem with these assignments. It is not a bug in the solution's code, it is not a a bug in our code, it is a "bug" in the assignment details.  The interpretation to your request can be taken in two different ways.  Just as Tanika explained above.  These words should be more precise:

"Consider the employees' contracts that have been signed after the 1st of January 2000 and terminated before the 1st of January 2002"


You said to look only at the contracts, and then the most recent of these contacts as per the second bullet:

"The salary values of the latest contracts they have signed during the suggested time period"


Your solutions assumed the interpretation:

"If the restrictive time period/dates are placed within the outer query, we are essentially saying that get the contracts that were initiated and terminated within that time period and no further contract was signed by that employee. i.e they left the company in that given timeframe"


But since that was not specified, I have 806 records of employees that started and ended a contract in the given timeframe.  


We do not have a realtime/live person to get clarity in these matters as we would in a job, so it is imperative that you specify very precicely what you mean,  and then have testers (multiple) that did not write the assignment interpret the assignment and see if it aligns with what you thought was necessary.  I know that these sorts of things are hard, I am a former College Math instructor and have written many many on-line assignments that I needed to monitor for clarity.  My students had me almost immidiatly (I monitored the formum on my phone) and at most waited 12 hours for an answer to any  questions.  I keep getting responses long after I have left a section that has cause me confusion and frustration.  This takes away from the student experience and reduces your ratings.  Please consider my suggestion.  



Instructor
Posted on:

03 Dec 2023

0

Hi everyone!


Thank you very much for the detailed and explanatory comments to all of you. The slow management of the solution of this thread is entirely my fault. Apart from not understanding what you are pointing out from the first moment, there was the technical error to not follow up on the thread, hence the slow response time.

The task and solution have been corrected, it is a question of technical time for us to update the material in the course so that it is visible to all. 

Please excuse me for the inconvenience caused.


Back to the subject - indeed, the WHERE clause must be within the subquery in both the first and second part of the JOIN used in the solution of the given exercise (which is this one: https://learn.365datascience.com/courses/advanced-sql/sql-aggregate-functions-and-window-functions-part-ii/#assignment).


Hope this helps and please excuse me once again.

Kind regards,
Martin

Submit an answer