Last answered:

22 May 2024

Posted on:

25 Feb 2023

1

Resolved: Another Prospective for the Solution

I am quite confused regarding the solution as the given solution is just checking the year when an employee starteed working 

However, maybe there are some employees that are employed before 1990 but still working in the company by that year and even after

and this can be applied to all years

Also there is nothing mentioned regarding when the employee has left

I had a solution trying to consider those factors, it seems quite complicated and do not know if I am right or not

SELECT 
    cal_year,
    gender,
    SUM(CASE
        WHEN cal_year BETWEEN YEAR(hire_date) AND YEAR(end_date) THEN 1
        ELSE 0
    END) AS no_of_emp
FROM
    (SELECT 
        e.emp_no, e.gender, e.hire_date, MAX(d.to_date) AS end_date
    FROM
        employees e
    JOIN dept_emp d ON e.emp_no = d.emp_no
    GROUP BY e.emp_no) a
        CROSS JOIN
    (SELECT DISTINCT
        YEAR(hire_date) AS cal_year
    FROM
        employees
    WHERE
        YEAR(hire_date) >= 1990) b
GROUP BY cal_year , gender
ORDER BY cal_year;
2 answers ( 1 marked as helpful)
Posted on:

13 Jan 2024

0

Thank you for raising this, I am confused as well.I hope 365datascience repond to this.

Instructor
Posted on:

22 May 2024

0

Hi Eslam and Temitope!

Thanks for reaching out!


Please accept my apologies for the delayed response.

In your query, the condition WHERE YEAR(hire_date) >= 1990 is applied only in the subquery b, which generates distinct calendar years. However, this condition isn't applied to the main query, which performs a cross join between the generated calendar years and the result set from the subquery a. This means that for each calendar year in the result set, the query will attempt to calculate the number of employees for all years regardless of the condition.

In contrast, the solution query directly filters the data based on the condition HAVING calendar_year >= 1990, ensuring that only years from 1990 and onwards are included in the result set. This results in fewer rows returned compared to the first query, where data for all years may be included.


Hope this helps.
Best,
Martin

Submit an answer