Last answered:

26 Jun 2020

Posted on:

11 Apr 2020

1

Confusion: SQL Tableau Problem 1

Hi all, I am working on SQL Tableau Problem 1, and found a potential issue in the solution provided in the course. I am a little confused so would like to seek help here. The task is "Create a visualization that provides a breakdown between the male and female employees working in the company each year, starting from 1990." In the solution, YEAR(d.from_date) in "dept_emp" table is used as "calendar_year" to calculate the number of employees per year. Later in Tableau, it is acknowledged that this number only accounts for the employees joining company in that year, but not ones (continuously) hired in that year. Quick table calculation is used to generate the cumulative frequency. Looking into dept_emp table, since one employee may have multiple records, e.g. emp_no 10029 who were in dept_no d004 from 1991-1999 and later in d006 from 1999 to 9999, I feel that this solution cannot address:
  1. Employees who transferred department would have multiple records in dept_emp. They not necessarily joined the company in the YEAR(from_date) on a certain record, and could be double counted.
  2. Employees who had left the company, e.g. 10025, might still be counted as hired in following years.
I am not sure if it is because I omitted something in the lecture. Appreciate it if anyone could help. Thanks!
2 answers ( 0 marked as helpful)
Instructor
Posted on:

14 Apr 2020

1
Hi Ji Hu! Thanks for reaching out. There's nothing you have omitted and you are right in your intuition. Actually, thank you very much for pointing this out. We were thinking a lot about how to start the SQL-Tableau challenges and finally opted for a task and solution that are not inherently incorrect but are in a sense more loose in the understanding of the problem.  It is actually the 'active' column in Task 2 that will, I think, answer your question. However, we thought that it would be too much to do in terms of steps to do in SQL as a first task, so we left that for Task 2. Thank you for the understanding in advance and, once again, thank you for pointing this out.  Please feel free to write back if needed. Currently, I will mark this lesson for a revision.  Best,
Martin
Posted on:

26 Jun 2020

0
Hello   I started to work on a posible solution for the issue you mention and I came up with the following that might solve the point 1 below the text. I belive, the point 2 too, but haven't analzyed that in detail.As for every new entry in the table (counted as +1 hire) we have a new updated to_date (counted as +1 termination) Indeed, that was too complex as a first exercise and I came back to it after completing the course just for "Intelectual" curiosity :) Nothing to criticize to the instructors.  Maybe someone can help me to refine the code. Thank you in advance
Start your code here

# Hires - Terminations
SELECT
hire.calendar_year,
hire.gender,
num_hires,
num_terminations,
(num_hires - num_terminations) as Variance
FROM
(
SELECT
Year(t.from_date) as calendar_year,
e.gender,
count(e.emp_no) as num_hires
FROM
t_employees e
JOIN
t_dept_emp t
ON e.emp_no = t.emp_No
GROUP BY calendar_year, gender
HAVING calendar_year >= '1990'
) hire
CROSS JOIN
(
SELECT
Year(t.to_date) as calendar_year,
e.gender,
count(e.emp_no) as num_terminations
FROM
t_employees e
JOIN
t_dept_emp t
ON e.emp_no = t.emp_No
GROUP BY calendar_year, gender
HAVING calendar_year >= '1990'
AND calendar_year < Year(current_date())
) term
ON hire.calendar_year = term.calendar_year
AND hire.gender = term.gender
;
 

Submit an answer