The 365 Data Science team is proud to invite you to our own community forum. A very well built system to support your queries, questions and give the chance to show your knowledge and help others in their path of becoming Data Science specialists.
Ask
Anybody can ask a question
Answer
Anybody can answer
Vote
The best answers are voted up and moderated by our team

Confusion: SQL Tableau Problem 1

Confusion: SQL Tableau Problem 1

1
Vote
2
Answer

Hi all,
I am working on SQL Tableau Problem 1, and found a potential issue in the solution provided in the course. I am a little confused so would like to seek help here.
The task is “Create a visualization that provides a breakdown between the male and female employees working in the company each year, starting from 1990.”
In the solution, YEAR(d.from_date) in “dept_emp” table is used as “calendar_year” to calculate the number of employees per year. Later in Tableau, it is acknowledged that this number only accounts for the employees joining company in that year, but not ones (continuously) hired in that year. Quick table calculation is used to generate the cumulative frequency.
Looking into dept_emp table, since one employee may have multiple records, e.g. emp_no 10029 who were in dept_no d004 from 1991-1999 and later in d006 from 1999 to 9999, I feel that this solution cannot address:

  1. Employees who transferred department would have multiple records in dept_emp. They not necessarily joined the company in the YEAR(from_date) on a certain record, and could be double counted.
  2. Employees who had left the company, e.g. 10025, might still be counted as hired in following years.

I am not sure if it is because I omitted something in the lecture. Appreciate it if anyone could help. Thanks!

2 Answers

365 Team

Hi Ji Hu!
Thanks for reaching out.
There’s nothing you have omitted and you are right in your intuition. Actually, thank you very much for pointing this out. We were thinking a lot about how to start the SQL-Tableau challenges and finally opted for a task and solution that are not inherently incorrect but are in a sense more loose in the understanding of the problem. 
It is actually the ‘active’ column in Task 2 that will, I think, answer your question. However, we thought that it would be too much to do in terms of steps to do in SQL as a first task, so we left that for Task 2.
Thank you for the understanding in advance and, once again, thank you for pointing this out. 
Please feel free to write back if needed. Currently, I will mark this lesson for a revision. 
Best,
Martin

Thanks a lot for your reply, Martin! I very much enjoyed the course 🙂

5 months

Thank you very much for the kind words and the understanding! I hope you will enjoy the course until its very end and please don’t hesitate to post another question should you need further assistance. Thank you.

5 months

Super Learner
0
Votes

Hello
 
I started to work on a posible solution for the issue you mention and I came up with the following that might solve the point 1 below the text.
I belive, the point 2 too, but haven’t analzyed that in detail.As for every new entry in the table (counted as +1 hire) we have a new updated to_date (counted as +1 termination)
Indeed, that was too complex as a first exercise and I came back to it after completing the course just for “Intelectual” curiosity 🙂 Nothing to criticize to the instructors.  Maybe someone can help me to refine the code.
Thank you in advance

Start your code here

# Hires - Terminations
SELECT
hire.calendar_year,
hire.gender,
num_hires,
num_terminations,
(num_hires - num_terminations) as Variance
FROM
(
SELECT
Year(t.from_date) as calendar_year,
e.gender,
count(e.emp_no) as num_hires
FROM
t_employees e
JOIN
t_dept_emp t
ON e.emp_no = t.emp_No
GROUP BY calendar_year, gender
HAVING calendar_year >= '1990'
) hire
CROSS JOIN
(
SELECT
Year(t.to_date) as calendar_year,
e.gender,
count(e.emp_no) as num_terminations
FROM
t_employees e
JOIN
t_dept_emp t
ON e.emp_no = t.emp_No
GROUP BY calendar_year, gender
HAVING calendar_year >= '1990'
AND calendar_year < Year(current_date())
) term
ON hire.calendar_year = term.calendar_year
AND hire.gender = term.gender
;