The 365 Data Science team is proud to invite you to our own community forum. A very well built system to support your queries, questions and give the chance to show your knowledge and help others in their path of becoming Data Science specialists.
Ask
Anybody can ask a question
Answer
Anybody can answer
Vote
The best answers are voted up and moderated by our team

How do I solve this SQL question?

How do I solve this SQL question?

0
Votes
1
Answer

SQL – Asked by Amazon

We’re given two tables. Table A has one million records with fields ID and AGE. Table B has 100 records with two fields as well, ID and SALARY.
Let’s say in Table B, the mean salary is 50K and the median salary is 100K. 

SELECT A.ID,A.AGE,B.SALARY
FROM A
LEFT JOIN B
ON A.ID = B.ID
WHERE B.SALARY > 50000

Given the query above gets run, about how many records would be returned?

a. 500K
b. 33
c. 50
d. 330K
e. 66
 
How would I know the exact number of the rows to be returned without working on the tables myself?
My rough estimations suggest that the number would be between 50 and 66. 
I can guess that the exact number is tied up with the provided mean and median. But it doesnt tell me anything apart from the fact that Table B is right-skewed. 
I’d appreciate a guide here. Thank you.
 

1 Answer

365 Team
0
Votes

Hi Steven!
Thanks for reaching out.
Normally, our policy reflects your intuition in a way – we wouldn’t want to answer questions that are not related to a database we know in detail, because we will be limited in our chances of providing a well explained answer. 
In this specific case, the guidance to provide is not much different from the idea you already have – it seems that you do have to take into consideration the mean and median values. And you are right in your intuition about Table B. 
The way in which your knowledge of SQL can help you understand this task better is to think of how many records there are in Table B (only 100), and what kind or JOIN has been used (a LEFT JOIN). 
Which is the table on the left – Table A, which contains 1 million rows, right?
With these facts in mind, you may try to guess whether to expect an output of thousands or units. And then, your statistical knowledge should help you in specifying the first two digits of the number you are looking for.
That said, let me repeat that we avoid answering questions that have not been provided by us and hope you can agree with us about this.
I still hope this helps somehow!
Best,
Martin