The 365 Data Science team is proud to invite you to our own community forum. A very well built system to support your queries, questions and give the chance to show your knowledge and help others in their path of becoming Data Science specialists.
Ask
Anybody can ask a question
Answer
Anybody can answer
Vote
The best answers are voted up and moderated by our team

SQL comparing exists and inner subqueries

SQL comparing exists and inner subqueries

0
Votes
1
Answer

Hi,
It’s said in a lecture that coding for inner queries is faster for smaller data set and exist is faster for bigger sets.
How is it so?
Also, how big is a ‘small’ data set and also ‘big’ data set? 
 

#
1 Answer

365 Team
0
Votes

Hi Josh!
Thanks for reaching out!
To be frank, this is a very delicate subject and we have to be careful when we generalise about using IN, EXISTS, or even JOIN in MySQL.
It is important to realise what the difference between the three tools is, so that you don’t get confused about which one to use in a given situation.
All we wanted to say with this statement is that due to their nature, IN could perform quicker in a relatively smaller datasets, because it will check all the results from the subquery before processing further. If the dataset was bigger, this would naturally take more time on many occasions.
When you are using EXISTS, instead, once the SQL Optimizer has found a row that matches your condition (i.e. a value the exists), the condition will be satisfied, and you will have managed to obtain a valid output from the EXISTS condition.
So, our idea was to provide this statement that you mention in the question in order to exemplify the difference between using IN and EXISTS.
When talking beyond that generalisation, in a nutshell it is best to measure the time with which a query with IN and a query with EXISTS would perform in a specific situation and opt for the more efficient one. There are too many things to take into consideration in order to always know which one to use, but as an idea, you can know when to try IN and when – EXISTS. What is most relevant is to understand how you can use these two operators correctly, so that you deliver the correct output first, and then think about performance.
Hope this helps.
Best,
Martin

×
Learn Data Science
this Summer!
Get 50% OFF