Answering data science interview questions: one of the trickiest obstacles you will have to overcome in your quest to become a data scientist!
It might calm your nerves to know that almost every job seeker struggles. That’s because data science interview questions cover a bunch of different topics (data science is an interdisciplinary field, after all) and those cheeky interviewers love to throw you the odd curveball.
The first step to hitting those curveballs out of the park is to see them coming, and to see them coming you’ve got to be confident about the rest of your game.
So, you must do your homework! An interviewer can spot an unprepared candidate from a mile away, but you wouldn’t be here if you didn’t know that already, would you?
There are plenty of articles out there that will give you all the example answers you could hope for, and yes, technical questions will come up (so it’s worth brushing up on the details). But memorising a hundred-odd examples would only confuse you more, and what if a question comes up that you didn’t study for?
We want to take you through the interview typology and show you what data science interview questions are made of and what interviewers are looking for. Each section includes tips and strategies for approaching a question logically. And what do data scientists love? That’s right…
The money!… OK, the logic!
Data scientists go crazy over logic, so it makes sense for you to understand the underlying principles rather than repeat someone else’s words.
This is not only how you understand the game, but how you win it!
We will cover the following types of question. You can read them in whichever order you like, as long as it’s logical!
- Technical questions
- Practical experience questions
- Behavioural questions
- Scenarios (a.k.a. case study questions)
1. Technical questions
A strong grasp of mathematics, statistics, coding, and machine learning is a must for a data scientist. You are likely to be asked to demonstrate your hands-on technical skills, but be prepared to show off your theoretical knowledge, too!
You need a considerable amount of knowledge, because your interviewer will want to measure it. They will also want to know how well you can articulate complex concepts, something any data scientist should be comfortable doing.
1.1 Mathematics
Mathematics underpins the study of machine learning, statistics, algorithms, and computer architecture, among others. So, applied maths is at the heart of the matter. Showing a good grasp of mathematics signals to the interviewer that you could quickly adapt to those other fields.
Be prepared to answer some quick (mental) maths questions, such as:
- What is the sum of numbers from 1 to 100?
- A snail falls down a well 50ft deep. Each day it climbs up 3ft, and each night it slides down 1ft. How many days does it take the snail to get out?
- You have a 10x10x10 cube, made of one thousand 1x1x1 cubes. If you remove the outer layer of this structure, how many cubes will you have left?
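These are meant to be solved in your head, but if you’d like to check your reasoning afterwards, here is a minimal Python sketch. The function names are our own, and the snail model assumes the usual reading of the puzzle: the snail is out the moment a daytime climb reaches the top, so it never slides back on its last day.

```python
def gauss_sum(n: int) -> int:
    """Sum of 1..n via Gauss's trick: pair first and last terms, each pair sums to n + 1."""
    return n * (n + 1) // 2


def snail_days(depth: int, climb: int, slide: int) -> int:
    """Simulate day by day; the snail escapes as soon as a daytime climb reaches the rim."""
    if climb <= slide and climb < depth:
        raise ValueError("the snail never makes net progress")
    position, day = 0, 0
    while True:
        day += 1
        position += climb  # daytime climb
        if position >= depth:
            return day
        position -= slide  # night-time slide


def cubes_left(n: int) -> int:
    """Stripping the outer layer of an n x n x n cube leaves an (n - 2)^3 core."""
    return (n - 2) ** 3


if __name__ == "__main__":
    print(gauss_sum(100))        # 5050
    print(snail_days(50, 3, 1))  # 25
    print(cubes_left(10))        # 512 left, so 1000 - 512 = 488 were removed
```

In an interview, the point is the shortcut (pairing terms, spotting that the final climb has no slide, thinking of the inner core), not the brute force.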
Questions like these are to check you have basic maths skills and shouldn’t be too tricky for you.
Things get more interesting with puzzle questions, which employers use to test your lateral thinking. Treat them as an opportunity to show off your problem-solving skills and think outside the box. Be sure to talk through your solution out loud: it gives the interviewer an idea of how you approach a problem, even if you don’t reach the right answer. (And if you find an even better solution, the interviewer won’t believe you unless you explain how you got there!)
Here are some real-life data science interview questions:
- A race track has 5 lanes. There are 25 horses and one would like to find out the 3 fastest horses of those 25. What is the minimum number of races one would need to conduct to determine the 3 fastest horses?
- Four people need to cross a rickety bridge at night. Unfortunately, they have a single torch and the bridge is too dangerous to cross without one. The bridge is only strong enough to support two people at a time. Not all people take the same time to cross the bridge. Times for each person: 1 min, 2 mins, 7 mins and 10 mins. What is the shortest time needed for all four of them to cross the bridge?
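On a whiteboard you would reason through the bridge crossing by hand, but a brute-force search is a handy way to confirm an answer afterwards. The sketch below is our own illustration, not an expected interview answer: it runs Dijkstra’s shortest-path algorithm over states of the form (people still on the near side, side the torch is on), assuming at most two people cross at once and a pair walks at the slower person’s pace.

```python
import heapq
from itertools import combinations


def shortest_crossing(times: list[int]) -> int:
    """Minimum total time for everyone to cross, found by Dijkstra over crossing states."""
    everyone = frozenset(range(len(times)))
    start = (everyone, "near")
    best = {start: 0}
    queue = [(0, 0, start)]  # (elapsed time, tie-breaker, state)
    counter = 1
    while queue:
        elapsed, _, (near, torch) = heapq.heappop(queue)
        if elapsed > best.get((near, torch), float("inf")):
            continue  # stale queue entry
        if not near:
            return elapsed  # first time we pop the goal, it is optimal
        side = near if torch == "near" else everyone - near
        for k in (1, 2):  # the bridge holds at most two people
            for group in combinations(side, k):
                cost = max(times[i] for i in group)  # pair moves at the slower pace
                if torch == "near":
                    nxt = (near - frozenset(group), "far")
                else:
                    nxt = (near | frozenset(group), "near")
                new_time = elapsed + cost
                if new_time < best.get(nxt, float("inf")):
                    best[nxt] = new_time
                    heapq.heappush(queue, (new_time, counter, nxt))
                    counter += 1
    raise ValueError("no way to get everyone across")


if __name__ == "__main__":
    print(shortest_crossing([1, 2, 7, 10]))  # 17
```

The 17-minute plan it finds sends the two slowest people across together, which is the insight interviewers hope you will spot without a computer.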
Then there are the harder maths problems.
You are unlikely to be given an equation to solve; instead, you’ll be asked a simply worded question that requires conceptual preparation to answer. It may also involve probability theory, even if it doesn’t seem to at first.
Some examples are:
- Consider an extension of rock, paper, scissors where there are N options instead of 3. For what values of N is it possible to construct a fair game, where by ‘fair’ we mean that, for any move a player plays, the number of moves that beat it equals the number of moves it beats?
- In a country in which people only want boys, every family continues to have children until they have a boy. If they have a girl, they have another child. If they have a boy, they stop. What is the proportion of boys to girls in the country?
- A fair coin is tossed at each stage. The player wins the game when tails appears, and the payoff depends on the number of heads that appeared before it. If the game ends at the first stage, the payoff is 2 dollars; at the second stage, 4 dollars; at the third, 8 dollars; and so on, with the payoff doubling at each stage. What would be a fair price to pay a casino for entering the game?
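A few lines of Python make the structure of these three answers concrete. This is a minimal sketch under our own assumptions: the “circular” rule in circular_rps is one possible fair construction (not the only one), births are modelled as independent 50/50 events, and st_petersburg_partial_ev sums only the first terms of the expected-value series. For the rock, paper, scissors extension, each move faces N − 1 others, so a fair split is possible exactly when N is odd. For the coin game, every stage contributes 1 dollar to the expected payoff, so the expectation grows without bound (the famous St. Petersburg paradox), which is what makes “fair price” a good discussion question.

```python
import random


def circular_rps(n: int) -> dict[int, set[int]]:
    """One fair construction for odd n: move i beats the next (n - 1) // 2 moves
    around a circle. For even n, the n - 1 opponents cannot be split evenly."""
    if n % 2 == 0:
        raise ValueError("no fair game exists for even n")
    half = (n - 1) // 2
    return {i: {(i + k) % n for k in range(1, half + 1)} for i in range(n)}


def is_fair(beats: dict[int, set[int]]) -> bool:
    """Check every move beats exactly as many moves as beat it, with no overlaps."""
    moves = set(beats)
    for move, wins in beats.items():
        losses = {other for other in moves if move in beats[other]}
        if (wins & losses) or (wins | losses) != moves - {move} or len(wins) != len(losses):
            return False
    return True


def simulate_births(families: int, rng: random.Random) -> tuple[int, int]:
    """Each family keeps having children (independent 50/50) until its first boy."""
    boys = girls = 0
    for _ in range(families):
        while rng.random() < 0.5:  # a girl, so the family tries again
            girls += 1
        boys += 1  # every family stops at exactly one boy
    return boys, girls


def st_petersburg_partial_ev(stages: int) -> float:
    """First `stages` terms of the expected payoff: the game ends at stage k with
    probability (1/2)**k and pays 2**k dollars, so every term equals 1."""
    return sum(0.5 ** k * 2 ** k for k in range(1, stages + 1))


if __name__ == "__main__":
    print(is_fair(circular_rps(5)))  # True
    boys, girls = simulate_births(100_000, random.Random(42))
    print(girls / boys)  # close to 1: the stopping rule does not change the ratio
    print(st_petersburg_partial_ev(30))  # 30.0, and it keeps growing with more stages
```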
Finally, don’t be surprised if they ask you to solve problems along the lines of:
- What is the first derivative of x^x?
- Why is … irrational?
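For the derivative question, the standard trick is to rewrite the power with the exponential function so that the chain rule applies, restricting to x > 0 so that ln x is defined:

```latex
\begin{aligned}
y &= x^{x} = e^{x \ln x}, \qquad x > 0 \\
\frac{dy}{dx} &= e^{x \ln x} \cdot \frac{d}{dx}\left(x \ln x\right)
              = x^{x}\left(\ln x + 1\right)
\end{aligned}
```

Logarithmic differentiation (taking ln of both sides first) reaches the same result, and naming both routes is a nice way to show enthusiasm.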
Not being able to answer such questions isn’t a deal-breaker, but solving them will help you stand out from the crowd. These problems check whether you are interested in mathematics, so showing enthusiasm and sound logical thinking can be more impressive than reaching the solution.
1.2 Statistics
Did you know that data scientists were once called statisticians? The two professions aren’t one and the same, but many data scientists hold a statistics degree, and that’s no wonder: statistics is one of the ‘founding fathers’ of data science. Logically, you will be tested on your ability to reason statistically. Even if theoretical knowledge isn’t your strongest suit, make sure you use precise technical language.
Consider the following question: What is the difference between a false positive and a false negative?
It seems that you need to provide some textbook definitions…
Got you! Nobody wants to hear generic theory; it’s boring and you will blend in with the crowd.
Employers will want you to identify situations where the theory applies.
If there is a whiteboard, use it! Draw a confusion matrix! Go through the theory and show how it applies!
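To ground the confusion matrix in a concrete situation, here is a small Python sketch built around a hypothetical spam filter, where 1 means “spam”. A false positive is a legitimate email sent to the spam folder; a false negative is spam that slips into the inbox. The labels are made up for illustration.

```python
from collections import Counter


def confusion_counts(y_true: list[int], y_pred: list[int]) -> dict[str, int]:
    """Tally the four cells of a binary confusion matrix (1 = positive class)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    cells = Counter(zip(y_true, y_pred))
    return {
        "TP": cells[(1, 1)],  # flagged as spam, really spam
        "FP": cells[(0, 1)],  # false positive (Type I error): legitimate mail flagged
        "FN": cells[(1, 0)],  # false negative (Type II error): spam let through
        "TN": cells[(0, 0)],  # passed as legitimate, really legitimate
    }


if __name__ == "__main__":
    # Hypothetical labels: 1 = spam, 0 = legitimate email.
    actual = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 0, 1, 1, 0, 0, 0, 1]
    print(confusion_counts(actual, predicted))
    # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 3}
```

Which error is worse depends on the business context; for a spam filter, losing a legitimate email is arguably costlier than letting one spam message through, and that is exactly the kind of judgement interviewers want to hear.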
While we’re on the subject of statistics, what other questions may pop up?
- What is the null hypothesis and how do we state it?
- How would you explain a linear regression to a business executive?
- Tell me what heteroskedasticity is and how to solve it.
- What’s the Central Limit Theorem and what are its practical implications?
- How do you find the correlation between a categorical variable and a continuous variable?
- Explain p-value. Present it as if talking to a client.
- What do you understand by statistical power and how do you calculate it?
- Please explain the differences between overfitting and underfitting.
- Explain what cross-validation is. How and why is it used?
Did you think those last two were machine learning questions? Well spotted: machine learning overlaps with statistical concepts!
- Could you give examples of data that has neither a Gaussian nor a log-normal distribution?
- What is your favourite statistical software? State three positive and negative aspects of it.
- Explain bootstrapping as if you’re talking to a non-technical person.
- State some biases that you are likely to encounter when cleaning a database.
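Bootstrapping is easiest to explain by showing it: treat your sample as a stand-in for the population, repeatedly draw new samples of the same size from it with replacement, and see how much the statistic varies. Here is a minimal Python sketch of a percentile bootstrap interval for a mean; the data values are made up, and the 10,000 resamples and 95% level are arbitrary choices.

```python
import random
import statistics


def bootstrap_ci(data, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)
    boot_means = sorted(
        statistics.fmean(rng.choices(data, k=len(data)))  # resample with replacement
        for _ in range(n_resamples)
    )
    lower = boot_means[int(alpha / 2 * n_resamples)]
    upper = boot_means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    sample = [12, 15, 9, 20, 17, 11, 14, 18, 10, 16]  # hypothetical measurements
    print(statistics.fmean(sample))  # 14.2
    print(bootstrap_ci(sample))      # an interval around 14.2
```

For a non-technical listener, the one-liner is: “we pretend our data is the whole world, re-draw it many times, and see how much our answer wobbles.”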
1.3 Programming
Now we step away from dull statistics and lunge forward into… practical data science.
Every data scientist needs a certain amount of programming knowledge. You don’t have to be a pro, but employers will want to see that you have a decent grip on it and have the potential for rapid improvement.
Python, R, and SQL are the bread-and-butter programming languages in data science, so questions about these three staples should not come as a surprise.
R and Python are largely interchangeable for everyday data science work, so knowing one or the other will usually suffice (but knowing both won’t be a disadvantage).
‘Can you be more specific?’
- How are missing values and impossible values represented in R?
- What is the difference between lapply and sapply?
- How do you merge two data frames in R?
- What is the command used to store R objects in a file?
- How can you split a continuous variable into different groups/ranks in R?
- Please explain three key differences between Python and R.
- Which Python library would you prefer to use for data wrangling?
- How can you build a simple logistic regression in Python?
- What’s the shortest way to open a text file in Python?
- Have you done web scraping in Python? How would you do it?
- Please explain what ‘pass’ is in Python.
- Please explain how one can perform pattern matching in Python.
- You have duplicate values in a dataset for a variable in Python. How would you handle them?
- What tool would you use in Python to find bugs?
- What’s your preferred library for plotting in Python: Seaborn or Matplotlib?
- Explain the difference between INNER JOIN and OUTER JOIN.
- You have a table with the columns Cust_ID, Order_Date, Order_ID and Tran_Amt. How would you select the top 100 customers with the highest spend over a year-long period?
- If you were stuck on a desert island with a database that contained all the knowledge ever created, but you only had 10 SQL statements that you could ever use, what would they be?
- Describe the different parts of an SQL query.
- What is the difference between DELETE and TRUNCATE?
- What is the difference between UNION and UNION ALL?
- Write down a SQL script to return data from two tables.
- What is the difference between a WHERE clause and a HAVING clause?
- Tell me the difference between a primary key and a unique key.
- What is the difference between SQL, MySQL and SQL Server?
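For the SQL questions, it helps to have tried the patterns yourself. Below is a minimal Python sketch using the standard-library sqlite3 module and an in-memory database. The table name transactions, the sample rows, and the choice of calendar year 2023 as the “year-long period” are all our own assumptions; only the column names come from the top-100 customers question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions "
    "(Cust_ID INTEGER, Order_Date TEXT, Order_ID INTEGER, Tran_Amt REAL)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [  # hypothetical rows; ISO-8601 date strings compare correctly as text
        (1, "2023-01-15", 101, 250.0),
        (1, "2023-06-02", 102, 100.0),
        (2, "2023-03-10", 103, 500.0),
        (3, "2022-12-31", 104, 900.0),  # falls outside the 2023 window
        (3, "2023-11-20", 105, 50.0),
    ],
)

# Top spenders: WHERE filters individual rows *before* grouping.
top_spenders = conn.execute(
    """
    SELECT Cust_ID, SUM(Tran_Amt) AS total_spend
    FROM transactions
    WHERE Order_Date >= ? AND Order_Date < ?
    GROUP BY Cust_ID
    ORDER BY total_spend DESC
    LIMIT 100
    """,
    ("2023-01-01", "2024-01-01"),
).fetchall()
print(top_spenders)  # [(2, 500.0), (1, 350.0), (3, 50.0)]

# HAVING filters whole groups *after* aggregation (here: repeat customers).
repeat_customers = conn.execute(
    "SELECT Cust_ID, COUNT(*) FROM transactions "
    "GROUP BY Cust_ID HAVING COUNT(*) > 1 ORDER BY Cust_ID"
).fetchall()
print(repeat_customers)  # [(1, 2), (3, 2)]

# UNION removes duplicate rows; UNION ALL keeps every row from both sides.
union_rows = conn.execute(
    "SELECT Cust_ID FROM transactions UNION SELECT Cust_ID FROM transactions"
).fetchall()
union_all_rows = conn.execute(
    "SELECT Cust_ID FROM transactions UNION ALL SELECT Cust_ID FROM transactions"
).fetchall()
print(len(union_rows), len(union_all_rows))  # 3 10
```

LIMIT works in SQLite, PostgreSQL and MySQL; SQL Server uses SELECT TOP (100) instead, a detail worth mentioning when asked how SQL, MySQL and SQL Server differ.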
1.4 Machine Learning
A familiarity with machine learning methodologies is essential for every aspiring data scientist. You should be prepared to explain key concepts in a nutshell.
It’s quite possible that the interviewer will outline a prediction problem and ask you to suggest suitable algorithms. Expect to discuss the problems commonly observed with those algorithms, and how to fix them.
Check out the following machine learning questions we’ve picked for you:
- What is the difference between supervised and unsupervised machine learning?
- Explain your favourite algorithm to me in less than a minute.
- How would you deal with an imbalanced dataset?
- How do you ensure you are not overfitting with a model?
- What approaches would you use to evaluate the prediction accuracy of a logistic regression model?
- Explain the steps needed for data cleaning and wrangling before applying machine learning algorithms.
- How do you deal with sparse data?
- Could you explain the Bias-Variance trade-off?
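A common answer to “How do you ensure you are not overfitting?” is k-fold cross-validation: score each model only on data it did not see while being fitted. Here is a minimal standard-library sketch on synthetic data (a noisy straight line we generate ourselves). The three toy models are our own choices: a mean-only baseline that underfits, a least-squares line, and a 1-nearest-neighbour model that memorises the training set.

```python
import random
import statistics


def k_fold_indices(n: int, k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def mse(model, xs, ys) -> float:
    """Mean squared error of `model` on the given points."""
    return statistics.fmean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def cross_val_mse(xs, ys, fit, k=5) -> float:
    """Average held-out mean squared error across k folds."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        test = set(held_out)
        train = [i for i in range(len(xs)) if i not in test]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse(model, [xs[i] for i in held_out], [ys[i] for i in held_out]))
    return statistics.fmean(scores)


def fit_mean(xs, ys):
    m = statistics.fmean(ys)  # ignores x entirely: underfits
    return lambda x: m


def fit_line(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)  # ordinary least-squares line


def fit_1nn(xs, ys):
    pairs = list(zip(xs, ys))  # memorises the training set: overfits
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


if __name__ == "__main__":
    rng = random.Random(1)
    xs = list(range(50))
    ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]  # synthetic: true line plus noise
    for name, fit in [("mean", fit_mean), ("line", fit_line), ("1-NN", fit_1nn)]:
        train_err = mse(fit(xs, ys), xs, ys)
        print(f"{name}: train MSE {train_err:.2f}, 5-fold CV MSE {cross_val_mse(xs, ys, fit):.2f}")
```

The 1-NN model’s perfect training score next to its worse cross-validated score is the signature of overfitting, while the mean baseline shows the opposite problem; together they are a compact way into the bias-variance trade-off.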
Additionally, you may stumble upon way too specific or way too vague questions such as:
- Explain the difference between Gaussian Mixture Model and K-Means.
- Tell me about a machine learning project you admire.
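If asked about Gaussian Mixture Models versus K-Means, one crisp way to frame the difference is hard versus soft assignment. The minimal 1-D sketch below is our own illustration: K-Means (Lloyd’s algorithm) gives each point to exactly one cluster, while the E-step of a GMM gives each point a probability of belonging to every component. The data, starting centroids and component parameters are made up.

```python
import math
import statistics


def kmeans_1d(xs, centroids, iters=20):
    """Lloyd's algorithm in 1-D: hard-assign each point to its nearest centroid."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for x in xs:
            nearest = min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # Recompute each centroid as its cluster mean; keep it in place if the cluster is empty.
        centroids = [statistics.fmean(c) if c else centroids[j] for j, c in enumerate(clusters)]
    return centroids


def gmm_responsibilities(x, means, sds, weights):
    """GMM E-step for one point: soft membership probabilities that sum to 1."""
    dens = [
        w * math.exp(-((x - m) ** 2) / (2 * s ** 2)) / (s * math.sqrt(2 * math.pi))
        for m, s, w in zip(means, sds, weights)
    ]
    total = sum(dens)
    return [d / total for d in dens]


if __name__ == "__main__":
    points = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
    print(kmeans_1d(points, [0.0, 6.0]))  # centroids near 1.0 and 5.0
    print(gmm_responsibilities(3.0, means=[1.0, 5.0], sds=[1.0, 1.0], weights=[0.5, 0.5]))
    # [0.5, 0.5]: a point halfway between components is shared, not forced into one
```

A full GMM also learns each component’s variance and weight (via expectation-maximisation), which lets it fit elongated or unequal clusters that K-Means handles poorly.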
Remember the whiteboard tip?
Make it your interview BFF! Answering a machine learning question with words alone can easily take five minutes, and that’s five minutes you could spend showing the interviewer other amazing things you know. The interviewer already knows the concepts, so you can illustrate your answer with a drawing and a short explanation in under two minutes. Voilà!
2. Practical experience questions
Technical questions are important, and a data scientist needs to know the answers and how to put them into practice.
However, there are countless data science questions, and an interviewer is not going to waste time asking dozens of them to gauge whether you are the right candidate. Instead, why not simply ask about your experience?
These are practical experience questions, designed to shed light on your pace of work, experience, and habits. To avoid sifting through your back catalogue on the spot, have a few versatile experiences in mind: ones that can illustrate different skills depending on the question.
Let’s give you a taste of those:
- Summarize your experience.
- Tell me about your first data science pet project.
- How do you keep up with the news about politics, economics, and business? What about data science?
- So, Python is your preferred programming language. What experience do you have with R? Tell me what you have done with that.
Of course, you may get it the other way round:
- So, R is your preferred programming language. What experience do you have with Python? Tell me what you have done with that.
- Do you have experience in Tableau?
- What kind of RDBMS software do you have experience with?
- Have you taken any online courses related to data science? If yes, how many did you complete with a certificate?
- What companies have you worked at? What was your role? Elaborate on the day-to-day activities you were asked to perform.
- Do you have a project portfolio? Maybe a GitHub or a Kaggle profile? What projects have you implemented? *They may pick the most interesting one to them* Let’s discuss this specific project in detail.
3. Behavioural questions
As in any other job interview, employers are interested in how you handle workplace situations, how you work in a team and whether you are a good fit for the company.
Behavioural questions can also be asked indirectly; for example, the interviewer may pose broad questions about your motivation or the tasks you enjoy.
There is no single right answer here. The intent is to assess how you responded in the past, since past behaviour is often a good predictor of future behaviour. Behavioural questions also evaluate whether you can communicate clearly and persuasively.
Let’s see an example: Describe a situation when you faced a conflict while working on a team project.
Instead of asking hypothetical questions (“How will you deal with…”), the interviewer is hoping to elicit a more meaningful response by pushing you to talk about a real past event. Don’t fall into the trap of generalising your example; the interviewer will be looking for four things in your story:
- Situation: What was the context? (devote around 10% of the answer time)
- Task: What needed to be done? (devote around 10% of the answer time)
- Action: What did you do? (devote around 70% of the answer time)
- Results: What were the accomplishments? (devote around 10% of the answer time)
Together, these four steps are known as the STAR technique, and they will help you present your answers clearly and succinctly. Don’t get confused, though: interviewers don’t want a rigid answer that ticks off each step precisely, but a story that follows the technique in a flowing yet concise way.
Bear in mind that some behavioural questions are long-winded and sound vague, but the STAR approach comes in handy whenever you hear: “How did you deal with…” or “Describe a time when…”.
Here are some hot tips for answering behavioural questions:
- Show some passion. Enthusiasm about your past experiences shows you care about your work; don’t make the mistake of thinking your employer wants to hear how much you hated your previous jobs.
- Be specific. Don’t go off on a tangent about everything you liked about the job; stay relevant so you don’t appear unable to focus on the point at hand.
- End on a positive note. If your story describes a conflict with another team member, show that you are not someone who holds a grudge.
Dying for examples? Here you go:
- Please describe a data science project you worked on. (Yes! It overlaps with the ‘practical experience’ category!)
- Tell me about a situation when you had to balance competing priorities.
- Describe a time when you managed to persuade someone to see things your way.
- Describe a time when you had to adapt to a difficult situation. How did you deal with it?
- Describe a time when you were bored at work. What did you do to motivate yourself?
- Select a product or app you really like and make a recommendation on how it could be improved.
- Describe a situation where you effectively worked under pressure.
- What have you liked and disliked about your previous position?
- Have you ever faced a problem you couldn’t solve?
- Describe a situation when you failed to meet a deadline.
- Our team is brand new and is under-financed. We have no standard procedures or training, and everything is ad-hoc. How would you go about this situation?
4. Scenarios
The purpose of scenarios is to test your experience in various data science fields. Case study questions will likely look for skills outside the technical toolkit, such as logical reasoning or business understanding. It’s important to demonstrate structured thinking, reasoning, and problem-solving skills. After all, you can’t be a good data scientist if you can’t identify the underlying problems.
Let’s see how this works:
The sales department has increased the selling price of all items by 5%. There are 10 items, all with different price tags. Before the price increase, gross revenue was $500,000 with an average selling price of $1. After the price increase, gross revenue was $505,000, with an average selling price of $0.95. Why hasn’t the price increase had the desired impact of increasing revenue and average selling price?
This question requires thinking in a business context, so you need to come up with insights and communicate them clearly. Scenarios are a great opportunity for the employer to get a sense of how you tackle problems, which reflects your overall attitude towards work.
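A quick back-of-the-envelope check helps structure the answer. Revenue divided by average selling price gives units sold, and the figures imply volume went up (from 500,000 to roughly 531,600 units) while the average price fell, which is consistent with, though not proof of, a shift in the sales mix toward cheaper items. The two-item price list below is purely hypothetical and only shows how a 5% rise on every price can still lower the average.

```python
def implied_units(revenue: float, avg_price: float) -> float:
    """Units sold = revenue / average selling price."""
    return revenue / avg_price


def mix_summary(units_by_price: dict[float, int]) -> tuple[float, int, float]:
    """Return (revenue, units, average selling price) for {unit price: units sold}."""
    revenue = sum(price * units for price, units in units_by_price.items())
    units = sum(units_by_price.values())
    return revenue, units, revenue / units


if __name__ == "__main__":
    print(implied_units(500_000, 1.00))  # 500000.0 units before the increase
    print(implied_units(505_000, 0.95))  # about 531,579 units after
    # Hypothetical: a $0.50 item and a $1.50 item, both raised by 5%.
    before = mix_summary({0.50: 250_000, 1.50: 250_000})
    after = mix_summary({0.525: 330_000, 1.575: 210_000})  # buyers shift to the cheap item
    print(before)  # revenue 500,000 at an average price of $1.00
    print(after)   # revenue about 504,000, average price about $0.93
```

The numbers only frame the hypothesis; in the interview you would then ask which items drove the extra volume.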
You may also be given market sizing questions, which some call ‘guesstimates’. The term makes it sound as if you just need to take a stab in the dark, but that is not the case. While reaching a conclusion does require some guesswork and estimation, combining those estimates is difficult and requires rigorous logic. There is no single correct answer to questions like these, and chances are the interviewer doesn’t know the exact answer, either. Here is an example:
How many SUVs are in the parking lot downstairs? How many ping-pong balls can fit into this room?
You’ve likely come across questions like these. For market sizing problems, your result should be of the same order of magnitude as the actual number, but don’t worry too much about the exact figure. The employer will be less interested in the result than in the path you took to reach it. So, focus on the structure, and articulate the problem-solving process as you go. The easiest approach is to write down your structure and then talk through it out loud; don’t skimp on your reasoning.
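To make the “structure first” advice concrete, here is one way to lay out the ping-pong question as explicit, adjustable assumptions in Python. Every number is an assumption you would state out loud: an empty 5 m × 4 m × 3 m room, a 40 mm ball, and a packing fraction of about 0.64 (roughly what randomly poured spheres achieve).

```python
import math


def ping_pong_estimate(room_dims_m=(5.0, 4.0, 3.0), ball_diameter_m=0.040, packing=0.64):
    """Fermi estimate: usable room volume divided by the volume of one ball."""
    room_volume = math.prod(room_dims_m)                          # cubic metres
    ball_volume = (4 / 3) * math.pi * (ball_diameter_m / 2) ** 3  # volume of one sphere
    return room_volume * packing / ball_volume                    # balls that fit


if __name__ == "__main__":
    estimate = ping_pong_estimate()
    print(f"{estimate:,.0f}")  # roughly 1.1 million balls
```

Changing any single assumption (a bigger room, furniture taking up space) moves the answer in a predictable way, which is exactly the reasoning the interviewer wants to hear.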
Some questions don’t have exact answers.
Here are some real data science interview questions to practise on:
- If a product costs $4.00, with an $8.00 sunk cost, and we charge X amount of dollars along with a $10 annual fee, how many do we need to sell to break even, etc?
- The conversion rate for a specific chair is 0.5% for the first 50,000 shoppers that look at it. The price of the chair is $250. Our company makes 27% profit on the sale. The next 50,000 shoppers will get a 10% discount. What is the conversion rate we must achieve to receive the same profits as before?
- You get X amount of views on a website, Y amount of people click on the ad, then Z amount of people enter their names after, where X, Y and Z are given. How much does it cost to acquire a customer? What’s the conversion rate? Would it make sense to run the campaign comparing the value of customer acquisition to the revenue gained from conversion rate?
- How many mattresses are sold each year in the United States?
- What will be the market size for 3D TV sets in India?
- How many data scientists are there in the USA?
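Working the chair question through shows why clarifying questions matter: the answer depends on whether the 27% profit comes from a fixed unit cost (so the discount eats straight into profit) or stays at 27% of the discounted price. The Python sketch below computes both readings; the parameter names are our own.

```python
def required_conversion(base_rate=0.005, shoppers=50_000, price=250.0, margin=0.27,
                        discount=0.10, margin_on_discounted_price=False) -> float:
    """Conversion rate needed on the next `shoppers` to match the original profit."""
    base_profit = shoppers * base_rate * price * margin  # 250 sales x $67.50 = $16,875
    new_price = price * (1 - discount)                   # $225 after the discount
    if margin_on_discounted_price:
        unit_profit = new_price * margin                 # reading B: 27% of $225 = $60.75
    else:
        unit_cost = price * (1 - margin)                 # reading A: fixed cost of $182.50
        unit_profit = new_price - unit_cost              # $42.50 per chair
    return base_profit / unit_profit / shoppers


if __name__ == "__main__":
    print(f"{required_conversion():.3%}")                                 # 0.794%
    print(f"{required_conversion(margin_on_discounted_price=True):.3%}")  # 0.556%
```

Stating which reading you chose, and why, is worth more than the final number.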
Some questions seem odd, right? That’s normal. Don’t hesitate to ask clarifying questions to get to the point. Asking questions won’t make you look as if you have gaps in your knowledge; rather, it shows that you pay attention to detail.
For more on the business aspect of a data scientist’s job, read our article 5 Business Basics for Data Scientists.
An interview is a dialogue, not a written test!
Excellent! Now that you’ve read through the article, consider our typology the starting point for your interview prep. We have only scratched the surface, though, when it comes to the data science interview questions you may encounter. The industry is booming, so companies constantly adapt their interview sessions (a common question today may hardly be asked in two years), and this is especially true of start-ups, which undergo constant change. Data science interview questions vary in their specifics, but the types of questions remain the same, so a solid grasp of these types plus a good amount of preparation will let you logically tackle any question the interviewer has up her sleeve.