How to prepare for data science interview questions?
There are certain times in life when you’re put to the test – a point where you must channel all the hard work and preparation you’ve done into a decisive win. For the athlete, that’s the Olympic Games. For the aspiring movie star, it’s the audition for the Hollywood project co-starring Leonardo DiCaprio. And, for the data scientist, the moment of truth is the data science interview.
If you want to successfully land a job in data science, knowing your stuff and putting it in a neat package with an impressive CV, an outstanding portfolio, and a flashy resume will only get you halfway through the door.
What will open it is understanding the whole data science interview process and how to navigate it smoothly - from seeing that job posting to closing the deal with a welcome-to-the-team handshake.
And in this guide, we’re going to show you how to get there. Here’s what you’ll learn:
The Data Science Interview
Real Data Science Interview Questions and Answers
- General/common data science interview questions
- Data scientist interview questions
- Data analyst interview questions
- BI analyst interview questions
- Data engineer interview questions
- Data architect interview questions
Data science interview preparation: What else do you need to prepare for a data science interview?
- What’s the data science interview process like?
- What’s a Data Scientist Hiring Manager Looking for?
- How to build a high-quality data science project portfolio?
- Why use networking as a tool for a successful data science interview?
- How to answer tricky data science interview questions?
- How to answer the question “Do you have any questions”?
In short: practically everything you need to know about every level of preparation. These are the insights that will ultimately help you land the job you want and are qualified for.
So let’s dive right in.
Real Data Science Interview Questions and Answers
Here’s our collection of straight-to-the-point data science questions paired with their answers.
We start with a few general data science interview questions. The rest of the technical and behavioral interview questions are categorized by data science career path: data scientist, data analyst, BI analyst, data engineer, and data architect. To keep this article focused, we’re only showing 10 questions for each path. If you want to explore all questions for a path, follow through to the respective article.
Data Science Interview Questions
What General/Common Data Science Interview Questions Can You Expect?
General data science interview questions include some statistics interview questions, computer science interview questions, Python interview questions, and SQL interview questions. Usually, the interviewers start with these to help you feel at ease and get ready to proceed with some more challenging ones. Here are 3 examples.
1. How do data scientists use statistics?
According to Iliya, co-founder of 365 Data Science, “An answer like ‘Data scientists use statistics in almost everything they do’ would be good enough for me if I were interviewing you. However, keep in mind that this is a very tricky question. Not because it is hard to answer; on the contrary. Sometimes the question is asked not for the answer itself, but for the way you structure your thought process and express an idea.
Therefore, if you get asked this question, you’d need to keep your composure and structure a nice-sounding answer. One of the better ways to achieve that is to place your answer within a framework.
Here’s one possible answer. Of course, you can work on a shorter version of it, as you don’t want to bore your interviewer.
‘If we think about data science as a field, we can identify several pillars it is built upon: Mathematics, Probability, Statistics, Economics, Programming, Data visualization, Machine learning and modeling in general, etc. Now, we could simplify this framework by ignoring Mathematics as a pillar, as it is the basis of every science. Then we could assume probability is an integral part of statistics and continue simplifying further until reaching three fairly independent fields: Statistics, Economics, and Programming. Programming is just a tool for materializing ideas into solutions. Economics, on the other hand, is more about the ‘business thinking’ about a problem. Therefore, all of a data scientist’s work boils down to statistics.
One could argue that Machine learning is a separate field, but it is actually an iterative, programmatically efficient application of statistics.
Models such as linear regression, logistic regression, decision trees, etc., were all developed by statisticians. Their predictions are nothing more than statistical inferences based on the original distributions of the data and on assumptions about the distribution of future values.
Deep learning? Well, one of the most common methods for training neural networks (using the gradients computed by backpropagation) is called ‘stochastic gradient descent’, and the word ‘stochastic’ is a probabilistic term, therefore falling within the field of statistics.
Data visualizations could also fall under the umbrella of descriptive statistics. After all, a visualization usually aims to describe the distribution of a variable or the interconnection of several different variables.
One notable exception is data preprocessing. That’s an activity which is mainly related to programming and often does not require statistical knowledge. That’s why data engineers and data architects exist. They need not be proficient in statistics – that’s the data scientist’s job. Finally, there is an exception to the exception: statistical data preprocessing. Here we’ve got the creation of dummy variables, feature scaling, regularization, and so on. While these are preprocessing tasks in their execution, they require solid statistical knowledge.’
And there you have it – the interview version of the answer ‘Data scientists use statistics in almost everything they do’.”
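Two of the statistical preprocessing steps named in the answer, dummy variables and feature scaling, can be illustrated with a minimal pure-Python sketch. The function names `make_dummies` and `standardize` and the sample data are hypothetical; in practice you would more likely use a library routine such as pandas’ `get_dummies`. Dropping one category is a common convention for avoiding the so-called dummy variable trap when the model also has an intercept.

```python
from statistics import mean, pstdev

def make_dummies(values, drop_first=True):
    """One-hot encode a categorical column into 0/1 dummy columns (hypothetical helper).

    With drop_first=True, the first category (alphabetically) becomes the baseline
    and gets no column, which avoids perfect multicollinearity with an intercept.
    """
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]
    return {c: [1 if v == c else 0 for v in values] for c in categories}

def standardize(xs):
    """Feature scaling: rescale a numeric column to mean 0 and (population) sd 1."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]

# Made-up example data.
departments = ["IT", "Sales", "HR", "IT", "Sales"]
salaries = [50_000, 62_000, 47_000, 58_000, 71_000]

print(make_dummies(departments))  # → {'IT': [1, 0, 0, 1, 0], 'Sales': [0, 1, 0, 0, 1]}
print([round(z, 2) for z in standardize(salaries)])
```

Both steps are mechanical to execute, but deciding which category is the baseline, or whether a variable should be standardized at all, is a statistical judgment, which is the point the answer makes.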
2. What’s the difference between SAS, R, and Python programming?
SAS is one of the most popular analytics tools used by some of the biggest companies in the world. It has great statistical functions and a graphical user interface. However, it is too pricey to be eagerly adopted by smaller enterprises or individuals.
R, on the other hand, is a robust tool for statistical computation, graphical representation, and reporting.
The best part about R is that it is an open-source tool. As such, both academia and the research community use it generously and update it with the latest features for everybody to use.
In comparison, Python is a powerful open-source programming language. It’s intuitive to learn and works well with most other tools and technologies. Python has a myriad of libraries and community-created modules. Its functions include statistical operations, model building, and much more. The best characteristic of Python is that it is a general-purpose programming language, so it is not limited to statistics or analytics.
3. What is the difference between WHERE and HAVING clause in SQL?
Adding a WHERE clause to a query allows you to set a condition which you can use to specify what part of the data you want to retrieve from the database.
HAVING is a clause frequently used with GROUP BY because it filters out groups that do not satisfy a certain condition.
HAVING needs to be inserted between the GROUP BY and ORDER BY clauses. In a way, HAVING is like WHERE but applied to the GROUP BY block.
On some occasions, an identical result could be obtained by implementing the same condition, either with the WHERE or with the HAVING clause.
The main distinction between the two clauses is that WHERE filters individual rows before they are grouped, while HAVING filters the aggregated groups. In other words, after HAVING, you can have a condition with an aggregate function, while WHERE cannot use aggregate functions within its conditions.
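A small, self-contained way to see the difference is Python’s built-in `sqlite3` module with an in-memory database. The `sales` table and its rows are made up for illustration; the point is that WHERE removes rows before grouping, HAVING removes groups after aggregation, and an aggregate inside WHERE is rejected.

```python
import sqlite3

# In-memory database with a small, made-up sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100), ("North", 300), ("South", 50), ("South", 20), ("East", 500)],
)

# WHERE filters individual rows *before* they are grouped and summed.
where_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "WHERE amount > 60 "
    "GROUP BY region ORDER BY region"
).fetchall()

# HAVING filters whole groups *after* aggregation, so it may use SUM(), COUNT(), ...
# Note its position: after GROUP BY, before ORDER BY.
having_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "GROUP BY region HAVING SUM(amount) > 450 "
    "ORDER BY region"
).fetchall()

# An aggregate function inside WHERE is rejected by the database.
try:
    conn.execute("SELECT region FROM sales WHERE SUM(amount) > 450 GROUP BY region")
    aggregate_in_where_allowed = True
except sqlite3.Error as err:
    aggregate_in_where_allowed = False
    print("WHERE with an aggregate fails:", err)

print(where_rows)   # → [('East', 500), ('North', 400)]
print(having_rows)  # → [('East', 500)]
```

Here South disappears from the WHERE result because both of its rows are filtered out individually, whereas in the HAVING result North disappears because its total (400) fails the group-level condition.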
What Do Data Scientist Interview Questions Cover?
You certainly can’t go wrong by getting familiar with:
- Python programming interview questions;
- algorithm interview questions;
- statistician interview questions (including linear regression interview questions);
- R interview questions;
- Data scientist behavioral questions;
- SQL interview questions.
It’s really a seemingly endless list (which we’ll cover in detail in our follow-up articles). And that’s not surprising, as data scientists are often expected to be a jack-of-all-trades.
So, what data scientist interview questions should you practice?
Here are 10 real-life examples.
1. What is a Normal distribution?
To answer this question, you will likely need to first define what a distribution is.
So, in statistics, when we use the term distribution, we usually mean a probability distribution. Here’s one definition of the term:
A distribution is a function that shows the possible values for a variable and how often they occur.
A Normal distribution, also known as the Gaussian distribution or the bell curve, is probably the most common distribution. There are several important reasons:
- It approximates a wide variety of random variables.
- Distributions of sample means with large enough sample sizes can be approximated by a Normal distribution, following the Central Limit Theorem.
- All computable statistics are elegant (they really are!).
- Decisions based on Normal distribution insights have a good track record.
What is very important is that the Normal distribution is symmetrical around its mean, with a concentration of observations around the mean. Moreover, its mean, median, and mode are the same. Finally, you should get an extra point if you mention that about 95% of the data points from a Normal distribution are located within 2 standard deviations from the mean, and about 99.7% are located within 3 standard deviations from the mean.
Now, you may be also expected to give an example.
Since many biological phenomena are approximately normally distributed, a biological example is the easiest to turn to. Try to showcase all the facts that you just mentioned about a Normal distribution.
Let’s focus on the height of people. You know a few people who are very short and a few people who are very tall. You also know a few more people who are short but not too short, and approximately an equal number who are tall but not too tall. Most of your acquaintances, though, have a very similar height, centered around the mean height of all the people in your area or country. There are some differences, which are mainly geographical, but the overall pattern holds.
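The 95% and 99.7% figures can be checked numerically: for any Normal distribution, the share of values within k standard deviations of the mean equals erf(k/√2). The second half of the sketch simulates the height example; the mean of 170 cm and standard deviation of 10 cm are assumptions for illustration, not real population figures.

```python
import math
import random
import statistics

def share_within(k):
    """Share of a Normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.4f}")
# prints 0.6827, 0.9545 and 0.9973 for k = 1, 2, 3

# Simulated heights (assumed mean 170 cm, sd 10 cm), seeded for reproducibility.
random.seed(0)
heights = [random.gauss(170, 10) for _ in range(20_000)]
mu = sum(heights) / len(heights)
sd = math.sqrt(sum((h - mu) ** 2 for h in heights) / len(heights))
median = statistics.median(heights)
within_2sd = sum(abs(h - mu) < 2 * sd for h in heights) / len(heights)

print(f"mean {mu:.1f} cm, median {median:.1f} cm, share within 2 sd: {within_2sd:.3f}")
```

In the simulation, the mean and median land close together (symmetry), and roughly 95% of the simulated heights fall within two standard deviations, matching the facts you would list in the interview.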
2. R has several packages for solving a particular problem. How do you decide which one is best to use?
R has extensive documentation online. There is usually a comprehensive guide for the use of popular packages in R, including the analysis of concrete data sets. These can be useful to find out which approach is best suited to solve the problem at hand.
Just like with any other script language, it is the responsibility of the data scientist to choose the best approach to solve the problem at hand. The choice usually depends on the problem itself or the specific nature of the data (i.e., size of the data set, the type of values and so on).
Something to consider is the tradeoff between how much work the package is saving you, and how much of the functionality you are sacrificing.
It also bears mentioning that packages come with limitations as well as benefits, so if you are working in a team and sharing your code, it might be wise to adopt a shared set of packages.
3. What are interpolation and extrapolation?
Sometimes you could be asked a question that contains mathematical terms. This shows you the importance of knowing mathematics when getting into data science. Now, interpolation and extrapolation are two very similar concepts. They both refer to predicting or determining new values based on some sample information.
There is one subtle difference, though.
Say the range of values we’ve got is in the interval (a, b). If the values we are predicting are inside the interval (a, b), we are talking about interpolation (inter = between). If the values we are predicting are outside the interval (a, b), we are talking about extrapolation (extra = outside).
Here’s one example.
Imagine you’ve got the number sequence: 2, 4, _, 8, 10, 12. What is the number in the blank spot? It is obviously 6. By solving this problem, you interpolated the value.
Now, with this knowledge, you know the sequence is 2, 4, 6, 8, 10, 12. What is the next value in line? 14, right? Well, we have extrapolated the next number in the sequence.
Finally, we must connect this question with data science a bit more. If they ask you this question, they are probably looking for you to elaborate on that.
Whenever you are doing predictive modeling, you will be trying to predict values; that’s no surprise. Interpolated values are generally considered reliable, while extrapolated ones are less reliable or sometimes invalid. For instance, in the sequence from above (2, 4, 6, 8, 10, 12), you may want to extrapolate a number before 2. Normally, you’d go for ‘0’. However, the natural domain of your problem may be positive numbers. In that case, 0 would be an inadmissible answer.
In fact, often we are faced with issues where extrapolation may not be permitted because the pattern doesn’t hold outside the observed range, or the domain of the event is … the observed domain. It is extremely rare to find cases where interpolation is problematic. Please bear in mind that last bit and don’t forget to mention it in the interview!
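The number-sequence example can be replayed as a tiny predictive model. Indexing the sequence by position (1 to 6) and fitting a straight line by ordinary least squares are our assumptions for illustration; the sketch labels each prediction as interpolation or extrapolation depending on whether it falls inside the observed range of positions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# The sequence 2, 4, _, 8, 10, 12, indexed by position 1..6 with position 3 missing.
positions = [1, 2, 4, 5, 6]
values = [2, 4, 8, 10, 12]
slope, intercept = fit_line(positions, values)

def predict(x):
    return slope * x + intercept

lo, hi = min(positions), max(positions)
for x in (3, 7, 0):
    kind = "interpolation" if lo <= x <= hi else "extrapolation"
    print(f"position {x}: {predict(x):.1f} ({kind})")
```

The model happily returns 0 for position 0, even though, as noted above, 0 may be outside the natural domain of the problem: nothing in the fitted line knows where the pattern stops holding.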
4. What is the difference between population and sample in data?
A population is the collection of all items of interest to our study, and its size is usually denoted with an uppercase N. The numbers we’ve obtained when using a population are called parameters.
A sample is a subset of the population, and its size is denoted with a lowercase n. The numbers we’ve obtained when working with a sample are called statistics.
That’s more or less what you are expected to say.
Further, you can spend some time exploring the peculiarities of observing a population. More likely, though, you’ll be asked to dig deeper into why in statistics we work with samples and what types of samples there are.
In general, samples are much more efficient and much less expensive to work with. With the proper statistical tests, 30 sample observations may be enough for you to make a data-driven decision.
Finally, samples have two properties: randomness and representativeness. A sample can be one of those, both, or neither. To conduct statistical tests whose results you can rely on later, your sample needs to be both random and representative.
Consider this simplified situation.
Say you work in a firm with 4 departments: IT, Marketing, HR, and Sales. There are 1000 people in each department, so a total of 4000 people. You want to evaluate the general attitude towards a decision to move to a new office, which is much better on the inside, but is located on the other side of the city.
You decide you don't really want to ask 4000 people, but 100 is a nice sample. Now, we know that the 4 groups are exactly equal. So, we expect that in those 100 people, we would have 25 from each department.
- We pick 100 people (out of the 4000) at random and realize that we have 30 IT, 30 Marketing, 30 HR, and 10 from Sales. Obviously, the opinion of the Sales department is underrepresented. We have a sample, which is random but not representative.
- I've been working in this firm for quite a while now, so I have many friends all over it. I decide to ask the opinion of my friends from each department because I want them to feel comfortable in the workplace. I pick 25 people from each department. The sample is representative but is not random.
In the first case, we have underrepresented some group of people. In the second case, we've made a decision based on a specific circle of people and not the general 'public'.
If I want it to be random and representative, I will pick 25 people from IT at random, then 25 people from Marketing at random, same for HR and Sales. In this way, all groups will be represented, and the sample will be random.
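The firm example can be simulated with the standard library. The `employees` list is made-up data matching the scenario (four departments of 1,000 people each). The first draw is a simple random sample, whose department counts depend on the seed; the second is a stratified random sample, the usual name for the ‘25 at random from each department’ approach just described.

```python
import random
from collections import Counter

random.seed(7)  # fixed seed so the example is reproducible
departments = ["IT", "Marketing", "HR", "Sales"]
# 1,000 employees per department, 4,000 in total; each employee is (department, id).
employees = [(dept, i) for dept in departments for i in range(1000)]

# Simple random sample: random, but department counts can drift away from 25 each.
simple = random.sample(employees, 100)
print(Counter(dept for dept, _ in simple))

# Stratified random sample: 25 drawn at random *within* each department,
# so the sample is both random and representative of the four groups.
stratified = []
for dept in departments:
    members = [e for e in employees if e[0] == dept]
    stratified.extend(random.sample(members, 25))
print(Counter(dept for dept, _ in stratified))
```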
You can decide to skip that detailed explanation, or better – ask them if they want you to dive deeper into the topic and then impress them with your detailed understanding!
5. What are the steps in making a decision tree?
First, a decision tree is a flow-chart diagram. It is extremely easy to read, understand and apply to many different problems. There are 4 steps that are important when building a decision tree.
- Start the tree. In other words, find the starting state – maybe a question or idea, depending on your context.
- Add branches. Once you have a question or an idea, it branches out into 1, 2, or many different branches.
- Add the leaves. Each branch ends with a leaf. The leaf is the state which you will reach once you have followed a branch.
- Repeat 2 and 3. We then repeat steps 2 and 3, where the starting points are the leaves, until we finish off the tree. In other words, every question and possible outcome should be included.
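The four steps describe a flowchart-style decision tree, which can be sketched as nested Python dictionaries: the root is the starting question, each key in `branches` is a branch, and each plain string is a leaf. The questions, outcomes, and helper names are made up for illustration. For a machine-learning decision tree learned from data, you would typically use a library instead (scikit-learn’s `DecisionTreeClassifier` is one common choice).

```python
# Step 1: the starting question. Steps 2-4: branches, leaves, and repeating
# the process from a branch that leads to another question instead of a leaf.
tree = {
    "question": "Is it raining?",
    "branches": {
        "yes": "Take an umbrella",  # leaf
        "no": {                     # a branch that starts a new question
            "question": "Is it cold?",
            "branches": {"yes": "Wear a jacket", "no": "A T-shirt is fine"},
        },
    },
}

def follow(node, answers):
    """Walk from the root to a leaf, choosing the branch given by answers[question]."""
    while isinstance(node, dict):  # still at a question, not yet at a leaf
        node = node["branches"][answers[node["question"]]]
    return node

def leaves(node):
    """List every outcome: a quick check that the tree has been finished off."""
    if not isinstance(node, dict):
        return [node]
    return [leaf for child in node["branches"].values() for leaf in leaves(child)]

print(follow(tree, {"Is it raining?": "no", "Is it cold?": "yes"}))  # → Wear a jacket
print(leaves(tree))
```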
Depending on the context you may be expected to add additional steps like: complete the tree, terminate a branch, verify with your team, code it, deploy it, etc.
However, these 4 steps are the main ones in creating a decision tree. Whether to include these extra steps really depends on the position you are applying for.
If you are applying for some data science project management position, you may be expected to say: ‘Validate with all stakeholders to ensure the quality of the decision tree’.
If you are applying for a data scientist position, you may be expected to explain a bit more about the programming language and library you intend to use. This also includes the reason why you’d choose that library.
6. How is machine learning deployed in real-world scenarios?
This question is a bit tricky. Model deployment is a part of a data science job, but in fact, efficient model deployment is more often related to engineering, software development, cloud computing, etc. In other words, to make sure everything is right, you’d better turn to your IT department or hire a computer scientist on your team.
Now, there are 3 important steps:
- Once you train a model, you should save it, or better – store it in a file. There are different ways in which this could be achieved. The general ‘Pythonic’ ways are through pickle or joblib. However, libraries such as TensorFlow deal with much more complicated model objects, and thus they offer dedicated functions for saving them. Often they look like this: .save(‘filename’).
This part of the process is always done by the data scientist, ML engineer, or whoever is in charge of the model training.
- Computing instance. AWS and Microsoft Azure offer computing instances, or cloud-based environments that can run the model you’ve just created. Surely, you can share the file with your colleagues through email or Messenger, but more often, there will be some cloud that handles the deployment. The computing instance should be set up to communicate with all other systems that feed the inputs and/or require the outputs of the model.
- Job scheduler. Having a model and a place to run it, you can specify when and how to run it. That could be once a week, once per day, or every time an event occurs (e.g. a transaction, new user registration, etc.). At the desired time, new data would be taken, loaded, cleaned, preprocessed, fed to the model, etc. until you reach the desired outcome.
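The first of these steps, persisting a trained model so that a scheduled job can reload it later, can be sketched with the standard-library `pickle` module (joblib offers the analogous `joblib.dump` / `joblib.load`). To keep the sketch self-contained, a plain dict of fitted parameters stands in for a real model object; the parameter values and the `predict` helper are hypothetical.

```python
import os
import pickle
import tempfile

# A plain dict of fitted parameters stands in for a real trained model object
# (hypothetical values: a linear model y = 2.0 * x + 1.0).
model = {"slope": 2.0, "intercept": 1.0}

def predict(params, xs):
    """Apply the stored linear model to new inputs."""
    return [params["slope"] * x + params["intercept"] for x in xs]

# Step 1: persist the trained model to a file.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Later, e.g. in a scheduled job on the computing instance, reload it and score new data.
# Only unpickle files you trust: loading a pickle can execute arbitrary code.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(predict(restored, [0, 1, 2]))  # → [1.0, 3.0, 5.0]
```

In a real deployment, the file would live in shared or cloud storage rather than a temporary directory, and the reload-and-predict part would be what the job scheduler triggers.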
Having completed these 3 steps, you are practically done.
You will have a model, running on some cloud at prescheduled times. Once you’ve got an output, you can return it to a Python notebook or, better, connect it to yet another system (which could be considered part of step 2). Depending on your needs, it could be a web app (e.g. a recommender system that takes information about a particular customer and shows them relevant results), or some kind of visualization software such as Tableau or Power BI, which would analyze your data in real time.
Needless to stress, 2. and 3. would rarely be a data scientist’s primary job. Still, in a smaller team, that may fall on them, too!
7. What is K-means clustering? How can you select K for K-means?
The main goal of clustering is to group individual observations so that the observations from one group are very similar to each other. In addition, we’d like them to be very different from the observations in other groups. There are two main types of clustering: flat and hierarchical. Hierarchical clustering is much more spectacular because of the dendrograms we can create, but flat clustering techniques are much more computationally efficient. Therefore, we usually opt for the latter.
K-means clustering is the most prominent example of flat clustering.
It partitions the observations into K clusters so that each observation belongs to the cluster with the nearest center (the mean of that cluster’s observations). K stands for the number of clusters we are trying to identify. This is a value selected prior to the clustering.
Now, the optimal number of clusters is obviously what we are usually interested in.
There are several ways to approach that, but the most common one is called: ‘The Elbow Method’.
There, we solve the clustering problem with 1, 2, 3, 4, 5, 6, and so on clusters. We then plot the results on a graph with the number of clusters on the x-axis and the WCSS (within-cluster sum of squares) on the y-axis. The resulting image resembles a human elbow. The place where the kink is signifies the optimal clustering solution. And that’s how you choose the ‘K’ in K-means!
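A from-scratch sketch of K-means (Lloyd’s algorithm) plus the elbow method, for 2-D points. Two assumptions to flag: the data are three made-up, well-separated blobs, and the centers are initialized with a simple farthest-point heuristic rather than the k-means++ scheme that libraries such as scikit-learn use by default. In scikit-learn, the WCSS of a fitted `KMeans` model is exposed as `inertia_`.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-means on 2-D points; returns (centers, wcss)."""
    rng = random.Random(seed)
    # Farthest-point initialisation: start from a random point, then repeatedly
    # add the point that is farthest from all centers chosen so far.
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: assignments will no longer change
            break
        centers = new_centers
    # WCSS: squared distance of every point to its nearest (final) center.
    wcss = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, wcss

# Illustrative data: three well-separated blobs of 50 points each.
rng = random.Random(1)
blob_centers = [(0, 0), (10, 0), (5, 8)]
points = [(rng.gauss(cx, 1), rng.gauss(cy, 1)) for cx, cy in blob_centers for _ in range(50)]

# Elbow method: solve for K = 1..6 and look for the kink in the WCSS curve.
wcss = {k: kmeans(points, k)[1] for k in range(1, 7)}
for k, w in wcss.items():
    print(k, round(w, 1))
```

With this data, the WCSS drops steeply up to K = 3 and only slowly afterwards, so the elbow points to three clusters. On real data the kink is often less clear-cut, which is why the elbow method is a heuristic rather than a proof.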
8. What are the disadvantages of a linear model?
This is one of the strangest questions you could be asked. It is like being asked: ‘what are the disadvantages of playing tennis barefoot?’ You don’t need shoes to play tennis, but it is much better if you do.
Now, the most common linear models are the linear regression model and linear time series model. Therefore, let’s answer the question in that context.
The single biggest advantage of a linear model is that it is simple. From there, there are mainly disadvantages and limitations.
Therefore, let’s focus on the top 3 cons of using a linear model.
- Linear model implies linear relationships.
A linear model assumes that the independent variables explain the dependent one(s) in a linear way, e.g. a = bx + c. No powers, exponents, logarithms, etc. are allowed unless we add them as transformed variables. Obviously, this is a great simplification – the real world is not linear. Using a linear model would either disregard some patterns, or force us to execute complicated transformations to reach a linear representation.
- Data must be independent.
In the general case, that’s not strictly required, but in 95+% of the linear models used in practice, it is assumed. Most linear models assume that the variables in the model are not collinear. Otherwise, we observe multicollinearity, and the math behind the model estimation ‘breaks’. Assuming that the variables are independent is obviously a very brave statement, especially because we are limited to a linear relationship (if we had exponents and logarithms, the probability that they are collinear would drop dramatically).
- Outliers are a big, big issue.
Since linear models assume linearity, having values that are too big or too small regarding any feature may be devastating for the model. All points are expected to be close to some line, which, as you can imagine, is rather unrealistic. To deal with that, we often complicate the linear model in ways that practically make it behave like a non-linear one.
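To see how much damage a single outlier can do, here is a minimal ordinary least-squares sketch on made-up data: ten points lying exactly on y = 2x + 1, then the same points with only the last observation shifted up by 60.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates (slope, intercept) for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

xs = list(range(10))
ys = [2 * x + 1 for x in xs]          # perfectly linear: slope 2, intercept 1
clean = fit_line(xs, ys)

ys_outlier = ys[:-1] + [ys[-1] + 60]  # corrupt a single observation
dirty = fit_line(xs, ys_outlier)

print("clean fit:", clean)            # → clean fit: (2.0, 1.0)
print("with one outlier:", dirty)     # slope jumps to about 5.27, intercept to about -7.73
```

One bad point out of ten more than doubles the estimated slope, because least squares penalizes the squared distance to the line and so gives extreme values disproportionate weight.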
9. Describe a time when you were under pressure.
Do you know the saying “When the going gets tough, the tough get going”? Every Hiring Manager wants to make sure you can handle the pressure of the job. Are you someone who is likely to abandon ship when things get a little tough? Every firm needs people who are reliable. All jobs involve a certain element of pressure; some more than others, obviously. Your task here is to give an example of a stressful situation and show how you coped with it.
Here’s an example of such a situation:
I was under significant pressure before taking my GMAT exam. I needed a really good grade in order to be admitted to the graduate school that I am now graduating from. A few weeks before the exam, I noticed that I was becoming nervous. Two things helped me handle the pressure much better: I started sleeping for at least 7 hours a night (going to bed earlier in the evening), and I dedicated at least one hour a day to sports activities. This had a hugely positive impact on my concentration and stress level.
10. How would you add value to our company?
Did you see “The Wolf of Wall Street”? Remember Jordan Belfort’s famous quote “sell me this pen”? The same principle applies to this question as well, although instead of selling a pen, you need to sell the idea of you landing the job. This is what the recruiter is asking you to do. You need to convince him/her that you will add value to the company. But, how are you going to be able to tell how you would add value to the company before having worked for the company?
Most candidates will start by listing their qualifications, work experience, personal traits, achievements, and they will be hoping to push the right button, somewhere along the way.
Similarly, when facing the “sell me this pen” task, most people start describing the pen’s attributes; it is a great pen, writes very well, it is very shiny and smooth, etc.
It is natural to focus on your qualities and qualifications when asked how you are going to add value to the company.
However, this is a trap.
Most people would do just that. They will explain that they are great and that they are qualified. But that fails to answer the question itself, right? How are you going to add value? By analogy, the person who is being sold a pen can ask “Why do I need this pen?” Instead of falling for this trap and responding like everybody else, you can show that you are different by using an alternative approach.
Turn this into a back and forth dialogue and figure out what value needs to be added to the team that you will be joining.
What does the company need? Are there any supplementary skills that are missing? Is there a particular area that they would like to reinforce? Learn more about the Interviewer’s take on the current situation and understand precisely what is expected from you. Don’t be shy to ask about the company’s mid-term strategy and the type of people that they will need in the future. Then you can nail the question by pointing out how your qualifications and motivation match with the needs that they have.
The whole dynamic of this type of question is driven by the fact that before you are able to sell a pen, you have to know more about the person who is going to buy it: what their needs are and what kind of pens they usually write with. Once you have positively identified a need, you can point out that your product is the right solution for that need.
What Data Analyst Interview Questions You Should Prepare For?
By all means, you should be prepared to answer some Python interview questions. Naturally, interview questions for data analysts also include other specific data analytics and data analysis questions, so make sure you pay attention to those, too. And don’t forget to practice some data analyst behavioral questions. They are increasingly important to interviewers and can actually tilt the scales of their final decision.
So, are you ready for some real-world examples of data analyst interview questions?
Here are 10 data analyst interview questions and answers that will help you get on the fast track to interview success.
1. Name a few libraries used in Python for data analysis.
These are the most important Python libraries you should mention.
NumPy is an essential library, as it is used for matrices and arrays and includes methods for their manipulation.
Pandas is the second library, which is used in almost any data analysis performed in Python.
It includes data structures and operations for manipulating numerical tables and time series. It is built on top of NumPy, which makes many of its operations a lot faster than equivalent code in standard Python. So, a good knowledge of the Pandas library is a must if you’re a data analyst using Python.
SciPy and scikit-learn are two more core libraries: SciPy for scientific computing and scikit-learn for machine learning.
SciPy boasts an impressive number of mathematical algorithms and high-level commands and classes to help data scientists in their data analysis tasks. Scikit-learn was originally developed during a “Google Summer of Code” project, as a third-party extension for SciPy. Scikit-learn includes various classification, regression, and clustering algorithms, designed to work with the SciPy and NumPy packages.
And once you’re done with machine learning, you’ll also need a good way of visualizing the results. Matplotlib and Seaborn are visualization libraries, which are great for that.
TensorFlow, Keras, and PyTorch are libraries for deep learning. If you want to train neural networks, for example in the context of natural language processing (NLP) or computer vision, these are the way to go. Here, knowing the difference between TensorFlow 1 and TensorFlow 2 could be a bonus during an interview.
2. What is a Logistic regression?
A logistic regression is one of the simplest classification models. It is widely used mainly due to its simplicity and ease of interpretation. Logistic regressions are well understood and studied throughout the years and thus are still a data scientist’s preferred classification choice on many occasions.
A logistic regression could be used in 2 distinct ways that sound different yet are reached in the same way, methodologically speaking.
The first use case is whenever we’ve got a categorical outcome. Examples are: Yes/No, Will buy/Won’t buy, and 0/1 situations. As any other classification method, a logistic regression would output the category it deems most probable to be the answer.
Speaking of probabilities, we reach the second use case. We could employ a logistic regression to determine the exact probability that an event is going to occur.
The mechanics of the two use cases follow the same path.
For instance, imagine a logistic regression predicts that a customer is 70% likely to buy and 30% likely to not buy. Under these conditions (and the usual 0.5 threshold), the prediction will be classified as ‘Will buy’. Depending on our needs, we could use either the probabilistic representation or simply the output class.
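Both use cases come from the same computation: a linear score passed through the logistic (sigmoid) function gives a probability, and thresholding that probability gives a class. The single feature (minutes on a website), the coefficients, and the function names below are hypothetical rather than fitted, and the 0.5 threshold is the usual default but remains a modeling choice. scikit-learn’s `LogisticRegression` exposes the same two outputs through `predict_proba` and `predict`.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real-valued score to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Hypothetical coefficients for one feature, e.g. minutes spent on the website.
INTERCEPT, COEF = -2.0, 0.1

def predict_proba(minutes):
    """Use case 2: the estimated probability that the customer will buy."""
    return sigmoid(INTERCEPT + COEF * minutes)

def predict_class(minutes, threshold=0.5):
    """Use case 1: the most probable category, obtained by thresholding the probability."""
    return "Will buy" if predict_proba(minutes) >= threshold else "Won't buy"

for minutes in (10, 30):
    print(minutes, round(predict_proba(minutes), 2), predict_class(minutes))
# → 10 0.27 Won't buy
# → 30 0.73 Will buy
```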
Finally, it is useful to note that we were discussing a binary logistic regression.
Binary here stands for an outcome with only 2 possibilities. The logistic regression model could be generalized to many categories, in which case it would be called a multinomial logistic regression.
At this point, you may or may not decide to mention the multinomial logistic regression. In 99% of the cases where we use the term ‘logistic regression’, we mean binary logistic regression. Referring to the multinomial case could prompt the interviewer to ask you additional questions on multinomial logistic regression, which would definitely be much trickier for you, especially if you have never used it.
3. Have you worked with comparatively large data sets in a project? How did you collect and prepare the data for analysis?
How to Answer
Working with large data sets can be challenging. So, with this question, the hiring manager wants to assess your ability to deal with the issues that might occur. If you have relevant experience, talk about the problems you have faced and how you managed to resolve them. In case you’ve never experienced any issues working with large data sets, describe the details of the project and all the stages of preparing the data for analysis.
Answer Example
“In my last company, I often worked with large data sets from external suppliers, such as survey responses for Customer Analytics projects, which meant huge sample sizes. To prepare the data for analysis, I’d go through the following steps. First, I’d run predetermined frequencies and queries to check the validity of the data. This helped me pin down various problems, such as missing data, incorrect data types, or skip-pattern errors in the survey data set. I’d check with the supplier so we could implement the necessary corrections before moving forward with the analysis. Once that was done, I’d often consult a Data Engineer to pick the most suitable analysis tool for a data set of this size. Finally, I’d load the data and start my analysis.”
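The validity checks described in this answer (frequencies, missing data, data types, and skip patterns) can be sketched in plain Python. This is a minimal illustration only: the survey fields, the skip rule (respondents who answer "No" to owns_car should skip car_brand), and the records are hypothetical, and with a genuinely large data set you would typically run equivalent checks in SQL or a data frame library.

```python
from collections import Counter

# Hypothetical survey records; None marks a missing answer.
rows = [
    {"id": 1, "age": 34, "owns_car": "Yes", "car_brand": "Toyota"},
    {"id": 2, "age": "n/a", "owns_car": "No", "car_brand": None},
    {"id": 3, "age": 41, "owns_car": "No", "car_brand": "Ford"},
    {"id": 4, "age": None, "owns_car": "Yes", "car_brand": None},
]


def frequencies(rows, column):
    """Frequency table for one column, a first sanity check on any survey."""
    return Counter(r.get(column) for r in rows)


def missing_counts(rows, columns):
    """Number of missing (None) values per column."""
    return {c: sum(1 for r in rows if r.get(c) is None) for c in columns}


def type_errors(rows, column, expected_type):
    """IDs of rows whose non-missing value has the wrong data type."""
    return [
        r["id"]
        for r in rows
        if r.get(column) is not None and not isinstance(r[column], expected_type)
    ]


def skip_pattern_errors(rows, gate, skip_value, follow_up):
    """IDs of rows that answered `follow_up` although `gate` should have skipped it."""
    return [
        r["id"]
        for r in rows
        if r.get(gate) == skip_value and r.get(follow_up) is not None
    ]


print(frequencies(rows, "owns_car"))
print(missing_counts(rows, ["age", "owns_car", "car_brand"]))
print(type_errors(rows, "age", int))
print(skip_pattern_errors(rows, "owns_car", "No", "car_brand"))
```

Each check returns the offending record IDs or counts, which is the kind of evidence you would send back to the supplier when asking for corrections.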
4. Which tools have you used in each stage of your previous data analysis projects?
How to Answer
A data analyst must be experienced in using a wide range of tools in the various phases of their analyses – from preparation, through exploration, to presenting the end results. Hiring managers know that a single tool can be utilized in multiple stages of the analytical process. So, if that’s your experience, make sure you highlight it. This will demonstrate your expertise in working with that specific tool. However, if you have worked with multiple tools throughout your experience, share that, too. That’s how you’ll showcase the span of your skills.
Answer Example
“In my experience as a data analyst, I’ve used a variety of tools that have helped me build up a strong skillset. In the preparation and exploration stages, I’ve mostly used Microsoft Excel and Microsoft Access, depending on the complexity of the data set. In the exploration phase, I’ve also used SAS and SPSS to extract insights from the data. Apart from these statistical programs, I’ve employed analytical tools such as Tableau and Cognos Analytics. I find Tableau and Power BI to be great tools for creating powerful dashboard visualizations. And, of course, Excel and PowerPoint are classic tools for building in-company presentations.”
5. In large companies, data is often stored in multiple data warehouses. Have you ever worked on a complex analytical project, where you had to query multiple data warehouses in order to gather the required data?
How to Answer
The size of the companies you’ve worked for can affect the technical complexity of your tasks as a data analyst. That said, a strong technical skillset is always a plus in the eyes of your future employer. So, having retrieved data from multiple data warehouses in your work on past projects will showcase your expertise in databases and data structures, as well as in programming languages.
Answer Example
“I’ve had the chance to work for a big corporation in the past, and my work there was crucial to developing my technical skillset. On one occasion, I queried 5 different data warehouses to retrieve the data for a large-scale company project. Once I had all the necessary records and variables, I built a dataset that I later used in my analysis.”
6. Tell us about a project where, due to data limitations, the stakeholders couldn’t reach the answer they needed. How did you resolve this issue?
How to Answer
The interviewer wants to be reassured that, as a data analyst, you can deal with all types of data challenges. That’s particularly important when collaborating with stakeholders who may lack an in-depth understanding of data. This question is also ideal for showcasing your problem-solving skills.
Answer Example
“A few years back, I worked on a customer segmentation project initiated by the company executives. Unfortunately, they couldn’t come up with a substantial customer segmentation plan, as the data in the customer data warehouse wasn’t robust enough. To help with the progress of the project, I worked closely with the data warehouse team. Our collaboration resulted in outlining data initiatives and actionable steps which ultimately led the project to its final goal.”
7. What web analytics tools have you used in your professional experience?
How to Answer
More and more data analyst job postings require web analytics experience (or list it as a preferred skill). And, while some companies separate the roles and their job descriptions, others prefer to hire a data analyst with an all-encompassing skillset. So, if you have relevant experience, it’s a good idea to mention the metrics you tracked and the field in which you applied them.
Answer Example
“I have experience using Google Analytics for a Black Friday campaign evaluation project. For this purpose, I tracked the following metrics: open rate, click-through rate, conversion rate, and average time on page. I also used Google Analytics to build funnels showing at which stage of their journey visitors dropped off before converting. Tracking these web metrics helped me come up with recommendations about the best marketing channels for targeting specific audiences.”
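The metrics mentioned in this answer are simple ratios, and a funnel analysis boils down to comparing counts at consecutive steps. The Python sketch below uses hypothetical counts. Note that metric definitions vary between teams and tools; here click-through rate is assumed to be clicks divided by delivered emails, and conversion rate is assumed to be purchases divided by the visitors who entered the funnel.

```python
def rate(numerator, denominator):
    """Safe ratio that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def funnel_drop_off(steps):
    """Share of visitors lost between each pair of consecutive funnel steps."""
    return [
        (f"{name_a} -> {name_b}", 1 - rate(count_b, count_a))
        for (name_a, count_a), (name_b, count_b) in zip(steps, steps[1:])
    ]


# Hypothetical campaign numbers, for illustration only.
delivered, opened, clicked = 10_000, 2_500, 400
funnel = [("Landing page", 4_000), ("Product page", 2_000), ("Cart", 800), ("Purchase", 200)]

open_rate = rate(opened, delivered)
click_through_rate = rate(clicked, delivered)
conversion_rate = rate(funnel[-1][1], funnel[0][1])

print(f"Open rate: {open_rate:.1%}, CTR: {click_through_rate:.1%}, conversion: {conversion_rate:.1%}")
for step, lost in funnel_drop_off(funnel):
    print(f"{step}: {lost:.0%} dropped off")
```

The step with the largest drop-off (in this made-up data, Cart -> Purchase) is a natural place to start when forming recommendations.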
8. Give me an example of a time when you worked as a team.
Coming together is a beginning. Keeping together is progress. Working together is success.
- Henry Ford
One of the greatest virtues in the modern corporate world is the ability to work well as a team. Make sure that you are ready with a story that shows you are able to do exactly that. A team worker can be distinguished by his/her ability to:
- Put the team’s needs first
- Communicate well with the other team members
- Want to succeed as a part of a group
- Listen actively
- Respect others
- Appreciate other work styles
Keep these qualities in mind when you think of a story about a time you were part of a team. The story should demonstrate not only that you were part of the team, but also that you were a great team member.
Here’s an example of such a situation:
A group assignment during the last year of my studies required me and four of my classmates to perform a detailed Company Valuation.
This was a pretty difficult task that involved a significant amount of work, and we had only two weeks to submit it. At the time, I was busy filling out internship applications and had to prepare for some of my other exams. This was the case for the other team members as well.
Nevertheless, all of us concentrated full-time on the project, as we understood that this was the only way we could meet the tight deadline. Another interesting thing about the project was that we managed to work well together, despite the different styles each group member had. We listened actively and were open to each other’s ideas. Given that we came from different backgrounds, each of us certainly added value to the project. Good communication helped us coordinate our responsibilities and integrate the separate pieces of work each of us had been assigned.
9. Describe a time when you failed to meet your goals.
It is impossible to live without failing at something, unless you live so cautiously that you might as well not have lived at all – in which case you fail by default.
- J.K. Rowling
Some failure in life is inevitable. Those who are brave and bold attempt many new things and thus fail much more often. Don’t be afraid to describe a time when you wanted to achieve something but weren’t able to. Chances are the interviewer is more interested in how you handled the failure. He/she wants to know whether you learned from your mistakes and whether you are motivated to succeed in the future.
When you think of a story, don’t pick a major failure, and try to choose one where external factors also contributed to the outcome. Inexperience on your part is OK too, given that you are in the early stages of your career. Don’t attribute your failure to weaknesses that could hurt your work in the future (for example, a lack of attention to detail or an inability to handle pressure).
It is very important to show that you turned a negative situation into a valuable learning experience. This will make a great impression on the interviewer.
Here’s an example of such a situation:
Last year, I was eager to find a summer internship opportunity, but I wasn’t able to do that. One of the main reasons behind this was the tough job market that we are currently facing. Along with that, I believe I was too inexperienced and did not realize how difficult it was to find a good opportunity.
This year, I took a totally different approach. You could say I learned my lesson. I started preparing in November and created a shortlist of opportunities I wanted to pursue. Then I researched all potential employers and chose the ones that were really interesting. I had more time to work on my CV and cover letters and to prepare for interviews. Of course, I wasn’t going to make the same mistake twice.
10. Why should we hire you?
This question is very similar to “How would you add value to our company”. The Hiring Manager challenges you to sell him/her the idea of you being hired. Your profile is the product that needs to be sold. Remember the example that we gave with the pen?
Most people will start listing their qualities and qualifications, hoping that they will touch the right nerve along the way. But that is not the way to go.
The Hiring Manager has read your CV; he/she already knows about your credentials. What he/she wants to understand is whether you can handle a tough question and be persuasive while making a valid point. Try to open your answer with a question instead:
Manager: Let me ask you, with so many people applying for this job, why should we hire you?
Job-Seeker: A great question. But I would like to ask you something as well. Can I?
Manager: Sure, go ahead.
Job-Seeker: What makes a great Analyst with your firm?
Manager: We are looking for people who are very independent and are able to learn fast, even when they are under pressure. Does that make sense?
Job-Seeker: Sure, it does. I can imagine that the environment in which your firm operates requires such qualities. This is precisely what made me apply for this position in the first place. I want to be a part of your dynamic environment. I am able to learn fast and adapt to changing circumstances quite easily. For example, …
Sounds much better, right?
In order to respond successfully to this question, you need to communicate well with the interviewer and understand exactly what they are looking for. Otherwise, you simply don’t know why they should hire you, and your answer becomes a shot in the dark.
What Do Business Intelligence Analyst Interview Questions Comprise?
Business intelligence interview questions are bound to include some business analytics interview questions, data modeling interview questions, and credit risk interview questions. Of course, business analyst behavioral interview questions are important, too. In addition, you shouldn’t neglect to practice SQL coding interview questions and various Python programming questions and answers. Statistics questions and answers are also popular among BI analyst interviewers, so make sure you don’t skip those, either.
So, here are 31 examples of BI analyst interview questions you can use for practice.
1. In your opinion, what are the key strengths a business analyst should possess?
How to Answer
A great business analyst should have a strong analytical mind, an “out-of-the-box” approach to solving problems, and the ability to handle pressure. Those are just a few of the strengths that a business analyst must possess. However, to avoid searching for an answer on the spot, carefully review the job description for the role. Make a note of the key strengths listed by the employer, and base your response on that.
Answer Example
“When it comes to key strengths, I’d say business analysts should have a profound understanding of the business and its processes. They should also be able to collaborate efficiently with company executives, even if the latter lack a technical or analytics background. Last but not least, attention to detail is crucial in this line of work. That said, I’ve worked hard to develop those skills, and I can’t wait to put them into practice in your organization.”
2. Do you have a B-plan when faced with a change of course on a moment’s notice?
How to Answer
Every skilled business intelligence analyst knows how to pivot, adapt, and change when the plan suddenly falls apart. The ability to solve problems creatively in tense situations is one of the most valuable assets of a business intelligence analyst. So, don’t be shy to go into detail about coming up with a number of alternative scenarios for your clients. Although you may never have to resort to them, the fact that you’re prepared for emergencies is a great sign for the interviewer.
Answer Example
“Contingency plans are my favorite! As a business intelligence analyst, I know it’s great if we can do “X”, as planned. However, things aren’t always perfect, and plans can change quickly, especially if several decision-makers are involved in a project. That’s why I’m always ready to go with “Y” if the situation calls for it. Having a B-plan takes the edge off and reassures the whole team that we have a go-to strategy in case we encounter any issues.”
3. Have you worked with teams from various departments in a company?
How to Answer
Being able to work in a cross-functional environment is certainly a plus for larger companies. Hiring managers are aware that you’ll probably have to collaborate on projects with teams from other departments, such as HR, IT, or Marketing. Therefore, they want to know more about your exposure to the challenges that may arise in this line of work. That said, make sure you share how you’ve solved any issues you’ve faced in your experience.
Answer Example
“In my last job as a business intelligence analyst, I was often exposed to cross-functional teamwork. I’ve mostly worked with our HR and IT departments. In my experience, if the team is attuned to the needs of the company for that particular project, it can turn out to be a huge success. I do my best to communicate expectations clearly. In addition, I take into account that everyone has different work styles, strengths, and weaknesses. Usually, that largely depends on their expertise and job role.”
4. How would you create a taxonomy to identify key customer trends in unstructured data?
First, you have to understand the company’s objectives prior to categorizing the data. Once you’ve done this, it is always good to follow an iterative approach by pulling new data samples and improving the model accordingly. And you do this by validating it for accuracy through solicited feedback from the stakeholders of the business. This helps ensure that your model is producing actionable results and improving over time.
5. What is an SQL View?
A view is a virtual table whose contents are obtained from an existing table or tables, called base tables. The retrieval happens through an SQL statement incorporated into the view. So, you can think of a view object as a window into the base table. The view itself doesn’t contain any real data; the data is physically stored in the base table. The view simply shows the data contained in the base table.
Author’s note: You can find a more detailed explanation and examples in our tutorial Introduction to SQL Views.
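Here is a small, self-contained illustration using Python’s built-in sqlite3 module and an in-memory database. The table, column names, and rows are hypothetical. It shows the two properties described above: the view stores only its SELECT statement, and a change to the base table is immediately visible through the view.

```python
import sqlite3


def demo_view():
    """Create a base table and a view over it, then query the view twice."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary REAL)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [("Ana", "Sales", 50000), ("Ben", "IT", 65000), ("Cara", "Sales", 55000)],
    )
    # The view saves only this SELECT statement, not a copy of the rows.
    conn.execute(
        "CREATE VIEW sales_staff AS "
        "SELECT name, salary FROM employees WHERE dept = 'Sales'"
    )
    before = conn.execute("SELECT name FROM sales_staff ORDER BY name").fetchall()

    # Changing the base table is reflected the next time the view is queried.
    conn.execute("INSERT INTO employees VALUES ('Dana', 'Sales', 52000)")
    after = conn.execute("SELECT name FROM sales_staff ORDER BY name").fetchall()
    conn.close()
    return before, after


before, after = demo_view()
print(before)  # [('Ana',), ('Cara',)]
print(after)   # [('Ana',), ('Cara',), ('Dana',)]
```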
6. What do you do if you disagree with someone at work?
This is one of the business analyst behavioral interview questions. It is perfectly normal to disagree with someone at work; such situations occur all the time. When answering this question, do not speak about the person you disagreed with. This is really important, as you do not want to come off as someone who bad-mouths people; that is unprofessional. The Hiring Manager is not interested in saucy details about the bad habits of that other person. Instead, he/she wants to know more about your conflict management abilities: whether you are an active listener and whether you are good at persuading people.
Every behavioral question comes together with a story that supports the answer given by the candidate.
When you answer this question, try to think of a disagreement that was not personal, but stemmed from different views on how to execute a certain task. This type of disagreement is much safer to talk about, as it does not suggest you are someone who is difficult to work with.
There are a few key points you should concentrate on. Show that you:
- listened actively
- looked for the best possible solution
- had the team’s success at heart rather than flexing your muscles
- were persuasive
Here’s an example of such a situation:
You and three of your classmates were asked to prepare a Business Plan.
Probably the most important part of the whole Business Plan is the prediction of the top-line – revenues. You wanted to use a bottom-up approach and one of your classmates thought that the top-down approach would be more useful. Both you and your colleague were convinced that your own approach was correct. The work could not continue before resolving this issue.
So, you asked your classmate to elaborate on his point and showed that you were interested in his idea; he made a valid point. There was a recent market study that your team could use as a reference. It predicted the overall size of the market for the next five years. This was a valuable piece of information, although it is difficult to predict the firm’s market share. You explained that the advantage of the bottom-up approach is that you can base your growth assumptions on historical data and incorporate data that is specific to the firm under consideration. After each of you explained your point of view, you concluded that the best thing to do was to use both approaches and obtain a range for the company’s revenues.
7. Tell us about the last presentation you gave. In your opinion, how did it go?
How to Answer
As a business intelligence analyst, giving presentations to your company’s executives or clients will be an important part of your work. You’ll often be expected to extract insights from the data, prepare the presentation along with compelling visuals and dashboards, and then deliver it, all on your own. If you have plenty of experience, discuss the topics of your presentations and the feedback you received. If you’re straight out of college, think of a presentation you had to prepare as part of your education. Of course, it would be ideal if you had a sample of your best presentation on your phone or tablet to show the hiring manager.
Answer Example
“One of the presentations I’m proud of was related to the launch of a client’s new app. I had to share the results from the preliminary user testing. What I came up with was an engaging presentation with lots of eye-catching visuals. I believe the latter, together with intriguing content, are key to a well-received presentation. I highlighted both the areas of strength and the areas for improvement. After that, I shared some actionable tips for product improvement with the client. The feedback was positive, and I can actually show you a copy of my presentation on my tablet.”
8. What does the acronym INVEST stand for?
How to Answer
As a business intelligence analyst, you should understand what the acronym INVEST means to technical teams and product managers. INVEST is a checklist of qualities that a good user story in Agile development should have. It stands for:
- Independent
- Negotiable
- Valuable
- Estimable
- Sized appropriately
- Testable
If you’re familiar with the term, break down each word to show the interviewers you know what you’re talking about. If not, make sure you show interest in understanding the concept and which industries mostly use it.
Answer Example
“I’ve mostly worked in the banking and telecommunications fields. My business analysis was mostly done on the strategic side, and I have limited exposure to this term. I know INVEST is mostly used by business intelligence analysts collaborating with IT and development teams. As far as I remember, it stands for Independent, Negotiable, Valuable, Estimable, Sized appropriately, and Testable. I’ll be happy to gain better knowledge about INVEST and how it is utilized in your company.”
9. Are you Six Sigma certified? Do you think that’s important and why?
How to Answer
A Six Sigma certification is not a must, but it’s certainly a plus for a BI analyst. Six Sigma certifications come in different levels, from white belt through yellow, green, and black belts up to master black belt; champion is usually treated as a separate leadership role rather than a belt. If you have completed the training, talk about your experience, the skills you’ve acquired, and how you apply them in your job as a BI analyst. If not, share your perspective on why you would consider taking the training.
Answer Example
“Although I haven’t started any Six Sigma training yet, I’m aware that expertise in lean management will certainly be helpful to my clients, as I build up my professional portfolio. So, earning a Six Sigma certification is definitely an option I intend to explore in the future.”
10. What does the acronym PEST mean? Have you used it in your business intelligence experience?
How to Answer
The acronym PEST stands for Political, Economic, Social, and Technological. A PEST analysis is a strategic business tool that allows BI analysts to discover, evaluate, organize, and track the external macro-environmental factors that can influence their business, helping it stay competitive in the future. If you’re experienced in the business intelligence field, you should have some knowledge of PEST and how it works.
However, if you haven’t had the chance to employ PEST in your work experience, show the hiring manager you have a basic idea of the concept and that you’re more than willing to apply this form of analysis in your future job.
Answer Example
“I am just starting my career in business intelligence, so I haven’t applied PEST analysis in my work just yet. Nevertheless, I’ve implemented PEST in a case study while in college. I had to discover the political, economic, social, and technological factors affecting the airline industry in recent years. I think it’s a really efficient type of analysis and I’d be happy to become proficient in it in the future.”
What Data Engineer Interview Questions You Should Be Able To Answer?
If you want to be successful at the data engineer interview, you should not only be able to answer SQL, R, and Python questions, but also know your ETL tools like the back of your hand. Interviewers also often inquire about data systems and frameworks, cloud computing environments, and data maintenance.
And they’ll probably ask you some data management interview questions, as well.
So, here are 30 data engineer interview questions that will help you with your preparation.
1. Explain data import in R.
R can read data from many sources, such as text files, Excel, SPSS, SAS, Stata, and SYSTAT, with text files, and CSV in particular, being the most popular. Depending on the format of the data, you’d need to use different packages to import it into R.
In terms of syntax, there is nothing too shocking about the operations – a standard read call is used in most situations.
Importing text files is fairly straightforward.
The user can call the bare-bones read.table() function from the built-in {utils} package and set all relevant arguments, or opt for read.csv(), which has default values for the arguments most often used when importing a CSV file. Both of these result in a data frame. You could also choose read_csv() from the {readr} package (part of the tidyverse) and import your data as a tibble. That’s the preferred method if you’re using R for data science.
Importing Excel files happens with the {xlsx} package.
Importing SPSS and SAS data often requires the {Hmisc} package. For .sas7bdat files specifically, Hadley Wickham’s {haven} package can be helpful.
Importing Stata and SYSTAT data typically happens with R’s {foreign} package.
2. What is the difference between UNION and UNION ALL?
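The UNION operator is sometimes compared to JOIN because both combine data from multiple tables, but they work differently: a JOIN places columns from related tables side by side, while UNION stacks the result sets of two or more SELECT statements on top of each other. For a UNION to work, each SELECT statement must return the same number of columns, with compatible data types, in the same order. UNION also removes duplicate rows, so it returns distinct rows only. In contrast, UNION ALL returns all rows, including duplicates, which usually makes it faster because no deduplication step is needed.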
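A quick way to see the difference is to run both operators against the same two small tables. The sketch below uses Python’s built-in sqlite3 module with hypothetical tables of customer cities; ‘Paris’ appears in both tables, so UNION returns it once while UNION ALL returns it twice.

```python
import sqlite3


def compare_unions():
    """Run UNION and UNION ALL over the same two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers_2022 (city TEXT)")
    conn.execute("CREATE TABLE customers_2023 (city TEXT)")
    conn.executemany("INSERT INTO customers_2022 VALUES (?)", [("Berlin",), ("Paris",)])
    conn.executemany("INSERT INTO customers_2023 VALUES (?)", [("Paris",), ("Rome",)])

    # Both SELECTs return one TEXT column, so their result sets are compatible.
    union = conn.execute(
        "SELECT city FROM customers_2022 "
        "UNION "
        "SELECT city FROM customers_2023 ORDER BY city"
    ).fetchall()
    union_all = conn.execute(
        "SELECT city FROM customers_2022 "
        "UNION ALL "
        "SELECT city FROM customers_2023 ORDER BY city"
    ).fetchall()
    conn.close()
    return union, union_all


union, union_all = compare_unions()
print(union)      # [('Berlin',), ('Paris',), ('Rome',)]
print(union_all)  # [('Berlin',), ('Paris',), ('Paris',), ('Rome',)]
```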
3. What programming/scripting languages have you used? Which one are you most experienced with?
How to Answer
Generally, job descriptions list the required and preferred programming skills for the role. So, when you talk about the languages you’re most experienced with, make sure you emphasize your work with the preferred/required ones in past projects. In case you lack experience in these, focus on the languages you’re proficient in and point out any similarities they share with the required ones. And don’t forget to mention that you’re a fast learner who can easily grasp new concepts and languages. This will show the interviewer that you’ll be committed to using the necessary tools, even if you have to complete additional training.
Answer Example
“I have worked with both Python and SQL. However, I’m most comfortable using Python, due to the nature of the tasks in the previous company I worked for. I understand that SQL is preferred, and I can assure you I can advance my SQL skills quickly on the job. I’m a quick learner, and learning new concepts has always come easily to me.”
4. Have you ever found a new use for existing data that has brought a positive change in your employer’s business?
How to Answer
A data engineer is often one of the few people with the broadest view of the company’s data. It’s quite common for departments to work with a limited set of tables within the organization’s databases, which can hinder the accuracy of their analyses. That said, a good data engineer should be familiar with the projects and initiatives of each department. This will allow them to provide other employees with valuable insight into what data is available and how they can use it to improve the quality of analyses throughout the organization.
Answer Example
“As a data engineer, it’s important for me to be familiar with all initiatives taken up by the company’s departments. I believe employees should have access to data from other departments in order to improve their work. In my previous job, I proposed connecting employee data with sales data. As it turned out, there was a correlation between the education and work experience of hired employees and high or low sales periods. The subsequent detailed analysis showed that certain employee profiles resulted in considerable increases in sales for a significant period of time. I take pride in this discovery, as HR data had never been cross-referenced with sales data for analytical purposes in this company before.”
5. Have you ever taken part in a data disaster recovery situation? If so, describe what happened and how you solved the issue at hand.
How to Answer
Completing daily assignments is only part of the data engineer’s job. Above all, hiring managers are looking for someone who can quickly respond to urgent situations and contribute to their remedy. Sometimes a data infrastructure may fail. Or data can become inaccessible, lost, or even destroyed. All of these can hurt the company’s processes. So, when answering this question, present yourself as a decisive person with a hands-on approach to solving unforeseen issues.
Answer Example
“In my most recent data engineer job, I was part of a team project focused on developing a Disaster Recovery Strategy. This is how I became familiar with the actions we needed to take when we faced a real data disaster recovery situation. A corrupt file somehow got loaded into the company’s system. This caused databases to lock up, and as a result, a lot of the data was corrupted as well. I immediately approached the IT team. Together, we made sure our data backups were loaded as quickly as possible, so that the company’s operations could continue to run smoothly.”
6. Have you ever created custom analytics applications? If so, please share details about the application you’ve built.
How to Answer
In order to build a custom analytics application, a data engineer should have an in-depth understanding of the analytic needs of all departments within the company. Creating such applications requires careful planning and teamwork. That said, you should answer in a way that highlights not only your programming expertise but also your excellent communication skills.
Answer Example
“The goal of the custom application I built was to combine primary marketing research data with sales data stored in the company’s databases. The app spared the Marketing department the tedious process of requesting data from the data warehouse and loading it into Excel. As a result, they could perform specific analyses much more quickly.”
7. What is your experience in working with data scientists? What do you think are the common skills you share?
How to Answer
It's true that data engineers and data scientists have some skills and qualifications in common. It’s also possible that they have some overlap in responsibilities, depending on the requirements of the employer. But, in their essence, their roles are quite different.
Data engineers should be aware of the data scientists’ ongoing projects. They handle the maintenance, architecture, and preparation of data for future analysis.
Data scientists, on the other hand, rely on the data engineers’ work to extract insights from the data and present the results to management and executives.
That said, a hiring manager would like to know how well you understand the work of data scientists and what your experience interacting with them has been.
Answer Example
“I’ve had the chance to work with data scientists on many projects and occasions, and I can say it’s been a very productive and rewarding experience. We both understand analytics and programming languages, which made it easy for me to help them with their projects. Our overlapping skills allowed the data scientists to grasp the limitations of our infrastructure and data availability. At the same time, I was able to easily understand their data needs.”
8. What is your preferred field of work? Do you prefer Pipeline or Database, or a more Generalist role?
How to Answer
A data engineer’s role heavily depends on the size of the company and the specific tasks they’re assigned. Generalists employ a variety of skills, as they are responsible for many different tasks. If you’re focused on Pipeline, this means you have experience working closely with data scientists and have a better understanding of how to prepare data for analysis. Data engineers who have worked mostly in Database have in-depth knowledge of the ETL process and table schemas. No matter which role(s) you have been in, include all your experiences in your answer. You can also go into moderate detail in explaining why you prefer one type over the other.
Answer Example
“I’ve always worked in more of a Generalist role. I prefer it to the other types because I like having a broader scope of expertise. I enjoy being in the know about the whole structure and process, as opposed to focusing on just one subset of the skills I’ve acquired.”
9. According to some Big Data professionals, data engineering is a non-analytical career path. Do you consider this statement true or false? Why?
How to Answer
This statement can’t be interpreted in a single way. Yes, it’s true that compared to a data analyst, a data engineer’s work is much less analytical in nature. However, this doesn’t mean that data engineers lack analytical skills or that they don’t use them at all. When giving your answer, tell the hiring manager how you view your role as a data engineer and how you’ve used your analytical skills on the job.
Answer Example
“I’d have to say I firmly disagree with this statement. I’ve used my analytical skills on numerous occasions. As a data engineer, I’ve often performed analyses to ensure the high quality and integrity of the data. My analytical skills have also helped me immensely in my joint projects with data scientists and data analysts. Thanks to my analytical mindset, I’ve been able to identify their data needs and help them meet those needs.”
10. What trainings would you enroll in to advance your data engineering skills?
How to Answer
Technology’s constantly changing, so, if you’re setting high goals for yourself, this question may prompt you to list several trainings you’d like to fit into your schedule. However, make sure you convey that you want to take these courses because they cover topics that interest you, not to make up for weaknesses in your preparation. Balance your answer by mentioning your strengths and the skills you’ve already acquired.
Answer Example
“I think enrolling in trainings is crucial for any data engineer who wants to stay up to date with the advancements in the industry. Personally, I’d like to expand my current expertise in ETL processes and the cloud environment. Although I have significant experience working with both, I believe my future work can only benefit from continuous learning.”
What Data Architect Interview Questions Should You Be Ready For?
If you want to ace the data architect interview, you must show confidence in talking about data accessibility, data security, and data source integration. Moreover, you must convince the hiring manager that you are capable of understanding the data needs and utilization across the different company departments. And, of course, make a great impression when answering the tricky data architect behavioral questions. Here are 30 data architect interview questions to help you prepare.
1. Tell us about a situation when you made changes to a company's data management systems and the impact it made on the company.
The data needs of companies change, and hiring managers want to make sure they hire an architect who will not only adapt to the new requirements but also take the initiative to implement these changes and introduce new improvements. If you are just beginning your career as a Data Architect and you don’t have experience in dealing with such changes, think of a hypothetical situation that will demonstrate your problem-solving skills and hands-on approach to challenges.
Here’s an example of such a situation.
While working for my previous employer, I was part of a project aiming to make data more accessible to all company employees. Each department's data was siloed, and team members in other departments couldn’t access it. Acquiring data from outside one's own department was a dull and tiresome process that prevented timely analyses. I took an active part in making data sharing among the company’s departments easy without compromising data security. As a result, analysts were able to complete their projects on time using a much more robust dataset than before. This made it possible for senior management to make faster, better-informed strategic decisions.
2. What is referential integrity?
Referential integrity is a subset of data integrity that refers to the accuracy and consistency of data linked between tables. It is critically important: if a database lacks referential integrity, queries can return incomplete data without any indication of an error.
For instance, a foreign key in a child table maintains referential integrity within the database by referencing a valid, existing primary key in the parent table.
A foreign key in SQL is defined through a foreign key constraint. This type of constraint verifies that the values in the child and parent tables match. Therefore, referential integrity doesn’t allow us to add records to a related table unless there is an associated record in the primary table. It also prevents us from changing values in a primary table in a way that would leave orphaned records in a related table. Moreover, by default it prevents us from deleting records from a primary table if matching related records exist.
To visualize how the fields from the various tables within a database refer to each other, people usually use Entity-Relationship diagrams (ER diagrams), or, the simpler and handier tool – relational schemas.
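To see these rules in action, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The `customers` and `orders` tables and their columns are hypothetical, and the `PRAGMA foreign_keys = ON` line is SQLite-specific: SQLite leaves foreign-key enforcement off unless it is enabled for each connection, whereas most other relational databases enforce declared constraints by default (with their own error messages).

```python
import sqlite3


def demo_referential_integrity() -> list[str]:
    """Try a few writes against a hypothetical parent/child pair and log which ones SQLite accepts."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on for the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    # Parent table: each customer is identified by a primary key.
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    # Child table: customer_id must point at an existing customers.customer_id.
    conn.execute(
        """CREATE TABLE orders (
               order_id INTEGER PRIMARY KEY,
               customer_id INTEGER NOT NULL REFERENCES customers (customer_id)
           )"""
    )
    conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
    conn.execute("INSERT INTO orders VALUES (100, 1)")

    attempts = [
        ("insert order for existing customer", "INSERT INTO orders VALUES (101, 1)"),
        ("insert orphan order", "INSERT INTO orders VALUES (102, 99)"),
        ("delete referenced customer", "DELETE FROM customers WHERE customer_id = 1"),
        ("change referenced key", "UPDATE customers SET customer_id = 2 WHERE customer_id = 1"),
    ]
    log = []
    for label, sql in attempts:
        try:
            conn.execute(sql)
            log.append(f"{label}: accepted")
        except sqlite3.IntegrityError:
            # SQLite raises "FOREIGN KEY constraint failed" when a write would break the link.
            log.append(f"{label}: rejected")
    conn.close()
    return log


for line in demo_referential_integrity():
    print(line)
```

Only the first write should be accepted; the orphan insert, the delete, and the key change are rejected, which mirrors the three rules described above. Declaring an action such as `ON DELETE CASCADE` on the foreign key would change the delete behavior, so treat the rejection as the default rather than the only option.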
3. Give me an example of a time when you had to teach someone a new skill.
This is an easy one, right? The Hiring Manager wants you to demonstrate that you are willing to teach others. Your willingness to teach shows a few very important things:
- You are willing to share knowledge (very valuable for the company)
- You're a team player who is willing to help others
- You relate well to people
The second important aspect of this question is the method you used when you were teaching. How did you share your knowledge? Did you have to use some special technique in order to explain a given concept? Did you have a strategy that helped to facilitate learning? Perhaps you provided relevant practical examples?
Here’s an example of such a situation:
You can say that you always wanted to teach your younger brother how to create good PowerPoint presentations. At first, it was difficult because it was very hard to get his attention. Then you proposed creating a presentation together about his favorite motorbike company. He instantly agreed, because it was something he was interested in sharing with his friends and perhaps posting on one of his favorite forums. Initially, you were the one working with the mouse and keyboard, but then you let him complete the second half of the presentation while you guided him through the process. The results were amazing: your brother learned a lot in a short period of time. This was a very fulfilling experience for you, and you realized that you enjoy teaching.
4. As a data architect, what steps have you taken to understand how different departments use the company’s stored data?
How to Answer
Different departments have different data needs. And, as a data architect, you must have the ability to work with people from non-technical backgrounds to understand how they use the available data. When you answer this question, do your best to convey that you’re willing to educate yourself so you can do your job better and serve the company’s data requirements.
Answer Example
“As a data architect, understanding the work of my colleagues in different departments has always been important to me. In my previous workplace, I’d regularly meet with reps from other teams to discuss their current and future projects. I would ask a series of questions, instead of making assumptions. This approach has allowed me to correctly identify and plan their data needs.”
5. To effectively manage a company's data infrastructure, it is important for a data architect to have an in-depth understanding of the business and its strategic challenges. How have you approached this requirement in your past position?
How to Answer
Missing the bigger picture is a common problem for data architects, due to the technical nature of their work. With your answer, you have to reassure the hiring manager that you’re capable of taking proactive steps and staying on track with the overall business strategy and goals of the company.
Answer Example
“In my experience as a data architect, I’ve learned that in order to improve my performance, I have to be constantly aware of the company’s short-term and long-term goals. This is why I’ve been proactive in my communication with management and C-level executives. I’ve also attended corporate trainings on a regular basis. This has given me a chance to ask the right questions to the right people.”
6. What issues have you faced while leading teams tasked with data/database strategy development? Tell us how you solved these issues.
How to Answer
You can approach this question in a more general way, or describe a real situation you and your team have faced when working on a specific task. Either way, make sure you point out your problem-solving skills and the ability to work in a team to reach a common goal.
Answer Example
“In my experience as a data architect, I’ve often worked with teams to develop changes in the data architecture of our company. Of course, people on a team come from different backgrounds and have varying opinions that affect their priorities. What I’ve discovered is that making compromises is crucial to the success of the task, along with staying open-minded to others’ ideas. That said, once we’ve identified our common goals, a consensus has always been easy to reach.”
7. What is your greatest strength (or what are your greatest strengths)?
A question that leaves a much more pleasant flavor than “What is your biggest weakness?” Nevertheless, you need to prepare to answer it, because it is an important one and it comes up at almost all HR interviews.
Think of the role you are applying for. What are the greatest strengths that someone who wants to be successful in this position must have? Let’s say that you are interviewing for the position of Project Manager. A Project Manager needs to be a great:
- Communicator
- Motivator
- Team player
- Problem solver
If the interviewer asks you for your greatest strength (singular), pick the one of these qualities that is, in fact, your greatest strength, and make sure that you have a great story illustrating that you are really good at this skill. If you are asked to list multiple strengths, you can pick up to three of these qualities. Don’t list more than three strengths, as it will come across as though you are strong at everything, which will dilute the impact of your answer.
Avoid vague words (such as maybe, probably, guess, usually) when you talk about your biggest strength/s.
8. What would you do if a colleague was using a company phone for personal use?
Questions about the unethical behavior of a colleague are difficult to answer. First of all, it is a very awkward situation. Most people don’t want to rat out their co-workers, but they aren’t OK with unethical behavior either. That means they need to make a tough decision between two conflicting actions.
Depending on how serious and unethical the actions of your colleague are, you usually have two options:
- Talk with them before reporting to the manager and try to convince them to change their behavior
- Report them directly to the manager
In this case, given that a personal phone call from a company phone is not something that endangers the company and its reputation in the long run, you might try to fix the issue yourself by talking to your colleague and explaining to him that using the company phone for private conversations is not allowed. Strengthen your argument by saying that if everybody started doing such things, the company would eventually go bankrupt. Furthermore, he is setting a bad example for the rest of your colleagues. Given that the company trusted you with this job, you need to repay that with solid work and consistently ethical behavior. If the pattern continues even after you talked to your colleague, you should contact management.
Had the question involved a more serious violation (sexual harassment, stealing, disclosure of confidential information, etc.), you would need to demonstrate your readiness to report the issue directly to your supervisor.
9. How would you deal with a significant mistake at work?
The best way to deal with a mistake at work is to own up to it. Otherwise, it will haunt you and will probably turn into something that cannot be fixed. A timely reaction can limit the damage caused by your mistake and shows strength of character.
One of the worst things that can happen to you is to have a manager who has lost trust in your work.
Hiding mistakes can cause that. It is much better to approach your manager immediately and admit that you made a mistake. Then, once he knows about the situation, he will be able to take appropriate action to resolve it. He is also more likely to know how to address the issue, since he probably has more experience than you.
The more subtle aspect of this question is about how you learn from significant mistakes. Are you going to remember that mistake and learn from it in the future? Are you going to do everything possible in order to avoid it in the future? What type of precautionary measures would you take? Everybody makes mistakes, yes. The important thing is that you show that you are determined to learn from yours.
10. What would you do if one of your colleagues was not performing well?
Open communication is the best way to address problems when you are working with people. Remember that. By openly sharing your concerns with your colleague and hearing his opinion, you will make sure that both of you are on the same page about the current situation. You need to fully understand what caused his weak performance. It could be due to:
- Misunderstanding of his tasks
- Lack of experience in handling this type of tasks
- Personal problems
- Anxiousness to do too much
Then, once you have figured out what the problem is, the next step is to figure out a way to resolve the issue.
For example, you can propose the following solutions:
- Misunderstanding of his tasks – Go through his tasks together and tackle any problematic areas
- Lack of experience in handling this type of task – Depending on the knowledge gap and the deadline you have, you can i) propose to go through the unfamiliar topics together or ii) propose to swap his assignment for something he is familiar with and where he can excel
- Personal problems – Offer flexible hours or suggest that he asks the manager for help and explain his personal situation; say that you are behind him and that everyone has difficulties at some point
- Anxiousness to do too much – Explain that the best employees are great at doing the small things well; assure him that he needs to focus on doing his ordinary tasks well without being distracted by issues that are outside of his current capabilities.
Data Science Interview Preparation: What Else Do You Need To Prepare For A Data Science Interview?
What’s The Data Science Interview Process Like?
The data science interview process isn’t limited to the technical interview. Without a doubt, knowing the answers to the most complex data modeling, algorithm, statistics, and probability interview questions will give you a great advantage. However, showcasing your data science knowledge is only part of making an outstanding impression. And, to be prepared for the non-technical aspect of the data science interview process, you need some in-depth insight into the Hiring Manager’s mind.
What’s A Data Scientist Hiring Manager Looking For?
For a hiring manager, interviews are the way to staff their department with creative, hard-working, independent, and reliable data science talent who can deliver results on time and under pressure.
Today’s successful businesses have both the resources and the drive to expand their data science teams to get the most out of their data in terms of growth and higher revenue. Companies across all industries already view data science professionals as partners to the rest of management in achieving their business goals. However, this also brings a higher responsibility to pick the right people. So, the challenge for you is not only to be able to do the job but also to clearly demonstrate that at the interview. And how can you do that? By addressing the four basic needs of every hiring manager:
Can you do the job?
You might think that’s the easiest part. After all, you’re already proficient in SQL, Tableau, Python, and R. You also boast some experience in building machine learning algorithms, and deep learning is no stranger to you. But can you handle industry-specific tasks, such as developing an all-in-one software solution that performs real-time root-cause analysis through integration with existing ERP systems?
I think you get the point.
Being able to do the job is much more than having the right skills. It means understanding the technology, the company, the industry, and the position. Employers hire solutions, not people. So, at the end of the day, what matters to them is whether you can solve their particular challenge by applying your technical expertise. And it’s up to you to do the research and tailor how you present yourself at the interview. Remember, your goal is to show the hiring manager you’re the best problem-solver. In short, they have to firmly believe that you meet all the requirements and that you’ll fit in perfectly with the rest of the team.
How much will you cost?
Your cost is much more than just your salary. Benefits, relocation, perks – they’re all included. Not to mention the training you’ll probably undergo until you learn the ropes of the industry and how the company handles business processes. So, if you want to stand out, make sure you emphasize the value you bring to the company. Don’t be afraid to talk about your skills and experience in similar projects using the same tools and processes. And, of course, this won’t hurt if you plan to negotiate a higher starting salary.
Will you be here in the long run?
To a company, a new employee means an investment. And no employer wants to discover they’ve invested in the wrong candidate in just a few months’ time. So, you have to convey an impression of stability and commitment throughout the data science interview.
Will you fit in?
Every company has a specific culture and looks for similar personalities, work ethics, and motivation. And here’s where soft skills come into play. Remember, no matter how much technical expertise you have, you’ll always be a part of a team. So forget about answering with one-liners and be ready to give some information beyond the competency part.
Build A High-Quality Data Science Project Portfolio
A data science portfolio with high-quality projects takes time and dedication. However, it is all worth it because it proves to your future employer what you can actually do. Although school projects are a good place to start, it’s best to include personal projects that demonstrate your interests and passion for data science. That said, if you’d like to learn more about the ways to build an outstanding portfolio, you can check out our Ultimate Data Science Career Guide.
Why Use Networking As A Tool For A Successful Data Science Interview?
According to Mark Meloon, “The best way to get an interview is to make a connection with someone. LinkedIn can be very helpful but sending the right message to the right person requires a skill. Meeting people at conferences, those who can help you with your search is a great way of fast-tracking your search. Don’t waste time talking with people also looking for jobs! Also, going to these conferences help you understand what data science is being used for today.” So, you heard it from the most reliable source: build a network that can truly support you in reaching your goals, and strike while the iron is hot!
How To Answer Tricky Data Science Interview Questions?
As an aspiring data scientist, you should know that employers look for the curiosity to ask what might go wrong. So, when asked tricky questions, remember: don’t get defensive, downplay red flags such as frequent job changes or layoffs where possible, and, most importantly – address the Hiring Manager’s hidden needs.
One of their favorite approaches is to describe a problem for you to solve.
They may give you fairly basic instructions and ask how you would start. As Mark Meloon, Senior Data Scientist at ServiceNow, says,
“Many give lip service to things like fully understanding the problem, data issues, EDA, etc. If someone takes the time to ask if there are missing values, skewed distributions, etc., that is something I like to see. Another thing I look for is the answer ‘I would check with the subject matter expert and/or stakeholder’ when I ask them things like ‘How would you measure the performance of your model?’
I don’t want someone assuming they know the right metric to use because the business may want something else (e.g., accuracy vs. precision).”
In other words, curiosity and a knack for creative problem-solving will quite possibly take you exactly where you want to be.
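To make the accuracy vs. precision point concrete, here is a small hypothetical sketch in Python: two made-up classifiers scored on the same ten labels, where 1 marks the case the business cares about (say, a fraud flag). The data and function names are illustrative assumptions, not part of the interview; the only point is that the two metrics can rank the same models differently.

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Share of all predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """Share of predicted positives that are truly positive (0.0 if nothing was flagged)."""
    flagged = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not flagged:
        return 0.0  # a convention for "no positive predictions"; libraries may warn instead
    return sum(t == positive for t in flagged) / len(flagged)


# Hypothetical labels: 3 positives out of 10 cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
model_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # catches every positive, plus one false alarm
model_b = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # flags only one case, but that flag is correct

for name, y_pred in [("model_a", model_a), ("model_b", model_b)]:
    print(name, "accuracy:", accuracy(y_true, y_pred), "precision:", precision(y_true, y_pred))
```

Model A wins on accuracy (0.9 vs. 0.8), while model B wins on precision (1.0 vs. 0.75). Which one is “better” depends on whether the business cares more about overall correctness or about avoiding false alarms, which is exactly the question worth taking to the stakeholder.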
How To Answer The Question “Do you have any questions”?
If there’s one question in the history of data science interview questions you can never answer “no” to, that’s the one! Asking questions not only gives you a chance to show a genuine interest in the data science position you’re applying for, but it also demonstrates that you’ve done your research and are familiar with the company’s mission, policies, and initiatives.
So, if you want to leave a lasting good impression on the interviewer, follow these quick and logical guidelines:
- Ask specific questions that will help you get a good overall idea of what the day-to-day working process will be like;
- Focus on technical questions to ask the interviewer. Those will help you assess the position, the company and its departments, and the next steps of the hiring process you should expect;
- Don’t ask questions with obvious answers that you can easily find on the company’s website;
- A first interview is definitely not the time to ask about salary, health benefits, sick leave policy, or perks.
Here’s a quick tip: prepare a list of questions in advance. This way, you’ll avoid the awkward silence while you search your mind for a suitable question.
And that pretty much brings the data science interview to a close.
We believe this concise guide will help you “expect the unexpected” and enter your first data science interview with confidence. And if you need more help, check out our course on Starting a Career in Data Science: Project Portfolio, Resume, and Interview Process.
Ready to take the next step toward a career in data science?
Build up your knowledge and obtain the right skills to succeed in the field of data science. You can enroll in our Data Scientist Career Track and get a verifiable certificate upon completion to showcase what you’ve learned to interviewers. Our team of industry experts provides lots of valuable insights and practical examples to help you get up to speed with industry trends and practices.
Start with the fundamentals with our Statistics, Maths, and Excel courses, build up step-by-step experience with SQL, Python, R, and Tableau, and upgrade your skillset with Machine Learning, Deep Learning, Credit Risk Modeling, Time Series Analysis, and Customer Analytics in Python.