Can I Become a Data Scientist: Research into 1,001 Data Scientist Profiles

Data science is a super-hot topic and the data scientist job is the sexiest job of the 21st century. But how does one actually become a data scientist? I know that for you it doesn’t really matter how others became data scientists. What you are interested in is whether you can become one.

You can ask around, read Quora answers, or talk to someone in the industry. But these methods will yield information biased by someone.

If you want to become a data scientist, you should approach the problem as one.

Well, that’s exactly what we did. We gathered data from 1,001 publicly listed LinkedIn profiles of data scientists. Obviously, the underlying assumption that drove our research is that one’s LinkedIn profile is an unbiased estimator of one’s CV. Therefore, with a ‘reasonable degree of scientific certainty’, we found some amazing insights.

To navigate your way through the pool of data science job opportunities, you could read Data scientist career path: How to find your way through the data science maze.

Aggregate data suits research of this type best: it keeps the data of the individual data scientists in our sample private, while highlighting the main drivers of their career success.

Here is a list of all our findings. You can read the whole research or jump to whatever part you find most interesting:

1. Summary
2. Level of education
3. Level of education and work experience
4. Previous jobs
5. Academic degree
6. University ranking
7. Self-preparation and online courses
8. Online courses and university ranking
9. Programming languages
10. Country of employment and programming languages
11. Country of employment and industry
12. Country of employment and academic degree
13. Country of employment and work experience
14. Company size and programming language
15. Company size and university ranking

1. Summary

 

Try to find yourself in the data

Alright.

The brute summary of the data is not quite gender neutral, but it is extremely interesting nonetheless.

The average data scientist is a man, who speaks at least one foreign language. Gender aside, if English is your first or second language… so far, so good.

data scientist

The median experience with the data scientist title is 2 years. And that’s not unexpected. The term “data science” has been around for no more than 10 years, so we can’t expect data scientists to predate their own field!

What’s more interesting is how long it took the average data scientist to earn that title. According to our data, the median total work experience was 4.5 years; thus, judging by the difference between the two medians, it takes roughly 2.5 years to earn the title.
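The 2.5-year figure is simply the difference between the two medians. Here is a minimal sketch of that arithmetic; the five profiles are made up for illustration and are not taken from our sample. It also shows why the difference of two medians only approximates the typical time it takes to earn the title:

```python
from statistics import median

# Hypothetical profiles (NOT from the sample):
# (total years of work experience, years holding the data scientist title)
profiles = [(1.0, 1.0), (3.0, 2.0), (4.5, 0.5), (6.0, 4.0), (10.0, 3.0)]

total_years = [total for total, _ in profiles]
title_years = [title for _, title in profiles]
years_before_title = [total - title for total, title in profiles]

# What the article reports: median total experience minus median time with the title
diff_of_medians = median(total_years) - median(title_years)

# What a single person typically experiences: median of the individual differences
median_of_diffs = median(years_before_title)

print(diff_of_medians, median_of_diffs)
```

In this toy example the two numbers differ (2.5 vs. 2.0 years), so the 2.5 years is best read as a ballpark.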

Cool. It seems that the rewards in data science come much quicker than in other fields.

So, if you are in your 20s or 30s, you’re just about the right age.

2. Level of education

 

I know that’s certainly an issue for many people. Is this another PhD exclusive career?

Well, not really.

We found that while a significant chunk of data scientists have a PhD degree, a Master’s is completely sufficient. 48% of the entire sample held a Master’s degree. Bachelor’s degrees are rarer but far from absent, whereas MBAs and MDs are almost nonexistent (yes, there are a couple of MBAs and one person who graduated as a doctor and became a data scientist)!

data scientist 1

The moral of the story is that if you want to pursue a PhD to become a data scientist, maybe you should reconsider. Having a PhD is definitely an advantage, but perhaps you can do this the old-fashioned way – start as an intern.

3. Level of education and work experience

 

We don’t know the success rate of interns, but we do know that 18% of the data scientists reached the top of the data science ladder after completing an internship. 65% of them had a Master’s degree.

data scientist 2

So, if you have a Master’s, it is a great idea to look for an internship in the field! Maybe a 4-year PhD track is not worth it in the end.
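To see how strong the internship route is for Master’s holders specifically, we can combine the figures above with Bayes’ rule. This is only a back-of-the-envelope sketch: it assumes the 65% and the 48% count Master’s degrees in the same way, and the variable names are ours.

```python
# Figures reported in the article
p_intern = 0.18                # share of data scientists who came through an internship
p_masters_given_intern = 0.65  # share of those former interns who hold a Master's
p_masters = 0.48               # share of the whole sample holding a Master's

# Joint share: Master's holders who came through an internship
p_masters_and_intern = p_masters_given_intern * p_intern

# Bayes' rule: P(intern | Master's) = P(Master's | intern) * P(intern) / P(Master's)
p_intern_given_masters = p_masters_and_intern / p_masters

print(f"Master's holders who interned, share of sample: {p_masters_and_intern:.1%}")
print(f"Master's holders who came through an internship: {p_intern_given_masters:.1%}")
```

Under these assumptions, roughly 24% of the Master’s holders came through an internship, compared with 18% for the sample as a whole.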

4. Previous jobs

 

The typical data scientist career progression

So, an internship is one option, right? What other ways are there to get the title?

There are many ways to enter a job, but for the data scientist, it seems clear that the easiest way in is… if you are already a data scientist. Tough luck.

However, there are several big job clusters from which we get most of the data scientists. They are analysts (data analyst, BI analyst, business analyst included), scholars, interns, IT specialists, and consultants. 2% of our sample did not state any previous experience in their LinkedIn profiles, which can be interpreted as them getting the data scientist title straight away. We, however, find this unlikely.

data scientist 3

It’s not only interesting to see the previous job of the current data scientist, but the one before that, too. Let’s take a look at those who were data scientists before getting their current data scientist position.

data scientist 4

Well, what do you know, it looks almost the same.

So, assuming you are not a data scientist already, the typical previous positions seem to be – analyst, academia, internship, IT or consulting.

Combining the information so far, PhDs are likely to go through academia, while Masters on average will go through the analyst, intern, or IT positions.

Now that we have seen the ‘previous-previous’ job of people whose previous job was a data scientist, let’s zoom out and see the ‘previous-previous’ job of the entire sample.

Here is a diagram of the aggregate ‘previous-previous’ position of data scientists.

data scientist 5

What seemed interesting to us was that IT was more common than consulting, for instance. So, solid programming knowledge sounds like something worth looking into. Maybe education does matter, after all (wow!).

What are your thoughts on that?

Don’t ask me, ask the data!

5. Academic degree

 

You are right. Let’s ask the data.

So, we’ve established how the average data scientist got the title, and the type of degree they obtained. It is only right to check what exactly they studied at university.

Currently, there is little to no formal “data scientist education”. It goes without saying that if on average professionals acquired the title 2 years ago, the occupation is so new that they didn’t study data science at college. And indeed, that’s the case.

In our sample, we had 500+ academic degrees. We had to cluster them in some way. We identified 7 clusters. They are not completely different, but we can say: they are different enough.

The clusters are:

  • Economics and Social Sciences
  • Statistics and Mathematics
  • Natural Sciences (Physics, Chemistry, Biology)
  • Data Science and Analysis (which includes Machine Learning)
  • Computer Science (which excludes Machine Learning)
  • Engineering

And … of course…

  • Other

So, what were the findings?

data scientist 6

Currently, there isn’t a single degree that stands out. In a way, any degree can lead you to the data scientist job. Just make sure it is mostly quantitative.

The biggest concentration of data scientist professionals is in Computer Science. It should be noted that we included Machine Learning in the Data Science cluster, rather than the CS one. Had we not done that, CS would be the clear winner.

Logically, the runners-up are mathematicians and statisticians.

A popular definition of the data scientist job is:

‘The data scientist is a better statistician than most programmers, and a better programmer than most statisticians’

The data have spoken! This quote may actually be quite close to the true definition of the profession.

But…

Business and economics majors should not be discouraged. The “Economics and Social Sciences” cluster is standing strong in our sample. We anticipated some presence in the field, but not one this strong. It seems like the above definition should go more like:

‘A data scientist is a better statistician and economist than most programmers, a better programmer and economist than most statisticians, and a better statistician and programmer than most economists’

What about the data science cluster?

The ‘Data Science and Analysis’ cluster is lagging behind at 10% of our sample. This is a relatively new field, and the low share suggests universities are not yet ready to meet the current high job-market demand for data scientists.

So far in the analysis we’ve looked at the ‘How’ (previous positions and relevant experiences) and at the ‘What’ (academic achievements and field of study). So, what do you think comes after the ‘How’ and ‘What’? One answer is ‘Where’.

6. University ranking

 

These degrees are a bit heterogeneous. Perhaps university ranking is an important factor?

So, where did data scientists graduate from?

Maybe it is an Ivy League exclusive profession. We used the ‘Times Higher Education World University Ranking’ to rank the alma maters of our cohort. It seems that better universities are indeed producing more data scientists, as in most high-paying jobs. So, taking all universities in the Times ranking, we notice the anticipated trend.

data scientist 7
When you read the diagram, keep in mind that the clusters are of different sizes!

There is one tiny detail though. 25% of the data scientists come from universities that weren’t even ranked by the Times. That figure is almost equal to the number of people who came from top 50 universities!

Now that’s something.

So, university matters, but only up to a point. It looks like you don’t need a diploma from a target school to become a data scientist. We can compare this result to what’s happening in similarly paying careers, such as investment banking. Non-target schools are rare there, if not completely absent.

Data scientists, however, seem to have the ability to enter the job based on their knowledge, rather than the signal their education is sending. So long, signaling theory!

Turns out not only is the degree of the average data scientist ‘something quantitative’, but the ranking of their university is ‘somewhere there’.

7. Self-preparation and online courses

 

So, how did data scientists gain the knowledge needed for the job?

Well, that’s what we wondered, too!

Self-preparation is certainly something they engaged in. However, this was a hard metric to measure. The closest proxy is the available information about certificates from online courses posted on data scientists’ LinkedIn profiles.

We found that 40% of the data scientists have included an online course. Assuming one would not (or could not) post an online course certificate without having completed the course, this is a lower bound: at least 40% of the data scientists have taken online courses, and some of those who did may simply not have posted them online.

data scientist 8

I can imagine this being the case, as I myself didn’t post (and it no longer makes sense to post) the first machine learning course I took.

So, at least 40% took at least one online course.

What about total number of certifications? The data are not as unequivocal. There is a median of 0, a mode of 0, but the mean is 3.3 certificates. It seems some data scientists relied heavily on extra-curricular education, while others didn’t.
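A median and mode of 0 next to a mean of 3.3 is the signature of a heavily skewed distribution. The sketch below uses ten made-up certificate counts (hypothetical, not our data) that happen to reproduce those three statistics. It then uses the reported figures to estimate the average among those who listed at least one certificate, assuming the 40% and the 3.3 refer to the same set of profiles.

```python
from statistics import mean, median, mode

# Hypothetical certificate counts for ten profiles (NOT the actual data):
# most list nothing, a few list many
certificates = [0, 0, 0, 0, 0, 0, 3, 5, 10, 15]

toy_median = median(certificates)
toy_mode = mode(certificates)
toy_mean = mean(certificates)
print(toy_median, toy_mode, toy_mean)

# Reported figures: 40% list at least one course, overall mean is 3.3.
# Everyone else contributes 0, so the mean among those who list any is:
share_listing_any = 0.40
overall_mean = 3.3
mean_among_listers = overall_mean / share_listing_any
print(round(mean_among_listers, 2))
```

Under that assumption, those who list any certificates list about 8 each on average, which is why the mean sits so far above the median.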

data scientist 9

You know what?

Let’s check out which of them relied on online courses and which didn’t. It’s somewhat logical that data scientists who went to higher ranked universities would be less willing to take an online course, isn’t it? Or maybe they are the biggest geeks in general, so they were constantly taking online courses?

8. Online courses and university ranking

 

Using the same ranking as before, we visualized this interesting relationship.

data scientist 10

As we already said, 40% of our sample have taken online courses. As the clusters you see are not of the same size, comparability is somewhat restricted. It is useful to compare them to the average, though.

Let’s review them in turn.

Those data scientists who had the best education in terms of university ranking don’t seem to like or need that many extra qualifications. Interestingly enough, the ones graduating from second tier universities (51-100th in the ranking) were even less likely to engage in extra education.

The biggest insight we get is that people from the lowest ranked universities and the ones coming from schools that were not ranked at all are significantly above the average in terms of online course taking. They were also the ones who drove the online course taking average so high.

Now you know how these data scientists acquired the extra skills they needed to prosper in their career.

Which are the data scientist skills employers actually look for?

9. Programming skills

 

We took note of the top 3 data science related endorsements data scientists had in their profiles. Apart from occasional generic keywords such as machine learning and data science, it was pretty clear that it is programming languages we are looking at.

So which ones stood out?

R, Python and SQL.

This insight corroborates other research out there. Still, confirmatory research adds reliability, so here you go.

Similar to what others before us have found, R and Python are the most commonly used languages. Based on our sample, they are statistically tied, each being used by a bit more than 50% of data scientists. We don’t really have a large enough sample to state that one is more popular than the other. Moreover, we don’t know how much each one of them is used by our cohort, but they surely top the skill list.
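To get a feel for why a small gap between R and Python cannot be called with this sample, here is a rough margin-of-error calculation. It assumes a simple random sample and the normal approximation for a proportion, and it uses 50% as a stand-in for “a bit more than 50%”; our sample was quota-based, so treat this as a ballpark only.

```python
import math

n = 1001   # profiles in the sample
p = 0.50   # stand-in for the share reporting each language ("a bit more than 50%")
z = 1.96   # normal quantile for a 95% confidence level

# Normal-approximation margin of error for a single proportion
margin = z * math.sqrt(p * (1 - p) / n)

print(f"95% margin of error: +/- {margin:.1%}")
```

With a margin of about three percentage points on each share, a difference of a point or two between the two languages is well within the noise.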

data scientist 11

The 3rd most popular programming language is SQL. Database handling is an essential part of the data scientist’s job, so unsurprisingly we observe that 40% of the professionals in our sample ‘speak’ SQL. Check out our resources for SQL: Database vs Spreadsheet, Database Terminology, Installing MySQL, SQL Best Practices.

Since it is skills we are talking about, it is useful to dig a bit deeper. MATLAB, Java, and C/C++ are the next languages in line.

Personal opinion here: a lot of the MATLAB usage is driven by ‘old-timers’ and scholars, while R and Python are taking the lead. Presumably, MATLAB usage will keep declining, while Python is expected to grow even further in the years to come.

Finally, Java and C/C++ are definitely driven by IT specialists, as most professionals who learn to code and are headed for a data science career would normally go for Python and R. Investing your time in Java and C/C++ just doesn’t pay off in the current situation.

If you are interested in learning Python, these are some great articles to start with: Introduction to Programming with Python, Python Functions for Beginners, Basic Python Syntax – Introduction to Syntax and Operators.

10. Country of employment and programming skills

 

Are some countries more susceptible to change in terms of programming languages?

In our research, location refers to the country where the data scientists in our sample worked at the time of data collection. We have divided the sample into four regions: US, UK, India and Others, due to the sampling method we used (see below).

So? What are our findings?

Python and R are surely the two most common coding languages. In the US and the UK, Python is the clear winner, while in India and the other countries it is R.

data scientist 12

While this graph does not give us the greatest of insights, we can surely notice that there is a difference across countries. Perhaps different languages are used in different industries.

11. Country of employment and industry

 

Maybe the industries where these data scientists apply their skills differ, so they require a slightly different bundle of tools.

data scientist 13

Well, industry surely explains a large part of the variability.

Data scientists employed in India work predominantly in the Technology/IT sector. The presence of ex-IT specialists and CS graduates is considerable, and naturally, they have had access to a plethora of coding languages that mathematicians or economists didn’t really have the time to learn.

In the US, the sample is more or less split between the tech and industrial clusters (industrial involves retail, energy, FMCG, etc.).

The situation in the UK is similar, but what stands out is the Financial sector. Well, London is (was?) one of the strongest financial centers in the world. It makes sense that the Brits bet on data-driven finance, rather than hunch.

In the financial sector, though, the programming knowledge required differs somewhat from the typical data scientist skillset. Languages like Java and C/C++ are more valuable. Moreover, going back to the previous graph (and looking into the data absent from the graph), a large share of UK-based data scientists reported MATLAB and LaTeX among their “featured skills & endorsements”.

Why would you feature LaTeX as one of your top 3 skills, unless… unless you used to be a scholar!

12. Country of employment and academic degree

 

And indeed, the UK is the number 1 PhD employer. Remember the average? On average, 27% of data scientists have a PhD. Tough competition, Europe.

data scientist 14

The US and the ‘Others’ move around the averages we reported earlier. In India, however, the level of education required differs significantly. No need for a PhD, guys. Even a Bachelor’s could be sufficient!

It seems that it is relatively easy to become a data scientist in India, but very hard in the UK, right?

13. Country of employment and work experience

 

Let’s check which country will lead to the fastest career progression

To get that insight, it is worth looking at the experience the data wizards had before earning the title of their dreams.

data scientist 15

India and the UK seem to be the places to be right now, with 22% of their data scientists having just 0-12 months of prior experience!

Unfortunately, the UK is also the PhD club, remember? Data scientists there don’t have much experience, but they probably have a bunch of publications…

In the US, the field looks more mature. You work hard for a longer period of time and then you become a data scientist. Even without a PhD.

Speaking of ‘mature’, we decided to contrast giant Fortune 500 companies with other firms (start-ups, or small to medium enterprises, and big non-F500 companies).

14. Company size and programming language

 

In terms of coding languages, F500 is lagging behind

Unsurprising.

data scientist 16

Fortune 500 firms rely heavily on established corporate languages such as SAS, and are lagging in the adoption of R and Python.

Most importantly, they don’t use SQL as much, since Hadoop proves more useful to them. And logically so. Big data is king there.

Nearing the end of this long analysis, you probably want to know not only if you can become a data scientist, but an F500 one, given your education so far.

15. Company size and university ranking

 

Remarkably, university ranking doesn’t make a difference when it comes to the data scientist’s employer

data scientist 17

Data scientists are needed everywhere. In F500 companies and in tech start-ups.

Looking at this graph, I am reassured that personal skills and self-preparation matter a great deal in the data science industry!

 

Finally, we want to make a note on how the data were gathered:

 

We conducted our own research on the topic. Our study involves 1,001 LinkedIn resumes of data scientists. The cohort was divided into two groups depending on whether the person was employed by a Fortune 500 company or not (roughly equal groups). This way we were able to compare F500 companies with non-F500 ones. Further, the sample involved data scientists working in the US (40%), UK (30%), India (15%), and other countries (15%). Thus, the data were collected from data scientists with various backgrounds to limit bias. The country quotas were chosen based on preliminary research on the most popular countries for data science where information is public.
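For a sense of scale, the country quotas translate into approximate group sizes as sketched below. The exact per-country counts are not reported above, and the rounding rule (largest remainder) and the `allocate` helper are assumptions made purely for illustration.

```python
def allocate(total, quotas_pct):
    """Split `total` items across integer percentage quotas (largest-remainder rounding)."""
    # Whole-number part of each quota, computed with integer arithmetic
    counts = {name: total * pct // 100 for name, pct in quotas_pct.items()}
    # Fractional part left over for each group, in hundredths of an item
    remainders = {name: total * pct % 100 for name, pct in quotas_pct.items()}
    leftover = total - sum(counts.values())
    # Hand the leftover items to the groups with the largest remainders
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        counts[name] += 1
    return counts


quotas = {"US": 40, "UK": 30, "India": 15, "Other": 15}
counts = allocate(1001, quotas)
print(counts)
```

Under these assumptions, the groups come to roughly 400, 300, 150 and 150 profiles, which is worth keeping in mind when comparing percentages across countries.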

 

What about you, dear reader?

Will you be a part of the cohort in future research on the data scientist job?
