It’s hardly a surprise to anyone in tech and related industries that “data scientist” is the best job to have in the States. After all, sources like the Harvard Business Review and Glassdoor have reported as much for four years in a row. And even if we take the base salary of $117,000 out of the equation, the position is still plenty attractive on all other dimensions. For example, the current shift towards using machine learning to fast-track business growth secures a steady cross-industry need for skilled professionals capable of handling data and emerging technologies. To quantify the need: 151,717 data scientists would be warmly welcomed into the workforce, according to the LinkedIn Workforce Report. And let’s not forget on-the-job satisfaction levels – most data scientists are actually happy (their words, not ours).
But this article is not meant to give an overview of data science as a smart career decision.
The purpose of this report is to feed the “data scientist”, the collective entity, to the number-crunching algorithm and understand what makes a data scientist.
Reverse engineering the data scientist entails analyzing their skillset, employment history, the industry they work in, academic background, and formal qualifications. Knowing that, the aspiring data scientist can take informed professional steps towards securing the title.
When we at 365 Data Science first attempted to dismantle the data scientist in 2018, we revealed a rich professional profile. Twelve months have passed since our initial research and the replicated study suggests that the field is evolving and, with it, the typical professional evolves as well.
A note on methodology
The collective ‘data scientist’ profile was informed by a study of 1,001 professionals currently employed as data scientists. The data was collected from the participants’ LinkedIn profiles according to a series of prerequisites. Forty percent of the sample were employed at a Fortune 500 company at the time, whereas the remainder worked elsewhere; in addition, location quotas were introduced to limit bias: US (40%), UK (30%), India (15%) and other countries (15%). The selection was based on preliminary research on the most popular countries for data science, where information is public.
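To make the location quotas concrete, here is a minimal sketch that turns the stated percentages into target head counts for a sample of 1,001. The rounding scheme (largest remainder) and the function name `quota_counts` are our assumptions for illustration; the study does not say how the quotas were rounded.

```python
from math import floor


def quota_counts(sample_size, shares):
    """Split sample_size across groups using largest-remainder rounding.

    shares maps group name -> integer percentage; percentages must sum to 100.
    """
    if sum(shares.values()) != 100:
        raise ValueError("shares must sum to 100")
    exact = {group: sample_size * pct / 100 for group, pct in shares.items()}
    counts = {group: floor(value) for group, value in exact.items()}
    leftover = sample_size - sum(counts.values())
    # Hand the remaining seats to the groups with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda g: exact[g] - counts[g], reverse=True)
    for group in by_remainder[:leftover]:
        counts[group] += 1
    return counts


# Quota percentages as stated in the methodology note.
quotas = quota_counts(1001, {"US": 40, "UK": 30, "India": 15, "Other": 15})
print(quotas)
```

Under this assumed rounding, the single participant left over after flooring goes to the group with the largest fractional share.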
Is the typical data scientist of 2018 relevant in 2019?
At a glance, absolutely! The domain is still strongly dominated by men (69%), who can hold a conversation in at least two languages (not to be confused with programming languages, which, if included, would at least double this number). They have been in the workforce for 8 years, but only working as data scientists for 2.3 of them. They can proudly frame up a second-cycle academic degree (74% hold either a Master’s or a PhD), and do a lot more than program “Hello World” in at least Python or R (73%), often both.
Lucky for those of us who are female, or have not yet earned our Doctorates, the segmentation of the data tells a richer and truer story.
Does ‘data scientist’ imply Doctor of Philosophy?
Just as the field is not impenetrable to women, neither is a PhD a prerequisite for the position. In fact, less than a third of the data scientists in the cohort hold a Doctorate (28%). This is comparable to last year’s 27%, which suggests that the industry is not deliberately setting an unattainable bar of academic prowess.
On the other hand, if a Master’s degree is something into which the aspiring data scientist is willing to invest time and effort, it seems to be the gold standard for academic qualifications (46% of the sample hold a Master’s).
A trend does appear to be taking shape, however: the percentage of professionals in a ‘data scientist’ position who hold a second-cycle academic degree is set to decrease, evened out by data scientists entering the field with only a Bachelor’s. The data supports this speculation, as the share of data scientists with only a Bachelor’s degree rose by 4 percentage points compared to last year (19% in 2019 vs. 15% in 2018).
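A change between two shares can be read either in percentage points or as a relative change, and the two are easy to confuse. The sketch below computes both for the Bachelor’s figures quoted above; the helper name `share_change` is ours and purely illustrative.

```python
def share_change(old_pct, new_pct):
    """Return (percentage-point change, relative change in percent)."""
    points = new_pct - old_pct
    relative = points / old_pct * 100
    return points, relative


# Bachelor's-only share: 15% in 2018, 19% in 2019.
points, relative = share_change(15, 19)
print(f"{points} percentage points, {relative:.1f}% relative increase")
```

The same 4-point move is a relative increase of roughly a quarter, which is why it helps to state which of the two is meant.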
Finally, the fact that some people graduate from law school (one in 1,001) and still make it as data scientists leaves some wiggle room when it comes to the kind of academic degree an aspiring data scientist can get away with.
Level of education and work experience
From university, to an internship, to the final destination of a ‘data scientist’. This is the story of 8% of the data scientists in our cohort. For these professionals, all it took to land the best job in the USA was one internship and a Master’s degree (71%) or a Bachelor’s degree (18%).
For others, the path led straight out of academia (9%). As can be expected, the split here between data scientists with a Master’s degree (47%) and those with a PhD (44%) is fairly even, with other levels of education barely present at all.
So, apart from interning or coming straight from academia, are there other ways to open the door to a data science career?
Two men and a woman walk into a room: who will be the next data scientist?
- The academic researcher
- The IT specialist
- The intern
According to our data, all of these are gateway positions into data science with a comparable success rate: 9%, 9%, and 8%, respectively. While this might not be the revelation some of our readers are hoping for, these numbers begin to paint the picture of a profession that has many entry points.
Consider also the Data Analyst position (13%), the Consultant (6%), and of course the cluster of Other jobs (13%) the current data scientists may have held (comprising no fewer than 15 odd positions and titles).
If you are mathematically inclined, which we hope you are, because data science implies some analytical prowess, you will have noticed that a large percentage of the previous-position analysis is missing here. Indeed, 42% of the people who are currently data scientists in our cohort were already working as data scientists in their previous position. Therein lies the paradox: the best way to secure the title ‘data scientist’ is to already have it.
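For anyone who wants to verify the missing share, here is a quick sketch that adds up the previous-position percentages quoted above and takes the remainder. The dictionary simply restates the article’s figures; the variable names are ours.

```python
# Previous-position shares (in %) as quoted in the text.
previous_roles = {
    "Academic researcher": 9,
    "IT specialist": 9,
    "Intern": 8,
    "Data analyst": 13,
    "Consultant": 6,
    "Other": 13,
}

accounted_for = sum(previous_roles.values())
# Whatever is left were already data scientists in their previous position.
already_data_scientists = 100 - accounted_for
print(accounted_for, already_data_scientists)
```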
We leave it to the reader to speculate whether this is because data scientists like to move jobs (they know their value, too!), or because self-reported data is not always the best data.
Should I study Computer Science and Mathematics or can I learn Botany and still make it as a data scientist?
Alright. To become a medical practitioner, you would go to medical school; to become a lawyer – law school; police officer – a special academy, and so on. Data science schools are scarce as of the time of writing, if existent at all, so what do data scientists study?
As a matter of fact, a respectable chunk of our cohort studied data science and analysis, but first, a note on notation. Due to the massive number of unique degrees available for academic pursuit, we clustered them into seven areas of academic study.
- Economics and social sciences, which includes studies pertaining to economics, finance, business studies, politics, psychology, philosophy, history, and marketing and management
- Natural science, including physics, chemistry, and biology
- Statistics and mathematics, consisting of statistics, and mathematics-centred degrees
- Computer science, which excludes machine learning
- Data science and analysis, which includes machine learning
- Engineering
- Other, where you can find Art and Design, Atmospheric Science, and… others
That said, 12% of the data scientists in our research studied Data science and analysis. Although the field itself is new, there is a growing number of universities that offer specialized degrees to prepare you for a future in data science. Given what the typical data scientist profile reveals about the level of education received, it comes as no surprise that most of these are at a Master’s level.
However, Data science and analysis is not the most commonly pursued degree in our cohort.
Instead, that would be Computer science (22%). A substantial part of the data scientist’s toolbox is programming languages and number-crunching tools, and this degree is a natural fit, for lack of more widely accessible alternatives.
Surprisingly, the runner-up degree in this year’s study is not Statistics and Mathematics (a solid third with 16% prevalence), but Economics and Social Sciences (21%). Still, Engineering graduates make up another 9% of the cohort, which lends support to the idea that the holy trinity of degrees that best prepare you for processing and handling big data remains Computer Science, Statistics and Mathematics, and Engineering (collectively, close to half of the cohort).
On the other hand, the heavy prevalence of Economics and Social Sciences graduates in the sample (21%) might be welcome news for the less mathematically inclined aspiring data scientists! Almost as many current professionals come into the field from the Humanities side of the spectrum as from the Computer science field alone, and a further 11% come from a Natural Sciences background. Considering that data science is often programming with a statistics twist (or statistics with a programming twist), this result is quite interesting.
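For readers who want to check the arithmetic, here is a minimal sketch that tallies the field-of-study shares quoted in this section. Treating the leftover as the ‘Other’ cluster is our assumption, and the variable names are purely illustrative. Note that the rounded figures for the three technical fields add up to 47, slightly under half, which may simply reflect rounding in the reported percentages.

```python
# Shares (in %) quoted in the text for each area of study.
field_shares = {
    "Computer science": 22,
    "Economics and social sciences": 21,
    "Statistics and mathematics": 16,
    "Data science and analysis": 12,
    "Natural science": 11,
    "Engineering": 9,
}

# Assumption: whatever is left over belongs to the "Other" cluster.
other_share = 100 - sum(field_shares.values())

# Combined share of the three fields the text calls the "holy trinity".
technical_trio = ("Computer science", "Statistics and mathematics", "Engineering")
trio_share = sum(field_shares[field] for field in technical_trio)

# Fields ordered from most to least common.
ranked = sorted(field_shares, key=field_shares.get, reverse=True)
print(ranked[:3], trio_share, other_share)
```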
Ivy League or ‘Bye-bye, data science…’?
Data science is definitely not a private playing field for Ivy League graduates. Although a third of the professionals in our sample graduated from a Top 50 university (according to the Times Higher Education World University Ranking for 2019), the second-largest group consists of graduates of universities not even ranked by the Times (23%). A collective sigh of relief: the field is not gated for professionals with limited access to world-class higher education.
The data suggests that data science is a field in which you can be successful based on skill and merit alone, which is not entirely shocking. After all, analysing and processing data are practical, hands-on skills. The most enthusiastic of us could possibly learn to be exceptional with practice, curiosity, and resourcefulness.
Regarding the remaining clusters in the ranking, they are relatively equally populated. The reader should notice, however, that the clusters are of different sizes (e.g., *1-50 vs *301-500 or *501-1000). Nonetheless, this is all great news for the aspiring data scientist: the field is not only open to people from various backgrounds and levels of academic study, but also welcoming to professionals from universities ranking anywhere on the Times scale.
Self-preparation and online courses
It’s no secret that data scientists come from many different backgrounds. Unlike many other disciplines, data science does not yet have a strong educational infrastructure in place. This means that many people wanting to become successful in the field need to take on the responsibility of learning the skills themselves.
But how exactly can we measure who did this?
The most reliable way is to look at online certificates posted on their profiles. With a plethora of online platforms offering quality courses, and at prices comparable to a romantic dinner date, building a tailor-made set of skills has never been easier (or cheaper).
We found that 43% of the profiles we gathered listed at least one online course, with 3 certificates being the average.
Of course, some data scientists may well have taught themselves through different means, while others may not post all, or any, of the certificates they have received. If they don’t believe them to be relevant once they have gained more experience, why would they waste time or space on their digital resume? This is something worth keeping in mind given the platform where the data was collected.
So, what can we do with this information?
Firstly, we could see whether taking (or at least posting) online courses correlates with a person’s background. Is level of academic education a factor? It may not be a stretch to assume that individuals coming from lower-ranked universities would be taking more online courses. But maybe students from higher-ranked universities are more devoted to their education. So, let’s not assume at all, and instead look at the data.
Online courses and university ranking
There is, in fact, a very interesting result when comparing these two factors.
Before we review, though, remember that the university ranking clusters do vary in size, both in terms of participants in the study and university ranks. We also want to note that the *1000+ cluster only contains 7 participants, which is not enough data, so we won’t be discussing its results as valid.
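As a rough sketch of how such a comparison could be run, the snippet below computes the share of profiles listing at least one certificate per ranking cluster and withholds a figure for clusters that are too small. The cut-off of 30, the function name, and the toy records are all our assumptions for illustration; none of them come from the study’s data or code.

```python
from collections import defaultdict

MIN_CLUSTER_SIZE = 30  # assumed cut-off; the study only says 7 participants is too few


def certificate_rate_by_cluster(profiles, min_size=MIN_CLUSTER_SIZE):
    """Share of profiles listing at least one certificate, per ranking cluster.

    profiles: iterable of (cluster_label, n_certificates) pairs.
    Clusters with fewer than min_size profiles map to None.
    """
    totals = defaultdict(int)
    with_certificate = defaultdict(int)
    for cluster, n_certificates in profiles:
        totals[cluster] += 1
        if n_certificates > 0:
            with_certificate[cluster] += 1
    return {
        cluster: (
            with_certificate[cluster] / totals[cluster]
            if totals[cluster] >= min_size
            else None
        )
        for cluster in totals
    }


# Made-up toy records, not the study's data.
toy_profiles = [("1-50", 2), ("1-50", 0), ("1-50", 1), ("1-50", 0), ("1000+", 3)]
rates = certificate_rate_by_cluster(toy_profiles, min_size=2)
print(rates)
```

Returning `None` for an undersized cluster keeps it visible in the output without inviting anyone to read its percentage as a valid result.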
That said, we get some very interesting insights from the remaining data.
The first fascinating result is that universities in the four clusters making up the Top 500 show only marginally different results. This is not overly surprising, but it does mark a distinct difference from the results of last year’s study, where graduates of the Top 100 universities posted significantly fewer online courses.
Perhaps this goes to show that self-preparation is valuable and valued, even for students of prestigious universities.
The next interesting result is seen in the *501-1000 ranking cluster, which shows a dramatic increase in the number of certificates. Again, this wouldn’t be considered shocking: we are willing to speculate that the lower ranked the university you went to, the more you develop a desire to stand out with a high number of online certificates. What is surprising and introduces doubt into the validity of our hypothesis is the behavior of the next cluster (not ranked). There, the number of certificates drops right down to a percentage comparable to those of the Top 500.
Although it is hard to know the exact reason for this, it’s not hard to see the importance of self-preparation. And considering that the number has risen since last year, especially among graduates of top-tier universities, aspiring data scientists are realizing that too.
Gaining online certificates is one thing, but which skills are the most beneficial to learn? Which ones do employers look for? Well, of course, we’ve got the data to discover just that.
Country of employment and programming skills
Due to the country of employment quotas placed, we could not only look at aggregates, but also make reliable comparisons across countries. We have divided the sample into four regions: the US, UK, India, and others, placing the same sample weights as in last year’s research (see methodology above).
Before we go into geographic segmentation, we need some context which is best provided by the aggregate data.
The most prevalent programming language in the data science community is Python, followed by R. It is worth noting that R has experienced a 10% decrease in popularity compared to last year, which, given the versatility of Python, is not entirely surprising.
But let’s segment further.
Looking at region segmentation, the findings are not too far from the aggregates.
For the sake of simplicity, we’ve looked into the 4 most used languages: Python, R, SQL, and MATLAB. The most prominent difference from last year’s edition is that MATLAB has replaced Java as the fourth most used programming language. The leading trio remains unchanged in terms of contestants, but not in proportions. For a couple of years now, Python has been eating away at R.
In fact, that’s what we see here, too. Python is hands-down the most used data science language worldwide, with R on par with it only in the US and India.
What’s worth noting is that relational database usage seems flat across the globe with SQL being equally employed everywhere.
With India being the outsourcing haven of late, it may be worth taking a look at yet another segmentation: industry.
Country of employment and industry
We’ve divided our sample into 4 big clusters of industries: Industrial, Healthcare, Financial, and Tech. Healthcare is an insignificant part of the whole, so it makes sense to focus on the rest.
In fact, the data does not show great cross-country differences (see below), apart from the UK’s data science still being more ‘Financial’ than ‘Tech’, and India’s data science being less ‘Industrial’ than the rest. While the latter is far from surprising, what’s worth noting is that in our previous study, the Tech sector boasted 70% of the data scientist talent. In the past year, this seems to have changed drastically.
Finally, finance-related data science has grown dramatically in India and the rest of the world, catching up to other industries.
For the US-based data scientists, we can say that the tech and industrial clusters are acquiring most of the talent.
Country of employment and academic degree
One of the most interesting segmentations for us was that by academic degree. Subconsciously, we expect PhDs to dominate the field. Yet last year the data begged to differ, with only a third of the sample being a ‘Doctor of Philosophy’. ‘Just’ a Master’s degree seemed sufficient in more than half of the cases.
This year, we see little difference. For the UK, India and the Others, we’ve got no change. A Bachelor’s degree is enough, but a Master’s is preferable.
What does catch the eye, though, is the increase in PhDs in the US. Pair that with the fact that the Tech industry has always had a bigger share in America, and you have it: highly educated, tech-oriented data scientists probably working harder than ever to deliver the futuristic data science we’re all looking forward to.
If you are shy of a PhD and are on our list, chances are you are from India. But how does that translate when it comes to actual work experience?
Country of employment and work experience
Which location leads to the fastest career progression?
To get that insight, it is worth looking at the experience the data ‘wizards’ had before achieving their dream of becoming a data scientist.
In the 2018 edition, we saw a tremendous gap between countries: more than half of the data scientists in the USA had more than 5 years of experience.
This year, nothing like it.
They may have quit their jobs… or become managers. After all, being a data scientist can also be frustrating. But the ‘youngsters’ are making their way in, and even more so in India and the UK. It seems that there are more job openings, and even without prior experience you have a 20-30% chance of getting that place!
Company size and programming language
While Python is still more widely used in non-F500 companies, in almost every other category, F500 companies are moving hand-in-hand with non-F500 companies. This is great news for anyone new to data science who prefers the technology of the day: R and Python. There’s been a lot of catching up done compared to last year’s results…
Company size and university ranking
Surprisingly, university ranking doesn’t make a difference when it comes to the data scientist’s employer.
Quoting ourselves from last year: ‘Data scientists are needed everywhere. From F500 to a tech start-up!’
This also reinforces our belief that personal skills and self-preparation are much more prominent factors in becoming a successful data scientist, and employers know that.
Hopefully, this article does not make you doubt whether the data scientist profession is something you could realistically pursue. Instead, we hope to have lent a reassuring hand. One of the main messages we extracted from our study, both last year and this year, is that if you have the skill base that makes a data scientist, you can be a data scientist. It will be interesting to see how the data science profession changes in the next 2-5 years, but right now, a universal data scientist profile appears to be taking shape. A unique programming language toolbox desired across industries and locations; preferably a Master’s degree, or a Bachelor’s and proof of practical abilities; and a confident learning-on-the-go attitude: these are the currencies of the field.
One final note: we are aiming to create a complete and useful profile of the data scientist as it develops through the years. If anyone has suggestions or comments, feedback and ideas for things we can do better or examine more closely, please let us know! Open conversation is crucial for helping the aspiring data scientist make informed career decisions.
Good luck with your data science journey and thank you for reading!
Here’s our study from last year. Maybe you can spot a trend we missed?