In our daily lives, we make decisions all the time. We choose what to cook for dinner among several dishes, how to get to work, where to go on holiday, who to ask for help, or even when to go to bed. More often than not, you won’t sit down to draft and compare the options you’ve got. You simply make up your mind and go for your choice without much analysis.
In some cases, however, the problems, circumstances, and consequences are too complex for you to run the possible outcomes through your head and haphazardly pick one, especially if you’re an aspiring data scientist dabbling in machine learning for the first time. At first, all the choices you have to make while training your model can feel overwhelming.
Thankfully, decision trees let you lay out your options as easily interpretable outcomes and pick the best possible solution. Moreover, in your future career working with data, you’ll often be given tasks, such as predicting your company’s growth, that a tree-based algorithm can handle quickly.
What Is a Decision Tree?
Before it became a major part of machine learning, this approach was a way of modeling how humans reason through decisions. Nowadays, decision tree analysis is a supervised learning technique used for both classification and regression.
The ultimate goal is to create a model that predicts a target variable by using a tree-like pattern of decisions. Essentially, decision trees mimic human thinking, which makes them easy to understand.
What Is the Structure of a Decision Tree?
A tree consists of 2 major components:
- Decision node – the point where you make a decision
- Leaf node – the output of said decision; it does not contain any further branches
The algorithm starts from the first decision node, known as the root node, which represents the entire dataset. The dataset is then split into 2 or more subsets that are as homogeneous as possible. Each decision node tests one of the dataset’s features, the branches denote the decision rules, and each leaf node signifies an outcome.
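To make this structure concrete, here is a minimal Python sketch, assuming a binary tree in which every decision node tests a single feature with a yes/no rule. The class names `DecisionNode` and `Leaf`, the `predict` helper, and the weather features in the usage example are hypothetical illustrations, not part of any library.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Union


@dataclass
class Leaf:
    """Leaf node: holds an outcome and has no further branches."""
    outcome: str


@dataclass
class DecisionNode:
    """Decision node: tests one feature; each branch follows a decision rule."""
    feature: str
    rule: Callable[[Any], bool]  # True -> follow the "yes" branch
    yes: Union[DecisionNode, Leaf]
    no: Union[DecisionNode, Leaf]


Node = Union[DecisionNode, Leaf]


def predict(node: Node, sample: dict[str, Any]) -> str:
    """Start at the root and follow the branches until a leaf is reached."""
    while isinstance(node, DecisionNode):
        node = node.yes if node.rule(sample[node.feature]) else node.no
    return node.outcome


# Hypothetical tree: the root tests "outlook"; one child tests "humidity".
root = DecisionNode(
    feature="outlook",
    rule=lambda v: v == "sunny",
    yes=DecisionNode("humidity", lambda v: v > 70, yes=Leaf("stay in"), no=Leaf("go out")),
    no=Leaf("go out"),
)

print(predict(root, {"outlook": "sunny", "humidity": 85}))  # stay in
print(predict(root, {"outlook": "rainy", "humidity": 85}))  # go out
```

Every prediction is a single walk from the root down to one leaf, which is why the path behind any individual decision is easy to read off.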
Decision Trees: A Practical Example
Suppose that you receive a job offer for a data analyst position and now you’re wondering whether to accept or reject it.
To solve the problem, you construct a decision tree:
First, start with the root node, which in this case is the salary range. If the salary is not what you’re looking for, decline the offer. If it is within your expectations, however, move on to the next feature: the distance between the office and your home. If the office isn’t within your desired proximity, you reject the offer. If the answer is “Yes”, on the other hand, go to the next branch, which considers the “possibility to work remotely”. Once again, you have 2 outcomes: to decline or accept.
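The same tree can be sketched as a chain of nested conditions, one per decision node. The thresholds `min_salary` and `max_commute_km` are hypothetical values chosen for illustration, and the example assumes that being able to work remotely leads to accepting the offer, since the text only says the last node has 2 outcomes.

```python
def job_offer_decision(salary, commute_km, can_work_remotely,
                       min_salary=50_000, max_commute_km=20):
    """Walk the job-offer tree from the root node down to a leaf.

    min_salary and max_commute_km are hypothetical thresholds.
    """
    # Root node: is the salary within your expectations?
    if salary < min_salary:
        return "decline"  # leaf
    # Next decision node: is the office close enough to home?
    if commute_km > max_commute_km:
        return "decline"  # leaf
    # Last decision node: can you work remotely? (assumed: yes -> accept)
    return "accept" if can_work_remotely else "decline"


print(job_offer_decision(45_000, 5, True))   # decline
print(job_offer_decision(60_000, 35, True))  # decline
print(job_offer_decision(60_000, 5, True))   # accept
print(job_offer_decision(60_000, 5, False))  # decline
```

Here the order of the checks is fixed by hand. A learning algorithm instead has to work out from data which feature to test first.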
This simple example shows you the mechanics of a decision tree in a nutshell. But how can we decide which feature to use first and how to continue building the model?
To answer this, we need to dig into a concept that appears throughout machine learning: entropy, which measures how mixed the classes in a node are and is closely related to loss functions such as cross-entropy. If you’re curious to learn more, you can read our dedicated tutorial on the cross-entropy loss function.
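As a rough illustration, the Python sketch below computes Shannon entropy, H = -Σ p·log2(p) over the class proportions in a node, and the information gain of a candidate split. Information gain is the parent’s entropy minus the size-weighted entropy of its children. A greedy tree builder picks the feature whose split yields the highest gain. This is the ID3-style criterion; other implementations use Gini impurity instead. The function names and toy labels are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Toy node with 5 "yes" and 5 "no" labels: maximally mixed, entropy = 1 bit.
parent = ["yes"] * 5 + ["no"] * 5
split_a = [["yes"] * 5, ["no"] * 5]  # perfectly separates the classes
split_b = [["yes", "yes", "yes", "no", "no"],
           ["yes", "yes", "no", "no", "no"]]  # barely helps

print(entropy(parent))                               # 1.0
print(information_gain(parent, split_a))             # 1.0
print(round(information_gain(parent, split_b), 3))   # 0.029
```

Split A leaves two pure nodes, so it removes all of the uncertainty. A greedy builder would therefore choose the feature behind split A over the one behind split B.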
What Are the Advantages of Decision Trees?
As a budding data professional, you’ll have plenty of responsibility in your future position, so it’s important to know which techniques are most beneficial to you. Decision trees offer several advantages that can help you improve your skills and advance in your data science journey:
- Decision trees are easy to understand. Because of their structure, which follows the natural flow of human thought, most people will have little trouble interpreting them. In addition, visualizing the model is effortless and allows you to see exactly what decisions are being made.
- There is little to no need for data preprocessing. Compared with many other algorithms, decision trees take less time to prepare, as they require less coding and analysis and, in many implementations, no dummy variables. The reason is that each split looks at one feature at a time and depends only on how that feature’s values are ordered, not on their scale.
- They are versatile when it comes to data. You don’t need to standardize the collected data, and you can feed both numerical and categorical data into the model, as it can work with features of both types.
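The two points above can be sketched with a hypothetical `best_split` helper. It scores candidate splits with Gini impurity (another common impurity measure), using a threshold for numeric features and an equality test for categorical ones, so neither scaling nor dummy variables are needed. This is a from-scratch illustration. Whether a particular library accepts raw categorical values depends on its implementation. The data and helper names are assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2); 0.0 means the node is pure."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def goes_left(value, op, t):
    """Apply a split rule: threshold for numbers, equality for categories."""
    return value <= t if op == "<=" else value == t


def best_split(values, labels):
    """Return (weighted impurity, operator, threshold or category) for one feature."""
    n = len(labels)
    if all(isinstance(v, (int, float)) for v in values):
        distinct = sorted(set(values))
        # Numeric feature: midpoints between consecutive distinct values.
        candidates = [("<=", (a + b) / 2) for a, b in zip(distinct, distinct[1:])]
    else:
        # Categorical feature: "is it this category or not?"
        candidates = [("==", c) for c in sorted(set(values))]

    best = None
    for op, t in candidates:
        left = [y for v, y in zip(values, labels) if goes_left(v, op, t)]
        right = [y for v, y in zip(values, labels) if not goes_left(v, op, t)]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or score < best[0]:
            best = (score, op, t)
    return best


# Hypothetical data: years of experience (numeric) and degree (categorical).
experience = [1, 2, 3, 6, 8, 10]
degree = ["BSc", "BSc", "MSc", "MSc", "PhD", "MSc"]
hired = ["no", "no", "no", "yes", "yes", "yes"]

print(best_split(experience, hired))  # (0.0, '<=', 4.5)
print(best_split(degree, hired))

# Rescaling (years -> months) moves the threshold but keeps the same partition.
months = [12 * x for x in experience]
print(best_split(months, hired))  # (0.0, '<=', 54.0)
```

Because a threshold split only compares values, any positive rescaling of a numeric feature produces the same grouping of samples. This is why standardization is not required.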
All of these make decision trees ideal for communicating with business stakeholders as they’ll be able to follow along without any specialized knowledge required.
What Are the Disadvantages of Decision Trees?
Of course, where there are benefits, there are also limitations. This is true even for an intuitive analysis method such as a decision tree. Some of the disadvantages include:
- There is a tendency to overfit. The model can fit the training data so closely, noise included, that it performs poorly on new, unseen data. You can prevent this either by stopping the tree’s growth before it has a chance to overfit (pre-pruning) or by letting it grow fully and then pruning it back afterward (post-pruning).
- Calculations can be more costly. Growing a decision tree can take more time to compute and consume more memory. This is not ideal, as you will sometimes have to work with substantial amounts of data under strict deadlines, where efficiency is of the essence.
- Decision trees can be unstable. A small change in the training data can lead to significant changes, perhaps even a completely different tree with contradictory results. In addition, the model can produce biased trees if some classes dominate the rest.
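The overfitting point and its first remedy can be seen in a small from-scratch builder. This Python sketch grows a binary tree on numeric features using Gini impurity and applies pre-pruning through `max_depth` and `min_samples_split`. These parameter names are borrowed from scikit-learn’s tree estimators, but the code is not scikit-learn. Post-pruning is not shown. The toy data and helper names are hypothetical.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, depth=0, max_depth=3, min_samples_split=2):
    """Grow a binary tree on numeric features, with pre-pruning.

    A leaf is returned when the node is pure, max_depth is reached,
    or fewer than min_samples_split samples remain.
    """
    if len(set(labels)) == 1 or depth >= max_depth or len(labels) < min_samples_split:
        return {"leaf": majority(labels)}

    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)

    if best is None:  # every feature is constant, so no split is possible
        return {"leaf": majority(labels)}

    _, f, t = best
    left_idx = [i for i, r in enumerate(rows) if r[f] <= t]
    right_idx = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left_idx], [labels[i] for i in left_idx],
                           depth + 1, max_depth, min_samples_split),
        "right": build_tree([rows[i] for i in right_idx], [labels[i] for i in right_idx],
                            depth + 1, max_depth, min_samples_split),
    }


def predict(tree, row):
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


def depth_of(tree):
    if "leaf" in tree:
        return 0
    return 1 + max(depth_of(tree["left"]), depth_of(tree["right"]))


# Hypothetical noisy data: one numeric feature whose label flips back and forth.
X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = ["a", "b", "a", "b", "a", "b", "a", "b"]

full = build_tree(X, y, max_depth=10)    # grows until every leaf is pure
shallow = build_tree(X, y, max_depth=1)  # stopped early (pre-pruning)

print(depth_of(full), depth_of(shallow))
print(all(predict(full, r) == t for r, t in zip(X, y)))  # True
```

The fully grown tree reproduces every training label, noise included, at the cost of many splits. The depth-limited tree gives up some training accuracy in exchange for a simpler model that is less likely to memorize noise.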
Don’t be discouraged, however, as these disadvantages can be mitigated with the right techniques. You just have to be conscious of how you approach them and prepare appropriately.
Decision Trees: Next Steps
Many organizations use decision tree analysis to make informed decisions before taking their next steps. As you begin your journey and rise through the ranks in the field of data, you’re highly likely to encounter this technique. What’s more, gaining as many skills as possible, such as working with decision trees, is a great way to boost your career outlook and gain a competitive advantage.
Are you ready for the next step toward a career in data science?
The 365 Data Science Program offers self-paced courses led by renowned industry experts. Starting from the very basics all the way to advanced specialization, you will learn by doing with a myriad of practical exercises and real-world business cases. If you want to see how the training works, start with our free lessons.