I'd like to know whether I have understood this correctly.
The algorithm builds the decision tree as follows: it starts with the data as they are split in the dataset; then it takes one feature at a time and searches for a threshold. It goes from the minimum value of feature_1 up to the maximum value of feature_1, splits the data into the samples below the threshold and the samples above it, and calculates the Gini impurity of that split each time. It does the same for the other features. It then shows in the root node the feature and threshold that minimize the Gini impurity. The same process is repeated for the other nodes. Am I correct? Thank you.
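To make the procedure described above concrete, here is a minimal sketch of the search for a single node. It is an illustration under stated assumptions, not the code of any particular library: the names `gini` and `best_split` are hypothetical, the candidate thresholds are assumed to be the midpoints between consecutive sorted values of a feature, and the score of a split is assumed to be the average of the two children's Gini impurities weighted by their sample counts.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a 1-D array of class labels."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Return (feature_index, threshold, weighted_gini) for the best split.

    Hypothetical helper: tries every feature and every candidate threshold.
    """
    n_samples, n_features = X.shape
    best = (None, None, np.inf)
    for j in range(n_features):
        values = np.unique(X[:, j])  # sorted unique values of feature j
        # Assumption: candidates are midpoints between consecutive values.
        thresholds = (values[:-1] + values[1:]) / 2.0
        for t in thresholds:
            left = y[X[:, j] <= t]
            right = y[X[:, j] > t]
            # Impurity of the split: children weighted by their size.
            score = (len(left) * gini(left) + len(right) * gini(right)) / n_samples
            if score < best[2]:
                best = (j, t, score)
    return best

# Tiny made-up dataset: feature 0 separates the classes, feature 1 does not.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 10.0], [4.0, 20.0]])
y = np.array([0, 0, 1, 1])
feature, threshold, score = best_split(X, y)
print(feature, threshold, score)
```

To grow the full tree under the same assumptions, `best_split` would be called again on the rows that fall on each side of the chosen threshold, until a node is pure or a stopping rule is reached.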