In a Random Forest, a feature's importance is measured by how much the splits on that feature decrease node impurity (for regression tasks, the variance of the target). Each decrease is weighted by the number of samples reaching that split, and the result is averaged across all the trees in the forest. The more a feature decreases impurity, the more important it is considered.
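The impurity-based calculation (often called mean decrease in impurity) can be sketched in plain Python. This is a minimal illustration, not the exact code behind the results below: the split-record layout, the feature names, and the toy numbers are all hypothetical, and libraries differ in details such as whether each tree's scores are normalized before averaging.

```python
from collections import defaultdict

# Hypothetical split record: (feature, n_node, impurity_node,
#                             n_left, impurity_left, n_right, impurity_right)
# Impurity here is the variance of the target within the node.


def tree_importance(splits, n_total):
    """Sum the sample-weighted impurity decrease contributed by each feature in one tree."""
    scores = defaultdict(float)
    for feature, n, imp, n_l, imp_l, n_r, imp_r in splits:
        # Parent impurity minus size-weighted child impurities,
        # scaled by the share of training samples that reach this node.
        scores[feature] += (n * imp - n_l * imp_l - n_r * imp_r) / n_total
    return dict(scores)


def forest_importance(trees):
    """Average per-tree importances over the forest; trees is a list of (splits, n_total)."""
    totals = defaultdict(float)
    for splits, n_total in trees:
        for feature, score in tree_importance(splits, n_total).items():
            totals[feature] += score
    # Divide by the number of trees, so trees that never split on a feature count as zero.
    return {feature: total / len(trees) for feature, total in totals.items()}


# Toy forest of two trees (hypothetical numbers, 10 training samples each).
toy_forest = [
    ([("GIR", 10, 4.0, 6, 1.0, 4, 2.0),
      ("Scrambling", 6, 1.0, 3, 0.5, 3, 0.3)], 10),
    ([("Scrambling", 10, 4.0, 5, 2.0, 5, 1.0)], 10),
]
toy_importance = forest_importance(toy_forest)
print(toy_importance)  # GIR ≈ 1.3, Scrambling ≈ 1.43
```

In this toy case Scrambling edges out GIR because it drives a large root split in the second tree, even though GIR has the single biggest split in the first.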
The raw importance scores for all features are then normalized so that they sum to 100%, giving each feature's relative contribution to the prediction task.
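The normalization step amounts to dividing each raw score by the total and scaling by 100. The sketch below assumes non-negative raw scores, as impurity-based importances are; the feature names and values in the example are made up.

```python
def to_percentages(raw_scores):
    """Rescale non-negative importance scores so they sum to 100."""
    total = sum(raw_scores.values())
    if total == 0:
        # No split reduced impurity, so every feature gets zero importance.
        return {feature: 0.0 for feature in raw_scores}
    return {feature: 100.0 * score / total for feature, score in raw_scores.items()}


# Made-up raw scores for two hypothetical features.
print(to_percentages({"feature_a": 3.0, "feature_b": 1.0}))
# {'feature_a': 75.0, 'feature_b': 25.0}
```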
Features with high relative importance have a strong impact on the model's predictions. The model relies on them most heavily, and they point to the areas of the game where performance matters most for score.
Features with low relative importance have a minimal impact on the model's predictions. While they can still contribute, they are less critical.
Using a Random Forest Regressor, the importance of the traditional metrics was quantified; the results are displayed below:
Key Insights:
- Greens in Regulation is the most important factor at 31.23%.
- Scrambling and PPGIR (putts per green in regulation) also contribute significantly to score, at around 27% each.
- Driving Accuracy is the least important, at only 5.12%.
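A workflow like this one can be sketched with scikit-learn's `RandomForestRegressor`, assuming that is the implementation the name refers to. The data below is synthetic and randomly generated purely to make the example runnable; it does not reproduce the percentages above, and the target formula and `n_estimators=200` are illustrative choices, not the author's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: 200 hypothetical players, four traditional metrics.
rng = np.random.default_rng(0)
feature_names = ["Greens in Regulation", "Scrambling", "PPGIR", "Driving Accuracy"]
X = rng.normal(size=(200, 4))

# Synthetic scoring average: more GIR and scrambling lower the score,
# more putts per GIR raise it, driving accuracy barely matters.
y = (70.0 - 1.0 * X[:, 0] - 0.8 * X[:, 1] + 0.8 * X[:, 2] - 0.1 * X[:, 3]
     + rng.normal(scale=0.3, size=200))

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y)

# feature_importances_ is impurity-based and sums to 1; scale to percentages.
importances = 100.0 * model.feature_importances_
ranked = sorted(zip(feature_names, importances), key=lambda pair: pair[1], reverse=True)
for name, pct in ranked:
    print(f"{name}: {pct:.2f}%")
```

Fixing `random_state` keeps the bootstrap samples and feature subsets reproducible, so repeated runs report the same percentages.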
A Random Forest Regressor was also applied to determine the importance of the Par metrics. The results are displayed below:
Key Insights:
- Par 4 holes have the highest impact on score (65.23%).
- Par 3 and Par 5 holes have relatively lower but still notable importance.
The table below shows the top-5 ranked players across the two different Random Forest models above.
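The text does not specify how players are ranked. One plausible reading, assumed here, is that each model's predicted scoring average is used to rank players, with lower being better. The sketch below takes the top five from each model and their overlap; the player names and predictions are placeholders, not values from the table.

```python
def top_n(predicted_scores, n=5):
    """Return the n players with the lowest predicted scoring average, best first."""
    return sorted(predicted_scores, key=predicted_scores.get)[:n]


# Placeholder predictions from the two models (hypothetical players and values).
traditional_preds = {"Player A": 69.1, "Player B": 69.8, "Player C": 70.2,
                     "Player D": 69.5, "Player E": 70.9, "Player F": 70.4}
par_preds = {"Player A": 69.3, "Player B": 70.0, "Player C": 69.6,
             "Player D": 70.5, "Player E": 70.1, "Player F": 69.9}

top_traditional = top_n(traditional_preds)
top_par = top_n(par_preds)
print("Traditional model:", top_traditional)
print("Par model:", top_par)
print("In both top 5:", sorted(set(top_traditional) & set(top_par)))
```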