Determinants of Scoring in the Andalucia Masters
This analysis is based on scores and statistics from individual rounds in the last seven editions of the Andalucia Masters: 2,716 rounds in total. Note: the event was played at Valderrama until 2022 and at Sotogrande in 2023.
Section 1: Absolute Correlation Coefficients with Score
Key Points
- SGApp (Strokes Gained: Approach) was consistently critical at both venues, but had slightly less impact in 2023.
- SGTee (Strokes Gained: Off the Tee) showed increased importance at Sotogrande, highlighting the value of strong tee shots in 2023.
- SGP (Strokes Gained: Putting) had a stronger correlation with score at Sotogrande in 2023 than at Valderrama.
Key Points
- DrivingAccuracy mattered much more at Sotogrande, with higher correlations than at Valderrama.
- GreensInRegulation, while still important, had slightly less influence in 2023.
- Scrambling saw an increased correlation in 2023, emphasising recovery skills at Sotogrande.
Key Points
- Par4 performance remained the most important par metric at both venues, and was slightly more impactful at Sotogrande.
- Par5 performance gained importance in 2023, reflecting the course design at Sotogrande.
- Par3 metrics showed more variability, but no significant change in influence between the two venues.
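As a minimal sketch of how an absolute correlation coefficient with score might be computed, the snippet below uses the Pearson coefficient. This is an assumption: the analysis does not state which correlation measure was used. The function name `abs_pearson` and the sample values are hypothetical and are not tournament data.

```python
from math import sqrt


def abs_pearson(xs, ys):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return abs(cov / sqrt(var_x * var_y))


# Illustrative values only (hypothetical, not data from the event).
sg_app = [1.2, -0.5, 0.3, 2.0, -1.1]   # strokes gained on approach per round
score = [69, 73, 71, 67, 74]           # round score
print(round(abs_pearson(sg_app, score), 3))
```

Taking the absolute value puts metrics where higher is better (such as strokes gained, which correlate negatively with score) on the same scale as metrics that correlate positively, so their strengths can be compared directly.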
Section 2: Importance of Each Metric in Determining Score
Random Forest Regressor and Feature Importance
A Random Forest Regressor is an ensemble learning method that builds many decision trees during training and outputs the average of their predictions. Averaging over many trees improves accuracy and robustness compared with a single tree.
Feature importance is a technique for interpreting a machine learning model. It assigns each feature a score that quantifies its contribution to the model's predictions.
In a Random Forest, the importance of a feature is computed by looking at how much the feature decreases the impurity (e.g., variance for regression tasks) across all the trees in the forest. The more a feature decreases the impurity, the more important it is considered.
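To make the idea of impurity decrease concrete, here is a minimal sketch of the variance reduction achieved by a single split on one feature. In a real forest this quantity would be accumulated over every split that uses the feature, across all trees; the function names and the threshold-based split are illustrative assumptions, not the implementation used for this analysis.

```python
def variance(values):
    """Population variance, the impurity measure for regression."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def impurity_decrease(feature, target, threshold):
    """Variance reduction from splitting the samples at feature <= threshold."""
    left = [t for f, t in zip(feature, target) if f <= threshold]
    right = [t for f, t in zip(feature, target) if f > threshold]
    if not left or not right:
        return 0.0  # the split separates nothing, so impurity is unchanged
    n = len(target)
    # Child impurities are weighted by the share of samples in each child.
    weighted_children = (len(left) / n) * variance(left) + (len(right) / n) * variance(right)
    return variance(target) - weighted_children


# Hypothetical rounds: a feature value and the resulting score.
feature = [1, 2, 3, 4]
target = [70, 70, 74, 74]
print(impurity_decrease(feature, target, threshold=2))
```

A split that separates low scores from high scores cleanly removes most of the variance, which is why a feature that enables such splits accumulates a high importance.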
The calculated importance scores for all features are then normalized to give relative importance as a percentage. This shows the relative contribution of each feature to the prediction task.
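The normalization step can be sketched as follows, assuming the raw importance scores are held in a dictionary keyed by feature name. The helper name `to_percentages` and the raw values are hypothetical.

```python
def to_percentages(raw_importances):
    """Rescale raw importance scores so that they sum to 100."""
    total = sum(raw_importances.values())
    if total <= 0:
        raise ValueError("importance scores must sum to a positive value")
    return {name: 100.0 * value / total for name, value in raw_importances.items()}


# Hypothetical raw scores, for illustration only.
raw = {"SGApp": 3.0, "SGTee": 1.0}
print(to_percentages(raw))
```

Because the percentages are relative to the features included in a given model, they are comparable within one model but not directly across models built on different feature sets.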
Interpreting Feature Importance
Features with high relative importance percentages have a strong impact on the model's predictions. They are crucial for accurate predictions and indicate key areas where performance matters most.
Features with low relative importance have a minimal impact on the model's predictions. While they can still contribute, they are less critical.
Key Points
- SGTee was less impactful at the Andalucía Masters than the DP World Tour average.
- SGApp had consistent importance, reflecting the challenging nature of the approach shots.
- SGP (Putting) was more/less impactful, indicating the course’s specific green conditions.
Relative Importance of Traditional Metrics on Score
Key Points
- DrivingDistance had more/less impact compared to the average, reflecting the course layout.
- GreensInRegulation was still important but slightly more/less than expected.
- Scrambling remained a vital factor, reflecting the course's difficulty and need for recovery shots.
Relative Importance of Par Metrics on Score
Key Points
- Par4 holes were still the key determinant, similar to the trend across the DP World Tour.
- Par5 performance was more/less impactful, likely due to the course layout.
- Par3 performance had slightly higher/lower importance, reflecting the specific hole designs.
Top 5 Ranked Players - 2024 Andalucia Masters
The table below shows the five top-ranked players, ordered by their average predicted score across the three Random Forest models above.
Rank | Surname | First name | Average Predicted Score
1 | Rahm | Jon | 70.00
2 | Soderberg | Sebastian | 70.45
3 | Mckibbin | Tom | 70.56
4 | Wallace | Matt | 70.63
5 | Puig | David | 70.65
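A ranking of this kind can be sketched by averaging each player's predicted score across the models and sorting in ascending order, since a lower score is better in golf. The function name, the data layout and the player labels below are hypothetical, and the numbers are not the model outputs behind the table.

```python
def rank_by_average(predictions, top_n=5):
    """Rank players by mean predicted score across models (lowest first)."""
    averages = {
        player: sum(scores) / len(scores)
        for player, scores in predictions.items()
    }
    ranked = sorted(averages.items(), key=lambda item: item[1])
    return [
        (rank, player, round(avg, 2))
        for rank, (player, avg) in enumerate(ranked[:top_n], start=1)
    ]


# Hypothetical predictions from three models per player.
predictions = {
    "Player A": [70.0, 71.0, 72.0],
    "Player B": [69.0, 70.0, 71.0],
    "Player C": [72.0, 72.0, 72.0],
}
for row in rank_by_average(predictions, top_n=2):
    print(row)
```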
Rankings and estimated scores for all players can be found here.