Determinants of Scoring in the Australian PGA Championship
This analysis is based on scores and stats from individual rounds in the last ten Australian PGA Championships: 4,456 rounds in total.
Section 1: Absolute Correlation Coefficients with Score
Key Points:
- SGP (Putting) demonstrates the highest absolute correlation with Score in recent years, underscoring its critical importance at the Australian PGA Championship.
- SGApp (Approach) also shows notable correlations, indicating its consistent role in influencing scores.
- SGTee (Off the Tee) and SGATG (Around the Green) have relatively lower correlations, suggesting a lesser impact than putting and approach play.
Key Points:
- Greens in Regulation (GIR) consistently exhibits the strongest correlation with Score, emphasising its importance.
- Scrambling also shows strong correlations, particularly in earlier years, highlighting the role of recovery play.
- DrivingAccuracy plays a moderate role, whereas DrivingDistance shows relatively low correlations.
Key Points:
- Par4 consistently shows the highest correlation with Score, indicating its critical role in determining performance.
- Par3 correlations are moderate, reflecting their secondary but notable importance.
- Par5 correlations are relatively lower, suggesting that par-5 scoring does less to separate players at this event.
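As an illustration of how the absolute correlations in this section can be obtained, the sketch below computes the Pearson correlation between each metric and Score and then takes its absolute value. The rounds are made-up numbers for demonstration only, not data from this analysis, and the helper names `pearson` and `abs_correlations` are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def abs_correlations(rounds, target="Score"):
    """Return {metric: |r| with the target}, strongest first."""
    scores = [r[target] for r in rounds]
    metrics = [name for name in rounds[0] if name != target]
    result = {m: abs(pearson([r[m] for r in rounds], scores)) for m in metrics}
    return dict(sorted(result.items(), key=lambda kv: kv[1], reverse=True))

# Made-up rounds for illustration only (not taken from the analysis).
rounds = [
    {"Score": 68, "SGP": 2.1, "SGApp": 1.0, "SGTee": 0.3},
    {"Score": 70, "SGP": 0.8, "SGApp": 0.4, "SGTee": 0.6},
    {"Score": 72, "SGP": -0.5, "SGApp": -0.2, "SGTee": -0.9},
    {"Score": 74, "SGP": -1.9, "SGApp": -1.1, "SGTee": 0.2},
    {"Score": 71, "SGP": 0.1, "SGApp": 0.3, "SGTee": -0.1},
]

ranked = abs_correlations(rounds)
for metric, r in ranked.items():
    print(f"{metric}: {r:.2f}")
```

Taking the absolute value matters because strokes-gained metrics correlate negatively with Score (more strokes gained means a lower score), while the comparison of interest is the strength of the relationship rather than its sign.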
Section 2: Importance of Each Metric in Determining Score
Random Forest Regressor and Feature Importance
A Random Forest Regressor is an ensemble learning method that builds many decision trees during training and outputs the average of their predictions. Averaging across trees improves accuracy and robustness compared with any single tree.
Feature importance is a way to interpret a machine learning model: a score that quantifies how much each feature contributes to the model's predictions.
In a Random Forest, the importance of a feature is computed by looking at how much the feature decreases the impurity (e.g., variance for regression tasks) across all the trees in the forest. The more a feature decreases the impurity, the more important it is considered.
The calculated importance scores for all features are then normalized to give relative importance as a percentage. This shows the relative contribution of each feature to the prediction task.
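The impurity-based calculation described above can be sketched from scratch in a few dozen lines. This is a simplified illustration, not the pipeline used for this analysis (the source does not say which library or settings were used): it grows small bagged regression trees on synthetic data, credits each split's variance reduction to the feature that produced it, and normalises the result to percentages. All function names, parameters, and the data are assumptions made for the example.

```python
import random

def variance(ys):
    mean = sum(ys) / len(ys)
    return sum((v - mean) ** 2 for v in ys) / len(ys)

def best_split(X, y):
    """Return (variance reduction, feature index, threshold) of the best split."""
    n, parent = len(y), variance(y)
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue
            child = (len(left) * variance(left) + len(right) * variance(right)) / n
            if best is None or parent - child > best[0]:
                best = (parent - child, f, t)
    return best

def grow(X, y, credit, n_root, depth=0, max_depth=3, min_samples=6):
    """Split recursively, adding each split's weighted impurity decrease to credit."""
    if depth >= max_depth or len(y) < min_samples:
        return
    split = best_split(X, y)
    if split is None or split[0] <= 0:
        return
    gain, f, t = split
    credit[f] += gain * len(y) / n_root  # weight by share of samples in the node
    for goes_left in (True, False):
        keep = [i for i in range(len(y)) if (X[i][f] <= t) == goes_left]
        grow([X[i] for i in keep], [y[i] for i in keep],
             credit, n_root, depth + 1, max_depth, min_samples)

def forest_importance(X, y, n_trees=15, seed=0):
    """Average per-tree normalised importances, returned as percentages."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    totals = [0.0] * p
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        credit = [0.0] * p
        grow([X[i] for i in idx], [y[i] for i in idx], credit, n)
        tree_sum = sum(credit)
        if tree_sum > 0:
            totals = [a + c / tree_sum for a, c in zip(totals, credit)]
    return [100 * v / sum(totals) for v in totals]

# Synthetic data: the score depends strongly on metric_a, weakly on metric_b,
# and not at all on metric_c.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(80)]
y = [71 - 2.0 * a - 0.5 * b + rng.gauss(0, 0.3) for a, b, _ in X]

importance = forest_importance(X, y)
for name, pct in zip(["metric_a", "metric_b", "metric_c"], importance):
    print(f"{name}: {pct:.2f}%")
```

The sketch uses bootstrap sampling only and considers every feature at each split. A production library typically adds further options (such as random feature subsets and deeper trees), so its percentages will differ from those of this toy version.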
Interpreting Feature Importance
Features with high relative importance percentages have a strong impact on the model's predictions. They are crucial for accurate predictions and indicate key areas where performance matters most.
Features with low relative importance have a minimal impact on the model's predictions. While they can still contribute, they are less critical.
Key Points:
- SGP (Putting) has significantly higher importance (42.91%) than the DP World Tour average of 23.33%.
- SGApp (Approach) aligns closely with the DP World Tour average of 28.85%.
- SGTee (Tee) has lower importance (10.47%) compared to the average of 25.36%.
Key Points:
- Greens in Regulation (32.05%) is slightly more impactful than the DP World Tour average of 28.39%.
- Scrambling (28.69%) and PPGIR (putts per green in regulation, 28.76%) closely match their DP World Tour averages, highlighting consistency.
- DrivingAccuracy and DrivingDistance are less influential, with lower importances than their averages.
Key Points:
- Par4 importance (63.19%) aligns closely with the DP World Tour average of 64.77%.
- Par3 shows higher importance (18.61%) compared to the average of 15.36%.
- Par5 is slightly less impactful (18.21%) compared to the average of 19.87%.
Top 5 Ranked Players - 2024 Australian PGA Championship
The table below shows the five top-ranked players across the three Random Forest models above.
Surname | Firstname | Average Score
Herbert | Lucas | 69.38
Day | Jason | 69.59
Higgs | Harry | 70.05
Micheluzzi | David | 70.07
Parry | John | 70.10
Rankings and estimated scores for all players can be found here.
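One plausible way to produce such a ranking is to average each player's estimated score across the three models and sort from lowest to highest. The source does not confirm that "Average Score" was computed this way, so the sketch below is an assumption; the function name `top_players`, the player labels, and the numbers are hypothetical.

```python
from statistics import mean

def top_players(predictions, k=5):
    """predictions maps player -> list of estimated scores, one per model.
    Returns the k players with the lowest average estimate."""
    averages = {player: mean(scores) for player, scores in predictions.items()}
    return sorted(averages.items(), key=lambda kv: kv[1])[:k]

# Hypothetical estimates from three models (not the values behind the table).
predictions = {
    "Player A": [69.9, 70.2, 70.1],
    "Player B": [69.2, 69.5, 69.4],
    "Player C": [71.0, 70.6, 70.8],
}

top = top_players(predictions, k=2)
for player, avg in top:
    print(f"{player}: {avg:.2f}")
```

Lower is better in golf, so the sort is ascending.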