Determinants of Scoring in the Nedbank Golf Challenge

This analysis is based on scores and statistics from individual rounds in the nine most recent editions of the Nedbank Golf Challenge: 1,990 rounds in total.

Section 1: Absolute Correlation Coefficients with Score

Absolute Correlation between Score and SG Metrics

1. The metric 'SGApp' had the highest average absolute correlation with Score across the years, which suggests it is the SG metric most strongly associated with scoring at the Nedbank Golf Challenge.

2. The metric 'SGATG' had the lowest average absolute correlation, suggesting a comparatively weaker association with Score.

3. Across the 9 years of data, the correlation for 'SGApp' fluctuated notably from year to year.

Absolute Correlation between Score and Traditional Metrics

1. The metric 'GreensInRegulation' had the highest average absolute correlation with Score across the years, which suggests it is the traditional metric most strongly associated with scoring at the Nedbank Golf Challenge.

2. The metric 'DrivingDistance' had the lowest average absolute correlation, suggesting a comparatively weaker association with Score.

3. Across the 9 years of data, the correlation for 'GreensInRegulation' fluctuated notably from year to year.

Absolute Correlation between Score and Par Metrics

1. The metric 'Par4' had the highest average absolute correlation with Score across the years, which suggests it is the par metric most strongly associated with scoring at the Nedbank Golf Challenge.

2. The metric 'Par3' had the lowest average absolute correlation, suggesting a comparatively weaker association with Score.

3. Across the 9 years of data, the correlation for 'Par4' fluctuated notably from year to year.
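
As a rough illustration of the quantity reported in this section, the sketch below computes the Pearson correlation between one metric and Score separately for each year, takes the absolute value, and averages the results across years. It assumes this per-year procedure, which the report does not spell out. The `rounds` records, the `year` key, the helper names, and the sample values are all hypothetical.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(sxx * syy)


def mean_abs_correlation(rounds, metric, target="Score"):
    """Average of |correlation(metric, target)| computed year by year."""
    by_year = defaultdict(list)
    for row in rounds:
        by_year[row["year"]].append(row)
    per_year = {
        year: abs(pearson([r[metric] for r in rows], [r[target] for r in rows]))
        for year, rows in by_year.items()
    }
    return sum(per_year.values()) / len(per_year), per_year


# Hypothetical rounds, NOT taken from the tournament data.
rounds = [
    {"year": 2022, "SGApp": 1.0, "Score": 68},
    {"year": 2022, "SGApp": 0.0, "Score": 70},
    {"year": 2022, "SGApp": -1.0, "Score": 72},
    {"year": 2023, "SGApp": 2.0, "Score": 70},
    {"year": 2023, "SGApp": 0.0, "Score": 69},
    {"year": 2023, "SGApp": -2.0, "Score": 71},
    {"year": 2023, "SGApp": 0.0, "Score": 70},
]

average, per_year = mean_abs_correlation(rounds, "SGApp")
print(per_year)
print(average)
```

Taking the absolute value before averaging means a metric that correlates negatively with Score (better play, lower score) is still ranked by the strength of its relationship rather than its sign.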

Section 2: Importance of Each Metric in Determining Score

Random Forest Regressor and Feature Importance

A Random Forest Regressor is an ensemble learning method that builds many decision trees during training and outputs the average of their predictions. Averaging over many trees improves accuracy and robustness compared with relying on a single tree.

Feature importance is a technique used to interpret a machine learning model. It refers to the score that quantifies the contribution of each feature to the prediction made by the model.

In a Random Forest, the importance of a feature is computed by looking at how much the feature decreases the impurity (e.g., variance for regression tasks) across all the trees in the forest. The more a feature decreases the impurity, the more important it is considered.
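
To make the idea of an impurity decrease concrete, the sketch below measures how much one candidate split reduces the variance of Score. It covers a single split on a single feature, not a full forest, and the function name, thresholds, and sample values are hypothetical.

```python
from statistics import pvariance


def variance_reduction(scores, feature_values, threshold):
    """Drop in score variance when rounds are split at feature <= threshold."""
    left = [s for s, f in zip(scores, feature_values) if f <= threshold]
    right = [s for s, f in zip(scores, feature_values) if f > threshold]
    if not left or not right:
        return 0.0  # the split separates nothing, so impurity is unchanged
    n = len(scores)
    # Child variances are weighted by the share of rounds in each branch.
    child_variance = (
        len(left) / n * pvariance(left) + len(right) / n * pvariance(right)
    )
    return pvariance(scores) - child_variance


# Hypothetical rounds, NOT taken from the tournament data.
scores = [68.0, 69.0, 71.0, 72.0]
sg_app = [1.5, 0.8, -0.6, -1.2]
driving_distance = [300.0, 290.0, 305.0, 295.0]

gain_sg_app = variance_reduction(scores, sg_app, threshold=0.0)
gain_distance = variance_reduction(scores, driving_distance, threshold=297.5)
print(gain_sg_app, gain_distance)  # 2.25 0.25
```

In this toy data, splitting on the approach metric separates low scores from high scores almost perfectly, so it removes far more variance than the split on driving distance. A forest accumulates such decreases over every split in every tree.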

The calculated importance scores for all features are then normalized to give relative importance as a percentage. This shows the relative contribution of each feature to the prediction task.
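
The normalization step can be sketched as follows: each raw importance is divided by the total and expressed as a percentage, so the results sum to 100. The raw values and the function name are hypothetical and are not the figures reported for this event.

```python
def relative_importance(raw_importance):
    """Convert raw importance scores into percentages that sum to 100."""
    total = sum(raw_importance.values())
    if total <= 0:
        raise ValueError("total importance must be positive")
    return {name: 100.0 * value / total for name, value in raw_importance.items()}


# Hypothetical accumulated impurity decreases, NOT the report's numbers.
raw = {"SGTee": 0.5, "SGApp": 1.0, "SGATG": 0.25, "SGP": 0.75}

percentages = relative_importance(raw)
for name, pct in sorted(percentages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<6} {pct:5.2f}%")
```

Because the percentages are relative, a rise in one metric's share necessarily comes at the expense of the others within the same model.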

Interpreting Feature Importance

Features with high relative importance percentages have a strong impact on the model's predictions. They are crucial for accurate predictions and indicate key areas where performance matters most.

Features with low relative importance have a minimal impact on the model's predictions. While they can still contribute, they are less critical.

Relative Importance of SG Metrics on Score

1. The metric 'SGApp' had a calculated importance of 37.67%, which is higher than the DP World Tour average of 28.85%.

2. 'SGTee' had an importance of 17.60%, below its DP World Tour average of 25.36%.

3. The overall importance of 'SGATG' and 'SGP' demonstrated noteworthy deviations from their averages, highlighting event-specific trends.

Relative Importance of Traditional Metrics on Score

1. 'GreensInRegulation' had an importance of 26.25%, close to its DP World Tour average of 28.39%.

2. 'DrivingDistance' had an importance of 8.54%, close to its DP World Tour average of 9.56%.

3. 'Scrambling' and 'PPGIR' emerged as key differentiators, reflecting their relative importance in specific scenarios.

Relative Importance of Par Metrics on Score

1. 'Par4' remained the dominant metric with an importance of 55.82%, although this is below the DP World Tour average of 64.77%.

2. 'Par3' had an importance of 14.25%, slightly lower than its DP World Tour average of 15.36%.

3. 'Par5' demonstrated a significant role, with its influence reflecting specific characteristics of the Nedbank Golf Challenge.
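
The comparisons with the DP World Tour averages quoted in this section can be expressed as differences in percentage points. The snippet below uses only the figures stated above; the variable names are illustrative.

```python
# Importance at the Nedbank Golf Challenge (%), as quoted in this section.
event = {
    "SGApp": 37.67,
    "SGTee": 17.60,
    "GreensInRegulation": 26.25,
    "DrivingDistance": 8.54,
    "Par4": 55.82,
    "Par3": 14.25,
}

# DP World Tour average importance (%), as quoted in this section.
tour_average = {
    "SGApp": 28.85,
    "SGTee": 25.36,
    "GreensInRegulation": 28.39,
    "DrivingDistance": 9.56,
    "Par4": 64.77,
    "Par3": 15.36,
}

# Positive values mean the metric matters more at this event than on tour.
deltas = {name: round(event[name] - tour_average[name], 2) for name in event}

for name, delta in sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:<20}{delta:+.2f}")
```

The largest gaps are for 'Par4' (-8.95), 'SGApp' (+8.82), and 'SGTee' (-7.76); the remaining metrics sit within about two percentage points of their tour averages.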

Top 5 Ranked Players - 2024 Nedbank Golf Challenge

The table below shows the top five players by estimated score, averaged across the three Random Forest models above (SG, traditional, and par metrics).

Surname        First name    Average Score
Soderberg      Sebastian     68.15
Lawrence       Thriston      68.77
Bezuidenhout   Christiaan    68.87
Conners        Corey         69.07
Wiesberger     Bernd         69.10
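
One way such a ranking could be produced is sketched below. It assumes that "Average Score" is the mean of the scores estimated by the three models and that a lower average ranks higher. The player labels, predictions, and function name are hypothetical.

```python
def rank_by_average(predictions, top_n=5):
    """Rank players by mean estimated score across models (lowest first)."""
    averages = {
        player: sum(scores) / len(scores) for player, scores in predictions.items()
    }
    return sorted(averages.items(), key=lambda item: item[1])[:top_n]


# Hypothetical estimates from three models (SG, traditional, par), in that order.
predictions = {
    "Player A": [68.0, 69.0, 68.5],
    "Player B": [70.0, 70.5, 71.0],
    "Player C": [69.0, 69.0, 69.0],
    "Player D": [71.0, 72.0, 73.0],
    "Player E": [69.5, 70.0, 70.5],
    "Player F": [70.0, 71.0, 72.0],
}

top_five = rank_by_average(predictions)
for player, average in top_five:
    print(f"{player:<10}{average:.2f}")
```

Averaging across the three models gives each feature group equal weight in the final ranking; a weighted average would be an alternative if one model were known to predict better than the others.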

Rankings and estimated scores for all players can be found here.