Determinants of Scoring at Le Golf National

This analysis is based on scores and stats from individual rounds in the last 10 DP World Tour events at Le Golf National: 6,113 rounds in total.

Section 1: Absolute Correlation Coefficients with Score

Absolute Correlation between Score and SG Metrics

The graph "Absolute Correlation between Score and SG Metrics" shows the variation in the absolute value of the correlation coefficients between the Score and various SG (Strokes Gained) metrics (SGTee, SGApp, SGATG, and SGP) by year.

Absolute Correlation between Score and Traditional Metrics

The graph "Absolute Correlation between Score and Traditional Metrics" illustrates how the absolute value of the correlation coefficients between the Score and traditional metrics (DrivingDistance, DrivingAccuracy, GreensInRegulation, Scrambling, and PPGIR) varies by year.

Absolute Correlation between Score and Par Metrics

The graph "Absolute Correlation between Score and Par Metrics" shows the variation in the absolute value of the correlation coefficients between the Score and Par metrics (Par3, Par4, and Par5) by year.
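The per-year coefficients behind the three graphs in this section could be computed along the following lines. This is a minimal sketch, not the code used for the analysis: the `Year` field name, the one-dict-per-round layout, and the sample values are assumptions made for illustration. The absolute value is taken because some metrics move with Score and others against it (a higher SG value goes with a lower score), so only the magnitude is comparable across metrics.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (both must vary)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def abs_corr_by_year(rounds, metric, target="Score"):
    """Return {year: |corr(metric, target)|} for a list of round dicts."""
    by_year = defaultdict(list)
    for r in rounds:
        by_year[r["Year"]].append(r)
    return {
        year: abs(pearson([r[metric] for r in rows], [r[target] for r in rows]))
        for year, rows in sorted(by_year.items())
    }


# Made-up rounds, only to show the data shape (not real tournament data).
rounds = [
    {"Year": 2022, "Score": 70, "SGApp": 1.0},
    {"Year": 2022, "Score": 72, "SGApp": -1.0},
    {"Year": 2022, "Score": 71, "SGApp": 0.0},
    {"Year": 2023, "Score": 69, "SGApp": 1.5},
    {"Year": 2023, "Score": 73, "SGApp": -0.5},
    {"Year": 2023, "Score": 72, "SGApp": -1.0},
]
result = abs_corr_by_year(rounds, "SGApp")
```

Calling `abs_corr_by_year` once per metric (SGTee, DrivingDistance, Par4, and so on) gives one line per metric, with one point per year.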

Section 2: Partial Dependence Plots against Score

Partial dependence plots (PDPs) are a tool used in machine learning and statistical modeling to illustrate the relationship between a target variable and one or more features (e.g. SGApp, SGATG, DrivingDistance, GreensInRegulation). They show the marginal effect of a feature on the model's predicted outcome, which makes it easier to interpret how an individual feature drives the target variable.

For Score, a PDP shows how the predicted score changes as one feature is varied while the model's predictions are averaged over the observed values of the other features. This indicates which features are most influential and in which direction they move the score.
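A one-feature partial dependence curve can be computed by hand for any model that maps a row of features to a predicted score. The sketch below is illustrative only: `toy_model` is a hypothetical stand-in for a fitted model, and the rows are made-up values, not data from the analysis. For each grid value, the feature of interest is overwritten in every row and the predictions are averaged.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value  # overwrite only the feature of interest
            total += predict(modified)
        curve.append((value, total / len(rows)))
    return curve


# Hypothetical stand-in for a fitted model: any callable row -> predicted score.
def toy_model(row):
    return 72.0 - 0.9 * row["SGApp"] - 0.5 * row["SGP"]


# Made-up rounds for illustration only.
rows = [
    {"SGApp": 0.4, "SGP": 1.0},
    {"SGApp": -1.2, "SGP": 0.0},
    {"SGApp": 0.8, "SGP": -1.0},
]
pdp_sgapp = partial_dependence(toy_model, rows, "SGApp", [-1.0, 0.0, 1.0])
```

Plotting the grid values against the averaged predictions in `pdp_sgapp` gives the PDP for SGApp; a downward slope means a higher SGApp lowers the predicted score.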

Partial Dependence Plots for SG Metrics

The partial dependence plots for SG metrics (SGTee, SGApp, SGATG, and SGP) against Score provide insights into how changes in these metrics impact the overall score.

Partial Dependence Plots for Traditional Metrics

The partial dependence plots for traditional metrics (DrivingDistance, DrivingAccuracy, GreensInRegulation, Scrambling, and PPGIR) against Score provide valuable insights into their impact on overall performance.

Partial Dependence Plots for Par Metrics

The partial dependence plots for par metrics (Par3, Par4, and Par5) against Score illustrate their influence on overall scoring.

Section 3: Importance of Each Metric in Determining Score

Random Forest Regressor and Feature Importance

Random Forest Regressor is an ensemble learning method that constructs multiple decision trees during training and outputs the average of their predictions. Averaging many trees improves accuracy and robustness compared with a single tree.

Feature importance is a technique used to interpret a machine learning model. It refers to the score that quantifies the contribution of each feature to the prediction made by the model.

In a Random Forest, the importance of a feature is computed by looking at how much the feature decreases the impurity (e.g., variance for regression tasks) across all the trees in the forest. The more a feature decreases the impurity, the more important it is considered.

The calculated importance scores for all features are then normalized to give relative importance as a percentage. This shows the relative contribution of each feature to the prediction task.
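The impurity decrease and the normalization step can be illustrated on a much smaller scale. The sketch below is a simplification under stated assumptions, not the analysis code: it measures the variance reduction from a single best split per feature, whereas a real forest accumulates such decreases over every split in every tree. All values are made up. Libraries such as scikit-learn expose a forest-wide version of this quantity (for example `RandomForestRegressor.feature_importances_`); the source does not state which library was used.

```python
def variance(values):
    """Population variance; the impurity measure for regression trees."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_variance_reduction(xs, ys):
    """Largest impurity decrease achievable by one threshold split on xs."""
    parent = variance(ys)
    best = 0.0
    # Skip the largest value: splitting there would leave the right side empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        weighted_child = (
            len(left) * variance(left) + len(right) * variance(right)
        ) / len(ys)
        best = max(best, parent - weighted_child)
    return best


def relative_importance(raw):
    """Normalize raw importances so that they sum to 100 percent."""
    total = sum(raw.values())
    return {name: 100.0 * value / total for name, value in raw.items()}


# Made-up values for illustration only.
scores = [68, 69, 71, 72, 74, 75]
features = {
    "SGApp": [2.0, 1.5, 0.5, -0.5, -1.5, -2.0],  # tracks score closely
    "SGTee": [0.5, -0.5, 0.5, -0.5, 0.5, -0.5],  # nearly unrelated to score
}
raw = {name: best_variance_reduction(xs, scores) for name, xs in features.items()}
importance_pct = relative_importance(raw)
```

In this toy data the feature that separates low scores from high scores receives almost all of the relative importance, which is the behavior the percentages in the bar charts summarize.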

Interpreting Feature Importance

Features with high relative importance percentages have a strong impact on the model's predictions. They are crucial for accurate predictions and indicate key areas where performance matters most.

Features with low relative importance have a minimal impact on the model's predictions. While they can still contribute, they are less critical.

Section 3(a): Relative Importance of SG Metrics on Score

The bar chart titled "Relative Importance of SG Metrics on Score" shows the following relative importances:

The analysis shows that approach shots (SGApp) are the most critical factor in determining the score at Le Golf National, followed by putting (SGP). Tee shots (SGTee) and around-the-green play (SGATG) are less influential but still important.

When comparing these findings with the PGA Tour averages:

Section 3(b): Relative Importance of Traditional Metrics on Score

The bar chart titled "Relative Importance of Traditional Metrics on Score" shows the following relative importances:

The analysis reveals that hitting greens in regulation (GreensInRegulation) is the most critical factor in determining the score at Le Golf National, followed closely by scrambling and putts per green in regulation (PPGIR). Driving distance and accuracy are less influential.

When comparing these findings with the PGA Tour averages:

Section 3(c): Relative Importance of Par Metrics on Score

The bar chart titled "Relative Importance of Par Metrics on Score" shows the following relative importances:

The analysis reveals that performance on par 4 holes is the most critical factor in determining the score at Le Golf National, followed by par 5 and par 3 holes.

When comparing these findings with the PGA Tour averages:

These results indicate that the relative importance of performance on par 3, par 4, and par 5 holes at Le Golf National aligns closely with the averages observed on the PGA Tour.

Top 5 Ranked Players - 2024 Olympic Men's Golf Championship

The table below shows the five top-ranked players, with each player's estimated score averaged across the three Random Forest models above (SG, traditional, and par metrics).

Player               Average Estimated Score
Xander Schauffele    68.66
Scottie Scheffler    68.68
Ludvig Aberg         68.90
Rory McIlroy         69.44
Jon Rahm             69.55

Estimated scores for all players can be found here.
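The averaging and ranking behind a table of this kind can be sketched as follows. The model labels, player names, and predicted scores here are hypothetical placeholders, not output from the models in this analysis; the function name `rank_by_average` is likewise an assumption for illustration.

```python
def rank_by_average(predictions_by_model):
    """Average each player's predicted score across models; lowest score first."""
    models = list(predictions_by_model.values())
    # Keep only players that every model produced a prediction for.
    players = set.intersection(*(set(preds) for preds in models))
    averages = {
        player: sum(preds[player] for preds in models) / len(models)
        for player in players
    }
    return sorted(averages.items(), key=lambda item: item[1])


# Hypothetical predictions from three models (placeholder values).
predictions_by_model = {
    "sg_model": {"Player A": 69.0, "Player B": 70.5, "Player C": 68.4},
    "traditional_model": {"Player A": 69.6, "Player B": 70.1, "Player C": 69.0},
    "par_model": {"Player A": 69.3, "Player B": 70.0, "Player C": 68.7},
}
ranking = rank_by_average(predictions_by_model)
top_player, top_score = ranking[0]
```

Because lower scores are better in golf, the list is sorted in ascending order and the first entry is the top-ranked player.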