Determinants of Scoring in the Open Championship

This analysis is based on scores and stats from individual rounds in the last 10 Open Championships: 4,673 rounds in total.

Section 1: Absolute Correlation Coefficients with Score

Absolute Correlation between Score and Traditional Metrics

This graph presents the absolute correlation coefficients between the 18-hole score and traditional golf metrics, including Driving Distance, Driving Accuracy, Greens in Regulation, Scrambling, and PPGIR, for all available years.

Interpretation and Discussion:

Absolute Correlation between Score and Par Metrics

This graph shows the absolute correlation coefficients between the 18-hole score and Par metrics (Par3, Par4, and Par5) from 2013 to 2023.

Interpretation and Discussion:

Section 2: Partial Dependence Plots against Score

Partial dependence plots (PDPs) are a tool used in machine learning and statistical modeling to illustrate the relationship between a target variable and one or more features (e.g. SGApp, SGATG, DrivingDistance, GreensInRegulation). They show the marginal effect of a feature on a model's predicted outcome, which makes them useful for understanding how individual features influence the target variable and for interpreting the model.

For Score, PDPs visualize how the predicted score changes as each feature varies, with the effect of the other features averaged out. This gives insight into which features are most influential and in which direction they move the score.
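The computation behind a partial dependence curve can be sketched in a few lines. For each value on a grid, the chosen feature is overwritten with that value in every round, the model predicts a score for each modified round, and the predictions are averaged. The model below is a hypothetical stand-in (a simple linear formula with made-up coefficients), not one of the fitted models from this analysis, and the rows are invented.

```python
from statistics import mean

def toy_model(row):
    """Hypothetical stand-in for a fitted model: predicts an 18-hole score."""
    return 80.0 - 0.12 * row["GreensInRegulation"] - 0.01 * row["DrivingDistance"]

def partial_dependence(model, rows, feature, grid):
    """For each grid value, set `feature` to it in every row and average the predictions."""
    curve = []
    for value in grid:
        predictions = [model({**row, feature: value}) for row in rows]
        curve.append((value, mean(predictions)))
    return curve

# Invented rounds, used only to supply realistic values for the other feature.
rows = [
    {"GreensInRegulation": 50.0, "DrivingDistance": 290.0},
    {"GreensInRegulation": 61.1, "DrivingDistance": 300.0},
    {"GreensInRegulation": 72.2, "DrivingDistance": 310.0},
]

pdp = partial_dependence(toy_model, rows, "GreensInRegulation", [40.0, 60.0, 80.0])
for value, avg_score in pdp:
    print(f"GIR {value:.0f}% -> average predicted score {avg_score:.2f}")
```

Plotting the grid values against the averaged predictions gives the PDP for that feature. With a real model the curve is usually not a straight line, which is where the plot becomes informative.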

Partial Dependence Plots for Traditional Metrics

This figure presents the partial dependence plots for traditional golf metrics, including Driving Distance, Driving Accuracy, Greens in Regulation, Scrambling, and PPGIR.

Partial Dependence Plots for Par Metrics

This figure shows the partial dependence plots for Par metrics, including Par3, Par4, and Par5.

Section 3: Importance of Each Metric in Determining Score

Random Forest Regressor and Feature Importance

Random Forest Regressor is an ensemble learning method that constructs multiple decision trees during training and outputs the average prediction. It combines the predictions of several models to improve accuracy and robustness.

Feature importance is a technique used to interpret a machine learning model. It refers to the score that quantifies the contribution of each feature to the prediction made by the model.

In a Random Forest, the importance of a feature is computed by looking at how much the feature decreases the impurity (e.g., variance for regression tasks) across all the trees in the forest. The more a feature decreases the impurity, the more important it is considered.

The calculated importance scores for all features are then normalized to give relative importance as a percentage. This shows the relative contribution of each feature to the prediction task.
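To make the impurity idea concrete, the sketch below measures how much a single split on each feature reduces the variance of Score, then normalizes those reductions to percentages. It is a deliberately simplified illustration on invented data: a real Random Forest accumulates such reductions over every split in every tree, weighted by how many rounds reach each split. If the models were fitted with scikit-learn (an assumption; the text does not name a library), the fitted RandomForestRegressor exposes the normalized values through its feature_importances_ attribute.

```python
from statistics import pvariance

def variance_reduction(xs, ys, threshold):
    """Drop in the variance of ys when rows are split at xs <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty removes no impurity
    weighted = (len(left) * pvariance(left) + len(right) * pvariance(right)) / len(ys)
    return pvariance(ys) - weighted

def best_reduction(xs, ys):
    """Largest variance reduction over all candidate thresholds for one feature."""
    return max(variance_reduction(xs, ys, t) for t in sorted(set(xs)))

# Invented rounds for illustration only.
scores = [68, 69, 70, 71, 73, 75]
features = {
    "GreensInRegulation": [77.8, 72.2, 66.7, 61.1, 55.6, 44.4],
    "DrivingDistance": [305.0, 296.0, 298.0, 310.0, 290.0, 301.0],
}

# Raw impurity decrease per feature, then normalized so the values sum to 100%.
raw = {name: best_reduction(xs, scores) for name, xs in features.items()}
total = sum(raw.values())
importance_pct = {name: 100.0 * value / total for name, value in raw.items()}

for name, pct in sorted(importance_pct.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {pct:.1f}%")
```

Because the percentages are relative, a feature's share depends on which other features are in the model; adding or removing a feature changes every share.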

Interpreting Feature Importance

Features with high relative importance percentages have a strong impact on the model's predictions. They are crucial for accurate predictions and indicate key areas where performance matters most.

Features with low relative importance have a minimal impact on the model's predictions. While they can still contribute, they are less critical.

Using Random Forest Regressor, the relative importance of each factor on Score is quantified as follows:

Feature Importance of Driving Metrics on Score

The findings indicate that low scores in the Open Championship depend mainly on the ability to hit greens in regulation, recover effectively (scrambling), and putt well (PPGIR). Driving distance and accuracy do play a role, but they are less critical than short game skills and putting.

At The Open Championship, played on links courses with challenging conditions, these factors become even more pronounced. Links courses typically feature deep bunkers, undulating greens, and unpredictable weather, making scrambling and putting paramount. Players who can navigate these challenges by hitting greens, recovering effectively when they miss, and putting well are more likely to succeed.

For comparison, here is the relative importance of each factor across all DP World Tour events over the last 10 years:

Using Random Forest Regressor, the relative importance of each factor on Score is quantified as follows:

Feature Importance of Par3, Par4, and Par5 on Score

The importance of par 4 performance highlighted in the analysis aligns with the typical course setup of The Open, where par 4s are the most common hole type. Additionally, the ability to take advantage of scoring opportunities on par 5s and to avoid significant errors on par 3s is important in this event.

For comparison, here is the relative importance of each factor across all DP World Tour events over the last 10 years:

Top 5 Ranked Players - 2024 Open Championship

The table below shows the top-5 ranked players and their average estimated scores from the three different Random Forest models above.

Player              Score
Scottie Scheffler   68.42
Joaquin Niemann     68.97
Ludvig Aberg        69.14
Xander Schauffele   69.14
Jon Rahm            69.18

Estimated scores for all players can be found here.