The initial analysis of the dataset provides insights into the key determinants of the Score variable. Here is a detailed breakdown of the analysis:

Descriptive Statistics

The dataset contains 3,171 records with the following key variables:

- Score: Ranges from 61 to 82, with a mean of approximately 69.63.
- DrivingDistance: Ranges from 0 to 357.8 yards, with a mean of approximately 297.32 yards. The 0-yard minimum likely reflects missing or placeholder entries rather than actual drives.
- DrivingAccuracy: Averages around 68.55%, with a standard deviation of 14.36%.
- PuttingAverage: Averages around 1.75, with a standard deviation of 0.15.
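
A summary like the one above can be produced with pandas. The sketch below is a minimal illustration, not the original analysis code: the real dataset is not reproduced here, so it builds a small synthetic DataFrame whose column names follow the variables listed above and whose values are made up.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): values are illustrative only and
# will not match the statistics quoted in the text.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Score": rng.normal(69.6, 1.5, n),
    "DrivingDistance": rng.normal(297.0, 10.0, n),
    "DrivingAccuracy": rng.normal(68.5, 14.4, n),
    "PuttingAverage": rng.normal(1.75, 0.15, n),
})

# Count, range, mean, and standard deviation, one row per variable.
summary = df.agg(["count", "min", "max", "mean", "std"]).T
print(summary.round(2))

# A distance of exactly 0 is suspicious, so count such rows before modeling.
n_zero_distance = int((df["DrivingDistance"] == 0).sum())
print("rows with DrivingDistance == 0:", n_zero_distance)
```

On the real data, the zero-distance count would show how many records drive the reported minimum of 0 yards.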

Correlation Analysis

The correlation matrix indicates how different variables relate to the Score. Key correlations with Score include:

- PuttingAverage (PA): High negative correlation (-0.64).
- SGP (Strokes Gained Putting): Moderate negative correlation (-0.58).
- Par4 (Average Par 4 Score): Moderate positive correlation (0.45).
- GIR (Greens in Regulation): Moderate negative correlation (-0.40).
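
As a rough sketch of how such a ranking can be computed, the snippet below takes the Pearson correlation of each variable with Score and sorts by absolute value. The data are synthetic and the coefficients used to generate them are arbitrary assumptions, so the printed numbers will not reproduce the correlations reported above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): the generating coefficients are
# arbitrary and exist only to give the example some structure.
rng = np.random.default_rng(1)
n = 300
gir = rng.normal(65.0, 5.0, n)
par4 = rng.normal(4.05, 0.08, n)
accuracy = rng.normal(68.5, 14.4, n)
score = 70.0 - 0.15 * (gir - 65.0) + 8.0 * (par4 - 4.05) + rng.normal(0.0, 1.0, n)
df = pd.DataFrame({
    "Score": score,
    "GIR": gir,
    "Par4": par4,
    "DrivingAccuracy": accuracy,
})

# Pearson correlation matrix, then the Score column without its self-correlation.
corr = df.corr()
score_corr = corr["Score"].drop("Score")

# Rank variables by the strength of the relationship, keeping the sign.
order = score_corr.abs().sort_values(ascending=False).index
ranked = score_corr.reindex(order)
print(ranked.round(2))
```

Sorting on the absolute value keeps strong negative and strong positive relationships side by side, which matches how the list above is ordered.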

Key Determinants Analysis

To determine the top factors influencing the Score, we need to consider:

- Interactions between top determinants
- Non-linear relationships using polynomials
- Categorical intervals for continuous variables
- Log transformations to handle skewed distributions
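
The four ideas in this list each correspond to a derived column. The following is a minimal sketch under stated assumptions: the data are synthetic, the derived column names (`PA_x_GIR`, `PA_sq`, `DD_bin`, `log_DD`) are hypothetical, and quartile bins and `log1p` are illustrative choices rather than the ones used in the original analysis.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption); values are illustrative only.
rng = np.random.default_rng(2)
n = 200
df = pd.DataFrame({
    "DrivingDistance": rng.normal(297.0, 10.0, n),
    "PuttingAverage": rng.normal(1.75, 0.15, n),
    "GIR": rng.normal(65.0, 5.0, n),
})

# Interaction: product of two candidate determinants.
df["PA_x_GIR"] = df["PuttingAverage"] * df["GIR"]

# Non-linear term: square of the centered variable, which reduces
# collinearity with the linear term.
pa_centered = df["PuttingAverage"] - df["PuttingAverage"].mean()
df["PA_sq"] = pa_centered ** 2

# Categorical intervals: quartile bins with roughly equal counts.
df["DD_bin"] = pd.qcut(df["DrivingDistance"], q=4, labels=["Q1", "Q2", "Q3", "Q4"])

# Log transform: log1p stays finite even when a value is 0.
df["log_DD"] = np.log1p(df["DrivingDistance"].clip(lower=0))

print(df.head())
print(df["DD_bin"].value_counts().sort_index())
```

`pd.cut` with explicit edges would be the alternative when fixed, interpretable intervals matter more than balanced bin sizes.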

Visual Analysis

Let's visualize these relationships to better understand the trends and interactions.

Scatter Plots and Polynomial Fits

We will create scatter plots with polynomial fits for key variables.

Pair Plot for Key Variables

We will create a pair plot for key variables to visualize interactions.

Binning Continuous Variables

We will bin variables such as DrivingDistance and PuttingAverage into categorical intervals.

Log Transformations

We will apply log transformations to skewed variables and visualize their distributions.

Visualizations

Let's create these visualizations.
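
One possible way to draw the first two plot types is sketched below with matplotlib and pandas. It is an illustration on synthetic data, not the original plotting code: the degree-2 fit, the choice of GIR as the x-variable, and the generating coefficients are all assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Synthetic stand-in data (assumption); coefficients are arbitrary.
rng = np.random.default_rng(3)
n = 200
gir = rng.normal(65.0, 5.0, n)
par4 = rng.normal(4.05, 0.08, n)
score = 70.0 - 0.1 * (gir - 65.0) + 6.0 * (par4 - 4.05) + rng.normal(0.0, 1.0, n)
df = pd.DataFrame({"Score": score, "GIR": gir, "Par4": par4})

# Scatter plot of Score against GIR with a degree-2 polynomial fit.
coeffs = np.polyfit(df["GIR"], df["Score"], deg=2)
x_grid = np.linspace(df["GIR"].min(), df["GIR"].max(), 100)
fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(df["GIR"], df["Score"], s=10, alpha=0.5)
ax.plot(x_grid, np.polyval(coeffs, x_grid), color="red")
ax.set_xlabel("GIR")
ax.set_ylabel("Score")
ax.set_title("Score vs. GIR with quadratic fit")

# Pair plot: every variable against every other, histograms on the diagonal.
axes = scatter_matrix(df, figsize=(6, 6), diagonal="hist")
```

Calling `fig.savefig(...)` or, with an interactive backend, `plt.show()` displays the results. Histograms of the binned and log-transformed columns can be added the same way with `Series.plot(kind="hist")`.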