Beksl
Sell All The Youngsters
I'll have some spare time in the coming weeks, so I've decided to do a side project evaluating how informative Squad Cost (and other variables) really are in predicting final league position.
First I need to construct a data set for the analysis. Obviously the target variable is league position, but what I need from you AMers is suggestions on which other attributes/variables to include. Squad cost and wage bill are the obvious ones, but I need as many attributes/variables as possible (the data must be available, mind you). I was thinking something along the lines of number of passes, clean sheets, possession %, goals scored etc. (data from the 2017/18 season). Maybe even some more advanced stats like xG. I'm open to suggestions.
Then I'll evaluate all the attributes using information theory, computing measures like Entropy, Gini Index, Information Gain, Gain Ratio, Gini Decrease etc. This will tell me which of the attributes is the most informative, i.e. which one tells us the most about the target variable (league position).
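To make that step a bit more concrete, here is a minimal sketch of how Entropy, Gini Index, Information Gain and Gain Ratio could be computed by hand in plain Python. It assumes the attribute has already been binned into categories (e.g. squad cost as "high"/"mid"/"low") and that league position has been grouped into bands. The function names, the bands and the six-club toy data are all made up for illustration, not real figures.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini index (impurity) of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def information_gain(attribute_values, labels):
    """Drop in entropy of the labels after splitting on the attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels inside each attribute group.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(attribute_values, labels):
    """Information gain normalised by the entropy of the attribute itself."""
    split_info = entropy(attribute_values)
    if split_info == 0:
        return 0.0  # attribute has a single value, so it cannot split anything
    return information_gain(attribute_values, labels) / split_info


# Hypothetical toy data: six clubs, NOT real figures.
squad_cost_band = ["high", "high", "mid", "mid", "low", "low"]
finish_band = ["top", "top", "top", "bottom", "bottom", "bottom"]

print(entropy(finish_band))                            # 1.0 bit (two equally likely bands)
print(gini(finish_band))                               # 0.5
print(information_gain(squad_cost_band, finish_band))  # about 0.667
print(gain_ratio(squad_cost_band, finish_band))        # about 0.421
```

Running the same functions over every candidate attribute and sorting by the score would give a simple ranking. Continuous attributes such as wage bill would need to be discretised first, and the choice of bins affects the scores.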
Then I'll train different predictive models/classifiers on the target category (league position). I'll use different algorithms to see which gives me the highest classification accuracy, and then test the model(s) on new/test data (from the 2018/19 season) to see how accurate my classifiers really are.
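As a rough illustration of the train-on-one-season, test-on-the-next idea, here is a small sketch using a hand-rolled 1-nearest-neighbour classifier in plain Python. The algorithm is only a stand-in (nothing above commits to a particular one), league position is assumed to be grouped into bands, and the (squad cost, wage bill) numbers are invented toy values, not real club data.

```python
from math import dist


def min_max_scaler(rows):
    """Build a scaling function from the training rows only."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]  # avoid division by zero
    return lambda row: [(v - lo) / s for v, lo, s in zip(row, lows, spans)]


def predict_1nn(train_x, train_y, row):
    """Return the label of the training row closest to `row`."""
    nearest = min(range(len(train_x)), key=lambda i: dist(train_x[i], row))
    return train_y[nearest]


def accuracy(train_x, train_y, test_x, test_y):
    """Share of test rows whose predicted band matches the actual band."""
    scale = min_max_scaler(train_x)
    scaled_train = [scale(r) for r in train_x]
    hits = sum(
        predict_1nn(scaled_train, train_y, scale(r)) == y
        for r, y in zip(test_x, test_y)
    )
    return hits / len(test_y)


# Hypothetical (squad cost, wage bill) pairs, NOT real figures.
train_x = [(800, 250), (600, 200), (300, 100), (250, 90), (100, 50), (80, 40)]
train_y = ["top", "top", "mid", "mid", "bottom", "bottom"]   # "training season"
test_x = [(850, 260), (280, 95), (90, 45), (400, 120)]
test_y = ["top", "mid", "bottom", "top"]                     # "test season"

print(accuracy(train_x, train_y, test_x, test_y))  # 0.75 on this toy data
```

The scaler is fitted on the training season only, so nothing from the test season leaks into the model. Swapping in other algorithms and comparing their accuracy on the same held-out season is the comparison described above.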
Attributes/Variables I'll definitely use:
- League position
- Squad cost
- Wage bill
I'll post the results in the coming weeks.