Machine Learning

[D] Sport game prediction (alien.top)

submitted 2 years ago by exater@alien.top to c/machinelearning@academy.garden

I have quite a large dataset of historical games for a sport. Generally speaking, what is the best way to predict the winner of these games?

Currently I have a program that transforms every game into a set of features (participant ages on the day, their wins, stats at the time, etc.), and the model outputs a binary value indicating whether team 0 or team 1 wins. I guess my questions are:

Generally speaking, when training a complex model for something like game prediction, where it's hard to determine whether a given feature is useful, is it better to include as many features as possible? Or can too many be detrimental? For example, I could have a single feature for “career minutes played”. Or would it be more effective to have career minutes played plus career minutes played in each quarter, since players could have varying experience at certain times of the game?
What kind of model architecture is generally considered best for something like this, where hundreds of input features all boil down to a probability of the outcome being 0 or 1? Currently I am trying both random forest classification and feed-forward neural nets. If neural networks are the avenue I should pursue, is it generally agreed that bigger is better for FNNs? More hidden layers? Larger hidden layers?

[–] DatYungChebyshev420@alien.top 1 points 2 years ago

When I do sports analysis, XGBoost, elastic nets, and MARS (multivariate adaptive regression splines) models are my friends. Stack a few together and tune them well.

Sports data is usually about as structured and clean as data gets, so I don’t think a big neural network will be necessary or helpful.

Also, I recommend modeling the proportion of points scored by the home team rather than winner/loser as a binary outcome, because it is more informative: a 101-99 win and a 120-80 win get the same binary label, but very different point shares.
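A small sketch of that target change, using only the standard library. The function names and the scores are hypothetical and for illustration only. Any regressor can then be fit to the share instead of a classifier to the 0/1 label. A predicted share above 0.5 still gives a winner call.

```python
def home_point_share(home_points: int, away_points: int) -> float:
    """Regression target: fraction of the game's points scored by the home team."""
    total = home_points + away_points
    if total == 0:
        return 0.5  # assumption: treat a 0-0 game as an even split
    return home_points / total


def home_win(home_points: int, away_points: int) -> int:
    """Binary target for comparison: 1 if the home team won, else 0 (ties count as 0)."""
    return int(home_points > away_points)


def predicted_winner(predicted_share: float) -> str:
    """Turn a regressor's predicted share back into a winner call."""
    return "home" if predicted_share > 0.5 else "away"


games = [(101, 99), (120, 80), (95, 110)]  # illustrative scores, not real data
for home, away in games:
    print(home, away, home_win(home, away), round(home_point_share(home, away), 3))
```

The first two games both get the binary label 1, while their shares are 0.505 and 0.6, so the share target keeps the difference between a narrow win and a blowout.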

I recommend starting with as many variables as you can, fitting your model, and seeing how many variables you can cut out before your cross-validated performance starts dropping substantially.
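One way to make "cut variables until cross-validated performance drops" concrete is greedy backward elimination. The sketch below assumes scikit-learn, ROC AUC as the metric, a plain logistic regression as the model, and synthetic placeholder data. The 0.005 AUC tolerance is an arbitrary stand-in for "substantially". `cv_auc` and `backward_eliminate` are hypothetical helper names. At each step it drops the feature whose removal hurts the cross-validated AUC least. It stops once any further removal would fall more than the tolerance below the all-features baseline.

```python
# Greedy backward elimination sketch. Assumptions: scikit-learn, ROC AUC,
# logistic regression, synthetic data, and a hypothetical 0.005 AUC tolerance.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cv_auc(model, X, y, cols):
    """Mean 5-fold cross-validated ROC AUC using only the given feature columns."""
    return cross_val_score(model, X[:, cols], y, cv=5, scoring="roc_auc").mean()


def backward_eliminate(model, X, y, tolerance=0.005):
    """Drop features while CV AUC stays within `tolerance` of the all-features baseline."""
    kept = list(range(X.shape[1]))
    baseline = cv_auc(model, X, y, kept)
    while len(kept) > 1:
        # Score every candidate removal and pick the one that hurts least.
        trials = [(cv_auc(model, X, y, [c for c in kept if c != f]), f) for f in kept]
        best_score, least_useful = max(trials)
        if best_score < baseline - tolerance:
            break  # every remaining cut costs more than we are willing to lose
        kept.remove(least_useful)
    return kept, baseline


X, y = make_classification(n_samples=400, n_features=12, n_informative=4,
                           n_redundant=2, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
kept, baseline = backward_eliminate(model, X, y)
print(f"kept {len(kept)} of {X.shape[1]} features; baseline AUC {baseline:.3f}")
```

The number of model fits grows roughly quadratically with the feature count, so with hundreds of features a cheaper ranking (e.g. permutation importance) or scikit-learn's `SequentialFeatureSelector` with `direction="backward"` may be more practical. Because the same folds are used to pick features and to score them, it is worth confirming the final subset on a held-out set of games.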