Machine Learning

[D] Sport game prediction (alien.top)

submitted 1 year ago by exater@alien.top to c/machinelearning@academy.garden

I have quite a large dataset of historical games for a sport. Generally speaking, what is the best way to predict the winner of these games?

Currently I have a program that transforms every game into a bunch of features (participant ages on the day, their wins, stats at the time, etc.) and outputs a binary value for whether team 0 or team 1 wins. I guess my questions are:

1. Generally speaking, when training a complex model for something like game prediction, where it's hard to determine whether a parameter is particularly useful or not, is it better to just have as many parameters as possible? Or can too many be detrimental? For example, I could have a single parameter for “career minutes played”. Or would it be more effective to have career minutes played and also career minutes played for every quarter, because players could have varying experience at certain times of the game?
2. What kind of model architecture is generally perceived as the best for something like this, where we have 100s of input parameters all boiling down to probabilities for the outcome being 0 or 1? Currently I am trying both random forest classification and feed-forward neural nets. If neural networks are the avenue I should pursue, is it generally agreed that bigger is better for FNNs? More hidden layers? Larger hidden layers?

[–] Ty4Readin@alien.top 1 points 1 year ago

Couple of things to break down here.

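A small note: you call them "parameters", but we would normally call those "features". "Parameters" usually refers to the values a model learns during training, such as the weights of a neural network.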

Your two questions are pretty similar:

Q1. Is it better to add more features or fewer features?

Q2. Is it better to have a larger, more complex model (such as a neural network) or a smaller, simpler one?

The answer to both is: it depends!

When you add more features or make your model larger and more complex, the model can capture more complex patterns, which could be beneficial or could be harmful!

You should read up on overfitting vs. underfitting error. Generally speaking, you can reduce underfitting error by adding features and increasing model complexity, but that usually comes with the trade-off of increasing overfitting error.
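
For reference, the textbook version of this trade-off is the bias-variance decomposition. It is stated for squared-error regression rather than win/loss classification, so treat it as intuition only. The symbols are assumptions for illustration: the target is y = f(x) + ε with noise variance σ², and f̂ is the model fitted on a random training set. Roughly, "underfitting error" corresponds to the bias term and "overfitting error" to the variance term.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2 \text{ (underfitting)}}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance (overfitting)}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

More features and bigger models tend to shrink the first term and grow the second.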

The question then becomes: does the reduction in underfitting error outweigh the increase in overfitting error?

Usually the only reliable way to find out is to try both approaches on a held-out validation set and choose the model and feature set that performs best there.
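
Here is a minimal sketch of that workflow in plain Python. Everything in it is an assumption for illustration: the data is synthetic (two informative features plus eight pure-noise ones), the model is a toy k-nearest-neighbours classifier standing in for your random forest or neural net, and names like `make_games` are hypothetical. For k-NN, a small k is the more flexible (more overfit-prone) setting.

```python
import random

def make_games(n, rng):
    """Synthetic stand-in for real games: features 0 and 1 carry signal, 2..9 are noise."""
    games = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(10)]
        score = x[0] + x[1] + rng.gauss(0, 0.5)  # noisy "team strength" difference
        games.append((x, 1 if score > 0 else 0))  # label: 1 if team 1 wins
    return games

def knn_predict(train, x, k, cols):
    """Majority vote among the k nearest training games, using only columns in cols."""
    dists = sorted((sum((a[c] - x[c]) ** 2 for c in cols), y) for a, y in train)
    votes = sum(y for _, y in dists[:k])
    return 1 if 2 * votes > k else 0

def accuracy(train, data, k, cols):
    """Fraction of games in data whose winner is predicted correctly."""
    hits = sum(knn_predict(train, x, k, cols) == y for x, y in data)
    return hits / len(data)

rng = random.Random(0)
train, val, test = make_games(300, rng), make_games(150, rng), make_games(150, rng)

# Candidates: two feature sets crossed with two levels of model flexibility.
feature_sets = {"informative only": [0, 1], "all ten": list(range(10))}
ks = [1, 15]  # odd values avoid tied votes

# Score every candidate on the validation set only.
results = {}
for fname, cols in feature_sets.items():
    for k in ks:
        results[(fname, k)] = accuracy(train, val, k, cols)

# Pick the winner by validation accuracy, then report it once on the untouched test set.
best = max(results, key=results.get)
test_acc = accuracy(train, test, best[1], feature_sets[best[0]])

for key, acc in sorted(results.items()):
    print(key, round(acc, 3))
print("chosen:", best, "test accuracy:", round(test_acc, 3))
```

The separate test set is there because the validation score of the chosen candidate is slightly optimistic: it was used to make the choice, so it can't also serve as an unbiased estimate.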