Have you already tried using pre-trained models?
First off, it’s kind of a funny task, given that people like to complain these days that some treat politics like a sport.
What data do you have access to? Just the tweet text, or is there other metadata like username, timestamp, bio, profile picture, etc.?
I added sample data to the post body; it's basically this:
Data fields
- TweetId - an anonymous id unique to a given tweet
- Label - the associated label which is either Sports or Politics
- TweetText - the text in a tweet
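With those three fields the data loads as a plain CSV. A minimal sketch of reading it, using a hypothetical two-row sample (the actual competition file and its contents aren't shown here):

```python
import csv
import io

# Hypothetical in-memory sample matching the fields above:
# TweetId, Label, TweetText.
sample = io.StringIO(
    "TweetId,Label,TweetText\n"
    "1,Sports,Great goal in the final minute!\n"
    "2,Politics,The senate votes on the bill tomorrow.\n"
)

rows = list(csv.DictReader(sample))
texts = [r["TweetText"] for r in rows]   # model input
labels = [r["Label"] for r in rows]      # target: Sports or Politics
print(labels)
```

For a real file you'd pass a path instead of the `StringIO` object; the column names are the only assumption.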
Who set the score of 0.97 as a goal? Are you sure it is attainable given the data? Have others posted kernels/notebooks that reach these scores? In most cases it's not a lack of modelling on your side but rather a lack of signal in the data.
That's easy: model stacking/ensembling. Just about every winning Kaggle team uses it.
Right now you have only one model, built with a single technique.
There are several other classic approaches to NLP classification tasks: Naive Bayes, SVMs, CBOW, etc.
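To make that concrete, here's a sketch of two of those classic baselines (Naive Bayes and a linear SVM) on TF-IDF features, assuming scikit-learn and a tiny made-up corpus standing in for the real tweets:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in corpus (hypothetical examples, not the competition data).
texts = [
    "the striker scored a hat trick",
    "the senate passed the budget bill",
    "coach benched the goalkeeper",
    "the president signed the new law",
]
labels = ["Sports", "Politics", "Sports", "Politics"]

# Same TF-IDF features, two different classifiers.
for clf in (MultinomialNB(), LinearSVC()):
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(texts, labels)
    print(type(clf).__name__, model.predict(["the goalkeeper made a save"]))
```

Each of these trains in seconds on tweet-sized text, so trying several is cheap.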
The idea behind model stacking: train several models, each using a different method, then train a meta-model that takes each individual model's output as its features.
This often improves your score noticeably; it's a big part of how people win Kaggle competitions.
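The recipe above maps directly onto scikit-learn's `StackingClassifier`. A minimal sketch, again on a tiny hypothetical corpus (the real training set would go in its place):

```python
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny hypothetical corpus; swap in the competition data here.
texts = [
    "home team wins the championship game",
    "parliament debates the tax reform",
    "record transfer fee for the striker",
    "the minister resigned after the vote",
    "fans celebrate the playoff victory",
    "new sanctions announced by the government",
]
labels = ["Sports", "Politics", "Sports", "Politics", "Sports", "Politics"]

# Each base model gets its own text pipeline; the meta-model
# (logistic regression) is trained on their cross-validated outputs.
stack = StackingClassifier(
    estimators=[
        ("nb", make_pipeline(TfidfVectorizer(), MultinomialNB())),
        ("svm", make_pipeline(TfidfVectorizer(), LinearSVC())),
    ],
    final_estimator=LogisticRegression(),
    cv=3,  # small cv only because the toy corpus is tiny
)
stack.fit(texts, labels)
print(stack.predict(["the striker scored again"]))
```

In a real run you'd add more diverse base models (the whole point is that their errors differ) and pick `cv` based on the dataset size.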
Have you tried training on additional data? There's a lot of sports and politics text out there. If the competition isn't already built on the 20 Newsgroups dataset, it's worth checking out as extra training data.
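A sketch of pulling the relevant 20 Newsgroups categories via scikit-learn and mapping them onto the two competition labels. The category choice and the label mapping are my assumptions; the loader downloads and caches the data on first use, so it's wrapped in a function here:

```python
from sklearn.datasets import fetch_20newsgroups

# 20 Newsgroups categories that roughly match the two tweet labels
# (my choice; adjust to taste).
SPORTS = ["rec.sport.baseball", "rec.sport.hockey"]
POLITICS = ["talk.politics.guns", "talk.politics.mideast", "talk.politics.misc"]

def load_extra_training_text():
    """Fetch matching newsgroup posts as extra (texts, labels) training data.

    Strips headers/footers/quotes so a model learns from the body text
    rather than newsgroup-specific metadata. Downloads on first call.
    """
    train = fetch_20newsgroups(
        subset="train",
        categories=SPORTS + POLITICS,
        remove=("headers", "footers", "quotes"),
    )
    texts = train.data
    # Map each newsgroup name back to the competition's two classes.
    labels = [
        "Sports" if train.target_names[t] in SPORTS else "Politics"
        for t in train.target
    ]
    return texts, labels
```

One caveat: newsgroup posts are much longer and older than tweets, so treat this as pre-training or augmentation data and validate on the competition's own tweets.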