this post was submitted on 26 Nov 2023

Machine Learning


I'm trying to create an NLP emotion classification model for a research project, but I'm confused about where and how to start. I have a large dataset of Reddit posts and want to classify each post into about 12 emotion categories.

Is there a way to do this using an existing model, e.g. BERT, or can I do this with unsupervised learning?

I have at least 12,000 posts, so I want to avoid supervised learning: labeling a training set by hand would take a long time.

What's the most efficient and accurate way to do this? Any help would be amazing!

top 4 comments
[–] NoFairYouCheated@alien.top 1 points 11 months ago

You can label a small, balanced subset by hand and try something like SetFit.
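A rough sketch of the "balanced subset" step, in case it helps. It assumes you have already hand-labeled some posts as `(text, label)` pairs; the function name `balanced_subset` and the `per_class` parameter are made up for illustration and are not part of any library.

```python
import random
from collections import defaultdict


def balanced_subset(labeled_posts, per_class, seed=0):
    """Pick up to `per_class` examples for each label.

    labeled_posts: list of (text, label) pairs that were labeled by hand.
    Classes with fewer than `per_class` examples contribute all they have.
    """
    # Group the texts by their label.
    by_label = defaultdict(list)
    for text, label in labeled_posts:
        by_label[label].append(text)

    # Sample the same number of texts from every class (seeded for repeatability).
    rng = random.Random(seed)
    subset = []
    for label in sorted(by_label):
        texts = by_label[label]
        k = min(per_class, len(texts))
        for text in rng.sample(texts, k):
            subset.append((text, label))
    return subset
```

The result is a list of `(text, label)` pairs with at most `per_class` entries per emotion, which you could then hand to whatever few-shot trainer you end up using.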

[–] Ok-Kangaroo-59@alien.top 1 points 11 months ago

Go google the SetFit library on GitHub. Frame it as a few-shot learning task. You won't get perfect results, but it seems tractable as a problem.
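To make the few-shot framing concrete, here is a toy sketch. It is not SetFit: it swaps in a bag-of-words "embedding" and a nearest-centroid rule so it runs with the standard library only. All names (`embed`, `fit_centroids`, `predict`) are hypothetical. In practice a real sentence encoder would take the place of `embed`.

```python
import math
from collections import Counter, defaultdict


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a sentence encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_centroids(examples):
    """Sum the embeddings of the few labeled examples for each class.

    Cosine similarity ignores scale, so the sum points the same way as the mean.
    """
    centroids = defaultdict(Counter)
    for text, label in examples:
        centroids[label].update(embed(text))
    return dict(centroids)


def predict(centroids, text):
    """Return the label whose centroid is most similar to the text."""
    vec = embed(text)
    return max(sorted(centroids), key=lambda label: cosine(vec, centroids[label]))
```

With a handful of labeled posts per emotion, you would call `fit_centroids` once and then `predict` on each unlabeled post. The point is only the shape of the workflow: a few examples per class, then similarity-based classification of everything else.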

[–] Ronny_Jotten@alien.top 1 points 11 months ago

I doubt you'll find 12 different emotions on Reddit. I think everything can fit into:

  1. sarcasm
  2. rage
  3. polemic rage
  4. bewilderment
  5. jocularity
  6. serious

I might have missed one or two, but I'm sure there aren't 12.

[–] hwah317@alien.top 1 points 11 months ago

Try the GoEmotions dataset from Google.