Machine Learning



[Project] Big 5 Personality Project Question (alien.top)

submitted 10 months ago by OpenJuggernaut8556@alien.top to c/machinelearning@academy.garden


I'm looking for some advice on a project idea. I would like to predict the Big Five personality traits of authors from an analysis of their writing samples. Would I need to have some authors take the Big Five personality assessment, and build a training set from those results, in order to do a project like this? Or is there a way to "guess" what certain writing patterns correlate with? What would be a reasonable strategy for setting up an ML project like this?


[–] sshh12@alien.top 1 points 10 months ago (2 children)

As you mentioned, one tricky part is definitely the dataset. In an ideal world, you'd have a supervised dataset of (document, personality type) pairs and could train a model on those (just like u/Veggies-are-okay mentioned).

Assuming you don't have this data, a couple of options:

Make the data. Some quick Google searches show that many celebrities have known Big-5s. You could manually curate Big-5s and text written by these celebrities to build these pairs.
Use synthetic data. Try asking an LLM (like ChatGPT) to write a text on a random topic as if the author had a $RANDOM Big-5 profile, then use the prompted profile and the generated text as your training pairs.
Try clustering. Texts from similar personality types may have similar embeddings. Take a dataset of writings, embed them using something like BERT, label (or best-effort guess) a few, and then predict personalities based on proximity to known Big-5 text in the embedding space. You could extend this to training a model that asks "do text A and text B display the same Big-5?", which could be an easier problem to get samples for, and then run that model on your unknown example against a set of known Big-5 texts.
Use a proxy. There might be datasets/models out there that predict heuristics that could be combined to estimate the Big-5. For example, a sentiment score may be correlated with agreeableness. You might also be able to build word/phrase banks such that using certain phrases indicates a leaning on a trait ("has_neurotic_phrases" is then a feature in your model).
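
For the clustering option, here is a rough sketch of the "predict from proximity to known text" step. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model like BERT, the seed texts and their scores are made up, and `predict_traits` is just a hypothetical helper name.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a real embedding model (e.g. BERT): word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict_traits(text, labeled, k=2):
    """Average the trait scores of the k labeled texts closest to `text`."""
    query = embed(text)
    ranked = sorted(
        labeled,
        key=lambda pair: cosine(query, embed(pair[0])),
        reverse=True,
    )
    nearest = ranked[:k]
    traits = nearest[0][1].keys()
    return {t: sum(scores[t] for _, scores in nearest) / len(nearest) for t in traits}

# Hypothetical hand-labeled seed texts; the scores in [0, 1] are invented.
LABELED = [
    ("i worry about everything and cannot sleep", {"neuroticism": 0.9, "extraversion": 0.2}),
    ("i love parties and meeting new people", {"neuroticism": 0.2, "extraversion": 0.9}),
    ("i worry people at parties dislike me", {"neuroticism": 0.8, "extraversion": 0.4}),
]

prediction = predict_traits("i worry and cannot sleep at night", LABELED, k=1)
print(prediction)
```

With real embeddings you would swap out `embed` and keep the rest roughly the same. How well this works depends entirely on whether personality actually shows up as distance in the embedding space, which is worth checking on a held-out set of labeled texts before trusting it.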

[–] gettotea@alien.top 1 points 10 months ago

Really great ideas. The synthetic data one is worth going deeper into.
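
A minimal sketch of how the synthetic route might start, assuming you sample a low/high level per trait and build one prompt per sample. The topic list, the prompt wording, and the helper names are all hypothetical, and no LLM is actually called here; you would send each prompt to whatever model you use and keep the response next to its label.

```python
import random

TRAITS = ["openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism"]
TOPICS = ["a weekend trip", "a difficult coworker", "learning to cook"]  # hypothetical topics

def sample_profile(rng):
    """Pick a random low/high level for each of the five traits."""
    return {trait: rng.choice(["low", "high"]) for trait in TRAITS}

def build_prompt(profile, topic):
    """Turn a sampled profile and topic into an instruction for the LLM."""
    description = ", ".join(f"{level} {trait}" for trait, level in profile.items())
    return (
        f"Write a short first-person paragraph about {topic} "
        f"in the voice of someone with {description}. "
        "Do not mention personality traits explicitly."
    )

def make_requests(n, seed=0):
    """Build n (label, prompt) records; the seed keeps the batch reproducible."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        profile = sample_profile(rng)
        topic = rng.choice(TOPICS)
        records.append({"label": profile, "prompt": build_prompt(profile, topic)})
    return records

batch = make_requests(3)
for record in batch:
    print(record["label"], "->", record["prompt"])
```

One thing to watch: the generated text reflects the LLM's idea of what a trait sounds like, so it seems worth validating a model trained this way against at least a small set of real labeled samples.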

[–] Veggies-are-okay@alien.top 1 points 10 months ago

Will back up the ChatGPT advice. It’s really amazing how much value LLMs can have in creating synthetic data.