this post was submitted on 18 Nov 2023

LocalLLaMA


Community to discuss about Llama, the family of large language models created by Meta AI.

It's no secret that many language models and fine-tunes are trained on datasets, and many of those datasets are themselves generated with GPT models. The problem arises when "GPT-isms" end up in the dataset. I'm not only referring to the typical expressions like "however, it's important to..." or "I understand your desire to...", but also to the structure of the model's responses. ChatGPT (and GPT models in general) tends to follow a very predictable structure when in its "soulless assistant" mode, which makes it very easy to say "this is very GPT-like".
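As a rough illustration of catching those tells, here is a minimal sketch of phrase-based filtering over a synthetic dataset. The phrase list and the record format (a dict with an `output` field) are assumptions for the example, not a standard; real cleanup pipelines usually go further than substring matching.

```python
# Minimal sketch: dropping dataset records that contain common "GPT-isms".
# The phrase list and record shape are illustrative assumptions.
GPT_ISMS = [
    "however, it's important to",
    "i understand your desire to",
    "as an ai language model",
]

def is_gpt_like(text: str) -> bool:
    """Return True if the text contains a known GPT-ism phrase."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in GPT_ISMS)

def filter_dataset(records: list[dict]) -> list[dict]:
    """Keep only records whose 'output' field lacks GPT boilerplate."""
    return [r for r in records if not is_gpt_like(r["output"])]

data = [
    {"output": "However, it's important to remember that opinions vary."},
    {"output": "The capital of France is Paris."},
]
print(filter_dataset(data))  # only the second record survives
```

Substring filters like this only catch the surface-level expressions; the structural predictability described above (intro, numbered list, hedged conclusion) is much harder to filter automatically.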

What do you think about this? Oh, and by the way, forgive my English.

stereoplegic@alien.top 1 points 11 months ago

I'm more concerned with the community's outsized reliance on/promotion of OAI-generated datasets and models trained on them. But then, commercial viability isn't generally a concern when you want a spicy waifu.