[–] wryso@alien.top 1 points 11 months ago (2 children)

The most plausible explanation is that the board discovered something that left them no choice, given their fiduciary duties. I’m betting OpenAI trained its next-generation models on a ton of copyrighted data, and this was about to be made public or otherwise used against them. If the impact was hundreds of millions of dollars, or even over a billion, plus months of dev time wasted on training models that now have limited commercial utility, I could understand the board having to take action.

It’s well known that many “public” datasets used by researchers are contaminated with copyrighted material, and publishers are getting more and more litigious about it. If there were a paper trail establishing that Sam knew but said to proceed anyway, the board might not have had a choice. And there are many parties in this space who probably have firsthand knowledge (researchers move between the major shops) and who are incentivized to time this kind of torpedoing strategically.
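
Contamination screening is conceptually simple even if doing it well at scale isn’t. A minimal sketch, purely illustrative and not any lab’s actual pipeline, of filtering a document set against a blocklist of domains whose owners assert copyright; the record format and blocklist entries here are made up:

```python
# Hypothetical contamination filter: drop documents whose source domain
# is on a blocklist of publishers known to assert copyright.
from urllib.parse import urlparse

BLOCKLIST = {"nytimes.com", "penguinrandomhouse.com"}  # illustrative entries only

def is_blocked(record: dict) -> bool:
    """Return True if the record's source URL belongs to a blocklisted domain."""
    host = urlparse(record.get("url", "")).netloc.lower()
    # Match the domain itself and any subdomain (e.g. www.nytimes.com).
    return any(host == d or host.endswith("." + d) for d in BLOCKLIST)

documents = [
    {"url": "https://www.nytimes.com/2023/01/01/story.html", "text": "..."},
    {"url": "https://en.wikipedia.org/wiki/Llama", "text": "..."},
]
clean = [doc for doc in documents if not is_blocked(doc)]
print(len(clean), "of", len(documents), "documents kept")
```

Real pipelines would also need fuzzy text matching (licensed material gets reposted on unblocked domains), which is exactly why “we filtered it” claims are hard to verify.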

[–] cuyler72@alien.top 1 points 11 months ago (1 children)

It's already been decided in multiple court cases that using copyrighted material to train AI models is fine and not subject to copyright, though.

[–] wryso@alien.top 1 points 11 months ago

This is far from completely litigated. And even if the derivative works created by generative AI trained on copyrighted material aren’t subject to copyright claims from the owners of the original works, that doesn’t mean:

  • companies can just use illegally obtained copyrighted works to train their AIs
  • companies are free to violate their contracts, whether agreements they’re directly a party to or implicit ones like the instructions in robots.txt when crawling (see the sketch after this list)
  • users of the models these companies produce are free from liability for the data-inclusion decisions behind the models they use
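
On the robots.txt point: respecting crawl instructions is trivial to implement, which is part of why ignoring them looks bad. A minimal sketch using Python’s standard-library parser; the site and user-agent string are hypothetical:

```python
# Check robots.txt before fetching a page, using the stdlib parser.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # hypothetical target site
rp.read()  # fetches and parses the robots.txt file

url = "https://example.com/articles/some-page"
if rp.can_fetch("MyResearchBot", url):  # "MyResearchBot" is a made-up user agent
    print("robots.txt permits fetching", url)
else:
    print("robots.txt disallows", url, "- skipping")
```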

So I’d say that the data question remains a critical one.