this post was submitted on 17 Nov 2023
1 points (100.0% liked)
LocalLLaMA
Community to discuss about Llama, the family of large language models created by Meta AI.
you are viewing a single comment's thread
view the rest of the comments
The most plausible explanation is that the board found out something that made this their only choice, given their fiduciary duties. I’m betting OpenAI trained their next-generation models on a ton of copyrighted data, and this was going to be made public or otherwise used against them. If the impact was hundreds of millions of dollars, or even over a billion (plus months of dev time), wasted on training models that now have limited commercial utility, I could understand the board having to take action.
It’s well known that many “public” datasets used by researchers are contaminated with copyrighted material, and publishers are getting more and more litigious about it. If there were a paper trail establishing that Sam knew but said to proceed anyway, the board might not have had a choice. And there are many parties in this space who probably have firsthand knowledge (from researchers moving between major shops) and who are incentivized to time this kind of torpedoing strategically.
It's already been decided in multiple court cases, though, that using copyrighted material in AI models is fine and not subject to copyright.
This is far from completely litigated, and even if the derivative works created by generative AI trained on copyrighted material are found not to infringe the copyrights of the original works’ owners, that doesn’t mean:
So I’d say that the data question remains a critical one.