I've seen and heard your argument made before, not just for LLMs but also for text-to-image programs. My counterpoint is that humans learn in a very similar way to these programs: we take what we've seen and read and develop a style of our own inspired by it. These programs also don't just recite texts from memory; they create new ones based on the probabilities of certain words and phrases occurring in the parts of their training data related to the prompt. As a heavily simplified but accurate enough comparison, saying these programs violate copyright law is like saying every cosmic horror writer is plagiarising Lovecraft, or that every surrealist painter is copying Dalí.