Artificial intelligence firm Anthropic hits out at copyright lawsuit filed by music publishing corporations, claiming the content ingested into its models falls under ‘fair use’ and that any licensing regime created to manage its use of copyrighted material in training data would be too complex and costly to work in practice
GenAI tools ‘could not exist’ if firms are made to pay copyright
This goes back to my previous comment about handwaving away the details. There is a model out there that is clearly reproducing copyrighted material almost identically (see the NYTimes article), and we also have issues with models spitting out training data (https://www.wired.com/story/chatgpt-poem-forever-security-roundup/). Clearly, the people studying these models don’t fully know what is actually possible.
Additionally, it only takes one instance to show that these models, in general, can and do have issues with regurgitating copyrighted data. Whether that passes the bar for legal consequences, we’ll have to see, but I think it’s dangerous to take at face value a couple of statements made by people who don’t seem to understand the unknowns in this space.
The ultimate issue is that the models don’t encode the training data in any way that we have historically considered infringement of copyright. This is true for both transformer architectures (GPT) and diffusion ones (most image generators). From a lay perspective, it’s probably good and relatively accurate for our purposes to imagine the models themselves as enormous nets that learn vague, muddled impressions of multiple portions of multiple pieces of the training data at arbitrary locations within the net. Now, this may still have IP implications for the outputs, and here music copyright is pretty instructive, albeit very case-by-case. If a piece is too “inspired” by a particular previous work, it may still be regarded as infringement of copyright even if it is not explicit copying. But, like I said, this is very case-specific and precedent cuts both ways on it.
The article dealt with Stable Diffusion, the only open model that allowed people to study it. If there were more problems with Stable Diffusion, we’d’ve heard of them by now. These are the critical solutions open-source development offers here. By making AI accessible, we maximize public participation and understanding, foster responsible development, and prevent harmful control attempts.
As it stands, she was much better informed than you are and is an expert in law to boot. On the other hand, you’re making a sweeping generalization right into an appeal to ignorance. It’s dangerous to assert a proposition just because it has not been proven false.