The Japanese government has made a formal request asking OpenAI to refrain from copyright infringement. This comes as a response to Sora 2’s ability to generate videos featuring the likenesses of copyrighted characters from anime and video games.
So, the “don’t use copyrighted data in a training corpus” crowd probably isn’t going to win the IP argument. And I would be quite surprised if IP law changes to accommodate them.
However, the “don’t generate and distribute infringing material” argument is a whole different story. IP holders are on pretty solid ground there. One thing I am very certain IP law is not going to permit is passing copyrighted data into a model and then generating and distributing material that would otherwise be infringing. I understand that anime rightsholders have something of a tradition of letting fan-created material slide, but if generative AI massively lowers the bar to creating content, I suspect that is likely to change.
Right now, you have generative AI companies saying — maybe legally plausibly — that they aren’t the liable ones if a user generates infringing material with their model.
And while you can maybe go after someone who is outright generating and selling infringing material, something doesn’t have to be commercially sold to be infringing. Like, if LucasArts wants to block for-fun fan art of Luke, Leia, and Han, they can do that.
One issue is attribution. Like, generative AI companies are not lying when they say that there isn’t a great way to work backwards from an output and determine which training-corpus data contributed most to it.
However, I am also confident that it is possible to do better than they do today. From a purely black-box standpoint, one possibility would be to use TinEye-style fuzzy hashing of images: look up each generated image against a database of hashes of known works, probably with a fuzzier hash than TinEye uses, and warn the user that they might be generating an image that would be derivative. That won’t solve all cases, especially once generative AI starts producing 3D models (though then you could maybe build computer-vision tooling and a TinEye equivalent for 3D models).
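To make that concrete, here’s a minimal sketch of the fuzzy-hash warning idea in Python, using the Pillow and imagehash libraries. The reference table, the example hash value, the distance threshold, and the file name are all made up for illustration; an actual system would index millions of works and tune the threshold empirically.

```python
# Sketch: warn when a generated image is perceptually close to a known work,
# using perceptual hashing (pHash), similar in spirit to TinEye's matching.
from PIL import Image
import imagehash

# Perceptual hashes of known copyrighted works, computed offline.
# (Hypothetical entry; a real index would hold millions of works.)
REFERENCE_HASHES = [
    (imagehash.hex_to_hash("d4d4d4d0c8c0c0c0"), "some reference artwork"),
]

# Maximum Hamming distance that still counts as "suspiciously similar";
# raising it gives the fuzzier matching suggested above.
MAX_DISTANCE = 8

def similar_references(path: str) -> list[str]:
    """Return titles of known works the generated image is close to."""
    candidate = imagehash.phash(Image.open(path))
    # imagehash overloads '-' as the Hamming distance between two hashes.
    return [title for ref, title in REFERENCE_HASHES
            if candidate - ref <= MAX_DISTANCE]

for title in similar_references("generated.png"):
    print(f"Warning: this output may be derivative of {title}")
```

Perceptual hashes survive rescaling and mild edits, which is what makes them “fuzzier” than exact hashes; they won’t catch a character drawn in a new pose, which is where this approach runs out.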
Another complicating factor is that copyright only restricts distribution of derivative works. I can make my own, personal art of Leia all I want. What I can’t do is go distribute it. I think — though I don’t absolutely know what case law is like for this, especially internationally — that generating images on hardware at OpenAI or wherever and then having them sent to me doesn’t count as distribution. Otherwise, software-as-a-service in general, stuff like Office 365, would have major restrictions on working with IP that locally-running software would not. The point is that I expect it should be perfectly legal for me to go to an image generator and generate material as long as I do not subsequently redistribute it, even if it would be infringing had I done so. And the AI company involved has no way of knowing what I’m doing with the material that I’m generating. If they block me from making material with Leia, that’s an excessively broad restriction.
But IP holders are going to want a practical route to go after either the generative AI company producing the material that gets distributed, or the users generating infringing material and then distributing it. AI companies are probably going to say that it’s the users, and that’s probably correct. The problem, from a rightsholder standpoint, is that yes, they could go after the users before, but if it’s now a lot cheaper and easier to create the material, that presents them with practical problems. If every Tom, Dick, and Harry can go out and generate material, they’ve got a lot more moles to whack in their whack-a-mole game.
And in that vein, an issue that I haven’t seen come up is what happens if generative AI companies start permitting deterministic generation of content – that is, where if I plug in the same inputs, I get the same outputs. Maybe they already do; I don’t know, since I run my generative AI stuff locally. But supposing you have a scenario like this (a sketch of what seeded, deterministic generation looks like follows the scenario):
I make a game called “Generic RPG”, which I sell.
I distribute — or sell — DLC for this game. The DLC uses a remote, generative AI service to generate art for the game from a set of prompts sold as part of the DLC. No art is distributed as part of the game. Let’s say I call it “Adventures A Long Time Ago In A Universe Far, Far Away” or something that doesn’t directly run afoul of LucasArts and creates enough distance. And let’s set aside trademark concerns, for the sake of discussion. And let’s say that the prompts are not, themselves, infringing on copyright (though I could imagine them doing so, let’s say that they’re sufficiently distant to avoid being derivative works).
Every user buys the DLC, and then on their computer, reconstitutes the images for the game. At least if done purely locally, this should be legal under case law — the GPL specifically depends on the fact that one can combine material locally to produce a derivative work as long as one does not then distribute it. Mods to (copyrighted) games can just distribute the deltas, producing a derivative work when the mod is applied, and that’s definitely legal; a sketch of that ship-only-the-delta pattern follows.
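Concretely, the delta pattern is trivial to implement. Here’s a minimal sketch using the bsdiff4 Python package; the file names are hypothetical, and this stands in for whatever patch tooling a real mod would use:

```python
# Sketch: a game mod ships only a binary delta against a copyrighted asset,
# so the derivative work only ever comes into existence on the user's machine.
import bsdiff4

# Mod author, at build time: diff the original asset (never redistributed)
# against the modded version, and ship only the resulting patch.
original = open("original_asset.dat", "rb").read()
modded = open("modded_asset.dat", "rb").read()
open("mod.patch", "wb").write(bsdiff4.diff(original, modded))

# End user, locally: apply the patch to their own licensed copy of the asset.
original = open("original_asset.dat", "rb").read()
patch = open("mod.patch", "rb").read()
open("modded_asset.dat", "wb").write(bsdiff4.patch(original, patch))
```

The patch is useless without the original bytes, which is the legal crux: the mod author never distributes the copyrighted content itself.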
One winds up with someone selling and distributing what is effectively a “Star Wars” game.
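The deterministic piece of this is mundane from an engineering standpoint: it’s just a pinned random seed. Here’s a minimal sketch using the Hugging Face diffusers library; the model name, prompt, and seed are placeholders, and whether a hosted service exposes seeds this way is exactly the open question above:

```python
# Sketch: deterministic image generation by pinning the RNG seed.
# Same weights + same prompt + same seed => same image, so the hypothetical
# DLC could ship only (prompt, seed) pairs and reconstitute art on demand.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# One (prompt, seed) pair as it might appear in the hypothetical DLC data.
prompt = "space princess with iconic side-bun hairstyle, film still"
seed = 1138

generator = torch.Generator("cpu").manual_seed(seed)
image = pipe(prompt, generator=generator).images[0]
image.save("dlc_art_001.png")  # every buyer regenerates this same image
```

In practice, bit-exact reproduction also depends on fixed model weights, library versions, and hardware, which is part of why a hosted service may or may not actually offer it.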
Now, maybe training the model on images of Star Wars content so that it knows what Star Wars looks like isn’t, as a single step, creating an infringing work. Maybe distributing the model that knows about Star Wars isn’t infringement. Maybe the prompts being distributed, designed to run against that model, are not infringing. Maybe deterministically reconstituting the apparently-Star-Wars images, using SaaS access to hardware that can run the model, is not infringing. But if the net effect is equivalent to distributing an infringing work, my suspicion is that courts are going to be willing to create some kind of legal doctrine that restricts it, if they haven’t already.
Now, this situation is kind of contrived, but I expect that people will do it, sooner or later, absent legal restrictions.
However, the “don’t generate and distribute infringing material” is a whole different story. IP holders are on pretty solid ground there.
Is any of it infringing?
Explain the knock-off music and art in popular media, made when producers don’t want to pay royalty fees for the authentic article.
Explain knock-off brands.
Cheap imitations to sidestep copyright restrictions have been around long before generative AI, yet businesses aren’t getting sued: they apparently understand legal standards enough to safely imitate.
Why is shoddy imitation for distribution okay when human-generated yet not when AI-generated?
I don’t think your understanding of copyright infringement is solid.
Even supposing someone manages to generate work whose distribution infringes copyright, wouldn’t legality follow the same model as a human requesting a commercial (human-based) service to generate that work?
This is an unusually solid analysis for lemmy (distressingly so). I agree with one exception: writing to memory absolutely counts as distribution. Accordingly, if a generative model outputs an infringing work, that could for sure create liability for infringement. I think this will ultimately work similarly to music copyright, where conscious or explicitly intentional copying is not itself the threshold test, but rather degree of similarity. And if you have prompts that specifically steer toward infringement, you’re going to get some sort of contributory-infringement structure. I think there is also potentially useful case law to look at in terms of infringement arising out of work-for-hire situations, where the contractor may not have infringed intentionally but the supervisor knew and intended their instructions to produce an effectively infringing work. That is, if there is any case law on this pretty narrow fact pattern.
It sounds like an analogue issue that has already been solved in similar ways in other areas.
For example, it’s not only illegal for someone to make and sell known illegal drugs, but it’s additionally illegal to make or sell anything that is not the specifically illegal drug but is analogous to it in terms of effect (and especially facets of chemical structure).
So the prevailing approach will probably be that any process producing an end result analogous to copyright infringement is viewed as copyright infringement, even if it skirts the existing laws on a technical basis.
For example, it’s not only illegal for someone to make and sell known illegal drugs, but it’s additionally illegal to make or sell anything that is not the specifically illegal drug but is analogous to it in terms of effect (and especially facets of chemical structure).
Hmm. I’m not familiar with that as a legal doctrine.
kagis
At least in the US — and this may not be the case everywhere — it sounds like there’s a law that produces this, rather than a doctrine. So I don’t think that there’s a general legal doctrine that would automatically apply here.
The Federal Analogue Act, 21 U.S.C. § 813, is a section of the United States Controlled Substances Act passed in 1986 which allows any chemical “substantially similar” to a controlled substance listed in Schedule I or II to be treated as if it were listed in Schedule I, but only if intended for human consumption. These similar substances are often called designer drugs. The law’s broad reach has been used to successfully prosecute possession of chemicals openly sold as dietary supplements and naturally contained in foods (e.g., the possession of phenethylamine, a compound found in chocolate, has been successfully prosecuted based on its “substantial similarity” to the controlled substance methamphetamine).[1] The law’s constitutionality has been questioned by now Supreme Court Justice Neil Gorsuch[2] on the basis of Vagueness doctrine.
(Source: https://en.wikipedia.org/wiki/Federal_Analogue_Act)
But I guess it might be possible to pass a similar law for copyright.