Tech
March 17, 2023

Artificial artist, or artificial plagiarist?

AI artwork is going mainstream, but it’s doing so off the back of human artists.

What’s happening?

In the last few months, the creative artificial intelligence systems that I wrote about back in 2021 have traded the comfort and anonymity of universities, labs and Big Tech data centres for the bright lights of social media and the scrutiny of the mainstream press.

By now you will have seen friends, celebrities and “influencers” swap their traditional profile photos for what look like custom, contemporary portraits, possibly of them wearing a space-suit. Don’t be fooled. These people have not commissioned an artist to capture their essence on canvas (or in pixels). They have instead bought an app, handed an AI a dozen or so images of themselves, selected some themes and waited a minute or two, with some fairly spectacular results:

Lensa via Instagram

This image was created with Lensa, which rapidly climbed to the top of the Apple App Store within days of its release late last year.

The systems that create these images, like Lensa or Midjourney, are built on what are known as “diffusion models” (an earlier generation of image generators used “GANs”, or Generative Adversarial Networks, in which one AI generated images and a second judged whether they looked like real, human-produced work). In a nutshell, a diffusion model is fed massive amounts of data consisting of images and captions explaining what’s in each image. During training, the images are progressively degraded with random noise, and the model teaches itself to reverse that degradation. Once trained, it can start from pure noise and “denoise” its way, step by step, to a brand new image matching whatever text prompt it is given.
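For the technically curious, the core training trick can be sketched in a few lines of Python. This is a deliberately minimal illustration under assumed simplifications (PyTorch as the library, a tiny network, and random pixels standing in for artwork), not how any production system is implemented:

    # A toy sketch of the diffusion training idea described above.
    # Real systems use enormous networks and billions of captioned images;
    # here a tiny network simply learns to predict the noise that was
    # mixed into random 8x8 "images".
    import torch
    import torch.nn as nn

    T = 1000                                  # number of noising steps
    betas = torch.linspace(1e-4, 0.02, T)     # how much noise each step adds
    signal = torch.cumprod(1 - betas, dim=0)  # how much of the image survives

    model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 64))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    for step in range(200):
        x0 = torch.rand(32, 64)                      # a batch of clean "images"
        t = torch.randint(0, T, (32,))               # a random timestep for each
        noise = torch.randn_like(x0)
        s = signal[t].unsqueeze(1)
        xt = s.sqrt() * x0 + (1 - s).sqrt() * noise  # the degraded image
        loss = ((model(xt) - noise) ** 2).mean()     # learn to predict the noise
        opt.zero_grad(); loss.backward(); opt.step()

    # (A real model is also told the timestep and the text prompt; generating
    # an image then runs this denoising in reverse, starting from pure noise.)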

Why is it important?

Every single part of this is worthy of a deep dive, but our concern (as technology and IP lawyers) is with the data sets used to train these models. Machine learning (the concept underpinning all modern AI) relies on vast amounts of data, frequently served up in the form of datasets made up of information “scraped” from the internet by “bots” (automated programs designed to carry out a specific task).

Stable Diffusion, for instance (along with competitors like Midjourney), has been trained on a dataset called LAION-5B. This is an openly accessible image/text database that contains URLs pointing to approximately 5.85 billion images, along with their associated captions/descriptions. At no point has LAION, the dataset’s compiler (a non-profit organisation with the aim of making it easier for the general public to research and develop machine learning models), asked the creators or copyright holders of the scraped images for permission. And herein lies the problem.
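To make the scraping step concrete, here is a rough Python sketch of how a bot might harvest image/caption pairs from a single web page. The function name and example URL are my own illustrations – this is the concept, not LAION’s actual code, and it assumes the requests and beautifulsoup4 packages:

    # Collect (image URL, caption) records from one page, LAION-style.
    # LAION's real pipeline parses Common Crawl's archive of the web at a
    # vastly larger scale, but the principle is the same.
    import requests
    from bs4 import BeautifulSoup

    def scrape_image_records(page_url):
        html = requests.get(page_url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        records = []
        for img in soup.find_all("img"):
            src, alt = img.get("src"), img.get("alt")
            if src and alt:  # keep only images that come with a caption
                records.append({"url": src, "caption": alt})
        return records

    # e.g. scrape_image_records("https://example.com/gallery")

Note that, just like LAION-5B itself, the resulting records contain the image’s URL and caption rather than a copy of the image – a detail that matters for the copyright analysis below.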

From a pure copyright law perspective, LAION has an argument that they haven’t done anything wrong (although this is going to be tested soon – as we’ll come to). Almost all copyright regimes allow a certain amount of “fair use” (or “fair dealing”, as the equivalent exceptions are known in New Zealand and other Commonwealth jurisdictions), and it is arguable that compiling a dataset for educational and research purposes is “fair”. In LAION’s case, it is unclear whether they would even be considered to have used the images at all, as the dataset they compiled just contains the URLs (the web addresses where the images can be found).

Furthermore, Midjourney and the like no longer need to rely on the dataset. The AIs have been trained, and don’t need to directly reference the images. Much like how I don’t need to reference an annotated picture of a 747 to know when I’m looking at a 747, Midjourney no longer needs to reference existing images to create new ones. If I ask it to create a “mid-century home in a lush valley with gravel driveway and a BMW E9”, it can create an entirely original image in a few seconds without copying an existing one:

Midjourney – released under a Creative Commons licence

Car nerds among you will know that the car pictured is most definitely not an E9, but it captures the essence of one, much like if someone tried to draw one from memory.

However, leaving aside technical arguments, these systems represent a major challenge to the fundamental purpose of copyright. Put simply, copyright laws recognise the value to society of creative output, and grant a creator a monopoly over their work, allowing them to monetise it and protect it from theft, derogatory treatment and misattribution. But Midjourney can take an artist’s style (provided they were included in the training dataset) and create a brand new image in seconds – possibly one that the artist would not want to be associated with. I asked Midjourney to draw Joseph Stalin in the style of the legendary Jack Kirby (I intentionally chose an artist who is no longer with us) and it made these in about 15 seconds:

Midjourney – released under a Creative Commons licence

Those will look pretty familiar if you have looked at a comic book in the last 50 years, but this is not exactly Mr Kirby’s usual subject matter (such as Captain America)!

Web cartoonist Sarah Andersen (of Sarah’s Scribbles) recently wrote in the New York Times about her struggles with her work being appropriated, and the ease with which AI image generators allow this to happen. It started with crudely photoshopped images swapping out the speech bubbles, but with these new services, entirely new frames can be generated using Sarah’s distinctive, black-and-white style. Her work was included in the LAION-5B dataset without her knowledge or permission.

The implications of this are profound, not only from an artistic integrity standpoint, but from a commercial one too. Where an artist with a reasonable online profile might previously have been commissioned to create a work, the potential customer can now, for a few bucks (or even a free trial), have a machine create a specific work “in the style of” the artist of their choosing. This has the potential to seriously dent the earnings of an entire industry. Add to this the implications of a world in which the quintessentially human practice of creating and consuming art is diluted by a glut of soulless, AI-created works, and the fun new apps start to look a little dystopian.

What now?

It turns out we may not have to wait too long to find out what the courts make of all this. Andersen, along with several other established artists, has now launched legal action against Midjourney and Stability AI (the company behind Stable Diffusion). They allege that, in using the LAION-5B dataset to train their AIs, these companies have infringed the rights of “millions of artists”, as the art was scraped without their consent. A similar suit has also been launched in relation to GitHub Copilot, an AI coding assistant that was trained on scraped code. These actions are being taken in the USA, but given the intertwined nature of global IP laws, the outcomes will reverberate across the world and are bound to influence how the New Zealand courts approach these sorts of issues.

What can I do?

In the meantime, if you are an artist wondering how you can protect yourself from the rise of the machines, there are a few things you can do. First, you can find out whether your work was included in the dataset by searching the “Have I Been Trained” website. There is not much you can do about it yet, but at least you’ll know. As I’ve discussed before, it is essentially impossible to “untrain” an AI, much like asking someone to forget something is unlikely to get you very far. Hopefully, as these offerings mature, artists will be able to request that the services block their name from being used as a prompt.

If you haven’t been included in the set, or are just embarking on your artistic career, there are some things you can do to prevent your work being included in a future set (while still maintaining an online presence):

  • Robots.txt  Make sure your website, and any website that hosts or sells your work, contains a robots.txt file. This is a tiny text file that sits at the root of a website and tells visiting bots which parts of the site they should not access (see the example after this list). It won’t physically stop a scraper, but legitimate scraping tools will respect it. Any hosting service worth their salt will know what this is, and will be able to deploy it in minutes. That said, a blanket ban may make your work harder to find on search engines.
  • Incorrect tags  The data sets are only useful if the images are paired with accurate descriptions. Deliberately mis-tagging your images may make your work harder to find, but it will also make the images far less useful for the purposes of AI training.
  • Watermarks  I know they are ugly, but they will limit how useful the image is as training material.
  • Terms of Use  If your website has Terms of Use (ideally with a “click to accept” before accessing the content) that make clear that scraping is not permitted, then you may have a contractual path to recourse should your work be scraped – although the enforceability of such terms, from both a legal and practical perspective, is not clear-cut.
  • Passing off and misleading conduct in trade  Where AI is used to recreate a specific artist’s style (particularly where this is for commercial gain) it will be worth considering whether this could be challenged on the basis that it constitutes passing off, or a breach of applicable consumer or fair trading legislation.
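
As promised above, here is a minimal example of what a robots.txt file can look like. The bot name and path are illustrative assumptions: CCBot is Common Crawl’s crawler (Common Crawl’s web archive is what LAION-5B was built from), and “/gallery/” stands in for wherever your images live:

    # Ask Common Crawl's bot to stay away entirely
    User-agent: CCBot
    Disallow: /

    # Ask all other bots to avoid the image gallery, while leaving the
    # rest of the site visible to search engines
    User-agent: *
    Disallow: /gallery/

Remember that robots.txt is a polite request rather than a technical barrier: well-behaved crawlers will honour it, but a determined scraper can simply ignore it.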

Clearly none of these options is ideal, so hopefully the industry (or the courts) can find solutions to these issues soon, as these will not be the last AI tools to be trained on scraped online data.
