Why AI Is Not Stealing Your Stuff: At Least Not In The Way You Think It Is…

Social media is often a miserable place. Twitter/X is now almost completely overrun by incels, Instagram is just for funny videos, Facebook is full of old people complaining about shit, and Threads, so promising when it started, has settled into a hybrid zone of shameless self-promotion and tribal allegiances along political lines. But there’s one thing the majority of users across all social media agree on: they hate AI.

The reasons given are a wildly mixed bunch of preconceptions, prejudices, misinformation and moral panic, reminiscent of the reactions to technologies and cultural forms as diverse as the novel, cinema, dance music, computers, cassette tapes, the iPod and media piracy.

The reasons for this AI hatred cluster around a number of issues, including plagiarism, moral rights, ethical limitations, aesthetic prejudice, environmental impacts, and speculative future applications of AI technologies.

Let’s just discuss the first one, as it’s the most pervasive. Visit social media and you’ll be bombarded with posts insisting that generative AI is essentially plagiarism. That’s the emotive term that gets used – plagiarism software – and it’s a very effective talking point.

Yet this argument is wrong. It’s not plagiarism. As I’ve argued before, AI is like an averaging machine: it takes the sample data and produces an averaged output of all the relevant images and videos suggested by a prompt, be it text, an image or a video.
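To make that analogy concrete, here’s a deliberately simplified toy sketch in Python. It is not how diffusion models literally work, but it captures the point: the ‘model’ keeps only statistics distilled from its training images, never the images themselves, and every output is a fresh sample drawn from those statistics.

```python
# Toy illustration of the 'averaging machine' analogy. A real generative
# model is far more sophisticated, but the principle is the same: training
# reduces the dataset to learned statistics, and generation samples from
# those statistics rather than retrieving any stored original.
import numpy as np

rng = np.random.default_rng(42)

# Stand-in 'training set': five tiny 4x4 greyscale images.
training_images = rng.random((5, 4, 4))

# 'Training': distil the dataset into a per-pixel mean and spread.
mean = training_images.mean(axis=0)
std = training_images.std(axis=0)

# 'Generation': sample a brand-new image from the learned statistics.
generated = np.clip(rng.normal(mean, std), 0.0, 1.0)

# The output matches none of the training images exactly.
for i, img in enumerate(training_images):
    print(f"identical to training image {i}? {np.array_equal(generated, img)}")
```

The final loop confirms that the generated image reproduces no training image, even though it is entirely shaped by them – which is exactly the distinction the plagiarism argument misses.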

Don’t just take my word for it. I asked Perplexity to write a summary of why generative AI isn’t the same as plagiarism. It very helpfully offered the following:

AI image generation is not the same as plagiarism for several key reasons:

  • “Novel creation: AI models synthesise new images based on patterns learned from training data, rather than directly copying existing works. The output is typically a new combination of elements, not an exact reproduction.”
  • “Transformation of inputs: AI systems process and transform their training data, learning abstract features and styles rather than storing and recalling specific images. The generated images are the result of complex statistical processes, not simple copying.”
  • “Broad vs targeted influence: AI training data typically includes millions of diverse images, so any single source has minimal influence. This differs from targeted copying in traditional plagiarism.”

Perplexity itself is an AI that rivals ChatGPT in terms of what it can do with text, but unlike ChatGPT it offers footnoted summaries of ideas, issues and arguments: it searches keywords online, collates the text and creates a report on your query, with references to the sources used and where they came from [complete with URLs].

Though I only asked Perplexity for the preceding summary, it also offered the following, unprompted:

  • “Ethical concerns remain about AI’s use of copyrighted training data without permission. As the technology evolves, so must our frameworks for evaluating originality, attribution and fair use in AI-generated content.”

The tangle of issues around AI, particularly the assertion that it’s plagiarism, gets caught up with other related practices such as web scraping [collecting data from public-facing websites, social media platforms and online archives for training sets], attribution and permission.
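For anyone who hasn’t seen it done, here’s a minimal sketch of what web scraping looks like in practice: fetch a public-facing page and harvest the image URLs it references. The page URL is a placeholder, and the sketch assumes the requests and beautifulsoup4 libraries are installed.

```python
# Minimal web scraping sketch: download one public page and list the
# images it references. Real scraping pipelines run loops like this
# across millions of pages. The URL below is a placeholder.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

page_url = "https://example.com/gallery"  # hypothetical public page

response = requests.get(page_url, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

# Collect an absolute URL for every <img> tag on the page.
image_urls = [urljoin(page_url, img["src"])
              for img in soup.find_all("img") if img.get("src")]

print(f"Found {len(image_urls)} images:")
for url in image_urls:
    print(url)
```

Scale that loop up across millions of pages and you have, in essence, how training sets are assembled.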

In Australia, the US and elsewhere, web scraping is legal and has been the basis of numerous products, apps and software – from financial market analysis tools to mapping apps to applications such as Perplexity AI.

The issues expand when you consider the ethical question: even if it’s legal to scrape websites for training data, what do multibillion-dollar commercial entities owe the artists whose work their product is built upon? And if an artist can prove that their work has been used in the dataset, are they entitled to financial compensation?

You’ll notice that these questions are not directly related to the strict definition of plagiarism. They have more to do with the moral and ethical questions of creating a product from public-facing data.

There have been multiple cases around the world where courts have decided that someone using AI to create images, movies, designs and so on cannot claim copyright over the generated output, because they did not have enough input into the process to claim it as their own. The key issue was control, rather than process.

Again, this reflects a fundamental misunderstanding of the way generative AI creates its output, but as we’ve seen in the US, the courts are not exactly up to date with the technology, nor with the more arcane artistic arguments about what constitutes creativity.

Since March, when SORA was announced, a number of new apps have appeared that attempt to rival its ability to create incredibly convincing AI video from text and image prompts. These new apps, like LumaLabs, can produce some mind-bending weirdness as the initial prompt or image breaks down and the AI starts to ‘hallucinate’.

While all that’s been going on, MidJourney, the leading text-to-image generative AI, has introduced more control over output: users can now specify styles, stipulate the consistency of generated characters, and much more.

And just today RunwayML, SORA’s main competitor, launched Gen3 for wide use. While it’s still early days, the software will soon include controls for camera angles, movement, in-painting [the ability to control the animation of up to 5 in-picture elements], shot length and so on.

As the controls become stronger, the ‘AI is a tool’ argument gains even greater weight. If the person using the software can specify the outcome down to very fine details, and the images themselves are not plagiarised, how can the argument that AI is inherently illegitimate as art hold any credibility?

There is one other issue here that should be mentioned. In the process of sampling data from the open web, temporary copies are made of every image, video and audio element, and then discarded. This is in itself a clear breach of copyright law – the unauthorised transfer of copyright material from one medium to another, what in the olden days was called ‘mechanical rights’.
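To see why ‘temporary’ doesn’t mean ‘no copy’, here’s a small hypothetical sketch: before any processing or training can happen, the file’s bytes have to be materialised locally, however briefly. The URL is a placeholder.

```python
# Even a 'temporary' copy is still a copy: before an image can feed a
# training pipeline, its bytes must first exist locally in full.
import requests

image_url = "https://example.com/artwork.jpg"  # hypothetical source

response = requests.get(image_url, timeout=10)
data = response.content  # a complete copy of the file now sits in memory

print(f"{len(data)} bytes copied before any processing takes place")

del data  # discarding the copy afterwards doesn't undo its creation
```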

While the argument that generative AI outputs are plagiarism is incorrect, the inputs, the basis of the generative software, are indeed infringements of mechanical copyright. And artists, filmmakers and musicians have a case.

The issue then is compensation, a royalty perhaps, for every unwitting contributor to the training datasets upon which hugely profitable applications are built. Maybe a class-action suit will be required. But just remember: AI isn’t stealing your stuff; it has borrowed it, and they need to pay.