Authors: Monica Stătescu (partner), Marius Gheldiu (associate)
The field of artificial intelligence (AI) has recently become entangled in legal disputes over copyright infringement. These disputes raise important questions about the limits on what may be used as AI training data and about the implications for copyright protection.
Following the major class-action lawsuit against Microsoft, its subsidiary GitHub, and OpenAI, which accuses the companies of engaging in “software piracy on an unprecedented scale” through the development of their AI-powered coding assistant, GitHub Copilot, another significant legal battle has emerged.
Authors Mona Awad and Paul Tremblay have recently filed a class-action lawsuit against OpenAI, the organization behind ChatGPT, claiming that their copyrighted works were unlawfully processed and used as training material for ChatGPT.
The authors argue that ChatGPT generated highly accurate summaries of their novels, which, in their view, shows that their copyrighted material was used without permission. To support this allegation, the complaint includes sample summaries produced by ChatGPT.
The lawsuit against OpenAI raises critical questions about the use of copyrighted works to train AI models. The dispute will likely test the boundaries of fair use as applied to AI training on copyrighted material, particularly in the context of large language models.
While authors enjoy legal protection for their copyrighted works, a press article published in The Guardian argues that the real challenge lies in establishing a direct financial loss attributable to AI systems trained on such material. Moreover, proving a causal link between AI-generated summaries and the authors’ financial harm may be a significant hurdle for the plaintiffs in building their case against OpenAI.
The implications of the ChatGPT lawsuit are not limited to the United States; they extend across the Atlantic to Europe, where copyright laws differ.
European countries have specific exceptions to copyright law for text and data mining (TDM), which allow copyrighted material to be used freely for certain purposes and under certain conditions. Specifically, Directive 2019/790 on copyright in the Digital Single Market introduced two new mandatory exceptions for text and data mining into EU law.
The first exception covers text and data mining carried out by research organizations and cultural heritage institutions for the purposes of scientific research. The second is much broader: it allows any entity to reproduce and use lawfully accessible works for the purposes of text and data mining, provided that the rights holders have not expressly reserved the use of those works and other protected subject matter.
As Professor Eleonora Rosati points out in her article Copyright as an Obstacle or an Enabler? A European Perspective on Text and Data Mining and its Role in the Development of AI Creativity, dated 26 September 2019, by expressly providing for the text and data mining exceptions, the EU legislator made clear that engaging in such activities without a licensing agreement, and outside the scope of the available exceptions, may give rise to liability for copyright infringement.
In this context, although the regulatory frameworks of the US and the European Union differ, the lawsuits over ChatGPT and GitHub Copilot demonstrate growing concern about AI’s use of copyrighted material and may offer valuable insights into this largely uncharted area. The outcomes of these legal battles therefore have the potential to shape how copyright law applies to AI, not only in the United States but also in Europe.
As AI continues to evolve, striking a balance between innovation and the protection of creators’ rights becomes increasingly important. The courts’ decisions in these cases will likely play a pivotal role in establishing guidelines and precedents for AI’s interaction with copyrighted works, affecting the development and deployment of AI systems worldwide.