Comedian and author Sarah Silverman, along with authors Christopher Golden and Richard Kadrey, has filed separate lawsuits against OpenAI and Meta in a US District Court, alleging copyright infringement. The lawsuits claim that OpenAI’s ChatGPT and Meta’s LLaMA were trained on illegally acquired datasets that contain their works. The authors state that these datasets were obtained from “shadow library” websites such as Bibliotik, Library Genesis, Z-Library, and others, where their books are available for download through torrent systems.
Golden and Kadrey have declined to comment on the lawsuit, and Silverman’s team has not yet responded to press inquiries. Exhibits in the OpenAI suit show ChatGPT summarizing the authors’ books when prompted, which the complaint says infringes their copyrights. Silverman’s book, “Bedwetter,” is the first example shown being summarized by ChatGPT, followed by Golden’s book, “Ararat,” and Kadrey’s book, “Sandman Slim.” The complaint adds that the chatbot fails to reproduce any of the copyright management information included in the authors’ published works.
In the lawsuit against Meta, the authors allege that their books were included in the datasets used to train Meta’s LLaMA models. Meta introduced LLaMA, a set of open-source AI models, in February. The complaint provides a step-by-step explanation of why the plaintiffs believe these datasets have illicit origins. Meta’s own documentation on LLaMA points to sources for its training datasets, including one called ThePile, which was assembled by EleutherAI. ThePile, as mentioned in an EleutherAI paper, was created from a copy of the contents of the Bibliotik private tracker. The lawsuit argues that Bibliotik and other similar “shadow libraries” are blatantly illegal.
Both lawsuits claim that the authors did not give consent for their copyrighted books to be used as training material for the companies’ AI models. The lawsuits include six counts each, covering various types of copyright violations, negligence, unjust enrichment, and unfair competition. The authors are seeking statutory damages, restitution of profits, and more.
Attorneys Joseph Saveri and Matthew Butterick, who represent the three authors, have expressed concerns about ChatGPT’s ability to generate text similar to copyrighted materials. They say they have heard from other writers, authors, and publishers who share these concerns. Saveri and Butterick are also representing authors Mona Awad and Paul Tremblay in a similar case involving OpenAI’s chatbot, and Saveri has previously filed litigation against AI companies on behalf of programmers and artists. Separately, Getty Images has filed an AI lawsuit of its own, accusing Stability AI of training its model on millions of images protected by copyright.
These lawsuits not only pose challenges for OpenAI and other AI companies but also test the boundaries of copyright law. The Vergecast has previously discussed the likelihood that copyright infringement lawsuits in the AI field will continue for years to come.
Requests for comment were sent to Meta, OpenAI, and the Joseph Saveri Law Firm, but none had responded by the time of publication.