Richard Flanagan, winner of the Booker Prize for Fiction, has described a recent incident as “the biggest act of copyright theft in the history of the world.” The incident may involve the theft of thousands of books by some of Australia’s most renowned authors.
According to the allegations, the works were pirated for the US-based Books3 dataset and used to train generative AI for businesses like Meta and Bloomberg.
Flanagan, who found 10 of his works on the Books3 dataset, including the 2013 novel The Narrow Road to the Deep North, which has won multiple international awards, told Guardian Australia that he was deeply surprised by the finding, made several days ago.
“I felt as if my soul had been strip mined and I was powerless to stop it,” he said in a statement.
Flanagan called it the most significant infringement of copyright in the history of the world.
The Australian Publishers Association confirmed to Guardian Australia on Wednesday that as many as 18,000 fiction and nonfiction titles with Australian ISBNs (unique international standard book numbers) appeared to be affected by the copyright infringement. However, it is not yet clear what proportion of these are Australian editions of books written by international authors.
“We’re still working through [the data] to work out the impact in terms of Australian authors,” said Stuart Glover, a spokesperson for the Australian Publishers Association (APA).
This presents a significant legal and ethical dilemma for the publishing industry as a whole as well as for individual authors all over the world.
A search tool published on Monday by the US media outlet The Atlantic and uploaded by the US Authors Guild on Wednesday revealed that works by Peter Carey, Helen Garner, Kate Grenville, Anna Funder, Christos Tsiolkas and Thomas Keneally, as well as Flanagan and dozens of other high-profile Australian authors, were included in the pirated dataset, which contains more than 180,000 titles.
The Australian Society of Authors said in a statement released on Thursday that it was “horrified” to find that the works of Australian writers were being used to train artificial intelligence without the authors’ permission.
“Authors appropriately feel outraged,” said Olivia Lanchester, the society’s chief executive. “Despite the fact that this technology is dependent on the books, journals, and essays written by authors, neither permission nor compensation was sought nor granted.”
Lanchester said that the Australian literary sector did not object to emerging technologies such as artificial intelligence (AI) in themselves, but was gravely worried about the lack of transparency in how global technology companies develop and monetise AI.
“Turning a blind eye to the legitimate rights of copyright owners threatens to diminish already precarious creative careers,” she added.
“The prosperity of a few large corporations comes at the expense of the work of hundreds upon thousands of independent creators. This is not how a competitive market operates at all.”
Josephine Johnston, the chief executive officer of Australia’s Copyright Agency, referred to the development of Books3 as “a free kick to big tech” at the expense of Australia’s creative and cultural life.
“Before people can truly understand what their legal rights may be,” she said, “we are going to need greater transparency – how these tools have been developed, trained, and how they operate.”
“It looks like we’re in this terrible situation now, where content owners – keeping in mind that the vast majority of them will be individual authors – may actually need to file lawsuits in order to enforce their rights.”
Australian copyright law protects the creators of original content against data scraping.
OpenAI, the creator of ChatGPT, is currently facing legal action in the United States over its use of two book datasets called Books1 and Books2 (which do not appear to be connected with Books3), both of which are believed to have been pirated.
The North American horror and fantasy authors Mona Awad (author of Bunny) and Paul Tremblay (author of The Cabin at the End of the World) filed a lawsuit in July in a federal court in San Francisco, claiming that ChatGPT improperly ingested their books as part of its AI training data and so violated their intellectual property rights.
On August 28, OpenAI submitted a motion to dismiss the lawsuit, arguing that the authors “misconceive the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models that are now at the forefront of artificial intelligence.”
On September 19, the Authors Guild and 17 of its members, including blockbuster novelists John Grisham, George RR Martin, and Jodi Picoult, filed a case in a New York district court against OpenAI, seeking restitution for “flagrant and harmful infringements” of guild members’ registered copyrights.
While the guild is aware that businesses like Meta and Bloomberg have used the Books3 dataset to train their large language models, it says in a statement published on its website that it is not yet known whether OpenAI is using Books3 to train its ChatGPT models GPT-3.5 or GPT-4.
Guardian Australia has contacted both OpenAI, which has not yet provided an official response to the guild’s complaint, and Meta for comment.
Wired, a US technology magazine, reported on September 4 that Bloomberg had informed a Danish anti-piracy organization called Rights Alliance that the corporation did not intend to train future versions of BloombergGPT using Books3.
Bloomberg declined to comment in response to media questions.
The Australian Publishers Association (APA) said that the international scope of the problem would make enforcement and punishment very difficult. In light of this, the APA has joined the authors’ society in advocating for the regulation of AI technologies.
The consultation period for the Department of Industry, Science and Resources’ discussion paper on promoting responsible artificial intelligence ended one month ago.
The Australian parliament is now conducting an inquiry into the potential applications of generative artificial intelligence in the country’s education system.
“It has power, and we do not,” he said.
“If it really cares about our culture, it needs to stand up for it right now and fight for it.”