9 April 07:08

New study claims OpenAI may have used copyrighted content in training its artificial intelligence models

Adrian Rusu


Photo: pixabay.com/ro

The research, conducted by teams from the University of Washington, Stanford University and the University of Copenhagen, describes a method for detecting whether AI models have "memorized" parts of their training data, which could point to copyright infringement. The method focuses on unusual words that stand out in literary texts, known as "high-surprisal" words. According to the results, GPT-4, one of OpenAI's models, appeared to have memorized parts of copyrighted fiction books, specifically from a dataset called BookMIA.
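The article does not spell out the study's exact procedure. As a rough illustration only, here is a minimal sketch of one simple reading of the idea: score each word's surprisal, -log2 p(w), under a reference word-frequency model, hide the most surprising word in a passage, and count how often a model fills it back in correctly. The add-one-smoothed unigram reference model, all function names, and the lookup "guesser" that stands in for an actual query to a language model are hypothetical assumptions, not details taken from the study.

```python
import math
from collections import Counter

def surprisal_scores(tokens, reference_counts, vocab_size):
    """Surprisal in bits, -log2 p(w), under an add-one-smoothed unigram model (assumption)."""
    total = sum(reference_counts.values())
    scores = {}
    for tok in tokens:
        p = (reference_counts.get(tok, 0) + 1) / (total + vocab_size)
        scores[tok] = -math.log2(p)
    return scores

def mask_highest_surprisal(tokens, reference_counts, vocab_size, mask="[MASK]"):
    """Replace the single most surprising word with a mask; return (masked, answer)."""
    scores = surprisal_scores(tokens, reference_counts, vocab_size)
    target = max(range(len(tokens)), key=lambda i: scores[tokens[i]])
    masked = tokens[:target] + [mask] + tokens[target + 1:]
    return masked, tokens[target]

def make_lookup_guesser(memorized_passages, mask="[MASK]"):
    """Hypothetical stand-in for querying a model: it 'remembers' exact passages only."""
    def guess(masked_tokens):
        for passage in memorized_passages:
            if len(passage) == len(masked_tokens) and all(
                m == mask or m == p for m, p in zip(masked_tokens, passage)
            ):
                return passage[masked_tokens.index(mask)]
        return None  # the guesser has no memory of this passage
    return guess

def memorization_rate(passages, guess_fn, reference_counts, vocab_size):
    """Fraction of passages whose masked high-surprisal word is guessed exactly."""
    hits = 0
    for tokens in passages:
        masked, answer = mask_highest_surprisal(tokens, reference_counts, vocab_size)
        if guess_fn(masked) == answer:
            hits += 1
    return hits / len(passages)

# Toy usage: "zeppelin" never appears in the reference text, so it is the most surprising word.
reference = "the cat sat on the mat the dog sat on the rug".split()
counts = Counter(reference)
vocab_size = len(counts) + 1  # +1 reserves probability mass for unseen words
passage = "the cat sat on the zeppelin".split()
masked, answer = mask_highest_surprisal(passage, counts, vocab_size)
print(masked, answer)  # ['the', 'cat', 'sat', 'on', 'the', '[MASK]'] zeppelin
```

The intuition behind targeting rare words is that common words can be guessed from context alone, while correctly restoring an unusual word is harder to explain without the model having seen the exact text. In this toy setup, a guesser that has "memorized" the passage scores 1.0 and one that has not scores 0.0.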

Sources

Control F5

New Study Suggests OpenAI Models Memorized Copyrighted Material

