9 April 07:08
New study claims OpenAI may have used copyrighted content in training its artificial intelligence models
Adrian Rusu
Science IT&C
Photo: pixabay.com/ro
The research, conducted by teams from the University of Washington, Stanford and the University of Copenhagen, presents a method for detecting whether AI models have 'memorized' parts of their training data, which could indicate that copyrighted material was used. The method focuses on 'high-surprisal' words in literary texts, meaning words that are statistically unlikely in their context. The results suggest that GPT-4, one of OpenAI's models, has memorized parts of copyrighted fiction books, specifically passages included in a dataset called BookMIA.