
Eleuther AI releases 8TB collection of licensed and open training data – Computerworld


AI research organization Eleuther AI has released Common Pile v0.1, a massive text dataset that can be used to train AI systems, according to TechCrunch. The 8TB dataset consists exclusively of openly licensed texts and texts in the public domain.

Common Pile v0.1 was developed over two years in collaboration with Poolside, Hugging Face, the US Library of Congress and the University of Toronto, among others.

The dataset was released amid concerns that several generative AI (genAI) companies have used copyrighted material to train their models without the copyright owners' permission. Eleuther AI also created The Pile, an earlier collection that has become a focal point of that debate. With Common Pile v0.1, it now wants to show that models can be trained without unlicensed copyrighted material.

