The Atlantic has launched a searchable database cataloging millions of music tracks used to train artificial intelligence models, raising questions about copyright and ethical AI practices. In a detailed investigation, reporter Alex Reisner identified four large datasets notable for their scale and availability.
Two of the datasets contain 12 million and 9 million tracks, respectively; the other two are smaller but still hold more than 100,000 songs each. By making these collections searchable, the database sheds light on a little-examined corner of AI development.
Reisner's research shows that these datasets have been downloaded thousands of times. Major tech companies, including Google and Stability AI, have acknowledged using some of this music in their research, which indicates how widely the datasets are used to train AI models. Being available online does not make the music free to use, however: many songs come with legal restrictions. For instance, the Free Music Archive allows personal streaming but requires a license for any commercial application.
Three of the four datasets consist of links to songs hosted on popular platforms like YouTube and Spotify. To compile these datasets, AI developers often use automation tools that can bypass logins and ads, in violation of the platforms' terms of service. The practice raises ethical and legal concerns because it undermines the revenue that creators rely on for their work.
The artists featured in these datasets range from household names such as Lady Gaga, Radiohead, Wu-Tang Clan, and Bruce Springsteen to underground figures like experimental composer Hainbach. The public can explore the datasets on The Atlantic's dedicated AI Watchdog site, which lets users search the songs, books, and other media used to train AI.
The database reflects a growing tension in the music industry over copyright, ownership, and the future of creative expression as AI becomes more widespread. As the debate over AI and music continues, it serves as both a resource and a reminder of the difficulties facing artists and technologists alike.
Source: The Verge