The competition to lead in the field of A.I. has turned into a frantic search for the digital data necessary to advance the technology. Tech giants like OpenAI, Google, and Meta have resorted to questionable tactics, disregarding company policies and considering bending the law, as reported by The New York Times.
At Meta, the parent company of Facebook and Instagram, managers, lawyers, and engineers discussed last year whether to acquire Simon & Schuster to gain access to longer works, according to recordings of internal meetings obtained by The Times. They also explored collecting copyrighted data from across the internet, even if that meant facing lawsuits. Negotiating licenses with publishers, artists, musicians, and the news industry, they believed, would take too long.
Like OpenAI, Google reportedly transcribed YouTube videos to extract text for its A.I. models, potentially infringing on the copyrights of the videos’ creators, according to sources familiar with the company’s practices.
Google also expanded its terms of service last year, allowing the company to draw on publicly available content such as Google Docs and restaurant reviews on Google Maps for its A.I. products. The change was intended to enhance its A.I. technology, according to members of Google’s privacy team and an internal message seen by The Times.
These actions show how crucial online information has become to the development of A.I. systems. The A.I. industry relies heavily on a wide range of data sources, including news articles, works of fiction, user-generated content, and multimedia, to train the technology to generate text, images, sounds, and videos that mimic human creations.