Common Crawl, a nonprofit that archives billions of webpages, faces controversy for allegedly scraping paywalled articles that are then used for AI model training, benefiting companies like OpenAI and Google. Although the organization claims to collect only freely available content, investigations reveal that it has not complied with publishers’ requests to remove their articles. Rich Skrenta, Common Crawl’s executive director, argues that publishers should accept this practice as part of the evolving digital landscape. As a result, millions of news articles from major outlets are included in AI training datasets, raising ethical concerns about content ownership and access.
