Summary
To discourage AI developers from scraping its site, Wikipedia has released a structured dataset optimized for AI training in collaboration with Kaggle. The dataset, available in English and French, contains machine-readable article data designed for modeling and analysis; it omits references and non-written elements. The initiative aims to ease the server strain caused by automated bots and to improve data access for smaller companies and independent data scientists. Kaggle said it is excited to host the Wikimedia Foundation’s data.