Particle.news

Wikimedia Partners with Kaggle to Launch AI-Optimized Wikipedia Dataset

The newly released beta dataset offers machine-readable Wikipedia content to reduce server strain and improve access for AI developers.

Overview

  • The Wikimedia Foundation has partnered with Kaggle to release a beta dataset of structured Wikipedia content in English and French, designed for AI workflows.
  • The dataset includes abstracts, short descriptions, infobox data, image links, and segmented article sections, but excludes references and multimedia files.
  • Released under Creative Commons and similar open licenses, the dataset is meant to give smaller companies and independent researchers accessible, lawfully usable data.
  • The initiative seeks to deter unauthorized scraping by AI bots, whose traffic has significantly increased Wikimedia's server costs and disrupted access for human users.
  • Wikimedia will monitor the dataset's impact on reducing server strain while continuing to explore broader solutions to protect its open knowledge mission.
