Particle.news

Wikimedia Deutschland Launches Wikidata Embedding Project With Vector Search and MCP Support

The vectorized release gives LLMs a public, verifiable knowledge base designed for retrieval‑augmented generation.

Overview

  • The database is now publicly available on Toolforge, with a developer webinar scheduled for October 9.
  • The project applies vector-based semantic search to roughly 120 million Wikidata entries to power natural-language queries.
  • Support for the Model Context Protocol lets AI systems connect directly to the dataset as a structured external source.
  • Wikimedia Deutschland developed the effort with Jina and IBM-owned DataStax, with leaders stressing an open, collaborative approach.
  • The new resource improves on earlier keyword and SPARQL tools by grounding model outputs in editor-verified facts. It arrives as AI developers seek higher-quality data amid legal pressure, exemplified by reports of Anthropic’s $1.5 billion settlement offer.
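
For readers unfamiliar with vector-based semantic search, the idea behind the natural-language queries mentioned above can be sketched in a few lines of Python. This is a toy illustration, not the project's actual code or API: the three-dimensional vectors, the entry labels, and the `search` helper are all hypothetical stand-ins, and a real system would obtain its vectors from an embedding model and store them in a vector database.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-made embeddings. Real embeddings come from a model
# and have hundreds of dimensions; these exist only for illustration.
ENTRIES = {
    "Douglas Adams (English writer)": [0.9, 0.1, 0.0],
    "Berlin (capital of Germany)": [0.1, 0.9, 0.1],
    "Python (programming language)": [0.0, 0.2, 0.9],
}


def search(query_vector, entries, top_k=2):
    """Rank entries by similarity to the query vector, most similar first."""
    scored = [
        (cosine_similarity(query_vector, vector), label)
        for label, vector in entries.items()
    ]
    scored.sort(reverse=True)
    return [label for _, label in scored[:top_k]]


# Assumed stand-in for the embedding of a query such as
# "British science fiction author".
query = [0.8, 0.2, 0.1]
results = search(query, ENTRIES)
print(results)
```

The query never has to share a keyword with the entry it matches. Ranking depends only on how close the vectors are, which is what separates this approach from keyword lookup.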