Wikimedia Deutschland Launches Wikidata Embedding Project With Vector Search and MCP Support
The vectorized release gives LLMs a public, verifiable knowledge base designed for retrieval‑augmented generation.
Overview
- The database is now publicly available on Toolforge, with a developer webinar scheduled for October 9.
- The project applies vector-based semantic search to roughly 120 million Wikidata entries to power natural-language queries.
- Support for the Model Context Protocol lets AI systems connect directly to the dataset as a structured external source.
- Wikimedia Deutschland developed the project with Jina and IBM-owned DataStax, and leaders stressed an open, collaborative approach.
- The new resource goes beyond the earlier keyword and SPARQL tools, aiming to ground model outputs in editor-verified facts. It arrives as AI developers seek higher-quality data under growing legal pressure, exemplified by reports of Anthropic’s $1.5 billion settlement offer.
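
To illustrate how vector-based semantic search differs from keyword matching, here is a minimal Python sketch. It is not the project's implementation: the entry labels, the three-dimensional vectors, and the `search` helper are all hypothetical stand-ins. A real system would get its vectors from an embedding model and keep them in a vector database.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings. Real embeddings have hundreds of dimensions
# and are produced by a model, not written by hand.
ENTRIES = {
    "Albert Einstein": [0.9, 0.1, 0.0],
    "Marie Curie": [0.8, 0.3, 0.1],
    "Berlin": [0.0, 0.2, 0.9],
}

def search(query_vector, entries, top_k=2):
    """Rank entries by similarity to the query vector and return the top labels."""
    scored = [
        (cosine_similarity(query_vector, vector), label)
        for label, vector in entries.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [label for _, label in scored[:top_k]]

# Assumed stand-in for the embedding of a query such as "famous physicist".
query = [0.9, 0.1, 0.05]
print(search(query, ENTRIES))
```

The query shares no keywords with any entry label, yet the two scientist entries rank above the city because their vectors point in a similar direction. This is the property that allows natural-language queries to retrieve relevant items.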