UK-LLM, NVIDIA Unveil Welsh-Reasoning AI Trained on Isambard-AI
Backed by government supercomputing and validated by Welsh-language experts, the open release targets public services.
Overview
- The bilingual model builds on NVIDIA’s Nemotron family, with Llama Nemotron Super (49B) and Nemotron Nano (9B) post-trained to reason in Welsh.
- To create enough Welsh training data, the team machine-translated more than 30 million entries using NVIDIA NIM microservices running gpt-oss-120b and DeepSeek-R1.
- Training ran on the government-backed Isambard-AI supercomputer using DGX Cloud Lepton and hundreds of GH200 Grace Hopper Superchips.
- Bangor University’s Canolfan Bedwyr, led by senior terminologist Gruffudd Prys, verified machine-translated data and evaluated Welsh-specific grammar and usage.
- The model and Welsh datasets are slated for open release to enterprise and public-sector users through providers including Nscale. The team plans to extend the approach to other UK and international minority languages.