Overview
- Google published model weights, code, evaluation scripts, and privacy-accounting tools, with downloads available on Hugging Face and Kaggle.
- VaultGemma applies differential privacy at the sequence level to curb memorization and reduce the risk of reproducing training data.
- Accompanying scaling laws and optimization guidance for differentially private training recommend smaller models trained with much larger batch sizes to balance compute, privacy, and utility.
- Benchmark results show performance roughly comparable to older, similar-sized non-private models such as GPT-2, highlighting a current utility gap.
- Positioned for research rather than production use, the release targets safer experimentation on sensitive datasets across regulated sectors.
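The sequence-level privacy mechanism described above can be illustrated with a minimal DP-SGD-style sketch: each per-sequence gradient is clipped to a fixed norm before averaging, and calibrated Gaussian noise is added so no single sequence dominates the update. This is a toy numpy illustration under assumed hyperparameters (`clip_norm`, `noise_multiplier`), not VaultGemma's actual training code.

```python
import numpy as np

def dp_sgd_step(w, per_seq_grads, clip_norm=1.0, noise_multiplier=1.0,
                lr=0.1, rng=None):
    """One DP-SGD-style step (illustrative, not VaultGemma's implementation).

    Clips each per-sequence gradient to `clip_norm`, averages the clipped
    gradients, adds Gaussian noise scaled by `noise_multiplier`, and applies
    a plain SGD update. Larger batches shrink the effective noise per
    sequence, which is why DP training favors very large batch sizes.
    """
    rng = rng or np.random.default_rng(0)
    n = len(per_seq_grads)
    clipped = []
    for g in per_seq_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose norm exceeds the clipping bound.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    avg = np.mean(clipped, axis=0)
    # Noise standard deviation follows the usual sigma * C / batch_size scaling.
    noise = rng.normal(0.0, noise_multiplier * clip_norm / n, size=avg.shape)
    return w - lr * (avg + noise)

# Toy usage: two "sequences" with hand-written gradients.
w = np.zeros(3)
grads = [np.array([3.0, 4.0, 0.0]),   # norm 5.0 -> clipped to norm 1.0
         np.array([0.1, 0.0, 0.0])]   # norm 0.1 -> kept as-is
w_next = dp_sgd_step(w, grads, clip_norm=1.0, noise_multiplier=0.0, lr=1.0)
```

With the noise multiplier set to zero the step is deterministic, which makes the clipping behavior easy to check: the first gradient is rescaled to unit norm before averaging.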