
Google Releases T5Gemma 2, a Compact Multimodal Encoder-Decoder With 128K Context

Pre-trained checkpoints are available today across major platforms for developers to post-train before deployment.

Overview

  • The family ships in three compact encoder-decoder configurations suited to on-device use: 270M-270M (~370M total parameters, excluding the vision encoder), 1B-1B (~1.7B), and 4B-4B (~7B).
  • Tied encoder–decoder embeddings and a single merged layer for decoder self- and cross-attention reduce the parameter count and simplify the architecture for more efficient inference (a toy sketch of both ideas follows the list below).
  • A built-in vision encoder enables image-plus-text understanding for tasks such as visual question answering, and expanded training data covers more than 140 languages.
  • The models handle context windows of up to 128K tokens using Gemma 3’s alternating local and global attention (see the mask sketch after this list), with reported quality gains over both Gemma 3 and the original T5Gemma.
  • The paper is live on arXiv, and pre-trained checkpoints are available on Kaggle, Hugging Face, Colab, and Vertex AI (a loading sketch follows the list); Google is not releasing post-trained or instruction-tuned checkpoints.
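
The tied-embedding and merged-attention ideas can be illustrated in a few lines. The module below is a toy sketch, not T5Gemma 2’s actual implementation: one embedding table is shared between encoder, decoder, and output head, and a single attention call over the concatenation of decoder states and encoder outputs stands in for separate self- and cross-attention sublayers. All class names and dimensions here are made up for illustration.

```python
# Toy sketch of two T5Gemma 2 design ideas (not the real implementation):
# (1) tied encoder/decoder embeddings, (2) a decoder layer whose self- and
# cross-attention are merged into one attention call over [decoder; encoder].
import torch
import torch.nn as nn

class TiedSeq2Seq(nn.Module):
    def __init__(self, vocab_size=256, d_model=64, n_heads=4):
        super().__init__()
        # One embedding table serves both encoder and decoder (tied weights).
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Merged attention: one MultiheadAttention whose keys/values are the
        # concatenation of decoder states and encoder outputs, replacing
        # separate self-attention and cross-attention sublayers.
        self.merged_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        # Output projection tied to the embedding matrix as well.
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        self.lm_head.weight = self.embed.weight

    def forward(self, src_ids, tgt_ids):
        enc = self.encoder(self.embed(src_ids))   # (B, S, D)
        dec = self.embed(tgt_ids)                 # (B, T, D)
        kv = torch.cat([dec, enc], dim=1)         # (B, T+S, D)
        # Causal over the decoder part, fully visible over the encoder part.
        T, S = tgt_ids.size(1), src_ids.size(1)
        mask = torch.zeros(T, T + S, dtype=torch.bool)
        mask[:, :T] = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        out, _ = self.merged_attn(dec, kv, kv, attn_mask=mask)
        return self.lm_head(self.ffn(out + dec))

model = TiedSeq2Seq()
logits = model(torch.randint(0, 256, (2, 10)), torch.randint(0, 256, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 256])
```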
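
Gemma 3’s long-context recipe, which the 128K bullet above refers to, interleaves cheap sliding-window (local) attention layers with full (global) attention layers. The mask construction below is a minimal sketch under assumed toy settings; Gemma 3’s published recipe uses a much larger window and a different local-to-global ratio.

```python
# Minimal sketch of alternating local/global attention masks.
# Window size and the local:global pattern are toy values, not Gemma 3's.
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    """True where attention is ALLOWED: position i may see all j <= i."""
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

def local_mask(seq_len: int, window: int) -> torch.Tensor:
    """Causal sliding-window mask: i may see j in [i - window + 1, i]."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (j > i - window)

def layer_masks(n_layers: int, seq_len: int, window: int, global_every: int):
    """Every `global_every`-th layer gets full causal attention; the rest
    get the cheaper sliding-window mask."""
    return [causal_mask(seq_len) if (layer + 1) % global_every == 0
            else local_mask(seq_len, window)
            for layer in range(n_layers)]

masks = layer_masks(n_layers=6, seq_len=8, window=3, global_every=3)
print(masks[0].int())  # local layer: banded lower triangle
print(masks[2].int())  # global layer: full lower triangle
```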
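
Since only pre-trained checkpoints ship, a typical first step is to load one and verify generation before post-training. The snippet below is a hedged sketch using Hugging Face transformers Auto classes; the model id is a guess at the naming scheme, so confirm the actual id on the hub before running it.

```python
# Hedged sketch: load a pre-trained T5Gemma 2 checkpoint for post-training.
# The model id below is an assumed naming scheme -- confirm the real id on
# the Hugging Face hub. Only pre-trained (not instruction-tuned) checkpoints
# are released, so expect raw language-model behavior, not chat behavior.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/t5gemma-2-270m-270m"  # hypothetical id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("Translate to French: Hello, world.", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```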