Bidirectional Encoder Representations from Transformers (BERT) is a language model introduced in October 2018 by researchers at Google. It learns to represent text as a sequence of vectors using self-supervised learning, and it uses the encoder-only transformer architecture. BERT dramatically improved the state of the art for large language models. As of 2020, BERT was a ubiquitous baseline in natural language processing experiments.
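The per-token vectors can be inspected directly. The following is a minimal sketch, assuming the Hugging Face Transformers library and its publicly released bert-base-uncased checkpoint (neither is specified in this article), showing that the encoder produces one hidden-state vector per input token.

```python
# Minimal sketch (assumed library and checkpoint, not from this article):
# map a sentence to a sequence of hidden-state vectors with BERT.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("BERT represents text as vectors.", return_tensors="pt")
outputs = model(**inputs)

# Shape: (batch size, number of tokens, hidden size); hidden size is 768
# for the base model, so each token is represented by a 768-dimensional vector.
print(outputs.last_hidden_state.shape)
```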
BERT is pre-trained on two self-supervised tasks: masked token prediction (masked language modeling) and next-sentence prediction.
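The masked token prediction objective can be illustrated with the same assumed library: the model fills in a [MASK] token using context from both directions. The fill-mask pipeline and checkpoint name below are assumptions for illustration, not part of this article.

```python
# Minimal sketch of masked token prediction: BERT predicts the token
# hidden behind [MASK] from its surrounding (bidirectional) context.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("The capital of France is [MASK]."):
    # Each candidate token comes with the model's probability for it.
    print(prediction["token_str"], round(prediction["score"], 3))
```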