
Speaker: Tristan Bepler (New York Structural Biology Center)

Title: Breaking scaling laws in protein language models with the protein evolutionary transformer (PoET)

Abstract: Protein language models (PLMs) are powerful tools for extracting information from large databases of natural protein sequences. By training on the manifold of natural proteins, these models learn structural and functional properties in an unsupervised manner, which makes them strong foundation models for protein structure and function prediction. However, these models become increasingly impractical as they scale to more parameters, must be retrained to incorporate new data, and generally lack controllability. In this talk, I'll discuss PoET (the Protein Evolutionary Transformer), a fully generative model of whole protein families as sequences-of-sequences. By reformulating protein language modeling as a family-level, rather than individual protein-level, generative problem, PoET learns to extract evolutionary signatures from extremely small numbers of example sequences in order to generate novel proteins and model fitness distributions. By controlling this sequence context, PoET can be prompted to generate proteins from any target distribution of interest while benefiting from general principles learned across the entire natural protein landscape. This enables PoET to achieve state-of-the-art performance for zero-shot variant effect prediction across deep mutational scanning and clinical datasets, without any structure conditioning. Homology-augmented PoET embeddings are state-of-the-art for transfer learning and protein function prediction, enabling accurate function predictors to be trained with 10x less data than alternative foundation models require. PoET is also an order of magnitude smaller than other transformer-based PLMs. PoET is open source on GitHub (https://github.com/OpenProteinAI/PoET) and is also available through the OpenProtein.AI web app, where it underpins our protein property prediction and design tools.

In person or on Zoom at https://mit.zoom.us/j/93513735220
