
182 MEMORIAL DR, Cambridge, MA 02139

https://math.mit.edu/nmpde/

Speaker:  Borjan Geshkovski (MIT-Math)

Title:  A mathematical perspective on transformers

Abstract:

This talk reports on several results, insights, and perspectives that Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet, and I have obtained regarding Transformers. We model Transformers as interacting particle systems, in which each particle represents a token and the particles interact through a nonlinear coupling called self-attention. For pure self-attention Transformers, we show that the particles cluster in the long-time limit into different geometric configurations determined by spectral properties of the model weights. We also cover Transformers with layer normalization, which amounts to considering the interacting particle system on the unit sphere. On high-dimensional spheres, we prove that all randomly initialized particles converge to a single cluster. We sharpen this result by describing the phase transition between the clustering and non-clustering regimes. The appearance of metastability, and ideas for the low-dimensional regime, will also be discussed.
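For readers who want a concrete picture of the particle model on the sphere, here is a minimal sketch. It is not the speakers' code and it rests on several assumptions that are not in the abstract: the query, key, and value matrices are taken to be the identity, the coupling strength is a single hypothetical parameter `beta`, and time is discretized with an explicit Euler step followed by renormalization. The function names and default values are illustrative only.

```python
import math
import random


def normalize(v):
    """Rescale a vector to unit Euclidean norm."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_step(points, beta, dt):
    """One explicit Euler step of self-attention dynamics on the unit sphere.

    Assumption: identity query/key/value matrices, so particle x moves
    toward the softmax-weighted average of all particles.
    """
    new_points = []
    for x in points:
        logits = [beta * dot(x, y) for y in points]
        m = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(l - m) for l in logits]
        z = sum(weights)
        avg = [
            sum(w * y[k] for w, y in zip(weights, points)) / z
            for k in range(len(x))
        ]
        # Project the average onto the tangent space of the sphere at x.
        radial = dot(x, avg)
        tangent = [a - radial * c for a, c in zip(avg, x)]
        # Euler step, then renormalize to stay on the sphere.
        new_points.append(normalize([c + dt * t for c, t in zip(x, tangent)]))
    return new_points


def min_pairwise_inner_product(points):
    """Smallest inner product over all pairs; close to 1 means one cluster."""
    return min(
        dot(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    )


def simulate(n=8, d=16, beta=1.0, dt=0.1, steps=500, seed=0):
    """Random initialization on the sphere, then repeated attention steps."""
    rng = random.Random(seed)
    points = [normalize([rng.gauss(0.0, 1.0) for _ in range(d)]) for _ in range(n)]
    start = min_pairwise_inner_product(points)
    for _ in range(steps):
        points = attention_step(points, beta, dt)
    return start, min_pairwise_inner_product(points)
```

Calling `simulate()` returns the smallest pairwise inner product before and after the run. A final value near 1 means the particles have collapsed into a single cluster. The default dimension `d` is chosen larger than the number of particles `n` to stay in the high-dimensional setting the abstract mentions.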

 
