About this Seminar

The success of deep learning has hinged on learned functions dramatically outperforming hand-designed functions for many tasks. However, we still train models using hand-designed optimizers acting on hand-designed loss functions. I will argue that these hand-designed components are typically mismatched to the desired behavior, and that we can expect meta-learned optimizers to perform much better. I will discuss the challenges and pathologies that make meta-training learned optimizers difficult. These include chaotic and high-variance meta-loss landscapes; extreme computational costs for meta-training; a lack of comprehensive meta-training datasets; challenges in designing learned optimizers with the right inductive biases; and challenges in interpreting the method of action of learned optimizers. I will share solutions to some of these challenges. I will show experimental results where learned optimizers outperform hand-designed optimizers in many contexts, and I will discuss novel capabilities that are enabled by meta-training learned optimizers.



Jascha Sohl-Dickstein is a principal scientist at Google DeepMind. He is most (in)famous for inventing diffusion models. His recent work has focused on the theory of overparameterized neural networks, meta-training of learned optimizers, and understanding the capabilities of large language models. Before working at Google, Jascha was a visiting scholar in Surya Ganguli's lab at Stanford University and an academic resident at the Khan Academy education nonprofit. He earned his PhD in 2012 at the Redwood Center for Theoretical Neuroscience at UC Berkeley, in Bruno Olshausen's lab. Prior to his PhD, he worked on sending rovers to Mars.

Seminar Details
Seminar Date
Thursday, October 12, 2023
12:00 PM - 1:00 PM
Happening As Scheduled