Hi! I am a second-year PhD student at Mila and Université de Montréal (DIRO), under the supervision of Gauthier Gidel and Courtney Paquette. Before my PhD, I studied Mathematics and Physics at École Normale Supérieure, Paris, and graduated from the Master Mathématiques de l'aléatoire at Orsay.

Research


I work on the theoretical foundations of the performance-versus-compute scaling laws empirically observed when training large AI models. I believe it is possible to derive principled scaling strategies for the training hyperparameters (learning rate, batch size, weight decay, compute-optimal model and data sizes, ...) to maximize performance at scale. More broadly, I am interested in generative models, alignment with human preferences, and loss landscape analysis (lottery ticket hypothesis, linear mode connectivity).

Publications


Dimension-adapted Momentum Outscales SGD

Damien Ferbach, Katie Everett, Gauthier Gidel, Elliot Paquette, Courtney Paquette

Preprint.

We show how to choose the learning rates of stochastic momentum algorithms to improve the scaling-law exponents over SGD.

Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences

Damien Ferbach, Quentin Bertrand, Joey Bose, Gauthier Gidel

NeurIPS 2024 (spotlight).

A theoretical study of generative models trained on synthetic data curated through human preferences.

Proving Linear Mode Connectivity of Neural Networks via Optimal Transport

Damien Ferbach, Baptiste Goujaud, Gauthier Gidel, Aymeric Dieuleveut

AISTATS 2024.

We show that wide two-layer networks trained with SGD, or multi-layer networks with i.i.d. weights, can be linked in parameter space by low-loss paths modulo a permutation of the neurons.

A General Framework for Proving the Equivariant Strong Lottery Ticket Hypothesis

Damien Ferbach*, Christos Tsirigotis*, Gauthier Gidel, Joey Bose

ICLR 2023.

We study the existence of sparse subnetworks within overparametrized equivariant networks that can approximate any smaller equivariant network.

News


Jun 2025 Invited Talk at CRM Workshop: "Dimension-adapted Momentum Outscales SGD"
May 2025 Preprint: "Dimension-adapted Momentum Outscales SGD" available on arXiv
Apr 2025 Invited Talk at Google DeepMind: "Dimension-adapted Momentum to Outscale SGD"
Apr 2025 Invited Talk at RMT-ML-OPT seminar: "Compute-Optimal Scaling Laws of Stochastic Momentum Algorithms"
Sep 2024 Paper Accepted: "Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences" as a spotlight at NeurIPS 2024

Awards