See also my Google Scholar and Semantic Scholar pages.
Is Random Attention Sufficient for Sequence Modeling? Disentangling Trainable Components in the Transformer
Yihe Dong, Lorenzo Noci, Mikhail Khodak, and M. Li.
Preprint (2025). [arXiv]
Don’t be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey, Bin Claire Zhang, Lorenzo Noci, M. Li, Blake Bordelon, Shane Bergsma, Cengiz Pehlevan, Boris Hanin, and Joel Hestness.
NeurIPS (2025). [arXiv]
Analysis of Langevin Monte Carlo from Poincaré to Log-Sobolev
Sinho Chewi, Murat A. Erdogdu, M. Li, Ruoqi Shen, and Matthew Zhang (alphabetical).
Foundations of Computational Mathematics (2024); extended abstract at COLT (2022). [arXiv] [Journal] [Proceeding]
Sampling from the Mean-Field Stationary Distribution
Yunbum Kook, Matthew S. Zhang, Sinho Chewi, Murat A. Erdogdu, and M. Li.
COLT (2024). [arXiv] [Proceeding]
Differential Equation Scaling Limits of Shaped and Unshaped Neural Networks
M. Li and Mihai Nica.
TMLR (2024). [arXiv] [OpenReview]
Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit
Blake Bordelon, Lorenzo Noci, M. Li, Boris Hanin, and Cengiz Pehlevan.
ICLR (2024). [arXiv] [OpenReview]
The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit
Lorenzo Noci*, Chuning Li*, M. Li*, Bobby He, Thomas Hofmann, Chris Maddison, and Daniel M. Roy.
NeurIPS (2023). [arXiv] [Proceeding]
Improved Discretization Analysis for Underdamped Langevin Monte Carlo
Matthew Zhang, Sinho Chewi, M. Li, Krishnakumar Balasubramanian, and Murat A. Erdogdu.
COLT (2023). [arXiv] [Proceeding]
Riemannian Langevin Algorithm for Solving Semidefinite Programs
M. Li and Murat A. Erdogdu.
Bernoulli (2023). [arXiv] [Journal]
Student Research Presentation Award at SSC 2021.
The Neural Covariance SDE: Shaped Infinite-Depth-and-Width Networks at Initialization
M. Li, Mihai Nica, and Daniel M. Roy.
NeurIPS (2022), Oral. [arXiv] [Proceeding] [Code] [DL Foundations at UMD (Video)] [OPTML++ at MIT (Video)]
Acceleration of Gossip Algorithms through the Euler-Poisson-Darboux Equation
Raphaël Berthier and M. Li (alphabetical).
IMA Journal of Applied Mathematics (2022). [arXiv] [Journal]
The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization
M. Li, Mihai Nica, and Daniel M. Roy.
NeurIPS (2021). [arXiv] [Proceeding] [Code]
Higher Order Generalization Error for First Order Discretization of Langevin Diffusion
M. Li and Maxime Gazeau.
Preprint (2021). [arXiv]
* Equal contribution.
