I’m Jyo Pari, a PhD student at MIT, studying how models can continually learn through advances in architecture and optimization.
▸ Alongside writing educational posts, I’ll also share more refined brainstorming articles as part of an open-science effort to encourage collaboration. If any idea resonates with you and you’d like to explore it further, feel free to reach out!
▸ Also, check out Scale-ML, a student-led MIT organization focused on scaling in deep learning.

[Paper Notes] Improving Recurrent Models with Group Theory

Notes on some papers that use Householder transformations to enable state tracking

Mar 22, 2025  Author:  Jyo Pari   |   Editor:  N/A

Reuse Can Be Useful

Thoughts and mini experiments on layer reuse in Transformers

Mar 1, 2025  Author:  Jyo Pari   |   Editor:  N/A

[Paper Notes] Recurrent Networks and Test Time Training (TTT)

Notes on some papers that study how recurrent models perform a form of TTT....

Feb 1, 2025  Author:  Jyo Pari   |   Editor:  N/A

Towards Self-Editing Models: Part 1

Models that self-update, refine, and grow in complexity over time introduce unique possibilities and challenges....

Jan 19, 2025  Author:  Jyo Pari   |   Editor:  N/A

[Paper Notes] Model Merging

Techniques and challenges in merging multiple machine learning models into a cohesive system...

[Paper Notes] Mixture of Experts (MoE)

Notes on some papers that study MoEs...

Fall, 2023  Author:  Jyo Pari   |   Editor:   Minyoung (Jacob) Huh

[Paper Notes] Distances Between Subspaces

Grassmann Metric

[Paper Notes] Symmetries in Neural Networks

Understanding how symmetry in networks can improve optimization...

Discrete Optimal Transport

An exploration of discrete optimal transport methods and their applications in machine learning...

Euler-Lagrange Equation

The Euler-Lagrange equation and its significance in the calculus of variations...

[Paper Notes] Gumbel Softmax

An overview of the Gumbel-Softmax distribution and its utility in differentiable sampling...

Langevin Sampling

Exploring Langevin dynamics as a method for sampling from complex distributions...

Centered Kernel Alignment

How can we compare representations between networks...