Jyo Pari

I’m a PhD student at MIT, creating continually learning models. Feel free to reach out if you are interested in collaborating / chatting. Also, check out Scale-ML!

Email | Scholar | Twitter

Papers + Mini Research + Ideas

> [Mini Research] Optimizing the Optimization Trajectory Sept 28, 2025
> [Paper] RL's Razor: Why Online Reinforcement Learning Forgets Less Sept 4, 2025
> [Mini Research] Peculiarities of Mixture-of-Expert Optimization July 13, 2025
> [Paper] Self-Adapting Language Models Jun 23, 2025
> [Mini Research] Reuse Can Be Useful Mar 1, 2025
> [Paper] General Intelligence Requires Reward-based Pretraining Feb 26, 2025
> [Idea] Towards Self-Editing Models: Part 1 Jan 19, 2025
> [Paper] Collective Model Intelligence Requires Compatible Specialization Nov 4, 2024

Literature Review

> [Paper Notes] Understanding PaTH Attention Sept 16, 2025
> [Paper Notes] Improving Recurrent Models with Group Theory Mar 22, 2025
> [Paper Notes] Recurrent Networks and Test Time Training (TTT) Feb 1, 2025
> [Paper Notes] Model Merging 2024
> [Paper Notes] Mixture of Experts (MoE) Fall 2023

Technical Notes

> [Notes] Distances Between Subspaces
> [Notes] Symmetries in Neural Networks
> [Notes] Discrete Optimal Transport
> [Notes] Euler-Lagrange Equation
> [Notes] Gumbel Softmax
> [Notes] Langevin Sampling
> [Notes] Centered Kernel Alignment