▸ Alongside writing educational posts, I’ll also share more refined brainstorming articles as part of an open-science effort to encourage collaboration. If any idea resonates with you and you’d like to explore it further, feel free to reach out!
▸ Also, check out Scale-ML, a student-led MIT organization focused on scaling in deep learning.
[Paper Notes] Improving Recurrent Models with Group Theory
Notes on some papers that use Householder transformations to enable state-tracking
Reuse Can Be Useful
Thoughts and mini-experiments on layer reuse in Transformers
[Paper Notes] Recurrent Networks and Test Time Training (TTT)
Notes on some papers that study how recurrent models perform a form of TTT...
Towards Self-Editing Models: Part 1
Models that self-update, refine, and grow in complexity over time introduce unique possibilities and challenges...
[Paper Notes] Model Merging
Techniques and challenges in merging multiple machine learning models into a cohesive system...
[Paper Notes] Mixture of Experts (MoE)
Notes on some papers that study MoEs...
[Paper Notes] Distances Between Subspaces
The Grassmann metric for measuring distances between subspaces...
[Paper Notes] Symmetries in Neural Networks
Understanding how symmetry in networks can improve optimization...
Discrete Optimal Transport
An exploration of discrete optimal transport methods and their applications in machine learning...
Euler-Lagrange Equation
The Euler-Lagrange equation and its significance in the calculus of variations...
[Paper Notes] Gumbel Softmax
An overview of the Gumbel-Softmax distribution and its utility in differentiable sampling...
Langevin Sampling
Exploring Langevin dynamics as a method for sampling from complex distributions...
Centered Kernel Alignment
How can we compare representations between networks?