-
The Ceiling
Transformers (with bounded-precision attention) are provably confined to the TC⁰ complexity class, which means they cannot, by construction, solve inherently sequential problems such as reliable entity tracking or code execution. A new paper from UC Berkeley proposes M²RNN, a family of non-linear RNNs with matrix-valued states, which breaks through this mathematical ceiling while remaining efficient enough to train at the 7-billion-parameter scale.
-
Attention Residuals: The 11-Year Oversight
Residual connections have gone essentially unchanged since ResNet introduced them in 2015. Kimi's Attention Residuals paper argues they harbor a fundamental flaw, and fixes it with an elegant theoretical insight: a duality between network depth and time.