The interpretation this paper offers is very questionable.
It studies a so-called "replacement model" as a stand-in, because that model has a different architecture than common LLMs and lends itself to observing certain "activation" patterns.
Then it liberally labels the patterns observed in this replacement model with words borrowed from psychology, neuroscience, and cognitive science. It's all very fanciful, and clearly designed to be pointed at as evidence of something deeper or more complex than what LLMs plainly do: statistical modelling of language.
That said, calling LLMs "next token predictors" is a bit of a cynical take, because that would be like calling a game engine a "pixel color processor". Simplistic, yes. But the polar opposite, spraying the explanation with convoluted inductive reasoning, is just as bereft of substance.
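To be fair, the reductive view is literally true at the interface: generation is just a loop of forward passes, each emitting a distribution over the next token. Here's a minimal sketch of that loop (assuming the Hugging Face transformers API, with GPT-2 as a stand-in model and greedy decoding for brevity). The real debate is about what happens inside each forward pass, not about this outer loop, just as the interesting part of a game engine isn't the pixel buffer it ultimately writes.

```python
# A minimal sketch of the "next token predictor" view.
# Assumes Hugging Face `transformers`; GPT-2 is just an illustrative stand-in.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
for _ in range(5):
    logits = model(ids).logits           # one forward pass over the sequence
    next_id = logits[0, -1].argmax()     # greedy: take the most likely next token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))
```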