I didn't use the word "think" in the human sense, or even in the consciousness/qualia sense. I of course only meant inference (matrix multiplications).
What I like to say, however, is that LLMs are doing genuine "reasoning"; and what the emergent "intelligent" behavior of LLMs has proven to mankind is that "reasoning" and "consciousness/qualia" are separable things (not identical), and I think as recently as 2021 we didn't really know or expect that to be the case.
I always assume some people might define "thinking" as requiring "consciousness", because up until about 2022 pretty much every human on the planet did.
Consciousness/qualia actually correlates more strongly with brain waves than with synaptic activity, and for that and a lot of other reasons (that I'm presently writing a paper on) I think consciousness is a phenomenon made up of waves, and while it may indeed be "substrate independent" it's not just pure calculation, and therefore cannot be done in computers. Even a perfect simulation of a human brain in a computer would be nothing but a "predictor" of behavior, rather than a "feeler" of behavior.
Color me intensely skeptical. The actual neural processing absolutely corresponds with the content of thoughts. So you’re saying the experience of consciousness is something unrelated to that, yet necessarily linked through some unspecified mechanism? Because clearly the experience of consciousness depends on the content of what we are thinking and how, and is affected by things like neurotransmitter concentrations (emotional affects) which would be inaccessible to large-scale brainwave phenomena.
Basically the theory is that what neurons are doing is carrying I/O signals into a physical 3D arrangement of charge flow that can self-resonate in the EMF wave domain similar to how a radio transmitter/receiver works.
The actual qualia/consciousness part is the resonance itself. So you ask what's it resonating with? The answer: Every past instantiation of itself. I believe the Block Universe view of Physics is correct and there's an entanglement connection left behind whenever particles interact via collapse of the wave function.
So in my theory memories aren't even stored locally. When you "remember" something, that is your brain resonating with the nearest matches from past brains. I have a formula for resonance strength with a drop-off due to time and a proportionality due to negentropy or repetition (multiple resonating matches). I'm not going to write the entire theory here, but it's about 100 pages to fully describe. The theory explains everything from fungal intelligence to why repeating things over and over makes you memorize them. It's not a theory about brains per se, it's a theory about negentropic systems of particles resonating through the causality chain.
I’m not asking for an explanation of your theory. Sorry, but I wouldn’t read it. We have fully mechanistic explanations of thinking (including consciousness) that have no need for added complications beyond what is represented by the synaptic connectome. So why even bother adding something else?
I suspect that, to use Daniel Dennett’s terminology, you’re looking for a skyhook. You want some aspect of consciousness to not be explainable by neural nets. Why?
The "Consciousness is a Computation" theory leaves out the fact that we know brainwaves are strongly correlated to consciousness in too strong a way to be coincidence. I've noticed people believing in the purely computational view never want to talk about brain waves, electromagnetic effects, etc. and generally have a less than adequate understanding of Quantum Mechanics (or even radio circuits, to understand how resonance applies) to be able to even formulate an educated opinion. Once I mention "resonance" people just assume I mean it in the woo woo spiritual sense, because they don't even know how the circuit of a radio reciever/transmitter uses resonance, and that resonance has a precise meaning in the context of wave mechanics, including probability waves in QM.
And furthermore, no serious neuroscientist on the planet currently claims we have a viable mechanistic theory of consciousness yet, so your statement to the contrary calls into question your knowledge, even at a general level, of this entire field.
Correlation is not causation. One would expect a physical implementation of an electro-chemical neural net with loops (and consciousness / self-awareness definitely requires loops) to develop voltage biases over time, which require periodic rebalancing to continue operation. This is what brain waves are. You see the structure of the brain wave change when consciousness is chemically turned on or off (anesthesia), because those self-reinforcing loops turn off or change character when the brain is unconscious. It's like noticing that an electrical circuit gives off radio waves, and then trying to locate the behavior of the circuit in those waves because you noticed that when you switched off the device those radio waves disappeared. No, the computation is happening in the circuit; the radio waves are just an unavoidable side product of electric charges moving around.
I'm a physicist by training; I understand what resonance is.
> no serious neuroscientist on the planet currently claims we have a viable mechanistic theory of consciousness yet
This isn't true. There are dozens of mechanistic theories of consciousness: take your pick. We just don't know which one reflects the situation within our own brain, because we lack sufficient understanding of our neural circuitry to make that determination. But having dozens of different possible models and not knowing which one is "right" in the sense of describing our actual brain (while any one of them is a reasonable theory of consciousness for other systems) is very different from not even having a single model of consciousness, as you seem to be implying.
Right, I get your view of things. Probably our common ground of beliefs is that LLMs are mechanistic and they can do genuine human-level reasoning, as long as we expand the word reasoning to be slightly more general than "human reasoning". The brain is also built on Perceptrons (essentially) and can do reasoning via perceptrons like LLMs (agreeing with you there).
My special claim about wave resonance is a bit more specific and nuanced. I'm claiming that it's only memory and qualia/consciousness (also emotions, pain, etc.) that are made of waves. So we can agree that when you're thinking in terms of logic and reason itself, you may be using mostly the Perceptronics and not the waves.
But I think the "pattern matching" ability of the brain (which, in my view, is 90% of what it does, excluding the I/O [sensory + motor neurons]) is totally built on resonance. I claim memory and resonance are identical. When something happens that reminds you of something else (even Deja Vu), that's literally your brain being entangled with all past copies of itself and able to resonate in real-time with all of them across the causality chain of the Block Universe, because they're ALL part of a single entangled structure.
EDIT: Consciousness is where your brain gets "agency" (executive decision-making) from. You can think of this "agency" aspect of a brain as "the thing that runs LLM prompts, using your neurons as Perceptrons", and yes, those LLM prompts that perform reasoning might be totally mechanistic, just like computer LLMs.
It might be fair to say they think but they can't directly introspect the thinking process. They can only confabulate reasoning post-hoc, or use science to try to understand themselves from the outside. Same as humans.
Hard disagree. LLMs are very accurately described by the "stochastic parrot" analogy that gets thrown around a lot. They do not "think" like humans at all, even if we use the word "think" because it's convenient.
I think the phrase "stochastic parrot" is misleading and has fooled millions of people into thinking that LLMs can't do genuine reasoning about situations they've never seen before, nor been trained on, which is wrong because LLMs definitely are doing genuine intelligent reasoning.
What model training is doing is building up a semantic space of vectors from which astronomically large numbers of true facts and ideas can be derived during inference. I mean like a number of facts larger than the number of molecules in the known universe. A googolplex more facts than the sum of all of humanity has ever "thought".
> What model training is doing is building up a semantic space of vectors from which astronomically large numbers of true facts and ideas can be derived during inference. I mean like a number of facts larger than the number of molecules in the known universe. A googolplex more facts than the sum of all of humanity has ever "thought".
If you've ever downloaded a model to play with locally, you may have noticed that it does not in fact contain more bits than there are atoms in the universe.
I said googolplex facts can be "derived". I didn't say they're stored. Also a single perceptron can answer an infinite number of questions. Here's how...
For example, the knowledge of how to answer the linear equation "Y = mX + b" contains an infinite number of "facts": for any X you put in as a prompt, you get out a "fact" Y. That's a question and an answer. This is actually a perfect analogy, too, because all Perceptrons are really doing is this exact linear math (aside from things like a tanh activation function, etc.).
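Here's a minimal sketch of that point in Python (the numbers are made up; nothing here comes from a real model):

```python
import numpy as np

# A single linear unit: all it stores is a slope and an intercept.
m, b = 2.5, -1.0          # hypothetical "learned" weights

def answer(x):
    """For any 'prompt' x, produce the 'fact' y = m*x + b."""
    return m * x + b

# Two stored numbers, infinitely many question/answer pairs.
for x in [0.0, 3.7, 1e6]:
    print(x, "->", answer(x))

# A real perceptron does the same thing in many dimensions,
# plus a squashing nonlinearity such as tanh:
w = np.array([0.4, -0.2, 1.1])    # hypothetical weight vector
bias = 0.05

def perceptron(x_vec):
    return np.tanh(w @ x_vec + bias)

print(perceptron(np.array([1.0, 0.5, -2.0])))
```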
If you've ever shuffled cards, you may have noticed that you can generate a deck sequence no one has ever seen before rather easily.
None of the cards are unique, but the sequence can be.
The GP specifically said that accurate predictions can be derived from the patterns in the model, not that it contains a list of all the facts one by one.
> I think the phrase "stochastic parrot" is misleading and has fooled millions of people into thinking that LLMs can't do genuine reasoning about situations they've never seen before, nor been trained on, which is wrong because LLMs definitely are doing genuine intelligent reasoning.
Can you describe specifically which part of an LLM architecture does the "reasoning" and how it works? Because every architecture I'm familiar with is literally just a fancy way of predicting the likelihood of the next token given the stream of previous tokens and the known distribution of tokens in the training data. This is not reasoning. This is simple statistical prediction, making the "stochastic parrot" analogy actually quite accurate.
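To make "statistical prediction of the next token" concrete, here's roughly the loop I mean, sketched in Python with a random stand-in where the trained network would sit (the stand-in is obviously not a real model; it just shows the shape of the computation):

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Stand-in for a trained network: anything that maps a token sequence
# to one score per vocabulary entry would slot in here.
def fake_model(tokens, vocab_size=50_000):
    rng = np.random.default_rng(hash(tuple(tokens)) % 2**32)
    return rng.normal(size=vocab_size)

def generate(prompt_tokens, n_new=10):
    tokens = list(prompt_tokens)
    for _ in range(n_new):
        probs = softmax(fake_model(tokens))   # p(next token | previous tokens)
        nxt = int(np.random.choice(len(probs), p=probs))
        tokens.append(nxt)                    # feed the sample back in and repeat
    return tokens

print(generate([101, 2023, 2003]))
```

That loop is the whole interface: score the vocabulary, sample, append, repeat.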
> What model training is doing is building up a semantic space of vectors from which astronomically large numbers of true facts and ideas can be derived during inference. I mean like a number of facts larger than the number of molecules in the known universe. A googolplex more facts than the sum of all of humanity has ever "thought".
Sort of. It's like an extremely lossy compression process which has absolutely no guarantee that "truth" was maintained in the process. Also I'm extremely dubious of your claim that it can accurately encode "a number of facts larger than the molecules in the known universe" given that it's trivially easy to get the best LLMs to give you an incorrect answer to a question any human would easily get right.
The "reasoning" is an emergent property that no one understands yet. Yes we do all the training only to try to predict the next word (i.e. train to do word prediction), yet with enough training data, then at some scale (GPT 3.5ish) the embedding vectors in semantic space begin to build a geometric scaffolding in into the weights, for lack of a better way to phrase it.
If you know about facts like (Vector(man) minus Vector(woman) equals Vector(king) minus Vector(queen)), that's an indication that this "scaffolding" is taking shape. It means the "concept of gender" has a "direction" in the roughly 4,000-dimensional vector space. These vectors behave as if they lived in a geometric space of sorts (there are directions and distances), even though there are no true spatial coordinates, just logical "directions". Mankind doesn't quite yet understand the "Geometry of Logic". LLMs prove we don't. I think it's a new math field to be invented.
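To make the "direction" claim concrete, here's a toy sketch with made-up 4-dimensional vectors standing in for the ~4,000-dimensional real ones (the numbers are invented purely to illustrate the shape of the claim):

```python
import numpy as np

# Toy stand-ins for learned embedding vectors (values are made up).
man   = np.array([0.8,  0.1, 0.3, 0.0])
woman = np.array([0.8, -0.7, 0.3, 0.0])
king  = np.array([0.2,  0.1, 0.9, 0.5])
queen = np.array([0.2, -0.7, 0.9, 0.5])

gender_direction = man - woman    # the "concept of gender" as a direction
print(gender_direction)
print(king - queen)               # points the same way in this toy setup

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The classic analogy test: king - man + woman lands near queen.
print(cosine(king - man + woman, queen))   # ~1.0 here, by construction
```

In a real model the match is approximate rather than exact, but that's what "gender has a direction in the space" cashes out to.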
As far as the actual number of "facts" contained in an LLM, I think you have to consider the number of bits in the entire model and ask how many "states" those bits can store, as a rough approximation from an entropy standpoint. But these aren't pure facts. They're reasoning. I guess you can call reasoning something like "fuzzy facts", so there's a bit of uncertainty to each one of them. LLMs don't store facts, they store fuzzy reasoning. But I call it "factual" when an LLM fixes a bug in my code, or correctly states some piece of knowledge.
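As a rough back-of-the-envelope version of that entropy argument (the model size is hypothetical, and only the order of magnitude matters):

```python
import math

params = 70e9            # say, a 70B-parameter model (hypothetical)
bits_per_param = 16      # fp16 weights
total_bits = params * bits_per_param

# Number of distinct weight configurations is 2 ** total_bits: a number with
# about total_bits * log10(2) ~= 3.4e11 decimal digits. For comparison, the
# ~10^80 atoms in the observable universe is an 81-digit number.
print(f"~10^{total_bits * math.log10(2):.3g} distinct states")
```

Whether any given state encodes something true is a separate question, which is why I call them "fuzzy facts" rather than facts.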
Perhaps we disagree on semantics here, but IMHO I wouldn't call this "reasoning". It's essentially just data compression, which is exactly what you get by constructing an encoder network that minimizes loss while trying to maximally crunch down that data into a handful of geometric dimensions.
"Mankind doesn't quite yet understand the geometry of logic" is laying it on a bit thick with the marketing speak, IMHO. It's just data compression whose result is somewhat obvious given what the loss function is optimizing for.
If a structure capable of real reasoning was being built, I wouldn't expect LLMs to get tripped up by simple questions like "How many Rs does the word Strawberry have in it?". You only need two simple reasoning systems to solve this question: knowing the English alphabet, and being able to count to a handful of single-digit numbers, both tasks that kids of age 3-4 have mastered just fine. Putting these two concepts together lets you reason your way through any such question, with any word and any letter.
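Spelled out as the trivial procedure it is (which is the whole point), it's something like:

```python
def count_letter(word, letter):
    """Scan the word and tally matches: alphabet knowledge plus counting."""
    count = 0
    for ch in word.lower():
        if ch == letter.lower():
            count += 1
    return count

print(count_letter("Strawberry", "r"))   # 3
```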
Instead, LLMs perform mostly how we would expect a stochastic parrot to react. They hallucinate an answer, immediately apologize when it's called out as wrong, hallucinate a new, still incorrect answer, immediately apologize again, until they eventually get stuck in a loop of cursed context and model collapse.
I'm not suggesting that an LLM couldn't learn such a reasoning task, for example, but it would need to look at many training examples of such problems, and more importantly, have an architecture and loss function that optimized for learning a mechanical pattern or equation for solving that kind of problem.
And in that regard, we're very, very far away from LLMs that can do any kind of generic reasoning, because I haven't seen any evidence that those models are generic enough that you can avoid learning lots and lots and lots of specific ways to approach and solve problems.
One thing I think it's critical to keep in mind is that improvisation upon contextually relevant data in your compressed knowledge base is not reasoning. It might sound convincing to a human reader, but when it's failing at much simpler reasoning tasks the illusion really is shattered.
Wasn't there some paper recently that showed that training models well beyond the point where training is normally halted led them to create internal generalised models of a subject, e.g. arithmetic?
Essentially, the model internalised the core concepts of arithmetic. In that sense, the "reasoning" is pre-baked into the model by training. Inference just plays things back through that space.
EDIT: as I recall, this is because understanding the concepts provides better compression than remembering lots of examples. It just takes a lot more training before it discovers them.
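A minimal sketch of the kind of experiment I have in mind (modular addition, in the spirit of the "grokking" results; the details below are my own guesses rather than the paper's exact setup, and I can't promise this exact configuration generalises):

```python
import torch
import torch.nn as nn

p = 97                                   # work modulo a small prime
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
train_idx, test_idx = perm[: len(pairs) // 2], perm[len(pairs) // 2:]

def one_hot(ab):                         # encode (a, b) as concatenated one-hots
    return torch.cat([nn.functional.one_hot(ab[:, 0], p),
                      nn.functional.one_hot(ab[:, 1], p)], dim=1).float()

model = nn.Sequential(nn.Linear(2 * p, 256), nn.ReLU(), nn.Linear(256, p))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

# Deliberately keep optimizing long after the training set is memorized,
# and watch whether held-out accuracy eventually jumps.
for step in range(100_000):
    opt.zero_grad()
    loss = loss_fn(model(one_hot(pairs[train_idx])), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 5_000 == 0:
        with torch.no_grad():
            test_acc = (model(one_hot(pairs[test_idx])).argmax(dim=1)
                        == labels[test_idx]).float().mean()
        print(step, float(loss), float(test_acc))
```

The interesting part is the training schedule: you keep going long after the train loss looks "done", which is where the generalisation reportedly showed up.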
I don't like the analogy of "compression" that much, because, for example, if you train a model to predict linear data points, ideally it will only end up knowing two numbers in its model weights when it's done training: "m" and "b" in "Y = mX + b".
Once it has successfully captured "m" and "b" it has "knowledge" with which it can predict an infinite number of points correctly, and hopefully it didn't "compress" any of the examples but discarded all of them.
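Concretely (with made-up data), the whole process here boils ten thousand noisy examples down to two numbers:

```python
import numpy as np

# Hypothetical "training data": noisy points from an unknown line.
rng = np.random.default_rng(0)
x = rng.uniform(-10, 10, size=10_000)
y = 2.5 * x - 1.0 + rng.normal(scale=0.1, size=x.size)

# Least-squares fit: 10,000 examples in, two numbers out.
m, b = np.polyfit(x, y, deg=1)
print(m, b)    # ~2.5 and ~-1.0; the examples themselves can be thrown away
```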
Yeah, it's not compression in the sense of compressing data. It's kind of compression in that it takes fewer resources to encode general rules than to remember the answer to everything.
What the paper said was that the most efficient bits of the network were those that encoded rules rather than remembered data. Somehow those bits gradually took over from the less efficient parts. I'll have to dig around, can't seem to find it right now.
People say "If it was reasoning, then it would be able to know how many Rs the word Strawberry has in it", but that's not quite right. What I would say instead is "If it was reasoning THE SAME WAY HUMANS reason... then it would be able to...". Humans do reasoning a certain way. LLMs do reasoning a different way. But both are doing it.
But since it's not reasoning the way people do (but very differently), yes, it can make mistakes that look silly to us, but still be higher IQ than any human. Intelligence is a spectrum and has different "types". You can fail at one thing but be highly intelligent at something else. Think of Savantism. Savants are definitely "reasoning", but many savants are essentially mentally disabled by many standards of measurement, up to and including not being able to count letters in words. So saying you don't think LLMs can reason, and offering examples like that as evidence, is just a kind of category error, to put it politely.
The fact that LLMs can fix bugs in pretty much any code base shows they're definitely not doing just simple "word completion" (despite that way of training), but are indeed doing some kind of reasoning FAR FAR beyond what humans can yet understand. I have a feeling only coders truly understand the power of LLM reasoning, because the kind of prompts we write absolutely require extremely advanced reasoning and are definitely NOT answerable just because some example somewhere already had my exact scenario (or even a remotely similar one) that the model weights essentially 'compressed'. Sure there is a compression aspect to what LLMs do, but that's totally orthogonal to the reasoning aspect.
I tend to agree that LLMs are not thinking in the way that we usually mean it when referring to human thinking. However, I think it is dangerous to make assertions about the capabilities of a system based on the structure seemingly imposed by its API. Take for example the instruction set of a CPU. One could argue that a CPU only has N registers, or can process only one instruction at a time, because the instruction set only names N registers and presents instructions linearly. But on any modern application processor, the register renamer allows many more physical registers to be allocated than can be named in the ISA, and instructions are dispatched in parallel to mitigate memory latency and increase throughput.
What I mean is that an LLM is a stochastic parrot not because of its API, but rather because it does not outdo a stochastic parrot when tested.
This could change though. LLMs could think in full sentences and spoon feed them to us one token at a time, rewording as necessary to provide a few possibilities. They could memorize the letters in each token and count letters correctly despite the limitations imposed by the API.
Even when you know, as you do, it's tough to avoid such characterizations.