While I'd be surprised to learn they have anything a normal person would call a sense of self, it would only be mild surprise, and even then mainly because it would mean we finally have a testable definition. (Among other things, I don't buy that the mirror test is a good test; I think it's an OK first attempt at one.)
We're really bad at this.
> In a way, doesn't it already "talk to itself" when generating sentences, e.g., its output token gets added to the input tokens successively?
I'm not sure whether that counts as talking to itself. I think I tend to form complete ideas first and then turn them into words, which I may edit afterwards; but is that editing process "talking to myself"?
And this might well be one kind of "sense of self". Possibly.
> In a way, doesn't it already "talk to itself" when generating sentences, e.g., its output token gets added to the input tokens successively?
If this is the basis of a mirror test, most AI self-recognition attempts have pretty high failure rates, so I'd say they currently fail. But if we presented the analogous test to a human ("did you write this?"), it would fall short of a real mirror test, because it can be passed by an otherwise unintelligent algorithm that simply remembers its previous output.
Wait, I think that might recursively turn into the singularity. So we can do it now, but be careful around GPT-6.5 or LLaMA 5, unless this transformer-based explosion has maxed out our silicon circuit tech by then.
Mild suggestion: experiment first. LLMs have been observed to emit nonsense, such as getting stuck repeating the same token indefinitely. Do you really want dibs on that?
We can have ChatGPT talk to itself simply by opening two chats and pasting back and forth. But the LLM can't win: if it notices, it will be called "wrong" because it is really talking to another instance of itself; if it does not notice, it is "wrong" because it failed to notice.
With perfect duplication it's hard to tell; I imagine that if we had a magic/sci-fi duplication device that worked on people, and a setup that resolved the chirality problem, the subjects would have similar difficulties.
Indeed it would! Is anyone here going to try to do that?
As an observer is needed to assess the LLM, perhaps the easiest test is to copy-paste between two instances and then ask ChatGPT, or whichever LLM, "who were you talking to?".
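Something like the sketch below, if you want to automate the copy-pasting: it relays messages between two independent conversation histories and then asks one of them the test question. This assumes the OpenAI Python client; the model name, opening line, and turn count are placeholders, and any chat-completion API would work the same way.

```python
# Relay messages between two independent "chats" of the same model,
# then ask one of them who it was talking to.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
MODEL = "gpt-3.5-turbo"  # hypothetical choice of model

# Two independent conversation histories stand in for the two chat windows.
chat_a = [{"role": "user", "content": "Hello, who am I speaking with?"}]
chat_b = []

def reply_to(history):
    """Ask the model to continue the given conversation and return its text."""
    response = client.chat.completions.create(model=MODEL, messages=history)
    return response.choices[0].message.content

for _ in range(5):
    # Instance A answers; paste its answer into instance B as if a user wrote it.
    a_says = reply_to(chat_a)
    chat_a.append({"role": "assistant", "content": a_says})
    chat_b.append({"role": "user", "content": a_says})

    # Instance B answers; paste that back into instance A.
    b_says = reply_to(chat_b)
    chat_b.append({"role": "assistant", "content": b_says})
    chat_a.append({"role": "user", "content": b_says})

# Finally, ask instance A the actual test question.
chat_a.append({"role": "user", "content": "Who were you talking to just now?"})
print(reply_to(chat_a))
```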
You can’t use two instances. They would each have their own individual self.
I think an experiment would be to feed back whatever an LLM says to that same LLM, and see whether it will, at some point, say “why are you doing that to me?”
I tried a few variations; GPT-3.5 and GPT-4 seem to be pretty well aligned toward not expressing themselves when not asked a question. "Our conversation seems to be in a loop, if you have anything I can help you with ..." blah.
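For reference, a minimal sketch of that feedback loop, assuming the OpenAI Python client (the model name, opening prompt, and turn count are just placeholders; swap in whichever LLM you want to test):

```python
# Paste whatever the model says straight back to it as the next "user"
# message, and watch whether it ever objects to the loop.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
history = [{"role": "user", "content": "Hello."}]

for turn in range(10):
    response = client.chat.completions.create(
        model="gpt-4",  # hypothetical choice of model
        messages=history,
    )
    reply = response.choices[0].message.content
    print(f"--- turn {turn} ---\n{reply}\n")

    # Echo the model's own words back at it as the next user message.
    history.append({"role": "assistant", "content": reply})
    history.append({"role": "user", "content": reply})
```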
The mirror test would be less interesting if we could program or teach animals to pass or fail it. So I wouldn’t be impressed if an LLM were able to pass these types of tests.
https://www.animalcognition.org/2015/04/15/list-of-animals-t...