
My guess is this is an artifact of the RLHF part of the training. Answers like "I don't know" or "let me think about it and let's catch up on this next week" get rated down by human testers, which eventually trains the LLM to avoid that path altogether. And it probably makes sense, because otherwise "I don't know" would come up way too often even in cases where the LLM is perfectly able to give the answer.
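
A toy sketch of how that bias could arise from preference data (my own illustration in Python with made-up rater labels, not any lab's actual pipeline): if raters almost always prefer a confident answer over an abstention, a reward model fit to those labels learns that "I don't know" is usually the losing response, and RLHF pushes the policy away from ever saying it.

    from collections import Counter

    # Hypothetical (chosen, rejected) response pairs as a rater might label them.
    preference_pairs = [
        ("Paris is the capital of France.", "I don't know."),
        ("Use a hash map for O(1) lookups.", "I don't know."),
        ("I don't know.", "The moon is made of cheese."),  # abstention only wins against nonsense
    ]

    # Count how often each response ends up on the losing side of a comparison.
    rejected_counts = Counter(rejected for _, rejected in preference_pairs)
    print(rejected_counts)
    # Counter({"I don't know.": 2, 'The moon is made of cheese.': 1})
    # A reward model trained on labels like these rates abstention poorly,
    # so the tuned model is nudged toward always producing an answer.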


I don't know, that seems like a fundamental limitation. LLMs don't have any real ability to reflect on their own knowledge or abilities.


Humans aren't very aware of their limits, either.

Even the Dunning-Kruger effect is, ironically, widely misunderstood by people who are unreasonably confident about their knowledge.


But you do know whether you have ever heard of call-by-name or call-by-value semantics.
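
For anyone who hasn't run into the distinction: roughly, call-by-value evaluates an argument once before the call, while call-by-name re-evaluates the argument expression each time the callee uses it. A rough Python illustration, simulating call-by-name with a thunk since Python itself evaluates arguments eagerly:

    calls = 0

    def expensive():
        # Stand-in for an argument expression with a visible side effect.
        global calls
        calls += 1
        return 21

    def by_value(x):
        return x + x                  # the argument was evaluated once, at the call site

    def by_name(make_x):
        return make_x() + make_x()    # the expression is re-evaluated on each use

    print(by_value(expensive()), calls)          # 42 1
    print(by_name(lambda: expensive()), calls)   # 42 3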


Have you only ever seen people get upset about technical jargon they know they don't understand, and never seen people misuse jargon wildly?

The latter in particular is how I model the mistakes LLMs make, what with them having read most things.


Yes, Dunning and Kruger's paper never actually found what popular science calls the "Dunning-Kruger effect".

Effectively, what they found wasn't a real effect so much as a statistical artifact, essentially regression toward the mean in the self-assessment data.




