
That isn't the way they work today. LLMs can easily find errors in outputs they themselves just produced.

Start adding different prompts, different models and you get all kinds of ways to catch errors. Just like humans.
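
As a rough sketch of what that cross-checking could look like (the `ask_model` helper below is hypothetical, standing in for whatever chat-completion API you actually call):

    # Hypothetical sketch: verify one model's answer with a second prompt/model.
    def ask_model(model: str, prompt: str) -> str:
        """Placeholder for a real chat-completion call; not a real library function."""
        raise NotImplementedError

    def cross_check(question: str) -> tuple[str, str]:
        answer = ask_model("model-a", question)
        critique = ask_model(
            "model-b",
            f"Question: {question}\nProposed answer: {answer}\n"
            "List any errors in the proposed answer, or reply OK.",
        )
        return answer, critique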



I don’t think LLMs can easily find errors in their output.

There was a recent meme about asking LLMs to draw a wineglass full to the brim with wine.

Most really struggle with that instruction. No matter how much you ask them to correct themselves they can’t.

I’m sure they’ll get better with more input but what it reveals is that right now they definitely do not understand their own output.

I’ve seen no evidence that they are better with code than they are with images.

For instance, if the time to complete only scales with the number of tokens and not with the complexity of their contents, then it's probably safe to assume the content isn't being comprehended.


> LLMs can easily find errors in outputs they themselves just produced.

No. LLMs can be told that there was an error and produce an alternative answer.

In fact LLMs can be told there was an error when there wasn't one and produce an alternative answer.



https://chatgpt.com/share/672331d2-676c-8002-b8b3-10fc4c8d88...

In my experience, if you confuse an LLM by deviating from the "expected", then all the shims of logic seem to disappear, and it goes into hallucination mode.


Try asking this question to a bunch of adults.


Tbf that was exactly my point. An adult might use 'inference' and 'reasoning' to ask clarification, or go with an internal logic of their choosing.

ChatGPT here went with a lexicographical order in Python for some reason, and then proceeded to make false statements from false observations, while also defying its own internal logic.

    "six" > "ten" is true because "six" comes after "ten" alphabetically.
No.

    "ten" > "seven" is false because "ten" comes before "seven" alphabetically.
No.
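
For reference, Python's actual lexicographic comparison (character by character, by code point) contradicts both of those explanations:

    >>> "six" > "ten"      # 's' (115) sorts before 't' (116), so False
    False
    >>> "ten" > "seven"    # 't' sorts after 's', so True
    True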

From what I understand of LLMs (which - I admit - is not very much), logical reasoning isn't a property of LLMs, unlike information retrieval. I'm sure this problem can be solved at some point, but a good solution would need development of many more kinds of inference and logic engines than there are today.


Do you believe that the LLM understands what it is saying and is applying the logic that you interpret from its response, or do you think it's simply repeating similar patterns of words it's seen associated with the question you presented it?


If you take the time to build an (S?)LM yourself, you'll realize it's neither of these. "Understands" is an ill-defined term, as is "applying logic".

But an LLM is not "simply" doing anything. It's extremely complex and sophisticated. Once you go from tokens into high-dimensional embeddings... it seems these models (with enough training) figure out how all the concepts go together. I'd suggest reading the word2vec paper first, then think about how attention works. You'll come to the conclusion these things are likely to be able to beat humans at almost everything.
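
If you want to see the word2vec point concretely, here's a minimal sketch, assuming gensim and its pretrained Google News vectors are available (a large download):

    import gensim.downloader as api

    vectors = api.load("word2vec-google-news-300")  # pretrained word2vec embeddings
    # Vector arithmetic recovers analogies: king - man + woman ~ queen
    print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))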


You said humans are machines that make errors and that LLMs can easily find errors in outputs they themselves produce.

Are you sure you wanted to say that? Or is it the other way around?


Yes. Just like humans. It's called "checking your work" and we teach it to children. It's effective.


> LLMs can easily find errors in outputs they themselves just produced.

Really? That must be a very recent development, because so far this has been a reason for not using them at scale. And no one is.

Do you have a source?


Lots of companies are using them at scale.



