Hacker News

You’re talking about ChatGPT 2.0 and severely underestimate the capabilities of today’s models.


It's well known that even current LLMs do not perform well on logic games when you change the names / language used.

E.g., try asking it to swap the meanings of the words "red" and "green", then have it describe the colors in a painting and analyse them with color theory. Notice how quickly the results degrade, often attributing "green" qualities to "red" objects simply because it's now calling them "green".
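To make the experiment reproducible, the swapped-vocabulary prompt can be built with a small helper. This is a hypothetical sketch (the function name and wording are mine, not from any library); the actual model call is elided, since results vary by model:

```python
def swap_prompt(painting_description: str) -> str:
    """Build the red/green word-swap prompt described above.

    The instruction contradicts associations learned at training time,
    which is why answers tend to attach "green" connotations to objects
    that are physically red but are now being called "green".
    """
    return (
        "From now on, swap the meanings of the words 'red' and 'green': "
        "call red things 'green' and green things 'red'.\n\n"
        f"Painting: {painting_description}\n\n"
        "Describe the colors in this painting using the swapped vocabulary, "
        "then analyse them with color theory."
    )

prompt = swap_prompt("A crimson sunset over an emerald meadow.")
```

Paste the resulting prompt into any chat model and compare its color-theory analysis with and without the swap instruction.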

What this shows is that training data (where the associations are formed) plays a significant role in the quality of answer an LLM can give, no matter how good your context is at overriding those associations. For "novel" work, training data matters more than context does.


Write "This sentence is green." in red sharpie and "This sentence is red." in green sharpie on a piece of paper. Show it to someone briefly, then hide it. Ask them what color the first sentence said it was, and what color the second sentence was written in.

Another one: ask a person to say 'silk' 5 times, then ask them what cows drink.

Exploiting such quirks only tells you that you can trick people, not what their capabilities are.


The point isn't that you can trick an LLM, but that its capabilities are tied more strongly to training data than to context. That is, when context and training disagree, training "wins" ("wins" isn't quite the right word, but hopefully you take the point).

This poses a problem for new frameworks/languages/whatever that do things in a wholly different way, since we'll be forced to rely on context that contradicts the available training data.


What is an example of a framework that does things in a wholly different way? Everything I'm familiar with is a variation on well-explored ideas from the '60s and '70s.

If you had someone familiar with every computer science concept, every textbook, every paper, etc. up to say 2010 (or even 2000 or earlier), along with deep experience using dozens of programming languages, and you sat them down to look at a codebase, what could you put in front of them that they couldn't describe to you with words they already know?


Even the differences between React and Svelte are big enough for this to be noticeable, and Svelte is actually present in the training data. Given the large amount of React training data, Svelte performs significantly worse (yes, even when given the full official Svelte llms.txt in the context).


But it doesn't pose a problem. You are extrapolating conclusions from things that aren't even correlated.

You started with "they can't understand anything new" and then followed it up with "because I can trick it with logic problems", which doesn't prove that.

Have you even tried doing what you say won't work?


If I make up a riddle and ask an LLM to solve it, it will perform worse than on a well-known riddle whose solution appears in its training data. That's just a foundational consequence of how they work.


Yes you can trick it.

But it's almost trivial for an LLM to generate every question-and-answer combo you could ever come up with from new documentation and new source code for a new framework. It doesn't need Stack Overflow anymore. It's already miles ahead.
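The synthetic-data idea above can be sketched roughly like this. Everything here is hypothetical (function name, prompt wording, and the example doc chunks are mine); the generation step itself is elided, and this only builds the prompts that would be sent to a model:

```python
def qa_prompts(doc_chunks):
    """Turn chunks of new framework documentation into prompts
    asking a model to write synthetic question/answer pairs."""
    for chunk in doc_chunks:
        yield (
            "Given this documentation excerpt, write question/answer pairs "
            f"a developer might ask about it:\n\n{chunk}"
        )

# Example chunks, as might come from a new framework's docs:
prompts = list(qa_prompts([
    "Runes are Svelte 5's reactivity primitives.",
    "Use $state() to declare reactive component state.",
]))
```

The resulting Q&A pairs could then be used for fine-tuning or retrieval, rather than waiting for Stack Overflow coverage to accumulate.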



