Hacker News

You’re talking about ChatGPT 2.0 and severely underestimate the capabilities of today’s models.


It's well known that even current LLMs do not perform well on logic games when you change the names / language used.

E.g., try asking it to swap the meanings of the words "red" and "green", then have it describe the colors in a painting and analyse them with color theory. Notice how quickly the results degrade, often attributing "green" qualities to "red" objects simply because it's now calling them "green".
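To make the experiment reproducible, the swapped-vocabulary prompt can be built with a small helper. This is a hypothetical sketch (the function name and wording are mine, not from any library); the actual model call is elided, since results vary by model:

```python
def swap_prompt(painting_description: str) -> str:
    """Build the red/green word-swap prompt described above.

    The instruction contradicts associations learned at training time,
    which is why answers tend to attach "green" connotations to objects
    that are physically red but are now being called "green".
    """
    return (
        "From now on, swap the meanings of the words 'red' and 'green': "
        "call red things 'green' and green things 'red'.\n\n"
        f"Painting: {painting_description}\n\n"
        "Describe the colors in this painting using the swapped vocabulary, "
        "then analyse them with color theory."
    )

prompt = swap_prompt("A crimson sunset over an emerald meadow.")
```

Paste the resulting prompt into any chat model and compare its color-theory analysis with and without the swap instruction.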

What this shows is that training data (where the associations are formed) plays a significant role in the quality of answer an LLM can give, no matter how good your context is at overriding those associations. For "novel" work, training data matters more than context does.


Write "This sentence is green." in red sharpie and "This sentence is red." in green sharpie on a piece of paper. Show it to someone briefly, then hide it. Ask them what color the first sentence said it was, and what color the second sentence was written in.

Another one: ask a person to say 'silk' 5 times, then ask them what cows drink.

Exploiting such quirks only tells you that you can trick people, not what their capabilities are.


The point isn't that you can trick an LLM, but that its capabilities are tied more strongly to training data than to context. That is, when context and training disagree, training "wins" ("wins" isn't quite the right word, but hopefully you take the point).

This poses a problem for new frameworks/languages/whatever that do things in a wholly different way, since we'll be forced to rely on context that contradicts the available training data.


What is an example of a framework that does things in a wholly different way? Everything I'm familiar with is a variation on well-explored ideas from the '60s and '70s.

If you had someone familiar with every computer science concept, every textbook, every paper, etc. up to say 2010 (or even 2000 or earlier), along with deep experience using dozens of programming languages, and you sat them down to look at a codebase, what could you put in front of them that they couldn't describe to you with words they already know?


Even the differences between React and Svelte are big enough for this to be noticeable, and Svelte is actually present in the training data. Given the large amount of React training data, Svelte performs significantly worse (yes, even when given the full official Svelte llms.txt in the context).


But it doesn't pose a problem. You are extrapolating conclusions from things that aren't even correlated.

You started with "they can't understand anything new" and then followed it up with "because I can trick it with logic problems", which doesn't prove that.

Have you even tried doing what you say won't work?


If I make up a riddle and ask an LLM to solve it, it will perform worse than on a well-known riddle whose solution appears in its training data. That's just a foundational consequence of how they work.


Yes you can trick it.

But it's almost trivial for an LLM to generate every question-and-answer combo you could ever come up with from new documentation and new source code for a new framework. It doesn't need Stack Overflow anymore. It's already miles ahead.
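The synthetic-data idea above can be sketched roughly like this. Everything here is hypothetical (function name, prompt wording, and the example doc chunks are mine); the generation step itself is elided, and this only builds the prompts that would be sent to a model:

```python
def qa_prompts(doc_chunks):
    """Turn chunks of new framework documentation into prompts
    asking a model to write synthetic question/answer pairs."""
    for chunk in doc_chunks:
        yield (
            "Given this documentation excerpt, write question/answer pairs "
            f"a developer might ask about it:\n\n{chunk}"
        )

# Example chunks, as might come from a new framework's docs:
prompts = list(qa_prompts([
    "Runes are Svelte 5's reactivity primitives.",
    "Use $state() to declare reactive component state.",
]))
```

The resulting Q&A pairs could then be used for fine-tuning or retrieval, rather than waiting for Stack Overflow coverage to accumulate.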



