There's some level at which an AI 'player' goes from being competitive with a hu...

kirrent · on April 30, 2025

As another example, consider the apparently successful Dota 2 and StarCraft 2 bots. They'd be interesting if they had taught us new ideas about the games, the way AlphaGo's God move uncovered something new about Go. But they didn't. They excelled through superior micro and flawless execution of quite simple strategies. Watching pros try to hold off waves of perfectly microed blink stalkers reminded me of watching a chess engine in action: a computer grinding down its doomed human opponent using the advantages of being a computer, rather than through superior human-like play.

grogenaut · on April 30, 2025

I'm pretty sure the bots changed the buyback meta around the last TI in Seattle, when OpenAI did their demo before the Canada TI. So I disagree that "the AI taught us nothing". Before that, buyback was seen as bad. After that, people did the math and realized that spamming respawns, the money, and the growth matter more. They may have altered the game since then; I don't know. I only paid attention when it was at Climate Pledge / Key.

Ntrails · on April 30, 2025

IIRC, the AI's play meaningfully added new ideas about how to play Dota 2. It wasn't just buyback: it was the way they played around an early advantage, hyper-aggressive, with not much farming, spam-buying regen to stay out on the map, etc.

On the other hand, you could generally beat the first "1v1 mid" bot just by cutting the wave behind its tower. So on its own, it didn't adapt well to new things.

I would have loved to know whether, given more time/prep/replays/practice, pros would have figured out the holes. My guess is yes.

emporas · on April 30, 2025

If all internet data could be saved on a disk alongside the model weights, then what's the difference between pulling the knowledge out of the weights alone versus out of the weights plus JPEG images? I don't see any difference.

The only difference might be compression: model weights throw away the noise and keep only the signal.

For me humans versus machines is not an interesting competition. Machines will always win in a narrow specialized domain.

A more interesting competition is a very experienced human versus an amateur who knows how to use A.I. Statistical/probabilistic models get confused, and they can easily wander aimlessly into rabbit holes. But a human who knows how to control the A.I., even while being an amateur at that particular narrow task, could guide it and in the end perform as well as, or even better than, the more experienced person.

In chess that's not true, because the domain is so narrow (64 squares and 6 kinds of pieces), but for anything more general, natural intelligence is necessary.

When I use it for programming, I never just ask it to write code; I guide it to write this function and use that other function from a library. If it's left free to guess, it guesses correctly 90% to 99% of the time, but if it's instructed, the code is almost flawless: nine nines of accuracy.

somenameforme · on April 30, 2025

Your assessment of computer chess could use a bit of elaboration. A strong human can easily play an entire game blindfolded - even in blitz/high speed time controls. So seeing a line 30 moves out is not especially remarkable. What makes computers so unbelievably strong in chess is the same as in other domains, and it's pretty boring - they will literally never make a simple oversight or blunder. Even the best human players regularly make "simple" mistakes even on the current move, let alone in their deeper analysis.

So 98% of the moves a computer will play are not especially surprising at all. A strong human will just about always have at least considered the move, and even if not, they'll immediately understand the point. And for the other 2% there's a relatively simple explanation: a computer's inability to make short-term mistakes lets it consider ideas humans never would. For instance, humans tend to like material, yet there are a shockingly large number of positions where a modern computer will sac a piece and then just carry on playing a piece down in what "feels" like a fairly normal position. It simply turns out that your opponent has no way to convert their material advantage, and so your positional advantage will tell in the long run, even a piece down! At least if you're a computer...

This has led to some interesting outcomes. For instance, Fabiano Caruana, a top-10 player in the world, is well known for his exceptionally deep and creative opening preparation, all computer-approved. But in more than a few instances he's ended up in positions that look bad but that a computer says are practically winning, and then gone on to lose the game. It's simply because these sorts of positions might indeed be objectively winning, but they may require 10 or 20 practically perfect moves, whereas a single subtle mistake means you lose. And it's extremely hard for even the best players in the world to play like this.

umanwizard · on April 30, 2025

> A strong human can easily play an entire game blindfolded - even in blitz/high speed time controls. So seeing a line 30 moves out is not especially remarkable.

How are these points connected? Playing blindfolded doesn’t require being able to calculate 30 moves deep (or any particular number).

Being able to remember/visualize an N move sequence without losing the thread while blindfolded is not at all the same thing as being able to calculate N moves deep.

somenameforme · on April 30, 2025

I assume you mean that when a human calculates some variation 30 moves deep, we're obviously discarding a ridiculous chunk of the overall game tree? Absolutely true, but the same is also true of computers. For instance, I just let Stockfish 17 run on the starting position until it reached a reported depth of 30. It took almost exactly 10 seconds at ~3.2 million nodes per second. So it assessed about 32 million positions to reach a reported depth of 30 (which is 15 moves for each side), but there are something like 8e41 possible lines to that depth (assuming a low average of 25 legal moves per position). So the fraction it discards pretty safely rounds up to 100%.
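A quick sketch of that back-of-the-envelope arithmetic, using the figures quoted in the comment (3.2M nodes/s, 10 s, depth 30) rather than fresh measurements. It assumes a uniform tree with exactly 25 legal moves per position, which real chess does not have, and 25^30 counts move sequences rather than distinct positions, so treat it as an order-of-magnitude illustration only.

```python
# Order-of-magnitude check of "32M positions searched vs. ~8e41 lines".
# All constants are taken from the comment or are stated assumptions.
NODES_PER_SECOND = 3.2e6  # reported Stockfish speed
SECONDS = 10              # time to reach reported depth 30
BRANCHING = 25            # assumption: constant 25 legal moves per position
PLIES = 30                # depth 30 = 15 moves per side


def search_budget(nps: float, seconds: float) -> float:
    """Number of nodes the engine visits in the given time."""
    return nps * seconds


def full_tree_size(branching: int, plies: int) -> int:
    """Leaf count of a uniform game tree: branching ** plies."""
    return branching ** plies


visited = search_budget(NODES_PER_SECOND, SECONDS)
tree = full_tree_size(BRANCHING, PLIES)
fraction_examined = visited / tree

print(f"visited  ~ {visited:.2e}")            # 3.20e+07
print(f"tree     ~ {tree:.2e}")               # 8.67e+41
print(f"fraction ~ {fraction_examined:.1e}")  # 3.7e-35
print(f"discarded rounds to {1 - fraction_examined:.0%}")  # 100%
```

The point it illustrates is the ratio: even a fast engine touches only about one node in 10^34 of a naive full-width tree, so engines, like humans, prune almost everything.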

Another example that illustrates the point is the ICCF (International Correspondence Chess Federation). If computers were as competent at long-term play as they are at short-term play, there wouldn't even be a competition. It'd simply be a matter of who has the strongest computer. But in reality, hardware doesn't seem to be a particularly decisive factor. For instance, as in "normal" chess, there remains a huge gender divide in ratings, yet women certainly have no less access to capable hardware than men.

LeifCarrotson · on May 1, 2025

No, a blindfolded human chess player isn't calculating a selective variation 30 moves deep. They've memorized the current state of the board, and update it when their assistant (not blindfolded) tells them the opponent's move. That's completely different from imagining millions of future possible boards simultaneously.

somenameforme · on May 1, 2025

This is not really how it works, at least not internally. For instance, the record for blindfold simultaneous games is 48. Playing that by anything even remotely like conscious memory would probably be impossible. If it were simply a game of conscious memory, then a highly competent memory competitor should be able to play (even if to a poor standard) multiple blindfold games, yet in reality he'd probably be unable to play a single one - even though he is entirely capable of memorizing a deck of cards, which is vastly more 'state' than a chess position. And vice versa: test a highly competent blindfold player in a general memory game and he'd be unlikely to do much better than above average.

Chess, for a stronger player, is very much like a language - in fact it uses the exact same area of the brain. It's like when you read these words, you're not consciously thinking at all - the meaning just comes to you immediately. And in fact you could trivially recall everything I said (even if not necessarily verbatim) if you just thought for a second or so. But simultaneously it's not like you actually made any effort whatsoever to memorize it.

So how long a conversation could you hold with yourself in your mind? Practically endless, and you could probably reconstruct the overwhelming majority of it on demand. It's the same with chess. Meandering around in your mind to positions of arbitrary depth is not difficult for a strong player. And the person I was responding to felt that a player writing down some analysis to move 20 would be a meaningful form of cheating. In reality, I'd absolutely love for my opponent to be able to write down their analysis. It would waste a monumental amount of time and afford no advantage whatsoever. It'd be akin to you writing down the conversation from your mind.

tshaddox · on April 29, 2025

> There's some level at which an AI 'player' goes from being competitive with a human player, matching better-trained human strategy against a more impressive memory, to just a cheaty computer with too much memorization. Finding that limit is the interesting thing about this analysis, IMO!

And a lot of human competitions aren't designed in such a way that the competition even makes sense with "AI." A lot of video games make this pretty obvious. It's relatively simple to build an aimbot in a first-person shooter that can outperform the most skilled humans. Even in ostensibly strategic games like Starcraft, bots can micro in ways that are blatantly impossible for humans and which don't really feel like an impressive display of Starcraft skill.

Another great example was IBM Watson playing Jeopardy! back in 2011. We were supposed to be impressed with Watson's natural language capabilities, but if you know anything about high-level Jeopardy! then you know that all you were really seeing is that robots have better reflexes than humans, which is hardly impressive.

vlovich123 · on April 29, 2025

> It's not interesting playing chess against Magnus, even for high-level GMs. He just crushes almost every human

Even among humans, the differences between the absolute best and those outside the top 10 tend to be pretty drastic. And a non-IM playing Magnus won't even understand what's going on. You could similarly claim that Magnus just memorized a bunch of openings, which is a criticism GMs level too, and which is why Chess960 is now gaining more traction. My point is that there's not really such a thing as "fair" in a competition.

Re GeoGuessr, why not let them use whatever tools are available? I have similar critiques of bike racing and its restrictions on the technology that can be put on the bike. But every competition draws arbitrary lines, and those lines make up the rules, so it doesn't really matter.

sensanaty · on April 29, 2025

I mean, GeoGuessr explicitly states when you launch the game (in PvP mode) that googling/searching is bannable.

ralfd · on April 29, 2025

Geoguessr is a game with artificial rules though. If I want the AI to solve a task I care about the result, not what tools it uses.

vlovich123 · on April 30, 2025

That’s exactly my point. Evaluating the task success independent of artificial limitations that are specific to the game doesn’t invalidate the result.

mrlongroots · on April 29, 2025

To reframe your takeaway: you want to benchmark the "system" and see how capable it is. The boundaries of the system are somewhat arbitrary: is it "AI + web" or "only AI", and it is not about fairness as much as about "what do you, the evaluator, want to know".

rowanG077 · on April 29, 2025

You seem to indicate that you want a computer to beat a human without ever using what a computer is actually good at (large memory, brute-force compute, etc.). That seems a little ridiculous to me. How do you want it to engage? Should it be barred from using native compute and forced to simulate a full human brain?

Sure, I do agree that web search goes too far, because it's literally cheating. But Stockfish is superhuman at chess; it doesn't really matter that it gets there by leveraging the strengths of a computer.

monadINtop · on April 29, 2025

I disagree. If we're going to hype up machines for their prowess at "thinking" and for being artificially "intelligent" in that soft, effusive human way, then yeah, I think it's a fair criticism. We've known since the 50s that computers are like stupid geniuses when it comes to following algorithms and crunching computations far too extensive and tedious for any human.

rowanG077 · on April 30, 2025

The point is that from a black-box view they are rapidly surpassing humans in a lot of fields. You can say they do it with tools the human mind has no access to. That's probably true. But the "soft effusive human way" to be intelligent is also a black box, and something we aren't even close to understanding. That makes it about as measurable as string theory. "If it's not exactly like this thing we don't understand, it's not fair."

chongli · on April 30, 2025

They're not a black box though. They're querying an external resource (Google Search). That's crossing an API boundary. If you're going to let them use Google Search then let the human opponent use Google Search as well.

It's like if you were building an AI robot to run a marathon against a human opponent, except you let the AI robot ride a motorcycle and force the human to stay on foot.

SamPatt · on April 30, 2025

Search was irrelevant in this case. I ran it again without search and it made the same guesses. I updated the post with those details.

rowanG077 · on April 30, 2025

I didn't say the AI is a black box; I said if you take a black-box view. That last word is load-bearing.

Did you read the article? It clearly shows that search doesn't make much of a difference to how good it actually is.