There's some level at which an AI 'player' goes from being competitive with a human player, matching better-trained human strategy against a more impressive memory, to just a cheaty computer with too much memorization. Finding that limit is the interesting thing about this analysis, IMO!
It's not interesting playing chess against Stockfish 17, even for high-level GMs. It's alien and just crushes every human. Writing down an analysis to 20 move depth, following some lines to 30 or more, would be cheating for humans. It would take way too long (exceeding any time controls and, more importantly, exceeding the lifetime of the human), whereas a powerful computer can just crunch it in seconds. Referencing a tablebase of endgames for 7 pieces would also be cheating: memorizing 7 terabytes of bitwise layouts is absurd, but the computer just stores that on its hard drive.
Human GeoGuessr players have impressive memories way above baseline with respect to regional infrastructure, geography, trees, road signs, written language, and other details. Likewise, human Jeopardy players know an awful lot of trivia. Once you get to something like Scrabble or chess, it's less about knowing words or knowing moves and more about synthesizing that knowledge intelligently.
One would expect a human to recognize some domain names like, I don't know, osu.edu: lots of people know that's Ohio State University, one of the biggest schools in the US, located in Columbus, Ohio. They don't have to cheat and go to an external resource. One would expect a human (a top human player, at least) to know that taxilinder.at is based in Austria. One would never expect any human to have every business or domain name memorized.
With modern AI models trained on internet data, searching the internet is not that different from querying their own training data.
As another example, consider the apparently successful DOTA2 and Starcraft 2 bots. They'd be interesting if they taught us new ideas about the games in the same way that AlphaGo's God move uncovered something new about Go. But they didn't. They excelled through superior micro and flawless execution of quite simple strategies. Watching pros trying to hold off waves of perfectly microed blink stalkers reminded me of seeing a chess engine in action: a computer grinding down its doomed human opponent using the advantages offered by being a computer rather than superior human-like play.
I'm pretty sure that the bots changed the buyback meta around the last TI in Seattle, when OpenAI last did their demo before the Canada TI. So I disagree that the "AI taught us nothing". Prior to that, buyback was seen as bad. After that, people did the math and realized that with spammed respawns, the money and growth matter more. They may have altered the game after that, I don't know. I only paid attention when it was at Climate Pledge / Key.
The AI's play meaningfully added ideas about ways to play Dota 2, iirc. It wasn't just buying back: it was the way they played hyper-aggressively around an early advantage, not farming much, spam-buying regen to stay out in lane, etc.
On the other hand you could generally beat the first "1v1 mid" bot by just cutting the wave behind its tower. So adaptation to new stuff was not good in isolation.
I would have loved to know whether, given more time/prep/replays/practice, pros would have figured out the holes. My guess is yes.
If all internet data could be saved on a disk alongside the model weights, then what's the difference between pulling the knowledge out of the weights exclusively and pulling it out of weights plus JPEG images? I don't see any difference.
The only difference might be compression: model weights throw away the noise and save only the signal.
For me humans versus machines is not an interesting competition. Machines will always win in a narrow specialized domain.
A more interesting competition is a very experienced human, versus an amateur who knows how to use A.I. Statistical/probabilistic models get confused, and they can easily wander aimlessly into rabbit holes. But a human who knows how to control the A.I. but is amateur at that particular narrow task, could guide it and at the end perform the same, or even better than the more experienced person.
In chess that's not true, due to the super-narrow domain of 64 squares and 6 different pieces, but for anything more general a natural intelligence is necessary.
When I use it for programming, I never just ask it to write code. I guide it: write this function, and use that other function from a library. If it is left free to guess, it will guess correctly 90% to 99% of the time, but if it is instructed then the code is almost flawless, nine nines percent of accuracy.
Your assessment of computer chess could use a bit of elaboration. A strong human can easily play an entire game blindfolded - even in blitz/high speed time controls. So seeing a line 30 moves out is not especially remarkable. What makes computers so unbelievably strong in chess is much like in other domains, and it's pretty boring - they will literally never make a simple oversight or blunder. Even the best human players regularly make "simple" mistakes even on the current move, let alone in one's distant analyses.
So 98% of the moves a computer will play are not especially surprising at all. A strong human will just about always have at least considered the move, and even if not, they'll immediately understand the point. And for the other 2% there's a relatively simple explanation. Computers' inability to make short-term mistakes lets them consider ideas humans never would. For instance, humans tend to like material, yet there are a shockingly large number of positions where a modern computer will sac a piece and then just continue on playing a piece down in what "feels" like a fairly normal position. It simply turns out that your opponent has no way to convert their material advantage, and so your positional advantage will tell in the long run, even being a piece down! At least if you're a computer...
This has led to some interesting outcomes. For instance Fabiano Caruana, a top 10 player in the world, is extremely well known for his exceptional level of deep and creative opening preparation, all computer approved. But in more than a few instances he's ended up in positions that look bad but where a computer will say he's practically winning, and has ultimately gone on to lose the game. It's simply because these sorts of positions might indeed be objectively winning, but it may require 10 or 20 practically perfect moves, whereas a single subtle mistake means you lose. And it's extremely hard for even the best players in the world to play like this.
> A strong human can easily play an entire game blindfolded - even in blitz/high speed time controls. So seeing a line 30 moves out is not especially remarkable.
How are these points connected? Playing blindfolded doesn’t require being able to calculate 30 moves deep (or any particular number).
Being able to remember/visualize an N move sequence without losing the thread while blindfolded is not at all the same thing as being able to calculate N moves deep.
I assume you mean that when a human is calculating some variation 30 moves deep, we're obviously discarding a ridiculous chunk of the overall game tree possibilities? Absolutely true, but the same is also true of computers. For instance I just let Stockfish 17 run on the starting position until it got to a reported depth of 30. It took almost exactly 10 seconds while running at ~3.2 million nodes per second. So it assessed about 32 million positions to get to a reported depth of 30 (which is 15 moves for each side), but there are at least something like 8e41 possible lines there (that's assuming a low average of 25 possible moves per position). So it's discarding a percentage of moves that pretty safely rounds up to 100%.
Another example to illustrate the point is the ICCF (International Correspondence Chess Federation). Were computers as competent at long-term play as they are at short-term play, there wouldn't even be a competition. It'd simply be about who has the strongest computer. But in reality that seems to be no particularly decisive factor. For instance, as in "normal" chess, there remains a huge gender divide in ratings, yet females certainly have no less access to competent hardware than males.
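For anyone who wants to check that arithmetic, here is a minimal sketch using only the figures quoted above. The branching factor of 25 is the comment's own deliberately low assumption, not a measured value, and the variable names are just illustrative.

```python
# Figures quoted in the comment above (not re-measured here).
NODES_PER_SECOND = 3_200_000   # reported Stockfish 17 search speed
SECONDS = 10                   # time to reach reported depth 30
DEPTH_PLIES = 30               # 15 moves for each side
BRANCHING_FACTOR = 25          # assumed low average of legal moves per position

# Positions the engine actually looked at.
nodes_searched = NODES_PER_SECOND * SECONDS

# Number of distinct move sequences if every move were explored to depth 30.
full_tree_lines = BRANCHING_FACTOR ** DEPTH_PLIES

# Share of the full tree that was actually examined.
fraction_searched = nodes_searched / full_tree_lines

print(f"nodes searched:   {nodes_searched:.1e}")
print(f"full tree lines:  {full_tree_lines:.1e}")
print(f"fraction visited: {fraction_searched:.1e}")
```

The fraction comes out many orders of magnitude below one percent, which is the sense in which the share of discarded lines rounds to 100%.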
No, a blindfolded human chess player isn't calculating a selective variation 30 moves deep. They've memorized the current state of the board, and update it when their assistant (not blindfolded) tells them the opponent's move. That's completely different from imagining millions of future possible boards simultaneously.
This is not really how it works, at least not internally. For instance the record for blindfold simultaneous games is 48. Playing that by anything even remotely like conscious memory would probably be impossible. If it were simply a game of conscious memory then a highly competent memory competitor should be able to play (even if to a poor standard) multiple blindfold games, yet in reality he'd probably be unable to play a single one - even if he is entirely capable of memorizing a deck of cards, which is vastly more 'state' than a chess position. And vice versa, test a highly competent blindfold player in a general memory game and he'd be unlikely to do much better than above average.
Chess, for a stronger player, is very much like a language - in fact it uses the exact same area of the brain. It's like when you read these words, you're not consciously thinking at all - the meaning just comes to you immediately. And in fact you could trivially recall everything I said (even if not necessarily verbatim) if you just thought for a second or so. But simultaneously it's not like you actually made any effort whatsoever to memorize it.
So how long of a conversation could you hold with yourself in your mind? Practically endless, and you could probably reconstruct the overwhelming majority of it on demand. It's the same with chess. Meandering around in your mind to positions of an arbitrary depth is not difficult for a strong player. And the person I was responding to felt that a player writing down some analysis to move 20 would be some meaningful form of cheating. In reality, I'd absolutely love for my opponent to be able to write down their analyses. It'd waste just a monumental amount of time and afford no advantage whatsoever. It'd be akin to you writing down the conversation from your mind.
> There's some level at which an AI 'player' goes from being competitive with a human player, matching better-trained human strategy against a more impressive memory, to just a cheaty computer with too much memorization. Finding that limit is the interesting thing about this analysis, IMO!
And a lot of human competitions aren't designed in such a way that the competition even makes sense with "AI." A lot of video games make this pretty obvious. It's relatively simple to build an aimbot in a first-person shooter that can outperform the most skilled humans. Even in ostensibly strategic games like Starcraft, bots can micro in ways that are blatantly impossible for humans and which don't really feel like an impressive display of Starcraft skill.
Another great example was IBM Watson playing Jeopardy! back in 2011. We were supposed to be impressed with Watson's natural language capabilities, but if you know anything about high-level Jeopardy! then you know that all you were really seeing is that robots have better reflexes than humans, which is hardly impressive.
> It's not interesting playing chess against Magnus, even for high-level GMs. He just crushes almost every human
The differences even among humans between the absolute best & those out of the top 10 tend to be pretty drastic. And a non-IM against Magnus won't even understand what's going on. You could similarly claim that Magnus just memorized a bunch of openings, which is a criticism GMs level too, and is why Chess960 is now gaining more traction. My point is that there's not really such a thing as "fair" in a competition.
Re GeoGuessr, why not let them use whatever tools are available? I have similar critiques about bike racing & restrictions on the technology advancements they can put on the bike. But every competition chooses arbitrary lines to draw which compose the rules, so it doesn't really matter.
That’s exactly my point. Evaluating the task success independent of artificial limitations that are specific to the game doesn’t invalidate the result.
To reframe your takeaway: you want to benchmark the "system" and see how capable it is. The boundaries of the system are somewhat arbitrary: is it "AI + web" or "only AI", and it is not about fairness as much as about "what do you, the evaluator, want to know".
You seem to indicate you want a computer to beat a human without ever using what a computer is actually good at (large memories, brute force compute, etc). That seems a little ridiculous to me. How do you want it to engage? Should it be disallowed from using native compute and have to simulate a full human brain?
Sure, I do agree that the web search is too far, because it's literally cheating. But Stockfish is superhuman at chess, and it doesn't really matter that it can do this by leveraging the strengths of a computer.
I disagree. If we're gonna be hyping up machines for their prowess at "thinking" and being artificially "intelligent" in that soft effusive human way, then yeah, I think it's fair criticism. We already knew from the 50s that computers are like stupid geniuses when it comes to following algorithms and crunching computations far too expansive and tedious for any human.
The point is that from a black box view they are rapidly surpassing humans in a lot of fields. You can say they do it with tools the human mind has no access to. That's probably true. The "soft effusive human way" to be intelligent is also a black box, and something we aren't even close to understanding. This means it's about as close to being measurable as string theory. "If it's not exactly like this thing we don't understand, it's not fair".
They're not a black box though. They're querying an external resource (Google Search). That's crossing an API boundary. If you're going to let them use Google Search then let the human opponent use Google Search as well.
It's like if you were building an AI robot to run a marathon against a human opponent, except you let the AI robot ride a motorcycle and force the human to stay on foot.