A couple hundred billion floating point numbers is enough to store quite a few things.
Also, the algorithms you learn in CS are designed to scale to arbitrarily large inputs, but you don't strictly need them to handle problems of a small, fixed size. In a sense, you could say the "next token predictor" can simulate some very crude algorithms, e.g. at every token, greedily find the next location by looking at the current location, and output the neighboring location that's in the direction of the destination.
The next-token predictor is a built-in for loop, and if you have a bunch of stored data on roughly where the current location is, its neighboring locations, and the relative direction of the destination... then you've got a crude algo that kinda works.
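To make that concrete, here's a toy sketch in Python of what such a crude greedy "algorithm" looks like when you unroll it: the loop stands in for token-by-token generation, and the per-step direction lookup stands in for the stored local knowledge. (This is purely illustrative, the function and its names are made up, not how a model actually computes anything.)

```python
def greedy_walk(start, goal, max_steps=100):
    """Greedily walk a grid from start to goal, one 'token' at a time."""
    path = [start]
    x, y = start
    for _ in range(max_steps):              # the "built-in for loop"
        if (x, y) == goal:
            return path
        # Local lookup only: step toward the destination, no search,
        # no backtracking, no global view of the map.
        dx = (goal[0] > x) - (goal[0] < x)  # -1, 0, or +1
        dy = (goal[1] > y) - (goal[1] < y)
        x, y = x + dx, y + dy
        path.append((x, y))
    return path                             # gave up after max_steps

print(greedy_walk((0, 0), (3, 2)))
# [(0, 0), (1, 1), (2, 2), (3, 2)]
```

Of course, greedy stepping like this gets stuck the moment there's an obstacle in the way, which is exactly why it only "kinda works."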
PS: but yeah, despite the above, I still think the emergence is "magic".