> AlphaZero also doesn't need training data as input -- it's generated by game-play. The information fed in is just the game rules.
This is wrong: it wasn't just fed the rules. It was also given a harness that tested viable moves and searched for the best ones using a depth-first search method.
Without that harness it would not have reached superhuman performance. Such a harness is easy to build for Go but much harder for more complex domains. In general, the harder it is to build an effective harness for a topic, the harder that topic is for AI models to solve: it is relatively easy to build a good harness for well-defined problems like competitive programming, but much, much harder for general-purpose programming.
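To make the "harness" point concrete, here is a minimal sketch of why verification is easy for well-specified problems: a candidate solution can be scored purely by running it against known input/output pairs, which gives a clean automatic reward signal. The function names and test cases below are hypothetical illustrations, not anyone's actual training setup.

```python
def score_candidate(solution_fn, test_cases):
    """Return the fraction of test cases a candidate solution passes --
    a crisp, automatic reward signal that RL-style training can optimize."""
    passed = 0
    for args, expected in test_cases:
        try:
            if solution_fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash simply scores zero on that case
    return passed / len(test_cases)

# A well-defined task ("return the n-th Fibonacci number") comes with an
# unambiguous test suite:
fib_tests = [((0,), 0), ((1,), 1), ((2,), 1), ((10,), 55)]

def candidate(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(score_candidate(candidate, fib_tests))  # 1.0
```

By contrast, a task like "refactor this service without breaking anything" has no comparably crisp pass/fail oracle, which is exactly the difficulty being described for general-purpose programming.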
Are you talking about Monte Carlo tree search? I consider it part of the algorithm in AlphaZero's case. But agreed that RL is a lot harder in a real-life setting than in a board-game setting.
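For reference, here is a minimal sketch of the tree search being discussed: plain UCT-style MCTS with random rollouts on a toy game. AlphaZero's actual search replaces the random rollouts with a learned value network and biases move selection with learned priors; the `NimState` and `Node` classes below are hypothetical illustrations, not AlphaZero's code.

```python
import math
import random

class NimState:
    """Toy game: players alternately take 1-3 stones; taking the last stone wins."""
    def __init__(self, stones=7, player=1):
        self.stones, self.player = stones, player

    def legal_moves(self):
        return [n for n in (1, 2, 3) if n <= self.stones]

    def play(self, move):
        return NimState(self.stones - move, -self.player)

    def winner(self):
        # If no stones remain, the player who just moved took the last one and wins.
        return -self.player if self.stones == 0 else None

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}            # move -> Node
        self.visits, self.value = 0, 0.0

    def ucb(self, child, c=1.4):
        # Upper-confidence bound: balances exploiting good moves and exploring rare ones.
        return (child.value / child.visits
                + c * math.sqrt(math.log(self.visits) / child.visits))

def mcts(root_state, iterations=2000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while node.children and len(node.children) == len(node.state.legal_moves()):
            node = max(node.children.values(), key=lambda ch: node.ucb(ch))
        # 2. Expansion: try one untried move, if the game isn't over.
        untried = [m for m in node.state.legal_moves() if m not in node.children]
        if untried:
            move = random.choice(untried)
            node.children[move] = Node(node.state.play(move), parent=node)
            node = node.children[move]
        # 3. Rollout: random playout (AlphaZero uses a value network here instead).
        state = node.state
        while state.winner() is None:
            state = state.play(random.choice(state.legal_moves()))
        # 4. Backpropagation: credit the result up the tree. A node is good for
        #    its parent when the player who moved into it ends up winning.
        winner = state.winner()
        while node:
            node.visits += 1
            node.value += 1.0 if node.state.player != winner else 0.0
            node = node.parent
    # Play the most-visited move at the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

print(mcts(NimState()))  # with 7 stones, the search should settle on taking 3
```

The point of the sketch: the search needs only `legal_moves`, `play`, and `winner` -- i.e., the rules plus a perfect, cheap outcome oracle. Board games supply that oracle for free; most real-world tasks don't.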