taneq 10 months ago \| on: DeepMind program finds diamonds in Minecraft witho...

In all reinforcement learning there is (explicitly as part of a fitness function, or implicitly as part of the algorithm) some impetus for exploration. It might be adding a tiny reward per square walked, a small reward for each block broken and a larger one for each new block type broken. Or it could be just forcing a random move every N steps so the agent encounters new situations through “clumsiness”.
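To make those two flavours concrete, here is a rough Python sketch. Everything in it is made up for illustration and is not taken from the DeepMind work: the names `shaped_reward` and `choose_action`, the bonus sizes, and the default of N = 10. The first function adds explicit exploration bonuses to the reward; the second forces a random move every N steps.

```python
import random


def shaped_reward(base_reward, moved, block_type, seen_types,
                  step_bonus=0.01, break_bonus=0.1, novelty_bonus=1.0):
    """Add exploration bonuses on top of the environment's own reward.

    All bonus values are illustrative assumptions, not tuned numbers.
    `block_type` is None when no block was broken on this step.
    """
    reward = base_reward
    if moved:
        reward += step_bonus              # tiny reward per square walked
    if block_type is not None:
        reward += break_bonus             # small reward for each block broken
        if block_type not in seen_types:
            seen_types.add(block_type)    # remember it so the bonus is one-off
            reward += novelty_bonus       # larger reward for a new block type
    return reward


def choose_action(step, greedy_action, actions, n=10, rng=random):
    """Force a random move every n steps; otherwise take the greedy action."""
    if step % n == 0:
        return rng.choice(actions)        # the "clumsy" move
    return greedy_action
```

Breaking the same block type twice only earns the novelty bonus the first time, which is what nudges the agent toward blocks it has not seen yet.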

That is right. There is usually a parameter on the action selection function that sets the exploitation vs exploration balance.
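The best-known example of such a parameter is probably epsilon in epsilon-greedy action selection. A minimal sketch follows, assuming the agent keeps a list of estimated action values; the function name `epsilon_greedy` and the argument `q_values` are hypothetical names for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, else greedy.

    `q_values` is assumed to be a list of estimated values, one per action.
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon=0.0` the agent always exploits, with `epsilon=1.0` it always explores, and values in between trade one off against the other.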