Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

In all reinforcement learning there is (explicitly as part of a fitness function, or implicitly as part of the algorithm) some impetus for exploration. It might be adding a tiny reward per square walked, a small reward for each block broken and a larger one for each new block type broken. Or it could be just forcing a random move every N steps so the agent encounters new situations through “clumsiness”.


That is right, there is usually a parameter on the action selection function -- the exploitation vs exploration balance.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: