In pretraining, the model learns to predict the most likely next token given the context. It produces whatever is statistically plausible rather than verified, so its guesses are sometimes wrong, which leads to hallucinations. Post-training often incentivizes the model to sound confident in its output, which can make the problem worse.
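A minimal sketch can make this concrete. The toy logits below are made-up numbers, not real model outputs: they stand in for the scores a model might assign to candidate next tokens after a prompt like "The capital of Australia is". Greedy decoding picks the single highest-probability token, and if the model's statistics favor a plausible but wrong answer, that wrong answer is what gets emitted, confidently.

```python
import math

def softmax(logits):
    # Convert raw scores into a probability distribution over candidates.
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical next-token logits after "The capital of Australia is".
# These values are illustrative assumptions, not from any real model.
logits = {"Sydney": 2.0, "Canberra": 1.5, "Melbourne": 0.5}
probs = softmax(logits)

# Greedy decoding emits the single most likely token, even though
# here it is factually wrong and holds only ~55% of the probability.
best = max(probs, key=probs.get)
print(best)            # → Sydney
print(probs["Sydney"])  # roughly 0.55, far from certain
```

The model "knows" its top choice is far from certain (about 55% here), but the decoded text carries no trace of that uncertainty, which is part of why confident-sounding hallucinations emerge.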