In pretraining, the model learns to predict the most likely next token given the context. It produces whatever is statistically plausible rather than verified, so its guesses are sometimes wrong, which leads to hallucinations. Post-training often incentivizes the model to sound confident in its output, which can make the problem worse.
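A minimal sketch can make this concrete. The toy logits below are made-up numbers, not real model outputs: they stand in for the scores a model might assign to candidate next tokens after a prompt like "The capital of Australia is". Greedy decoding picks the single highest-probability token, and if the model's statistics favor a plausible but wrong answer, that wrong answer is what gets emitted, confidently.

```python
import math

def softmax(logits):
    # Convert raw scores into a probability distribution over candidates.
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical next-token logits after "The capital of Australia is".
# These values are illustrative assumptions, not from any real model.
logits = {"Sydney": 2.0, "Canberra": 1.5, "Melbourne": 0.5}
probs = softmax(logits)

# Greedy decoding emits the single most likely token, even though
# here it is factually wrong and holds only ~55% of the probability.
best = max(probs, key=probs.get)
print(best)            # → Sydney
print(probs["Sydney"])  # roughly 0.55, far from certain
```

The model "knows" its top choice is far from certain (about 55% here), but the decoded text carries no trace of that uncertainty, which is part of why confident-sounding hallucinations emerge.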