> We might be incentivizing answers that sound right with reinforcement learning...

		latentsea 9 months ago \| parent \| context \| favorite \| on: OpenAI's new reasoning AI models hallucinate more > We might be incentivizing answers that sound right with reinforcement learning as opposed to answers that are actually right. We do this with other humans, so I don't know that we know how to avoid doing the same with machines.