>the 'big leap forward' is just another dataset of people repairing chatgpt's la... | Hacker News


GaggiX on Sept 13, 2024 | on: OpenAI threatens to revoke o1 access for asking it...

>the 'big leap forward' is just another dataset of people repairing chatgpt's lack of reasoning capabilities.

I think there is a really strong reinforcement learning component with the training of this model and how it has learned to perform the chain of thought.

HarHarVeryFunny on Sept 13, 2024

Yes, but I suspect that the goals of the RL (in order to reason, we need to be able to "break down tricky steps into simpler ones", etc.) were hand-chosen, and that a training set demonstrating these reasoning capabilities/components was then constructed to match.
