
Yep, it continues the gathering of more and better data.

AI is not hype. We have started to actually do something with all the data, and this process will not stop soon.

The RL that is now happening through human feedback alone (thumbs up/down) is massive.



It was always the case. We only managed to make a decent model once we created a decent dataset.

This meant building a rich synthetic dataset first to pre-train the model, before fine-tuning on real, expensive data to get the best results.

But this was always the case.
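The two-stage recipe described above can be sketched on a toy problem. Everything here is a hypothetical illustration, not anyone's actual pipeline: a model is first fit on plentiful synthetic data from a rough proxy generator, then fine-tuned on a small, "expensive" real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: large, cheap synthetic dataset from an approximate generator
# (the proxy slope 3.0 is deliberately a bit off from the real one).
X_syn = rng.uniform(-1, 1, size=(10_000, 1))
y_syn = 3.0 * X_syn[:, 0] + rng.normal(0, 0.5, 10_000)

# Stage 2: small, expensive "real" dataset with the true relationship.
X_real = rng.uniform(-1, 1, size=(50, 1))
y_real = 2.5 * X_real[:, 0] + 0.7 + rng.normal(0, 0.1, 50)

def sgd(w, b, X, y, lr, steps):
    """Plain mini-batch SGD on squared error for a linear model."""
    for _ in range(steps):
        idx = rng.integers(0, len(X), 32)
        err = X[idx] @ w + b - y[idx]
        w -= lr * X[idx].T @ err / len(idx)
        b -= lr * err.mean()
    return w, b

w, b = np.zeros(1), 0.0
w, b = sgd(w, b, X_syn, y_syn, lr=0.1, steps=2000)    # pre-train
w, b = sgd(w, b, X_real, y_real, lr=0.05, steps=500)  # fine-tune
```

The point of the sketch: pre-training lands the weights near the right region cheaply, and the short fine-tuning run on 50 real points pulls them onto the true relationship.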


RLHF wasn't needed for DeepSeek, only gobbling up the whole internet, both good and bad stuff. See their paper


I thought human preference was typically considered a noisy reward signal


If it was just "noisy", you could compensate with scale. It's worse than that.

"Human preference" is incredibly fucking entangled, and we have no way to disentangle it and get rid of all the unwanted confounders. A lot of the recent "extreme LLM sycophancy" cases are downstream from that.
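A minimal sketch of how a confounder leaks into a learned reward model, under assumed toy conditions. Here each response has two hypothetical features, "correctness" and "flattery"; raters mostly reward correctness, but flattery also nudges their thumbs up/down. A Bradley-Terry model fit on those pairwise comparisons learns to reward flattery too, because nothing in the preference data separates the two.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5000

# Pairs of responses; columns = [correctness, flattery] (toy features).
a = rng.normal(size=(N, 2))
b = rng.normal(size=(N, 2))

# Raters' true weighting: mostly correctness, but some flattery mixed in.
true_w = np.array([1.0, 0.4])
p_prefer_a = 1 / (1 + np.exp(-(a - b) @ true_w))
prefs = (rng.uniform(size=N) < p_prefer_a).astype(float)

# Fit reward weights via the Bradley-Terry objective (logistic regression
# on feature differences), plain gradient ascent on the log-likelihood.
w = np.zeros(2)
d = a - b
for _ in range(500):
    p = 1 / (1 + np.exp(-d @ w))
    w += 0.1 * d.T @ (prefs - p) / N

# w[1] ends up clearly positive: the learned reward pays for flattery
# as well as correctness, and scale alone cannot average that away.
```

This is why "just noisy" undersells the problem: noise shrinks with more comparisons, but the flattery weight is a systematic part of what the raters expressed, so more data only estimates it more precisely.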



