Yep, it continues the gathering of more and better data. AI is not hype. We ha...

KaiserPro · 2025-07-23T09:01:03 1753261263

It was always the case. We only managed to make a decent model once we created a decent dataset.

This meant making a rich synthetic dataset first to pre-train the model, then fine-tuning on real, expensive data to get the best results.

But this was always the case.

noname120 · 2025-07-23T19:48:03 1753300083

RLHF wasn't needed for DeepSeek, only gobbling up the whole internet, both the good and the bad stuff. See their paper.

rtrgrd · 2025-07-23T10:55:54 1753268154

I thought human preferences were typically considered a noisy reward signal.

ACCount36 · 2025-07-23T15:30:06 1753284606

If it was just "noisy", you could compensate with scale. It's worse than that.

"Human preference" is incredibly fucking entangled, and we have no way to disentangle it and get rid of all the unwanted confounders. A lot of the recent "extreme LLM sycophancy" cases are downstream of that.
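
To make the noise-versus-confounder distinction concrete, here is a minimal toy sketch, not a model of any real RLHF pipeline. It assumes each human label equals a true quality score plus zero-mean Gaussian noise plus an optional constant offset standing in for a confounder (say, raters favoring flattering answers). The function name `mean_label` and all the numbers are made up for illustration.

```python
import random
import statistics


def mean_label(n, true_quality, noise_sd, confounder_bias, seed=0):
    """Average n simulated preference labels (hypothetical toy model).

    Each label = true_quality + confounder_bias + zero-mean Gaussian noise.
    """
    rng = random.Random(seed)
    return statistics.fmean(
        true_quality + confounder_bias + rng.gauss(0.0, noise_sd)
        for _ in range(n)
    )


TRUE_QUALITY = 0.5      # assumed ground truth, for illustration only
NOISE_SD = 1.0          # assumed rater noise
CONFOUNDER_BIAS = 0.3   # assumed systematic offset from a confounder

for n in (100, 10_000, 100_000):
    noisy_only = mean_label(n, TRUE_QUALITY, NOISE_SD, 0.0)
    confounded = mean_label(n, TRUE_QUALITY, NOISE_SD, CONFOUNDER_BIAS)
    # Error of the purely noisy estimate shrinks as n grows;
    # error of the confounded estimate stays near CONFOUNDER_BIAS.
    print(n, abs(noisy_only - TRUE_QUALITY), abs(confounded - TRUE_QUALITY))
```

Under these assumptions, more labels drive the purely noisy estimate toward the true value, while the confounded estimate converges to the wrong value no matter how much data is collected. Real confounders are not a clean constant offset, which is part of why they are hard to remove.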