I don't see the reward model that they are using. I think that is an important f... | Hacker News


tulip4attoo on Aug 31, 2023 | on: Tune PaLM 2 with your own RLHF training data

I don't see the reward model that they are using. I think that is an important factor when doing RLHF.

aldarisbm on Aug 31, 2023

What does this impact? How closely the output model stays aligned with the base model?
