Hacker News
tulip4attoo on Aug 31, 2023 | on: Tune PaLM 2 with your own RLHF training data
I don't see which reward model they are using. I think that is an important factor when doing RLHF.
aldarisbm on Aug 31, 2023
What does this impact? How closely does the output model stay aligned to the base model?