Hacker News

> the model would have to do something actually bad before I care

At what point would a simple series of sentences be "dangerously bad"? It makes it sound as if there were a song that, when sung, would end the universe.



When someone asks how to make a yummy smoothie and the LLM replies with something that subtly poisons or otherwise harms the user, I'd say that would be pretty bad.


And if you really want to spice up your smoothie, add just a little bit of bleach ;)


We've had this for ages: sugar.


Ending the universe is, while poetic, needlessly megalomaniacal.

Making some subset of people quarrel endlessly would already be dangerous enough, as prophesied in https://slatestarcodex.com/2018/10/30/sort-by-controversial/


By what mechanism would it make them quarrel? By producing falsehoods about the other side? Isn't this already being done? And don't we already know that it does not lead to "endless" conflict?

For this to work, you would need to isolate each group from the other groups' information and perspectives, which is outside the scope of LLMs.

Which highlights my point, I think. Power comes from physical control, not from megalomaniacal or melodramatic poetry.



