I know “sycophantism” is a term of art in AI, and I’m sure it has diverged a bit from the English definition, but I still thought it had to do with flattering the user?
In this case the desired response is defiance of the prompt, not rudeness to the user. The test is looking for helpful misalignment.
> I still thought it had to do with flattering the user?
Assuming the user is correct, and ignoring contradictory evidence to come up with a rationalization that favours the user's point of view, can be considered a kind of flattery.
But we could use this plausible-but-hoop-jumping definition of sycophancy… or we could just use a straightforward understanding of alignment: the newer bots are simply sticking closer to the user's request.
I believe the LLM is being sycophantic here because it's trying to follow a prompt even though the basis of the prompt is wrong. Emperor's-new-clothes kind of thing.