When I was on Google Docs, I watched the Google Forms team build a sophisticated ML model that attempted to detect when people were using it for nefarious purposes.
It underperformed banning the word "password" from a Google Form.
So that's what they went with.

I wonder if this is just an example of Goodhart's law. How did they measure the performance of those models? I would imagine they measured against known cases of form misuse, i.e., forms that contained a 'password' field.
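Here is a toy sketch of the Goodhart's-law concern in the reply. Everything in it is hypothetical: the forms, the labels, and the helpers `has_password_field` and `broader_rule` are made up for illustration and say nothing about how the Forms team actually evaluated its model. The assumption is that the "known misuse" labels were collected by flagging forms with a password field. Under that assumption, the keyword rule matches the benchmark by construction. A detector that also catches a phishing-like form without a password field then looks *worse* on the same benchmark.

```python
# Toy illustration only: hypothetical forms and labels, not real data.
forms = [
    {"fields": ["Name", "Email", "Password"]},
    {"fields": ["Username", "password"]},
    {"fields": ["Name", "Favorite color"]},
    {"fields": ["Email", "Bank PIN"]},  # phishing-like, but no "password" field
]

def has_password_field(form):
    """Keyword rule: flag any form with a field mentioning 'password'."""
    return any("password" in field.lower() for field in form["fields"])

def broader_rule(form):
    """Hypothetical broader detector that also looks for other credential fields."""
    keywords = ("password", "pin", "ssn")
    return any(k in field.lower() for field in form["fields"] for k in keywords)

# Assumed labeling process: "known misuse" = forms flagged by the keyword rule itself.
labels = [has_password_field(f) for f in forms]

def accuracy(predict, forms, labels):
    """Fraction of forms where the detector agrees with the benchmark labels."""
    correct = sum(predict(f) == y for f, y in zip(forms, labels))
    return correct / len(forms)

keyword_score = accuracy(has_password_field, forms, labels)
broader_score = accuracy(broader_rule, forms, labels)
print(keyword_score, broader_score)  # 1.0 0.75
```

The broader detector loses points only because it flags the "Bank PIN" form, which the assumed benchmark labels as benign. When the labels come from the heuristic, beating the heuristic on that benchmark is impossible.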