I should maybe preface this by saying that I probably agree that this is the way this will shake out ultimately.
But I would also say that a bunch of odd post-processing steps (completely obscured for security reasons, of course) bolted onto a giant black-box model will erode trust in the results. If a robot were unveiled and the question "what prevents this robot from using its superhuman strength to smash my head in?" were answered with "don't worry, there is a post-processing step in the robot's brain whereby if it detects a desire to kill, we just cancel that", that would be a little disconcerting.
The more satisfying solution is: the model / robot is designed to be unable to produce specific images / to smash human heads in. The behavior just isn't possible in the first place.