Well... after fairly long experience, we have discovered that your standard is mostly adequate for human-generated code (as long as it's not going into a critical system). That may rest on the (empirically collected) statistics of how human-generated code fails: if it's wrong, it usually either "looks" wrong or fails obviously.

GPT-produced code may have different failure statistics, so a heuristic calibrated on human-written code may not work for it. It's too early to tell.