Could you give a few examples? I'd lean towards adjusting tooling if you can.
My spelling is often horrendous and I know it - but almost every dev I know of prefers to copy and paste anything that might be misspelled just because it's easier than taking the risk.
Similarly - how does this get anywhere near causing a production outage?
I'd be tempted to view this as a blessing in disguise; this person sounds like they'll trip up more often than the rest, but if one individual can cause a production outage with spelling mistakes something's gone awry with your processes elsewhere. You have an opportunity to fix whatever that is now.
A string value in a json config needed to be updated.
On one prod instance, he made a typo while updating the config by hand. The software's config validation caught it, the software stopped with the appropriate error message, and a few minutes later we were up and running again.
We introduced work reviews on prod instances (similar to code reviews) after that.
Later, he wrote a patch script to avoid making that mistake again.
In the JSON schema definition used in the script, the name of the property had a typo (how it came to be... no clue, copy-paste should have taken care of that).
The script was part of a MR, the reviewer missed the typo. We noticed it in staging.
We introduced tests for config editing scripts after that.
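To make that concrete, here is a minimal sketch of what such a patch function and its test could look like. It is not the script from the incident: the function name `patch_config` and the property name `endpoint` are made up for illustration, and it assumes the config is a flat JSON object. The idea is that the patch refuses to touch a property that does not already exist, so a misspelled name fails loudly instead of silently adding a new property.

```python
import json


def patch_config(config_text: str, key: str, new_value: str) -> str:
    """Return config_text with config[key] replaced by new_value.

    Hypothetical helper: raises KeyError when key is not already present,
    so a typo in the property name cannot create a new, unused property.
    """
    config = json.loads(config_text)
    if key not in config:
        raise KeyError(f"unknown config property: {key!r}")
    config[key] = new_value
    return json.dumps(config, indent=2)


def test_patch_rejects_misspelled_property():
    original = '{"endpoint": "old"}'  # "endpoint" is an assumed example name
    # Correct name: the value is updated.
    assert json.loads(patch_config(original, "endpoint", "new")) == {"endpoint": "new"}
    # Misspelled name: the patch fails instead of adding "endpiont".
    try:
        patch_config(original, "endpiont", "new")
    except KeyError:
        pass
    else:
        raise AssertionError("misspelled property name was accepted")
```

A test like this would run under any test runner, or just by calling the function directly. It only catches typos in names the config already contains; a typo in the new value itself still needs review or validation.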
And so it went on and on... The problem is not that it happens and we then refine our processes. It is the frequency.
What I’m seeing here is that you don’t have mature mechanisms to assure the reliability of your services yet. The second paragraph suggests that a misconfiguration was able to make it into production that arguably should have been caught at an earlier stage of the deployment pipeline. Anyone can make these sorts of mistakes; the fact that a particular colleague is more prone to them really doesn’t matter all that much.
Fortify your delivery pipeline and the problem should resolve itself.
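As one possible building block for that, the pipeline could refuse to promote any config containing a property the software does not know about. The sketch below is an assumption on my part, not a description of your setup: `ALLOWED_KEYS` and the property names in it are placeholders, and it only checks top-level properties of a JSON object.

```python
import json

# Hypothetical list of properties the software understands.
ALLOWED_KEYS = frozenset({"endpoint", "timeout_seconds"})


def find_config_errors(config_text: str, allowed_keys=ALLOWED_KEYS) -> list[str]:
    """Return a list of problems found in a JSON config; empty means OK."""
    try:
        config = json.loads(config_text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["top-level value must be an object"]
    # Any property outside the allowed set is most likely a typo.
    return [f"unknown property: {key}" for key in sorted(config.keys() - allowed_keys)]
```

Run as a pipeline step before anything reaches staging, a non-empty result would fail the build, no matter who made the change.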
They are not, but think of it like learning to play a guitar: at first, the strings cut into your fingers, but then you build up enough calluses and playing it stops hurting. Or, consider a building code: every rule was written in blood, and new buildings get safer over time.