Sure, in an ideal world this is how nearly everything would work.
Getting a complex system to the level of maturity where this is feasible at scale in real life, and actually works well, is a respectable and non-trivial achievement.
I don't know if Amazon or Azure are able to confidently and effectively put in such automatic remediation measures globally. My sense is there are humans involved to triage and fix unusual types of outages at every other cloud provider, including the other bigs.
Leaving a comment on a message board saying how things ought to work is one thing (there's nothing wrong with your comment, I like it!); I only want to highlight, bold, and underscore how successfully achieving this level of automatic remediation atop a large and dynamic system is uncommon and noteworthy.