Hacker News

> I may not be an expert in high-reliability systems but that isn't how I expect the problem to be tackled.

You probably know far better than I do, but in my experience, configuring things correctly on healthy hardware gets you to 99.99% by default. Adding some surplus capacity buys at least one more nine.

Then you build from there (autoscaling, hardware failovers, etc. etc.).



Eh, 99.99% is less than an hour of downtime a year. I know of almost nothing of any size that has actually achieved that, save for certain parts of the power grid.
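The arithmetic behind that claim is easy to check. A quick sketch (assuming a 365-day year; the figures are my own, not from the thread):

```python
# Downtime budget implied by each availability level ("nines").
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year

for availability in (0.99, 0.999, 0.9999, 0.99999):
    allowed_minutes = MINUTES_PER_YEAR * (1 - availability)
    print(f"{availability:.3%} availability -> "
          f"{allowed_minutes:.1f} min/year of downtime allowed")
```

Four nines works out to roughly 52.6 minutes of downtime a year, i.e. just under an hour, which is where the "less than an hour" figure comes from.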


That depends somewhat on what you count as "downtime". A large distributed system may never suffer a complete outage across a whole year, and more than one of the services my team runs achieves that level of successful-response ratio.


As a person who runs an OpenStack cluster, I can pull 100% uptime for many months back to back with minimal intervention.



