Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I'd say your choice between Quora's engineers being incompetent or AWS being dishonest/incompetent is a completely false dichotomy. Anyone who has been around AWS (or basically any technology) will agree that the things that can really hurt you are not always the things you considered in your design. I just can't believe that many of the people who grok the cloud were running production sites under the assumption that there was no cross-AZ risk. They use the same API endpoints, auth, etc so it's obvious they're integrated at some level.

Perhaps for Quora and the like, engineering for the amount of availability needed to withstand this kind of event was simply not cost effective, but I seriously doubt the possibility didn't occur to them. It's not even obvious to me that there are many people who did follow the contract you reference who had serious downtime. All of the cases I've read about so far have been architectures that were not robust to a single AZ failure.

As for multi-az RDS, it's synchronous MySQL replication on what smell like standard EC2 instances, probably backed by EBS. Our multi-az failover actually worked fine this morning, but I am curious how normal that was.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: