
"So what scale are we talking about? A few million monthly users? So like hackernews? I would use a single server... "


You really need two servers in physically different locations (ideally in different ASes) with some form of failover, assuming you want a reasonable uptime guarantee.
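
Concretely, "some form of failover" can be as simple as a watchdog that health-checks the primary and promotes the standby when it stops responding. A rough sketch, where the health URL, thresholds, and promote_standby() are all hypothetical placeholders, not a description of any real setup:

    # Hypothetical failover watchdog; URL, thresholds and promote_standby()
    # are illustrative only.
    import time
    import urllib.request

    PRIMARY_HEALTH_URL = "https://primary.example.com/health"  # assumed endpoint
    FAILURES_BEFORE_FAILOVER = 3

    def primary_is_healthy(timeout=5):
        try:
            with urllib.request.urlopen(PRIMARY_HEALTH_URL, timeout=timeout) as resp:
                return resp.status == 200
        except OSError:
            return False

    def promote_standby():
        # Placeholder: in practice this might flip a DNS record or a floating IP.
        print("promoting standby")

    def watch(poll_seconds=10):
        failures = 0
        while True:
            failures = 0 if primary_is_healthy() else failures + 1
            if failures >= FAILURES_BEFORE_FAILOVER:
                promote_standby()
                break
            time.sleep(poll_seconds)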


Does 99.995% [1] of hackernews sound reasonable enough to you?

The reality is that a lot of systems (especially simple ones) run perfectly fine on a single server with next to no downtime. All the additional redundancy we introduce also adds additional points of failure, and without the scale that makes it necessary you might actually end up reducing your availability.

[1] https://hn.hund.io/
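
To make the "redundancy can reduce availability" point concrete, here is a back-of-the-envelope comparison with made-up availability figures (not measurements of any real system):

    # Illustrative numbers only: a decent single box vs. two boxes behind a
    # failover component that is itself imperfect.
    single_server = 0.999    # ~8.8 hours of downtime per year
    failover_layer = 0.995   # load balancer / failover logic / its config

    pair_of_servers = 1 - (1 - single_server) ** 2   # down only if both are down
    with_failover = failover_layer * pair_of_servers

    print(f"single server:        {single_server:.6f}")
    print(f"pair behind failover: {with_failover:.6f}")
    # With these numbers the "redundant" setup is less available (~0.995)
    # than the single server (0.999), because the failover layer dominates.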


> Does 99.995% [1] of hackernews sound reasonable enough to you?

How did you come up with that number? I looked at the link, and just one of the outages listed, on January 10, lasted 59 minutes. That alone puts uptime for the entire year below 99.99%, before we were even halfway through January.

(99.995% means at most 26.3 minutes of downtime per year. See https://en.wikipedia.org/wiki/High_availability#Percentage_c...)
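
For reference, those downtime budgets fall out of a one-line calculation: the minutes in a year times the allowed unavailability.

    MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

    def downtime_budget_minutes(availability_percent):
        return (1 - availability_percent / 100) * MINUTES_PER_YEAR

    print(round(downtime_budget_minutes(99.99), 1))   # ~52.6 minutes/year
    print(round(downtime_budget_minutes(99.995), 1))  # ~26.3 minutes/year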


It's the 30-day uptime statistic displayed on the website I linked.

If you click through to the history, it says 98.452% over 365 days.


That's a pretty low level. It's lower than my Pi-hole, and I wouldn't consider my Pi-hole to be anything other than best endeavours (99.1%). Two would be fine, but there are common points of failure which would limit the solution.


I can't see how hackernews is 99.995% if I get at least 30-40 "Can't serve your request" error pages a year.


HN is on a pair of servers.


The second one is just a standby, though, and not in another region. And if I recall correctly, dang mentioned during an outage that failover to the standby is manual. But I'm not sure.

https://news.ycombinator.com/item?id=16076041


This so much this! Experience is such a wonderful thing.


We've run a 375,000-employee self-service ERP system (with much heavier single transactions than HN) on a single (large :) DB2/AIX box with no downtime over the last 8 years. That's well within the published specs for that hardware/software combo.

Yes we do have a DR box in another data centre now, in case of meteorite or fire.

This used to be the norm. A single hardware/software box CAN be engineered for high uptime. It's perfectly fine when we choose to go the other way and distribute software over a number of cheap boxes with meh software, but I get pet-peeved when people assume that's the ONLY way :).


I'm paranoid about environmental situations, so I always have a failover, but then my DR plans include events such as "Thames barrier fails".


Add a static caching layer and you’re ready for traffic spikes.
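
A static caching layer can be as simple as memoising rendered pages for a few seconds. A toy sketch, where render_page() and the TTL are stand-ins (in practice you would more likely put a CDN or reverse-proxy cache in front):

    # Toy TTL cache for rendered pages; render_page() is a stand-in for
    # whatever actually builds the HTML.
    import time

    _cache = {}  # path -> (expires_at, html)
    TTL_SECONDS = 30

    def render_page(path):
        return f"<html><body>page for {path}</body></html>"  # placeholder

    def get_page(path):
        now = time.monotonic()
        hit = _cache.get(path)
        if hit and hit[0] > now:
            return hit[1]                       # serve from cache
        html = render_page(path)                # expensive work happens here
        _cache[path] = (now + TTL_SECONDS, html)
        return html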



