
Yeah. Or, during a complex investigation, you need to set up a hypothesis explaining these spikes so you can eventually establish what causes them. And once you have causality, you can start fixing.

For example, I had a disagreement with another engineer there during a larger outage. We eventually got to the idea: if we click that button in the application, the database dies a violent death. Their first reaction was: so we never click that button again. My reaction was: we put all attention on the DB and click that button a couple of times.

If we can reliably trigger misbehavior of the system, we're back in control. If we're scared, we're not in control.


