
I tried to catch soft errors for about a year on a couple of Linux boxes I had. They were both desktop form factor machines, one being used as a home server and one as a desktop at work.

I had a background process [1] on each that simply allocated a 128 MB buffer, filled it with a known data pattern, and then went into an infinite loop that slept a while, woke up and checked the integrity of the buffer, and if any of the data had changed logged the change and restored the data pattern.
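For readers who don't want to follow the link, here is a minimal sketch of the kind of checker described above. It is not the program from [1]: the pattern value and the function names `fill_pattern`, `check_and_restore`, and `run_checks` are assumptions made for illustration. The buffer is accessed through a `volatile` pointer so that, in practice, the compiler keeps every load and store.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Assumed test pattern; the pattern used by the original program is not shown here. */
#define PATTERN UINT64_C(0xA5A5A5A5A5A5A5A5)

/* Write the known pattern into every word of the buffer. */
void fill_pattern(volatile uint64_t *buf, size_t words)
{
    for (size_t i = 0; i < words; i++)
        buf[i] = PATTERN;
}

/* Scan the buffer; log and repair any word that no longer matches.
 * Returns the number of corrupted words found in this pass. */
size_t check_and_restore(volatile uint64_t *buf, size_t words)
{
    size_t bad = 0;
    for (size_t i = 0; i < words; i++) {
        uint64_t v = buf[i];
        if (v != PATTERN) {
            fprintf(stderr, "word %zu: expected %016" PRIx64 ", got %016" PRIx64 "\n",
                    i, (uint64_t)PATTERN, v);
            buf[i] = PATTERN;
            bad++;
        }
    }
    return bad;
}

/* Sleep, then scan, `rounds` times; a real daemon would loop forever.
 * Returns the total number of corrupted words seen. */
size_t run_checks(volatile uint64_t *buf, size_t words,
                  unsigned rounds, unsigned interval_s)
{
    size_t total = 0;
    for (unsigned r = 0; r < rounds; r++) {
        sleep(interval_s);
        total += check_and_restore(buf, words);
    }
    return total;
}
```

For a 128 MB buffer, `words` would be 128 * 1024 * 1024 / sizeof(uint64_t), i.e. 16,777,216. A caller would allocate that much, call `fill_pattern` once, and then call `check_and_restore` from an endless sleep loop.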

Based on the error rates I'd seen published, I expected to catch a few errors. For example, using the rate that Tomte's comment [2] cites I think I'd expect about 6 errors a year.

I never caught an error.

I also have two desktops with ECC (a 2008 Mac Pro and a 2009 Mac Pro). I've used the 2008 Mac Pro every working day since I bought it in 2008, and the 2009 Mac Pro every day since I bought it in 2009. Neither of them has ever reported correcting an error.

I have no idea why I have not been able to see an error.

[1] http://pastebin.com/Bv56kVwC

[2] https://news.ycombinator.com/item?id=10600308



Did you check the resulting (dis)assembly? If you compile with optimisations the reading (and maybe writing) to the RAM buffer may be optimised away.
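To illustrate the concern, here is a small sketch with made-up names (`plain_buf`, `vol_buf`, and both functions are hypothetical, not taken from the linked program). In the first function the compiler can see that nothing writes to the buffer between the fill and the check, so with optimisations on it is allowed to reduce the check to "return 0" and possibly drop the stores too. Declaring the buffer `volatile` obliges it to emit every access.

```c
#include <stddef.h>
#include <stdint.h>

#define WORDS   1024u
#define PATTERN 0x5A5A5A5Au   /* arbitrary pattern chosen for this sketch */

static uint32_t plain_buf[WORDS];
static volatile uint32_t vol_buf[WORDS];

/* Plain accesses: an optimising compiler may prove the comparison is
 * always false and never actually read the memory back. */
size_t fill_then_check_plain(void)
{
    size_t bad = 0;
    for (size_t i = 0; i < WORDS; i++)
        plain_buf[i] = PATTERN;
    for (size_t i = 0; i < WORDS; i++)
        if (plain_buf[i] != PATTERN)
            bad++;
    return bad;
}

/* Volatile accesses: each store and each load has to appear in the
 * generated code, so the check really reads the buffer. */
size_t fill_then_check_volatile(void)
{
    size_t bad = 0;
    for (size_t i = 0; i < WORDS; i++)
        vol_buf[i] = PATTERN;
    for (size_t i = 0; i < WORDS; i++)
        if (vol_buf[i] != PATTERN)
            bad++;
    return bad;
}
```

Both functions return 0 on healthy memory, so the difference only shows up in the generated code. Compiling with something like `gcc -O2 -S` and comparing the two function bodies is one way to see whether the loads survived.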


As soon as you have a power fluctuation, an air conditioning malfunction, or a few short circuits caused by dirt, you'll get enough errors to converge on the published average.

Just wait, and relax. You'll get there eventually.


That is normal, of course. I think the published error rates are measured across large amounts of RAM.



