
No. Memset (and bzero) aren’t HW accelerated. There is a special CPU instruction that can do it, but in practice it’s faster to do it in a loop. In user space you can frequently leverage SIMD instructions to speed it up (of course those aren’t available in the kernel, because it avoids saving/restoring those and the FP registers on every syscall and only does so when you switch contexts).
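For illustration, here is a minimal sketch of the two user-space approaches described above: a plain byte loop, and a loop using 16-byte SIMD stores. It assumes an x86 target with SSE2 and falls back to the byte loop elsewhere. The names `zero_loop` and `zero_simd` are made up for this example, and a real libc memset is typically far more heavily tuned than this.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics; only used when the target has SSE2 */
#endif

/* Hypothetical helper: zero n bytes one byte at a time. */
static void zero_loop(void *dst, size_t n) {
    assert(dst != NULL);
    unsigned char *p = dst;
    for (size_t i = 0; i < n; i++)
        p[i] = 0;
}

/* Hypothetical helper: zero n bytes with 16-byte unaligned SSE2 stores,
   then finish the tail (or everything, without SSE2) with a byte loop. */
static void zero_simd(void *dst, size_t n) {
    assert(dst != NULL);
    unsigned char *p = dst;
#if defined(__SSE2__)
    const __m128i z = _mm_setzero_si128();
    while (n >= 16) {
        _mm_storeu_si128((__m128i *)p, z);
        p += 16;
        n -= 16;
    }
#endif
    for (; n > 0; n--)
        *p++ = 0;
}
```

Both functions do the same job; the SIMD version simply writes more bytes per store. Compilers will often vectorize the plain loop on their own, or replace it with a call to memset.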

What could be interesting is if there were a CPU instruction to tell the RAM to do it. Then you would avoid the memory bandwidth impact of zeroing the memory when freeing it. But I don’t think there’s any such instruction for the CPU/memory protocol even today. Not sure why.



That seems wild to be honest. I know how easy it is to say "well they can just.."

But...wouldn't it be relatively trivial to have an instruction that tells the memory controller "set range from address y to x to 0" and let it handle it? Actually slamming a bunch of 0's out over the bus seems so very suboptimal.


> But...wouldn't it be relatively trivial to have an instruction that tells the memory controller "set range from address y to x to 0" and let it handle it?

Having the memory controller or memory module do it is complicated somewhat because it needs to be coherent with the caches, needs to obey translation, etc. If you have the memory controller do it, it doesn't save bandwidth. But, on the other hand, with a write back cache, your zeroing may never need to get stored to memory at all.

Further, if you have the module do it, the module/sdram state machine needs to get more complicated... and if you just have one module on the channel, then you don't benefit in bandwidth, either.

A DMA controller can be set up to do it... but in practice this is usually more expensive on big CPUs than just letting a CPU do it.

It's not really tying up a processor, either, because of superscalar execution, hyperthreading, etc.; modern processors have an abundance of resources, and what slows things down is work that must be done serially or resources that are heavily contended (like the bus to memory).


Thanks for the answer!


Though modern CPUs are explicitly built to make sure such a loop is fast.

And on some systems the DRM controller might zero the memory in some situations, in which case you could say it was done by hardware.


> DRM controller

Did you mean DMA controller? Or do you have more information?


yes DMA, not the direct rendering manager ;=)


dc zva?
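For context, `DC ZVA` is the AArch64 "data cache zero by virtual address" instruction, which zeroes one naturally aligned block per invocation. Below is a rough sketch of how user-space code might use it, assuming GCC or Clang inline assembly and an OS that allows EL0 to read `DCZID_EL0` (as Linux does). `zero_region` is a hypothetical name, and on non-AArch64 targets the function just calls memset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero n bytes, using DC ZVA on AArch64 when permitted. */
static void zero_region(void *dst, size_t n) {
    assert(dst != NULL);
    unsigned char *p = dst;
#if defined(__aarch64__) && defined(__GNUC__)
    uint64_t dczid;
    __asm__ volatile("mrs %0, dczid_el0" : "=r"(dczid));
    if (!(dczid & 0x10)) {                        /* DZP clear: DC ZVA allowed */
        size_t blk = (size_t)4 << (dczid & 0xf);  /* block size in bytes */
        /* Zero the unaligned head with memset so p becomes block-aligned. */
        size_t head = (blk - ((uintptr_t)p & (blk - 1))) & (blk - 1);
        if (head > n)
            head = n;
        memset(p, 0, head);
        p += head;
        n -= head;
        /* Zero whole blocks, one DC ZVA per block. */
        while (n >= blk) {
            __asm__ volatile("dc zva, %0" : : "r"(p) : "memory");
            p += blk;
            n -= blk;
        }
    }
#endif
    memset(p, 0, n);  /* tail, or the whole region on other targets */
}
```

This still executes on the CPU and goes through the cache hierarchy, so it is a cheaper way to zero cache lines rather than a way of asking the RAM to zero itself.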



