Thank you Radford! This is a long piece and it would take me a while to absorb p...

radford-neal · on May 30, 2020

It doesn't vectorize in an obvious fashion.

But it has occurred to me that one could accumulate the sum in several "superaccumulators", rather than just one as in the present code. (For instance, by adding even terms to one superaccumulator and odd terms to another superaccumulator.) One could then add the superaccumulators at the end. This would allow for vectorization, or for other forms of instruction-level-parallelism.

I haven't tried this, however. My guess is that it would be advantageous for summing a very large number of terms, but maybe not for less than at least hundreds of terms.

One could also use multiple cores, again with multiple superaccumulators. One advantage of computing the exact sum is that the answer then doesn't depend on what order you add things in (any order gives the same exact result), simplifying the use of multiple cores.