Things we learned about sums (questdb.io)
43 points by bluestreak on May 29, 2020 | hide | past | favorite | 9 comments


Pet peeve alert: not distinguishing between "IEEE-754 type" and "real number".

> System.out.println(5.1+9.2);

> We ask to add 5.1 to 9.2. The result should be 14.3, but we get the following instead: 14.299999999999999

That's misleading. You put the characters "5.1" in a Java source file, which in Java (like many languages) means "the IEEE-754 64-bit binary floating point value closest to the decimal number 5.1". It's not equal to the real number 5.1. 9.2 and 14.3 can't be represented exactly in binary floating point, either.

The number 5.1 + the number 9.2 should be the number 14.3.

The Java double 5.1 + the Java double 9.2 should not be the Java double represented by 14.3.

The addition isn't really what's hurting you. The representation is. You're confusing the matter by changing type systems in the middle of a sentence.
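To see the representation error directly, `new BigDecimal(double)` preserves the exact binary value of a double with no decimal rounding. A quick illustrative snippet (class name is mine, not from the article):

```java
import java.math.BigDecimal;

public class ExactValues {
    public static void main(String[] args) {
        // new BigDecimal(double) shows the exact value of the IEEE-754 double,
        // not the rounded string that Double.toString produces.
        System.out.println(new BigDecimal(5.1));       // exact value, slightly below 5.1
        System.out.println(new BigDecimal(9.2));       // exact value, slightly below 9.2
        System.out.println(new BigDecimal(5.1 + 9.2)); // the sum of those two doubles
        System.out.println(new BigDecimal(14.3));      // the double nearest 14.3
        // The sum and the literal land on two different doubles:
        System.out.println(5.1 + 9.2 == 14.3);         // false
    }
}
```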

> It is a small difference (only 0.000000000000001), but it is still wrong.

The answer is "wrong" in large part because the question was wrong. The error in IEEE-754 "14.3" is only slightly smaller than the error in "5.1+9.2".

The sum looks more wrong than the literal "14.3" because Java's Double toString() truncates its output according to some specific rules [1] designed to guess how those 64 bits got there.

> CPUs are poor at dealing with floating-point values. Arithmetics are almost always wrong

Wouldn't it be more accurate to say they're "wrong" at addition 50% of the time? You picked two numbers whose IEEE-754 representation are both just below their actual value, and whose sum is just above its IEEE-754 representation. Had you added "5.1+5.2" (which happen to be just below and just above their actual values, respectively), the representation errors would have cancelled, and you'd have gotten "10.3" as you expect.
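That cancellation is easy to check yourself (a throwaway snippet, not from the article):

```java
public class Cancellation {
    public static void main(String[] args) {
        // 5.1 and 9.2 are both represented just below their decimal values,
        // so their representation errors accumulate:
        System.out.println(5.1 + 9.2); // 14.299999999999999
        // 5.1 is just below and 5.2 just above their decimal values,
        // so the representation errors cancel:
        System.out.println(5.1 + 5.2); // 10.3
    }
}
```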

[1]: https://docs.oracle.com/javase/7/docs/api/java/lang/Double.h...


Thank you for taking the time to elaborate on the subject. This is both informative and enjoyable to read! We knew that not all real numbers can be represented using IEEE-754. It wasn’t entirely clear though, because toString() appears to round-trip perfectly for the double 5.1, yet “sum” is non-associative.

I guess it would be more accurate to say that arithmetic is wrong with some probability. I just feel that in this instance it is appropriate to call “wrong” what is not 100% right :)


Author here.

About a month ago, I posted about using SIMD instructions to make aggregation calculations faster. I am very thankful for the feedback so far, this post is the result of the comments we received last time.

Many comments suggested that we implement compensated summation (aka Kahan), as the naive method could produce inaccurate and unreliable results. This is why we spent some time integrating the Kahan and Neumaier summation algorithms. This post summarises a few things we learned along the way.
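For readers who haven't met the two algorithms, here is a minimal textbook sketch of both (illustrative only; this is not QuestDB's SIMD implementation, and the class/method names are mine):

```java
public class CompensatedSums {
    // Kahan summation: carry a running compensation for lost low-order bits.
    static double kahanSum(double[] xs) {
        double sum = 0.0, c = 0.0;
        for (double v : xs) {
            double y = v - c;      // apply the previous correction
            double t = sum + y;    // low-order bits of y may be lost here
            c = (t - sum) - y;     // recover them (algebraically zero)
            sum = t;
        }
        return sum;
    }

    // Neumaier's variant: also compensates when the next term is larger
    // in magnitude than the running sum.
    static double neumaierSum(double[] xs) {
        double sum = 0.0, c = 0.0;
        for (double v : xs) {
            double t = sum + v;
            if (Math.abs(sum) >= Math.abs(v)) {
                c += (sum - t) + v; // low-order bits of v were lost
            } else {
                c += (v - t) + sum; // low-order bits of sum were lost
            }
            sum = t;
        }
        return sum + c;
    }

    public static void main(String[] args) {
        // Classic case where naive and plain Kahan both lose the small terms:
        double[] xs = {1.0, 1e100, 1.0, -1e100}; // true sum is 2.0
        double naive = 0.0;
        for (double v : xs) naive += v;
        System.out.println(naive);           // 0.0
        System.out.println(kahanSum(xs));    // 0.0
        System.out.println(neumaierSum(xs)); // 2.0
    }
}
```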

We thought Kahan would badly hurt performance, since it uses 4x as many operations as the naive approach. However, some comments also suggested we could use prefetch and co-routines to pull the data from RAM to cache in parallel with other CPU instructions. We got phenomenal results thanks to these suggestions, with Kahan sums nearly as fast as the naive approach.

A lot of you also asked if we could compare this with Clickhouse. Since they implement Kahan summation, we ran a quick comparison. Here's what we got for summing 1bn doubles (with nulls) using the Kahan algorithm. The details of how this was done are in the post.

QuestDB: 68ms
Clickhouse: 139ms

Thanks for all the feedback so far and keep it going so we can continue to improve. Vlad


Thanks for your work on this and for making it FOSS. This is the first I've heard of QuestDB. Is it production-ready? What's the scaling story? Thanks


Thank you! QuestDB is in production. Scaling is part of the commercial product built on top of QuestDB.


While Kahan summation is more accurate than naive summation, it still does not produce the precise result (the true sum rounded to the final precision).
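To make the gap concrete: even Neumaier's improved compensation can miss the correctly rounded sum when the terms span several scales. A throwaway check (this is not Neal's xsum algorithm; it just uses BigDecimal as an exact reference, since `new BigDecimal(double)` captures the exact binary value and BigDecimal addition never rounds; names are mine):

```java
import java.math.BigDecimal;

public class CompensatedVsExact {
    // Neumaier compensated sum (the stronger variant discussed upthread).
    static double neumaierSum(double[] xs) {
        double sum = 0.0, c = 0.0;
        for (double v : xs) {
            double t = sum + v;
            if (Math.abs(sum) >= Math.abs(v)) c += (sum - t) + v;
            else                              c += (v - t) + sum;
            sum = t;
        }
        return sum + c;
    }

    // Exact reference: accumulate the exact binary values without rounding,
    // then round once to double at the end.
    static double exactSum(double[] xs) {
        BigDecimal acc = BigDecimal.ZERO;
        for (double v : xs) acc = acc.add(new BigDecimal(v));
        return acc.doubleValue();
    }

    public static void main(String[] args) {
        // Three widely separated scales: the 1.0 is lost even from the
        // compensation variable, while the huge terms cancel exactly.
        double[] xs = {1e200, 1e100, 1.0, -1e200, -1e100};
        System.out.println(neumaierSum(xs)); // 0.0
        System.out.println(exactSum(xs));    // 1.0
    }
}
```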

It is not too costly to do the summation exactly. Several algorithms for this have been developed. You can read about mine at https://arxiv.org/abs/1505.05571 and get the code for them at https://gitlab.com/radfordneal/xsum


Thank you Radford! This is a long piece and it would take me a while to absorb properly. Does your algorithm lend itself to vectorised implementation?


It doesn't vectorize in an obvious fashion.

But it has occurred to me that one could accumulate the sum in several "superaccumulators", rather than just one as in the present code. (For instance, by adding even terms to one superaccumulator and odd terms to another superaccumulator.) One could then add the superaccumulators at the end. This would allow for vectorization, or for other forms of instruction-level-parallelism.

I haven't tried this, however. My guess is that it would be advantageous for summing a very large number of terms, but maybe not for less than at least hundreds of terms.

One could also use multiple cores, again with multiple superaccumulators. One advantage of computing the exact sum is that the answer then doesn't depend on what order you add things in (any order gives the same exact result), simplifying the use of multiple cores.
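The even/odd split described above has the same shape as ordinary two-lane summation. A sketch with plain doubles standing in for the superaccumulators (the real xsum accumulator is a fixed-point array, so this only illustrates the control flow, not the exactness; names are mine):

```java
public class SplitSum {
    // Two independent accumulators over even- and odd-indexed terms,
    // combined once at the end. With exact superaccumulators the final
    // combine would also be exact, making the result order-independent.
    static double splitSum(double[] xs) {
        double even = 0.0, odd = 0.0;
        int i = 0;
        for (; i + 1 < xs.length; i += 2) {
            even += xs[i];
            odd  += xs[i + 1];
        }
        if (i < xs.length) even += xs[i]; // leftover term for odd-length input
        return even + odd;
    }

    public static void main(String[] args) {
        System.out.println(splitSum(new double[]{1, 2, 3, 4, 5})); // 15.0
    }
}
```

Each lane is an independent dependency chain, which is exactly what gives the instruction-level parallelism or vectorization headroom mentioned above.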


super interesting. love it.



