
One thing I learned in grad school is you can do a stupendous number of floating point operations in the time it takes to serialize a matrix to ASCII numbers and deserialize it. Whatever it is you are doing with a JSON document might be slower than parsing it.
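A quick way to see this for yourself, sketched with NumPy and the standard json module (timings vary by machine and BLAS backend, but the ASCII round trip typically dwarfs the matmul):

```python
import json
import time

import numpy as np

n = 1000
a = np.random.rand(n, n)
b = np.random.rand(n, n)

t0 = time.perf_counter()
c = a @ b                          # roughly 2*n^3 = 2e9 floating point ops
t1 = time.perf_counter()
text = json.dumps(a.tolist())      # serialize the matrix to ASCII decimals
back = np.array(json.loads(text))  # ...and parse it back
t2 = time.perf_counter()

print(f"matmul:           {t1 - t0:.3f}s")
print(f"ASCII round trip: {t2 - t1:.3f}s")
```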

It's true that autovectorization accomplishes very little, but specialized libraries could have a notable effect on perceived performance if they were widely developed and used.

Frankly, Intel has been less interested in getting you to buy a new computer by making it the best computer you ever bought than in taking as much revenue as it can from the rest of the BOM: for instance, the junk integrated graphics that sabotaged the Windows Vista launch and have been making computers crash ever since. Another example is that they don't ship MKL out of the box on Windows or Linux, although they do on macOS. And Intel wonders why their sales are slipping...



Matrix multiplication and similar kernels are also among the few operations where algorithms and special-case instructions are interesting for floating point on a massive scale.

I.e., adding two arrays or computing dot products becomes memory bound as the data grows, but matrix multiplication performs enough operations per element that it is limited by arithmetic throughput as well.
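The distinction is arithmetic intensity: flops performed per value moved through memory. A back-of-the-envelope sketch (the flop counts are the standard ones for these operations):

```python
n = 4096

# Dot product: ~2n flops while streaming 2n values from memory
dot_flops = 2 * n
dot_values = 2 * n

# Matmul: ~2n^3 flops over only 3n^2 values (two inputs, one output)
mm_flops = 2 * n ** 3
mm_values = 3 * n ** 2

print(dot_flops / dot_values)  # intensity ~1 flop/value: memory bound
print(mm_flops / mm_values)    # intensity ~2n/3, grows with n: compute bound
```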


Had to look it up:

> Intel oneAPI Math Kernel Library (Intel oneMKL; formerly Intel Math Kernel Library or Intel MKL), is a library of optimized math routines for science, engineering, and financial applications. Core math functions include BLAS, LAPACK, ScaLAPACK, sparse solvers, fast Fourier transforms, and vector math.

https://en.m.wikipedia.org/wiki/Math_Kernel_Library
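For what it's worth, you may already be using MKL without knowing it: NumPy/SciPy dispatch to whatever BLAS they were built against, which may be MKL or OpenBLAS depending on the distribution. A small sketch to check, and to call a BLAS routine (double-precision GEMM) directly:

```python
import numpy as np
from scipy.linalg import blas

# Which BLAS/LAPACK is this NumPy built against? (MKL, OpenBLAS, ...)
np.show_config()

# Calling dgemm directly: c = 1.0 * (a @ b)
a = np.random.rand(200, 300)
b = np.random.rand(300, 100)
c = blas.dgemm(1.0, a, b)
```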


> One thing I learned in grad school is you can do a stupendous number of floating point operations in the time it takes to serialize a matrix to ASCII numbers and deserialize it. Whatever it is you are doing with a JSON document might be slower than parsing it.

> It's true that autovectorization accomplishes very little, but specialized libraries could have a notable effect on perceived performance if they were widely developed and used.

I mean, sure, but even if we take JSON as an example, in the vast majority of cases it gets fed to a giant blob of JS driving an even bigger blob of browser code.

The cases where you do deserialization -> very little processing -> serialization are pretty rare.

Sure, if it is already on the chip you might as well use it, but realistically the savings will be in the single-digit percents.


> The cases where you do deserialization -> very little processing -> serialization are pretty rare.

Actually I've seen a lot of systems that do that - query a datastore, do some minimal processing of the results, feed them back to the caller. Although that tends to get addressed at a higher level, e.g. MongoDB drivers shifting towards using BSON.
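Part of why a binary format like BSON helps that pattern is that the bytes mostly pass through untouched instead of being formatted and reparsed as text. A toy sketch of the difference, using the standard struct module as a stand-in for a real binary codec:

```python
import json
import struct

values = [3.141592653589793] * 1000

# ASCII round trip: format each float as text, then reparse it
ascii_blob = json.dumps(values)
ascii_back = json.loads(ascii_blob)

# Binary round trip: copy the 8-byte IEEE 754 representation directly
bin_blob = struct.pack(f"{len(values)}d", *values)
bin_back = list(struct.unpack(f"{len(values)}d", bin_blob))

# The text encoding is larger, and parsing it is where the time goes
print(len(ascii_blob), len(bin_blob))
```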



