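LUTs at least do well in microbenchmarks, but I worry that they may do comparatively much worse in real code, where the table has to compete with everything else for cache.
That said, this is another advantage of small tables accessed with vpermi2pd: a 16-element table of Float64 fits in two zmm registers, so a lookup never touches memory.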
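To make the register-table idea concrete, here is a scalar Python emulation of the per-lane semantics of `vpermi2pd` (the `_mm512_permutex2var_pd` intrinsic). Each of the 8 index lanes uses bits 0-2 to pick an element and bit 3 to choose between the two source registers, so two 8-lane registers together act as one 16-entry table. The function name and the example table of `2^(j/16)` values are illustrative choices, not code from LoopVectorization.jl.

```python
def vpermi2pd(idx, a, b):
    """Emulate AVX-512 vpermi2pd on 8 double lanes.

    For each lane, bits 0-2 of the index select an element and bit 3
    selects the source register (0 -> a, 1 -> b). Higher bits are ignored.
    """
    assert len(idx) == len(a) == len(b) == 8
    out = []
    for i in idx:
        src = b if (i >> 3) & 1 else a
        out.append(src[i & 7])
    return out


# A 16-entry table split across two "registers" of 8 doubles each.
TABLE = [2.0 ** (j / 16) for j in range(16)]
lo, hi = TABLE[:8], TABLE[8:]

# One instruction's worth of lookups: 8 arbitrary indices into 16 entries.
idx = [0, 3, 8, 15, 7, 9, 1, 12]
vals = vpermi2pd(idx, lo, hi)
```

Because the whole table lives in registers, this lookup costs the same regardless of cache state, which is why small tables hold up better outside microbenchmarks than large memory-resident ones.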
The Julia/base implementations of log and exp both use LUTs.
The SIMD AVX512 implementation of exp used by LoopVectorization.jl will sometimes use the 16 element table.
I experimented with log, but had some difficulty getting both accuracy and performance, so the version LoopVectorization.jl currently uses doesn't use a table.
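For readers unfamiliar with how a 16-element table enters an exp implementation, here is a minimal scalar sketch of the standard table-driven reduction: write `x = k*ln2/16 + r` with `|r| <= ln2/32`, split `k = 16n + j`, and compute `exp(x) = 2^n * 2^(j/16) * exp(r)`. The degree-6 Taylor polynomial and the single-constant reduction are simplifying assumptions chosen for readability; this is not the Julia Base or LoopVectorization.jl implementation, which use tuned coefficients and more careful argument reduction.

```python
import math

# 2^(j/16) for j = 0..15: the 16-element table (illustrative; production
# code stores these to full precision, often as hi/lo pairs).
EXP2_TABLE = [2.0 ** (j / 16) for j in range(16)]

LN2_OVER_16 = math.log(2.0) / 16


def exp_lut(x: float) -> float:
    # Reduce: x = k * ln2/16 + r, with k an integer and |r| <= ln2/32.
    k = round(x / LN2_OVER_16)
    r = x - k * LN2_OVER_16
    # Split k = 16*n + j with 0 <= j < 16 (divmod floors, so j >= 0 even for k < 0).
    n, j = divmod(k, 16)
    # exp(r) for small r via a degree-6 Taylor polynomial in Horner form.
    p = 1.0 + r * (1.0 + r * (1 / 2 + r * (1 / 6 + r * (1 / 24 + r * (1 / 120 + r / 720)))))
    # exp(x) = 2^n * 2^(j/16) * exp(r); ldexp applies the 2^n scaling exactly.
    return math.ldexp(EXP2_TABLE[j] * p, n)
```

The table is what keeps the polynomial short: shrinking the reduced interval by a factor of 16 means far fewer terms are needed for double precision than with a plain `x = n*ln2 + r` reduction.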