> To clarify, are you saying the entire app was slower with AVX than it was with SSE4?

We were optimizing loop by loop, and some loops converted to AVX could well be slower, yes (at that point in time). AVX is probably more often a win nowadays.

> That would be surprising, because 2x vector width is expected to outweigh 10-20 percent downclocking.

In practice, workloads are often bottlenecked by memory access, and there are diminishing returns with increased vector size.
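
To put rough numbers on that, here is a back-of-the-envelope sketch, not a measurement. It assumes a loop takes as long as the slower of its compute and its memory traffic, and every figure in it (3.0 GHz clock, 15 percent AVX downclock, 20 GB/s memory bandwidth, the two example loops) is made up for illustration:

```python
# Toy model: a loop's time is the slower of its arithmetic and its memory traffic.
# All constants below are illustrative assumptions, not measurements.

BASE_CLOCK_GHZ = 3.0      # assumed clock when running SSE code
AVX_DOWNCLOCK = 0.15      # assumed 15% frequency drop under AVX
MEM_BANDWIDTH_GBPS = 20.0 # assumed sustained memory bandwidth


def loop_time(n_elems, ops_per_elem, bytes_per_elem, lanes, clock_ghz):
    """Estimated seconds for one pass over n_elems float32 elements."""
    # One vector instruction per cycle handles `lanes` elements (simplification).
    compute_s = n_elems * ops_per_elem / (lanes * clock_ghz * 1e9)
    memory_s = n_elems * bytes_per_elem / (MEM_BANDWIDTH_GBPS * 1e9)
    return max(compute_s, memory_s)


def avx_speedup(ops_per_elem, bytes_per_elem, n_elems=1_000_000):
    """Ratio of estimated SSE time (4 lanes) to AVX time (8 lanes, downclocked)."""
    sse = loop_time(n_elems, ops_per_elem, bytes_per_elem, 4, BASE_CLOCK_GHZ)
    avx = loop_time(n_elems, ops_per_elem, bytes_per_elem, 8,
                    BASE_CLOCK_GHZ * (1.0 - AVX_DOWNCLOCK))
    return sse / avx


# Arithmetic-heavy loop: 64 ops per element, 4 bytes of traffic per element.
compute_bound = avx_speedup(ops_per_elem=64, bytes_per_elem=4)

# Streaming loop: 1 op per element, 8 bytes of traffic (one load, one store).
memory_bound = avx_speedup(ops_per_elem=1, bytes_per_elem=8)

print(f"compute-bound loop: {compute_bound:.2f}x")
print(f"memory-bound loop:  {memory_bound:.2f}x")
```

With these assumed numbers the arithmetic-heavy loop gets about 1.7x (2x width times 0.85 clock), while the streaming loop gets 1.0x because both versions are waiting on memory. In that second case the wider vectors buy nothing, so whatever the downclock costs is not paid back.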