I played a little with FFT Gaussian blur. It works in the frequency domain, so instead of averaging hundreds of points per output pixel, it transforms the image and the blur kernel into the frequency domain, performs a pointwise multiplication there, and transforms the result back. It's far faster than direct convolution.
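A minimal sketch of the idea with NumPy (the function name and sigma parameterization are my own; this assumes a square-ish grayscale image and circular boundary handling, which is what a plain FFT gives you):

```python
import numpy as np

def fft_gaussian_blur(image, sigma):
    """Blur a 2-D image via pointwise multiplication in the frequency domain.

    Cost is O(N log N) regardless of sigma, unlike direct convolution,
    whose cost grows with the kernel width.
    """
    h, w = image.shape
    # Build a Gaussian kernel the same size as the image, centered at (0, 0)
    # in wrapped coordinates, so no fftshift is needed afterwards.
    y = np.fft.fftfreq(h) * h
    x = np.fft.fftfreq(w) * w
    yy, xx = np.meshgrid(y, x, indexing="ij")
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    kernel /= kernel.sum()  # normalize so overall brightness is preserved
    # Pointwise product of spectra == circular convolution of the images.
    blurred = np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(kernel))
    return np.real(blurred)
```

Note the circular (wrap-around) boundaries: real implementations usually pad the image first to avoid the edges bleeding into each other.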
Having to process 100 source pixels per destination pixel to shrink 10x seems like an inefficient implementation. If you downsample each dimension individually, you only need to process 20 pixels per pixel. This is the same optimization used for Gaussian blur.
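The one-dimension-at-a-time approach looks like this in NumPy (a sketch under my own naming, using box averaging and assuming the image dimensions divide evenly by the factor):

```python
import numpy as np

def downsample_separable(image, factor):
    """Shrink a 2-D image by `factor` in each dimension, one axis at a time,
    using box averaging. Assumes both dimensions divide evenly by `factor`."""
    h, w = image.shape
    # First pass: average along rows, so the width shrinks and height is kept.
    tmp = image.reshape(h, w // factor, factor).mean(axis=2)
    # Second pass: average along columns of the intermediate image.
    return tmp.reshape(h // factor, factor, w // factor).mean(axis=1)
```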
> If you downsample each dimension individually you only need to process 20 pixels per pixel.
If you shrink 10x in one direction, then the other, you first turn 100 pixels into 10 before turning those 10 pixels into 1. For a non-smoothed shrink you actually do more work this way, sampling 110 pixels per output instead of 100.
To benefit from doing the dimensions separately, the width of your sample has to be bigger than the shrink factor. The best case is a pure blur where you're not shrinking at all: a 10-wide kernel then costs 20 samples per pixel instead of 100, and that's where the 20-per-pixel figure actually happens.
If you sampled 10 pixels wide, then shrunk by a factor of 3, you'd have 100 samples per output if you do both dimensions at the same time, and 40 samples per output if you do one dimension at a time.
Two dimensions at the same time need width^2 samples per output.
Two dimensions, one after the other, need width * (shrink_factor + 1) samples per output.
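Plugging numbers into these two formulas reproduces the counts above; a quick sketch (the function name is mine):

```python
def samples_per_output(width, shrink):
    """Samples touched per final output pixel for a kernel of `width`
    when shrinking by `shrink` in each dimension."""
    joint = width * width              # both dimensions at once: width^2
    separable = width * (shrink + 1)   # one dimension, then the other
    return joint, separable
```

The separable count comes from each output needing `shrink` intermediate pixels (each costing `width` samples in the first pass) plus `width` more samples in the second pass: `shrink * width + width = width * (shrink + 1)`.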
In practice, people assume box averaging is too expensive, even though it's pretty much the same as that Gaussian blur, just computed at fewer output points.