I've checked with ClickHouse and the result is better than I expected... it runs i...

zX41ZdbW · on July 25, 2022

https://github.com/ClickHouse/countwords/pull/1/files

fauigerzigerk · on July 25, 2022

>it runs in 0.043 sec. on my machine, which is faster than any other result.

Did you run the other benchmarks on your machine as well?

zX41ZdbW · on July 25, 2022

Yes, but only the scripted implementations (nothing that needs compiling):

Language | Simple (s) | Optimized (s) | Notes
--- | --- | --- | ---
`grep` | 0.03 | 0.03 | `grep` baseline; optimized sets `LC_ALL=C`
`wc -w` | 0.18 | 0.25 | `wc` baseline; optimized sets `LC_ALL=C`
SQL | 0.26 | | by Alexey Milovidov
Perl | 1.22 | | by Charles Randall
Python | 1.42 | 0.86 |
Tcl | 5.30 | | by William Ross
Shell | 9.66 | 1.79 | optimized does `LC_ALL=C sort -S 2G`
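For reference, the task all of these implementations solve fits in a few lines of Python. This is a minimal sketch, assuming the task is: lowercase the input, split it on whitespace, count each word, and print `word count` pairs with the most frequent first. `count_words` and `format_counts` are hypothetical names, and breaking ties alphabetically is an assumption of this sketch; it is not the benchmark's reference code.

```python
from collections import Counter
from typing import Iterable


def count_words(lines: Iterable[str]) -> Counter:
    """Count lowercase, whitespace-separated words across an iterable of lines."""
    counts = Counter()
    for line in lines:
        # str.split() with no argument splits on runs of whitespace
        # and never yields empty strings.
        counts.update(line.lower().split())
    return counts


def format_counts(counts: Counter) -> list[str]:
    """Return 'word count' strings, most frequent first.

    Ties are broken alphabetically (an assumption of this sketch).
    """
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{word} {n}" for word, n in ordered]
```

To read from stdin line by line, one would call `print("\n".join(format_counts(count_words(sys.stdin))))` after `import sys`; iterating over `sys.stdin` yields one line at a time.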

blacksqr · on July 27, 2022

N.B. the Tcl script is absurdly inefficient. A single simple optimization cuts the run time in half.

zX41ZdbW · on July 25, 2022

I forgot to repeat the input file 10 times. With the tenfold input, the result is 0.209 sec., which is still faster than every other result.
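For scale: 0.209 s versus 0.043 s is roughly 4.9 times as long for ten times the data. Building the tenfold input just means writing the original file out ten times back to back. A minimal Python sketch, where `replicate` is a hypothetical helper and not part of the benchmark repository:

```python
import io
import shutil


def replicate(src: io.BufferedIOBase, dst: io.BufferedIOBase, copies: int = 10) -> None:
    """Write `copies` back-to-back copies of src's contents into dst."""
    for _ in range(copies):
        src.seek(0)  # rewind so every pass copies the whole input
        shutil.copyfileobj(src, dst)  # streams the data without loading it all at once
```

With placeholder file names, usage would be `with open("input.txt", "rb") as src, open("input_x10.txt", "wb") as dst: replicate(src, dst)`.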

stoical · on July 26, 2022

You are also reading the file with a built-in function. In the 'official' GitHub implementations, the programs have to read the data line by line from stdin, which is likely slower than reading a file directly.
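Reading from stdin does not by itself force per-line processing: a program can pull large chunks from the stream instead. As a minimal sketch, assuming the same lowercase, whitespace-split word-count task, the Python below reads a text stream in fixed-size chunks and holds back a word that may be cut off at a chunk boundary. `count_chunked` and the 64 KiB default chunk size are illustrative choices, not the benchmark's code, and whether this closes the gap with reading a file directly is untested here.

```python
import io
from collections import Counter


def count_chunked(stream: io.TextIOBase, chunk_size: int = 64 * 1024) -> Counter:
    """Count lowercase whitespace-separated words, reading fixed-size chunks."""
    counts = Counter()
    carry = ""  # possibly incomplete word left over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty string means end of stream
            break
        text = carry + chunk.lower()
        words = text.split()
        # If the chunk did not end in whitespace, the last word may continue
        # in the next chunk, so hold it back instead of counting it now.
        if words and not text[-1].isspace():
            carry = words.pop()
        else:
            carry = ""
        counts.update(words)
    if carry:  # the final word, if the input did not end in whitespace
        counts[carry] += 1
    return counts
```

On stdin this would be `count_chunked(sys.stdin)` after `import sys`; each `read(chunk_size)` call returns up to that many characters, so the per-line loop disappears.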