
Good hunch. On my machine (13900k) & zig 0.11, the latest version of the code:

> INFILE="$(mktemp)" && echo $INFILE && \
>   echo '60016000526001601ff3' | xxd -r -p > "${INFILE}" && \
>   zig build run -Doptimize=ReleaseFast < "${INFILE}"

> execution time: 27.742µs

vs

> echo '60016000526001601ff3' | xxd -r -p | zig build run -Doptimize=ReleaseFast

> execution time: 27.999µs

The idea that the overlap of execution here by itself plays a role is nonsensical. A more plausible explanation is that overlapped execution plus reading one byte at a time causes kernel mutex contention, although I'd expect someone more knowledgeable (and more motivated) about capturing kernel perf measurements to confirm. If this is the explanation, I'm kind of surprised that there isn't a lock-free path for pipes in the kernel.
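The syscall blow-up from one-byte reads is easy to see without zig at all; as a rough stand-in for the unbuffered reader, dd with bs=1 forces one read(2)/write(2) pair per byte:

```shell
# Stand-in for the unbuffered reader: bs=1 makes dd issue one read(2)
# (and one write(2)) per byte, so 1 MB of pipe traffic costs ~2M syscalls.
head -c 1000000 /dev/zero | dd bs=1 of=/dev/null 2>/dev/null

# Buffered equivalent: the same transfer in roughly 16 reads.
head -c 1000000 /dev/zero | dd bs=65536 of=/dev/null 2>/dev/null
```

On most machines the bs=1 run is orders of magnitude slower even though both move the same megabyte, which is the kind of gap the kernel-side pipe locking hypothesis would have to explain.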



Based on what you've shared, the second version can start reading instantly because "INFILE" was populated in the previous test. Did you clear it between tests?

Here are the benchmarks before and after fixing the benchmarking code:

Before: https://output.circle-artifacts.com/output/job/2f6666c1-1165...

After: https://output.circle-artifacts.com/output/job/457cd247-dd7c...

What would explain the drastic performance increase if the pipelining behavior is irrelevant?


That was just a typo in the comment. The command run locally was just a straight pipe.

Using both invocation variants, I ran:

8a5ecac63e44999e14cdf16d5ed689d5770c101f (before buffered changes)

78188ecbc66af6e5889d14067d4a824081b4f0ad (after buffered changes)

On my machine, they're all equally fast at ~28 µs. Clearly the changes only had an impact on machines with a different configuration (kernel version, kernel config, xxd version, or hardware).

One hypothesis outlined above is that when you pipeline all 3 applications, the single-byte-reader version is doing back-to-back syscalls, and that's causing contention between your code and xxd on a kernel mutex, leaving things asleep longer than necessary.

It's not a strong hypothesis though, given how little data there is and the fact that it doesn't repro on my machine. To get a real explanation, I think you have to actually take profiling measurements on a machine that can repro and dig in until you have a satisfying explanation of what exactly is causing the problem.
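A starting point for those measurements, sketched with strace. Here `cat` is only a stand-in for the consumer; anyone reproducing would point strace at the zig-built binary instead:

```shell
# Per-syscall counts and cumulative kernel time for the consumer side.
# The unbuffered build should show one read(2) per input byte.
echo '60016000526001601ff3' | xxd -r -p | strace -c -e trace=read cat > /dev/null

# Tracing the whole pipeline (-f follows the forked children) shows
# where kernel time goes if pipe wakeups really are the problem.
strace -f -c sh -c "echo '60016000526001601ff3' | xxd -r -p | cat > /dev/null"
```

If the mutex-contention hypothesis holds, the `-c` summary on a machine that repros should show read(2) time dominated by a small number of calls with long waits rather than many cheap ones.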


This @mtlynch


To sanity check myself, I reran this without the buffered reader and still don't see the slow execution time:

> echo '60016000526001601ff3' | xxd -r -p | zig build run -Doptimize=ReleaseFast

> execution time: 28.889µs

So I think my machine config for whatever reason isn't representative of whatever OP is using.

Linux-ck 6.8 CONFIG_NO_HZ=y CONFIG_HZ_1000=y

Intel 13900k

zig 0.11

bash 5.2.26

xxd 2024-02-10

Would be good if someone who can repro it compares the two invocation variants with the buffered reader implemented and lists their config.




