I'll add that latency also doesn't matter that much. You are doing batched data loading for batch n+1 on CPU when GPUs are churning batch n-1 and copying batch n from host memory at the same time.
So as long as your "load next batch" doesn't run for like >1s it would be fine. But one single "load next batch" on one worker means thousands (if not more) random read.
I'll add that latency also doesn't matter that much. You are doing batched data loading for batch n+1 on CPU when GPUs are churning batch n-1 and copying batch n from host memory at the same time.
So as long as your "load next batch" doesn't run for like >1s it would be fine. But one single "load next batch" on one worker means thousands (if not more) random read.