Yeah - I think that their argument would be that their approach enables you to do that, and that there are significant gains on the table without that last step.
My main critique is that it's one example, and it would be good to see the technique exercised across a number of them so that we can see its strengths and weaknesses.
It stands to reason that if you can co-locate your calls in a single process, you'd gain at least 9x.