Transformers run well on GPUs and other hardware accelerators, but this benchmark doesn't allow GPUs.
That makes it more of a "can I use unsuitable hardware to get the job done fast and accurately enough" challenge, rather than a pure math puzzle of how to encode data with fewer bytes.
I suspect that's why there is only one Transformer entry, which to me raises the question of whether the rules should be updated to allow GPUs now that they are fairly commonplace.
Not if part of your definition of "best" includes codebook/model size. The best text generator will use a massive model, but that won't help compression beyond a certain point, since the cost of lookup indexes grows with the size of the model. "Monkey #384714...872's typewriter output" doesn't help when the monkey number is longer than the input.
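To make the monkey argument concrete, here's a back-of-the-envelope sketch (my own illustration, not from the thread): imagine a degenerate "model" that memorizes every possible n-byte string, so decompression is just "emit string number k". The index k then needs as many bits as the original data, so nothing is saved.

```python
import math

def index_bits(n_bytes: int) -> float:
    """Bits needed to pick one string out of all 256**n_bytes candidates.

    This is the length of the 'monkey number': an index into the set of
    every possible output of the given length.
    """
    # log2 of the number of distinct n-byte strings
    return n_bytes * math.log2(256)

for n in (1, 10, 1_000_000):
    needed = index_bits(n)
    # The index is exactly as large as the data it supposedly compresses:
    # n bytes of input always require n * 8 index bits.
    assert needed == n * 8
    print(f"{n} input bytes -> {needed:.0f} index bits")
```

This is just the pigeonhole argument in code: past a certain point, growing the codebook only shifts bytes from the compressed stream into the index used to address the codebook.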