If you have a truly unlimited budget, unconditional love for Intel and x86, and don't care about ludicrous power draw at all, Intel has a silly Sapphire Rapids Xeon Max part with 64GiB of 1TB/s HBM.
It goes really fast (same order of magnitude of bandwidth as A100s) if your model fits entirely in that HBM.
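For a rough sense of what "fits" and "fast" mean here, a back-of-envelope sketch in Python. It assumes the workload is memory-bandwidth-bound (e.g. token-by-token inference where every weight is read once per step) and ignores activations, KV cache, and any real-world efficiency loss, so treat the result as an upper bound, not a benchmark. The function names and the 13B/70B example sizes are made up for illustration; only the 64GiB and 1TB/s figures come from the part described above.

```python
GIB = 1024**3

def fits_in_hbm(n_params, bytes_per_param, hbm_bytes=64 * GIB):
    """True if the raw weights alone fit in the HBM (nothing else counted)."""
    return n_params * bytes_per_param <= hbm_bytes

def max_steps_per_second(n_params, bytes_per_param, bandwidth_bytes_per_s=1e12):
    """Upper bound on steps/s if each step streams every weight once from HBM."""
    return bandwidth_bytes_per_s / (n_params * bytes_per_param)

# Hypothetical 13B-parameter model at 2 bytes per weight: 26 GB, fits.
print(fits_in_hbm(13e9, 2))
print(max_steps_per_second(13e9, 2))

# Hypothetical 70B-parameter model at 2 bytes per weight: 140 GB, does not fit.
print(fits_in_hbm(70e9, 2))
```

Once the weights spill out of the 64GiB, the bound is set by whatever slower memory holds the rest, which is why the "fits entirely" condition matters so much.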