It's a null question. Training itself is neither publication nor distribution, so copyright can't be relevant at that point. "Fair use" just isn't a concept applicable to training.
Training stores a variation of the source material, which is arguably distribution. And selling the result, or selling access to it, certainly is. So fair use applies, and the hope is that a court finds the process transformative enough to count as fair use. Given that original material can be spat out, my money is on a court thinking this is about as transformative as a compression algorithm.
Storing copyrighted content can itself sometimes be illegal, like ripping a Blu-ray. What if those frames are now stored on their servers and go into the training dataset?
The illegal bit of ripping a Blu-ray is circumventing the copy protection, not the storage. At least, that's how I've always understood the effect of the DMCA on the situation.