They were trained in fp16, and researchers tend to release whatever format they trained. It’s hard enough to do a large release that it’s best not to try to have too many goals, for the same reason most software projects try not to do too much lest their schedule slip.
Still, I’m a little sad they didn’t release the optimizer weights. It would’ve given us so much valuable info about the dataset, among other benefits.
Still, I’m a little sad they didn’t release the optimizer weights. It would’ve given us so much valuable info about the dataset, among other benefits.