I don't think anyone has ever managed to pull off a success story with this kind of advanced expert manipulation.
It's not entirely impossible, but I remain skeptical until I see proof that, first, it works, and second, that it actually has an advantage over "we'll just train another base model from scratch, but 10% larger, with those +5% performance architecture tweaks, and a new modality blender, and more of that good highly curated data in the dataset, and fresher data overall, and it'll be glorious".