FTA: “For design reasons that are a complete mystery to me, the MMX registers are actually sub-registers of the x87 STn registers”
I think the main argument for doing that was that it meant that existing OSes didn’t need changes for the new CPU. Because they already saved the x87 registers on context switch, they automatically saved the MMX registers, and context switches didn’t slow down.
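A toy model makes the context-switch argument concrete. It is a sketch built on simplifying assumptions, not real hardware access. MMn is stored in the low 64 bits (mantissa) of the 80-bit x87 physical register Rn, and an MMX write sets the upper 16 bits to all ones. The tag word and TOP field are not modeled. `legacy_save`/`legacy_restore` are hypothetical stand-ins for an OS that only knows about x87 state. The `xmm` list is a made-up, non-aliased register file included for contrast.

```python
from dataclasses import dataclass, field
from typing import List

MASK64 = (1 << 64) - 1


@dataclass
class X87Reg:
    mantissa: int = 0  # bits 0-63: the aliased MMX register MMn lives here
    sign_exp: int = 0  # bits 64-79: sign bit + 15-bit exponent


@dataclass
class CpuState:
    # Physical x87 registers R0-R7; MM0-MM7 alias their low 64 bits.
    x87: List[X87Reg] = field(default_factory=lambda: [X87Reg() for _ in range(8)])
    # Hypothetical separate (non-aliased) register file, for contrast only.
    xmm: List[int] = field(default_factory=lambda: [0] * 8)


def mmx_write(cpu: CpuState, n: int, value: int) -> None:
    reg = cpu.x87[n]
    reg.mantissa = value & MASK64
    reg.sign_exp = 0xFFFF  # an MMX write sets bits 64-79 to all ones


def mmx_read(cpu: CpuState, n: int) -> int:
    return cpu.x87[n].mantissa


def legacy_save(cpu: CpuState) -> List[X87Reg]:
    """Hypothetical MMX-unaware OS: copies only the x87 registers."""
    return [X87Reg(r.mantissa, r.sign_exp) for r in cpu.x87]


def legacy_restore(cpu: CpuState, saved: List[X87Reg]) -> None:
    cpu.x87 = [X87Reg(r.mantissa, r.sign_exp) for r in saved]


def context_switch_roundtrip(cpu: CpuState) -> None:
    saved = legacy_save(cpu)
    # Another thread runs and clobbers every register.
    for n in range(8):
        mmx_write(cpu, n, 0)
        cpu.xmm[n] = 0
    legacy_restore(cpu, saved)
```

Usage: after `mmx_write(cpu, 3, 0x0123456789ABCDEF)`, `cpu.xmm[3] = 42` and `context_switch_roundtrip(cpu)`, the MMX value survives because it is part of the x87 state the old save routine already copies. The separate `xmm` value is lost. That loss is why a non-aliased register file needs OS support.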
It may also have reduced the amount of state to store, but I think that difference can’t have been very large.
By "existing OSes" you really mean Microsoft Windows; other OSes would have had no problem with the negligible update required to save and restore more registers.
For many decades, Intel has added awful workarounds to its CPUs for the sole reason that Microsoft was too lazy to update its OS. Newer, better CPUs had to be managed by the OS exactly like the older, worse ones, even when that meant moving into the CPU functions that the OS can perform much more efficiently and that therefore do not belong in the CPU.
So the MMX registers were aliased onto the FPU registers because that way the existing MS Windows saved them automatically on thread switches. Eventually the limitations of MMX became too great, and under competitive pressure from AMD (3DNow!) and Motorola (AltiVec), Intel and Microsoft were forced to move to SSE in 1999. SSE relied on a pair of new save and restore instructions (FXSAVE/FXRSTOR) that the OS had to adopt, and in exchange it allowed more and wider registers.