- Jitter. Others have mentioned it, but I'll repeat it. These systems run carefully tuned microkernels to try to avoid any spurious interrupts; placing disk storage directly in the compute nodes would cause a lot of problems. Most applications require a careful lock-step progression to keep the calculations relevant, so having one node take even 1% longer on a step wastes tremendous amounts of compute power for the system as a whole (a toy back-of-the-envelope calculation after this list shows how fast that adds up).
- System lifecycle management. As mentioned in the article, supercomputers are only cost-effective to run for ~4 years, but data storage can outlast that. De-coupling storage and compute makes the transitions a bit easier: you can still get at the old filesystem even after the last-generation machine has been scrapped. It also helps with access from related-but-disparate systems - you do need to get the data in and out of the machine to do anything useful with it, and interrupting or degrading compute performance just to serve external file access would be a problem.
- Power, and power stability. Filesystems, especially large distributed systems such as Lustre/GPFS/PVFS2, do NOT handle power loss well. Best current practice for HPC centers is to keep the storage subsystems, file servers, and disk arrays on backup power, while the compute side runs directly from the grid. Embedding disks in the compute nodes would either require UPS'ing the entire compute platform or accepting the risk of filesystem corruption.
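To make the jitter point concrete, here's a minimal sketch (mine, not from the article) of the lock-step effect: in a bulk-synchronous job every node waits at the barrier for the slowest rank, so one node running 1% slow stalls the whole machine. The node count, step time, and slowdown figure below are assumptions purely for illustration.

```python
# Toy illustration: cost of one jittery node in a lock-step (bulk-synchronous) run.
# All numbers are hypothetical.

NODES = 10_000       # assumed machine size
STEP_TIME = 1.0      # nominal seconds of compute per step
STEPS = 1_000        # steps in the run
SLOWDOWN = 1.01      # one node takes 1% longer per step (e.g. OS/interrupt jitter)

# Every step ends at a barrier, so the wall-clock time of a step is set by
# the slowest node, not the average.
step_wall_time = STEP_TIME * SLOWDOWN
ideal_wall_time = STEP_TIME

# Every *other* node sits idle at the barrier for the extra time, every step.
wasted_node_seconds = (step_wall_time - ideal_wall_time) * STEPS * (NODES - 1)

print(f"Extra wall-clock time per step: {step_wall_time - ideal_wall_time:.3f} s")
print(f"Compute wasted across the machine: {wasted_node_seconds:,.0f} node-seconds")
# With these numbers that's ~99,990 node-seconds (more than a full node-day)
# burned idling at barriers, all because a single node ran 1% slow.
```

The exact numbers don't matter; the point is that the waste scales with the whole machine size times the run length, which is why HPC sites go to such lengths to keep anything interrupt-happy (like local disk I/O) off the compute nodes.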
As with pretty much everything in the industry, people are looking at approaches to solve these. There is some inertia, as you speculated, but it's becoming obvious that I/O is the new bottleneck as systems continue to scale up, and you'll likely see some storage start moving closer to the compute system.