The problem is that storage costs are a step function. Once you cross a certain threshold (the threshold depends on your performance requirements a.k.a. IOPS), storage gets Really Freaking Expensive (tm). The steps start to get really, really steep as your capacity increases, and that's just for the primary copy of your data.
Once you get into truly large amounts of data, other things start to break (RAID 5, RAID 6, tape backup, disk backup, synchronization, the ability to replace storage systems without massive outages). The good news is that they're almost all solved problems, but you're usually stuck buying overpriced crap from EMC, Hitachi, NetApp, 3PAR and IBM (storage is a protection racket). All of this combines to explain why a good storage admin pulls down six figures a year.
I may be a bit myopic, but I see a world coming where technology startups trade capital costs for operating costs. S3 is pricey if you're dealing with small quantities of data, but once the step increase in your per-GB storage costs goes over 30%, you might want to reconsider. The steps only get bigger.
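To make the step-function point concrete, here's a rough sketch in Python. Every number in it is made up: the tier boundaries, the per-GB costs, and the flat hosted rate are all hypothetical, and the names (`HYPOTHETICAL_TIERS`, `in_house_cost_per_gb`, `cheaper_option`) exist only for illustration. It assumes in-house cost per GB jumps each time you outgrow a class of storage system, while a hosted service charges one flat per-GB rate over the same period.

```python
# Hypothetical numbers throughout; only the shape of the curve matters.
# Each tier: (largest capacity in GB this class of system handles,
#             all-in cost per GB over some fixed period).
HYPOTHETICAL_TIERS = [
    (10_000, 1.00),
    (100_000, 1.40),
    (1_000_000, 2.00),
]

# Hypothetical flat per-GB price for a hosted service over the same period.
HYPOTHETICAL_HOSTED_RATE = 1.50


def in_house_cost_per_gb(capacity_gb):
    """Return the per-GB cost of the first tier large enough for capacity_gb."""
    for max_gb, cost_per_gb in HYPOTHETICAL_TIERS:
        if capacity_gb <= max_gb:
            return cost_per_gb
    raise ValueError("capacity exceeds the largest hypothetical tier")


def step_increases():
    """Fractional jump in per-GB cost between each pair of adjacent tiers."""
    costs = [cost for _, cost in HYPOTHETICAL_TIERS]
    return [(after - before) / before for before, after in zip(costs, costs[1:])]


def cheaper_option(capacity_gb):
    """Compare the stepped in-house cost against the flat hosted rate."""
    if in_house_cost_per_gb(capacity_gb) <= HYPOTHETICAL_HOSTED_RATE:
        return "in-house"
    return "hosted"


if __name__ == "__main__":
    for gb in (5_000, 50_000, 500_000):
        print(gb, in_house_cost_per_gb(gb), cheaper_option(gb))
```

With these invented tiers, each step is a jump of more than 30% in per-GB cost, and the flat rate wins once you land in the top tier. Real vendor quotes and real S3 pricing will put the crossover somewhere else entirely.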
Something about your argument is not making sense. You say that as storage needs increase, the costs go up dramatically (that there are significant DISeconomies of scale).
Yet Amazon acts as an aggregator of all those ever-increasing storage needs, operating at an even higher scale, and still manages to turn a profit on the service. Amazon almost certainly isn't building its own hardware.
It seems you're arguing in circles somewhere there...
Sorry, I forgot to mention one of my assumptions...
There are tremendous diseconomies of scale if you're buying the storage through traditional storage vendors. When your storage needs get really big, you'll realize that the vendor stuff is 1) a waste of money, and 2) not meeting your needs (you shape your needs to fit the available products, not vice versa). At that point, it makes enough sense to roll your own storage solution (write your own S3), tailored to your very specific needs.