Need to compare this with custom silicon like what Apple will be shipping.
They already have the Neural Engine, which can run Stable Diffusion, but eventually you could imagine casting a specific model instance into an ASIC (today that might be GPT-3.5 or GPT-4).
If most devices are replaced within a year or two then you get a pretty good cadence for updating your Siri model (and even more incentive for users to upgrade hardware).
You don't, because of the scaling law they say they've identified. If optical energy per MAC operation scales as 1/d, we know two things: 1) no possible electronic architecture can catch it, and 2) bigger models give optical networks a bigger energy advantage.
It's possible to have a temporary lead because of constant factors, but as long as an electronic circuit has to expend a unit of energy per MAC, you'll always be able to specify a model big enough that an optical network will beat it.
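A back-of-envelope sketch of that argument, with made-up constants: assume electronics pay a fixed energy e per MAC and the optical system pays c/d per MAC. Here d is taken to be the vector/layer dimension (my assumption about what d means), and c and e are hypothetical numbers in arbitrary units. The crossover is just d > c/e, and past it the advantage grows linearly with d.

```python
import math


def optical_energy_per_mac(d: int, c_optical: float) -> float:
    # Assumed scaling law: optical energy per MAC falls as 1/d.
    return c_optical / d


def energy_advantage(d: int, e_electronic: float, c_optical: float) -> float:
    # Electronic-to-optical energy ratio per MAC; equals e * d / c,
    # so it grows linearly with d under the assumed 1/d law.
    return e_electronic / optical_energy_per_mac(d, c_optical)


def crossover_dimension(e_electronic: float, c_optical: float) -> int:
    # Smallest integer d with c/d < e, i.e. d > c/e.
    # A bigger constant-factor lead (larger c) moves this point but never removes it.
    return math.floor(c_optical / e_electronic) + 1


if __name__ == "__main__":
    # Hypothetical constants in arbitrary energy units.
    e, c = 1.0, 1000.0
    print("crossover d:", crossover_dimension(e, c))
    for d in (100, 1_000, 10_000, 100_000):
        print(d, energy_advantage(d, e, c))
```

Changing c (the optical constant factor) only shifts where the crossover lands; it doesn't change the fact that one exists, which is the "temporary lead" point.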
1) this is a research device and a theoretical scaling law; it’s not been proven.
> We conclude that with well-engineered, large-scale optical hardware, it may be possible to achieve a 100× energy-efficiency advantage
Emphasis on may.
2) in the real world, constant factors matter (as you allude to). For example, if an ASIC gets a 1000x speedup (optimistic; we saw this for BTC), it might be the better choice for this generation but start to lose from the next generation on. If an ASIC only gets 100x or less, it's not favorable even this gen.
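To make the "win this gen, lose later" point concrete, here's a toy model, again with hypothetical numbers: the ASIC keeps a fixed advantage factor over baseline electronics, while the optical system starts at some factor (e.g. the paper's "may be possible" 100x) and, under the claimed 1/d law, multiplies its advantage by growth_per_gen each time models get wider. growth_per_gen is purely my assumption, and how many generations the ASIC stays ahead depends heavily on it.

```python
from typing import Optional


def first_generation_optical_wins(
    asic_factor: float,
    optical_factor: float,
    growth_per_gen: float,
    max_gens: int = 50,
) -> Optional[int]:
    """Toy model: index of the first generation where optical beats the ASIC.

    asic_factor: fixed efficiency gain of the ASIC over baseline electronics.
    optical_factor: optical gain over the same baseline at generation 0.
    growth_per_gen: hypothetical factor by which d grows per generation; under
        the assumed 1/d law the optical gain grows by the same factor.
    Returns None if optical never pulls strictly ahead within max_gens.
    """
    gain = optical_factor
    for gen in range(max_gens + 1):
        if gain > asic_factor:  # strict: a tie is not counted as an optical win
            return gen
        gain *= growth_per_gen
    return None


if __name__ == "__main__":
    for asic in (50, 100, 1000):
        print(asic, first_generation_optical_wins(asic, 100, 4))
```

For example, first_generation_optical_wins(1000, 100, 4) returns 2: the ASIC stays ahead at generations 0 and 1 (optical at 100x and 400x) and loses at generation 2 (1600x). With no growth at all (growth_per_gen = 1), optical never catches a 1000x ASIC.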
So sure, this tech might win in the long term, but I wasn’t making any categorical claims, just noting that there are multiple horses we need to track.
It would be quite foolish to dismiss custom silicon solutions based on this paper.
You can run it on either now (for example, MochiDiffusion lets you pick https://github.com/godly-devotion/MochiDiffusion#compute-uni...). Anecdotally, the GPU seems to be faster on an M1 Max or bigger, while the ANE is a touch faster on anything smaller and more power efficient in general.