Apple said the Max's CPU is up to 2.5x faster and the Ultra's CPU is up to 3.8x faster than whatever Intel CPU is in the iMac Pro. That works out to roughly 52% more CPU performance (3.8 / 2.5 ≈ 1.52) for the Ultra's doubling of CPU cores over the Max, so it's definitely hitting some scaling limitations from the interconnect rather than scaling linearly.
I don't work in the relevant space, but what makes coding for multi-CPU systems substantially harder than programming for multiple cores? Is it just having to manage separate memory for each CPU?
When you work with multiple cores, they'll likely share the same L2 or at least L3 cache. With multiple CPUs, you often pay the cost of copying that L3 cache over, or you need an L4, or in the worst case you go back to system RAM.
Each level further out can drastically reduce performance, so you need to try to stay on nearby cores where possible.
Most devs don't account for that, since dual-CPU machines are in the vast minority.
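As a rough illustration of "staying on nearby cores": on Linux you can pin a worker thread to a specific core so the scheduler doesn't migrate it and throw away its warm caches. This is just a minimal sketch; the choice of core 2 is an arbitrary assumption, and which cores actually share a cache depends on the machine.

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    /* Toy worker: in a real program this would be a hot loop over
       data that benefits from staying in one core's L1/L2. */
    static void *worker(void *arg) {
        (void)arg;
        return NULL;
    }

    int main(void) {
        pthread_t t;
        cpu_set_t set;

        pthread_create(&t, NULL, worker, NULL);

        /* Pin the thread to core 2 (assumed here to be "nearby" to the
           data it works on) so it keeps hitting the same caches. */
        CPU_ZERO(&set);
        CPU_SET(2, &set);
        pthread_setaffinity_np(t, sizeof(set), &set);

        pthread_join(t, NULL);
        return 0;
    }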
Yes, as with all NUMA machines, one CPU can access all memory, both local (to the CPU) and remote (through the interconnect). The problem is that there is a significant latency cost when a CPU accesses non-local memory (limitations of the interconnect). So the HPC people writing their algorithms make sure this happens as little as possible, by ensuring that the data each CPU uses is allocated as locally as possible (e.g. by using the affinity controls provided by libnuma).
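For example, a minimal libnuma sketch (Linux only; node 0 and the buffer size are arbitrary assumptions) allocates the working set on one node and runs the thread on that same node, so accesses stay local instead of crossing the interconnect:

    #include <numa.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* libnuma only works where the kernel exposes NUMA support. */
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA not available on this system\n");
            return 1;
        }

        size_t len = 64 * 1024 * 1024;

        /* Allocate the buffer from node 0's local memory ... */
        double *buf = numa_alloc_onnode(len, 0);
        if (!buf) return 1;

        /* ... and keep the current thread on node 0 as well, so every
           access is local rather than going over the interconnect. */
        numa_run_on_node(0);

        for (size_t i = 0; i < len / sizeof(double); i++)
            buf[i] = (double)i;

        numa_free(buf, len);
        return 0;
    }

(Compile with -lnuma.)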
I was just curious if these kinds of optimizations are possible in the M1 Ultra.
The way Apple presented it, it sounded more like the chips communicate at a lower level, as if it were all built as one physical chip, rather than two normal chips joined by an interconnect fabric.
Someone will figure it out with benchmarks or something.