
Apple said that the Max's CPU is up to 2.5x faster and the Ultra's CPU is up to 3.8x faster than whatever Intel CPU is in the iMac Pro. So you're getting about 52% more CPU performance from the Ultra's doubling of CPU cores vs the Max, which means it's definitely not scaling linearly and the interconnect is showing some limitations.
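For anyone checking the arithmetic, here is a quick sketch. It takes Apple's two "up to" figures at face value and assumes they are comparable (same workload, same baseline), which Apple did not state. The variable names are just for illustration.

```python
# Apple's quoted "up to" speedups over the iMac Pro's Intel CPU.
max_speedup = 2.5
ultra_speedup = 3.8

# How much faster the Ultra is than the Max, assuming both figures
# come from comparable workloads (an assumption, not something Apple stated).
relative_gain = ultra_speedup / max_speedup

# The Ultra has twice the CPU cores, so perfect scaling would give 2.0x.
core_ratio = 2.0
scaling_efficiency = relative_gain / core_ratio

print(f"Ultra vs Max: {relative_gain:.2f}x ({(relative_gain - 1) * 100:.0f}% more)")
print(f"Scaling efficiency for 2x cores: {scaling_efficiency:.0%}")
```

So doubling the cores buys roughly three quarters of the ideal gain under these assumptions.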


There has to be some overhead as the price of not having to deal with a multi-CPU architecture in software.

It's impressive.


I don't work in the relevant space, but what makes coding for multi-cpu substantially harder than programming for multiple cores? Is it just having to manage separate memory for each CPU?


Essentially you're fighting cache coherency.

When you work with multiple cores, they'll likely share the same L2 or at least L3 cache. With multiple CPUs, you often pay the cost of copying that L3 cache over, or you need an L4, or in the worst case you go back to system RAM.

Each level that you go out further can drastically reduce performance, so you need to try and stay on nearby cores where possible.

Most devs don't account for that, since dual-CPU machines are in the vast minority.


Though maybe we can get some NUMA-like affinity control for the M1 Ultra, so HPC control freaks can finally tune the fuck out of this hardware.


I'm actually curious about memory.

Since they're 2 distinct chips, will a single chip be able to handle 128GB? Not sure how the interconnect works.


Yes, as on all NUMA machines, one CPU can access all memory, both local (to the CPU) and global (through the interconnect). The problem is that there is a significant latency cost when a CPU accesses non-local memory (a limitation of the interconnect). So the HPC people writing their algorithms make sure this happens as little as possible, by ensuring that the data each CPU uses is allocated as locally as possible (e.g. by using the special affinity controls provided by libnuma).

I was just curious if these kinds of optimizations are possible in the M1 Ultra.
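To make the idea concrete, here is a rough sketch of the Linux flavor of this, not anything known to work on the M1 Ultra. It pins the process to one CPU before allocating and touching a buffer, relying on Linux's default policy of placing a page on the node of the CPU that first touches it. It uses Python's `os.sched_setaffinity` (Linux only) rather than libnuma itself, and `allocate_pinned` and `PAGE_SIZE` are hypothetical names made up for this example.

```python
import os

PAGE_SIZE = 4096  # assumed page size, for illustration only


def allocate_pinned(cpu, size):
    """Allocate and touch `size` bytes while pinned to `cpu`.

    Falls back to a plain allocation where the Linux-only affinity
    API is unavailable (e.g. on macOS).
    """
    if not hasattr(os, "sched_setaffinity"):
        return bytearray(size)

    previous = os.sched_getaffinity(0)  # remember the current CPU set
    os.sched_setaffinity(0, {cpu})      # run only on `cpu` from here on
    try:
        buf = bytearray(size)
        # Write one byte per page so each page is actually faulted in
        # while we are running on the chosen CPU.
        for offset in range(0, size, PAGE_SIZE):
            buf[offset] = 1
        return buf
    finally:
        os.sched_setaffinity(0, previous)  # restore the original CPU set
```

This only controls where the thread runs at first touch; real NUMA code would use libnuma or numactl to bind memory explicitly.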


But IS there an interconnect?

The way Apple presented it, it sounded more like the chips talk at a lower layer, much as if it were all built as one physical chip, than like two normal chips with an interconnect fabric between them.

Someone will figure it out with benchmarks or something.


There is an interconnect. They just claim it is faster than competitors'.


Maybe, but performance doesn't usually increase linearly with cpu count anyway. See https://en.wikipedia.org/wiki/Amdahl's_law
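As a rough illustration of Amdahl's law applied to the numbers upthread: the sketch below computes the speedup for a given parallel fraction, then inverts the formula to ask what parallel fraction alone would explain a 1.52x gain from 2x the cores. Treating the Max as the baseline and lumping all losses into a "serial fraction" are assumptions here, and the function names are made up for the example.

```python
def amdahl_speedup(parallel_fraction, n):
    """Speedup from n-fold parallel resources per Amdahl's law:
    S = 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)


def parallel_fraction_for(speedup, n):
    """Invert Amdahl's law: p = (1 - 1/S) / (1 - 1/n)."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


# Parallel fraction that would give 1.52x from doubling the cores.
p = parallel_fraction_for(1.52, 2)
print(f"Implied parallel fraction: {p:.2f}")

# Even a workload that is 90% parallel falls short of 2x on 2x cores.
print(f"90% parallel, 2x cores: {amdahl_speedup(0.9, 2):.2f}x")
```

In other words, a workload that is only about two thirds parallel would show this kind of scaling with no interconnect penalty at all, so the quoted figures alone can't separate the two effects.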



