Yes, as on all NUMA machines, any CPU can access all memory, both local (attached to that CPU) and remote (reached through the interconnect). The problem is that there is a significant latency cost when a CPU accesses non-local memory, due to the limitations of the interconnect. So HPC people writing their algorithms make sure this happens as rarely as possible, by ensuring that the data each CPU uses is allocated as locally as possible (e.g., via the affinity controls provided by libnuma).
I was just curious if these kinds of optimizations are possible in the M1 Ultra.
The way Apple presented it made it sound more like the two dies talk at a lower layer, as if the whole thing were built as one physical chip, rather than two conventional chips joined by an interconnect fabric.
Someone will figure it out with benchmarks or something.