
I'm guessing that the "ML accelerator" in the CPU cores means one of ARM's matrix-multiplication extensions, such as SME. ARMv8.4-A adds the dot-product instructions (SDOT/UDOT); v8.6-A adds more, including BF16 support.

https://community.arm.com/arm-community-blogs/b/architecture...
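
To make that concrete (my own sketch, not from the linked post): the dot-product instructions perform small int8 dot products in a single instruction, which is exactly the inner loop of quantized matrix multiplication. A minimal C kernel using the vdotq_s32 intrinsic, assuming a compiler targeting armv8.2-a+dotprod or later:

    #include <arm_neon.h>
    #include <stdint.h>

    /* Dot product of two int8 vectors; n assumed to be a multiple of 16. */
    int32_t dot_i8(const int8_t *a, const int8_t *b, int n) {
        int32x4_t acc = vdupq_n_s32(0);
        for (int i = 0; i < n; i += 16) {
            int8x16_t va = vld1q_s8(a + i);
            int8x16_t vb = vld1q_s8(b + i);
            /* SDOT: four 4-way int8 dot products accumulated into int32 lanes. */
            acc = vdotq_s32(acc, va, vb);
        }
        return vaddvq_s32(acc); /* horizontal sum of the four accumulators */
    }

Compile with something like clang -O2 -march=armv8.4-a; each 16 bytes of input maps onto a single SDOT instruction.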



Apple has the NPU (also called the Apple Neural Engine), which is dedicated hardware for running inference. It can't be used for LLMs at the moment, though; maybe the M4 will be different. They also have a vector/matrix coprocessor attached to the performance cluster of the CPU; the instruction set for it is called AMX. I believe that one can be leveraged for faster LLM inference.

https://github.com/corsix/amx
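
For what it's worth (my own example, and an assumption about how software actually reaches AMX): the coprocessor isn't exposed directly, so the sanctioned route is Apple's Accelerate framework, whose BLAS routines are reported to run on the AMX units. A minimal single-precision GEMM through the standard cblas_sgemm call:

    #include <Accelerate/Accelerate.h>
    #include <stdio.h>

    int main(void) {
        /* C = A * B with A 2x3, B 3x2, C 2x2, all row-major. */
        float A[6] = {1, 2, 3, 4, 5, 6};
        float B[6] = {7, 8, 9, 10, 11, 12};
        float C[4] = {0};

        cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    2, 2, 3,      /* M, N, K */
                    1.0f, A, 3,   /* alpha, A, lda */
                    B, 2,         /* B, ldb */
                    0.0f, C, 2);  /* beta, C, ldc */

        printf("%.0f %.0f\n%.0f %.0f\n", C[0], C[1], C[2], C[3]);
        return 0;
    }

Build on macOS with clang gemm.c -framework Accelerate. As I understand it, this is also roughly the path llama.cpp's Accelerate backend takes for the big matmuls.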



