
We'll see.

There's now another game in town, exemplified by Apple's switch to ARM chips for its Macs. I'll keep an eye on the AVX2 vs. AVX-512 performance gap on Zen4+, but my working hypothesis is that my SIMD-handcoding time will be better spent on improving ARM support than upgrading AVX2 code to AVX-512 for the foreseeable future.



How about porting to Highway? That gets you AVX-512, NEON and SVE(2) from a single rewrite :)


I'd probably use Highway in a new project; thanks for your work on it! In my main existing project, though, Highway-like code already exists as a side effect of supporting 16-byte vectors and AVX2 simultaneously, and I'd also have to give up the buildable-as-C99 property which has occasionally simplified e.g. FFI development.


:) C99 for FFI makes sense. It's pretty common to have a C-like function as the entry point for a SIMD kernel. That means it's feasible to build only the implementation as C++, right?
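To make the "C-like entry point, C++ implementation" idea concrete, here is a minimal sketch. The names `dot_f32` and `DotImpl` and the file name are hypothetical, and the scalar loop is only a stand-in for a real SIMD kernel (hand-written intrinsics or Highway); the point is the linkage boundary, not the math.

```cpp
// dot_kernel.cc (hypothetical file name): built as C++, callable from C99.
//
// A C caller would only see this declaration, e.g. in a plain C header:
//   float dot_f32(const float* a, const float* b, size_t n);

#include <cstddef>

namespace {

// C++-only implementation detail. A real kernel would use SIMD intrinsics or
// a library such as Highway here; this scalar template is just a stand-in.
template <typename T>
T DotImpl(const T* a, const T* b, std::size_t n) {
  T sum = T(0);
  for (std::size_t i = 0; i < n; ++i) {
    sum += a[i] * b[i];
  }
  return sum;
}

}  // namespace

// extern "C" disables C++ name mangling, so the symbol links like a C function.
extern "C" float dot_f32(const float* a, const float* b, std::size_t n) {
  return DotImpl<float>(a, b, n);
}
```

With this split, only the one translation unit needs a C++ compiler; the rest of the project, and any FFI bindings, can keep treating `dot_f32` as an ordinary C symbol.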



I'm a huge simp for M1 too (and there's SVE there too). Yeah, for client stuff, if you can get people to just buy a MacBook, that's the best answer right now, as long as it handles their daily tasks. Places need to start thinking about building ARM images anyway, for Ampere and Graviton and other cost-effective server environments if nothing else. If you are that joined at the hip to x86, it's time to look at solving this problem.

Apple's p-cores get the limelight, but the e-cores are simply ridiculous for their size: they are 0.69 mm^2 vs. 1.7 mm^2 for Gracemont, excluding cache. Gracemont is Intel 7, so it's a node behind, but real-world scaling is about 1.5-1.6x between 5nm and 7nm, so that works out to about 1.1 mm^2 for Blizzard if it were on 7nm, for equal or better performance than Gracemont at much lower power.
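For anyone who wants to check the arithmetic, here is a small sketch using only the figures quoted above (0.69 mm^2, 1.7 mm^2, and the 1.5-1.6x scaling range). The function and constant names are made up for illustration, and the scaling factor is the commenter's estimate, not a measured value.

```cpp
// Area a core would occupy on a less dense node, given a density ratio.
constexpr double ScaleArea(double area_mm2, double density_ratio) {
  return area_mm2 * density_ratio;
}

// Figures quoted in the comment above (core area excluding cache).
constexpr double kAppleECoreMm2 = 0.69;
constexpr double kGracemontMm2 = 1.7;

// Low and high ends of the quoted 1.5-1.6x scaling range.
constexpr double kScaledLowMm2 = ScaleArea(kAppleECoreMm2, 1.5);   // about 1.035
constexpr double kScaledHighMm2 = ScaleArea(kAppleECoreMm2, 1.6);  // about 1.10

// How much larger Gracemont is than the pessimistic (1.6x) estimate.
constexpr double kGracemontRatio = kGracemontMm2 / kScaledHighMm2;  // about 1.54
```

So even with the less favorable scaling factor, the Apple e-core lands around 1.1 mm^2, and Gracemont comes out roughly 1.5x larger.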

https://www.reddit.com/r/hardware/comments/qlcptr/m1_pro_10c...

Sierra Forest (a bunch of nextmont cores on a server die, like Denverton) looks super interesting, and I'd absolutely love to see an Apple equivalent: give me 256 Blizzard cores on a chiplet and 512 or 1024 on a package. Or even just an M1 Ultra Xserve would be fantastic (although the large GPU would go unutilized). But from what I've seen so far, I don't think Apple wants to get into that market.

(Tangent, but everyone says "Gracemont is optimized for size, not efficiency!" and I don't know what that means in a practical sense. High-density cell libraries are both smaller and more efficient, but clock lower. High-performance libraries are both bigger and less efficient, but clock higher. Those properties go together: small comes with efficient, not instead of it. And yes, everyone uses a mix of cell types, with high-performance cells on the timing hot path... but "Gracemont is optimized for size, not efficiency" has become this meme that everyone chants, and I don't know what it is actually supposed to mean. If anyone knows, please do tell.)

(Also, as you can see from the size comparison, despite the "it's optimized for size" meme, Gracemont still isn't really small, not like Blizzard is small. They're using ~50% more transistors to get to the same place, and it's almost half the size of a full Zen 3 core with SMT and all the bells and whistles... I really think e-cores are where the music stops for the x86 party. I think i-cache and decoders are fine on the big cores, but as you scale downwards they take up a larger and larger portion of the core area that remains. It's Amdahl's Law in action with area: if i-cache and decoding don't scale, then shrinking the core increases the fraction devoted to them. And as you scale down, you pay more of a penalty for x86-ness in other places, like having to run the decoder. And you have to keep the i-cache running at all times, even when the chip is idling; otherwise you are decoding a lot more. It's just a lot of power overhead for the things you use an e-core for.)
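The "Amdahl's Law with area" point can be sketched numerically. The helper below and the example areas are purely hypothetical (nobody in this thread gave a breakdown of i-cache/decoder area); it only shows how a fixed-size block's share grows as everything else shrinks.

```cpp
// Fraction of core area taken by blocks that do not shrink (hypothetically,
// i-cache and decode) when the rest of the core is shrunk by `shrink_factor`.
constexpr double FixedAreaFraction(double fixed_mm2, double scalable_mm2,
                                   double shrink_factor) {
  return fixed_mm2 / (fixed_mm2 + scalable_mm2 / shrink_factor);
}

// Made-up example: a 3.0 mm^2 big core with 0.3 mm^2 of non-scaling logic.
constexpr double kFixedMm2 = 0.3;     // hypothetical i-cache + decode area
constexpr double kScalableMm2 = 2.7;  // hypothetical rest of the core

constexpr double kBigCoreShare = FixedAreaFraction(kFixedMm2, kScalableMm2, 1.0);
constexpr double kMidCoreShare = FixedAreaFraction(kFixedMm2, kScalableMm2, 3.0);
constexpr double kSmallCoreShare = FixedAreaFraction(kFixedMm2, kScalableMm2, 9.0);
```

With these invented inputs the fixed blocks go from about 10% of the big core to about 25% and then about 50% as the scalable part is cut by 3x and 9x, which is the shape of the argument above, not a claim about any real die.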



