This post doesn't have enough information to count as a good HN submission. Because it doesn't have enough information, the comments are almost all generic. Generic comments don't make for good threads—there needs to be specific information for people to sink their teeth into.
It would be better to write a post about how you're making faster chips for LLMs, one that everybody can learn something from.
I love the presentation though. This is actually the best startup homepage I've ever seen. Hand-typed HTML, 20 lines of CSS, no JavaScript, not even any images. Just some powerful bullet points that actually contain relevant information about the company.
A lack of information about the technology is understandable as it presumably doesn't exist yet. But I see that it's Google TPU and software people striking out on their own with the backing of a bunch of VCs and prominent researchers, and it seems pretty clear where it's going. Likely an evolution of or progression from what they were working on at Google, positioned for easy acquisition by any number of competitors.
I think George Hotz is right that ML ASIC companies need to invest in software far more than they typically do. I guess the plan here is to sidestep the need for generic software by focusing solely on transformers for text, maybe even only a couple of specific architectures. I don't really think that's a good strategy but it seems to be what they're describing.
I'm not in this field like others here, so I'd really appreciate a comment from some experienced HN folks.
I wonder what's the end goal.
The ML/AI world seems to be changing fast, so one model approach might become "old tech" if something better comes along.
Is the current state of the models stable enough that the LLM glory will still be there in 1-2 years? Or will some new approach suddenly make this "tech" obsolete?
Historically speaking, there is a good chance the transformer approach will be replaced by a new approach, or a substantially altered transformer approach, in the next five years. Efficiency will increase on both the software and the hardware front simultaneously; there is a lot to be optimized with the same parameter count.
What looks state of the art now will probably look to people in 20 years the way the 1886 Benz Patent-Motorwagen looks to us in terms of efficiency. Right now it's a bit of a brute-force approach in general.
Perhaps some sort of memristor crossbar array that could process large array calculations efficiently? Something like that could be pretty universal, until the tech moves to spiking neural networks. And even then the knowledge you gain from advancing on that path would likely transfer over.
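For context on why crossbars are interesting for this: a memristor crossbar computes a matrix-vector product in effectively one analog step. Input voltages drive the rows, each crosspoint's conductance stores a matrix weight, and the currents summed on each column wire (Kirchhoff's current law, with Ohm's law per crosspoint) are the output vector. A toy digital sketch of that idea, with made-up values:

```python
import numpy as np

# Weight matrix stored as memristor conductances G (in siemens),
# one conductance per row/column crosspoint. Values are hypothetical.
G = np.array([[1.0, 0.5],
              [0.2, 2.0],
              [0.0, 1.5]])

# Input vector applied as voltages V on the three row wires.
V = np.array([0.3, 1.0, 0.7])

# Each crosspoint contributes current V_i * G_ij (Ohm's law); each
# column wire sums its currents (Kirchhoff), so the column current
# vector I is exactly the matrix-vector product -- computed in one
# parallel analog step rather than O(rows * cols) multiply-adds.
I = V @ G
print(I)  # -> [0.5 3.2]
```

The appeal is that the multiply-accumulate happens in the physics of the array itself, which is why it looks attractive for the large matmuls that dominate transformer inference.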
It's interesting to contrast this with tinycorp, geohot's company, and his claim that basically you would have to spend a lot to make a better chip, so the more optimal, less capital intensive play, is to write better drivers for AMD cards.
AMD drivers are a higher priority, but he also made tinygrad (https://github.com/tinygrad/tinygrad), which is basically designed to minimize the software complexity and the set of operations that a company like MatX needs to target. MatX is actually kind of a perfect complement to tinygrad.
What is the architecture's shape? A chain of cheap SRAM heavy chips as proposed in the Microsoft paper?
I feel like AMD's 7900 XTX strategy would be good too: a bunch of small, cheap (LPDDRX?) memory controller dies to form a massive bus for a central compute tile.
I don't want to be too cynical about the state of hardware for ML, but I don't see where this is going. Nvidia does not lack for competitors trying (and sometimes nominally succeeding) to build faster/cheaper/more efficient hardware. Yet still Nvidia is overwhelmingly the vendor of choice because the software story works. So long as Pytorch only practically works with Nvidia GPUs, everything else is little more than a rounding error.
I don't see MatX ending up any different than the legion of startups that have come already - either they get acquired by a bigger player, or they fade into obscurity.
There are more and better projects that can compile an existing PyTorch codebase into a more optimized format for a range of devices. Triton (which is part of PyTorch), TVM, and the MLIR-based efforts (like Torch-MLIR or IREE) are big ones, but there are smaller fish like GGML and tinygrad, and more narrowly focused projects like Meta's AITemplate (which works on AMD datacenter GPUs).
Hardware is in a strange place now... It feels like everyone but Cerebras and AMD/Intel was squeezed out, but with all the money pouring in, I think this is temporary.
In the acqui-hires I've been involved in, investors did well; as owners, they're compensated as part of the acquisition too. You see these sorts of specialized team acquisitions all the time in big tech: a foundational team develops some key technology but lacks the critical mass, capital, brand, or vertical ability to bring it to market. They're acquired primarily for the team assembled and its expertise, but the amounts paid can be extraordinary depending on how advanced their technology is and how crucial it is.