This post doesn't have enough information to count as a good HN submission. Because it doesn't have enough information, the comments are almost all generic. Generic comments don't make for good threads—there needs to be specific information for people to sink their teeth into.
It would be better to write a post about how you're making faster chips for LLMs, one that everybody can learn something from.
I love the presentation though. This is actually the best startup homepage I've ever seen. Hand-typed HTML, 20 lines of CSS, no JavaScript, not even any images. Just some powerful bullet points that actually contain relevant information about the company.
A lack of information about the technology is understandable as it presumably doesn't exist yet. But I see that it's Google TPU and software people striking out on their own with the backing of a bunch of VCs and prominent researchers, and it seems pretty clear where it's going. Likely an evolution of or progression from what they were working on at Google, positioned for easy acquisition by any number of competitors.
I think George Hotz is right that ML ASIC companies need to invest in software far more than they typically do. I guess the plan here is to sidestep the need for generic software by focusing solely on transformers for text, maybe even only a couple of specific architectures. I don't really think that's a good strategy but it seems to be what they're describing.
I'm not in this field like others here, so I'd really appreciate a comment from some experienced HN folks.
I wonder what's the end goal.
The ML/AI world seems to be changing fast, so one model approach might become "old tech" if something better comes along.
Is the current state of the models stable enough that the LLM glory will still be there in 1-2 years? Or will some new approach suddenly make this "tech" obsolete?
Historically speaking, there is a good chance the transformer approach will be replaced by a new approach, or a substantially altered transformer approach, in the next five years. Efficiency will increase on both the software and the hardware front simultaneously; there is a lot to be optimized with the same parameter count.
What looks state of the art now will probably look to people in 20 years the way the 1886 Benz Patent-Motorwagen looks to us in terms of efficiency. Right now it's a bit of a brute-force approach in general.
Perhaps some sort of memristor crossbar array that could process large array calculations efficiently? Something like that could be pretty universal, until the tech moves to spiking neural networks. And even then the knowledge you gain from advancing on that path would likely transfer over.
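For context on why crossbars are interesting for this: a memristor crossbar computes a matrix-vector product in effectively one analog step. Input voltages drive the rows, each crosspoint's conductance stores a matrix weight, and the currents summed on each column wire (Kirchhoff's current law, with Ohm's law per crosspoint) are the output vector. A toy digital sketch of that idea, with made-up values:

```python
import numpy as np

# Weight matrix stored as memristor conductances G (in siemens),
# one conductance per row/column crosspoint. Values are hypothetical.
G = np.array([[1.0, 0.5],
              [0.2, 2.0],
              [0.0, 1.5]])

# Input vector applied as voltages V on the three row wires.
V = np.array([0.3, 1.0, 0.7])

# Each crosspoint contributes current V_i * G_ij (Ohm's law); each
# column wire sums its currents (Kirchhoff), so the column current
# vector I is exactly the matrix-vector product -- computed in one
# parallel analog step rather than O(rows * cols) multiply-adds.
I = V @ G
print(I)  # -> [0.5 3.2]
```

The appeal is that the multiply-accumulate happens in the physics of the array itself, which is why it looks attractive for the large matmuls that dominate transformer inference.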
It's interesting to contrast this with tinycorp, geohot's company, and his claim that basically you would have to spend a lot to make a better chip, so the more optimal, less capital intensive play, is to write better drivers for AMD cards.
AMD drivers are a higher priority, but he also made tinygrad (https://github.com/tinygrad/tinygrad), which is basically designed to minimize the software complexity and the set of operations that a company like MatX needs to target. MatX is actually kind of a perfect complement to tinygrad.
What is the architecture's shape? A chain of cheap SRAM heavy chips as proposed in the Microsoft paper?
I feel like AMD's 7900 XTX strategy would be good too: a bunch of small, cheap (LPDDRX?) memory controller dies to form a massive bus for a central compute tile.
I don't want to be too cynical about the state of hardware for ML, but I don't see where this is going. Nvidia does not lack for competitors trying (and sometimes nominally succeeding) to build faster/cheaper/more efficient hardware. Yet still Nvidia is overwhelmingly the vendor of choice because the software story works. So long as Pytorch only practically works with Nvidia GPUs, everything else is little more than a rounding error.
I don't see MatX ending up any different than the legion of startups that have come already - either they get acquired by a bigger player, or they fade into obscurity.
There are more and better projects that can compile an existing PyTorch codebase into a more optimized format for a range of devices. Triton (which is part of PyTorch), TVM, and the MLIR-based efforts (like Torch-MLIR or IREE) are big ones, but there are smaller fish like GGML and tinygrad, and more narrowly focused projects like Meta's AITemplate (which works on AMD datacenter GPUs).
Hardware is in a strange place now... It feels like everyone but Cerebras and AMD/Intel was squeezed out, but with all the money pouring in, I think this is temporary.
In the acqui-hires I've been involved in, investors did well; as owners, they're compensated as part of the acquisition too. You see these sorts of specialized team acquisitions all the time in big tech: a foundational team develops some key technology but lacks the critical mass, capital, brand, or vertical ability to bring it to market. They're acquired primarily for the team assembled and its expertise, but the amounts paid can be extraordinary depending on how advanced their technology is and how crucial it is.