Plenty of non-IT applications use lots of cores, e.g. physics simulations, const...

lifeisstillgood · on Feb 15, 2025

Yes - but the amount of code that loves Amdahls law, is tiny compared to amount code churned out each day that can never run parallel over 1M cores - no matter how clever a compiler gets.

I cannot work out if we pack enough parallel problems in the world or just lack a programming language to describe them

sroussey · on Feb 15, 2025

Computers are still Von-Neumann machines, and other architectures lost out due to the great returns on investment for that architecture. However, in the AI world, this might not be the case. For instance, neuromorphic computing is one example, and there are others. Or back to analog again! Superposition is instant—no slow adders with carry bits to propagate! Who knows. Fun times!

jjmarr · on Feb 15, 2025

Most computers use modified Harvard architecture, funnily enough. There's a shared memory space like von Neumann, but separated caches for instructions and data.

It's the best of both worlds, because from the CPU's perspective it gets to have separate lanes for instructions and data, but from the programmer's perspective it's one memory.

anthk · on Feb 16, 2025

Not always. Modern computers are like several computers networked into one, if you think about it. Since the DMA, that approach is not valid. Today we have IOMMU's. The 9front/plan9 guys are trying to write a kernel (Nix) which wants to exploit every concurrent core of your CPU at crazy scaling levels.

yencabulator · on Feb 22, 2025

"Are trying" is a pretty generous phrase for something that seems to last have been thrown over the fence in one-commit repos about 5 years ago.

https://lsub.org/nix/

https://github.com/fjballest/nixMarkIV

https://github.com/fjballest/nix.markII

imtringued · on Feb 15, 2025

I don't get this. One of the worst computational problems holding back robotics is non linear model predictive control. You have 1-2 ms of time to build and solve a QP or a series of QP problems in the non linear case over a horizon of N time steps. 100% accurate MPC is inherently sequential. You must calculate the time step t_1 before t_2, because your joint positions are influenced by the control signal u_1. This means that the problem is intractable, since it needs backtracking via branch and bound.

However, since the problem is intractable, you don't actually have to solve it. What you can do instead is perform random shooting in 32 different directions, linearize and then solve 32 quadratic problems and find the minimum over those 32. That is phase one. However, a cold start from phase one sucks, so there is a second phase, where you take an initial guess and refine it. Again you do the same strategy, but this time you can use the initial guess to carefully choose search directions and solve the next batch of 32 QPs and take the minimum over them.

Now here is the thing. Even this in itself won't save you. At the extreme top end you are going to have 20k decision variables for which you're going to solve an extremely sparse linear systems of equations.

SQP is iterative QP and QP is an iterative interior point or active set algorithm, so we are two iterative algorithms deep. A linear systems of equations can be solved iteratively, so let's make it a third. It turns out, the biggest bottleneck in this nesting of sequential algorithms isn't necessarily the sequential nature. It's multiplying a giant sparse 20000x20000 matrix with a 20000 wide vector and doing this over and over and over again. That is what is fucking impossible to do in the 2 millisecond time budget you've been given, not the sequential parts.

So what does Boston Dynamics do for their impressive application of MPC? They don't even try. They just linearize the MPC and let the non sequential QP solver run as many iterations as it can until time is up, meaning that they don't even solve to optimality!

Now you might wonder why someone would want non linear MPC in the first place if it is so impractical. The reason is that MPC provides a general compute scaling solution to many problems that would require a lot of human ingenuity. It is the bitter lesson. Back when QP solvers were to slow, people used the pseudo inverse on non constrained QP problems. It's time for faster parallel hardware to make QP obsolete and let SQP take over.

Yes, parallel hardware is the key to a sequential problem.

lifeisstillgood · on Feb 16, 2025

Ok - wow please point at more reading across this :-)

jltsiren · on Feb 15, 2025

You can easily use a million cores if you have a million independent tasks. Which you often have, if you are dealing with data rather than building a big centralized service.