RIP Mikeal. Those early NodeConf adventures were life-changing for me; they ripped me out of my entrenched MSFT tech life and got me on a Mac. It's thanks to those events that I was able to make meaningful contributions to OSS, ultimately giving dozens of talks, publishing books, and having a fulfilling career. Without him my life would look very different.
Take them to small claims court. You can self-represent (it's not all that complex), while they have to pay a lawyer to show up -- they're already in the hole for way more than they promised. Multiply this by the number of people, and yeah, they'd be praying for a CAL.
But then I'm paying hundreds or thousands of dollars of my time for maybe a few hundred dollars gain. Sure, it's more expensive for them in absolute terms, but it's more expensive for me in relative terms. Not going to get hundreds of people to do this. A class-action lawsuit can actually be positive EV for everyone involved.
(Actually, I don't know whom they'd send -- I think, for small claims court, they have to send a paralegal rather than a lawyer.)
The first computers cost millions of dollars and filled entire rooms to accomplish what we would now consider simple computational tasks. That same computing power now fits into the width of a fingernail. I don’t get how technologists balk at the cost of experimental tech, or assume current tech will run at the same efficiency for decades to come and melt the planet into a puddle.
AGI won’t happen until you can fit several data centers’ worth of compute into a brain-sized vessel, so the thing can move around and process the world in real time. This is all going to take some time, to say the least. Progress is progress.
I thought you were going to say that now we're back to bigger-than-room sized computers that cost many millions just to perform the same tasks we could 40 years ago.
I of course mean we're using these LLMs for a lot of tasks that they're inappropriate for, and a clever manually coded algorithm could do better and much more efficiently.
> and a clever manually coded algorithm could do better and much more efficiently.
Sure, but how long would it take to implement this algorithm, and would that be worth it for one-off cases?
Just today I asked Claude to create a jq query that looks for objects with a certain value for one field, but which lack a certain other field. I could have spent a long time trying to make sense of jq's man page, but instead I spent 30 seconds writing a short description of what I'm looking for in natural language, and the AI returned the correct jq invocation within seconds.
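For illustration, the resulting invocation could look something like the following; the field names "status" and "email", the value "active", and the file name are placeholders I made up, and it assumes the input is a JSON array of objects:

    # keep objects where .status is "active" but the "email" key is missing
    jq '.[] | select(.status == "active" and (has("email") | not))' data.json

Here select() keeps only the matching objects, and has("email") | not filters for the ones lacking that key.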
Claude answers a lot of its questions by first writing and then running code to generate the results. Its main limitations are access to databases and the size of its context window, both of which will be radically improved over the next 5 years.
just ask the LLM to solve enough problems (even new problems), cache the best, do inference-time compute for the rest, figure out the best/fastest implementations, and boom, you have new training data for future AIs
> "Those who invalidate caches know nothing; Those who know retain data." These words, as I am told, were spoken by Lao Tzi. If we are to believe that Lao Tzi was himself one who knew, why did he erase /var/tmp to make space for his project?
-- Poem by Cybernetic Bai Juyi, "The Philosopher [of Caching]"
The LLMs are now writing their own algorithms to answer questions. It won't be long before they can design a more efficient algorithm to complete any feasible computational task, in a millionth of the time needed by the best human.
> The LLMs are now writing their own algorithms to answer questions
Writing a Python script because it can't do math or any form of more complex reasoning is not what I would call writing its "own algorithm". It's at most an application of existing ones / calling APIs.
LLMs are probabilistic string blenders pulling pieces up from their training set, which unfortunately comes from us, humans.
The superset of the LLM knowledge pool is human knowledge. They can't go beyond the boundaries of their training set.
I'll not go into how humans have other processes which can alter their own and collective human knowledge, but the rabbit hole starts with "emotions, opposable thumbs, language, communication and other senses".
> fit several data centers’ worth of compute into a brain-sized vessel, so the thing can move around and process the world in real time
How so? I'd imagine a robot connected to the data center embodying its mind, connected via low-latency links, would have to walk pretty far to get into trouble when it comes to interacting with the environment.
The speed of light is more than six orders of magnitude faster than the speed of signal propagation in biological neurons (roughly 3x10^8 m/s versus at most ~100 m/s for fast myelinated axons), after all.
Many of humans' capabilities are pretrained with massive computing through evolution. Inference results of o3 and its successors might be used to train the next generation of small models to be highly capable. Recent advances in the capabilities of small models such as Gemini-2.0 Flash suggest the same.
Recent research from NVIDIA suggests such an efficiency gain is quite possible in the physical realm as well. They trained a tiny model to control the full body of a robot via simulations.
---
"We trained a 1.5M-parameter neural network to control the body of a humanoid robot. It takes a lot of subconscious processing for us humans to walk, maintain balance, and maneuver our arms and legs into desired positions. We capture this “subconsciousness” in HOVER, a single model that learns how to coordinate the motors of a humanoid robot to support locomotion and manipulation."
...
"HOVER supports any humanoid that can be simulated in Isaac. Bring your own robot, and watch it come to life!"
> Similarly, many of humans' capabilities are pretrained with massive computing through evolution.
Hmm .. my intuition is that humans' capabilities are gained during early childhood (walking, running, speaking .. etc) ... what are examples of capabilities pretrained by evolution, and how does this work?
If you look at animals, many can walk within hours of being born. It takes us longer because we are born rather undeveloped so the head can fit through the birth canal.
A more high-level example: sea sickness is an evolutionary pre-learned thing; your body thinks it's poisoned and automatically wants to empty your stomach.
The brain is predisposed to learn those skills. Early childhood experiences are necessary to complete the training. Perhaps that could be likened to post-training. It's not a one-to-one comparison but a rather loose analogy, which I didn't make precise because it is not the key point of the argument.
Maybe evolution could be better thought of as neural architecture search combined with some pretraining. Evidence suggests we are prebuilt with "core knowledge" by the time we're born [1].
See: Summary of cool research gained from clever & benign experiments with babies here:
Learning to walk doesn't seem to be particularly easy, having observed the process with my own children. No easier than riding a bike or skating, for which our brains are probably not 'predisposed'.
Walking is indeed a complex skill. Yet some animals walk minutes after birth. Human babies are most likely born premature due to the large brain and related physical constraints.
Young children learn to bike or skate at an older age after they have acquired basic physical skills.
Check out the reference to Core Knowledge above. There are things young infants know or are predisposed to know from birth.
The brain has developed, through evolution, very specific and organized structures that allow us to learn language and reading skills. If you have a genetic defect that causes those structures to be faulty or missing, you will have severe developmental problems.
That seems like a decent example of pretraining through evolution.
But maybe it's something more like general symbolic manipulation, and not specifically the sounds or structure of language. Reading is fairly new and unlikely to have had much if any evolutionary pressure in many populations who are now quite literate. Same seems true for music. Maybe the hardware is actually more general and adaptable and not just for language?
And reading and music co-evolved to be relatively easy for humans to do.
(See how computers have a much easier time reading barcodes and QR codes, with much less general processing power than it takes them to decipher human hand-writing. But good luck trying to teach humans to read QR codes fluently.)
I think of evolution as unassisted learning where agents compete with each other for limited resources. Over time they get better and better at surviving by passing on genes. It never ends, of course.
Your brain is well adapted to learning how to walk and speak.
Chimpanzees score pretty high on many tests of intelligence, especially short term working memory. But they can't really learn language: they lack the specialised hardware more than the general intelligence.
I mean, there are plenty - e.g. mimicking (say, the emotions on the mother's face), which is a precursor to learning more advanced "features". Also, even walking has many aspects pretrained (I assume it's mostly a musculoskeletal limitation that we can't walk immediately); humans are just born "prematurely" due to our relatively huge heads. Newborn horses can walk immediately without learning.
But there is plenty of non-learned control/movement/sensing in utero that is "pretrained".
The concern here is mainly on practicality. The original mainframes did not command startup valuations counted in fractions of the US economy, though they did qualify for billions in investment.
This is a great milestone, but OpenAI will not be successful charging 10x the cost of a human to perform a task.
Hmm the link is saying the price of an LLM that scores 42 or above on MMLU has dropped 100x in 2 years, equating gpt 3.5 and llama 3.2 3B. In my opinion gpt 3.5 was significantly better than llama 3B, and certainly much better than the also-equated llama 2 7B. MMLU isn't a great marker of overall model capabilities.
Obviously the drop in cost for capability in the last 2 years is big, but I'd wager it's closer to 10x than 100x.
Or 10x the skill and speed of a human in some specific class of recurrent tasks. We don't need full super-human AGI for AI to become economically viable.
Companies routinely pay short-term contractors a lot more than their permanent staff.
If you can just unleash AI on any of your problems, without having to commit to anything long term, it might still be useful, even if they charged more than for equivalent human labour.
(Though I suspect AI labour will generally trend to be cheaper than humans over time for anything AIs can do at all.)
Maybe AGI as a goal is overvalued: If you have a machine that can, on average, perform symbolic reasoning better than humans, and at a lower cost, that's basically the end game, isn't it? You won capitalism.
Right now I can ask an (experienced) human to do something for me and they will either just get it done or tell me that they can’t do it.
Right now when I ask an LLM… I have to sit there and verify everything. It may have done some helpful reasoning for me but the whole point of me asking someone else (or something else) was to do nothing at all…
I’m not sure you can reliably fulfill the first scenario without achieving AGI. Maybe you can, but we are not at that point, so we don’t know yet.
Not with the same depth. I might ask a friend to drop off a letter and I might verify that they did it, but I don’t have to verify that they didn’t mistake a Taco Bell or a dumpster as the post office.
It’s very scary to ask a friend to drop off a letter if the last scenario is even 1% within the realm of possibility.
My guess is this is an artifact of the RLHF part of the training. Answers like "I don't know" or "let me think and let's catch up on this next week" are marked down by human testers, which eventually trains the LLM to avoid this path altogether. And it probably makes sense, because otherwise "I don't know" would come up way too often even in cases where the LLM is perfectly able to give the answer.
> Right now I can ask an (experienced) human to do something for me and they will either just get it done or tell me that they can’t do it.
Finding reliable honest humans is a problem governments have struggled with for over a hundred years. If you have cracked this problem at scale you really need to write it up! There are a lot of people who would be extremely interested in a solution here.
It's not clear to me whether AGI is necessary for solving most of the issues in the current generation of LLMs. It is possible you can get there by hacking together CoTs with automated theorem provers and bruteforcing your way to the solution or something like that.
But if that's not enough, then maybe it comes as a second-order effect (e.g. reasoning machines having to bootstrap an AGI, so you end up with a Waymo taxi driver who is also a Fields medalist).
There are so-called "yes-men" who can't say "no" in any situation. That's rooted in their culture. I suspect that AI was trained using their assistance. I mean, answering "I can't do that" is the simplest LLM path and should come up often, unless they've gone out of their way to downrank it.
Philosophy of mind is the branch of philosophy that attempts to account for a very difficult problem: why there are apparently two different realms of phenomena, physical and mental, that are at once tightly connected and yet as different from one another as two things can possibly be.
Broadly speaking you can think that the mental reduces to the physical (physicalism), that the physical reduces to the mental (idealism), both reduce to some other third thing (neutral monism) or that neither reduces to the other (dualism). There are many arguments for dualism but I’ve never heard a philosopher appeal to “magic spirits” in order to do so.
Dualism has nothing to do with it. There are more things in heaven and earth than just computable functions in the mathematical sense.
(In fact, the very idea of "computable functions" was invented to narrow down the space of "all things" to something much smaller, tighter and manageable. And now we've come full circle and apparently everything in the universe is a computable function? Well, if all you have is a hammer, I guess everything must necessarily look like a nail.)
Intelligence is about learning from few examples and generalising to novel solutions. Increasing compute so that exploring the whole problem space is possible is not intelligence. There is a reason the actual ARC-AGI prize has efficiency as one of the success requirements. It is not so that the solutions scale to production and whatnot, these are toy tasks. It is to help ensure that it is actually an intelligent system solving these.
So yeah, the o3 result is impressive, but if the difference between o3 and the previous state of the art is just more compute to do a much longer CoT/evaluation loop, I am not so impressed. Reminder that these problems are solved by humans in seconds; ARC-AGI is supposed to be easy.
Do you think intelligence exists without prior experience? For instance, can someone instantly acquire a skill—like playing the piano—as if downloading it in The Matrix? Even prodigies like Mozart had prior exposure. His father, a composer and music teacher, introduced him to music from an early age. Does true intelligence require a foundation of prior knowledge?
> Does true intelligence require a foundation of prior knowledge?
This is the way I think about it.
I = E / K
where I is the intelligence of the system, E is the effectiveness of the system, and K is the prior knowledge.
For example, a math problem is given to two students, each solving the problem with the same effectiveness (both get the correct answer in the same amount of time). However, student A happens to have more prior knowledge of math than student B. In this case, the intelligence of B is greater than the intelligence of A, even though they have the same effectiveness. B was able to "figure out" the math, without using any of the "tricks" that A already knew.
Now back to your question of whether or not prior knowledge is required. As K approaches 0, intelligence approaches infinity. But when K=0, intelligence is undefined. Tada! I think that answers your question.
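One way to write down that two-student example and the limit, using the same I = E / K definition from above (E fixed, and K_A, K_B being my labels for the two students' prior knowledge):

    \[ I_A = \frac{E}{K_A}, \qquad I_B = \frac{E}{K_B}, \qquad K_B < K_A \;\Rightarrow\; I_B > I_A \]
    \[ \lim_{K \to 0^+} \frac{E}{K} = \infty, \qquad \text{undefined at } K = 0 \]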
Most LLM benchmarks simply measure effectiveness, not intelligence. I conceptualize LLMs as a person with a photographic memory and a low IQ of 85, who was given 100 billion years to learn everything humans have ever created.
That's the fascinating thing about lava/magma's power to simply erase what was there before. Plate tectonics does a similar thing when continents scrape each other clean. For all we know, Earth had an advanced civilization on it way before us and we'd have zero ways of knowing about it.
This is the new normal in 2024: hearing about a layoff and reaching out to friends to check if they're safe from it. It's pretty messed up that the world's richest companies are using layoffs as a mechanism to fix their balance sheets. My company never went for this lever. Also, when you point out flaws in hiring processes at these large companies, they swear up and down that it's absolutely necessary. So much waste, so many people chewed up in the process. Clearly there has to be a better way.
Yet it still sells vaporware. I was really excited for this company, but the self driving promises and fails, in addition to build quality issues made it a no go for me. While I buy new/unproven tech all the time, I guess I draw the line at cars.
It should. Cars get recalled for serious flaws. That means Tesla is shipping flawed products to paying customers that get used on the road, and then trying to fix those OTA like some app on a phone. That is bad engineering, period.
Well, I don't think I'd be much bothered if I'm honest. Instead of going to the mechanic's it just gets fixed while parked outside? I think I would prefer that to my Subaru.
You still don't get it. If a product requires so many fixes after being shipped, it was not fully developed when shipped, and defective on top of it. Hardware ain't software, and those recalls are just the tip of the iceberg, because a lot of smaller issues come along with said bad engineering that cannot be fixed with a recall: suspension, power train...
And the latter already shows in data published by the German TÜV (cars here get an official road safety check every two years, the results of which are aggregated by brand and model and published regularly): Teslas are among the cars with the highest failure rates for cars older than three years.
I mean, my car doesn't have a boombox or a pedestrian warning system, so the boombox can't drown out the pedestrian warning on it. And if the pedestrian warning were louder than the boombox, I wouldn't exactly be upset.
This honestly sounds like the conversations I'd have with people about the iPhone. Somehow it was the worst thing ever made but everyone who had a lot of money was buying one. When I eventually got around to getting it, it was awesome.
A car with quality issues on structural and safety critical parts, e.g. suspension, is more like a faulty board in a phone that risks making the battery go boom. Nothing that can be fixed OTA, but something that needs repairs in a workshop. And statistics, now that Teslas are on the market long enough for us to have those, show that being the case with Teslas.
This is decidedly not a problem of the entertainment system. And all those recalls Tesla has hint at some engineering deficiencies. Same for all other brands and cars with similar problems, which absolutely do exist. E.g. VW quality has been in decline for decades now.
Just want to remind you that Tesla was delivering vehicles that didn't even have their seats attached properly[0]. Build quality issues are certainly something legacy automakers deal with -- but Tesla has generally brought these issues to an entirely new level.
Citation badly needed, because the evidence is against you.
I bought a Model X three years ago. It's been in service more than 10 times. My neighbor bought a Model X this year. I asked him what was broken on it, and he laughed and said "Everything".
I bought a brand new Scion FR-S, the first year of the first model ever. I've had zero issues and zero build quality issues in 100k miles. Same with our F-250.
The bigger issue is that the actual graphics themselves look the same as the previous game. I’m sure you can do side-by-side stuff and show it’s better, but I was hoping for a tour-de-force of amazing looking cities. A no buy for me until something changes.