Hacker News | vessenes's comments

I like how he's started saying up front "Now listen, I don't make any trades on this fine thinking. And that's okay by me." That is so far below the standards of public-markets due diligence. Sadly, credulous readers will follow his advice, which could turn out to be accurate in the public markets -- who knows? But if it's accurate it will be an accident, because the quality of analysis is so poor we should not call it analysis.

It's not clear that there's any sort of durable advantage to the provider here -- in fact, oAI started as the partner with Apple a couple of years ago, and is reportedly unhappy with the outcomes.

The hard part is not distilling a frontier model down into a specific use case when you have hundreds of millions of users; the hard part is (apparently) re-architecting your mobile OS to work with such a model rather than fight it. Those architectural benefits accrue to Apple, as will future datasets and expertise, and the benefit of having some distillation working 24/7 on-prem.

Anyway, where I think you're going to be grumpy in two years is that switching the underlying model is going to require a jailbreak, and that you'll wish they'd made the OS much more deeply open for agentic interaction, not that it's Gemini - that's just not the valuable part of the story for Apple or for users.


This is all pretty interesting. I think seeing every possible strategy compete with every other one makes for useful conversation, and his summary makes some sense to me: depending on the ruleset, getting access to a pocket of what Stephen calls computational irreducibility can generally gain you the upper hand across a wide range of strategies. That is an interesting CS / combinatorics result.

Probably most interesting is just throwing down a bunch of strategies that are provably better than tit for tat in rule-constrained environments, and showing that some more complicated form of tit for tat doesn’t win as you get more space than your opponents - it is better to manipulate simpler opponents into predictable behavior.

Anyway, this particular Wolfram essay was devoid of name dropping and full of interesting (if occasionally hard to parse) dense infographics. I enjoyed it and learned something.


Yes.

SpaceX IPO is slated to be $75-80bn — the market has size for that. We also have seen robust options and finance markets for AAPL and NVDA over the last years that make the broader ecosystem not overly worrying in my armchair opinion.

I’m not clear how much crossover demand there is between SX and Anthropic/oAI — that seems like the more interesting question. I’m guessing if we had Anthropic/oAI launching at the same time we’d see some pretty interesting capital dynamics.


> if we had Anthropic/oAI launching at the same time

Don't we have exactly that? There are S-1 announcements for SpaceX, Anthropic, and OpenAI. Google is selling to raise money for infra (IIRC). There's an absurd amount of money flowing in at present (prospectively at least).


The reality is that the large banks running these IPOs will know, to an extreme level of granularity, how much demand for the IPO there is at the chosen price point, and will advise accordingly.

None of the companies needs an IPO right now, with the possible exception of oAI — I haven’t looked at their financials recently. But SX is cashflow positive as of today, and Anthropic is able to become so without giving up much on their R&D program. So for those two, it’s a matter of timing.

Like a video game release schedule or a film release, SX has carved out a window and is going first, and regardless of messaging, all the teams are going to be watching it VERY VERY carefully. If it goes well, I’d expect Anthropic to jump next.

If that goes well, oAI would likely go right after. If it goes mid, oAI may wait to improve their financial story or fundraise private at worse valuations for a while, or, or, or.

Agreed that the dream for the next guys down the road is to pick up some recycled capital gains from SX and of course some new capital. If SX is a flop, then these IPO dreams will slow down for a minute.


None of these companies are worth the numbers being tossed around, but SpaceX especially so.

It's Schrödinger's IPO: the space business is so successful, how could you question the company's worth? You can't afford to miss out on the next biggest AI business to invest in!

What's going to happen is the music will stop and it's just a question of who cashed in when it does. OpenAI are easily the most vulnerable here.


I was under the impression SpaceX was going to be a trillion dollar company.

The media and the market are hyping all three of these companies up to be trillion-dollar companies.


Afaik SpaceX is only putting 5% of its shares up on the public market when it IPOs (newly minted shares, diluting the existing private shareholders).

So the markets only "need to absorb" $75B when SpaceX IPOs, not its whole $1.7T valuation. At least until the lockup period expires.
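A quick sanity check on the figures quoted above, as a rough sketch only: the function names are hypothetical, the inputs are the thread's numbers rather than offering terms, and it ignores the pre-money versus post-money distinction that newly issued shares introduce.

```python
# Back-of-the-envelope check of the thread's numbers (all amounts in $bn).
# These helpers are illustrative, not part of any real finance library.

def float_size_bn(valuation_bn, float_pct):
    """Value of the shares sold if float_pct percent of the company is offered."""
    return valuation_bn * float_pct / 100

def implied_valuation_bn(raise_bn, float_pct):
    """Total valuation implied by raising raise_bn for float_pct percent."""
    return raise_bn * 100 / float_pct

print(float_size_bn(1700, 5))        # 85.0
print(implied_valuation_bn(75, 5))   # 1500.0
```

So 5% of $1.7T is about $85B, while $75B for 5% implies roughly $1.5T. The quoted numbers are in the same ballpark but not exactly consistent with each other.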


Gotcha, that seems a bit more manageable for the market to absorb.

I think it will still be a bit tight with Anthropic and OpenAI IPO'ing at similar times however.


This looks great. Well reasoned, tons of work put into eval, thanks for building it.

It strikes me as kind of wild that good evals can drive tens to hundreds of millions of dollars of compute deployment in the wild; there’s something new and collaborative and competitive about the eval / frontier model race that’s quite interesting.

In this case “shorter, actually mergeable patches that open source maintainers would accept” feels like a great thing to deliver to the world.

I didn’t deep dive into good and bad patches, but I wonder if swyx or others on the team have predictions on saturation. Both when, and how useful will it be? That is, do you guys think this test is broad enough as written to get better behavior out of models, and if there is saturation on this test, will we see generalized better patch / coding behavior?


thanks - credit to silas, eric, ben, and team for the depth of the evals, and the rest of the research team for doing the transcript reading parties lol

by nature of being based on open source, frontiercode public will saturate very very quickly. frontiercode main will be >80% in less than a year. hopefully diamond will last a bit longer. we can do annual refreshes, but that's not my strategy for staying relevant - what i'm more excited to get funding for is a private held-out version of frontiercode based on repros of real enterprise customer problems. in an ideal agent lab (https://latent.space/p/agent-labs) you meticulously build up this domain understanding and that is essentially why both model labs and serious customers come to you.


Interesting. So frontiercode-IBM-Diamond is a thing you’d hope to sell the creation and certification of? And if it’s published then you’d expect model providers to train to frontiercode-IBM-Pro or whatever and publish their scores so that they would be considered a good model to use inside IBM? (Obviously just a random corporate choice here).

this is def alpha coded in my opinion - dad of both z and alpha types. we didn't hear about side parts -> it is not gen z

Rizz was definitely gen z as well as some of the other slang on here.


For your own safety, do not read or be advised by Ed Zitron. By all means skip the SpaceX IPO if you like: makes sense. But Ed is neither perceptive nor correct historically.

Case in point: a lockup period ending matching with mandated index fund buying is emphatically good for IPO buyers: it adds liquidity to a major cliff every IPO company faces: liquidity seeking by insiders on a schedule.

Now it may be bad for axed buyers like pension funds but buy side liquidity coming in to a company is always good for existing shareholders. Reading Ed would make you think the opposite.


>> a lockup period ending matching with mandated index fund buying is emphatically good for IPO buyers

I can't believe you wrote this. You are making Ed Zitron's case for him. And the lockup period in this case has been reduced to 15 days or less:

https://youtu.be/T8e2FbwN7dw?t=96

https://youtu.be/T8e2FbwN7dw?t=123


> a major cliff every IPO company faces: liquidity seeking by insiders on a schedule.

LOL, so the insiders can dump their shares. This is exactly what Zitron says. Maybe we should have Mark Karpeles' or SBF's opinion on this matter, too.


I agree the cost curve has shifted. But if we take the Mozilla team's Mythos report as a broad baseline, you need to hire something like 10 security engineers to equal the Mythos productivity. Put another way, everyone's underhiring security by a LOT right now; we have just been lucky enough to see similar underhiring among hackers.
