
I had preview access for a couple of weeks. I've written up my initial notes so far, focusing on core model characteristics, pricing (extremely competitive) and lessons from the model card (aka as little hype as possible): https://simonwillison.net/2025/Aug/7/gpt-5/


Related ongoing thread:

GPT-5: Key characteristics, pricing and model card - https://news.ycombinator.com/item?id=44827794


> In my own usage I’ve not spotted a single hallucination yet

Did you ask it to format the table a couple of paragraphs above this claim after writing about hallucinations? I would classify that sorting mistake as one.


That wasn't a hallucination, that was it failing to sort things correctly.


So a hallucination would have been if it made up a new row?

What about the "9.9 / 9.11" example?

It’s unclear to me where to draw the line between a skill issue and a hallucination. I imagine that one influences the other?
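As an aside, the "9.9 / 9.11" confusion can be reproduced without an LLM: as decimals, 9.9 is larger, but read as version-number components (a plausible source of the mistake), 9.11 comes out ahead. A minimal Python sketch of the two readings; the `as_version` helper is illustrative, not from the thread:

```python
# Decimal reading: 9.9 > 9.11, since 0.9 > 0.11.
assert 9.9 > 9.11

# Version-number reading: split on "." and compare parts as integers.
# Under this reading "9.11" beats "9.9" (11 > 9), which is one common
# explanation for why models fumble this comparison.
def as_version(s: str) -> tuple[int, ...]:
    return tuple(int(part) for part in s.split("."))

assert as_version("9.11") > as_version("9.9")
```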


Out of interest, how much does the model change (if at all) over those 2 weeks? Does OpenAI guarantee that if you do testing from date X, that is the model (and accompaniments) that will actually be released?

I know these companies do "shadow" updates continuously anyway so maybe it is meaningless but would be super interesting to know, nonetheless!


It changed quite a bit - we got new model IDs to test every few days. They did tell us when the model was "frozen", and I ran my final tests against those IDs.

OpenAI and Anthropic don't update models without changing their IDs, at least for model IDs with a date in them.

OpenAI do provide some aliases, and their gpt-5-chat-latest and chatgpt-4o-latest model IDs can change without warning, but anything with a date in (like gpt-5-2025-08-07) stays stable.
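The distinction above can be checked mechanically: pinned snapshot IDs end in a YYYY-MM-DD date, while rolling aliases do not. A rough Python sketch, where the `is_pinned` helper is hypothetical (not part of any SDK):

```python
import re

# Hypothetical helper: dated model IDs like "gpt-5-2025-08-07" are frozen
# snapshots; aliases like "gpt-5-chat-latest" or "chatgpt-4o-latest" can
# change without warning.
def is_pinned(model_id: str) -> bool:
    return bool(re.search(r"\d{4}-\d{2}-\d{2}$", model_id))

assert is_pinned("gpt-5-2025-08-07")       # stable: safe to benchmark against
assert not is_pinned("chatgpt-4o-latest")  # may be silently updated
```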


In the interests of gathering these pre-release impressions, here's Ethan Mollick's writeup: https://www.oneusefulthing.org/p/gpt-5-it-just-does-stuff

Thank you to Simon; your notes are exactly what I was hoping for.


This post seems far more marketing-y than your previous posts, which have a bit more of a critical edge to them (such as your Gemini 2.5 blog post here: https://simonwillison.net/2025/Jun/17/gemini-2-5/). You seem to gloss over a lot of GPT-5's shortcomings and spend more time hyping it than other posts. Is there some kind of conflict of interest happening?


You really think so? My goal with this post was to provide the non-hype commentary - hence my focus on model characteristics, pricing and interesting notes from the system card.

I called out the prompt injection section as "pretty weak sauce in my opinion".

I did actually have a negative piece of commentary in there about how you couldn't see the thinking traces in the API... but then I found out I had made a mistake about that and had to mostly remove that section! Here's the original (incorrect) text from that: https://gist.github.com/simonw/eedbee724cb2e66f0cddd2728686f... - and the corrected update: https://simonwillison.net/2025/Aug/7/gpt-5/#thinking-traces-...

The reason there's not much negative commentary in the post is that I genuinely think this model is really good. It's my favorite model right now. The moment that changes (I have high hopes for Claude 5 and Gemini 3) I'll write about it.


I am seeing the conflict from other tech influencers who were given early access or even invited to OpenAI events pre-release.


I was invited to the OpenAI event pre-release too - here's my post about that: https://simonwillison.net/2025/Aug/7/previewing-gpt-5/


Like many other industries, you probably lose preview access if you are negative.


Also, when most people have already dismissed OpenAI’s open weight models as trash, there’s this: https://simonwillison.net/2025/Aug/5/gpt-oss/

Suspicious.


I wrote that before "most people had dismissed" those weights.

I continue to think that the 20B model is something of a miracle. I've spent less time with the 120B one because I can't run it on my own machine.


That’s fair, I retract my suspicion.


From the guidelines: Please don't post insinuations about astroturfing, shilling, brigading, foreign agents, and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data.


I don't think that this applies to commenting on someone's blog.


Yeah this criticism was pretty mild, I don't think it violates that HN guideline personally.


Maybe mild, sure, but it's a clear shilling accusation.


Maybe there is a misconception about what his blog is about. You should treat it more like a YouTuber reporting, not an expert evaluation, more like an enthusiast testing different models and reiterating some points about them, but not giving the opinions of an expert or ML professional. His comment history on this topic in this forum clearly shows this.

It’s reasonable that he might be a little hyped about things because of his feelings about them and the methodology he uses to evaluate models. I assume good faith, as the HN guidelines propose, and this is the strongest plausible interpretation of what I see in his blog.


I consider myself an expert in the field of LLMs, and I try to write in a way that supports that.


It probably depends on the definition of "expert" here. Based on my definition, experts are the people who write the LLM papers I read (some of them are my colleagues), the people who implement them, the people who push the field forward, and the PhD researchers whose blogs go into depth and show an understanding of how attention and transformers work, including the underlying math and theory. Based on my own knowledge and experience (I work on LLMs in the field), and on my discussions with the people I consider experts in my day job, I wouldn't add you to this category, at least not yet.

Based on my reading of some of your blog posts and your discussions with others on this site, you still lack the technical depth and understanding of the underlying mechanisms that I would expect at an expert level. I hope this doesn't sound insulting; maybe you just have a different definition of "expert". I'm also not saying you lack the capacity to become an expert someday. I just want to explain why, while you consider yourself an expert, some people might not see you as one. But as I said, maybe it's just different definitions. Your blog posts still have value, and a lot of people read them and find them valuable, so your work is definitely worthwhile. Keep up the good work!


Yup, I have a different definition of expert. I'm not an expert in training models - I'm an expert in applications of those models, and how to explain those applications to other people.

AI engineering, not ML engineering, is one way of framing that.

I don't write papers (I don't have the patience for that), but my work does get cited in papers from time to time. One of my blog posts was the foundation of the work described in the CaMeL paper from DeepMind for example: https://arxiv.org/abs/2503.18813


If you don't mind answering, is there any implication that you lose preview access if you are negative or critical? Asking because other companies have had such dynamics with people who write about their products.


There was not at all, and if there was I genuinely would have walked out of there. I don't need preview access for the work that I do.


If Simon isn't an expert, then I'm not sure who is.


Yes, I noticed the same. This is very concerning.



