I had preview access for a couple of weeks. I've written up my initial notes so far, focusing on core model characteristics, pricing (extremely competitive) and lessons from the model card (aka as little hype as possible): https://simonwillison.net/2025/Aug/7/gpt-5/
> In my own usage I’ve not spotted a single hallucination yet
Did you ask it to format the table a couple of paragraphs above this claim, after writing about hallucinations? Because I would classify the sorting mistake there as one.
Out of interest, how much does the model change (if at all) over those 2 weeks? Does OpenAI guarantee that if you do testing from date X, that is the model (and accompaniments) that will actually be released?
I know these companies do "shadow" updates continuously anyway so maybe it is meaningless but would be super interesting to know, nonetheless!
It changed quite a bit - we got new model IDs to test every few days. They did tell us when the model was "frozen", and I ran my final tests against those IDs.
OpenAI and Anthropic don't update models without changing their IDs, at least for model IDs with a date in them.
OpenAI do provide some aliases, and their gpt-5-chat-latest and chatgpt-4o-latest model IDs can change without warning, but anything with a date in (like gpt-5-2025-08-07) stays stable.
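The pinned-versus-alias distinction described above can be sketched as a simple check. This is a hypothetical helper, not part of any OpenAI SDK: the assumption, per the comment, is that IDs ending in a `YYYY-MM-DD` date (like `gpt-5-2025-08-07`) are stable snapshots, while aliases (like `gpt-5-chat-latest`) can change without warning.

```python
import re

# Hypothetical convention: an ID ending in a YYYY-MM-DD date names a
# pinned snapshot; aliases such as "-latest" can be updated silently.
DATE_SUFFIX = re.compile(r"-\d{4}-\d{2}-\d{2}$")

def is_pinned(model_id: str) -> bool:
    """Return True if the model ID looks like a stable, date-stamped snapshot."""
    return bool(DATE_SUFFIX.search(model_id))

print(is_pinned("gpt-5-2025-08-07"))   # True: date-stamped, stays stable
print(is_pinned("gpt-5-chat-latest"))  # False: alias, may change underneath you
print(is_pinned("chatgpt-4o-latest"))  # False
```

If you need reproducible evals, always pin the dated ID in your config rather than an alias.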
This post seems far more marketing-y than your previous posts, which have a bit more criticality to them (such as your Gemini 2.5 blog post here: https://simonwillison.net/2025/Jun/17/gemini-2-5/). You seem to gloss over a lot of GPT-5's shortcomings and spend more time hyping it than other posts. Is there some kind of conflict of interest happening?
You really think so? My goal with this post was to provide the non-hype commentary - hence my focus on model characteristics, pricing and interesting notes from the system card.
I called out the prompt injection section as "pretty weak sauce in my opinion".
The reason there's not much negative commentary in the post is that I genuinely think this model is really good. It's my favorite model right now. The moment that changes (I have high hopes for Claude 5 and Gemini 3) I'll write about it.
From the guidelines:

> Please don't post insinuations about astroturfing, shilling, brigading, foreign agents, and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data.
Maybe there is a misconception about what his blog is about. You should treat it more like YouTuber-style reporting than expert evaluation: an enthusiast testing different models and relaying some points about them, not the opinions of an expert or ML professional. His comment history on this topic in this forum clearly shows this.
It's reasonable that he might be a little hyped, given his enthusiasm for these tools and the methodology he uses to evaluate models. I assume good faith, as the HN guidelines propose, and this is the strongest plausible interpretation of what I see in his blog.
It probably depends on the definition of "expert" here. By my definition, experts are the people who write the LLM papers I read (some of them are my colleagues), the people who implement them and push the field forward, and PhD researchers whose blogs go into depth and show understanding of how attention and transformers work, including the underlying math and theory. Based on my own knowledge and experience (I'm working on LLMs in the field), and my discussions with people I consider experts in my day job, I wouldn't add you to this category, at least not yet.
Based on my reading of some of your posts and your discussions with others on this site, you still lack technical depth and understanding of the underlying mechanisms at what I would call an expert level. I hope this doesn't sound insulting; maybe you just have a different definition of "expert". I'm also not saying you lack the capacity to become an expert someday. I only want to explain why, while you consider yourself an expert, some people might not see you as one. That said, your posts still have value: a lot of people read them and find them useful, so your work is definitely worthwhile. Keep up the good work!
Yup, I have a different definition of expert. I'm not an expert in training models - I'm an expert in applications of those models, and how to explain those applications to other people.
AI engineering, not ML engineering, is one way of framing that.
I don't write papers (I don't have the patience for that), but my work does get cited in papers from time to time. One of my blog posts was the foundation of the work described in the CaMeL paper from DeepMind for example: https://arxiv.org/abs/2503.18813
If you don't mind answering: is there any implication that you won't get preview access if you are negative or critical? Asking because other companies have had such dynamics with people who write about their products.