Gave the same prompt to GPT 5.4 (high) and Opus 4.6 (high).
GPT 5.4 implemented the feature, refactored the code (was not asked to), removed comments that were not added in that session, made the code less readable, and introduced a bug. "Undo All".
Opus 4.6 correctly recognized that the feature is already implemented in the current code (yeah, lol) and proposed implementing tests and updating the docs.
Opus 4.6 is still the best coding agent.
So yeah, GPT 5.4 (high) didn't even check if the feature was already implemented.
Tried other tasks, tried "medium" reasoning - disappointment.
I make ChatGPT and Claude code review each other's outputs. ChatGPT thinks its solutions are better than what Claude produces. What was more surprising to me is that Claude, more often than not, prefers ChatGPT's responses too.
I'm not sure one can really extrapolate much from that, but I do find it interesting nonetheless.
I think language is also an important factor. I have a hard time deciding which of the two LLMs is worse at Swift, for example. They both seem equally great and awful in different ways.
I do the same (I have both review a piece of code), and Codex tends to produce more nitpicky feedback. Opus usually agrees with it on around half the feedback, but says that the other half is too nitpicky to implement. I generally agree with Opus' assessment, and do agree that Codex nitpicks a lot.
I can't even use Codex for planning because it goes down deep design rabbit holes, whereas Opus is great at staying at the proper, high level.
* Warehouse Management System - Barcode scanner, mobile app (Ionic), dashboard app.
* Vodafone Site Management - App for managing over 100 types of devices (with dynamically generated forms for each type), geolocation tools, 3D room editor (three.js), charts, and real-time data synchronization across tabs.
* SAP/Spartacus plugin for the Sony e-shops.
* correkt.com - e-commerce site with modern UI, SSR, zoneless architecture, Tailwind CSS 4, hydration, and various performance, SEO, and Core Web Vitals optimizations.
* surex.com (frontend) - insurance survey app with multiple complicated forms, rewritten from AngularJS to modern Angular.
* and 30+ other projects, from small startups and individuals to large enterprise companies.
I guess the point is that you're already doing it for Postgres. You already need persistent storage for your app, and the same engine can handle your queuing needs.
Exactly, if you’re already doing it for Postgres and Postgres can do the job well enough to meet your requirements, you’re only adding more cost and complexity by deploying Redis too.
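For readers unfamiliar with the pattern being discussed: a common way to use Postgres as a job queue is `FOR UPDATE SKIP LOCKED`, which lets multiple workers claim jobs concurrently without blocking each other. This is only a sketch; the table and column names are illustrative, not from any project mentioned above.

```sql
-- Hypothetical job queue table (names are illustrative).
CREATE TABLE jobs (
  id         bigserial PRIMARY KEY,
  payload    jsonb NOT NULL,
  status     text  NOT NULL DEFAULT 'pending',
  created_at timestamptz NOT NULL DEFAULT now()
);

-- A worker claims the oldest pending job. SKIP LOCKED makes
-- concurrent workers skip rows already locked by someone else,
-- so each job is handed to exactly one worker.
UPDATE jobs
SET status = 'running'
WHERE id = (
  SELECT id
  FROM jobs
  WHERE status = 'pending'
  ORDER BY created_at
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;
```

Run the `UPDATE` inside the worker's transaction; on success, mark the job `done` (or delete the row), and on failure roll back so the job returns to the pool. For many workloads this is "good enough" queuing without operating a second datastore.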