Gave the same prompt to GPT 5.4 (high) and Opus 4.6 (high).
GPT 5.4 implemented the feature, refactored the code (was not asked to), removed comments that were not added in that session, made the code less readable, and introduced a bug. "Undo All".
Opus 4.6 correctly recognized that the feature is already implemented in the current code (yeah, lol) and proposed implementing tests and updating the docs.
Opus 4.6 is still the best coding agent.
So yeah, GPT 5.4 (high) didn't even check if the feature was already implemented.
Tried other tasks, tried "medium" reasoning - disappointment.
I make ChatGPT and Claude code review each other's outputs. ChatGPT thinks its solutions are better than what Claude produces. What was more surprising to me is that Claude, more often than not, prefers ChatGPT's responses too.
I'm not sure one can really extrapolate much from that, but I do find it interesting nonetheless.
I think language is also an important factor. I have a hard time deciding which of the two LLMs is worse at Swift, for example. They both seem equally great and awful in different ways.
I do the same (I have both review a piece of code), and Codex tends to produce more nitpicky feedback. Opus usually agrees with it on around half the feedback, but says that the other half is too nitpicky to implement. I generally agree with Opus' assessment, and do agree that Codex nitpicks a lot.
I can't even use Codex for planning because it goes down deep design rabbit holes, whereas Opus is great at staying at the proper, high level.
* Warehouse Management System - Barcode scanner, mobile app (Ionic), dashboard app.
* Vodafone Site Management - App for managing over 100 types of devices (with dynamically generated forms for each type), geolocation tools, 3D room editor (three.js), charts, and real-time data synchronization across tabs.
* SAP/Spartacus plugin for the Sony e-shops.
* correkt.com - e-commerce site with modern UI, SSR, zoneless architecture, Tailwind CSS 4, hydration, and various performance, SEO, and Core Web Vitals optimizations.
* surex.com (frontend) - insurance survey app with multiple complicated forms, rewritten from AngularJS to modern Angular.
* and 30+ other projects, from small startups and individuals to large enterprise companies.
I guess the point is that you're already doing it for Postgres. You already need persistent storage for your app, and the same engine can handle your queuing needs.
Exactly, if you’re already doing it for Postgres and Postgres can do the job well enough to meet your requirements, you’re only adding more cost and complexity by deploying Redis too.
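For readers unfamiliar with the pattern being discussed: a common way to use Postgres as a job queue is `FOR UPDATE SKIP LOCKED`, which lets multiple workers claim jobs concurrently without blocking each other. This is only a sketch; the table and column names are illustrative, not from any project mentioned above.

```sql
-- Hypothetical job queue table (names are illustrative).
CREATE TABLE jobs (
  id         bigserial PRIMARY KEY,
  payload    jsonb NOT NULL,
  status     text  NOT NULL DEFAULT 'pending',
  created_at timestamptz NOT NULL DEFAULT now()
);

-- A worker claims the oldest pending job. SKIP LOCKED makes
-- concurrent workers skip rows already locked by someone else,
-- so each job is handed to exactly one worker.
UPDATE jobs
SET status = 'running'
WHERE id = (
  SELECT id
  FROM jobs
  WHERE status = 'pending'
  ORDER BY created_at
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;
```

Run the `UPDATE` inside the worker's transaction; on success, mark the job `done` (or delete the row), and on failure roll back so the job returns to the pool. For many workloads this is "good enough" queuing without operating a second datastore.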