There are actually quite a few studies out there that look at LLM code quality (e.g. https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=LLM+...) and they mostly have similar findings. This reinforces the idea that LLMs still require expert guidance. Note that some of these studies date back to 2023, which is eons ago in terms of LLM progress.
The conclusion of this paper aligns with the emerging understanding that AI is simply an amplifier of your existing quality assurance processes: higher discipline results in higher velocity, lower discipline results in lower stability (e.g. https://dora.dev/research/2025/). Having strong feedback and validation loops is more critical than ever.
In this paper, for instance, they collected static analysis warnings using a local SonarQube server, which implies that it was not integrated into the projects they looked at. As such, these warnings were not available to the agent. It's highly likely that if these warnings were fed back into the agent, it would fix them automatically.
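A minimal sketch of that feedback loop, assuming a local SonarQube server and a hypothetical project key and token. The `/api/issues/search` endpoint and its response fields come from SonarQube's documented Web API; the prompt wording is my own invention.

```python
import base64
import json
import urllib.parse
import urllib.request


def fetch_open_issues(base_url: str, project_key: str, token: str) -> list:
    """Return unresolved issues for a project from SonarQube's Web API."""
    query = urllib.parse.urlencode(
        {"componentKeys": project_key, "resolved": "false", "ps": 100}
    )
    req = urllib.request.Request(f"{base_url}/api/issues/search?{query}")
    # SonarQube user tokens go in the Basic-auth username field, empty password
    cred = base64.b64encode(f"{token}:".encode()).decode()
    req.add_header("Authorization", f"Basic {cred}")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["issues"]


def issues_to_prompt(issues: list) -> str:
    """Format the warnings as a plain-text block to feed back into the agent."""
    lines = ["Fix the following static analysis warnings:"]
    for issue in issues:
        # "component" looks like "projectKey:src/path/to/file.py"
        path = issue["component"].split(":", 1)[-1]
        lines.append(
            f"- {path}:{issue.get('line', '?')} "
            f"[{issue['rule']}] {issue['message']}"
        )
    return "\n".join(lines)
```

Run `issues_to_prompt(fetch_open_issues("http://localhost:9000", "my-project", token))` after each agent iteration and append the result to the next prompt, and the warnings stop being invisible to the model.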
Another interesting thing they mention in the conclusion: the metrics we use for humans may not apply to agents. My go-to example for this is code duplication (even though this study finds a minimal increase in duplication) -- it may actually be better for agents to rewrite chunks of code from scratch rather than use a dependency whose code is not available, forcing them to rely instead on natural-language documentation, which may or may not be sufficient or even accurate. What is tech debt for humans may actually be a boon for agents.
That's fair, but I suspect the underlying mechanism is the same -- the models prefer rewriting code from scratch rather than looking around for reusable abstractions, which may exist just a few modules over, or -- for smaller models -- sometimes even in the same file. They're not copy-pasting the code for sure, just regenerating it de novo.
This is the most common issue I find, even with the latest models. For normal logic it's not too bad; the real risk is when they start duplicating classes or other abstractions, because those tend to proliferate and cause a mess.
I don't know if it's the training or RL or something intrinsic to the attention mechanism, but these models "prefer" generating new code rather than looking around for and integrating reusable code, unless the functionality is significant or they are explicitly prompted otherwise.
I think this is why AGENTS.md files are becoming so critical -- as standing instructions, they help override the natural tendencies of the model.
Yeah I agree that it's not copy/pasted the way a dev would, but I think the end result is the same. The more it needlessly duplicates code, the more brittle things will become. Changes will get harder and harder to implement as the number of sites that have to change increases.
On the other hand, I think driving down the need for external dependencies can be a net win. In my experience you usually need a very tiny slice of what a dependency actually offers, and often you settle for making design compromises to fit the dependency into your system, because the cost of writing it yourself is too high. LLMs definitely change that calculus.
I've found AGENTS.md files are more of a bandaid than anything. I've seen agents routinely ignore or forget them, and the larger the codebase and the number of changes they're making, the more frequently they forget.