
Sigh. As others have commented, over and over again in the last 6 months we've seen discussions on HN with the same basic variation of "Claude Code [or whatever] is amazing" with a reply along the lines of "It doesn't work for me, it just creates a bunch of slop in my codebase."

I sympathize with both experiences and have had both. But I think we've reached the point where such posts (both positive and negative) are _completely useless_, unless they're accompanied with a careful summary of at least:

* what kind of codebase you were working on (language, tech stack, business domain, size, age, level of cleanliness, number of contributors)

* what exactly you were trying to do

* how much experience you have with the AI tool

* is your tool set up so it can get a feedback loop from changes, e.g. by running tests

* how much prompting did you give it; do you have CLAUDE.md files in your codebase

and so on.

As others pointed out, TFA also has the problem of not being specific about most of this.

We are still learning as an industry how to use these tools best. Yes, we know they work really well for some people and others have bad experiences. Let's try and move the discussion beyond that!



It's telling that you ask these details from a comment describing a negative experience, yet the top-most comment full of praises and hyperbole is accepted at face value. Let's either demand these things from both sides or from neither. Just because your experience matches one side, doesn't mean that experiences different from yours should require a higher degree of scrutiny.

I actually think it's more productive to just accept how people describe their experience, without demanding some extensive list of evidence to back it up. We don't do this for any other opinion, so why does it matter in this case?

> Let's try and move the discussion beyond that!

Sharing experiences using anecdotal evidence covers most of the discussion on forums. Maybe don't try to police it, and either engage with it, or move on.


>Let's either demand these things from both sides or from neither. Just because your experience matches one side, doesn't mean that experiences different from yours should require a higher degree of scrutiny.

Sort of.

The people that are happy with it and praising the avenues offered by LLM/AI solutions are creating codebases that fulfill their requirements, whatever those might be.

The people that seem to be unhappy with it tend to have the universal complaints of either "it produces garbage" or "I'm slower with it".

Maybe I'm showing my age here, but I remember these same exact discussions between people that either praised or disparaged search engines. The alternative being an internet Yellowpages (which was a thing for many years).

The ones that praised it tended to be people who were taught or otherwise figured out how to use metadata tags like date:/onsite:, whereas the ones that disparaged it tended to be the folks who would search for things like "who won the game", proceed to click every scam/porno link on this green Earth, and then blame Google/gdg/lycos/whatever when they were exposed to whatever they clicked.

In other words: the proof is kind of in the pudding.

I wouldn't care about the compiler logs from a user that ignored all syntax and grammar rules of a language after picking it up last week, either -- but it's useful for successful devs to share their experiences both good and bad.

I care more about the opinions of those that know the rules of the game -- let the actual teams behind this software deal with the user testing and feedback from people that don't want to learn conventions.


> The people that are happy with it and praising the avenues offered by LLM/AI solutions are creating codebases that fulfill their requirements, whatever those might be.

Ah, but "whatever those might be" is the crucial bit.

I don't entirely disagree with what you're saying. There will always be a segment of power users who are able to leverage their knowledge about these tools to extract more value out of them than people who don't use them to their full potential. That is true for any tool, not just in software.

What you're ignoring are two other possibilities:

1. The expectation of users can be wildly different. Someone who has never programmed before, but can now create and ship a smartphone app, will see these tools as magical. Whatever issues they have will either go unnoticed, or won't matter considering the big picture. Surely their impression of AI tooling will be nothing short of positive. They might be experts at using LLMs, but not at programming.

OTOH, someone who has been programming for decades, and strives for a certain level of quality in their work, will find the experience much different. They will be able to see the flaws and limitations of these tools, and addressing them will take time and effort that they could've better spent elsewhere. As we've known since the introduction of LLMs, domain experts are the only ones who can experience these problems.

So the experience of both sides is valid, and should have equal weight in conversations. Unlike you, I do trust the opinions of domain experts over those of expert users, but that's a personal bias.

2. There are actual flaws and limitations in AI tooling. The assumption that all negative experiences are from users who are "holding it wrong", while all positive ones are from expert users, is wrong. It steers the conversation away from issues with the tech that should be discussed and addressed. And considering the industry is strongly propelled by hype and marketing right now, we need conversations grounded in reality to push back against it.


> The assumption that all negative experiences are from users who are "holding it wrong", while all positive ones are from expert users, is wrong.

I’m not sure about that. I feel like someone experienced would realize when using the LLM is a better idea than doing it themselves, and when they just need to do it by hand.

You might work in a situation where you have to do everything by hand, but even then your response would presumably acknowledge that you can see how it's useful to other people.


> The ones that praised it tended to be people who were taught or otherwise figured out how to use metadata tags like date:/onsite: , whereas the ones that disparaged it tended to be the folks who would search for things like "who won the game" and then proceed to click every scam/porno link on this green Earth and then blame Google/gdg/lycos/whatever when they were exposed to whatever they clicked.

One big warning here: search engines only became really useful when you could search for "who won the game" and the search engine actually returned the correct thing as the top result.

We're more than a quarter of a century later and probably 99.99% of users don't know about Google's advanced search operators.

This should be a major warning for LLMs. People are people and will do people things.


I should have been clearer - I'd like to see this kind of information from positive comments as well. It's just as important. If someone is having success with Claude Code while vibe-coding a toy app, I don't care. If they're having success with it on a large legacy codebase, I want them to write a blog post all about what they're doing, because that's extremely useful information.


I jumped the gun a bit in my comment, since you did mention you want to see this from both sides. So you were clear about that, and I apologize.

The thing is that I often read this kind of response only to comments with negative experiences, while positive ones are accepted as fact. You can see this reinforced in the comments here as well. A comment section is not the right place to expand on these details, but I agree that blog posts should have them, regardless of the experience type.


It’s telling that they didn’t specifically direct it at the negative experience, and you filled that in yourself.


It was the comment they replied to. If it was a general critique of the state of discourse around agentic tools and Claude Code in particular, why not make it a top-level comment?


Oh, because I wanted to illustrate that the discourse is exemplified by the pair of the GP comment (vague and positive) and the parent comment (vague and negative). Therefore I replied to the negative parent comment.


>But I think we've reached the point where such posts (both positive and negative) are _completely useless_, unless they're accompanied with a careful summary of at least:

They did mention "(both positive and negative)", and I didn't take their comment to be one-sided towards the AI-negative comments only.


They're tools. To a fluent tool user, the negative anecdotes sound like,

"I prefer typewriters over word processors because it's easier to correct mistakes."

"I don't own any forks because knives are just better at cutting bread."

"Bidets make my pants wet, so I'll keep to toilet paper."

I think there's an urge to fix misinformation. Whereas if someone loves Excel and thinks Excel is better than Java at making apps, I have no urge to correct that. Maybe they know something about Excel that I don't.


The framing has been rather problematic. I find these differences in premises are lurking below the conversations:

- Some believe LLMs will be a winner-take-all market and reinforce divergences in economic and political power.

- Some believe LLMs have no path of evolution left and have therefore already plateaued at a level too low to be sustainable given these investments in compute, which would imply it's a flash in the pan that will collapse.

- Some believe LLMs will all be hosted forever, always living in remote services because the hardware requirements will always be massive.

- Some believe LLMs will create new, worse kinds of harm without enough offsetting creation of new kinds of defense.

- Some believe LLMs and AI will only ever give low-skilled people mid-skill results and therefore work against high-skill people by diluting mid-end value without creating new high-end value for them.

We need to be more aware of how we are framing this conversation because not everyone agrees on these big premises. It very strongly affects the views that depend on them. When we don't talk about these points and just judge and reply based on whether the conclusion reinforces our premises, the conversation becomes more political.

Confirmation bias is a thing. Individual interests are a thing. Some of the outcomes, like regulation and job disruption, depend on what we generally believe. People know this and so begin replying and voting according to their interests, to convince others to aid their cause without respect for the truth. This can be counter-productive to the individual if they are wrong about the premises and end up pushing an agenda that doesn't even actually benefit them.

We can't tell people not to advance their chosen horse at every turn of a conversation, but those of us who actually care about the truth of the conversation can take some time to consider the foundations of the argument and remind ourselves to explore that and bring it to the surface.


Fair point.

For context, I was using Claude Code on a large Ruby + TypeScript open source codebase. 50M+ tokens. They had specs and e2e tests, so yes, I did have feedback when I was done with a feature - I could run specs and Claude Code could form a loop. I would usually advise it to fix specs one by one, using --fail-fast to surface errors quickly.
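The loop looked roughly like this (illustrative commands; the spec path here is just an example, not from that codebase):

    # run the spec file covering the feature, stopping at the first failure
    bundle exec rspec spec/features/billing_spec.rb --fail-fast

    # once that passes, run the wider suite the same way
    bundle exec rspec --fail-fast

with a prompt along the lines of "run the specs, read the first failure, fix only that one, then re-run".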

Prior to Claude Code, I had been using Cursor for a year or so.

Sonnet is particularly good at Next.js and TypeScript stuff. I also ran this on a medium-sized Python codebase and some ML-related work too (ranging from LangChain to PyTorch lol)

I don't do a lot of prompting, just enough to describe my problem clearly. I try my best to identify the relevant context or direct the model to find it fast.

I made new claude.md files.


I spend a fair amount of time tinkering in Home Assistant. My experience with that platform and LLMs can be summed up as "this is amazing".

I also do a fair amount of data shuffling with Golang. My LLM experience there is "mixed".

Then I deal with quite a few "fringe" code bases and problem spaces. There, LLMs fall flat past the stuff that is boilerplate.

"I work in construction and use a hammer" could mean framer, roofer or smashing out concrete with a sledge. I suspect that "I am a developer, I write code" plays out in much the same way, and those details dictate experience.

Just based on the volume of Ruby and TypeScript out there, and the overlap with the output of these platforms, your experience is going to be pretty good. I would be curious whether, if you went and did something less mainstream, and in a less common language (say Zig), you would have the same feelings and feedback that you do now. Based on my own experience I suspect you would not.


Speaking of that observation about "fringe": this will probably, increasingly, become a factor, call it LLMO (LLM optimization), where "LLM friendly" content gets pushed. So I expect secondary or fringe programming languages to get pushed aside even more, since LLMs will not be as useful for them.

Which is, obviously, sad. Especially since the big winner is Javascript, a language that's still subpar as far as programming languages go.


Here's a few general observations.

Your LLM (CC) doesn't have your whole codebase in context, so it can run off and make changes without considering that some remote area of the codebase is (subtly?) depending on the part that Claude just changed. This can be mitigated to some degree depending on the language and the tests in place.

The LLM (CC) might identify a bug in the codebase, fix it, and then figure, "Well, my work here is done." and just leave it as is without considering ramifications or that the same sort of bug might be found elsewhere.

I could go on, but my point is simply to validate the issues people will be having, while also acknowledging those seeing the value of an LLM like CC. It does provide useful work (e.g. large tedious refactors, prototyping, tracking down a variety of bugs, and so on...).


Right, which is why having a comprehensive test suite is such an enormous unlock for this class of technology.

If your tests are good, Claude Code can run them and use them to check it hasn't broken any distant existing behavior.


Not always the case. It’ll just go and “fix” the tests to pass instead of fixing the core issue.


That used to happen a whole lot more. Recent Claudes (3.7, 4) are less likely to do that in my experience.

If they DO do that, it's on us to tell them to undo that and fix things properly.


This is why you keep CLAUDE.md updated: there it’ll write down what is where and other relevant info about the project.

Then it doesn’t need to feel (or rg) through the whole codebase.

You also use plan mode to figure out the issue, write the implementation plan in a .md file. Clear context, enter act mode and tell it to follow the plan.
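Roughly the shape I mean, as a sketch (the section names and paths here are made up for illustration):

    # CLAUDE.md
    ## Layout
    - src/api/      -- HTTP handlers, thin wrappers over services
    - src/services/ -- business logic, one module per use case
    ## Workflow
    - run the full test suite after every change; fix failures before moving on
    - implementation plans live in docs/plans/*.md -- follow them step by step

Plan mode writes docs/plans/<feature>.md, you clear context, and act mode just reads that file and follows it.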


You can probably give Claude access to tools like ast-grep, which will help it see all references. I still agree some dynamic references might be left; the only way around that is to prompt well enough. I dealt with this, since I tested on a Ruby on Rails codebase.
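Something like this, for example (ast-grep syntax from memory, and the method name is made up, so treat it as a sketch):

    # find call sites of a renamed method across the app
    ast-grep --pattern 'recalculate_totals($$$)' --lang ruby app/

That catches the static references; the dynamic/metaprogrammed ones are the part you still have to prompt around.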


Agree. It keeps getting closer to "I've had a negative experience with the internet ..."


I'm not convinced that "we know they work really well for some people." So far I just see people really excited about the potential and really impressed at what it's capable of, but I think people are extrapolating poorly. It's like, yes, it's impressive that it can make a video game with a few prompts, but that doesn't mean that with a few more prompts it'll turn into an AAA game.

I'm on board with some limited AI autocompletion, but so far agents just seem like gimmicks to me.


If we handwave that the popular game Wordle, which made a lot of money for its author, could have been vibecoded, at what point does the gimmick become an actual feature that people look for and pay for?


No shade at Wordle, but what you're describing sounds like it would be useful for the shovelware industry and that's about it. Not exactly a great leap forward for humanity...

Although I should be fair, this can help with one-off scripts that research folks usually do, when you just need to plot some data or do some back-of-the-terminal math. That said I don't think this would be a game changer, more of an efficiency boost and a limited one at that.


What would a great leap forwards for humanity look like? Sure, making it easier to shovel out shovelware means more shovelware, but why is that a bad thing? If customers have a very specific problem that wasn't going to get solved because it was too expensive to build a custom solution, and they now get to have bespoke software to cure their ills, other than being judgemental about this hypothetical piece of software as being shovelware, why is that a bad thing?


Here's one version of what a great leap forward could look like, but it's simply one of many: an LLM that understands the CPU it's running on and can turn prompts into assembly, taking full advantage of the hardware. Or maybe it could target a virtual CPU, like Java does, but the point is that if the LLM can write code, why do it in Python or C? Just let it understand the CPU and let it rip. The only reason we have C/Python/etc. in the first place is because assembly sucks for humans to work with.

As to the shovelware, if it benefits people that's great, and I think the net benefit will likely be positive, but only slightly. The point in calling it shovelware is to suggest that it's low quality, and so it could have bugs and other performance issues that add costs to using it, which subtract from the benefit it provides (possibly still in a net positive way, but probably not as fundamentally game changing as, say, Docker).


Seconded: a summary description of your problem, codebase, and programming dialect in use should be included with any “<Model> didn’t work for me” response.


I find it telling that I have (mostly) good experiences with the GPT family and (mostly) bad experiences with the Claude family.

I just wish I could figure out what it tells. Their training data can't be that different. The problems I'm feeding them are the same. Many people think Claude is the more capable of the two.

It has to be how I'm presenting the problems, right? What other variable is there?


If you have been using GPT for a while it simply may know more about you.


> But I think we've reached the point where such posts (both positive and negative) are _completely useless_, unless they're accompanied with a careful summary of at least ...

I use Claude many times a day, and I ask it and Gemini to generate code most days. Yet I fall into the "I've never included a line of code generated by an LLM in committed code" category. I haven't got a precise answer for why that is so. All I can come up with is that the code generated lacks the depth of insight needed to write a succinct, fast, clear solution to the problem that someone can easily understand in 2 years' time.

Perhaps the best illustration of this is that someone proudly proclaimed to me that they committed 25k lines in a week, with the help of AI. In my world, this sounds like they are claiming they have a way of turning the sea into ginger beer. Gaining the depth of knowledge required to change 25k lines of well written code would take me more than a week of reading. Writing that much in a week is a fantasy. So I asked them to show me the diff.

To my surprise, a quick scan of the diff revealed what the change did. It took me about 15 minutes to understand most of it. That's the good news.

The bad news is that those 25k lines added 6 fields to a database. Two-thirds were unit tests, and perhaps two-thirds of the remainder was comments (maybe more). The comments were glorious in their length and precision, littered with ASCII art tables showing many rows in the table.

Comments in particular are a delicate art. They are rarely maintained, so they can bit rot into downright misleading babble after a few changes. But the insight they provide into what the author was thinking, and in particular the invariants he had in mind, can save hours of divining it from the code. Ideally they concisely explain only the obscure bits you can't easily see from the code itself. Anything more becomes technical debt.

Quoting Woodrow Wilson on the amount of time he spent preparing speeches [0]:

    “That depends on the length of the speech,” answered the President. “If it is a ten-minute speech it takes me all of two weeks to prepare it; if it is a half-hour speech it takes me a week; if I can talk as long as I want to it requires no preparation at all. I am ready now.”
Which is a roundabout way of saying I suspect the usefulness of LLM-generated code depends more on how often a human is likely to read it than on any of the things you listed. If it is write once, and the requirement is that it works for most people in the common cases, LLM-generated code is probably the way to go.

I used PayPal's KYC web interface the other day. It looked beautiful, completely in line with the rest of PayPal's styling. But sadly I could not complete it because of bugs. The server refused to accept one page; it just returned to the same page with no error messages. No biggie, I phoned support (several times, because they also could not get past the same bug), and after 4 hours on the phone the job was done. I'm sure the bug will be fixed by a new contractor. He'll spend a few hours on it, getting an LLM to write a new version and throwing the old code away, just as his predecessor did. He will say the LLM provided a huge productivity boost, and PayPal will be happy because he cost them so little. It will be the ideal application for an LLM - got the job done quickly, and no one will read the code again.

I later discovered there was a link on the page that allowed me to skip past the problematic page, so I could at least enter the rest of the information. It was in a thing that looked confusingly like a "menu bar" on the left, although there was no visual hint that any of the items in the menu were clickable. I clicked on most of them anyway, but they did nothing. While on hold for phone support, I started reading the HTML and found one was a link. It was a bit embarrassing to admit to the help person I hadn't clicked that one. It sped the process up somewhat. As I said, the page did look very nice to the eye, probably partially because of the lack of clutter created by visual hints on what was clickable.

[0] https://quoteinvestigator.com/2012/04/28/shorter-letter/


There are some tasks it can handle and some it can't, but IMO a lot of the "Claude Code [or whatever] is amazing" vs. "It doesn't work for me, it just creates a bunch of slop in my codebase" split comes down to "I know how to use it" vs. "I don't know how to use it", with a side of "I have good test coverage" vs. "tests?"



