I must have been a little too ambitious with my first test with Claude Code.

I asked it to refactor a medium-sized Python project to remove duplicated code by introducing a dependency injection mechanism. That refactor is not really straightforward: it involves multiple files, and it should remain possible for different files to use different dependencies.
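(To give an idea of the target shape, with made-up names rather than the real project code: instead of every module building its own dependency, the dependency gets passed in, so different entry points can wire in different implementations.)

    # Hypothetical sketch, not the actual project: the duplicated setup moves
    # behind a small interface, and callers inject whichever implementation they need.

    class Storage:
        def save(self, key: str, data: bytes) -> None:
            raise NotImplementedError

    class LocalStorage(Storage):
        def save(self, key: str, data: bytes) -> None:
            with open(key, "wb") as f:
                f.write(data)

    class ReportExporter:
        def __init__(self, storage: Storage) -> None:
            # The dependency is injected instead of being constructed here,
            # so the setup logic is no longer duplicated in every module.
            self.storage = storage

        def export(self, name: str, payload: bytes) -> None:
            self.storage.save(name, payload)

    # One file can use ReportExporter(LocalStorage()), another can inject a
    # different Storage subclass, without touching ReportExporter itself.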

Anyway, I explain the problem in a few lines and ask for a plan of what to do.

At first I was extremely impressed: it automatically ran commands to read the files and gave me a plan of what to do. It seemed to understand the issue perfectly and even proposed some other changes that looked like great ideas.

So I just asked it to proceed and make the changes, and it started creating folders and new files, editing files, and even running some tests.

I was dumbfounded; it seemed incredible. I did not expect it to work on the first try, as I already had some experience with AI making mistakes, but it seemed like magic.

Then, once it was done, the tests (which covered 100% of the code) no longer passed.

No problem: I isolate a few failing tests and ask Claude Code to fix them, and it does.

I repeat this a few times, finding failing tests and asking it to fix them, slowly trying to clean up the mess, until I reach a test with a small problem: it succeeded (with pytest) but froze at the end of the test.
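(For anyone who hasn't hit that failure mode: a common cause is something the test leaves running, e.g. a non-daemon thread, so pytest reports a pass but the process never exits. A made-up illustration, not the actual test:)

    import threading
    import time

    def test_worker_reports_alive():
        # The assertion passes, so pytest marks the test green...
        worker = threading.Thread(target=time.sleep, args=(3600,))
        worker.start()
        assert worker.is_alive()
        # ...but the non-daemon thread keeps the interpreter alive,
        # so the run hangs after the results are printed.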

I ask Claude Code again to fix it and it tries to add code to solve the issue, but nothing works now. Each time it adds some bullshit code and each time it fails, adding more and more code to try to fix and understand the issue.

Finally, after $7.50 spent and 2000+ lines of code changed, it's still not working, and I don't know why, since I didn't make the changes myself.

As you know, it's easier to write code than to read code, so in the end I decided to scrap everything and do all the changes myself little by little, checking that the tests keep succeeding as I go along. I did follow some of the recommended changes it proposed, though.

Next time I'll start with something easier.



Really, you nearly got the correct approach there.

I generally follow the same approach these days: ask it to develop a plan, then execute, but importantly have it execute each step in increments as small as possible and do a proper code review of each step. Ask it for the changes you want it to make.

There are certainly times I need to do it myself, but this has definitely improved my productivity to some degree.

It's just pretty tedious, so I generally write a lot of the "fun" code myself, and almost always do the POC myself, then have the AI do the "boring" stuff that I know how to do but really don't want to do.

Same with docs: the modern reasoning models are very good at docs, and when guided toward a decent style they can really produce good copy. Honestly, R1/4o are the first AI I would actually consider pulling into my workflow, since they make fewer mistakes and actually help more than they harm. They still need to be babysat, though, as you noticed with Claude.


> ...do all the changes myself little by little, checking that the tests keep succeeding as I go along.

Or... you can do that with the robots instead?

I tried that with the last generation of Claude, only adding new functionality when the previously added functionality was complete, and it did a very good job. Well, Claude for writing the code and Deepseek-R1 for debugging.

Then I tried a more involved project with apparently too many moving parts for the stupid robots to keep track of, and they failed miserably. Mostly Claude failed, since that's where the code was being produced; I can't really say whether Deepseek would've fared any better, because the usage limits didn't let me experiment as much.

Now that I have an idea of their limitations and have had them successfully shave a couple of yaks, I feel pretty confident about setting them to work on a project I've been wanting to do for a while.


I'm curious about the follow-up post from Yegge, because this post is worthless without one. Great, Claude Code seems to be churning out bug fixes. Let's see if it actually passes tests, deploys, and works as expected in production for a few days, if not weeks, before we celebrate.


He posts a few times a year at https://sourcegraph.com/blog


I'm wondering if you can prompt it to work like this: make minimal changes, and run the tests at each step to make sure the code is still working.


This thing can "fix" tests, not code. It just adjusts the tests to match the incorrect code. So you need to keep an eye on the test code as well. That sounds crazy, of course. You have to constantly keep in mind that the LLM doesn't understand what it is doing.


git commit after each change it makes. It will eventually get itself into a mess. Revert to the last good state and tell it to try a different approach. Squash your commits at the end.
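In shell terms (standard git commands, adapt to your setup):

    # after each change from the assistant that still passes the tests
    git add -A && git commit -m "wip: describe the step"

    # when it digs itself into a hole, throw that attempt away
    git reset --hard <last-good-commit>

    # once everything works, squash the wip commits into one
    git rebase -i <commit-before-the-wip-series>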



