I feel like everyone is applying a worst-case narrative to what's going on here.
I see this as a work in progress.. I am almost certain the humans in the loop on these PRs are well aware of what's going on and have their expectations in check, and this isn't just "business as usual" like any other PR or work assignment.
This is a test. You can't improve a system without testing it on real world conditions.
How do we know they're not tweaking the Copilot system prompts and settings behind the scenes while they're doing this work?
Can no one see the possibility that what is happening in those PRs is exactly what all the people involved expected to have happen, and they're just going through the process of seeing what happens when you try to refine and coach the system to either success or failure?
When we adopted AI coding assist tools internally over a year ago we did almost exactly this (not directly in GitHub though).
We asked a bunch of senior engineers to see how far they could get by coaching the AI to write code rather than writing it themselves. We wanted to calibrate our expectations and better understand the limits, strengths and weaknesses of these new tools we wanted to adopt.
In most of those early cases we ended up with worse code than if it had been written by humans, but we learned a ton. We can also clearly see how much better things have gotten over time, since we have that benchmark to look back on.
I think people would be more likely to adopt this view if the overall narrative about AI were that it’s a work in progress and we expect it to get orders of magnitude better. But the narrative is that AI is already replacing human software engineers.
That's a weird comment. I do think for myself. I wasn't even talking about my own personal thoughts on the matter. I can just plainly see that the overwhelming narrative in the public zeitgeist is that AI can do jobs that humans can do. And it's not true.
why does every engineer keep talking about it like it’s more than marketing hype? why do you actually accept this is a real narrative real people believe? have you talked to the executives implementing these strategies?
redbull does not give you wings. it’s disconcerting to see the lack of nuance in these discussions around these new tools (and yeah sorry this isn’t really aimed at you, but the zeitgeist, apologies)
Because this “marketing hype” is affecting the way we do our job.
Some of us are being laid off due to the hype; some are assigned to babysit the AI; and some are simply looked down on by higher ups who are eagerly waiting for a day to lay us all off.
You can convince yourself as much as you want that it’s “just hype”, but regardless of what your beliefs are, it has REAL world consequences.
engineers are testing promising new technology. a mob (of probably half or more bots) is having a [redacted] perpetuating the anti-narrative they huffed themselves up into believing. and now we’re in a meta-[redacted] as if either A) redditors and armchair engineers here have valid opinions on this tech and B) marketers and founders with massive incentives to overpromise are telling a true narrative
why? we don’t have to do it. we could actually look at these topics with nuance and not react like literal bots to everything
(sorry I’m just losing my faith in humanity and taking it out in this thread)
> why do you actually accept this is a real narrative real people believe?
Because we're literally seeing people laid off with narratives about being replaced with AI (at a whole slew of companies). Because we're seeing company hiring policies changed to require hiring managers to provide exhaustive justifications for why the work couldn't be handled by an AI (at e.g. Shopify, Salesforce, and so on).
> have you talked to the executives implementing these strategies?
I have had a few conversations, yes. Have you? They're weirdly "true believers" who are buying the marketing hype hook, line, and sinker. They're doing small coding exercises themselves in these tools, seeing that they as an executive can manage to get valid code for the small exercise out the other side of it, and assuming that means it can replace head count. Either deliberately or naively failing to understand that there is a world of difference between leet code style exercises, or quick small changes to code bases, and actual software development.
The weirdest conversation recently, which thankfully I got to just be on the periphery of, involved an engineering org that decided to try to replace the post-incident process with one entirely written by LLMs. It would take the timeline from a ticket, plus a small prompt, and write up the entire post-incident report, tasks, etc.
The whole project showed a gross misunderstanding of the point of post-incident stuff, eradicating "introspection" and "learning from your mistakes", turning it into a check box exercise for teams. Even their narrative around what they were doing was hilarious, because it came down to "Get the post-incident report out of the way so we can concentrate on the real work".
> Either deliberately or naively failing to understand that there is a world of difference between leet code style exercises, or quick small changes to code bases, and actual software development.
Given how often leet code questions are used in the interview process across the entire industry I think it’s a fair assumption that they fail to understand this.
>> I see this as a work in progress.. I am almost certain the humans in the loop on these PRs are well aware of what's going on and have their expectations in check, and this isn't just "business as usual" like any other PR or work assignment.
>> This is a test. You can't improve a system without testing it on real world conditions.
Software developers know to fix build problems before asking for a review. The AIs are submitting PRs in bad faith because they don't know any better. Compilers and other build tools produce errors when they fail, and the AI is ignoring this first line of feedback.
It is not a maintainer's job to review code for syntax errors, use of APIs that don't actually exist, or other silly mistakes. That's the compiler's job, and it does it well. The AI needs to take that feedback and fix the issues before escalating to humans.
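To put it concretely, even a trivial pre-review gate would catch most of this. A minimal sketch (the `make` targets here are placeholders for whatever build and test commands the project actually uses):

```python
# Minimal sketch of a pre-review gate: run the project's build and tests,
# and only allow a review request when both pass. The `make` targets are
# placeholders for whatever commands the project actually uses.
import subprocess
import sys

def build_is_clean() -> bool:
    for cmd in (["make", "build"], ["make", "test"]):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # This is the compiler/test feedback the agent should consume
            # and act on itself, rather than forwarding to a human reviewer.
            sys.stderr.write(result.stdout + result.stderr)
            return False
    return True

if __name__ == "__main__":
    if not build_is_clean():
        sys.exit("Build or tests failed; fix the errors before requesting review.")
    print("Build clean; OK to request review.")
```

Anything that fails at this stage should go back to the model, not to a human reviewer.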
I was looking for exactly this comment. Everybody's gloating, "Wow look how dumb AI is! Haha, schadenfreude!" but this seems like just a natural part of the evolution process to me.
It's going to look stupid... until the point it doesn't. And my money's on, "This will eventually be a solved problem."
The question though is what is the time horizon of “eventually”. Very different decisions should be made if it’s 1 year, 2 years, 4 years, 8 years etc. To me it seems as if everyone is making decisions which are only reasonable if the time horizon is 1 year. Maybe they are correct and we’re on the cusp. Maybe they aren’t.
Good decision making would weigh the odds of 1 vs 8 vs 16 years. This isn’t good decision making.
Or _never_, honestly. Sometimes things just don't work out. See various 3d optical memory techs, which were constantly about to take over the world but never _quite_ made it to being actually useful, say.
Sometimes the last 10% takes 90% of the time. It'll be interesting to see how this pans out, and whether it will eventually get to something that could be considered a solved problem.
I'm not so sure they'll get there. If "solved" is defined as sub-standard but low-cost, then I wouldn't bet against it. A solution better than that, though? I don't think I'd put my money on that.
People seem like they’re gloating because the message being pushed at this point in the hype cycle is that AI is as good as a junior dev, without caveats, and is in no way supposed to be stupid.
This is the exact reason AI sucks: there is no proper feedback loop.
EVERY single prompt should have the opportunity to get copied off into a permanent log, triggered by the end user: log all input, all output, and have the human write a summary of what he wanted to happen but didn't, what he thinks might have gone wrong, and what he thinks should have happened (domain-specific experts giving feedback about how things are fucking up). And even then it's only useful with long-term tracking, like whether someone actually made a training change to fix that exact failure scenario.
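Concretely, I mean something like this per-prompt record (purely a sketch; the field names are my own invention, not any existing tool's schema):

```python
# Rough sketch of the per-prompt feedback record described above.
# Field names are my own invention, not any existing tool's schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class PromptFeedback:
    prompt: str                # full input the end user sent
    output: str                # full output the model produced
    expected: str              # what the user wanted to happen but didn't
    suspected_cause: str       # the user's guess at what went wrong
    correct_result: str        # domain expert's view of what should have happened
    logged_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    # Long-term tracking: the training or prompt change that eventually
    # fixed this exact failure scenario, if anyone ever made one.
    resolution: Optional[str] = None
```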
None of that exists, so just like "full self driving" was a pie-in-the-sky bullshit dream that proved machine learning has an 80/20, never-gonna-fully-work problem, it's the same thing here.