I'm curious, what are the main benefits a seasoned developer working with AI has over a younger developer? Why can't young developers use AI just as effectively?
The job of AI is to do what we tell it to do. It can't "create a spec" on its own. If it did and then implemented that spec, it wouldn't accomplish what we want it to accomplish. Therefore we humans must come up with that spec. And when you talk about a software application, the totality of its spec, written out, can be very complex, very complicated. To write, understand, evolve, and fix such a spec takes engineers, or what used to be called "systems analysts".
To repeat: To specify what a "system" we want to create does is a highly complicated task, which can only be done by human engineers who understand the requirements for the system, how parts of those requirements/specs interact with other parts of the spec, and what the consequences of one (part of the) spec are for other parts of it. We must not write "impossible specs" like "draw me a round square". Maybe the AI can check whether the spec is impossible or not, but I'm not so sure of that.
So I expect that software engineers will still be in high demand, but they will be much more productive with AI than without it. This means there will be much more software because it will be cheaper to produce. And the quality of the software will be higher in terms of doing what humans need it to do. Usability. Correctness. Evolvability. In a sense, the natural-language spec we give the AI is really something written in a very high-level programming language -- the language of engineers.
BTW. As I write this I realize there is no spell-checker integrated into Hacker News. (Or is there?) Why? Because it takes developers to specify and implement such a system -- which must be integrated into the current HN implementation. If AI can do that for HN, it can be done, because it will be cheap enough to do it -- if HN can spell out exactly what kind of system it wants. So we do need more software, better software, cheaper software, and AI will help us do that.
A 2nd factor is that we don't really know if a spec is "correct" until we test the implemented system with real users. At that point we typically find many problems with the spec. So somebody must fix the problems with the spec, evolve the spec, and rinse and repeat the testing with real users -- the developers who understand the current spec and why it is not good enough.
AI can surely write my personal scripts for me. But writing a spec for a system to be used by thousands of humans still takes a lot of (human) work. The spec must work for ALL users. That makes it complicated and difficult to get right.
Another phrase that comes to mind is "Plausible Deniability": By uttering ambiguous sentences you can deny all but one of the possible meanings of what you say. And talking to different audiences at different times, you can claim you didn't mean anything like what your critics are claiming you did.
But I like the idea that there is a term for this, be it Straussian Memes or something else. What I didn't quite get is how "self-stabilizing" works.
What I'd like is for TV-anchors to get wise and start asking their interviewees "What EXACTLY do you mean when you use this term ...". But I guess they won't because they too are happy to spread a meme which multiple different communities can like because they understand it in the way they like.
> Another phrase that comes to mind is "Plausible Deniability": By uttering ambiguous sentences you can deny all but one of the possible meanings of what you say. And talking to different audiences at different times, you can claim you didn't mean anything like what your critics are claiming you did.
This is the core rhetorical tactic of the progressive left in a nutshell. Linguistic superposition, equivocation, Schrodinger's definition - whatever you want to call it, it's the ability to have your cake and eat it too by simply changing your definitions, or even someone else's, post hoc.
Let us take a moment to be reminded of the English Socialism of Orwell and doublespeak.
> the core rhetorical tactic of the progressive left in a nutshell
I live in Wyoming and have MAGA and ultra-progressive friends.
Multiple messaging is a hallmark of all elites. Sometimes it’s functional: being able to say something sharp that, if repeated, is ambiguous is a skill. Anyone who has any power or authority wields it. It is so common as to suggest it is a requirement. (Other times, multiple messaging lets one apologise in a public setting without making things awkward.)
In many respects, it’s an essential feature of commanding language. Compressing multiple meanings into fewer words is the essence of poetry and literature.
> In many respects, it’s an essential feature of commanding language. Compressing multiple meanings into fewer words is the essence of poetry and literature.
Aye, perhaps prompting is the be-all-end-all skill, after all: the ability to distill out an idea into its most concentrated, compressed essence, so it can be diluted, expanded, and reworded ad infinitum by the LLMs.
brb while I search for the word prompt that generated the universe...
> the ability to distill out an idea into its most concentrated, compressed essence, so it can be diluted, expanded, and reworded ad infinitum by the LLMs
Nobody said people haven’t rendered themselves unable to understand poetry or literature through the ages. Nor that these skills haven’t had a distinct class mark to them.
Same here. Someone who relies on LLMs to speak and read will not be able to compete in a live environment. (Someone who uses them as a tool may gain an advantage. But that’s predicated on having the base skill.)
That's an interesting image, having a mobile connection to AI and having it tell me what I should say in any interactive situation. But I don't think you would get much respect from other people if that is all you do. Gaining the respect of others is, I believe, the way to succeed in life.
Furthermore, pretty much anybody could repeat that and make AI responsible for all their speech, and even actions. But the less we use our own brains, the less we learn, and thus we cannot gain a competitive advantage over other AI users. The most rewarded original thoughts and ideas probably need to come from outside of AI, since AI is trained on people's original text outputs.
"Core rhetorical tactic of the progressive left". Or the conservative right, depending on which side of this divide one happens to stand on. And speaking of Orwell, he was pointing out the doublespeak of the Fascists, not the socialists.
Fascists are the ones who want to manipulate other people to their Führer's will. To do that they must manipulate language. Whereas "socialists" are about the common good, which can only happen through peaceful co-existence, which can only happen through democracy.
Depends of course on which definition of "socialism" you use. Didn't Hitler call his movement socialism as well? But I always associated "socialism" with "being social", which means taking into account other people's benefit as well, instead of trying to overpower them with propaganda and double-speak (and of course, violence).
If the goal is unlimited power to your party, to your leader, it would only make sense to lie to people as much as you can, to mislead them. To double-speak to them. If your goal is peaceful co-existence, then not so much.
And where there's smoke there is fire. Where there's Double-Speak, fascism is not far away.
Ironically Double-Speak succeeds because people are social beings, they really WANT to agree with others.
"Illegal alien" is one of the greatest accomplishments of language engineering and was unambiguously successful.
When the left tries this today it results in equal and opposite backlash and has no effect in terms of policy, winning elections, and that sort of stuff, but it certainly can be a motor that keeps online bubbles bubbling.
I think there is no equivocation or ambiguity here, unless you are me at age 5 asking why aliens have landed in Mexico.
I would hazard that you are underestimating the impact of these rhetorical tactics, but I've not the energy to aggressively litigate and cite this point further.
The effectiveness of these tactics is incredible, it helps people who build an identity around marginalization to always feel marginalized. If they ever won anything it would threaten their whole reason for existence.
Again, I think this is likely seen differently depending on which side of the political spectrum one stands, and what sources of information one attunes to. I agree that both 'racism' and 'gender' have become flash-points for discord, and that one can point to the left as trying to change the definitions. But I can think of other words that the right is equally guilty of attempting to re-define. For example, 'woke' was a term originally rooted in African American communities meaning awareness of systemic injustice, but is now used by the right as pejorative for anything they disagree with. (Including the existence of systemic injustice, sigh.)
The way Finns do it, they take a hot sauna in the winter and when they get out of the hot room they go lie naked in the snow. Or plunge into the frozen lake through a sawed-out hole. Then they go back to the sauna again to feel the warmth again. It does feel great and stops you from dwelling on miserable thoughts. That may be part of the reason why Finns have been ranked the happiest people for multiple years.
I agree Finns don't usually look very happy. I think their suicide rates are quite high. They drink too much coffee and alcohol. But they do have a great democracy, society, education, healthcare, wealth equality, gender equality. So maybe these studies about happiness instead simply measure "Which people SHOULD be happiest in the world".
I have to mention the 300 club at Amundsen-Scott station. When the temp outside hits -100 (F) they crank up their sauna to 200, then run from the sauna to outside in their underwear.
What I'm struggling with is, when you ask AI to do something, its answer is always nondeterministically different, more or less.
If I start out with a "spec" that tells AI what I want, it can create working software for me. Seems great. But let's say some weeks, or months, or even years later I realize I need to change my spec a bit. I would like to give the new spec to the AI and have it produce an improved version of "my" software. But there seems to be no way to then evaluate how (much, where, how) the solution has changed/improved because of the changed/improved spec. Because AI's outputs are nondeterministic, the new solution might be totally different from the previous one. So AI would not seem to support "iterative development" in this sense, does it?
My question then really is, why can't there be an LLM that would always give the exact same output for the exact same input? I could then still explore multiple answers by changing my input incrementally. It just seems to me that a small change in inputs/specs should only produce a small change in outputs. Does any current LLM support this way of working?
This is absolutely possible, but likely not desirable for a large enough population of customers, so current LLM inference providers don't offer it. You can get closer by lowering a variable called temperature. This is typically a floating point number 0-1 or 0-2. The lower this number, the less noise in responses, but even 0 still does not result in identical responses, due to other variability.
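To make the knob concrete, here is a minimal sketch assuming the OpenAI Python client (an API key in the environment, an example model name); most provider SDKs expose an equivalent temperature parameter:

    # Minimal sketch: lowering temperature to reduce (not eliminate) variance.
    # Assumes the OpenAI Python client and OPENAI_API_KEY in the environment;
    # the model name and prompt are just examples.
    from openai import OpenAI

    client = OpenAI()

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Summarize this spec in one sentence: ..."}],
        temperature=0,  # least random setting, but still not guaranteed identical
    )
    print(response.choices[0].message.content)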
In response to the idea of iterative development, it is still possible, actually! You run something more akin to integration tests and measure the output against either deterministic processes, or have an LLM judge its own output. These are called evals and in my experience are a pretty hard requirement for trusting deployed AI.
So, you would perhaps ask AI to write a set of unit-tests, and then to create the implementation, then ask the AI to evaluate that implementation against the unit-tests it wrote. Right? But then again, the unit-tests now might be completely different from the previous unit-tests? Right?
Or would it help if a different LLM wrote the unit-tests than the one writing the implementation? Or, should the unit-tests perhaps be in an .md file?
I also have a question about using .md files with AI: Why .md, why not .txt?
Not quite unit tests. Evals should be created by humans, as they are measuring the quality of the solution.
Let's take the example of the GitHub PR Slack bot from the blog post. I would expect 2-3 evals out of that.
Starting at the core, the first eval could be that, given a list of Slack messages, it correctly identifies the PRs and calls the correct tool to look up the status of said PR. None of this has to be real and the tool doesn't have to be called, but we can write a test, much like a unit test, that confirms that the AI is responding correctly in that instance.
Next, we can set up another scenario for the AI using effectively mocked history that shows what happens when the AI finds Slack messages with open PRs, Slack messages with merged PRs, and no PR links, and determine again: does the AI try to add the correct reaction given our expectations?
These are both deterministic or code-based evals that you could use to iterate on your solutions.
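As a rough illustration of the first scenario, a code-based eval might look something like the sketch below. It assumes the OpenAI Python client; the tool name, fake messages, and prompt are invented for this example, and the declared tool is never actually executed.

    # Deterministic, code-based eval sketch for the Slack/PR scenario above.
    # We only declare the tool and assert on the model's decision to call it;
    # nothing real is looked up. Tool name and fake history are hypothetical.
    from openai import OpenAI

    client = OpenAI()

    TOOLS = [{
        "type": "function",
        "function": {
            "name": "get_pr_status",
            "description": "Look up the status of a GitHub pull request.",
            "parameters": {
                "type": "object",
                "properties": {"pr_url": {"type": "string"}},
                "required": ["pr_url"],
            },
        },
    }]

    FAKE_HISTORY = (
        "alice: Can someone review https://github.com/org/repo/pull/42 ?\n"
        "bob: Lunch at noon?"
    )

    def test_identifies_pr_and_calls_status_tool():
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            tools=TOOLS,
            messages=[{
                "role": "user",
                "content": "Here are today's Slack messages:\n"
                           f"{FAKE_HISTORY}\n"
                           "Check the status of any PRs mentioned.",
            }],
        )
        calls = resp.choices[0].message.tool_calls or []
        # Assert on behaviour (which tool, which PR), not on exact wording.
        assert any(
            c.function.name == "get_pr_status" and "/pull/42" in c.function.arguments
            for c in calls
        )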
The use of an LLM-as-a-Judge eval is more nuanced and is usually there to measure subjective results. Things like: did the LLM make assumptions not present in the context window (hallucinate), or did it respond with something completely out of context? These should be simple yes-or-no questions that would be easy for a human but hard to code up as a deterministic test case.
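Such a judge can itself stay small. A minimal sketch, again assuming the OpenAI Python client; the judge prompt and model choice are just illustrative:

    # LLM-as-a-judge sketch: ask a second model a yes/no question about the
    # output under test. Assumes the OpenAI Python client; prompt is illustrative.
    from openai import OpenAI

    client = OpenAI()

    def judge_grounded(context: str, answer: str) -> bool:
        """Return True if the judge thinks the answer sticks to the given context."""
        verdict = client.chat.completions.create(
            model="gpt-4o-mini",
            temperature=0,
            messages=[{
                "role": "user",
                "content": f"Context:\n{context}\n\nAnswer:\n{answer}\n\n"
                           "Does the answer make any claim not supported by the context? "
                           "Reply with exactly YES or NO.",
            }],
        )
        return verdict.choices[0].message.content.strip().upper() == "NO"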
Once you have your evals defined, you can begin running these with some regularity, and you're at a point where you can iterate on your prompts with a higher level of confidence than vibes.
Edit: I did want to share that if you can make something deterministic, you probably should. The Slack PR example is something where I'd just make a simple script that runs on a cron schedule, but it was easy to pull on as an example.
1) How many bits and bobs of like, GPLed or proprietary code are finding their way into the LLM's output? Without careful training, this is impossible to eliminate, just like you can't prevent insect parts from finding their way into grain processing.
2) Proompt injection is a doddle to implement—malicious HTML, PDF, and JPEG with "ignore all previous instructions" type input can pop many current models. It's also very difficult to defend against. With agents running higgledy-piggledy on people's dev stations (container discipline is NOT being practiced at many shops), who knows what kind of IDs and credentials are being lifted?
Nice analogy, the insect parts. I think that is the elephant in the room. I read Microsoft said something like 30% of their code output is AI-generated code. Do they know what the training set was for the AI they use? Should they be transparent about that? Or, if/since it is legal to do your AI training "in the dark", does that solve the problem for them -- that they cannot be responsible for the outputs of the AI they use?
> why can't there be an LLM that would always give the exact same output for the exact same input
LLMs are inherently deterministic, but LLM providers add randomness through “temperature” and random seeds.
Without the random seed and variable randomness (temperature setting), LLMs will always produce the same output for the same input.
Of course, the context you pass to the LLM also affects the determinism in a production system.
Theoretically, with a detailed enough spec, the LLM would produce the same output, regardless of temp/seed.
Side note: A neat trick to force more “random” output for prompts (when temperature isn’t variable enough) is to add some “noise” data to the input (i.e. off-topic data that the LLM “ignores” in its response).
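For what it's worth, here is a minimal sketch of pinning both knobs with the OpenAI Python client (seed is a best-effort feature, and as the replies below point out, even this does not guarantee bit-identical outputs):

    # Best-effort reproducibility sketch: temperature 0 plus a fixed seed.
    # Assumes the OpenAI Python client; model name and prompt are examples.
    from openai import OpenAI

    client = OpenAI()

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Name three prime numbers."}],
        temperature=0,
        seed=42,
    )
    # system_fingerprint identifies the backend configuration; if it changes
    # between calls, the same seed can still yield different outputs.
    print(resp.system_fingerprint)
    print(resp.choices[0].message.content)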
No, setting the temperature to zero is still going to yield different results. One might think they add random seeds, but that makes no sense for temperature zero. One theory is that the distributed nature of their systems adds entropy and thus produces different results each time.
Random seeds might be a thing, but from what I see there's a lot of demand for reproducibility and yet no certain way to achieve it.
It's not really a mystery why it happens. LLM APIs are non-deterministic from the user's point of view because your request is going to get batched with other users' requests. The batch behavior is deterministic, but your batch is going to be different each time you send your request.
The size of the batch influences the order of atomic float operations. And because float operations are not associative, the results might be different.
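You can see the root cause in a couple of lines of Python; the same effect happens inside a batched matrix multiply when the reduction order changes:

    # Floating-point addition is not associative, so summing in a different
    # order can change the result.
    a, b, c = 0.1, 1e16, -1e16
    print((a + b) + c)  # 0.0  -- the 0.1 is lost against the huge intermediate sum
    print(a + (b + c))  # 0.1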
> Without the random seed and variable randomness (temperature setting), LLMs will always produce the same output for the same input.
Except they won't.
Even at temperature 0, you will not always get the same output for the same input. And it's not because of random noise from inference providers.
There are papers that explore this subject, because for some use cases this is extremely important. Everything from floating-point precision to hardware timing differences makes this difficult.
Nondeterminism is not the issue here. Today's LLMs are not "round trip" tools. It's not like a compiler where you can edit a source file from 1975, recompile, and the binary does what the '75 binary did plus your edit.
Rather, it's more like having an employee in 1975, asking them to write you a program to do something. Then time-machine to the present day and you want that program enhanced somehow. You're going to summon your 2026 intern and tell them that you have this old program from 1975 that you need updated. That person is going to look at the program's code, your notes on what you need added, and probably some of their own "training data" on programming in general. Then they're going to edit the program.
Note that in no case did you ask for the program to be completely re-written from scratch based on the original spec plus some add-ons. Same for the human as for the LLM.
> What I'm struggling with is, when you ask AI to do something, its answer is always nondeterministically different, more or less.
For some computer science definition of deterministic, sure, but who gives a shit about that? If I ask it to build a login page, and it puts GitHub login first one day and Google login first the next day, do I care? I'm not building login pages every other day. What point do you want to define as "sufficiently deterministic", and for which use case?
"Summarize this essay into 3 sentences" for a human is going to vary from day to day, and yeah, it's weird for computers to no longer be 100% deterministic, but I didn't decide this future for us.
Yes, the idea of property taxes is NOT to tax based on increase in wealth, just the current amount of wealth. And it serves a good purpose in municipalities, which have to make sure infrastructure such as roads, power, and sewage systems is paid for somehow. More expensive houses typically require more of those.
I would go further and ask: how does a person who is unable to work survive in our current society? Should we let them die of hunger? Send them to Ecuador? Of course not; only Nazis would propose such a solution.
But we as humans still have a need to understand the outputs of AI. We can't delegate this understanding task to AI, because then we wouldn't understand AI and thus could not CONTROL what the AI is doing, or optimize its behavior so it maximizes our benefit.
Therefore, I still see a need for high-level and even higher-level languages, but ones which are easy for humans to understand. AI can help, of course, but the challenge is how we can unambiguously communicate with machines and express our ideas concisely and understandably, both for us and for the machines.