Not just that, I think a lot of people are going to waste their time fighting a losing battle against AI writing (and make no mistake, they will lose) without ever asking themselves what makes writing good in the first place.
There’s good AI writing and bad organic writing. But it’s easier to point out a few LLM-isms than to actually identify the problems with text.
Sure, but the LLM-isms in AI writing are mentally exhausting to see in every way at this point.
The whole point of reading, frankly, is to understand the voice of other people. When you pass that through a distorted filter that makes everyone sound the same... it's bad, lossy, frustrating communication.
It's also dishonest when you publish something that is direct output without your own wording. Digital catfishing at best.
The only good AI writing is providing the prompt, because the question is way more interesting, and way more constructive for learning, than the answer.
The point of writing is to convey an idea to another person or yourself at a future date. Authenticity has nothing to do with it. I frankly do not care about the “authentic voice” of the author of a random blog. I want to know if they have any interesting ideas.
I don’t think it’s absolutely embarrassing. First of all, the point of the author writing at all is to aid understanding, not produce prose. So from that standpoint, what would be embarrassing would be to include incorrect facts that suggest a fundamental misunderstanding of the topic.
From my read, it is fine. The brief history of LLMs is complicated since every single component has papers introducing enhancements. So it’s easy to ignore them or get bogged down with details.
The author appears to be a security researcher learning about LLMs for the purpose of defending against common attacks. So this piece is that person giving themselves a crash course on the topic. The fact that they cleaned up their notes with an LLM is frankly completely irrelevant.
When I first started working with LLMs in 2019, AI was in no way synonymous with LLMs. I personally realized pretty quickly that they’d eventually be able to write software that compiles. Not necessarily good software, but software that passes a minimum threshold.
Then again, there were all sorts of hallucination-adjacent issues, which are still present but rarer as models get bigger. Wondering about the consequences for software engineering as an industry was a little bit of an “overpopulation on Mars” problem, since GPT-2 could barely string a paragraph together.
Another factor is the industry’s continued insistence on evaluating the ability to write software using leetcode. Well, Claude is probably the best leetcoder in the world now, but since our industry never figured out better evaluation criteria for candidates, of course we are backed into a corner.
It’s not as if they were one shot. Five prior repos, two published preprints on SSRN, and thousands of hours back my research, which is right there for you to peer review and use freely.
“Semiotic awareness” is not standard ML terminology. The dictionary definition of semiotic is simply “relating to symbols”, so it’s a bit grandiose to say you gave Qwen “awareness of symbols” when in reality it’s a marginal improvement, if even true.
Also, to say that a philosopher who died 100 years ago inspired a new attention head is another instance of GPT off his rocker again. You don’t need MAH to contextualize “freedom” in a sentence. Attention already does that.
Your method appears to be similar to LoRA but simply less expressive. Some kind of manipulation to layers 7, 14, and 21. Did you compare with other layers? This is obviously extremely specific to a particular backbone.
Also, your documents use a ton of nonstandard jargon, which only serves to confuse laypeople and annoy anyone who is familiar with ML. Saying your change adds “semiotic awareness” is meaningless when your experiments claim only marginal improvements. Clearly the model had most of the capability before.
More generally, who is it for? People who have expertise in ML are not going to take it seriously. People who don’t?
It is not LoRA. LoRA fine-tunes capabilities into the model. SRT Adapter is a small overlay on a frozen model whose purpose is to make internal reasoning observable. It surfaces what the model is activating at moments of high divergence.
The layers 7, 14, and 21 were chosen after probing. They showed the strongest regime signals. We did compare other layers. The term semiotic awareness is just shorthand for detecting and modulating higher order meaning patterns. If the term is unhelpful I will drop it.
The capability gains are often marginal on standard benchmarks. The intended value is observability and steerability without retraining the backbone.
It’s a data point. I could imagine in a hardware constrained setting we might not care about training on enormous token counts, and on smaller devices it’s great if we can simplify the architecture.
I agree that this isn’t proof that it scales to trillions of tokens, but this does show a scaled up experiment would be worth a shot.
The Chinchilla scaling laws give you a rough minimum for the number of tokens you should train on for a given model size: if you can't meet what they suggest for that size, you should shrink the model, since otherwise its capacity is going to waste.
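To put a rough number on that, here is a back-of-the-envelope sketch. It assumes the commonly quoted approximation of about 20 training tokens per parameter, which is a rule of thumb derived from the Chinchilla results and not an exact constant. The function names are made up for illustration.

```python
# Rough Chinchilla-style sanity check.
# ASSUMPTION: ~20 training tokens per parameter, a commonly quoted
# approximation of the compute-optimal ratio, not an exact value.
TOKENS_PER_PARAM = 20


def suggested_tokens(n_params: int) -> int:
    """Approximate token count suggested for a model with n_params parameters."""
    return n_params * TOKENS_PER_PARAM


def max_params_for_tokens(n_tokens: int) -> int:
    """Largest model size the rule of thumb supports for a given token budget."""
    return n_tokens // TOKENS_PER_PARAM


def is_undertrained(n_params: int, n_tokens: int) -> bool:
    """True if the token budget falls short of the rule-of-thumb suggestion."""
    return n_tokens < suggested_tokens(n_params)


if __name__ == "__main__":
    params = 1_000_000_000   # hypothetical 1B-parameter model
    budget = 5_000_000_000   # hypothetical 5B-token training run
    print(suggested_tokens(params))        # tokens the rule of thumb asks for
    print(is_undertrained(params, budget)) # does the run fall short?
    print(max_params_for_tokens(budget))   # size you could justify instead
```

Under this approximation, a model that falls short of its token budget should be compared against a smaller, fully trained one before drawing conclusions about the architecture.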
I do agree that it is a datapoint, but GP's point is that this model was undertrained, so it's hard to draw the same conclusions from it that we would from other research.
A helpful middle ground I’ve found is to build out the architecture you want, but stub out the tedious function implementations you don’t want to do yourself.
And by stub out I mean write the function signature yourself, including parameters it’ll accept and return types. Add a comment if necessary about what it will do.
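Something like this, as a minimal sketch. The invoice domain and every name here (`Invoice`, `parse_invoice_csv`, `total_by_customer`, `build_report`) are invented for illustration. The point is that you write the types, signatures, and the glue yourself, and leave the bodies as stubs to be filled in later.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Invoice:
    # Hypothetical record type: you decide the data shape yourself.
    customer_id: str
    amount_cents: int


def parse_invoice_csv(path: str) -> List[Invoice]:
    """Read the CSV at `path` and return one Invoice per data row.

    Expected columns: customer_id, amount_cents. Skip the header row.
    """
    raise NotImplementedError("stub: tedious parsing left to fill in")


def total_by_customer(invoices: List[Invoice]) -> Dict[str, int]:
    """Sum amount_cents per customer_id and return the totals."""
    raise NotImplementedError("stub: aggregation left to fill in")


def build_report(path: str) -> Dict[str, int]:
    # The architecture you wrote by hand: it only wires the stubs together.
    return total_by_customer(parse_invoice_csv(path))
```

Because the signatures and return types are fixed up front, whatever fills in the bodies has much less room to drift from the design you had in mind.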