janalsncm · 2026-06-06T08:27:11 1780734431

Not just that, I think a lot of people are going to waste their time losing the battle (and make no mistake, they will lose) fighting against AI writing without ever asking themselves what makes writing good in the first place.

There’s good AI writing and bad organic writing. But it’s easier to point out a few LLM-isms than to actually identify the problems with text.

blharr · 2026-06-06T14:09:26 1780754966

> There's good AI writing

Sure, but the LLM-isms in AI writing are mentally exhausting to see in every way at this point.

The whole point of reading, frankly, is to understand the voice of other people. When you pass that through a distorted filter that makes everyone sound the same... it's bad, lossy, frustrating communication.

It's also dishonest when you publish something that is direct output without your own wording. Digital catfishing at best.

The only good AI writing is providing the prompt, because the question is way more interesting, and way more constructive to learning than the answer

janalsncm · 2026-06-06T20:35:05 1780778105

The point of writing is to convey an idea to another person or yourself at a future date. Authenticity has nothing to do with it. I frankly do not care about the “authentic voice” of the author of a random blog. I want to know if they have any interesting ideas.

wj · 2026-06-06T22:42:48 1780785768

I think because so much of an idea is shaped by the language used to convey it, it may be hard to separate the person from the LLM.

I think gp may want to know if a <person> has an interesting idea rather than <person + llm>.

janalsncm · 2026-06-06T08:20:50 1780734050

I don’t think it’s absolutely embarrassing. First of all, the point of the author writing at all is to aid understanding, not produce prose. So from that standpoint, what would be embarrassing would be to include incorrect facts that suggest a fundamental misunderstanding of the topic.

From my read, it is fine. The brief history of LLMs is complicated since every single component has papers introducing enhancements. So it’s easy to ignore them or get bogged down with details.

The author appears to be a security researcher learning about LLMs for the purpose of defending against common attacks. So this piece is that person giving themselves a crash course on the topic. The fact that they cleaned up their notes with an LLM is frankly completely irrelevant.

janalsncm · 2026-06-06T05:26:51 1780723611

When I first started working with LLMs in 2019, AI was in no way synonymous with LLMs. I personally realized pretty quickly that they’d eventually be able to write software that compiles. Not necessarily good software, but software that passes a minimum threshold.

Then again there were all sorts of hallucination-adjacent issues which are still present but rarer as models get bigger. Wondering about the consequences for software engineering as an industry was a little bit of an “overpopulation on Mars” problem since GPT2 could barely string a paragraph together.

Another factor is the industry’s continued insistence on evaluating the ability to write software using leetcode. Well, Claude is probably the best leetcoder in the world now, but since our industry never figured out better evaluation criteria for candidates of course we are backed into a corner.

janalsncm · 2026-06-05T10:14:25 1780654465

You should write your readmes by hand. You’ll learn a lot more that way, and it’ll help to ground the project.

spacebacon · 2026-06-05T11:09:44 1780657784

It’s not as if they were one shot. Five prior repos, two published pre-prints on SSRN, and thousands of hours back my research, which is right there for you to peer review and use freely.

janalsncm · 2026-06-05T10:12:25 1780654345

“Semiotic awareness” is not standard ML terminology. The dictionary definition of semiotic is simply “relating to symbols”, so it’s a bit grandiose to say you gave Qwen “awareness of symbols” when in reality it’s a marginal improvement, if it’s even true.

Also, to say that a philosopher who died 100 years ago inspired a new attention head is another instance of GPT going off its rocker. You don’t need MAH to contextualize “freedom” in a sentence. Attention already does that.

janalsncm · 2026-06-05T09:57:52 1780653472

Your method appears to be similar to LoRA but simply less expressive. Some kind of manipulation to layers 7, 14, and 21. Did you compare with other layers? This is obviously extremely specific to a particular backbone.

Also, your documents use a ton of nonstandard jargon which only serves to confuse laypeople and annoy anyone who is familiar with ML. Saying your change adds “semiotic awareness” is meaningless when your experiments claim only marginal improvements. Clearly the model had most of the capability before.

More generally, who is it for? People who have expertise in ML are not going to take it seriously. People who don’t?

spacebacon · 2026-06-05T10:54:44 1780656884

It is not LoRA. LoRA fine tunes capabilities into the model. SRT Adapter is a small overlay on a frozen model whose purpose is to make internal reasoning observable. It surfaces what the model is activating at moments of high divergence.

The layers 7, 14, and 21 were chosen after probing. They showed the strongest regime signals. We did compare other layers. The term semiotic awareness is just shorthand for detecting and modulating higher order meaning patterns. If the term is unhelpful I will drop it.

The capability gains are often marginal on standard benchmarks. The intended value is observability and steerability without retraining the backbone.

janalsncm · 2026-06-05T02:40:30 1780627230

It’s a data point. I could imagine in a hardware constrained setting we might not care about training on enormous token counts, and on smaller devices it’s great if we can simplify the architecture.

I agree that this isn’t proof that it scales to trillions of tokens, but this does show a scaled up experiment would be worth a shot.

Philpax · 2026-06-05T03:32:49 1780630369

The Chinchilla scaling laws give you a minimum for the number of tokens you should be using for a given size: if you can't meet what they suggest for that size, you should shrink the size, as, otherwise, the capacity of the model is going to waste.

I do agree that it is a datapoint, but GP's point is that this model was undertrained, so it's hard to draw the same conclusions from it that we would from other research.
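To put rough numbers on the Chinchilla point: a commonly quoted rule of thumb is about 20 training tokens per parameter. That ratio is an approximation used here for illustration, not an exact figure from the paper, and the function names are made up for this sketch.

```python
# Rough Chinchilla-style sizing, assuming ~20 training tokens per parameter.
# The ratio is a widely used approximation, not an exact constant.
TOKENS_PER_PARAM = 20


def suggested_tokens(n_params: int, tokens_per_param: int = TOKENS_PER_PARAM) -> int:
    """Approximate token count suggested for a model with n_params parameters."""
    return n_params * tokens_per_param


def suggested_params(n_tokens: int, tokens_per_param: int = TOKENS_PER_PARAM) -> int:
    """Approximate model size that a fixed token budget can support."""
    return n_tokens // tokens_per_param


if __name__ == "__main__":
    # A 1B-parameter model would want roughly 20B tokens under this rule.
    print(suggested_tokens(1_000_000_000))
    # With only 2B tokens available, the rule points to a ~100M-parameter model.
    print(suggested_params(2_000_000_000))
```

If the token budget falls well short of `suggested_tokens` for the chosen size, the rule says to shrink the model, which is the sense in which the model under discussion would count as undertrained.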

janalsncm · 2026-06-02T09:15:03 1780391703

A helpful middle ground I’ve found is to build out the architecture you want, but stub out the tedious function implementations you don’t want to do yourself.

And by stub out I mean write the function signature yourself, including parameters it’ll accept and return types. Add a comment if necessary about what it will do.
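A minimal sketch of what that can look like in Python. The `Record` type and all function names are hypothetical, chosen only to illustrate the pattern: the signatures, types, and docstrings are written by hand, and the bodies are left as stubs to fill in later.

```python
from dataclasses import dataclass


@dataclass
class Record:
    user_id: int
    email: str


def parse_record(line: str) -> Record:
    """Parse one 'user_id,email' line into a Record. Stub: body not written yet."""
    raise NotImplementedError


def dedupe_by_email(records: list[Record]) -> list[Record]:
    """Keep the first Record seen for each email, preserving order. Stub."""
    raise NotImplementedError


def load_users(lines: list[str]) -> list[Record]:
    # The overall flow is hand-written; only the stubs above remain to implement.
    return dedupe_by_email([parse_record(line) for line in lines])
```

Raising `NotImplementedError` keeps the module importable and type-checkable while making it obvious at runtime which pieces are still unimplemented.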

janalsncm · 2026-06-01T06:01:30 1780293690

Newspapers are struggling/dying. A counterexample is services like HBO/Netflix which have ad-free tiers.

janalsncm · 2026-05-30T02:03:52 1780106632

Really practical teaching approach. I clicked in to see how safetensors are loaded and just kept reading. Thanks for sharing.