Hacker News | Majromax's comments

> 1) Anthropic doesn't want their subsidised plans being used outside of CC, which would imply that the money they're making off it isn't enough, a

Claude Code use-cases also differ somewhat from general API use, where the former is engineered for high cache utilization. We know from overall API costs (both Anthropic and OpenRouter) that cached inputs cost an order of magnitude less than uncached inputs, but OpenCode/pi/OpenClaw don't necessarily have the same kind of aggressive cache-use optimizations.

Vertically integrated stacks might also be able to have a first layer of globally shared KV cache for the system prompts, if the preamble is not user specific and changes rarely.
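The economics are easy to sketch. Assuming illustrative per-million-token prices (with cached input roughly an order of magnitude cheaper than uncached, as the public price sheets suggest; these are placeholder numbers, not Anthropic's actual rates), the blended input cost falls steeply with cache hit rate:

```python
# Back-of-envelope: blended input cost vs. cache hit rate.
# Prices are illustrative placeholders, not real Anthropic rates.
UNCACHED = 3.00  # $ per million uncached input tokens (assumed)
CACHED = 0.30    # $ per million cached input tokens (assumed ~10x cheaper)

def blended_cost_per_mtok(cache_hit_rate: float) -> float:
    """Average $ per million input tokens at a given cache hit rate."""
    return cache_hit_rate * CACHED + (1 - cache_hit_rate) * UNCACHED

for rate in (0.0, 0.5, 0.9):
    print(f"hit rate {rate:.0%}: ${blended_cost_per_mtok(rate):.2f}/Mtok")
```

A client that achieves a 90% cache hit rate pays a small fraction of what a naive client pays for the same tokens, which is why a harness without aggressive cache-use optimization can look much more expensive to serve.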

> 2) last time I checked, API spending is capped at $5000 a month

Per https://platform.claude.com/docs/en/api/rate-limits, that seems to only be true for general credit-funded accounts. If you contact Anthropic's sales team and set up monthly invoicing, there's evidently no fixed spending limit.


As far as the timezone file is concerned, it's two changes but one shift. This is covered more fully in the complete news blurb rather than the snippet shown at the top. Today, British Columbia moved from Pacific Standard Time (UTC-8) to Pacific Daylight Time (UTC-7); tomorrow the timezone is renamed to Pacific Time.

Unfortunately, the "PT" abbreviation is too short for the timezone database, so while they decide on another form they will temporarily use a bare -7 offset.
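The rename changes only the label, not the offset. The offset change is the ordinary spring-forward shift, visible through Python's zoneinfo module (which reads the tz database); here bracketing the real 2024-03-10 transition as an illustration:

```python
# Sketch: observing the PST -> PDT offset shift for America/Vancouver
# in the tz database, around the 2024 spring-forward (2024-03-10).
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

van = ZoneInfo("America/Vancouver")
before = datetime(2024, 3, 9, 12, 0, tzinfo=van)   # still PST, UTC-8
after = datetime(2024, 3, 11, 12, 0, tzinfo=van)   # now PDT, UTC-7

print(before.tzname(), before.utcoffset().total_seconds() / 3600)
print(after.tzname(), after.utcoffset().total_seconds() / 3600)
```

A label-only change (PDT becoming a bare numeric abbreviation) would leave `utcoffset()` untouched and alter only `tzname()`.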


Ah, I see. They're reclassifying it as standard time without changing the offset. Yeah, it was my misunderstanding.

> “Changing the equation” by boldly breaking the law.

Is it? I think the law is truly undeveloped when it comes to language models and their output.

As a purely human example, suppose I once long ago read through the source code of GCC. Does this mean that every compiler I write henceforth must be GPL-licensed, even if the code looks nothing like GCC code?

There's obviously some sliding scale. If I happen to commit lines that exactly replicate GCC then the presumption will be that I copied the work, even if the copying was unconscious. On the other hand, if I've learned from GCC and code with that knowledge, then there's no copyright-attaching copy going on.

We could analogize this to LLMs: instructions to copy a work would certainly be a copy, but an ostensibly independent replication would be a copy only if the work product had significant similarities to the original beyond the minimum necessary for function.

However, this is intuitively uncomfortable. Mechanical translation of a training corpus to model weights doesn't really feel like "learning," and an LLM can't even pinky-promise to not copy. It might still be the most reasonable legal outcome nonetheless.


Laws don't have to treat humans and machines equally. They can be "unfairly" biased for humans.

People have needs like "freedom of artistic expression" that we don't need to grant to machines.

Machines can operate at speeds and scales way beyond human abilities, so they can potentially create much more damage.

We can ban air pollution from machines without making it illegal to fart.


Non-sequitur. It can be both.

En-dashes, set off with spaces, are an acceptable substitute for unspaced em-dashes in some style guides. See for example this Canadian government guide: https://nos-langues.canada.ca/en/writing-tips-plus/en-dash.

The use seems to be more common in British than in American English.


Prompting all the way down? Have the AI create tests that document existing, known-good behaviours, then refactor while ensuring those tests pass.

That doesn’t work because tests for look and feel are difficult at best, and nearly impossible when the code wasn’t designed for it. It’s a chicken-and-egg problem: you need to refactor to be able to test things reasonably.

It’s not an impossible problem to solve. I could probably set up a test harness that uses the existing game as an oracle, checking that the same sequence of inputs produces the same outputs. But by the time I’d done all that, got it to clean up the code, and then diagnosed and fixed the issues, I doubt I would have saved much time at all, if any.
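The oracle idea is the classic characterization ("golden master") test. A minimal sketch, with entirely hypothetical names (`legacy_step` stands in for whatever the existing game's update function actually is):

```python
# Characterization-test sketch: treat the existing implementation as an
# oracle, record its outputs for a fixed input sequence, then require a
# refactored version to reproduce them exactly. All names hypothetical.

def legacy_step(state: int, inp: int) -> int:
    """Stand-in for the existing game's state-update function."""
    return (state * 31 + inp) % 1000

def record_trace(step, initial_state, inputs):
    """Run a sequence of inputs, recording every intermediate state."""
    state, trace = initial_state, []
    for inp in inputs:
        state = step(state, inp)
        trace.append(state)
    return trace

inputs = [3, 1, 4, 1, 5, 9]
golden = record_trace(legacy_step, 0, inputs)  # recorded once, kept fixed

def refactored_step(state: int, inp: int) -> int:
    """Refactored version; must match the oracle on every input."""
    return (state * 31 + inp) % 1000

assert record_trace(refactored_step, 0, inputs) == golden
```

The hard part, as the comment notes, isn't the harness itself but making the legacy system's inputs and outputs observable enough to record in the first place.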


For all of the recent talk about how Anthropic relies on heavy cache optimization for claude-code, it certainly seems like session-specific information (the exact datestamp, the pid-specific temporary directory for memory storage) enters awfully early in the system prompt.

> Have we not learned anything about technical debt and how it bites back hard?

I think LLMs are changing the nature of technical debt in weird ways, with trends that are hard to predict.

I've found LLMs surprisingly useful in 'research mode', taking an old and badly-documented codebase and answering questions like "where does this variable come from, and what are its ultimate consumers?" Its answers won't be as natural as a true expert's, but its answers are nonetheless useful. Poor documentation is a classic example of technical debt, and LLMs make it easier to manage.

They're also useful at making quick-and-dirty code more robust. I'm as guilty as anyone else of writing personal-use bash scripts that make all kinds of unjustified assumptions and accrete features haphazardly, but even in "chat mode" LLMs are capable of reasonable rewrites for these small problems.

More systematically, we also see now-routine examples of LLMs being useful at code de-obfuscation and even decompilation. These forward processes maximize technical debt compared to the original systems, yet LLMs can still extract meaning.

Of course, we're not now immune to technical debt. Vibe coding will have its own hard-to-manage technical debt, but I'm not quite sure that we have the contours well defined. Anecdotally, LLMs seem to have their biggest problems in the design space, missing the forest of architecture for the trees of implementation, such that they don't make the conceptual cuts between units in the best places. I would not be so confident as to call this problem inherent or structural rather than transitory.


> taking an old and badly-documented codebase and answering questions like "where does this variable come from, and what are its ultimate consumers?"

Why do you even need an LLM for this? Code is formal notation; it’s not magic. Unless the code is obfuscated, even bad code is pretty clear about what it’s doing and how various symbols are created and used. What is not clear is “why”, and the answer is often a business or a technical decision.


Once you are dealing with a legacy codebase older than you are, with very few comments and confusing documentation, you'd understand that an LLM is a godsend for untangling the mess and troubleshooting issues.


> Why do you even need an LLM for this?

Once you get above a few hundred thousand lines of legacy undocumented code having a good LLM to help dig through it is really useful.


None of what you describe is free.

After the LLM helps untangle the mess, if you leave the mess in place, you will have to ask the LLM to untangle it for you every time you need to make a change.

Better to work with the LLM to untangle the technical debt then and there and commit the changes, so neither you nor the LLM have to work so hard in the future.

I’ve even seen anecdotal evidence that code that’s easier for humans to work with is easier for LLMs to work with as well.


Thinking happens in latent space, but the thinking trace is then the projection of that thinking onto tokens. Since autoregressive generation involves sampling a specific token and continuing the process, that sampling step is lossy.

However, it is a genuine question whether the literal meanings of thinking blocks matter more than their less-observable latent meanings. The ultimate latent state attributable to the last-generated thinking token is some combination of the actual token (literal meaning) and the recurrent thinking thus far. The latter does have some value; a 2024 paper (https://arxiv.org/abs/2404.15758) noted that simply adding dots to the output allowed some models to perform more latent computation, resulting in higher-skill answers. However, since this is not a routine practice today, I suspect that genuine "thinking" steps have higher value.

Ultimately, your thesis can be tested. Take the output of a reasoning model inclusive of thinking tokens, then re-generate answers with:

1. Different but semantically similar thinking steps (e.g. synonyms, summarization). That will test whether the model is encoding detailed information inside token latent space.

2. Meaningless thinking steps (dots or word salad), testing whether the model is performing detailed but latent computation, effectively ignoring the semantic content of the thinking trace.

3. A semantically meaningful distraction (e.g. a thinking trace from a different question)

Look for where performance drops off the most. If between 0 (control) and 1, then the thinking step is really just a trace of some latent magic spell, so it's not meaningful. If between 1 and 2, then thinking traces serve a role approximately like a human's verbalized train of thought. If between 2 and 3 then the role is mixed, leading back to the 'magic spell' theory but without the 'verbal' component being important.
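The prompt-construction side of that ablation is easy to script; the model call itself is elided here. A sketch where `dots` replaces each whitespace-separated token to roughly preserve trace length (token boundaries and function names are my own simplifications, not from any particular eval harness):

```python
# Sketch of the ablation conditions for the thinking-trace experiment.
# Whitespace tokenization is a crude stand-in for real tokenization.
import random

def ablate_trace(trace: str, mode: str, distractor: str = "") -> str:
    """Build an ablated thinking trace for re-generation.

    mode: 'control'    - trace unchanged (condition 0)
          'dots'       - meaningless filler of similar length (condition 2)
          'shuffle'    - word salad: same tokens, order destroyed (condition 2)
          'distractor' - meaningful but irrelevant trace (condition 3)
    """
    tokens = trace.split()
    if mode == "control":
        return trace
    if mode == "dots":
        return " ".join("." for _ in tokens)
    if mode == "shuffle":
        shuffled = tokens[:]
        random.Random(0).shuffle(shuffled)  # seeded for reproducibility
        return " ".join(shuffled)
    if mode == "distractor":
        return distractor
    raise ValueError(f"unknown mode: {mode}")

trace = "First compute the base case then induct on n"
print(ablate_trace(trace, "dots"))
```

Condition 1 (semantically similar paraphrase) needs another model to produce, so it can't be a pure string transform like the others.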


> I really hate that the anthropomorphizing of these systems has successfully taken hold in people's brains. Asking it why it did something is completely useless because you aren't interrogating a person with a memory or a rationale, you’re querying a statistical model that is spitting out a justification for a past state it no longer occupies.

"Thinking meat! You're asking me to believe in thinking meat!"

While next-token prediction based on matrix math is certainly a literal, mechanistic truth, it is not a useful framing in the same sense that "synapses fire causing people to do things" is not a useful framing for human behaviour.

The "theory of mind" for LLMs sounds a bit silly, but taken in moderation it's also a genuine scientific framework in the sense of the scientific method. It allows one to form hypotheses, run experiments that can potentially disprove them, and ultimately make skillful counterfactual predictions.

> By asking it why it did something wrong, it'll treat that as the ground truth and all future generation will have that snippet in it, nudging the output in such a way that the wrong thing itself will influence it to keep doing the wrong thing more and more.

In my limited experience, this is not the right use of introspection. Instead, the idea is to interrogate the model's chain of reasoning to understand the origins of a mistake (the 'theory of mind'), then adjust agents.md / documentation so that the mistake is avoided for future sessions, which start from an otherwise blank slate.

I do agree, however, that the 'theory of mind' is very close to the more blatantly incorrect kind of misapprehension about LLMs, that since they sound humanlike they have long-term memory like humans. This is why LLM apologies are a useless sycophancy trap.


> Several times in my high school shop class kids shorted out 9V batteries trying to build circuits because they didn't understand how electronics work. At no point did our teacher stop them from doing so

Yes, and that's okay because the classroom is a learning environment. However, LLMs don't learn; a model that releases the magic smoke in this session will be happy to release it all over again next time.

> LLMs are just surfacing the fact that assessing and managing risk is an acquired, difficult-to-learn skill.

Which makes the problem worse, not better. If risk management is a difficult skill, then that means we can't extrapolate from 'easy' demonstrations of said skill to argue that an LLM is generally safe for more sensitive tasks.

Overall, it seems like LLMs have a long tail of failures. Even while their mean or median performance is good, they seem exponentially more likely than a similarly competent human to advise something like `rm -rf /`. This is a deeply unintuitive behaviour, precisely because our 'human-like' intuition is engaged with respect to the average/median skill.

