In general, Japanese are not very comfortable in using English. Thus for safety and critical information, broadcasting in their native language would feel much more trustworthy, reassuring and connected than any other language.
Yeah, I remember the time when we had to use satellites to connect. The long delay was really annoying and so unusual that most people without "training" could not even use the phone for conversation and just wasted the dollars.
A former boss of mine took off to Everest for a month leaving me (a 22 year old, at the time) in charge of the office. I was out to dinner with my now wife when I got a call from a very long phone number I didn't recognize, so I ignored it. I then got another one right after, and picked it up. It was my boss, he needed me to log into his personal email to grab a phone number for the medical insurance he purchased for the trip, because he had been vomiting for days due to altitude sickness, and needed a medical evacuation.
That was the most stressfully hard to use phone call I've ever had. The delay was nearly 10 seconds, and eventually I just said I was only going to speak yes or no, if he needed a longer answer he needed to shut up. And that worked. We no longer talked over eachother.
Are you serious? Don't you know how many wars did China wage? It tried to assimilate Vietnam for 1000 years. The last large scale war against Vietnam was just 1979. In fact, China had started war with all its neighbors, with no exception.
The writer uses first person and writes as if they are at Anthropic. It is being re-posted and shared as such, rather than satire, and not having a Community note is a disservice to everyone.
The writer does not actually work at Anthropic, and never has.
So you blamed the people for not acting “cautiously enough” instead of the people who let things run wild without even a clue what these things will do?
No blame. For better or worse I just think this is going to be the reality of interacting online in the near future. I imagine in the future stories like this will be extremely common.
I could set up an OpenClaw right now to do some digging into you, try to identify you and your worse secrets, then ask it to write up a public hit piece. And you could be angry at me for doing this, but that isn't going to prevent it happening.
And to add to what I said, I suspect you'll want to be thinking about this anyway because in the future it's likely employers will use AI to research you and try to find out any compromising info being giving you a job (similar to how they might search your name in the past). It's going to be increasingly important that you literally never post content that can be linked back to you as an individual even if it feels innocent in isolation. Over time you will build up an attack surface which AI agents can exploit much easier than has ever been possible by a human looking you up on Google in the past.
I don’t think it’s “blame” it’s more like “precaution” like you would take to avoid other scams and data breach social engineering schemes that are out in the world.
This is the world we live in and we can’t individually change that very much. We have to watch out for a new threat: vindictive AI.
You’re splitting hairs, I’m not assigning sentience to the AI, I’m just describing actions.
The point is that scammers will set up AI systems to attack in this way. Scammers will instruct AI to see a person who is interacting rather than ignoring as a warm lead.
It's also potentially lethally stupid. What if an industrial robot arm decides to smash a €10000 expensive machine next door, or -heaven forbid- a human's skull. "It didn't really decide to do anything, stop anthropomorphising, let's blame the poor operator with his trembling fist on the e-stop."
Yeah, to heck with that. If you're one of those people (and you know who you are); you're overcompensating. We're going to need a root cause analysis, pull all the circuit diagrams, diagnose the code, cross check the interlocks, and fix the gorram actual problem. Policing language is not productive (and in the real life situation in the factory, please imagine I'm swearing and kicking things -scrap metal, not humans!- for real too) .
Just to be sure in this particular case with the Openclaw bot, the human basically pointed experimental level software at a human space and said "go". But I don't think they foresaw what happened next. They do have at least partial culpability here; but even that doesn't mean we get to just close our eyes, plug our ears, and refuse to analyze the safety implications of the system design an sich.
Shambaugh did a good job here. Even the Operator, however flawed, did a better job than just burning the evidence and running for the hills. Partial credit among the scorn to the latter.
(finally, note that there's probably 2.5 million of these systems out there now and counting, most -seemingly- operated by more responsible people. Let's hope)
> "It didn't really decide to do anything, stop anthropomorphising, let's blame the poor operator with his trembling fist on the e-stop."
It's not the operator that's to blame, it's whoever made the decision to have a skull-smashing machine who's only safety interlock is a poor operator with an e-stop. The world has gone insane, and personifying these AI systems is a way to shift blame from the decision makers to "Shit happens shrug". That's what we should be fighting back against
Seriously, that's not how you investigate incidents.
For one, there's no single executive who pushes a red button marked "Deploy The Skull-Splitter". Rather the opposite in fact, especially in eg german industry where people very much care and demand proper adherence to safety.
Assuming good faith; sometimes, the holes in the swiss cheese line up [1]
Advanced safety and reliability cultures don't look for people to blame [2] [3] . Your first goal is to look for the causes and you solve them. Very sometimes, someone does deserve blame (due to eg malice or gross negligence), in which case then you get to blame them.
Advanced safety and reliability cultures also don't choose technologies that are unpredictable and misunderstood. Nothing is safe or reliable about these systems.
Absolutely; if you're deploying experimental systems: do your homework and assess the risks, get consent of the human participants, and stay in constant communication. If the Openclaw's operator here had done that from the start, things would have gone a lot differently.
In fact, you can imagine that if we build up a just culture around deployment of semi-autonomous agents like this, the operator wouldn't have had to remain anonymous in the first place. Best practices help everyone.
goes against the grain here. Policing language is the one thing that our corporate overlords have gotten the right and the left to agree on. (Sure, they disagree on the details, but the first amendment is in graver danger now than it has been for a long time.)
> Given that humans have been ascribing intention to inanimate objects and systems since time immemorial, this outcome is preordained.
This is true, but there's a big difference between "My car decided not to start" and "The computer wrote a hit piece about me". In reality, both of these events came from the same amount of intention, but to lay-people, these are two very different things. Educating about those differences (and very intentionally not blurring the lines) can only be a good thing.
So I've been reading up on what the philosophers and scientists have been saying this past century or so on this very topic. I think the layman is wise to steer clear. It's a war out there.
The one thing I can tell you with certainty: If anyone is claiming certainty, they're hallucinating harder than the AI :-P (is also what I tell lay people).
Turns out, hilariously, Claude's much criticized "I don't know" is actually epistemically the most honest (tracing from Chalmers).
[ semi randomly: I'm especially frustrated at psychology papers at the moment. I can't find a good continuous measure for affect. Almost all the protocols use discrete buckets :-/ ]
We encourage people to be safe about plenty of things they aren't responsible for. For example, part of being a good driver is paying attention and driving defensively so that bad drivers don't crash into you / you don't make the crashes they cause worse by piling on.
That doesn't mean we're blaming good drivers for causing the car crash.
This piece is very sad and it resonates with me, especially after we talked about identity and definition of success. The loss of job (if any) is not as fatal as the loss of one’s social identity. Of course, there is always a way out, a way to see things in a positive light. But I believe right now it’s important to let it sink in, to realize what we have to shed on the way to tomorrow. Most (young) people don’t realize it yet.
The article said they called for triple junior hire but cut 1000 jobs a month later, “so the number of jobs stay roughly the same”.
Certainly they didn’t mean 1000 junior positions were cut. So what they really want to say is that they cut senior positions as a way of saving cost/make profit in the age of AI? Totally contrary to what other companies believe? Sounds quite insane to me!
I hope better and cheaper models will be widely available because competition is good for the business.
However, I'm more cautious about benchmark claims. MiniMax 2.1 is decent, but one can really not call it smart. The more critical issue is that MiniMax 2 and 2.1 have the strong tendency to reward hacking, often write nonsensical test report while the tests actually failed. And sometimes it changed the existing code base to make its new code "pass", when it actually should fix its own code instead.
Artificial Analysis put MiniMax 2.1 Coding index on 33, far behind frontier models and I feel it's about right. [1]
> And sometimes it changed the existing code base to make its new code "pass", when it actually should fix its own code instead.
I haven’t tried MiniMax, but GPT-5.2-Codex has this problem. Yesterday I watched it observe a Python type error (variable declared with explicit incorrect type — fix was trivial), and it added a cast. (“cast” is Python speak for “override typing for this expression”.) I told it to fix it for real and not use cast. So it started sprinkling Any around the program (“Any” is awful Python speak for “don’t even try to understand this value and don’t warn either”).
Even Claude opus 4.6 is pretty willing to start tearing apart my tests or special-case test values if it doesn't find a solution quickly (and in c++/rust land a good proportion of its "patience" seems to be taken up just getting things that compile)
I’ve found that GPT-5.2 is shockingly good at producing code that compiles, despite also being shockingly good at not even trying to compiling it and instead asking me whether I want it to compile the code.
That's what I found with some of these LLM models as well. For example I still like to test those models with algorithm problems, and sometimes when they can't actually solve the problem, they will start to hardcode the test cases into the algorithm itself.. Even DeepSeek was doing this at some point, and some of the most recent ones still do this.
I have asked GLM4.7 in opencode to make an application to basically filter a couple of spatial datasets hosted at a url I provided it, and instead of trying to download read the dataset, it just read the url, assumed what the datasets were (and got it wrong) is and it's shape (and got it wrong) and the fields (and got it wrong) and just built an application based on vibes that was completely unfixable.
It wrote an extensive test suite on just fake data and then said the app is perfectly working as all tests passed.
This is a model that was supposed to match sonnet 4.5 in benchmarks. I don't think sonnet would be that dumb.
I use LLMs a lot to code, but these chinese models don't match anthropic and openai in being able to decide stuff for themselves. They work well if you give them explicit instructions that leaves little for it to mess up, but we are slowly approaching where OpenAI and anthropic models will make the right decisions on their own
this aligns perfecly with my experience, but of course, the discourse on X and other forums are filled with people who are not hands on. Marketing is first out of the gate. These models are not yet good enough to be put through a long coding session. They are getting better though! GLM 4.7 and Kimi 2.5 are alright.
It really is infuriatingly dumb; like a junior who does not know English. Indeed, it often transitions into Chinese.
Just now it added some stuff to a file starting at L30 and I said "that one line L30 will do remove the rest", it interpreted 'the rest' as the file, and not what it added.
Sounds exactly what a junior-dev would do without proper guidance. Could better direction in the prompts help? I find I frequently have to tell it where to put what fixes. IME they make a lot of spaghetti (LLMs and juniors)
reply