they're tools, you don't ascribe trust to them. you trust or distrust the user of the tool. It's like say you trust your terminal emulator. And from my experience, they will ask for permission over a directory before running. I would love to know how people are having this happen to them. If you tell it it can make changes to a directory, you've given it every right to destroy anything in that directory. I haven't heard of people claiming it exceeded those boundaries and started messing with things it wasn't permitted to mess with to begin with.
OK, but we learned decades ago about putting safety guards on dangerous machinery, as part of the machinery. Sure, you can run LLMs in a sandbox, but that's a separate step, rather than part of the machinery.
What we need is for the LLM to do the sandboxing... if we could trust it to always do it.
Again, the trust is for the human/self. it's auto-complete, it hallucinates and commits errors, that's the nature of the tool. It's for the tools users to put approprite safeguards around it. Fire burns you, but if you contain it, it can do amazing things. It isn't the fire being untrustworthy for failing to contain itself and start burning your cloth when you expose your arm to it. You're expecting a dumb tool to be smart and know better. I suspect that is because of the "AI" marketing term and the whole supposition that it is some sort of pseudo-intelligence. it's just auto-complete. When you have it run code in an environment, it could auto-complete 'rm -rf /'.
> Fire burns you, but if you contain it, it can do amazing things. It isn't the fire being untrustworthy for failing to contain itself and start burning your cloth when you expose your arm to it.
True. But I expect my furnace to be trustworthy to not burn my house down. I expect my circular saw to come with a blade guard. I expect my chainsaw to come with an auto-stop.
But you are correct that in the AI area, that's not the kind of tool we have today. We have dangerous tools, non-OSHA-approved tools, tools that will hurt you if you aren't very careful with them. There's been all this development in making AI more powerful, and not nearly enough in ergonomics (for want of a better word).
We need tools that actually work the way the users expect. We don't have that. (And, as you say, marketing is a big part of the problem. People might expect closer to what the tool actually does, if marketing didn't try so hard to present it as something it is not.)
OpenAI is implying that code may no longer be human readable in some circumstances.
> The resulting code does not always match human stylistic preferences, and that’s okay. As long as the output is correct, maintainable, and legible *to future agent runs*, it meets the bar.
> If you look at the code, you’ll notice it has a strong “translated from C++” vibe. That’s because it is translated from C++. The top priority for this first pass is compatibility with our C++ pipeline. The Rust code intentionally mimics things like the C++ register allocation patterns so that the two compilers produce identical bytecode. Correctness is a close second. We know the result isn’t idiomatic Rust, and there’s a lot that can be simplified once we’re comfortable retiring the C++ pipeline.
Does this still get you most of the memory-safety benefits of using Rust vs C++?
Agreed. The quality of life bar will be higher for sure. But it will still technically be a "subsistence" lifestyle, with no prospect of improvement. Perhaps that will suffice for most people? We're going to find out.
reply