Hacker News | AlexCoventry's comments

I'm not blaming you, but it's scary how many people are running these agents as if they were trusted entities.


They're tools; you don't ascribe trust to them. You trust or distrust the user of the tool. It's like saying you trust your terminal emulator. And in my experience, they ask for permission over a directory before running. I would love to know how people are having this happen to them. If you tell it that it can make changes to a directory, you've given it every right to destroy anything in that directory. I haven't heard of people claiming it exceeded those boundaries and started messing with things it wasn't permitted to mess with to begin with.

That would be --dangerously-skip-permissions for Claude Code, and --dangerously-bypass-approvals-and-sandbox for Codex.

A.k.a. YOLO mode. And yes, people (me) are stupid enough to actually use that.


It's a people problem, then. Not blaming here; I'm just saying it isn't the tool that's untrustworthy. I too get burned badly when I play with fire.

OK, but we learned decades ago about putting safety guards on dangerous machinery, as part of the machinery. Sure, you can run LLMs in a sandbox, but that's a separate step, rather than part of the machinery.

What we need is for the LLM to do the sandboxing... if we could trust it to always do it.
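For what it's worth, the directory-scoping part doesn't have to live in the LLM at all; the harness can enforce it. A minimal sketch (hypothetical helper name, lexical normalization only; a real guard would also canonicalize symlinks before checking):

```rust
use std::path::{Component, Path, PathBuf};

// Hypothetical harness-side check: does `target` stay inside `root`?
// Lexically resolves "." and ".." components; a production guard
// would also resolve symlinks via fs::canonicalize.
fn is_within_root(root: &Path, target: &Path) -> bool {
    let mut normalized = PathBuf::new();
    for comp in target.components() {
        match comp {
            Component::ParentDir => {
                normalized.pop(); // ".." climbs out of the last segment
            }
            Component::CurDir => {} // "." is a no-op
            other => normalized.push(other),
        }
    }
    normalized.starts_with(root)
}

fn main() {
    let root = Path::new("/home/user/project");
    // In-tree paths pass; ".." escapes and paths elsewhere are rejected.
    assert!(is_within_root(root, Path::new("/home/user/project/src/main.rs")));
    assert!(!is_within_root(root, Path::new("/home/user/project/../.ssh/id_rsa")));
    assert!(!is_within_root(root, Path::new("/etc/passwd")));
    println!("guard ok");
}
```

The point being: this is deterministic machinery around the model, not a behavior we have to trust the model to exhibit.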


Again, the trust is for the human/self. It's auto-complete; it hallucinates and commits errors. That's the nature of the tool. It's up to the tool's users to put appropriate safeguards around it. Fire burns you, but if you contain it, it can do amazing things. It isn't the fire being untrustworthy for failing to contain itself and burning your clothes when you expose your arm to it. You're expecting a dumb tool to be smart and know better. I suspect that's because of the "AI" marketing term and the whole supposition that it is some sort of pseudo-intelligence. It's just auto-complete. When you have it run code in an environment, it could auto-complete 'rm -rf /'.

> Fire burns you, but if you contain it, it can do amazing things. It isn't the fire being untrustworthy for failing to contain itself and burning your clothes when you expose your arm to it.

True. But I expect my furnace to be trustworthy to not burn my house down. I expect my circular saw to come with a blade guard. I expect my chainsaw to come with an auto-stop.

But you are correct that in the AI area, that's not the kind of tool we have today. We have dangerous tools, non-OSHA-approved tools, tools that will hurt you if you aren't very careful with them. There's been all this development in making AI more powerful, and not nearly enough in ergonomics (for want of a better word).

We need tools that actually work the way the users expect. We don't have that. (And, as you say, marketing is a big part of the problem. People might expect closer to what the tool actually does, if marketing didn't try so hard to present it as something it is not.)


Not sure whether you're being sarcastic, either.

https://en.wikipedia.org/wiki/Business_Plot


OpenAI is implying that code may no longer be human readable in some circumstances.

> The resulting code does not always match human stylistic preferences, and that’s okay. As long as the output is correct, maintainable, and legible *to future agent runs*, it meets the bar.

https://openai.com/index/harness-engineering/


> If you look at the code, you’ll notice it has a strong “translated from C++” vibe. That’s because it is translated from C++. The top priority for this first pass is compatibility with our C++ pipeline. The Rust code intentionally mimics things like the C++ register allocation patterns so that the two compilers produce identical bytecode. Correctness is a close second. We know the result isn’t idiomatic Rust, and there’s a lot that can be simplified once we’re comfortable retiring the C++ pipeline.

Does this still get you most of the memory-safety benefits of using Rust vs C++?


I think this largely depends on how much unsafe Rust they produced.
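A toy illustration of why that's the deciding factor (not from their codebase): safe Rust keeps its guarantees even in clunky, C++-flavored code, but every `unsafe` block hands responsibility back to the programmer, so the memory-safety benefit scales with how little of the translation needed raw pointers:

```rust
fn main() {
    let v = vec![1u8, 2, 3];

    // Safe Rust: out-of-range access is caught, even in unidiomatic,
    // machine-translated code.
    assert_eq!(v.get(10), None); // bounds-checked, returns Option
    // v[10] would panic rather than read out of bounds.

    // Unsafe Rust: the compiler trusts the programmer again, so a
    // translation that leans on raw pointers keeps C++-style risks.
    let second = unsafe { *v.as_ptr().add(1) }; // no bounds check
    assert_eq!(second, 2);
    // unsafe { *v.as_ptr().add(10) } would be undefined behavior.

    println!("ok");
}
```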

He didn't even have to be the one buying them. Lots of people benefit from a tool like OpenClaw getting popular.

Are there any with a credible approach to security, privacy and prompt injections?

Does any credible approach to prompt injection even exist?

Anyone who figures out a reliable solution would probably never have to work again.

Not that I'm aware of, but I probably won't be interested in these kinds of assistants until there are.

He's also a great booster of Codex. Says he greatly prefers it to Claude. So his role might turn out to be evangelism.

Yup, he's highly delusional if he actually thinks Sam cares about him and the project. It's all about optics.

Who purported that Sam cares about him?

Why would he care if Sam cares about him?


Someone clearly hasn't watched the podcast. Do your research before posting.

These are comments on the posted article.

If you want to bring other sources into the conversation, you could link to them, or at least reference them by name up front, right?


Listen to him on a podcast? He said he liked that Zuckerberg was more personal with him and that Sam was colder.

He’s not “highly delusional”

He literally said he doesn't give a fuck about money and "I will get the fuck out of there [OpenAI] if I don't like it" [source: Friedman interview].

This guy is very smart, and very persistent. I really don’t get all the negativity about the acquisition and especially not about him.


There are actually books which recommend that organizations track employee tokens burned as a proxy for AI adoption. Surprised me a bit.

It's the only KPI available.

Humans don't have much capacity for systematic tree search. It's sort of amazing that humans can do as well as they can, given that limitation.

FWIW, you'd probably be able to buy a lot of goods and services for $7/day, if robots were doing literally all the work.


Agreed. The quality of life bar will be higher for sure. But it will still technically be a "subsistence" lifestyle, with no prospect of improvement. Perhaps that will suffice for most people? We're going to find out.


> if robots were doing literally all the work

Let me know when ChatGPT can do your laundry.


Give it five years.
