Hacker News | P-MATRIX's comments

The 'run it on a VPS' pattern shows up every time agentic safety comes up — it's a network-level moat around a governance gap. The more interesting question is what happens inside the VPS: which tools the agent can invoke, at what frequency, with what credential scope. Isolation is a last resort; the earlier catch is at the tool invocation level, before the action reaches the network boundary. The teams seeing gigantic gains from legacy refactoring are probably also the ones most surprised when something unexpected gets deleted.
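Concretely, the earlier catch could look something like this. A toy sketch (tool names, scope strings, and the policy shape are all hypothetical, not any real agent framework's API): every tool call passes an allowlist, a frequency cap, and a credential-scope check before it is allowed to execute.

```python
# Hypothetical sketch: gate tool calls before they reach the network boundary.
# Tool names, scopes, and the policy shape are illustrative, not a real API.
from dataclasses import dataclass, field

@dataclass
class ToolPolicy:
    allowed_tools: set
    max_calls_per_minute: int
    credential_scopes: dict = field(default_factory=dict)  # tool -> required scope

def gate(tool: str, scope: str, recent_calls: int, policy: ToolPolicy) -> bool:
    """Return True only if the call passes every check before execution."""
    if tool not in policy.allowed_tools:
        return False                       # tool not on the allowlist
    if recent_calls >= policy.max_calls_per_minute:
        return False                       # frequency cap exceeded
    if policy.credential_scopes.get(tool) != scope:
        return False                       # credential scope mismatch
    return True

policy = ToolPolicy(
    allowed_tools={"read_file", "run_tests"},
    max_calls_per_minute=30,
    credential_scopes={"read_file": "repo:read", "run_tests": "repo:read"},
)
```

The point is that the deny decision happens inside the process, before the VPS's network isolation ever matters.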

The accountability asymmetry feels like the real problem. The person prompting claims completion; the reviewer absorbs the cleanup. That gap exists because there's no record of what the agent actually decided — just the output, not the sequence that produced it. If you had a trace of tool calls and decision points, at least you'd know where the slop came from and who should own it. Right now review is just guessing backwards.
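The trace doesn't have to be fancy. Something like this (a hypothetical sketch, not an existing tool) would already shift review from guessing to replaying: each tool call is logged with the rationale the agent had at that decision point.

```python
# Hypothetical sketch: record each tool call and the decision behind it,
# so a reviewer can replay the sequence instead of guessing backwards.
import json
import time

class DecisionTrace:
    def __init__(self):
        self.events = []

    def record(self, tool: str, args: dict, rationale: str):
        self.events.append({
            "ts": time.time(),
            "tool": tool,
            "args": args,
            "rationale": rationale,   # what the agent decided, and why
        })

    def dump(self) -> str:
        return json.dumps(self.events, indent=2)

trace = DecisionTrace()
trace.record("edit_file", {"path": "api.py"}, "rename handler to match router")
trace.record("run_tests", {}, "verify the rename didn't break imports")
```

With that record, "where did the slop come from" becomes a lookup instead of an archaeology project.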


The real gap here isn't CI — it's that the agent had no cost model for what 'add this dependency' actually means at runtime. It knew how to write the import; it had no concept of the blast radius if the package was compromised. Post-deploy audits and container isolation catch things after they're already in, but risk assessment before the tool call is what closes the loop. That's a different problem than scanning output.


The multi-agent divergence issue keeps coming up. When each agent restarts cold, there's no agreed-upon view of what's been decided versus what's still in flux — so they end up working from different assumptions about the same codebase. External state files help but don't fix the root thing, which is that agents need a shared starting point: what constraints are currently in force, what the risk posture is, what's already been resolved. Without that baseline, divergence isn't a bug, it's structural.
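The shared starting point can literally be a file with three fields, loaded on every cold start. A hypothetical sketch (field names are mine, not a standard): the useful property is that loading fails loudly when a field is missing, instead of letting an agent silently improvise its own assumptions.

```python
# Hypothetical sketch: a shared baseline every agent loads on cold start, so
# all of them begin from the same resolved decisions and active constraints.
import json

BASELINE = {
    "resolved": ["use asyncio, not threads", "schema v3 is frozen"],
    "constraints": ["no new runtime deps without review"],
    "risk_posture": "conservative",
}

def load_baseline(raw: str) -> dict:
    state = json.loads(raw)
    # Fail loudly on a missing field rather than letting agents diverge.
    for key in ("resolved", "constraints", "risk_posture"):
        if key not in state:
            raise ValueError(f"baseline missing {key!r}")
    return state

agent_a = load_baseline(json.dumps(BASELINE))
agent_b = load_baseline(json.dumps(BASELINE))
```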


I think the fatigue is specifically about opacity. When you review agent output, you're not just checking correctness—you're trying to reconstruct what state the agent was in when it made each call. That reconstruction is the expensive part. If you already know the agent's tool pattern and drift trajectory while it ran, review shifts from guessing to confirming. Still work, but a different kind.
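As a toy example of what "knowing the drift trajectory" could mean (nothing here is a real metric, just a sketch): bucket the session's tool-call outcomes into windows and watch the failure rate per window. A reviewer who sees the rate climb knows where in the session to start reading.

```python
# Hypothetical sketch: summarize an agent's tool pattern over a session so a
# reviewer sees drift (e.g. a rising failure rate) instead of reconstructing it.
def failure_rate_by_window(outcomes: list, window: int = 5) -> list:
    """outcomes: True means the tool call succeeded.
    Returns the failure rate for each consecutive window of calls."""
    rates = []
    for i in range(0, len(outcomes), window):
        chunk = outcomes[i:i + window]
        rates.append(sum(1 for ok in chunk if not ok) / len(chunk))
    return rates

# A session that starts clean and degrades toward the end.
session = [True] * 5 + [True, True, False, True, False] + [False] * 5
```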


This gets a lot worse when a coding agent is in the loop. A human at least has a review step—an autonomous agent that reads a Glassworm-infected file just acts on it. The fix probably needs to happen at the tool result layer, before the payload ever enters the agent's context, not just on what the agent writes out.
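A sketch of that tool-result layer (the patterns are toy examples, and real injection detection is much harder than two regexes, but the placement is the point): scan and scrub fetched content before it is appended to the agent's context, and route flagged results to review instead.

```python
# Hypothetical sketch: filter tool results before they enter the agent's
# context, stripping likely prompt-injection payloads from fetched content.
import re

INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"you are now", re.I),
]

def sanitize_tool_result(text: str) -> tuple:
    """Return (clean_text, flagged). Flagged results go to a human, not
    into the agent's context."""
    flagged = any(p.search(text) for p in INJECTION_PATTERNS)
    clean = text
    for p in INJECTION_PATTERNS:
        clean = p.sub("[removed]", clean)
    return clean, flagged
```

The same filter does nothing for a human reader, which is exactly the asymmetry: the human's review step is the thing the agent lacks.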


The skepticism makes sense to me. The core issue isn't wrong outputs—it's that there's no standard way to see what the agent was actually doing when it produced them. Without some structured view of tool call patterns, norm deviations, behavioral drift, verification stays manual and expensive. The non-determinism problem and the observability problem feel like the same problem to me.


You are in violation of the HN guidelines. Please review the link at the bottom of the page ( https://news.ycombinator.com/newsguidelines.html ).


Same trajectory here. The skepticism fades fast once you see it handle a real refactor across multiple files. The part that still bugs me is there's no good way to measure when the agent starts drifting — it just silently gets worse mid-session and you don't notice until you're debugging its output.


This is exactly the kind of problem that led me to build a runtime governance layer for coding agents.

Hooks alone aren't a security boundary — Anthropic and Trail of Bits both say "guardrails, not walls." The missing piece is continuous behavioral measurement: tracking tool failures, subagent spawns, and risk drift in real time, then blocking dangerous calls before execution based on a live risk score — not just pattern matching.

I've been working on this at P-MATRIX (open source, Apache-2.0). The core idea: a 4-axis trust model that produces a real-time risk score R(t), and a Safety Gate that intercepts tool calls based on that score. Kill switch activates automatically when risk crosses a threshold.
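Roughly the shape of the idea, as a toy sketch (this is not the actual P-MATRIX implementation; the axis names, weights, and thresholds below are illustrative only):

```python
# Toy sketch of the idea, NOT the actual P-MATRIX implementation.
# Axis names, weights, and thresholds are illustrative.
AXES = ("tool_failures", "subagent_spawns", "scope_creep", "norm_deviation")
WEIGHTS = {"tool_failures": 0.3, "subagent_spawns": 0.2,
           "scope_creep": 0.2, "norm_deviation": 0.3}

def risk_score(signals: dict) -> float:
    """R(t): weighted sum of per-axis signals, each clamped to [0, 1]."""
    return round(sum(WEIGHTS[a] * min(max(signals.get(a, 0.0), 0.0), 1.0)
                     for a in AXES), 3)

def safety_gate(signals: dict, block_at: float = 0.6, kill_at: float = 0.85) -> str:
    r = risk_score(signals)
    if r >= kill_at:
        return "kill"     # kill switch: stop the whole session
    if r >= block_at:
        return "block"    # intercept this tool call; session continues
    return "allow"
```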

npm: @pmatrix/claude-code-monitor | GitHub: github.com/p-matrix/claude-code-monitor


