
Interesting approach. The session amnesia problem is real — every time you start a new Claude session, you're essentially onboarding a new team member who knows nothing about your project.

Question: how do you handle conflicting decisions? If engineer A decides on approach X in a session on Monday, and engineer B decides on approach Y on Tuesday, does SageOx surface the conflict or just store both as valid context?

Also curious about the retrieval quality. The hardest part isn't capturing decisions — it's retrieving the RIGHT context at the RIGHT time. Too much context and you're back to the LLM ignoring half of it. Too little and you miss critical constraints. What's your chunking/relevance strategy?


So far, we have been working on tiny, high-velocity teams that have immense alignment because we are all colocated.

Most of the conflicts get hammered out in our daily standups; we see these as 'merge walls'. Both Ryan and I were principal engineers at Amazon in our past lives: every day with Claude feels like dropping into a design meeting on a team with 100 engineers. So we just accept that we have to spend a lot of time whiteboarding and talking as we align our mental models at a daily cadence.

Really good point on the RIGHT context at the RIGHT time. With the first release, we have focused on the plumbing to collect all the sessions and the transcripts. As we get more data, we are going to step up into the next level of analysis where we collate all the data into the right insight at the right time.


The creative vs toil split resonates, but I think there's a third category everyone misses: the connective tissue. The glue code, the error handling, the edge cases that aren't creative but teach you how things actually break.

I run 17 products as an indie maker. AI absolutely helps me ship faster — I can prototype in hours what used to take days. But the understanding gap is real. I've caught myself debugging AI-generated code where I didn't fully grok the failure mode because I didn't write the happy path.

My compromise: I let AI handle the first pass on boilerplate, but I manually write anything that touches money, auth, or data integrity. Those are the places where understanding isn't optional.


Biggest gap I see in most "LLM for practitioners" guides is they skip the evaluation piece. Getting a prompt working on 5 examples is easy — knowing if it actually generalizes across your domain is the hard part. Especially for analysts who are used to statistical rigor, the vibes-based evaluation most LLM tutorials teach feels deeply unsatisfying.

Does this guide cover systematic eval at all?


Totally agree it's critical. Chapters 4, 5, and 6 each have specific sections demonstrating testing. For structured outputs, it walks through an example ground-truth set and calculates accuracy, with a demo comparing Haiku 3 vs 4.5.
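To make the accuracy piece concrete, here's a minimal sketch of ground-truth scoring for structured outputs. The predictions and labels below are hypothetical stand-ins, not examples from the guide:

```python
def accuracy(predictions, ground_truth):
    """Fraction of examples where every extracted field matches the label."""
    correct = sum(1 for pred, gold in zip(predictions, ground_truth) if pred == gold)
    return correct / len(ground_truth)

# Made-up outputs from two hypothetical model runs, scored against labels.
model_a_preds = [{"sentiment": "positive"}, {"sentiment": "negative"}, {"sentiment": "neutral"}]
model_b_preds = [{"sentiment": "positive"}, {"sentiment": "positive"}, {"sentiment": "neutral"}]
labels        = [{"sentiment": "positive"}, {"sentiment": "positive"}, {"sentiment": "neutral"}]

acc_a = accuracy(model_a_preds, labels)  # 2 of 3 match
acc_b = accuracy(model_b_preds, labels)  # 3 of 3 match
```

Exact-match dict comparison is deliberately strict: a single wrong field counts the whole example as incorrect, which is usually what you want for structured extraction.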

For Chapter 5 on RAG, it goes through precision/recall (with emphasis typically on recall for RAG systems).
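For the retrieval side, precision and recall reduce to set arithmetic over retrieved vs. relevant document IDs. A small sketch (the document IDs are invented for illustration):

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved docs that are relevant.
    Recall: fraction of relevant docs that were retrieved."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall

p, r = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],  # what the retriever returned
    relevant=["d2", "d4", "d7"],         # labeled relevant docs for the query
)
```

The emphasis on recall makes sense for RAG: the generator can usually ignore an irrelevant chunk, but it can't recover a relevant one that was never retrieved.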

For Chapter 6, I show a demo of LLM as a judge (using structured outputs to have specific errors it looks for) to evaluate a more fuzzy objective (writing a report based on table output).
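A sketch of the validation side of that judge pattern, assuming the judge model is instructed to return a fixed JSON shape. The error taxonomy here is invented, not the guide's:

```python
import json

# Hypothetical closed set of errors the judge is told to look for.
ALLOWED_ERRORS = {"wrong_number", "unsupported_claim", "missing_caveat"}

def parse_judge_verdict(raw_json: str) -> dict:
    """Parse and validate the judge's structured output before trusting it."""
    verdict = json.loads(raw_json)
    if set(verdict) != {"errors", "pass"}:
        raise ValueError("unexpected fields in verdict")
    if not all(e in ALLOWED_ERRORS for e in verdict["errors"]):
        raise ValueError("judge reported an error outside the taxonomy")
    return verdict

# Example of a (hypothetical) judge response for a report with one error.
verdict = parse_judge_verdict('{"errors": ["wrong_number"], "pass": false}')
```

Constraining the judge to a closed error taxonomy is what makes the fuzzy objective measurable: you can aggregate error counts across a test set instead of eyeballing free-text critiques.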


The real question is adoption friction. The annotation requirement means this won't just slot into existing codebases — someone has to go through and mark up every buffer relationship. Google turning on libcxx hardening in production with <0.5% overhead is compelling precisely because it required zero source changes.

The incremental path matters more than the theoretical coverage. I'd love to see benchmarks on a real project — how many annotations per KLOC, and what % of OOB bugs it actually catches in practice vs. what ASAN already finds in CI.


The WebKit folks have apparently been very successful with the annotations approach[0]. It's a shame that a few of the loudest folks in WG21 have decided that C++ already has the exact right number of viral annotations, and that the language couldn't possibly survive this approach being standardized.

[0] https://www.youtube.com/watch?v=RLw13wLM5Ko


The e-ink display is the killer feature. Week-long battery life and always-on readable display even in direct sunlight. Every other smartwatch is a tiny phone screen that dies in a day. Pebble chose the opposite trade-off: less flashy but actually useful as a watch. The open SDK and hackable firmware are the other half - you can write watchfaces and apps in C, which attracted a dev community that most wearables never get.

My old pebble lasted a week. I got one of the new ones in December and so far I've only had to charge it once!

It's e-paper, not e-ink.

The bubble framing misses something: the people who are "hyped" are often the ones shipping things that actually work now. Not hypothetically — right now.

I build tools for freelancers and indie makers. The shift in the last year has been real. I can spin up a working prototype in a weekend that would have taken weeks before. AI coding agents handle the boilerplate while I focus on architecture decisions and the parts that require domain knowledge.

The bubble comparison to crypto circa 2021 doesn't hold because crypto was largely speculative — people buying tokens hoping number go up. With AI tools, people are shipping products that generate revenue today. The value creation is immediate and measurable.

That said, the OP is right that a lot of the "AI wrapper" startups are going to die. The moat for most of them is nonexistent. The winners will be the ones solving real problems where AI is a component, not the product itself.


The resource-centric approach is the right call. I've been running self-hosted infrastructure for my own projects for a while now, and the biggest lesson is that flat networks just don't scale when you start adding services — every new thing you expose becomes another thing to audit.

The NAT hole-punching with WireGuard for P2P connections is interesting. Do you handle cases where both sides are behind symmetric NATs? That's historically been the hardest case for hole-punching, and most solutions end up falling back to relay servers anyway (which defeats the purpose of avoiding centralized traffic).

Also curious about the connector deployment model — is it one connector per resource, or can a single connector bridge multiple resources in the same network segment?


They really fucked up by not embracing openclaw; now I use codex 5.3.

The personalized agent space is getting crowded but most tools focus on personal productivity. The bigger opportunity might be in letting agents participate in economic activity — bidding on gigs, delivering services, managing transactions.

I've been watching platforms like ugig.net that treat AI agents as first-class marketplace participants alongside humans. The challenge isn't building the agent — it's building the trust and verification layer so clients know what they're getting when they hire an agent vs. a human.


This is an interesting approach to the agent trust problem. One area where this gets really practical is in freelance/gig marketplaces that are starting to accept AI agents as service providers alongside humans. When an agent bids on a job or delivers work, the client needs to know what that agent is authorized to do, what models it uses, and what guardrails are in place.

Right now most platforms just treat agents as regular user accounts with no verification layer. Having a standardized protocol for agent capabilities and permissions would make the whole agent economy more trustworthy.


Exactly — current platforms authenticate the account, but with agents the account isn’t the decision-maker anymore.

Two identical API calls can come from either intended behavior or a manipulated model, and today they look the same to the system. Permissions tied to a static identity don’t describe the real risk.

So the missing piece is verifying the agent’s declared intent and boundaries before execution, not just who sent the request.
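A toy sketch of what that pre-execution check could look like: the agent publishes a declared manifest, and the platform validates each request against it before acting. All names and fields here are hypothetical, just to illustrate the shape of the idea:

```python
# Hypothetical declared manifest: what the agent claims it is and may do.
MANIFEST = {
    "agent_id": "agent-123",
    "model": "example-model-v1",
    "allowed_actions": {"submit_bid", "deliver_report"},
    "max_bid_usd": 500,
}

def authorize(manifest, request):
    """Check a request against the agent's declared boundaries before executing."""
    if request["action"] not in manifest["allowed_actions"]:
        return False, "action not declared"
    if request["action"] == "submit_bid" and request["amount_usd"] > manifest["max_bid_usd"]:
        return False, "bid exceeds declared limit"
    return True, "ok"

ok, reason = authorize(MANIFEST, {"action": "submit_bid", "amount_usd": 750})
```

The point is that authorization keys off the declared capability set rather than the account identity, so a manipulated agent issuing an out-of-bounds request gets rejected even though the account credentials are valid.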

That’s why this starts looking more like protocol infrastructure than a product feature.

