
cjonas · 2026-06-09T04:34:30 1780979670

These use cases will just be built as "open source" (openclawd) or even custom one-off applications in the future. I've been building apps to run the tedious parts of my life recently. Meal planning, personal finance, bills, tax organization... Why would I pay for services that will be enshittified when I can build an app that does exactly what I want in an afternoon? Yes, the code is shit and it wouldn't scale... But it doesn't need to.

whywhywhywhy · 2026-06-09T09:06:39 1780995999

> Why would I pay for services that will be enshittified when I can build an app that does exactly what I want in an afternoon

Because the problem now took a whole afternoon to solve and sapped your creative energy instead.

losteric · 2026-06-09T04:43:02 1780980182

> Why would I pay for services that will be enshittified when I can build an app that does exactly what I want in an afternoon.

When we talk about “the market”, the customer base, remember it’s a market that typically doesn’t know how to or care to even install an adblocker.

cjonas · 2026-06-09T04:55:33 1780980933

I don't see any mention of "the market" anywhere in this thread. I'm just talking about the ability for a motivated user to solve real problems with these tools. Right now these solutions are available to software developers, but over time they will become approachable to more users.

cjonas · 2026-06-04T02:34:58 1780540498

> The system, which began operating in 2016, was designed to run for at least 25 years

It's likely that a majority of the cost to collect the data has already been paid for...

ufocia · 2026-06-04T02:39:25 1780540765

Evidence?

Rebelgecko · 2026-06-04T03:02:23 1780542143

The system cost like $350 million to build, and $40m/year to operate+maintain. Sending ships to remove 900+ pieces of hardware under 2 miles of ocean won't be cheap either.

cjonas · 2026-06-03T15:15:48 1780499748

I made the switch from Premiere to Resolve a few years ago and it feels like such a breath of fresh air. Being able to do the same with Lightroom would be amazing, so I can't wait to check this out. I've been using the free version and honestly never needed the pro features, but I think I'll make the one-time purchase today just to support a non-subscription product of this caliber.

cjonas · 2026-05-27T16:46:12 1779900372

yea except one is a "dark pattern" to exploit customers for corporate profit while the other is to benefit society.

cjonas · 2026-05-19T12:55:41 1779195341

The problem is really more getting the agent to reliably relay a UUID. For example, we were creating files for visualizations and having the agent reference them in their response with a custom <visualization file=UUID /> and found that it would often fail to accurately return a UUID from a tool response it was previously provided (running Sonnet 4.6).

For this use case, our solution was just to use a slug for the filename, but we can control the uniqueness constraint on our backend.
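
A minimal sketch of the slug idea, assuming a hypothetical in-memory `SlugRegistry` standing in for the backend that enforces uniqueness. The class name, slug format, and counter suffix are illustrative assumptions, not what was actually built. The point is that the agent only ever has to repeat a short, readable string while the UUID stays on the backend.

```python
import re
import uuid


class SlugRegistry:
    """Hypothetical stand-in for a backend that enforces slug uniqueness."""

    def __init__(self):
        self._by_slug = {}

    def register(self, title: str) -> str:
        # Build a short, human-readable slug from the title.
        base = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "file"
        slug, n = base, 2
        # Append a counter until the slug is unique.
        while slug in self._by_slug:
            slug = f"{base}-{n}"
            n += 1
        # The UUID stays internal; the agent only ever sees the slug.
        self._by_slug[slug] = uuid.uuid4()
        return slug

    def resolve(self, slug: str):
        # Returns None when the agent echoes a slug that was never issued.
        return self._by_slug.get(slug)
```

A slug that fails to resolve is also easy to detect and report back to the agent, which is harder to do with a UUID that is off by one character.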

mrweasel · 2026-05-19T13:03:53 1779195833

Except that we don't yet know what we would need in all cases; this seems like something that should be provided by the environment.

It feels much like the random number generators in your operating system. The OS is responsible for providing applications with a source of entropy. In the same line of thinking maybe IDEs, agent frameworks, whatever you want to call it, should be responsible for providing some base functionality.

cjonas · 2026-05-19T14:27:47 1779200867

Not sure I understand. If you generate a random string to use as a reference for something that the LLM interacts with... and the LLM cannot reliably recall the reference, then it's a problem that needs to be solved by simplifying the random string.

mrweasel · 2026-05-19T19:01:01 1779217261

This might be my understanding that's wrong, but I assumed that the LLM itself can't actually produce something like a UUID, but it can "predict" one, which is why it sometimes hallucinates IDs. So my thinking was: strip that bit out of the AI prompt and output and leave it to the "wrapper", e.g. Claude Code or your IDE, to insert the actual ID.

So in the same way that your crypto library doesn't have its own randomness generator (ideally) and relies on the operating system to provide an API, the agents would rely on their "operating shell" / IDE / application to provide functionality that lies outside the scope of an LLM.
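
A rough sketch of what that could look like, assuming the model is instructed to emit a placeholder instead of an ID and the wrapper fills it in afterwards. The `{{new_id:label}}` syntax and the `fill_ids` helper are invented for illustration; no existing agent framework is claimed to work this way.

```python
import re
import uuid

# Hypothetical placeholder syntax the model is asked to emit instead of an ID.
PLACEHOLDER = re.compile(r"\{\{new_id:([a-z0-9_-]+)\}\}")


def fill_ids(model_output: str, id_factory=uuid.uuid4):
    """Replace each {{new_id:label}} placeholder with an ID from the wrapper."""
    issued = {}

    def substitute(match):
        label = match.group(1)
        # Reuse the same ID when the model repeats a label.
        if label not in issued:
            issued[label] = str(id_factory())
        return issued[label]

    return PLACEHOLDER.sub(substitute, model_output), issued
```

The model only has to repeat a short label consistently; the random part comes from the environment, much like an application asking the OS for entropy.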

cjonas · 2026-05-16T15:50:19 1778946619

Coding agents don't really need memory. Agent skills, rules, git history, and documentation are all far more efficient, transparent and easier to manage. These memory frameworks only really make sense if you are building a consumer-facing agent with managed context and limited capabilities.

wren6991 · 2026-05-16T16:10:32 1778947832

There's an antipattern where everyone wants to invent new interfaces to connect things to LLMs when CLI tools are already right there, transparent, and usable by humans as well as LLMs. I think it's partly the origins in web chat applications.

Beads kind of does "LLM memory over CLI", or there is https://github.com/wedow/ticket which is a minimal and sane implementation of the same idea.

cjonas · 2026-05-02T23:54:55 1777766095

Claude Code not supporting an alternate location to look for agent skills is another example.

cjonas · 2026-04-23T16:50:13 1776963013

It would be wild if they dropped below the "two 9's" metric. I think they would need an additional ~16hr of outage in the 90 day rolling period.

waiwai933 · 2026-04-23T16:51:35 1776963095

https://mrshu.github.io/github-statuses/ suggests that their combined uptime doesn't even meet 1 nine, let alone 2.

CamouflagedKiwi · 2026-04-23T17:44:16 1776966256

The intersection of uptime across every possible service they offer isn't a particularly great metric. I get the point that they are doing badly, but it makes it look worse than I think it really is.

What I would like to see is a combined uptime for "code services", basically Git+Webhooks+API+Issues+PRs, which corresponds to a set of user workflows that really should be their bread & butter, without highlighting things you might not care about (Codespaces, Copilot).

dijit · 2026-04-23T18:59:15 1776970755

Depends how integrated those features are.

A service's availability is capped by its critical dependencies; this is textbook SRE stuff (see Treynor et al., The Calculus of Service Availability). Copilot may well be on the side of it (and has the worst uptime, dragging everything down), but if Actions depends on Packages then Actions can be "up" while in reality the service is not functional. If your release pipeline depends on Webhooks, then you're unable to release.

The obvious one is git operations: if you don't have git ops then basically everything is down.

So; you're right about Copilot, but the subset you proposed (Git+Webhooks+API+Issues+PRs) has the exact same intersection problem. If git is at one nine, that entire subset is capped at one nine too, no matter how green the rest of it looks.

And to be clear: git operations is sitting at 98.98% on the reconstructed dashboard linked above[1]. That is one nine. Github stopped publishing aggregate numbers on their own status page, which.. tells you something.

[1]: https://mrshu.github.io/github-statuses/
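
To make the capping argument concrete, here is a rough sketch. It assumes a workflow needs every service in a set and treats failures as independent (a simplification; real outages often overlap), so availabilities multiply and the result can never exceed the weakest member. The function names are illustrative, and in the usage note only the 98.98% git figure comes from the dashboard above.

```python
import math


def serial_availability(availabilities):
    """Availability of a workflow that needs every listed service up,
    assuming independent failures (overlapping outages would raise this)."""
    return math.prod(availabilities)


def count_nines(availability):
    """Number of leading nines, e.g. 0.9898 -> 1, 0.9931 -> 2."""
    if availability >= 1:
        return math.inf
    # Small epsilon guards against float error at exact boundaries like 0.99.
    return math.floor(-math.log10(1 - availability) + 1e-9)
```

For example, `serial_availability([0.9898, 0.999, 0.999])` stays below 0.9898 and `count_nines` of the result is 1: two hypothetical three-nines services cannot lift a subset that includes git at 98.98%.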

CamouflagedKiwi · 2026-04-23T20:12:21 1776975141

Well yes you could do that on a status page, but it's basically just lying to put Actions as green if it's actually down because it depends on Packages which is red.

With that set, I wasn't proposing a set of totally independent services to be grouped together, I was talking about a set of things that I think represent pretty core services for Github users. If Git is dragging the rest of those down, fine; PRs are useless without it. In fact it is worse than some but it's not the worst of that group, and it is still a lot better than the dregs of Actions and Copilot.

Having said that, the numbers are of course terrible, two nines on a couple of things and one on everything else would be bad for a startup, it's an utter embarrassment for a company that's been doing this over a decade.

cjonas · 2026-04-23T17:07:42 1776964062

Also, I had never considered that breaking your up-time into a bunch of different components is just a strategy to make your SRE look better than it actually is. The combined up-time tells the real story (88%!). Thanks for the link.

femiagbabiaka · 2026-04-23T17:18:05 1776964685

The number of nines assigned to a suite of services is not indicative of the quality of SRE at any given company, but rather a reflection of the tradeoffs a business has decided to make. Guaranteed there's a dashboard somewhere at Github looking at platform stickiness vs. reliability and deciding how hard to let teams push on various initiatives.

cjonas · 2026-04-24T02:51:58 1776999118

This is fair. I should have just said "Site Reliability", as it's almost certainly out of the engineers' control.

cjonas · 2026-04-23T16:57:13 1776963433

Ya, I was just doing the math on their chart for the git operations. I added up 14.93 combined hours of downtime, which puts them WAY lower than the reported 99.7 metric they show right next to it.

So based on their own reporting, the uptime number should be 99.31%. Which means only like 6 additional hours and they'd fall below 99.0%.
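
The arithmetic can be checked with a short sketch, assuming the chart covers a 90-day rolling window (2160 hours). The function names are just for illustration.

```python
WINDOW_HOURS = 90 * 24  # 2160 hours in a 90-day window (assumed window length)


def uptime_pct(downtime_hours, window_hours=WINDOW_HOURS):
    """Uptime percentage for a given number of outage hours in the window."""
    return 100 * (1 - downtime_hours / window_hours)


def hours_until_below(target_pct, downtime_hours, window_hours=WINDOW_HOURS):
    """Additional outage hours before uptime drops to target_pct."""
    budget = window_hours * (1 - target_pct / 100)
    return budget - downtime_hours
```

With the 14.93 hours added up from the chart, `uptime_pct(14.93)` comes out at roughly 99.31, and `hours_until_below(99.0, 14.93)` leaves about 6.67 hours before the number drops under two nines.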

roblh · 2026-04-23T21:58:36 1776981516

GitHub is going for “eight 8’s” at this rate.

cjonas · 2026-04-19T12:25:07 1776601507

What's your actual tech experience?

Most enterprises that need consultants are using Salesforce, SAP, Hubspot, Dynamics, etc. If a company has an engineering department to build and run internal software, they very rarely need a consultant. And if they don't, they are very unlikely to hire a consultant to build it custom. They'd want "out of the box" because they think (often incorrectly these days) it will be easier to maintain.

cjonas · 2026-04-12T14:14:48 1776003288

Ya, I've had this experience more than a few times recently. I've heard people claiming they are serving quantized models during high loads, but it happens in Cursor as well, so I don't think it's specific to Anthropic's subscription. It could be that the context window has just gotten into a state that confuses the model... But that wouldn't explain why it appears to be temporary...

My best guess is this is the result of the companies running "experiments" to test changes. Or it's just all in my head :)

whywhywhywhy · 2026-04-12T14:20:30 1776003630

Cursor one is back to Claude 4 or 3.5+ at best. Struggles to do things it did effortlessly a few weeks ago.

It’s not under load either, it’s just fully downgraded. Feels more like they’re dialing in what they can get away with, but they are pushing it very far.

cjonas · 2026-04-12T23:00:56 1776034856

These days Cursor feels more capable and reliable than Claude Code (at least for my workflow). For personal projects, I'm using Cursor during planning and verification but run Claude Code for just implementation to save $.