> You're interacting with an LLM, so correctness is already out the window.
With all due respect, if you are prompting correctly and following approaches such as TDD and extensive testing, then correctness is not out the window. That is a misunderstanding likely caused by older versions of these models.
Correctness can be as complete as in any other new code. I've used AI to port algorithms from Python to Rust, which I've then tested against math oracles and published examples. Not only can I check my code mathematically, but in several instances I've found and fixed subtle bugs upstream, even in well-reviewed code that has been around for many years and is widely used. It is simply a tool.
All the code I work on now has an MCP interface so that the LLM can debug more easily. I'd argue it is as important as the UI these days. The amount of time it has saved me is unreal. It might be worth investing a very small amount of your time in it to see if it is a good fit. Even a poor protocol can provide useful functionality.
- Do you work in a team context of 10+ engineers?
- Do you all use different agent harnesses?
- Do you need to support the same behavior in ephemeral runtimes (GH Agents in Actions)?
- Do you need to share common "canonical" docs across multiple repos?
- Is it your objective to ensure a higher baseline of quality and output across the eng org?
- Would your workload benefit from telemetry and visibility into tool activation?
If none of those apply, then it's not for you. Server hosted MCP over streamable HTTP benefits orgs and teams and has virtually no benefit for individuals.
What I want to know is: what's the difference between a remote MCP and an API with an openapi.json endpoint for self-discovery? It's just as centralized.
It's instructive to skim the top level of the MCP spec to get a sense. But you can also scroll to the end of the post, look at the three .gifs there, and see why MCP: it also defines interaction models with the clients, exposing MCP prompts as `/` (slash) commands and MCP resources as `@` (at) references, among other things.
You are right: MCP tools are in essence OpenAPI specs with some niceties like standardized progress reporting. But MCP is more than tools.
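To make the "OpenAPI with some niceties" comparison concrete, here is a rough sketch of the two shapes side by side. The tool name, path, and fields beyond the spec's `name`/`description`/`inputSchema` are illustrative, not taken from any real server:

```python
import json

# Roughly what an MCP server returns from tools/list: each tool is a name,
# a description, and a JSON Schema for its arguments -- structurally very
# close to an OpenAPI operation with a parameter/requestBody schema.
mcp_tool = {
    "name": "lookup_issue",  # illustrative tool name
    "description": "Fetch an issue by ID from the tracker.",
    "inputSchema": {
        "type": "object",
        "properties": {"id": {"type": "integer"}},
        "required": ["id"],
    },
}

# The same idea expressed as an (abbreviated) OpenAPI path item.
openapi_op = {
    "/issues/{id}": {
        "get": {
            "summary": "Fetch an issue by ID from the tracker.",
            "parameters": [
                {"name": "id", "in": "path", "schema": {"type": "integer"}}
            ],
        }
    }
}

print(json.dumps(mcp_tool, indent=2))
```

The tool side carries the same schema information; what MCP layers on top is the client interaction model (progress, prompts, resources) rather than a richer description of the call itself.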
I've managed to ignore MCP servers for a long time as well, but recently I found myself creating one to help the LLM agents with my local language (Papiamentu) in the dialect I want.
I made a Prolog program that knows the valid words and spelling along with sentence composition rules.
Via the MCP server a translated text can be verified. If it's not faultless, the agent enters a feedback loop until it is.
The nice thing is that it's implemented once and I can use it in opencode and Claude without having to explain how to run the Prolog program, etc.
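The verify-and-retry loop described above can be sketched in a few lines. This is a toy stand-in, not the actual setup: the word set stands in for the Prolog lexicon, and `agent_retry` stands in for asking the LLM to fix the flagged words.

```python
# Minimal sketch of the feedback loop: verify, feed faults back, repeat
# until faultless or a round limit is hit. All names here are illustrative.
VALID_WORDS = {"bon", "dia", "danki"}  # stand-in for the Prolog lexicon

def verify(text: str) -> list[str]:
    """Return a list of faults; empty means the text passed."""
    return [w for w in text.split() if w not in VALID_WORDS]

def agent_retry(text: str, faults: list[str]) -> str:
    """Stand-in for asking the agent to correct the flagged words."""
    fixes = {"dankie": "danki"}  # canned correction for the demo
    return " ".join(fixes.get(w, w) for w in text.split())

def translate_with_feedback(draft: str, max_rounds: int = 3) -> str:
    text = draft
    for _ in range(max_rounds):
        faults = verify(text)
        if not faults:
            return text  # faultless: accept
        text = agent_retry(text, faults)
    raise RuntimeError(f"still faulty after {max_rounds} rounds: {faults}")

print(translate_with_feedback("bon dia dankie"))  # -> bon dia danki
```

The round limit matters in practice: without it, an agent that can't satisfy the checker will loop forever.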
I can't go into specifics about exactly what I'm doing but I can speak generically:
I have been working on a system using a Fjall datastore in Rust. I haven't found any tools that directly integrate with Fjall so even getting insight into what data is there, being able to remove it etc is hard so I have used https://github.com/modelcontextprotocol/rust-sdk to create a thin CRUD MCP. The AI can use this to create fixtures, check if things are working how they should or debug things e.g. if a query is returning incorrect results and I tell the AI it can quickly check to see if it is a datastore issue or a query layer issue.
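The "thin CRUD MCP" idea is just a handful of tools that proxy get/put/delete/scan into the datastore. Here is a sketch with a dict standing in for Fjall; the tool names and the dispatch function are illustrative, not the rust-sdk API:

```python
# A dict stands in for the Fjall keyspace; each branch below corresponds to
# one MCP tool the agent can call to create fixtures or inspect state.
store: dict[str, str] = {}

def handle_tool(name: str, args: dict) -> object:
    if name == "put":
        store[args["key"]] = args["value"]
        return "ok"
    if name == "get":
        return store.get(args["key"])
    if name == "delete":
        return store.pop(args["key"], None) is not None
    if name == "scan":
        return sorted(k for k in store if k.startswith(args.get("prefix", "")))
    raise ValueError(f"unknown tool: {name}")

# The agent can now set up fixtures and check them directly:
handle_tool("put", {"key": "user:1", "value": "alice"})
handle_tool("put", {"key": "user:2", "value": "bob"})
print(handle_tool("scan", {"prefix": "user:"}))  # -> ['user:1', 'user:2']
```

With something like this exposed, "is it a datastore issue or a query-layer issue?" becomes one tool call instead of a debugging session.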
Another example is I have a simulator that lets me create test entities and exercise my system. The AI with an MCP server is very good at exercising the platform this way. It also lets me interact with it using plain English even when the API surface isn't directly designed for human use: "Create a scenario that lets us exercise the bug we think we have just fixed and prove it is fixed, create other scenarios you think might trigger other bugs or prove our fix is only partial"
One more example is I have an Overmind-style task runner that reads a file, starts up every service in a microservice architecture, can restart them, can see their log output, can check if they can communicate with the other services, etc. Not dissimilar to how the AI can use Docker, but without Docker, to get max performance during both compilation and usage.
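The skeleton of such a runner is small: parse a Procfile-like spec, start each service, and keep the handles around so the agent can restart one or read its logs. This sketch uses toy one-shot commands so it actually runs; a real runner would keep processes alive and stream their logs.

```python
import shlex
import subprocess

# Illustrative Procfile-style spec; real entries would be long-running services.
PROCFILE = """\
api: python3 -c "print('api up')"
worker: python3 -c "print('worker up')"
"""

def parse_procfile(text: str) -> dict[str, list[str]]:
    """Map service name -> argv, one 'name: command' entry per line."""
    procs = {}
    for line in text.splitlines():
        name, _, cmd = line.partition(":")
        if cmd.strip():
            procs[name.strip()] = shlex.split(cmd.strip())
    return procs

def start_all(procs: dict[str, list[str]]) -> dict[str, subprocess.Popen]:
    """Launch every service, capturing stdout so logs can be inspected."""
    return {name: subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
            for name, cmd in procs.items()}

handles = start_all(parse_procfile(PROCFILE))
for name, proc in handles.items():
    out, _ = proc.communicate()  # a real runner would stream logs instead
    print(name, out.strip())
```

Exposing `start_all`, a `restart(name)`, and a `logs(name)` as MCP tools is what lets the agent drive the whole stack without Docker in the loop.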
Last example is using off the shelf MCP for VCS servers like Github or Gitlab. It can look at issues, update descriptions, comment, code review. This is very useful for your own projects but even more useful for other peoples: "Use the MCP tool to see if anyone else is encountering similar bugs to what we just encountered"
It's very similar to the switch from a text editor + command line to having an IDE with a debugger.
the AI gets to do two things:
- expose hidden state
- interact with the app, and see before/after/errors
It gives the LLM more time to verify its own work without you needing to step in. It's also a bit more integration-test-y than unit-test-y.
If you were to add one MCP, make it Playwright or a similar browser-automation MCP. Very little else adds as much value as just being able to control a browser.
That's also one of the things that worries me the most. What kind of data is being sent to these random endpoints? What if they go rogue or change their behavior?
MCP is generally a static set of tools, where auth is handled by deterministic code and not exposed to the agent.
The agent sees tools as allowed or not by the harness/your MCP config.
For the most part, the same company that you're connecting to is providing the MCP, so your data isn't going to random places, but you can also just write your own. It's a fairly thin wrapper: a bit of code to call the remote service, and a bit of documentation on when/what/why to do so.
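A sketch of that separation, with illustrative tool names and an assumed env var: the harness filters which tools the agent can see, and the wrapper attaches the credential itself, so the agent only ever passes high-level arguments.

```python
import os

# From your MCP config: the harness only advertises these tools to the agent.
ALLOWED_TOOLS = {"search_issues", "read_file"}  # illustrative names

def visible_tools(all_tools: list[str]) -> list[str]:
    """What the agent is shown: the allowlisted subset."""
    return [t for t in all_tools if t in ALLOWED_TOOLS]

def call_remote(tool: str, args: dict) -> dict:
    """Deterministic wrapper: enforces the allowlist and holds the secret."""
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not allowed: {tool}")
    token = os.environ.get("TRACKER_TOKEN", "")  # never shown to the agent
    # ...real code would attach the token and call the remote service...
    return {"tool": tool, "args": args, "authed": bool(token)}

print(visible_tools(["search_issues", "delete_repo", "read_file"]))
```

The point is that the dangerous parts (credentials, destructive operations) live in ordinary code you wrote and can audit, not in anything the model improvises.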
I've just been discovering this pattern too. It's made a huge difference. Trying to get Claude to remote control an app for testing via the various other means was miserable and unreliable.
I got it to build an MCP server into the app that supported sending commands to allow Claude to interact with it as if it was a user, including keypresses and grabbing screenshots, and the difference was immediate and really beneficial.
Visual issues were previously one of the things it would tend to struggle with.
I assume that this is dependent on app, and it's quite possible that your approach is best in some cases.
In my case I started with something somewhat like Playwright, and Claude had a habit of interacting with the app more directly than a user would be able to, and so not spotting problems because of it. Forcing it to interact by pressing keys rather than delving into the DOM or executing random JavaScript helped. In particular I wanted to be able to chat with it as it tried things interactively. This is more to help with manual or exploratory testing than classic automated testing.
My current app is a desktop app, so playwright isn't as applicable.
The M4 Max lacks the UltraFusion interconnect, making an M4 Ultra impossible. We might however see an M5 Ultra due to the new Fusion Architecture in the M5 Pro and M5 Max chips (just announced for the latest MacBook Pro), which uses a high-bandwidth die-to-die interconnect to bond two dies into a single unified SoC—similar in concept to UltraFusion but evolved for better scaling, efficiency, and features like per-GPU-core Neural Accelerators.
Reports and leaks strongly indicate Apple is preparing an M5 Ultra (likely fusing or scaling from the M5 Max using this advanced interconnect tech) for a Mac Studio refresh later in 2026, based on Bloomberg/Mark Gurman and other sources. This would bring back the top-tier "Ultra" option after skipping it entirely for M4.
I suspect that the cost/benefit isn't there. Those who need the "biggest Ultra" will be happy with the previous generation or so, and so they'll refresh that on a 2 or 3 year cycle.
Given that generation gains are not sufficient to make a Max twice as fast as the previous-gen Ultra, a longer cycle is rational. The M3 Ultra is still the fastest M-series system.
I've seen the Stocks app take up 2GB of RAM before. Even Control Centre can be a RAM hog. If Apple were still slinging efficient software, 8GB would be one thing, but their Catalyst-based crapware is far from efficient.
> I've seen the Stocks app take up 2GB of RAM before. Even Control Centre can be a RAM hog. If Apple were still slinging efficient software, 8GB would be one thing, but their Catalyst-based crapware is far from efficient.
Guessing based on your comments about 8GB of RAM that you have a lot more RAM than that. You should be aware that when you have a lot of unused RAM, many programs will cache data in RAM, and the OS won't really "clean up" paged memory, since there's very little memory pressure. In modern OS architecture, "free RAM is wasted RAM."
If you have 32GB of RAM for example, macOS will allow processes to keep decorative assets, pre-fetched data, and UI buffers in memory indefinitely because there’s no reason to flush them. This makes the system feel snappier. The metric that actually matters isn't "Used RAM," but Memory Pressure. A system can have 0GB of "Free" memory but still be performing perfectly because the OS is ready to reallocate that cached data the millisecond another app needs it.
Judging efficiency based on usage in a low-pressure environment is like complaining that a gas tank is "inefficient" just because it’s full.
That's good info, thanks, and you're right, I didn't take that into account. I do however think that while 8GB may be basically usable now, I'd like to see students able to use these machines for a decent length of time and to become digital creators with them. I get that it won't edit video or do 3D modelling the way a MacBook Pro can, but it needs to do enough to get students interested.
It was just an example of a simple app, built by Apple themselves, being a RAM hog: 375MB just for Control Centre on a fresh open (15.7), but like I said, I have seen it higher recently on multiple occasions. That's before we talk about their seemingly endless and inefficient background tasks. mds_stores, anyone?
Hopefully the presence of a laptop like this will be beneficial to software quality. They should make their developers use it one day a week.
Is there any chance of a stable release that fixes the memory leak issue? I know I could run nightly but for something I spend all day every day using I'd much rather run a stable version.
StackOverflow was also full of knowledgeable but objectionable people. I'm very glad not to have that energy in my life any more. Those that hate LLMs are welcome to continue using StackOverflow but I shan't be.
> Fun that you had to caveat it with some hand wavy homework bull.
Not really. If AI is just copying someone else's code, it's not really designing it, is it? If you want it to truly design something, it needs to work under the same constraints that the human engineers faced, which means it doesn't get the luxury of copying from others; it has to design things like device drivers with the same level of information that human engineers had (e.g. device specifications and information gathered through trial and error).
Are you suggesting that a human being writes an OS in a vacuum, without seeing any other OS or looking into how it is built? That feels a little facetious, no?
> Are you suggesting that a human being writes an OS in a vacuum, without seeing any other OS or looking into how it is built? That feels a little facetious, no?
No, I'm suggesting in order for it to be a fair test, you need to impose the same restrictions that a human engineer would face.
For example, consider the work done by the Nouveau team in building a set of open source GPU drivers for NVIDIA GPUs. When they started out the specs were not so widely available. They could look at how GPU drivers were developed for other GPUs, but that is not going to be a substitute for exploratory work. Let's see how well AI does at that exploratory work. I think you'll find it's a lot harder than common uses for AI today.