I consider the $10/mo to be an incredible value ... but only because of the unlimited 4.1 usage that can be shared with other compatible extensions (Roo Code and Cline support it) through the VS Code LM API.
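For anyone curious what that looks like under the hood, here's a minimal TypeScript sketch of the publicly documented `vscode.lm` API that extensions like Roo Code build on. The `'gpt-4.1'` family string is an assumption on my part; which families actually show up depends on your Copilot plan:

```typescript
import * as vscode from 'vscode';

// Minimal sketch: request a Copilot-provided chat model through the VS Code LM API.
async function askCopilot(prompt: string): Promise<string> {
  // The family string is an assumption; inspect what selectChatModels() returns on your plan.
  const [model] = await vscode.lm.selectChatModels({ vendor: 'copilot', family: 'gpt-4.1' });
  if (!model) {
    throw new Error('No Copilot model available (is Copilot installed and signed in?)');
  }
  const messages = [vscode.LanguageModelChatMessage.User(prompt)];
  // The first request from a given extension triggers VS Code's native consent prompt.
  const response = await model.sendRequest(messages, {}, new vscode.CancellationTokenSource().token);
  let text = '';
  for await (const chunk of response.text) {
    text += chunk;
  }
  return text;
}
```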
Unlike some other workarounds, this is a fully supported workflow and doesn't break Copilot's terms of service with reasonable personal usage. (As far as I understand, at least. Copilot has full visibility into which tools are making chat requests through it, so nothing is disguising itself as or impersonating Copilot. When you first set it up there's a native VS Code approval prompt to grant the tool access to Copilot, and the LM API is publicly documented.)
But anything unlimited in the LLM space feels like it's on borrowed time, especially with 3rd party tool support, so I wouldn't be surprised if they impose stricter quotas for the LM API in the future or drop the unlimited usage entirely.
It's my workhorse model with Roo Code given the cost - or lack thereof. I was about to cancel Copilot after they massively cut the premium limits, until they swapped out 4o for 4.1 as the base model. 4.1 is just decent enough for simple, uncreative tasks and is pretty reliable at tool use (especially compared to 4o), so I've had a lot of success with it.
For any problem that involves a lot of reasoning or problem solving, I use "architect" mode first with Gemini 2.5 Pro or Claude Sonnet 3.7/4 to break it into discrete subtasks that 4.1 can follow pretty successfully. This approach is very cost effective, since Gemini can do a lot of high-level planning quickly and cheaply.
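To make the handoff concrete, here's a hedged sketch of what the planning pass is doing conceptually, assuming the stronger model is also reachable through the LM API. In practice Roo Code's architect mode manages this for you, and the family string here is hypothetical:

```typescript
import * as vscode from 'vscode';

// Planning pass: a stronger model decomposes the task into numbered subtasks.
// The family string is hypothetical; availability depends on your Copilot plan.
async function draftPlan(task: string): Promise<string> {
  const [planner] = await vscode.lm.selectChatModels({ vendor: 'copilot', family: 'claude-sonnet-4' });
  const messages = [
    vscode.LanguageModelChatMessage.User(
      'Break this task into small, numbered subtasks that a weaker model can implement one at a time:\n' + task
    ),
  ];
  const response = await planner.sendRequest(messages, {}, new vscode.CancellationTokenSource().token);
  let plan = '';
  for await (const chunk of response.text) {
    plan += chunk;
  }
  return plan; // e.g. save as PLAN.md so it doubles as a recovery checkpoint
}
```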
I'm sure a lot of the experience depends on how 4.1 is being used. I've fine-tuned my custom Roo Code configuration to work around its limits without a lot of sacrifices; using it out of the box with Copilot is asking a lot more from a weaker model on its own.
Sounds like you’ve figured out a good workflow for yourself. When you switch back and forth between models like that, do they know about all the previous interactions and context? (They must, right?)
Yes, either by passing the full context of the current task/conversation to the new model so it continues working from that point, or through the intermediary step of a plan document generated by Gemini (or whichever larger model), which is then handed back to 4.1 to implement.
The latter is a commonly recommended strategy in general for any large task, even with more powerful models, to keep context manageable and to allow recovering easily if on step 9/10 the LLM loses it and starts mangling all the previous work it did. That way you can pick back up from the last good checkpoint or commit instead of starting all over.
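A rough sketch of why the plan document doubles as a checkpoint, assuming the plan has already been parsed into an array of steps (hypothetical helper, not Roo Code's actual internals):

```typescript
import * as vscode from 'vscode';

// Implementation pass: feed 4.1 one subtask at a time. If step 9 goes sideways,
// revert to the last good commit and re-run with startAt pointing at that step.
async function runPlan(steps: string[], startAt = 0): Promise<void> {
  const [worker] = await vscode.lm.selectChatModels({ vendor: 'copilot', family: 'gpt-4.1' });
  for (let i = startAt; i < steps.length; i++) {
    const messages = [
      vscode.LanguageModelChatMessage.User('Implement exactly this subtask and nothing else:\n' + steps[i]),
    ];
    const response = await worker.sendRequest(messages, {}, new vscode.CancellationTokenSource().token);
    let output = '';
    for await (const chunk of response.text) {
      output += chunk;
    }
    // Review/apply the output, then commit so step i becomes the new checkpoint.
    console.log('step ' + (i + 1) + '/' + steps.length + ' done');
  }
}
```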