For me it is something I can describe in a single casual prompt.
For example I wrote a fully working version of https://tools.nicklothian.com/llm_comparator.html in a single prompt. I refined it and added features with more prompts, but it worked from the start.
Good question. No strict line, and it's always going to be subjective and a little bit silly to categorize, but when I'm debating this argument I'm thinking: a product that does not exist today (obviously many parts of even a novel product will be completely derivative, and that's fine), with multiple views, controllers, and models, and a non-trivial amount of domain-specific business logic. Likely 50k+ lines of code, but obviously that's very hand-wavy and not how I'd differentiate.
Think: SaaS application that solves some domain specific problem in corporate accounting, versus "in-browser speadsheet", or "first-person shooter video game with AI, multi-player support, editable levels, networking and high-resolution 3D graphics" vs "flappy bird clone".
When you're working on a product of this size, you're probably solving problems like the ones cited by simonw multiple times a week, if not daily.
But re-reading your statement you seem to be claiming that there are no 50k SAAS apps that are build even using multi-shot techniques (ie, building a feature at a time).
- It's 45K of python code
- It isn't a duplicate of another program (indeed, the reason it isn't finished is because it is stuck between ISO Prolog and SWI Prolog and I need to think about how to resolve this, but I don't know enough Prolog!)
- Not a *single* line of code is hand written.
Ironically this doesn't really prove that the current frontier models are better because large amounts of code were written with non-frontier models (You can sort of get an idea of what models were used with the labels on https://github.com/nlothian/Vibe-Prolog/pulls?q=is%3Apr+is%3...)
But - importantly - this project is what convinced me that the frontier models are much better than the previous generation. There were numerous times I tried the same thing in a non-Frontier model which couldn't do it, and then I'd try it in Claude, Codex or Gemini and it would succeed.
For me it is something I can describe in a single casual prompt.
For example I wrote a fully working version of https://tools.nicklothian.com/llm_comparator.html in a single prompt. I refined it and added features with more prompts, but it worked from the start.