> My current expectation is that the Cowork/Codex set of "professional agents" f...

skydhash · 2026-04-16T18:03:49 1776362629

> Non-technical users expect a CEO's secretary from TV/movies: you do a vague request, the secretary does everything for you. LLMs cannot give you that by their own nature.

Most people are indifferent to computers. A computer to them is similar to the water pipeline or the electrical grid. It’s what makes some other stuff they want possible. And the interface they want to interact with should be as simple as possible and quite direct.

That is pretty much the 101 of UX. No deep interactions (a long list of steps), no DSL (even if visual), and no updates to the interfaces. That’s why people like their phones more than their desktops: the constraints have made the UX simpler, while current OSes are trying to complicate things.

So Cowork/Codex would probably go where Siri is right now. Because they are not a simpler and consistent interface. They’ve only hidden all the controls behind one single point of entry. But the complexity still exists.

noelsusman · 2026-04-16T18:54:56 1776365696

Just yesterday my non-technical spouse had to solve a moderately complex scheduling problem at work. She gave the various criteria and constraints to Claude and had a full solution within a few minutes, saving hours of work. It ended up requiring a few hundred lines of Python to implement a scheduling optimization algorithm. She only vaguely knows what Python is, but that didn't matter. She got what she needed.

For now she was only able to do that because I set up a modified version of my agentic coding setup on her computer and told her to give it a shot for more complex tasks. It won't be trivial, but I do think there's a big opportunity for whoever can translate the experience we're having with agentic coding to a non-technical audience.

paganel · 2026-04-16T20:02:19 1776369739

There's no such big opportunity, as the number of programmers' spouses is quite limited. Again, and as the GP rightly suggested, some of the HN-ers here need to go and touch some normie grass, so to speak.

More to the point, nobody wants to be more efficient for the sake of being efficient. We all want to go to work, do our metaphorical 9 to 5 without consuming too much energy (intellectual and otherwise), and then go back home. In that regard, AI is seen as an existential threat to that "lifestyle", and it will be treated as such by regular workers.

w2df · 2026-04-16T21:15:32 1776374132

correct. you can't trust this place for realistic takes - I had a post about financial stuff downvoted even when a former investment banker chimed in to back me up.

Comical. Truly comical.

troupo · 2026-04-16T20:36:52 1776371812

> Just yesterday my non-technical spouse

> It ended up requiring a few hundred lines of Python

And she knows those few hundred lines of Python work correctly and give her the correct result because, in this instance, Claude managed to produce a working result. What if it didn't? Would vague knowledge of Python have helped her?

> It won't be trivial, but I do think there's a big opportunity for whoever can translate the experience we're having with agentic coding to a non-technical audience.

Even though I agree with the sentiment, we've tried non-coding coding how many times now? Once every 5 years? Throwing LLMs into the mix won't help much when in the end you leave the end user hanging, debugging problems and hunting for solutions.

zozbot234 · 2026-04-16T20:39:23 1776371963

Scheduling solutions are easy to verify. For other problems, verification would be harder.
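
To illustrate why scheduling sits on the easy-to-verify side: checking a proposed schedule against the stated constraints takes far less code than finding one. A minimal Python sketch, assuming a made-up `Shift` record, a per-day hour cap, and a set of unavailable (person, day) pairs. All of these are hypothetical and not taken from the scheduling problem described upthread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shift:
    # Hypothetical record for illustration: one person, one day, hours on a 24h clock.
    person: str
    day: str
    start: int
    end: int


def violations(shifts, max_hours_per_day, unavailable):
    """Return a list of constraint violations; an empty list means the schedule passes."""
    problems = []
    by_person_day = {}
    for s in shifts:
        if s.end <= s.start:
            problems.append(f"{s.person} has an empty or reversed shift on {s.day}")
        if (s.person, s.day) in unavailable:
            problems.append(f"{s.person} is unavailable on {s.day}")
        by_person_day.setdefault((s.person, s.day), []).append(s)

    for (person, day), group in by_person_day.items():
        # Constraint: total hours per person per day must not exceed the cap.
        total = sum(s.end - s.start for s in group)
        if total > max_hours_per_day:
            problems.append(f"{person} works {total}h on {day}, over the cap")
        # Constraint: a person cannot be in two shifts at once.
        ordered = sorted(group, key=lambda s: s.start)
        for earlier, later in zip(ordered, ordered[1:]):
            if later.start < earlier.end:
                problems.append(f"{person} has overlapping shifts on {day}")
    return problems


if __name__ == "__main__":
    proposed = [
        Shift("Ana", "Mon", 9, 13),
        Shift("Ana", "Mon", 12, 15),  # overlaps the shift above
        Shift("Ben", "Tue", 9, 12),   # Ben is marked unavailable on Tue
    ]
    for problem in violations(proposed, max_hours_per_day=8, unavailable={("Ben", "Tue")}):
        print(problem)
```

A checker like this only confirms that the listed constraints hold. It says nothing about whether the constraints were the right ones, or whether the schedule is the best one available, which is where verification gets harder.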

cjbarber · 2026-04-16T17:42:16 1776361336

> Non-technical users expect a CEO's secretary from TV/movies: you do a vague request, the secretary does everything for you. LLMs cannot give you that by their own nature.

What are you using today? In my experience LLMs are already pretty good at this.

> Please for the love of god actually go outside and talk to people outside of the tech bubble.

In the past week I've taught a few non-technical friends, who are well outside the tech bubble, don't live in the SF Bay Area, etc, how to use Cowork. I did this for fun and for curiosity. One takeaway is that people at startups working on these products would benefit from spending more time sitting with and onboarding users - they're very powerful and helpful once people get up and running, but people struggle to get up and running.

> People don't want "personalized interfaces that change every second based on the whims of an unknowable black box". They have plenty of that already.

I obviously agree with this, I think where our view differs is I expect that models will be able to get good at making custom interfaces, and then help the user personalize it to their tasks. I agree that users don't want something that changes all the time. But they do want something that fits them and fits their task. Artifacts on Claude and Canvas on ChatGPT are early versions of this.

troupo · 2026-04-16T17:46:58 1776361618

> What are you using today? In my experience LLMs are already pretty good at this.

LLMs are good at "find me a two week vacation two months from now"?

Or at "do my taxes"?

> how to use Cowork.

Yes, and I taught my mom how to use Apple Books, and have to re-teach her every time Apple breaks the interface.

Ask your non-tech friends what they do with and how they feel about Cowork in a few weeks.

> I think where our view differs is I expect that models will be able to get good at making custom interfaces, and then help the user personalize it to their tasks.

How many users do you see personalizing anything to their tasks? Why would they want every app to be personalized? There's insane value in consistency across apps and interfaces. How will apps personalize their UIs to every user? By collecting even more copious amounts of user data?

roel_v · 2026-04-17T08:23:39 1776414219

"LLMs are good at "find me a two week vacation two months from now"?"

Of course they are. I gave one a similar prompt a few weeks ago, albeit quite a bit more verbose (actually I just dictated it, train of thought, with a couple of "eh, actually, forget what I just said about x, do y instead"), and although I wasn't brave enough to give it my credit card and finalize the bookings, it would have paid for the bookings it had set up for me, had I done that. I gave it some real-life constraints, like "we're meeting friends in place xyz at such and such date, make sure we're there then", and it did everything from making sure we wouldn't be spending too many hours driving per day, to checking that hotels are kid-friendly, to finding things to do and see and what public holidays there are (so that we know when supermarkets close early), plus a bunch of details I wouldn't have thought of. It checked my (and my wife's) calendar, checked what I had going on work-wise, etc.

That is a fully solved 'problem' man. LLMs will run the whole thing for you. Just provide it with the login details to booking websites and you're off to the races.

I did have it upgrade the car, even if that pushed the cost outside the budget I gave it. Next time it'll know LOL.

suddenlybananas · 2026-04-17T08:54:58 1776416098

>although I wasn't brave enough to give it my credit card and finalize the bookings

So it's not trustworthy enough for you, someone clearly interested in the hype of LLMs.

roel_v · 2026-04-17T10:19:46 1776421186

It's a matter of getting used to things. We're only a few weeks further along, and I maybe would have given it now. It'd need some way to keep it private, I guess; maybe I could have used a one-off CC number. Those are just technicalities at this point. It got me to the point where I just had to enter my details and click a few confirm buttons. Those are solved problems. I'm not sure why the denialists here are saying those things are 'impossible'. I mean, I've seen them happen, what do you want me to say? Claiming this is 'just hype' is ostrich behavior.

I was playing with an abliterated Gemma 4 yesterday on my local machine. Yes, it would take longer and require a bunch of harness fiddling, but even if OpenAI and Anthropic were to collapse tomorrow, I'm confident I could still do the exact same thing the day after with what I have right now on my hard disk.

I'm not sure what you want me to tell you, mate. Yes, there are rough edges to work out, or just workflows in general to improve, but the ideas are way beyond 'proof of concept'. There are people like myself using these things for purposes that 6 months ago were science fiction. I don't care if you believe me or not, I'm just some dude on the internet, but the level of delusion about how 'inferior' these models (with proper harnessing) are is mind-boggling for someone like me, who sees it happen literally 20 centimeters to the side on my screen from where I see people claim that those things are impossible.

baq · 2026-04-16T17:55:04 1776362104

> Or at "do my taxes"?

codex did my taxes this year (well it actually implemented a normalization pipeline and a tax computing engine which then did the taxes, but close enough)

William_BB · 2026-04-16T18:24:03 1776363843

> well it actually implemented a normalization pipeline and a tax computing engine which then did the taxes, but close enough

You can't seriously believe laymen will try to implement their own tax calculators.

baq · 2026-04-16T18:48:12 1776365292

of course not.

what I believe is that laymen will put all their tax docs into codex and tell it to 'do their taxes' and the tool will decide to implement the calculator, do the taxes and present only the final numbers. the layman won't even know there was a calculator implemented.

TeMPOraL · 2026-04-17T09:59:25 1776419965

> the layman won't even know there was a calculator implemented.

That's on the company making the agentic harness. Hiding details of what the computer does from the user is the original sin of this industry, and subsequent generations of developers and software companies keep doubling down on it.

(Case in point - I just downloaded the Codex app for Windows, and in the options I see it has two UI modes of operating, one of which is meant for "non coding" and apparently this means hiding the details of what the agent is doing. This is precisely where the layman is betrayed by the tool.)

William_BB · 2026-04-16T18:53:13 1776365593

Yeah, good luck trusting the output!

baq · 2026-04-16T18:59:08 1776365948

check back in a couple of years!

William_BB · 2026-04-16T19:13:36 1776366816

Ah right! Reminds me of AGI by 2025 :D

tsimionescu · 2026-04-16T18:39:42 1776364782

If your prompt was more complex than "do my taxes", then this is irrelevant.

baq · 2026-04-16T18:51:28 1776365488

it was many hours of working with codex, guidance and comparing to known-good outputs from previous years, but a sufficiently smart model would be able to just do it without any steering; it'd still take hours, but my input wouldn't be necessary. a harness for getting this done probably exists today, gastown perhaps or something that the frontier labs are sitting on.

procaryote · 2026-04-17T07:07:20 1776409640

If you can assume "a sufficiently smart piece of technology" that doesn't exist now, a lot of problems become trivial

baq · 2026-04-17T11:50:36 1776426636

yes.

but then, respect the trendline, especially if it's exponential.

bavell · 2026-04-17T12:34:13 1776429253

Is it exponential or logistic?
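
The question is hard to answer from early data, because the start of a logistic curve is nearly indistinguishable from an exponential one. A rough Python sketch of that point, with an arbitrary starting value, growth rate, and ceiling chosen purely for illustration (none of these numbers describe actual model progress):

```python
import math


def exponential(t, x0=1.0, r=0.5):
    """Unbounded growth: x0 * e^(r*t)."""
    return x0 * math.exp(r * t)


def logistic(t, x0=1.0, r=0.5, cap=1000.0):
    """Same start and same initial growth rate, but saturating at `cap` (an assumed ceiling)."""
    return cap / (1 + (cap - x0) / x0 * math.exp(-r * t))


if __name__ == "__main__":
    # Early on the ratio stays close to 1; it collapses only as the ceiling gets near.
    for t in (0, 2, 4, 8, 16, 24):
        e, l = exponential(t), logistic(t)
        print(f"t={t:>2}  exponential={e:>12.1f}  logistic={l:>8.1f}  ratio={l / e:.3f}")
```

With these assumed parameters the two curves track each other closely for the first few time steps and diverge sharply later, so a trendline observed only in the early phase cannot settle which one it is.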

troupo · 2026-04-16T19:01:48 1776366108

> but a sufficiently smart model would be able to just do it without any steering;

Yeah, yeah, we've heard "our models will be doing everything" for close to three years now.

> a harness for getting this done probably exists today, gastown perhaps

That got a chuckle and a facepalm out of me. I would at least consider you half-serious if you said "openclaw", at least those people pretend to be attempting to automate their lives through LLMs (with zero tangible results, and with zero results available to non-tech people).

ravenstine · 2026-04-16T19:30:46 1776367846

Sounds fascinating! If you wrote an article on this I bet it'd have a good shot at making it to the home page of HN.

jeffgreco · 2026-04-16T19:33:37 1776368017

> LLMs are good at "find me a two week vacation two months from now"?

Yes?

===

edit: Just tested it with that exact prompt on Claude. It asked me who I was traveling with, what type of trip and budget (with multiple choice buttons) and gave me a detailed itinerary with links to buy the flights ( https://www.kayak.com/flights/ORD-LIS/2026-06-13/OPO-ORD/202... )

troupo · 2026-04-16T21:32:05 1776375125

I'd love to try and replicate, but I'm not letting any of these tools anywhere near a real browser and capabilities :)

mazurnification · 2026-04-17T07:31:28 1776411088

Perfect - and this use case will be enshittified first. LLM providers will charge a small fee for favorable recommendation placement. Got to recoup the investment.

a1j9o94 · 2026-04-16T20:38:51 1776371931

This is effectively how I treat my AI agents. A lot of the reason this doesn't work well for people today is due to context/memory/harness management that makes it too complex for someone to set up if they don't want a full time second job or just like to tinker.

If you productize that it will be an experience a lot of people like.

And on the UI piece, I think most people will just interact through text and voice interfaces, wherever they already spend time: SMS, WhatsApp, etc.