Not sure why you think Anthropic doesn't have the same problems. Their version numbers jump around across model lines too: for Opus we have 4.6, 4.5, and 4.1; Sonnet sits at 4.6 and 4.5, but has no 4.1; and Haiku has no 4.6, only a 4.5, then no 4.1 and no 4, jumping straight back to the old 3.5...
Also, their pricing is complex too: 5-minute/1-hour cache writes, cache read hits, additional charges for US inference (but only for Opus 4.6, I guess), and optional features such as more context and faster speed for some multiplier. It's actually quite similar to OpenAI's pricing scheme.
To me it looks like everybody runs into the same kinds of problems, lands on similar solutions, and just tries their best to offer differentiated products and services to their customers.
It's much more consistent. Only three lines, numbered 4.6, 4.6, and 4.5, and it's clear they're tiers, not alternate product lines. Until recently GPT didn't seem to have any naming convention at all, and it's not intuitive when every version number is a whole different class of tool.
The pricing is more complex but still easy to reason about: Opus > Sonnet > Haiku, no matter how you tweak those variables.
I love this! I use coding agents to generate web-based slide decks where “master slides” are just components, and we already have rules + assets to enforce corporate identity. With content + prompts, it’s straightforward to generate a clean, predefined presentation.
What I’d really want on top is an “improv mode”: during the talk, I can branch off based on audience questions or small wording changes, and the system proposes (say) 3 candidate next slides in real time. I pick one, present it, then smoothly merge back into the main deck.
Example: if I mention a recent news article / study / paper, it automatically generates a slide that includes a screenshot + a QR code link to the source, then routes me back to the original storyline.
With realtime voice + realtime code generation, this could turn the boring old presenter view into something genuinely useful.
There was a pre-LLM version of this called "battledecks" or "PowerPoint Karaoke"[0], where a presenter is given a deck of slides they've never seen and has to present it. With a group of good public speakers it can be loads of fun (and it's really impressive how well some people can pull it off!)
There is a Jackbox game called "Talking Points" that's like this: the players come up with random ideas for presentations, your "assistant" (one of the other players) picks what's on each slide while you present: https://www.youtube.com/watch?v=gKnprQpQONw
Caro’s first LBJ biography tells how the future president became a Texas congressman in his 20s by carting a “claque” of his friends around to stump speeches, having them ask him softball questions and applaud loudly afterward.
I guess you could have two people per presentation: the presenter, and someone who confirms whether to slide in the generated slide or regenerate it. And then of course, eventually that second person is just an agent.
You're describing almost verbatim what we're building at Octigen [1]! Happy to provide a demo and/or give you free access to our alpha version already online.
I built something similar at a hackathon: a dynamic teleprompter that adjusts its scrolling speed based on the speaker's tonality and spoken WPM. I can see extending the same idea to an improv mode. This is a super cool idea.
The end result would be a normal PPT-style presentation. Check https://sli.dev as an easy start, and ask Codex/Claude/... to generate the slides using that framework with data from something.md.
The interesting part here is generating these otherwise boring slide decks not with PowerPoint itself but with AI coding agents, plus master slides and AGENTS.md context.
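As a rough sketch of what the agent would emit: in sli.dev a deck is one Markdown file, with the first frontmatter block as deck config and `---` separating slides. The theme name and slide content here are made up for illustration:

```md
---
theme: default
title: Quarterly Update
---

# Quarterly Update

<!-- generated by the coding agent from something.md,
     layout/CI rules enforced via AGENTS.md + master-slide components -->

---

## Agenda

- Results
- Roadmap
- Q&A
```

Because the whole deck is plain text, the agent can insert, reorder, or restyle slides with ordinary file edits, and every change shows up in git.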
I’ll be showing this to a small group (normally members only) at IPAI in Heilbronn, Germany on March 3. If you’re in the area and would like to join, feel free to send me a message and I’ll squeeze you in.
In my AGENTS.md file I have a _rule_ that tells the model to use Apache ECharts; the data comes from the prompt, normally as .csv/.json files.
A prompt would look like: "After slide 3, add a new content slide that shows a bar chart with data from @data/somefile.csv" ... it works great, and these charts can even be interactive.
You could try something like mermaid (or ASCII) -> nano banana. You can also go the other way and turn images into embedded diagrams (which can be interactive depending on how you're sharing the presentation)
Not my normal use case, but you can always fall back to asking the AI coding agent to generate the diagram as SVG. For blocky but more complex content like your examples it works well and is still 100% text-based, so the AI coding agent (or you, manually) can fix/adjust any issues.
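To make the "SVG is just text" point concrete, here is a toy sketch that emits a boxes-and-arrows pipeline diagram as an SVG string. The layout constants and function name are invented for the example; the takeaway is that the output is diffable text an agent can patch:

```javascript
// Render a left-to-right chain of labeled boxes connected by lines,
// returned as a single SVG string.
function boxesToSvg(labels) {
  const w = 120, h = 40, gap = 40;
  const parts = labels.flatMap((label, i) => {
    const x = i * (w + gap);
    const shapes = [
      `<rect x="${x}" y="0" width="${w}" height="${h}" fill="none" stroke="black"/>`,
      `<text x="${x + w / 2}" y="${h / 2}" text-anchor="middle" dominant-baseline="middle">${label}</text>`,
    ];
    if (i < labels.length - 1) {
      // connector to the next box
      shapes.push(`<line x1="${x + w}" y1="${h / 2}" x2="${x + w + gap}" y2="${h / 2}" stroke="black"/>`);
    }
    return shapes;
  });
  const width = labels.length * w + (labels.length - 1) * gap;
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${h}">${parts.join("")}</svg>`;
}

console.log(boxesToSvg(["Input", "Agent", "Slide"]));
```

A real deck diagram would be richer, but the workflow is the same: the agent edits attributes and text nodes, and git shows exactly what moved.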
An image-generation skill is a valid fallback, but in my opinion it's hard to change details (JSON-style image-creation prompts are possible but hard to get right), and you won't see changes nicely in the git history.
In your use case you can ask the AI coding agent to run a script.js that fetches the newest dates for the project from a page/API; it should then update only the dates in the roadmap.svg file on slide x with the new data.
This way you automagically get the newest numbers and can track everything in git. Save this as a rule in AGENTS.md and run it every month to update your slides with a single prompt.
Honest question: could a normal CS student, a junior, a senior, or an expert software developer build this kind of project, and in what amount of time?
I am pretty sure everybody agrees that this result lands somewhere between slop code that barely works and the pinnacle of AI-assisted compiler technology. But discussions should not be argued from the extremes. Instead, I am looking for a realistic estimate from the HN community of where to place these results in a human context. Since I have no experience with compilers, I would welcome your opinions.
> Honest question: would a normal CS student, junior, senior, or expert software developer be able to build this kind of project, and in what amount of time?
I offered to do it, but without a deadline (I work full-time for money), only a cost estimate based on how many hours I think it should take me: https://news.ycombinator.com/item?id=46909310
The poster I responded to had claimed that it was not possible to produce a compiler capable of compiling a bootable Linux kernel within the $20k cost, nor for double that ($40k).
I offered to do it for $40k, but no takers. I initially offered to do it for $20k, but the poster kept evading, so I settled on asking for the amount he offered.
This will actually fit my current workflow well: dictation for prompts, parallel execution, and working on multiple bigger and smaller projects so the waiting time while Codex is coding is fully utilized, plus easy commits with auto-generated commit messages. Wow, thank you for this. Since skills are now first-class tools, I will give it a try and see what I can accomplish with them.
I know/hope some OpenAI people are lurking in the comments and will implement this, or at least consider it: I would love to use @ to add files via voice input as if I had typed it. So when I say "change the thingy at route slash to slash somewhere slash page dot tsx", I get the same prompt as if I had typed it on my keyboard, including the file-pill UI element in the input box. Same for slash commands. Voice is a great input modality; please make it a first-class input. You are 90% there, and then I wouldn't need my dictation app (Handy, highly recommended) anymore.
Also, I see myself often using the built-in console to ls, cat, and rg, still following old patterns. I would love to pin the console to a specific side of the screen instead of having it at the bottom, and please support terminal tabs, or I'll need to learn tmux.
So much this. I'm eagerly waiting to see what Anthropic and OpenAI do to make dictation-first interaction a first-class citizen instead of requiring a separate app like Super Whisper. It would dramatically improve complex, flow-breaking interactions like adding files or referencing users and commands.
Importantly, I want full voice control over the app and its interactions, not just prompt dictation.