Hacker News | beklein's comments

Not sure why you think Anthropic doesn't have the same problems. Their version numbers jump around across model lines too: Opus has 4.6, 4.5, and 4.1; Sonnet has 4.6 and 4.5 but no 4.1; and Haiku has 4.5 but no 4.6, no 4.1, and no 4, after which you drop straight back to the old 3.5...

Also, their pricing, based on 5m/1h cache writes, cache read hits, additional charges for US inference (but only for Opus 4.6, I guess), and optional features such as more context and faster speed for some multiplier, is also complex and actually quite similar to OpenAI's pricing scheme.

To me it looks like everybody has similar problems and solutions for the same kinds of problems and they just try their best to offer different products and services to their customers.


With Anthropic you always have 3 models to choose from: Opus-latest, Sonnet-latest, and Haiku-latest, from the best/slowest to the worst/fastest.

The version numbers are mostly irrelevant, as AFAIK the price per token doesn't change between versions.


Three random names isn't ideal. I often need to double-check which is which. This is why we use numbers.

They aren't random. An opus is a very long work, haikus are very short poems (3 lines), and sonnets are in between (14 lines).

What's next? Claude Iliad?

How are the names random?

https://en.wikipedia.org/wiki/Masterpiece

https://en.wikipedia.org/wiki/Sonnet

https://en.wikipedia.org/wiki/Haiku

They dropped the magnum from opus but you could still easily deduce the order of the models just from their names if you know the words.


It's much more consistent. Only 3 lines, numbered 4.6, 4.6, and 4.5, and it's clear they're tiers and not alternate product lines. It wasn't until recently that GPT seemed to have any kind of naming convention at all, and it's not intuitive if every version number is a whole different class of tool.

The pricing is more complex, but still easy to reason about: Opus > Sonnet > Haiku, no matter how you tweak those variables.


Perhaps useful, I discovered: https://github.com/agent-infra/sandbox

> All-in-One Sandbox for AI Agents that combines Browser, Shell, File, MCP and VSCode Server in a single Docker container.


Some more info here: https://developers.openai.com/api/docs/models/gpt-realtime-1...

- $4 input, $0.40 cached input, $16 output (per 1M tokens)

- 32,000-token context window

- 4,096 max output tokens

- Sep 30, 2024 knowledge cutoff
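Treating those figures as USD per 1M tokens (the usual unit on OpenAI's pricing pages), a quick back-of-envelope cost sketch; the helper function here is my own, not part of any official SDK:

```javascript
// Hypothetical cost estimator for the rates listed above.
// Assumes the prices are USD per 1M tokens.
const RATES = { input: 4.0, cachedInput: 0.4, output: 16.0 }; // $ per 1M tokens

function estimateCost({ inputTokens = 0, cachedInputTokens = 0, outputTokens = 0 }) {
  const perToken = (rate) => rate / 1_000_000;
  return (
    inputTokens * perToken(RATES.input) +
    cachedInputTokens * perToken(RATES.cachedInput) +
    outputTokens * perToken(RATES.output)
  );
}

// e.g. a session with 20k fresh input, 100k cached input, 5k output tokens:
console.log(estimateCost({ inputTokens: 20_000, cachedInputTokens: 100_000, outputTokens: 5_000 }));
```

At these rates, heavy cache reuse dominates the savings: cached input is 10x cheaper than fresh input.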

Love the models, speed, and capabilities. Just sad that they are not getting much publicity and adoption right now, but hopefully that will come.


Sound on!

Song name is: Windowdipper from ꪖꪶꪶ ꪮꪀ ꪗꪖꪶꪶ by Jib Kidder

https://jibkidder.bandcamp.com/track/windowdipper


What are the symbols for that `ꪖꪶꪶ ꪮꪀ ꪗꪖꪶꪶ` font? Where can I find them?



thank you!


also, seems like they have another project https://feel.thatsh.it/ and I'd love to find that song as well if you can help haha


That song is also pretty nice. I wonder if that was an earlier project?



Do Opus 4.6 or Gemini Deep Think really use test-time adaptation? How does it work in practice?


I love this! I use coding agents to generate web-based slide decks where “master slides” are just components, and we already have rules + assets to enforce corporate identity. With content + prompts, it’s straightforward to generate a clean, predefined presentation.

What I’d really want on top is an “improv mode”: during the talk, I can branch off based on audience questions or small wording changes, and the system proposes (say) 3 candidate next slides in real time. I pick one, present it, then smoothly merge back into the main deck. Example: if I mention a recent news article / study / paper, it automatically generates a slide that includes a screenshot + a QR code link to the source, then routes me back to the original storyline.

With realtime voice + realtime code generation, this could turn the boring old presenter view into something genuinely useful.


I love the probabilistic nature of this. Presentations could be anywhere from extremely impressive to hilariously embarrassing.


It would be so cool if it generated live in the presentation and adjusted live as you spoke, so you’d have to react to whatever popped on screen!


There was a pre-LLM version of this called "battledecks" or "PowerPoint Karaoke"[0], where a presenter is given a deck of slides they've never seen and has to present it. With a group of good public speakers it can be loads of fun (and it's really impressive how well some people can pull it off!)

0. https://en.wikipedia.org/wiki/PowerPoint_karaoke


There is a Jackbox game called "Talking Points" that's like this: the players come up with random ideas for presentations, your "assistant" (one of the other players) picks what's on each slide while you present: https://www.youtube.com/watch?v=gKnprQpQONw


That is very cool. Thanks for posting this - I think I’m going to put on a PowerPoint karaoke night. This will rule! :)


If you like this, search on YouTube for "Harry Mack". Mindblowing


Some consulting firms do this, one guy is giving the presentation live while others are in the next meeting room still banging out the slides.


That would make a great Wii game.

PowerPoint Hero.


Every presentation becomes improv


I had a butterfly take over my live DreamScape slide show demo at the 1995 WWDC.

https://youtu.be/5NytloOy7WM?t=321


Isn't that such a great outcome? No more robotic presentations. The best part is that you can now practice improv from the comfort of your home.


And this product will work great for any industry... can I get a suggestion for an industry from the crowd?

Audience: Transportation... Education... Insurance...

Speaker: Great! I heard "Healthcare".

Right... as we can see from this slide, this product fits the "Healthcare" industry great because of ...


Caro’s first LBJ biography tells of how the future president became a congressman in Texas in his 20s by carting around a “claque” of his friends to various stump speeches and having them ask him softball questions and applaud loudly afterward.

Well, hey, who needs friends?


and with neuralink it would generate slides of the audience naked


I guess you could have two people per presentation, one person who confirms whether to slide in the generated slide or maybe regenerate. And then of course, eventually that's just an agent


You're describing almost verbatim what we're building at Octigen [1]! Happy to provide a demo and/or give you free access to our alpha version already online.

[1] https://octigen.com


Claude Code is pretty good at making slides already. What’s your differentiator?


* Ability to work with your own native PowerPoint templates; none of the AI slide makers I've seen have any competency at that.

* Ability to integrate your corporate data.

* Repeatable workflows for better control over how your decks look.


As an associate professor who spends a ridiculous amount of time preparing for lectures, I would love to try this in one of my courses


Try Claude Code too. It’s surprisingly good at this.


I built something similar at a hackathon: a dynamic teleprompter that adjusts its scroll speed based on speaker tonality and spoken WPM. I can see extending the same to an improv mode. This is a super cool idea.


Can you show one?


The end result would be a normal PPT presentation. Check https://sli.dev as an easy start: ask Codex/Claude/... to generate the slides using that framework, with data from something.md. The interesting part here is generating these otherwise boring slide decks not with PowerPoint itself but with AI coding agents, master slides, and AGENTS.md context. I’ll be showing this to a small group (normally members only) at IPAI in Heilbronn, Germany on 03/03. If you’re in the area and would like to join, feel free to send me a message and I’ll squeeze you in.
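For anyone who hasn't seen the format: a sli.dev deck is a single Markdown file, with `---` separating slides, which is what makes it so easy for a coding agent to generate and diff. A minimal sketch (the theme name and slide content here are made up, not from the parent comment's setup):

```markdown
---
theme: default
title: Example deck generated from something.md
---

# Q3 Overview

Key points pulled from something.md

---

## Results

- Revenue up
- Churn down

<!-- Vue components (e.g. a corporate "master slide") can be embedded inline -->
```

Because the whole deck is plain text, every agent edit shows up cleanly in git history.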


How do you handle the diagrams?


In my AGENTS.md file I have a _rule_ that tells the model to use Apache ECharts; the data comes from the prompt, and normally from .csv/.json files. A prompt would be something like: "After slide 3 add a new content slide that shows a bar chart with data from @data/somefile.csv" ... works great, and these charts can even be interactive.
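For concreteness, what the agent ends up emitting is just a plain ECharts option object; the data below stands in for rows parsed from the CSV (names and numbers are illustrative, not from the actual setup):

```javascript
// Illustrative ECharts bar-chart option, as a coding agent might generate it
// from CSV rows shaped like: region,revenue. The data here is made up.
const rows = [
  { region: "EMEA", revenue: 120 },
  { region: "APAC", revenue: 95 },
  { region: "AMER", revenue: 180 },
];

const option = {
  xAxis: { type: "category", data: rows.map((r) => r.region) },
  yAxis: { type: "value" },
  tooltip: {}, // hover tooltips are what make the embedded chart interactive
  series: [{ type: "bar", data: rows.map((r) => r.revenue) }],
};

// In the rendered slide this would be passed to echarts.init(el).setOption(option).
```

Keeping the option object in the slide source (rather than a rendered image) is what lets the agent, or you, tweak it later in plain text.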


What about other ad hoc diagrams like systems architecture, roadmaps, mind maps, etc.?

These are the bane of any staff engineer's life - lol. Because people above need to see the plan in art form.

So I'm seriously interested in how I can make this easier.


You could try something like mermaid (or ASCII) -> nano banana. You can also go the other way and turn images into embedded diagrams (which can be interactive depending on how you're sharing the presentation)


Not my normal use-case, but you can always fall back and ask the AI coding agent to generate the diagram as SVG; for blocky but more complex content like your examples it will work well, and it's still 100% text-based, so the AI coding agent (or you, manually) can fix/adjust any issues. An image generation skill is a valid fallback, but in my opinion it's hard to change details (JSON-style image creation prompts are possible but hard to do right), and you won't see changes nicely in the git history. In your use case you could ask the AI coding agent to run a script.js that gets the newest dates for the project from a page/API, then have it update only the dates in the roadmap.svg file on slide x with the new data. This way you automagically have the newest numbers and can track everything within git in one prompt. Save this as a rule in AGENTS.md and run it every month to update your slides with a single prompt.


Claude code can output Excalidraw format files which can be imported directly into the webapp. You can MCP it too if you want.


I have it do SVGs.

The layout isn't always great on the first shot, but you iterate on that.

They can also natively generate e.g. GitHub Markdown Mermaid diagrams (GitHub Markdown has a lot of extensions like that).


I love the idea of a living slide deck. This feels like a product that needs to exist!


Honest question: would a normal CS student, junior, senior, or expert software developer be able to build this kind of project, and in what amount of time?

I am pretty sure everybody agrees that this result is somewhere between slop code that barely works and the pinnacle of AI-assisted compiler technology, but discussions should not be held only at the extremes. Instead, I am looking for a realistic estimate from the HN community of where to place these results in a human context. Since I have no experience with compilers, I would welcome any of your opinions.


> Honest question: would a normal CS student, junior, senior, or expert software developer be able to build this kind of project, and in what amount of time?

I offered to do it, but without a deadline (I work full-time for money), only with a cost estimate based on how many hours I think it should take me: https://news.ycombinator.com/item?id=46909310

The poster I responded to had claimed that it was not possible to produce a compiler capable of compiling a bootable Linux kernel within the $20k cost, nor for double that ($40k).

I offered to do it for $40k, but no takers. I initially offered to do it for $20k, but the poster kept evading, so I settled on asking for the amount he offered.



Additionally, the longer article on the SpaceX site: https://www.spacex.com/updates#xai-joins-spacex


This will actually work well with my current workflow: dictation for prompts, parallel execution, and working on multiple bigger and smaller projects so waiting times while Codex is coding are fully utilized, plus easy commits with auto commit messages. Wow, thank you for this. Since skills are now first class tools, I will give it a try and see what I can accomplish with them.

I know/hope some OpenAI people are lurking in the comments and perhaps they will implement this, or at least consider it, but I would love to be able to use @ to add files via voice input as if I had typed it. So when I say "change the thingy at route slash to slash somewhere slash page dot tsx", I will get the same prompt as if I had typed it on my keyboard, including the file pill UI element shown in the input box. Same for slash commands. Voice is a great input modality, please make it a first class input. You are 90% there, this way I don't need my dictation app (Handy, highly recommended) anymore.

Also, I see myself often using the built-in console to ls, cat, and rg, to still follow old patterns, and I would love to pin the console to a specific side of the screen instead of having it at the bottom. Please support terminal tabs, or I'll need to learn tmux.


So much this. I'm eagerly waiting to see what Anthropic and OpenAI do to make dictation-first interaction a first-class citizen instead of requiring me to use a separate app like Super Whisper. It would dramatically improve complex, flow-breaking interactions when adding files, referencing users or commands, etc.

Importantly, I want full voice control over the app and its interactions, not just dictated prompts.

