Hacker News | pbmango's comments

I imagine a huge proportion of their users are under 30. The prompt examples included even use the tell-tale all lowercase (though apparently sama types like this too).

This is probably less pandering to Gen Z and more speaking their users' language.


This is very interesting. I don't see much discussion of interpretability in the day-to-day discourse of AI builders. I wonder if everyone assumes it is either solved or too far out of reach to bother stopping and thinking about.


Mostly out of reach. There is a ton of research on figuring out how to do this coming out every day, including both proposals of new ways to do things and (often strong) critiques of old or recently proposed ways of doing things. Interpretability (esp. for large, modern models) is very, very far from being a solved problem.


Most interpretability techniques have yet to be shown to be useful in everyday model pipelines. However, the field is working hard to change this.


Along these same lines, I have been trying to get better at knowing when my work could benefit from reverting to the "boring," general mean, and when outsourcing thought or planning would cause a reversion to the mean (downwards).

This echoes the comments here about enjoying not writing boilerplate. The catch is that our minds are programmed to offload work when we can, and redirecting all the saved boilerplate time into going even deeper on the parts of the problem that benefit from original hard thinking is rare. It is much easier to get sucked into creating more boilerplate, and all the gamification of Claude Code and the incentives of service providers push in that direction.


As the founder of another product in this space - this is super impressive and well built. Great demo video, and congrats on hitting the top of HN! Getting this smooth a UX, and the data behind the scenes, is not easy.



It is also possible that this "world view tuning" was simply the manifestation of how these models gained public attention. Whether intentional or not, seeing the Tiananmen Square reposts across all social feeds may have done more to spread awareness of these models' technical merits than the technical merits themselves would have. This is certainly true for how consumers learned about free DeepSeek, and it fits perfectly with how new AI releases are turned into high-click-through social media posts.


I'm curious if there's any data behind that conclusion; it's hard for me to get to "They did the censorship training on DeepSeek because they knew consumers would love free DeepSeek after seeing screenshots of Tiananmen censorship in DeepSeek."

(the steelman here, ofc, is "the screenshots drove buzz which drove usage!", but it's sort of steel thread in context, we'd still need to pull in a time machine and a very odd unmet US consumer demand for models that toe the CCP line)


> Whether intentional or not

I am not claiming it was intentional, but it certainly magnified the media attention. Maybe luck and not 4D chess.


I think an underappreciated reality is that all of the large AI labs, and OpenAI in particular, are fighting multiple market battles at once. This comes across in both the number of products and the packaging.

1. To win consumer growth, they have continued to benefit from hyper-viral moments; lately that was image generation in 4o, which was likely technically possible long before it launched.

2. For enterprise workloads and large API use, they seem to have focused less lately, but the pricing of 4.1 is clearly an answer to Gemini, which has been winning on ultra-high volume and consistency.

3. For full frontier benchmarks, they pushed out 4.5 to stay SOTA and attract the best researchers.

4. On top of all that, they had to, and did, quickly answer the reasoning promise and the DeepSeek threat with faster and cheaper o-series models.

They are still winning many of these battles, but history highlights how hard multi-front warfare is, at least for teams of humans.


On that note, I want to see benchmarks for which LLMs are best at translating between languages. To me, it's an entire product category.


There are probably many more small battles being fought or emerging. I think voice and PDF parsing are growing battles too.


I would love to see a stackexchange-like site where humans ask questions and we get to vote on the reply by various LLMs.


is this like what you're thinking of? https://lmarena.ai


Kind of. But lmarena.ai has no way to browse the results for questions other people asked, and it only lets you look at two responses side by side.


I agree. 4.1 seems to be a release that addresses the shortcomings of 4o in coding compared to Claude 3.7 and Gemini 2.0 and 2.5.


Growing up in Buffalo, New York, I saw one flying only once as a kid, on a camping trip in a remote state park. Now you see one almost every day on the Lake Erie coastline. They are so much bigger than other birds that you will notice them even if you are not on the lookout. Their scale is astounding compared to seagulls.

They have also come back to the Potomac and Washington DC which is nice.


Great demo! Going to try this with a huge folder of call recordings that I would love to search across. Does it work with Google Docs as well as PDF?


yep - Google Docs and MS Word files are exported as text via the Drive API, and PDF text is extracted with PDF.js
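For anyone curious, the PDF.js side of this usually looks something like the sketch below: open the document, walk the pages, and join the `str` of each text item from `getTextContent()`. This is a minimal illustration assuming the `pdfjs-dist` package in Node; `joinTextItems` and `extractPdfText` are hypothetical names, not the project's actual code.

```javascript
// Hedged sketch of PDF text extraction with PDF.js (assumes pdfjs-dist is installed).

// Flatten the text items ({ str, ... }) PDF.js returns for one page into a single line.
function joinTextItems(items) {
  return items.map((it) => it.str).join(" ");
}

async function extractPdfText(path) {
  // Lazy-require so the pure helper above works without the library present.
  const pdfjs = require("pdfjs-dist/legacy/build/pdf.js");
  const doc = await pdfjs.getDocument(path).promise;
  const pages = [];
  for (let i = 1; i <= doc.numPages; i++) {
    const page = await doc.getPage(i);
    const content = await page.getTextContent();
    pages.push(joinTextItems(content.items));
  }
  return pages.join("\n");
}
```

One caveat worth knowing: `getTextContent()` returns items in layout order, so multi-column PDFs can come out interleaved; for search indexing that is usually acceptable.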


Congrats! Big fan of Fig and excited to see the UX brought to AWS CLIs.

