oof-baroomf's comments

74.9 on SWE-bench. This raises the SOTA by a whole 0.4%. Although the pricing is great, it doesn't seem like OpenAI has found a giant breakthrough yet, like o1 or Claude 3.5 Sonnet.


I'm pretty sure 3.5 Sonnet always benchmarked poorly, despite being the clear programming winner of its time.


That would assume there is a giant breakthrough to be found.

