oof-baroomf's comments

oof-baroomf · 2025-08-07T17:13:51 1754586831

74.9 on SWE-bench. This increases the SOTA by a whole 0.4%. Although the pricing is great, it doesn't seem like OpenAI has found a giant breakthrough yet, the way o1 or Claude 3.5 Sonnet were.

Workaccount2 · 2025-08-07T17:46:27 1754588787

I'm pretty sure 3.5 Sonnet always benchmarked poorly, despite being the clear programming winner of its time.

iLoveOncall · 2025-08-07T21:32:33 1754602353

That would assume there is a giant breakthrough to be found.