
Benchmarks are nothing more than highly contextual specs, like the specs/tests we write for traditional code. They demonstrate that your code works a certain way in certain use cases, but they do not prove that it works as expected in all use cases.


> Program testing can be used to show the presence of bugs, but never to show their absence. (Edsger W. Dijkstra)

Maybe we need something similar for benchmarks, updated for today's LLMs, like:

> LLM benchmarks can be used to show what tasks a model can do, but never to show what tasks it cannot.
