SkyPuncher 70 days ago | on: Study identifies weaknesses in how AI systems are ...

Benchmarks are nothing more than highly contextual specs (in traditional code terms). They demonstrate that your code works in a certain way in certain use cases, but they do not prove that your code works as expected in all use cases.
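
To make the point concrete, here is a minimal sketch in Python. The function and the spec cases are hypothetical, made up purely for illustration, and are not taken from the linked study. The function is deliberately incomplete, yet every case in the hand-picked spec passes:

```python
import calendar


def is_leap_year(year: int) -> bool:
    """Hypothetical, deliberately incomplete: ignores the century rules."""
    return year % 4 == 0


# A hypothetical "spec": a handful of hand-picked cases, all of which pass.
spec_cases = {2024: True, 2023: False, 2000: True, 2020: True}
spec_passes = all(
    is_leap_year(year) == expected for year, expected in spec_cases.items()
)

# An input the spec never covered: 1900 was not a leap year,
# but the incomplete function claims it was.
uncovered_result = is_leap_year(1900)
reference_result = calendar.isleap(1900)

print("spec passes:", spec_passes)
print("is_leap_year(1900):", uncovered_result, "| reference:", reference_result)
```

The passing spec only says the function behaves as expected on those four years. It says nothing about inputs outside the spec, which is where the bug lives.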

embedding-shape 70 days ago

> "Program testing can be used to show the presence of bugs, but never to show their absence." (Edsger W. Dijkstra)

Maybe we need something similar for benchmarks, updated for today's LLMs, like:

> LLM benchmarks can be used to show what tasks they can do, but never to show what tasks they cannot.
