Hacker News
swyx
13 days ago
on:
Claude Code daily benchmarks for degradation track...
chill out, ofir does not work for Anthropic. he's just saying there's inherent variability in LLMs, and you'd need at least 30x the samples OP is running to draw any statistically significant conclusions.
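to put rough numbers on the sample-size point, here's a minimal sketch using the normal approximation for a pass rate. the 60% pass rate and the 50-run baseline are made-up illustration values, not OP's actual figures, and `pass_rate_margin` is just a helper name I picked.

```python
import math


def pass_rate_margin(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation ~95% interval for a pass rate p over n runs."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical values for illustration only (not taken from the benchmark).
BASE_RUNS = 50
PASS_RATE = 0.6

margin_small = pass_rate_margin(PASS_RATE, BASE_RUNS)
margin_large = pass_rate_margin(PASS_RATE, 30 * BASE_RUNS)

print(f"n={BASE_RUNS}: +/- {margin_small:.3f}")
print(f"n={30 * BASE_RUNS}: +/- {margin_large:.3f}")
print(f"margin shrinks by a factor of {margin_small / margin_large:.2f}")
```

the margin only shrinks with the square root of the sample count, so 30x the runs buys about a 5.5x tighter interval. under these assumed numbers, a day-to-day swing of a few points sits well inside the noise at the smaller sample size.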