swyx 13 days ago, on: Claude Code daily benchmarks for degradation track...

chill out, ofir does not work for Anthropic. He's just saying there's inherent variability in LLMs, and you need at least 30x the samples that OP is running in order to draw any statistically significant conclusions.
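
To put rough numbers on the sample-size point, here's a minimal sketch using the normal-approximation confidence interval for a pass rate. The run count (`base_n = 50`) and the pass rate (`0.6`) are made-up values for illustration, not figures from the benchmark under discussion, and `ci_half_width` is a hypothetical helper name.

```python
import math

def ci_half_width(pass_rate: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation 95% CI for a pass rate over n runs."""
    return z * math.sqrt(pass_rate * (1.0 - pass_rate) / n)

# Hypothetical values, assumed for illustration only.
base_n = 50        # assumed number of runs per measurement
pass_rate = 0.6    # assumed observed pass rate

small = ci_half_width(pass_rate, base_n)        # noise band at the assumed sample size
large = ci_half_width(pass_rate, 30 * base_n)   # noise band with 30x the samples

print(f"n={base_n}: +/- {small:.3f}")
print(f"n={30 * base_n}: +/- {large:.3f}")
print(f"noise band shrinks by a factor of {small / large:.2f}")
```

Under these assumptions the interval narrows by sqrt(30), roughly 5.5x, so a day-to-day swing that sits inside the noise band at the smaller sample size could become distinguishable at the larger one.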