
Haven’t run evals yet, but I’ve measured it on a few real-world situations where projects got stuck and brainstorm mode solved it. Running evals is definitely worth doing, and contributions are welcome.

I think what really degrades the output is context length relative to the context window limit; check out NoLiMa.

https://www.arxiv.org/abs/2512.08296

> coordination yields diminishing or negative returns once single-agent baselines exceed ~45%

This is going to be the big thing to overcome, and without actually measuring it, all we're doing is AI astrology.


This is why context optimization is going to be critical. Thank you so much for sharing this paper, as it also validates what we are trying to do. If we manage to keep the baseline below 40% through context optimization, then coordination might actually work very well and help scale agentic systems.

I agree on measuring, and it is planned, especially once we integrate context optimization. I think the value of context optimization will go beyond avoiding compaction and reducing cost: it will make the agents more reliable.
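
To make "context optimization" concrete, here is a minimal sketch of one common form of it: trimming the oldest droppable messages so the prompt stays under a fixed token budget instead of triggering compaction. All names are hypothetical, and the whitespace-based token count is a crude stand-in for a real tokenizer; this is not the project's actual implementation.

```python
def rough_tokens(text: str) -> int:
    """Crude proxy for a tokenizer: count whitespace-separated words.

    A real implementation would use the model's tokenizer instead.
    """
    return len(text.split())


def trim_context(messages, budget):
    """Drop the oldest non-pinned messages until the total fits the budget.

    messages: list of dicts like {"role": ..., "content": ..., "pinned": bool}
    budget:   maximum total (rough) token count to keep
    """
    kept = list(messages)
    total = sum(rough_tokens(m["content"]) for m in kept)
    i = 0
    while total > budget and i < len(kept):
        if kept[i].get("pinned"):
            i += 1  # never drop pinned messages (e.g. the system prompt)
            continue
        total -= rough_tokens(kept[i]["content"])
        kept.pop(i)
    return kept


# Hypothetical usage: a long middle turn gets dropped, pinned + latest survive.
history = [
    {"role": "system", "content": "You are a helpful agent.", "pinned": True},
    {"role": "user", "content": "first question " * 50},
    {"role": "user", "content": "latest question"},
]
trimmed = trim_context(history, budget=30)
```

Dropping whole messages oldest-first is the simplest policy; summarizing dropped turns or ranking by relevance are obvious refinements, at the cost of extra model calls.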



