
Yes, that's basically the point. You get 'free' continuous learning just by throwing the new data into the pool. Needing an explicit training step is a weakness that makes continual learning hard to achieve with many other approaches.

For any practical application, KNN needs some kind of accelerated search structure (e.g. a Kd-tree for fewer than ~7 dimensions), which in turn must support dynamic insertions. But that's an engineering problem, not a data-science problem: it works and is practical. For example, this approach has been used by the top systems in Robocode for 15+ years at this point; it's just that academia doesn't find it novel enough to bother pursuing.
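To make the continual-learning point concrete, here's a minimal sketch (names and structure are my own, not from any particular system): a brute-force KNN classifier where "learning" is literally just appending new points. A real system would swap the linear scan for a dynamically updatable search structure, as described above, once the dataset grows.

```python
import numpy as np

class IncrementalKNN:
    """Brute-force k-NN where adding data IS the training step.

    A toy sketch of the continual-learning property: no retraining,
    just insert and query. Replace the linear scan with a dynamic
    Kd-tree (or similar) for large datasets.
    """

    def __init__(self, k=3):
        self.k = k
        self.points = []  # stored feature vectors
        self.labels = []

    def insert(self, x, label):
        # 'Continuous learning': new data goes straight into the pool.
        self.points.append(np.asarray(x, dtype=float))
        self.labels.append(label)

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        dists = np.linalg.norm(np.stack(self.points) - x, axis=1)
        nearest = np.argsort(dists)[: self.k]
        votes = [self.labels[i] for i in nearest]
        return max(set(votes), key=votes.count)
```

Note how an update that would require a gradient step (or a full retrain) in a parametric model is a single `insert` call here.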



>Needing an explicit training step is a weakness that makes continual learning hard to achieve with many other approaches.

On the other hand, not having an explicit training step is a huge weakness of KNN.

Training-based methods scale better because, at inference time, the storage and runtime requirements are independent of dataset size. You can compress 100TB of training data down into a 70GB LLM.

A KNN on the same data would require keeping around the full 100TB, and it would be intractably slow.


Feature engineering is a thing: you don't need to keep the full raw data around for KNN to search in, only compact feature vectors derived from it. This is already used extensively in RAG-type lookup systems, for example.
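A minimal sketch of that idea (the embedding function and class names here are hypothetical stand-ins, not from any real RAG library): each document, whatever its size, is reduced to a small fixed-size vector before indexing, so the nearest-neighbor search never touches the raw corpus.

```python
import zlib
import numpy as np

def embed(text, dim=256):
    """Toy fixed-size embedding: hashed character trigrams.

    A stand-in for a real embedding model; the point is only that
    an arbitrarily large document becomes `dim` floats.
    """
    v = np.zeros(dim)
    for i in range(len(text) - 2):
        v[zlib.crc32(text[i : i + 3].encode()) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

class VectorIndex:
    """Nearest-neighbor lookup over embeddings, RAG-style."""

    def __init__(self):
        self.vecs, self.docs = [], []

    def add(self, doc):
        # Store only the compact feature vector plus the document.
        self.vecs.append(embed(doc))
        self.docs.append(doc)

    def nearest(self, query):
        q = embed(query)
        # Dot product = cosine similarity, since vectors are unit-norm.
        sims = np.stack(self.vecs) @ q
        return self.docs[int(np.argmax(sims))]
```

In a production system the toy `embed` would be a learned embedding model and the linear scan an ANN index, but the storage argument is the same: the index grows with the number of vectors, not with the raw data size.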



