There is a diagnostic of training quality, derived from random matrix theory, that relies on the spectral density (the eigenvalue distribution) of each layer's weight correlation matrix. Each layer's spectral density is fit to a truncated power law, and the layer is deemed properly trained if the power-law exponent alpha is just above two.

https://jmlr.org/beta/papers/v22/20-410.html
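
For anyone who wants to poke at the idea, here is a rough sketch of the per-layer computation, not the paper's actual code. Assumptions on my part: the correlation matrix is taken as X = W^T W / N for an N x M weight matrix, the fit is a plain continuous power-law maximum-likelihood estimate rather than the truncated power-law fit described above, and xmin is simply fixed at the median eigenvalue. The names layer_esd and fit_alpha are made up for illustration, and the Gaussian matrix just stands in for a real layer.

```python
import numpy as np


def layer_esd(W):
    """Eigenvalues of X = W^T W / N for an N x M weight matrix (N >= M)."""
    W = np.asarray(W, dtype=float)
    if W.shape[0] < W.shape[1]:
        W = W.T  # orient so that N >= M
    n_rows = W.shape[0]
    singular_values = np.linalg.svd(W, compute_uv=False)
    # Eigenvalues of W^T W are the squared singular values of W.
    return singular_values**2 / n_rows


def fit_alpha(eigs, xmin):
    """MLE of alpha for p(x) ~ x^(-alpha) on the tail x >= xmin.

    Simplification: plain power law with a fixed xmin (assumed),
    not a truncated power-law fit.
    """
    eigs = np.asarray(eigs, dtype=float)
    tail = eigs[eigs >= xmin]
    return 1.0 + tail.size / np.sum(np.log(tail / xmin))


# Stand-in for one layer's weights (hypothetical shape, random values).
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 256))
eigs = layer_esd(W)
alpha = fit_alpha(eigs, xmin=np.median(eigs))
print(f"alpha = {alpha:.2f}")
```

On a real network you would run this over each layer's weight array and scan xmin for the best fit instead of fixing it, since the estimated alpha is sensitive to where the tail is cut.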


