
Same here; I guess it's a complex function of hardware and what you're trying to do. Personally, I ended up writing my own autodiffing tensor library in C++, because all existing solutions had abysmal performance on my problem (lots of local updates in large tensors). The speedup is >50x compared to TF, PyTorch, Julia, and JAX.
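
To make that concrete, here's a minimal sketch of the idea (invented for illustration, not my actual library; `Tensor`, `Tape`, and `axpy_at` are made-up names): with a hand-rolled reverse-mode tape, a local update records only the touched indices, instead of the whole-tensor scatter op a framework graph would emit.

    // Sketch: reverse-mode tape where a local update y[i] += w * x[j]
    // costs O(1) forward and O(1) on the tape, not O(n) whole-tensor work.
    #include <cstddef>
    #include <functional>
    #include <vector>

    struct Tensor {
        std::vector<double> value, grad;
        explicit Tensor(std::size_t n) : value(n, 0.0), grad(n, 0.0) {}
    };

    struct Tape {
        std::vector<std::function<void()>> ops;  // backward closures

        void axpy_at(Tensor& y, std::size_t i, double w,
                     Tensor& x, std::size_t j) {
            y.value[i] += w * x.value[j];
            // Record only the two touched indices for the backward pass.
            ops.push_back([&y, i, w, &x, j] { x.grad[j] += w * y.grad[i]; });
        }

        void backward() {  // replay recorded ops in reverse order
            for (auto it = ops.rbegin(); it != ops.rend(); ++it) (*it)();
        }
    };

    int main() {
        Tensor x(1000000), y(1000000);   // large tensors
        Tape tape;
        x.value[42] = 3.0;
        tape.axpy_at(y, 7, 2.0, x, 42);  // local update: y[7] += 2 * x[42]
        y.grad[7] = 1.0;                 // seed d(out)/d(y[7])
        tape.backward();                 // now x.grad[42] == 2.0
    }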


> Personally I ended up writing my own autodiffing tensor library in C++

But this won't be a general library, right? You must have included only a certain subset of the functions of TF or PyTorch or whatever. Autodiffing is also included in certain proprietary libraries, like the ones from NAG. But I doubt it's possible to achieve a 50x speedup without compromising on functionality.


Of course it's a herculean task to write a library with that many features. But I don't think that's the issue; it's more that the devs of TF can't possibly optimize for every use case. For my part, I knew what kinds of ops I needed, so I could focus on making those as fast as possible.


> lots of local updates in large tensors

Does that mean you have efficient "CRUD" operators? This interests me because I'm building a relational language and toyed with the idea of using tensors as the "table" structure, but dropped it because of updates...
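
Roughly what I mean by the update problem, as a toy contrast (names invented): with immutable tensors, as most framework graphs assume, a single-cell UPDATE copies the whole table, while a mutable tensor does it in O(1).

    // Toy contrast: pure (copying) vs. in-place update of one "cell"
    // in a table stored as a flat tensor.
    #include <cstddef>
    #include <vector>

    using Table = std::vector<double>;  // flattened rows x cols

    // Functional update: O(n) copy per write; what an immutable
    // tensor representation forces on you.
    Table update_pure(const Table& t, std::size_t cell, double v) {
        Table out = t;  // full copy of the table
        out[cell] = v;
        return out;
    }

    // In-place update: O(1), but the engine must then cope with mutation.
    void update_inplace(Table& t, std::size_t cell, double v) {
        t[cell] = v;
    }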



