It's nice to see some high-performance linear algebra code done in a modern language! Would love to see more!
You might be interested in ExponentialUtilities.jl then. Julia has a really unique ability to make high performance linear algebra look like the math. See https://github.com/SciML/ExponentialUtilities.jl (specifically src/kiops.jl and src/krylov_phiv.jl) for an example of a good matrix exponential operator in ~600 lines of code+comments.
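For anyone who hasn't seen the idea behind a Krylov matrix exponential operator, here is a minimal sketch of it: build an orthonormal basis V of the Krylov subspace with Arnoldi, exponentiate the small projected matrix H, and lift the result back, so that exp(A)v ≈ ‖v‖ · V · exp(H) · e1. This is plain Arnoldi plus a naive Taylor-based exponential for the small matrix, not the KIOPS algorithm and not the ExponentialUtilities.jl API. The names `krylov_expv` and `small_expm` are made up for illustration, and it is written in dependency-free Python with dense lists rather than Julia.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    return [dot(row, x) for row in A]

def matmul(X, Y):
    cols = list(zip(*Y))
    return [[dot(row, col) for col in cols] for row in X]

def small_expm(H, terms=30, squarings=10):
    """exp(H) for a small dense matrix: scale down, Taylor series, square back up."""
    m = len(H)
    scale = 2.0 ** squarings
    S = [[h / scale for h in row] for row in H]
    E = [[float(i == j) for j in range(m)] for i in range(m)]
    term = [row[:] for row in E]
    for k in range(1, terms):
        # term holds S^k / k!
        term = [[x / k for x in row] for row in matmul(term, S)]
        E = [[e + t for e, t in zip(er, tr)] for er, tr in zip(E, term)]
    for _ in range(squarings):
        E = matmul(E, E)
    return E

def krylov_expv(A, v, m):
    """Approximate exp(A) v from an m-dimensional Krylov subspace (v must be nonzero)."""
    beta = math.sqrt(dot(v, v))
    V = [[x / beta for x in v]]          # orthonormal basis vectors, stored as rows
    H = [[0.0] * m for _ in range(m)]    # projected (upper Hessenberg) matrix
    for j in range(m):
        w = matvec(A, V[j])
        for i in range(j + 1):           # modified Gram-Schmidt
            H[i][j] = dot(V[i], w)
            w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
        h = math.sqrt(dot(w, w))
        if j + 1 == m or h < 1e-12:      # basis is full, or an invariant subspace was found
            m = j + 1
            break
        H[j + 1][j] = h
        V.append([wk / h for wk in w])
    Hm = [row[:m] for row in H[:m]]
    col = [row[0] for row in small_expm(Hm)]   # exp(Hm) e1
    return [beta * sum(col[i] * V[i][k] for i in range(m)) for k in range(len(v))]
```

Note that the basis V costs m vectors of length n on top of the matrix itself, which is the memory footprint the cache question in this thread is about.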
When did you look, and what tooling was missing? Julia's package manager, Pkg, was pretty heavily inspired by cargo, and IMO it does a very good job. Also, in the past 2-3 years Juliaup (modeled after rustup) has become the primary way of installing and managing Julia versions.
Is your approach specific to the case where the matrix fits inside cache, but the memory footprint of the basis causes performance issues? Most of the communication-avoiding Krylov work I've seen, e.g. [0, 1], seems to assume that if the matrix fits, so will its basis, and so ends up partitioning row-wise for the 'large matrix' case. I'm curious what your application is.
[0] https://www2.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-..., e.g. page 25.
[1] https://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-...