
Made some dynamic GGUFs for those interested! https://huggingface.co/unsloth/granite-4.0-h-small-GGUF (32B Mamba Hybrid + MoE)
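For anyone who wants to try it quickly, here's a rough sketch using llama.cpp's Hugging Face download support (assumes a recent build with the -hf flag; the :Q4_K_M tag is just an example, check the repo for the available quants):

    # Download and run straight from the Hugging Face repo
    # (quant tag is an assumption; pick one that fits your VRAM)
    llama-cli -hf unsloth/granite-4.0-h-small-GGUF:Q4_K_M \
        -ngl 99 -c 8192 -p "Hello"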


Thanks! Any idea why I'm getting such poor performance on these new models? Whether Small or Tiny, on my 24GB 7900 XTX I'm seeing about 8 tokens/s using the latest llama.cpp with the Vulkan backend. Even if it were running 4x faster than that, I'd still be asking why I'm getting so few tokens/s when these models are supposed to bring increased inference efficiency.


Oh I think it's a Vulkan backend issue - someone raised it with me and said the ROCm backend is much faster
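For anyone who wants to test that, a minimal sketch of building llama.cpp against ROCm instead of Vulkan (the cmake flag names have changed across llama.cpp versions, and gfx1100 assumes an RDNA3 card like the 7900 XTX; the model filename is hypothetical):

    # Build llama.cpp with the ROCm/HIP backend
    # GGML_HIP is the current flag; older trees used LLAMA_HIPBLAS
    cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1100
    cmake --build build --config Release -j

    # Then offload all layers to the GPU as usual
    ./build/bin/llama-cli -m granite-4.0-h-small-Q4_K_M.gguf -ngl 99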



