
Made some dynamic GGUFs for those interested! https://huggingface.co/unsloth/granite-4.0-h-small-GGUF (32B Mamba Hybrid + MoE)
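For anyone who wants to try it quickly, here's a rough sketch using llama.cpp's Hugging Face download support (assumes a recent build with the -hf flag; the :Q4_K_M tag is just an example, check the repo for the available quants):

    # Download and run straight from the Hugging Face repo
    # (quant tag is an assumption; pick one that fits your VRAM)
    llama-cli -hf unsloth/granite-4.0-h-small-GGUF:Q4_K_M \
        -ngl 99 -c 8192 -p "Hello"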


Thanks! Any idea why I'm getting such poor performance on these new models? Whether Small or Tiny, on my 24GB 7900 XTX I'm seeing about 8 tokens/s using the latest llama.cpp with the Vulkan backend. Even if it were running 4x faster than that, I'd still be asking why I'm getting so few tokens/s when these models are supposed to bring increased inference efficiency.


Oh I think it's a Vulkan backend issue - someone raised it with me and said the ROCm backend is much faster
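For anyone who wants to test that, a minimal sketch of building llama.cpp against ROCm instead of Vulkan (the cmake flag names have changed across llama.cpp versions, and gfx1100 assumes an RDNA3 card like the 7900 XTX; the model filename is hypothetical):

    # Build llama.cpp with the ROCm/HIP backend
    # GGML_HIP is the current flag; older trees used LLAMA_HIPBLAS
    cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1100
    cmake --build build --config Release -j

    # Then offload all layers to the GPU as usual
    ./build/bin/llama-cli -m granite-4.0-h-small-Q4_K_M.gguf -ngl 99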



