Thanks! Any idea why I'm getting such poor performance on these new models? Whether Small or Tiny, on my 24 GB 7900 XTX I'm seeing around 8 tokens/s with the latest llama.cpp using the Vulkan backend. Even if it were running 4x faster than that, I'd still be asking why I'm getting so few tokens/s, since these models are supposed to bring increased inference efficiency.
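
For context, here's the back-of-envelope math behind my expectation (a rough sketch assuming token generation is memory-bandwidth bound; the 960 GB/s figure is the 7900 XTX's spec sheet bandwidth, and the 14 GB model size is just an illustrative quantized-weights figure, not an exact measurement):

```python
# Rough upper bound on decode speed, assuming generation is memory-bandwidth
# bound: producing each new token requires reading all model weights once.
# 960 GB/s is the 7900 XTX spec; model_size_gb is an assumed placeholder --
# substitute the actual GGUF file size.

def max_decode_tok_per_s(model_size_gb: float, bandwidth_gb_per_s: float = 960.0) -> float:
    """Theoretical ceiling on tokens/s for bandwidth-bound decoding."""
    return bandwidth_gb_per_s / model_size_gb

# Example: a ~14 GB quantized model should top out near 960 / 14 ~= 68 tok/s.
print(f"{max_decode_tok_per_s(14.0):.0f} tok/s theoretical ceiling")
```

By that estimate, even a fully bandwidth-bound run should land far above 8 tokens/s, which is why I suspect something else is going on (e.g. layers not actually being offloaded to the GPU, though that's just a guess on my part).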