Keep in mind that this is a comparison of base models, not chat tuned models, since Falcon-11B does not have a chat tuned model at this time. The chat tuning that Meta did seems better than the chat tuning on Gemma.
Regardless, the Gemma 1.1 chat models have been fairly good in my experience, even if I think the Llama3 8B chat model is definitely better.
CodeGemma 1.1 7B is especially underrated relative to the other coding models I've tested. The CodeGemma 7B base model is one of the best models I've tested for code completion, and the chat model is one of the best I've tested for writing code. Some other models seem to game the benchmarks better, but in real-world use they don't hold up as well as CodeGemma for me. I look forward to seeing how CodeLlama3 does, but it doesn't exist yet.
The model type is a good point. It's hard to keep track of all the variables in this very fast-paced field.
Thank you for sharing your CodeGemma experience. I haven't found an Emacs setup I'm satisfied with using a local LLM, but it will surely happen one day. Surely.
For me, CodeGemma is super slow. I'd say 3-4 times slower than Llama3.
I am also looking forward to CodeLlama3, but I have a feeling Meta can't improve on Llama3. Was there anything official from Meta?
CodeGemma has fewer parameters than Llama3, so it absolutely should not be slower. That sounds like a configuration issue.
Meta originally released Llama2 and CodeLlama, and CodeLlama vastly improved on Llama2 for coding tasks. Llama3-8B is okay at coding, but I think CodeGemma-1.1-7b-it is significantly better than Llama3-8B-Instruct, and possibly a little better than Llama3-70B-Instruct, so there is plenty of room for Meta to improve Llama3 in that regard.
I suppose it could be a quantization issue, but both quants are done by lmstudio-community. Llama3 does have a different architecture and a bigger tokenizer, which might explain it.
You should try ollama and see what happens. On the same hardware, with the same q8_0 quantization for both models, I'm seeing 77 tokens/s with Llama3-8B and 72 tokens/s with CodeGemma-7B. That CodeGemma is slower at all surprises me, but the two are still very close in performance, nowhere near 3-4x apart.
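If anyone wants to reproduce this kind of comparison, here's a rough sketch against ollama's HTTP API. It assumes the server is running on the default localhost:11434, and the q8_0 tag names are my guesses, so check what 'ollama list' reports for the models you actually have pulled.

    # Rough tokens/s comparison via ollama's HTTP API.
    # Assumes ollama is serving on the default localhost:11434 and that
    # both model tags below are already pulled -- the exact tag names are
    # guesses, so check the output of 'ollama list' on your machine.
    import requests

    PROMPT = "Write a Python function that reverses a linked list."

    def tokens_per_second(model: str) -> float:
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": PROMPT, "stream": False},
            timeout=600,
        ).json()
        # eval_count is the number of generated tokens; eval_duration is
        # in nanoseconds and excludes model load time, so this measures
        # pure generation speed.
        return resp["eval_count"] / (resp["eval_duration"] / 1e9)

    for model in ("llama3:8b-instruct-q8_0", "codegemma:7b-instruct-q8_0"):
        print(f"{model}: {tokens_per_second(model):.1f} tok/s")

Running each model once before timing helps, since the first request also pays the load cost even though eval_duration itself excludes it.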