Keep in mind that this is a comparison of base models, not chat tuned models, since Falcon-11B does not have a chat tuned model at this time. The chat tuning that Meta did seems better than the chat tuning on Gemma.
Regardless, the Gemma 1.1 chat models have been fairly good in my experience, even if I think the Llama3 8B chat model is definitely better.
CodeGemma 1.1 7B is especially underrated relative to the other coding models I've tested. The CodeGemma 7B base model is one of the best models I've tested for code completion, and the chat model is one of the best I've tested for writing code. Some other models seem to game the benchmarks better, but in real-world use they don't hold up as well as CodeGemma for me. I look forward to seeing how CodeLlama3 does, but it doesn't exist yet.
The model type is a good point. It's hard to keep track of all the variables in this very fast-paced field.
Thank you for sharing your CodeGemma experience. I haven't found an Emacs setup I'm satisfied with using a local LLM, but it will surely happen one day. Surely.
For me, CodeGemma is super slow. I'd say 3-4 times slower than Llama3.
I am also looking forward to CodeLlama3, but I have a feeling Meta can't improve on Llama3. Was there anything official from Meta?
CodeGemma has fewer parameters than Llama3, so it absolutely should not be slower. That sounds like a configuration issue.
Meta originally released Llama2 and CodeLlama, and CodeLlama vastly improved on Llama2 for coding tasks. Llama3-8B is okay at coding, but I think CodeGemma-1.1-7b-it is significantly better than Llama3-8B-Instruct, and possibly a little better than Llama3-70B-Instruct, so there is plenty of room for Meta to improve Llama3 in that regard.
I suppose it could be a quantization issue, but both quants are done by lmstudio-community. Llama3 does have a different architecture and a bigger tokenizer, which might explain it.
You should try ollama and see what happens. On the same hardware, with the same q8_0 quantization for both models, I'm seeing 77 tokens/s with Llama3-8B and 72 tokens/s with CodeGemma-7B. That CodeGemma is slower at all surprises me, but the two are still very close in performance, nowhere near 3-4x apart.
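If anyone wants to reproduce this kind of comparison, here's a rough sketch against ollama's HTTP API. It assumes the server is running on the default localhost:11434, and the q8_0 tag names are my guesses, so check what 'ollama list' reports for the models you actually have pulled.

    # Rough tokens/s comparison via ollama's HTTP API.
    # Assumes ollama is serving on the default localhost:11434 and that
    # both model tags below are already pulled -- the exact tag names are
    # guesses, so check the output of 'ollama list' on your machine.
    import requests

    PROMPT = "Write a Python function that reverses a linked list."

    def tokens_per_second(model: str) -> float:
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": PROMPT, "stream": False},
            timeout=600,
        ).json()
        # eval_count is the number of generated tokens; eval_duration is
        # in nanoseconds and excludes model load time, so this measures
        # pure generation speed.
        return resp["eval_count"] / (resp["eval_duration"] / 1e9)

    for model in ("llama3:8b-instruct-q8_0", "codegemma:7b-instruct-q8_0"):
        print(f"{model}: {tokens_per_second(model):.1f} tok/s")

Running each model once before timing helps, since the first request also pays the load cost even though eval_duration itself excludes it.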