
Keep in mind that this is a comparison of base models, not chat-tuned models, since Falcon-11B does not have a chat-tuned variant at this time. The chat tuning Meta did seems better than the chat tuning on Gemma.

Regardless, the Gemma 1.1 chat models have been fairly good in my experience, even if I think the Llama3 8B chat model is definitely better.

CodeGemma 1.1 7B is especially underrated based on my testing against other relevant coding models. The CodeGemma 7B base model is one of the best models I've tested for code completion, and the chat model is one of the best I've tested for writing code. Some other models seem to game the benchmarks better, but in real-world use they don't hold up as well as CodeGemma for me. I look forward to seeing how CodeLlama3 does, but it doesn't exist yet.
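If anyone wants to try the completion side locally, here's a minimal sketch of a fill-in-the-middle call. It assumes the ollama Python client and a local codegemma:7b-code tag; the tag name and the FIM token format are my assumptions, so double-check them against the model card before relying on this:

  # Fill-in-the-middle sketch (assumed model tag and FIM token format)
  import ollama

  prefix = "def fibonacci(n):\n    "
  suffix = "\n    return a"

  # CodeGemma-style FIM prompt: prefix, suffix, then ask for the middle
  prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

  # raw=True bypasses the chat template so the FIM tokens reach the model verbatim
  response = ollama.generate(model="codegemma:7b-code", prompt=prompt, raw=True)
  print(response["response"])  # the generated middle section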



The model type is a good point. It's hard to track all the variables in this very fast-paced field.

Thank you for sharing your CodeGemma experience. I haven't found an Emacs setup I'm satisfied with using a local LLM, but it will surely happen one day. Surely.


For me, CodeGemma is super slow. I'd say 3-4 times slower than Llama3. I am also looking forward to CodeLlama3, but I have a feeling Meta can't improve on Llama3 with it. Was there anything official from Meta?


CodeGemma has fewer parameters than Llama3, so it absolutely should not be slower. That sounds like a configuration issue.

Meta originally released Llama2 and CodeLlama, and CodeLlama vastly improved on Llama2 for coding tasks. Llama3-8B is okay at coding, but I think CodeGemma-1.1-7b-it is significantly better than Llama3-8B-Instruct, and possibly a little better than Llama3-70B-Instruct, so there is plenty of room for Meta to improve Llama3 in that regard.

> Was there anything official from Meta?

https://ai.meta.com/blog/meta-llama-3/

"The text-based models we are releasing today are the first in the Llama 3 collection of models."

Just a hint that they will be releasing more models in the same family, and CodeLlama3 seems like a given to me.


I suppose it could be a quantization issue, but both quants are from lmstudio-community. Llama3 does have a different architecture and a larger tokenizer vocabulary, which might explain it.


You should try ollama and see what happens. On the same hardware, with the same q8_0 quantization on both models, I'm seeing 77 tokens/s with Llama3-8B and 72 tokens/s with CodeGemma-7B, which is a very surprising result to me, but they are still very similar in performance.
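For reference, this is roughly how I compare generation speed. It assumes the ollama Python client and q8_0 tags along the lines of llama3:8b-instruct-q8_0; the exact tag names are an assumption, and eval_duration is reported in nanoseconds:

  # Rough tokens/s comparison (assumed model tags)
  import ollama

  for model in ("llama3:8b-instruct-q8_0", "codegemma:7b-instruct-q8_0"):
      r = ollama.generate(model=model, prompt="Write a quicksort in Python.")
      # eval_count = generated tokens, eval_duration = generation time in ns
      tps = r["eval_count"] / (r["eval_duration"] / 1e9)
      print(f"{model}: {tps:.1f} tokens/s")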


You're right, ollama does perform the same on both models. Thanks.



