I welcome open models, although the Falcon model is not super open, as noted here. I will say that the original Falcon did not perform as well as its benchmark stats indicated -- it was pushed out as a significant leap forward, and I didn't find it outperformed competitive open models at release.
The PR stating an 11B model outperforms 7B and 8B models 'in the same class' feels like it might be stretching a bit. We'll see -- I'll definitely give this a go for local inference. But, my gut is that finetuned llama 3 8B is probably best in class...this week.
> I will say that the original Falcon did not perform as well as its benchmark stats indicated
Yeah, I saw that as well. I believe it was undertrained in terms of tokens vs. parameters, because they really just wanted to have a 40B-parameter model (this was pre-Chinchilla-optimal thinking).
It's hard to know if there's any special sauce here, but the internet so far has decided "meh" on these models. Positioning it as technically competitive is an interesting choice. Stats say this one was trained on 5T tokens. For reference, Llama 3 was reported at 15T.
There is no way you get back what you lost in training by expanding the parameter count by 3B.
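For a rough back-of-envelope comparison, here's a sketch using the token counts mentioned above and the ~20 tokens-per-parameter heuristic popularized by the Chinchilla work (the ratio is an approximation, and the Falcon figure assumes the announced 11B / 5T numbers):

```python
# Back-of-envelope: training tokens per parameter for the two models
# discussed above. CHINCHILLA_RATIO (~20:1) is a rough rule of thumb,
# not an exact target; modern models deliberately train far past it.
CHINCHILLA_RATIO = 20

models = {
    "Falcon 2 11B": (11e9, 5e12),   # params, tokens (per the announcement)
    "Llama 3 8B":   (8e9, 15e12),   # params, tokens (reported 15T)
}

for name, (params, tokens) in models.items():
    ratio = tokens / params
    print(f"{name}: {ratio:.0f} tokens/param "
          f"(~{ratio / CHINCHILLA_RATIO:.0f}x the Chinchilla heuristic)")
```

The point the numbers make: Llama 3 8B saw roughly 4x the data per parameter, so a 3B parameter bump doesn't close a 10T-token gap.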
If I were in charge of UAE PR and this project, I'd
a) buy a lot more H100s and get the training budget up
b) compete on a regional / messaging / national freedom angle
c) fully open license it
I guess I'm saying I'd copy Zuck's plan, with oil money instead of social money and play to my base.
Overstating capabilities doesn't buy you much outside your local market, unfortunately.