> I will say that the original Falcon did not perform as well as its benchmark stats indicated
Yeah, I saw that as well. I believe it was undertrained in terms of tokens relative to parameters, because they really just wanted to have a 40B-parameter model (like pre-Chinchilla-optimal scaling).
It's hard to know if there's any special sauce here, but the internet has so far decided "meh" on these models. I think it's an interesting choice to position it as technically competitive. The stats say this one was trained on 5T tokens; for reference, Llama 3 has so far been reported at 15T.
There is no way you get back what you lost in training tokens by expanding the parameter count by 3B.
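For scale, here's a quick back-of-envelope sketch using the Chinchilla heuristic of roughly 20 training tokens per parameter. The 11B/8B parameter counts are my assumption based on the "3B" gap mentioned above; the token figures are the ones quoted in this thread, so treat everything as approximate:

```python
# Rough Chinchilla back-of-envelope. Parameter counts (11B vs 8B) are
# assumptions inferred from the "3B parameters" remark in this thread;
# token counts (5T, 15T) are the figures quoted above.

CHINCHILLA_TOKENS_PER_PARAM = 20  # heuristic from the Chinchilla paper

models = {
    # name: (parameters in billions, training tokens in trillions)
    "Falcon 2 (5T tokens)": (11, 5.0),
    "Llama 3 (15T tokens)": (8, 15.0),
}

for name, (params_b, tokens_t) in models.items():
    # Chinchilla-optimal token budget for this size, in trillions
    optimal_t = params_b * CHINCHILLA_TOKENS_PER_PARAM / 1000
    # tokens actually seen per parameter
    ratio = (tokens_t * 1e12) / (params_b * 1e9)
    print(f"{name}: ~{ratio:.0f} tokens/param "
          f"(Chinchilla-optimal would be only ~{optimal_t:.2f}T tokens)")
```

Both models sit far past the Chinchilla-optimal point (~455 vs ~1875 tokens/param), which is the point: modern open models win by heavily overtraining small models on tokens, and a 3x token deficit swamps a 3B parameter bump.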
If I were in charge of UAE PR and this project, I'd
a) buy a lot more H100s and get the training budget up
b) compete on a regional / messaging / national freedom angle
c) fully open license it
I guess I'm saying I'd copy Zuck's plan, with oil money instead of social money, and play to my base.
Overstating capabilities doesn't give you a lot of benefit outside of your local market, unfortunately.