> I will say that the original Falcon did not perform as well as its benchmark stats indicated
Yeah, I saw that as well. I believe it was undertrained in terms of tokens relative to parameters, because they really just wanted to have a 40B-parameter model (like pre-Chinchilla-optimal scaling).
It's hard to know if there's any special sauce here, but the internet has so far decided "meh" on these models. I think it's an interesting choice to position it as technically competitive. The stats say this one was trained on 5T tokens; for reference, Llama 3 has so far been reported at 15T.
There is no way you get back what you lost in training tokens by expanding the parameter count by 3B.
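For scale, here's a quick back-of-envelope sketch using the Chinchilla heuristic of roughly 20 training tokens per parameter. The 11B/8B parameter counts are my assumption based on the "3B" gap mentioned above; the token figures are the ones quoted in this thread, so treat everything as approximate:

```python
# Rough Chinchilla back-of-envelope. Parameter counts (11B vs 8B) are
# assumptions inferred from the "3B parameters" remark in this thread;
# token counts (5T, 15T) are the figures quoted above.

CHINCHILLA_TOKENS_PER_PARAM = 20  # heuristic from the Chinchilla paper

models = {
    # name: (parameters in billions, training tokens in trillions)
    "Falcon 2 (5T tokens)": (11, 5.0),
    "Llama 3 (15T tokens)": (8, 15.0),
}

for name, (params_b, tokens_t) in models.items():
    # Chinchilla-optimal token budget for this size, in trillions
    optimal_t = params_b * CHINCHILLA_TOKENS_PER_PARAM / 1000
    # tokens actually seen per parameter
    ratio = (tokens_t * 1e12) / (params_b * 1e9)
    print(f"{name}: ~{ratio:.0f} tokens/param "
          f"(Chinchilla-optimal would be only ~{optimal_t:.2f}T tokens)")
```

Both models sit far past the Chinchilla-optimal point (~455 vs ~1875 tokens/param), which is the point: modern open models win by heavily overtraining small models on tokens, and a 3x token deficit swamps a 3B parameter bump.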
If I were in charge of UAE PR and this project, I'd
a) buy a lot more H100s and get the training budget up
b) compete on a regional / messaging / national freedom angle
c) fully open license it
I guess I'm saying I'd copy Zuck's plan, with oil money instead of social money, and play to my base.
Overstating capabilities doesn't give you a lot of benefit outside of your local market, unfortunately.