
> I will say that the original Falcon did not perform as well as its benchmark stats indicated

Yeah, I saw that as well. I believe it was undertrained in terms of tokens vs. parameters, because they really just wanted to have a 40B-parameter model (like pre-Chinchilla-optimal scaling).
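
For context: the Chinchilla result (Hoffmann et al., 2022) works out to roughly 20 training tokens per parameter for compute-optimal training. A back-of-the-envelope sketch, where the 20:1 ratio is the only assumption:

    # Rough Chinchilla rule of thumb: compute-optimal training uses
    # ~20 tokens per parameter (Hoffmann et al., 2022). Illustrative only.
    TOKENS_PER_PARAM = 20

    def chinchilla_optimal_tokens(params: float) -> float:
        """Approximate compute-optimal token count for a model of this size."""
        return TOKENS_PER_PARAM * params

    print(f"~{chinchilla_optimal_tokens(40e9) / 1e12:.1f}T tokens")  # ~0.8T for 40B params

So a 40B model "wants" on the order of 0.8T tokens just to be compute-optimal; picking the parameter count first and backing into a token budget is the pre-Chinchilla pattern.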



It's hard to know if there's any special sauce here, but the internet so far has decided "meh" on these models. I think it's an interesting choice to position it as technically competitive. The stats say this one was trained on 5T tokens; for reference, Llama 3 was reportedly trained on 15T.

There is no way you get back what you lost in training data by expanding the parameter count by 3B.
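
One way to sanity-check that: the Chinchilla parametric loss fit, L(N, D) = E + A/N^alpha + B/D^beta. The constants below are the published fit from Hoffmann et al. (2022), made on a different data mix, so treat the outputs as directional rather than predictive; the 8B-vs-11B framing is my assumption about the comparison:

    # Chinchilla parametric loss fit: L(N, D) = E + A/N**ALPHA + B/D**BETA.
    # Constants are the published Hoffmann et al. (2022) fit; they come from
    # a different data distribution, so the outputs are directional only.
    E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

    def predicted_loss(params: float, tokens: float) -> float:
        """Predicted pretraining loss for N parameters and D training tokens."""
        return E + A / params**ALPHA + B / tokens**BETA

    print(f"8B params / 15T tokens: {predicted_loss(8e9, 15e12):.3f}")  # ~1.95
    print(f"11B params / 5T tokens: {predicted_loss(11e9, 5e12):.3f}")  # ~1.96

By this fit, the extra 3B parameters don't come close to recovering what the missing 10T tokens cost.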

If I were in charge of UAE PR and this project, I'd:

a) buy a lot more H100s and get the training budget up

b) compete on a regional / messaging / national freedom angle

c) fully open license it

I guess I'm saying I'd copy Zuck's plan, with oil money instead of social-media money, and play to my base.

Overstating capabilities doesn't buy you much outside your local market, unfortunately.




