It would be interesting to hear more about the compute used to build this.

mmoskal · on May 13, 2024

From model card:

Falcon2-11B was trained on 1024 A100 40GB GPUs for the majority of the training, using a 3D parallelism strategy (TP=8, PP=1, DP=128) combined with ZeRO and Flash-Attention 2.

Doesn't say how long though.

kouteiheika · on May 13, 2024

It does say how long on Huggingface:

> The model training took roughly two months.