gpjt 4 months ago, on: LLM from scratch, part 28 – training a base model ...

OP here: one thing that surprised me in this experiment was that the model trained on the more curated FineWeb-Edu dataset was worse than the one trained on FineWeb. That is very counterintuitive to me.