Personally I've found it faster to build with Mongo because you don't need to worry about schemas. You get 16 MB per document, and you can work out your downstream processing later, e.g. clean up and serve to Postgres, a file, wherever. This data is a big data dump that's feeding ML models, so relational stuff is not that important.
I used to build personal projects like this, but since Postgres got JSONB support I haven't found any reason not to just start with Postgres. There are usually a couple of tables/columns you want a proper schema for, and having it all in Postgres to begin with makes it much easier to migrate the schemaless JSONB blobs later on.
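
To make that concrete, here is a rough sketch of the pattern: a few typed columns next to a JSONB catch-all, plus a helper that builds the SQL to promote one JSON key to a real column later. The table name `events`, its columns, and the `promote_key` helper are all made up for illustration; the script only builds and prints SQL strings, so you would still run them through your own Postgres client.

```python
# Hypothetical table: typed columns for the fields you already care about,
# and a JSONB column for everything that has no schema yet.
CREATE_EVENTS = """
CREATE TABLE events (
    id          bigserial   PRIMARY KEY,
    source      text        NOT NULL,
    received_at timestamptz NOT NULL DEFAULT now(),
    payload     jsonb       NOT NULL
);
"""


def promote_key(table, key, sql_type):
    """Return SQL statements that copy payload->>key into a new typed column.

    Illustrative helper, not a library API. Names are interpolated directly
    into the SQL text, so pass only trusted constants, never user input.
    """
    return [
        # Add the new, properly typed column.
        f"ALTER TABLE {table} ADD COLUMN {key} {sql_type};",
        # ->> extracts the value as text; the cast converts it to the target type.
        f"UPDATE {table} SET {key} = (payload->>'{key}')::{sql_type};",
    ]


if __name__ == "__main__":
    print(CREATE_EVENTS)
    for statement in promote_key("events", "label", "text"):
        print(statement)
```

The point of the helper is that the migration is just an `ALTER TABLE` and an `UPDATE` inside the same database, with no export/import step between two systems.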