Yeah it's hard to tell. It looks like the input is itself a 3D mesh? So the mode...

Yeah it's hard to tell.

It looks like the input is itself a 3D mesh? So the model is doing "shape completion" (e.g. they show generating a chair from just some legs)... or possibly generating "variations" when the input shape is more complete?

But I guess it's a starting point... maybe you could use another model that does worse quality text-to-mesh as the input and get something more crisp and coherent from this one.