It looks like the input is itself a 3D mesh? So the model is doing "shape completion" (e.g. they show generating a chair from just some legs)... or possibly generating "variations" when the input shape is more complete?
But I guess it's a starting point... maybe you could use another model that does worse quality text-to-mesh as the input and get something more crisp and coherent from this one.
It looks like the input is itself a 3D mesh? So the model is doing "shape completion" (e.g. they show generating a chair from just some legs)... or possibly generating "variations" when the input shape is more complete?
But I guess it's a starting point... maybe you could use another model that does worse quality text-to-mesh as the input and get something more crisp and coherent from this one.