I spent a few hours trying to get this to work, and I couldn’t get it to produce usable results on anything except the training data, even with very simple drawings.
I noticed in the GitHub that they mention it is only around 60% reliable even on their own training data, but the image shown on the front page feels pretty misleading. I made 10 images that were very similar in complexity to the examples shown, and even after running it around 50 times on each image, not a single one worked correctly. In the rare cases where it produced something, the output was completely wrong.
This seems pretty misleading in its current state and definitely needs more work.
When skimming a research paper in addition to abstract don't forget to read "limitations and future work" and "conclusions".
In the limitation section they are quite direct with it: "images used .. are mostly isometric and noise-free CAD-images" and "limited CAD vocabulary used in this study needs to be extended by including more sophisticated CAD tokens such as revolve operation, edge operation (e.g., fillets/chamfers)". It currently supports only series of pads (can be subtractive). Many simple beginner exercises of seemingly similar complexity don't fit those constraints. Some of the parts shown could be more naturally created using wider set of tools, but technically can be created using only pads.
So even if you tried with clean screenshot of simple 3d model, it will likely fail if camera settings aren't right, and it will fail if model can't be represented using series of pad operations. Anything containing spheres, cones, nearly all lathe parts, fillets which can't be included in 2d sketches will fail. In theory arbitrary extrusion angles are supported, but all examples showed only axis aligned parts.
That said I wouldn't be surprised if it failed even if you considered exact limitations not just similar complexity in your attempts.
Assuming images in the googledrive are the training data, "mostly isometric and noise-free CAD images" is a bit of understatement. All of them are at very specific angles using and single style. Specific solid gray infill color for "images", and white infill with weird shading around perimeter for "sketches". Both with white background and pixelated non antialised 1px lines. No reason to expect it capable of processing anything which doesn't have exactly same visual style. For practical product that would have to be solved but that wasn't really the point of this research. All the style transference papers have show that it's more or less solvable problem, but for a paper exploring what model architecture and 3d representation could work best for AI cad it seems like unnecessary distraction that would only bloat the training costs and time. Most annoying part is that it makes hard to test the model with your own inputs.
I’m also confused by their examples. All of them seem to be perfect renders/exports from 3D models — this is not the use case where I see it the most useful. Making a parametrized CAD model out of a hand-drawn sketch — yes, please
I noticed in the GitHub that they mention it is only around 60% reliable even on their own training data, but the image shown on the front page feels pretty misleading. I made 10 images that were very similar in complexity to the examples shown, and even after running it around 50 times on each image, not a single one worked correctly. In the rare cases where it produced something, the output was completely wrong.
This seems pretty misleading in its current state and definitely needs more work.