Same for me. I fed it a few requirements and test objectives and its comments were pretty reasonable. With a little specialized training it will probably do better than most systems engineers or testers I know.
Okay, so it generated a response which was “reasonable”.
How do you know it was correct? Because you checked its entire output manually and determined it probably wasn’t too wrong?
So what happens if you now trust it to write firmware for some difficult old-timey hardware that nobody understands anymore? It seems correct, but it was actually just making things up, and the coolant system of the power plant fails and kills 20,000 people.
By trying to run it, usually. It is sometimes wrong, and I amend things. But I’ve had more occasions where I thought I was right and it was wrong, and after a long debugging session I realized I had failed to grok some edge in the language, it was indeed correct, and I learned something new.
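To give a flavor of the kind of edge I mean (a made-up illustration, not the actual case I hit): Java’s Integer cache makes == on boxed integers appear to work for small values and silently fail for larger ones, which is exactly the sort of thing you only learn by running the code and debugging it.

```java
public class IntegerCacheDemo {
    public static void main(String[] args) {
        // Boxed Integers in the range -128..127 come from a shared cache,
        // so == happens to return true because both refer to the same object.
        Integer a = 127, b = 127;
        System.out.println(a == b);      // true (same cached object)

        // Outside the cache range, autoboxing creates distinct objects,
        // and == compares references rather than values.
        Integer c = 128, d = 128;
        System.out.println(c == d);      // false (different objects)

        // equals() compares values and is what you almost always want.
        System.out.println(c.equals(d)); // true
    }
}
```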
But I would suggest not using an LLM to write nuclear reactor control system code, just as I wouldn’t suggest Java.