Definitely. But I also tried with a picture of an absurdist cartoon drawn by a family member, complete with (carefully) handwritten text, and the analysis was absolutely perfect.
A simple test: take one of your own photos, something interesting, and put it into an LLM, letting it describe the image in words. Then use an image generator to create the image back from that description. It works like back-translation, image->text->image, and it shows how much the models really understand images and text.
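If you want to script that round-trip, here's a minimal sketch using the OpenAI Python SDK (the model names, file name, and prompt wording are just placeholders, any vision model plus any image generator would do):

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Step 1: image -> text. Ask a vision-capable model to describe the photo.
with open("my_photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

description = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any vision-capable model works
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe this photo in enough detail to recreate it."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
).choices[0].message.content

# Step 2: text -> image. Feed the description to an image generator.
result = client.images.generate(
    model="dall-e-3",  # placeholder image model
    prompt=description,
    size="1024x1024",
)

print("Description used as prompt:\n", description)
print("Regenerated image URL:", result.data[0].url)
```

Then compare the regenerated image with the original to see what survived the round trip.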