
It's easy to make something work once the example goes from being outside the training data to inside it.


Definitely. But I also tried with a picture of an absurdist cartoon drawn by a family member, complete with (carefully) handwritten text, and the analysis was absolutely perfect.


A simple test: take one of your own photos, something interesting, put it into an LLM, and let it describe the image in words. Then use an image generator to create the image back from that description. It works like back-translation: image -> text -> image. It shows how much the models really understand about images and text.
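
For anyone who wants to try it, here's a minimal sketch of that round-trip using the OpenAI Python SDK. The model names ("gpt-4o", "dall-e-3"), the file name, and the prompt wording are my assumptions, not something from this thread:

    # Sketch of the image -> text -> image round-trip test.
    # Assumes OPENAI_API_KEY is set in the environment.
    import base64
    from openai import OpenAI

    client = OpenAI()

    # Step 1: image -> text. Ask a vision model to describe the photo.
    with open("my_photo.jpg", "rb") as f:  # assumed file name
        image_b64 = base64.b64encode(f.read()).decode()

    description = client.chat.completions.create(
        model="gpt-4o",  # assumed vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this photo in enough detail to recreate it."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    ).choices[0].message.content

    # Step 2: text -> image. Feed the description to an image generator.
    result = client.images.generate(
        model="dall-e-3",  # assumed generator model
        prompt=description,
    )

    print("Description:", description)
    print("Reconstructed image URL:", result.data[0].url)

Comparing the original photo with the regenerated one makes it obvious what survived the text bottleneck and what got lost.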


I wouldn't blame a machine for failing at something that at first glance looks like an optical illusion...



