ubercow13 4 months ago \| on: I failed to recreate the 1996 Space Jam website wi...

Well, I don't know, but many LLMs are multimodal and understand images. You can upload videos to Gemini and they're tokenised and fed into the LLM. If some programming blog post has a screenshot showing the result of some UI code, why wouldn't that be scraped and used for training? Is there some reason that wouldn't be possible?