
I think most of the SOTA models could handle this, but you'd probably get better results using a pipeline:

1. Reduce article to a synopsis using an LLM

2. Generate 4-5 varying description prompts from the synopsis

3. Feed the prompts to an imagegen model
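A rough Python sketch of the three steps above, assuming each stage is just a callable you plug a model into. All names here (`illustrate_article`, `fake_summarize`, `fake_make_prompts`, `fake_render`) are hypothetical. The `fake_*` functions are toy stand-ins so it runs offline, and none of this reflects any particular vendor's SDK:

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stage signatures: each stage is a plain callable so any
# LLM or image model can be swapped in. Not taken from any real SDK.
Summarize = Callable[[str], str]
PromptGen = Callable[[str, int], List[str]]
ImageGen = Callable[[str], bytes]


@dataclass
class PipelineResult:
    synopsis: str
    prompts: List[str]
    images: List[bytes]


def illustrate_article(
    article: str,
    summarize: Summarize,
    make_prompts: PromptGen,
    render: ImageGen,
    n_prompts: int = 4,
) -> PipelineResult:
    # Step 1: condense the article into a short synopsis.
    synopsis = summarize(article)
    # Step 2: derive several differently worded image descriptions from the synopsis.
    prompts = make_prompts(synopsis, n_prompts)
    # Step 3: send each prompt to the image model independently.
    images = [render(p) for p in prompts]
    return PipelineResult(synopsis, prompts, images)


# Toy stand-ins so the sketch runs without any API; replace with real model calls.
def fake_summarize(article: str) -> str:
    # Pretend "synopsis" = first sentence of the article.
    return article.split(".")[0].strip()


def fake_make_prompts(synopsis: str, n: int) -> List[str]:
    # Vary the prompts by cycling through a few art styles.
    styles = ["watercolor", "photorealistic", "line art", "isometric", "oil painting"]
    return [f"{styles[i % len(styles)]} illustration of: {synopsis}" for i in range(n)]


def fake_render(prompt: str) -> bytes:
    # Pretend the "image" is just the prompt bytes.
    return prompt.encode("utf-8")


if __name__ == "__main__":
    result = illustrate_article(
        "A city replaces its buses with bikes. Ridership doubled.",
        fake_summarize,
        fake_make_prompts,
        fake_render,
    )
    for p in result.prompts:
        print(p)
```

Keeping each stage separate means you can inspect or hand-edit the synopsis and prompts before spending anything on image generation.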

Though I'd wager that gpt-image-1 (the one in ChatGPT), being multimodal, could probably manage it as well.