Asked for a solution of a photographed Ubongo puzzle: https://gemini.google.com/...

dktp · 2025-12-29T13:21:30 1767014490

Deep research, from my experience, will always add lectures.

I'm trying to create a comprehensive list of English standup specials. Seems like a good fit! I've tried numerous times to prompt it "provide a comprehensive list of English standup specials released between 2000 and 2005. The output needs to be a csv of verified specials with the author, release date and special name. I do not want any other lecture or anything else. Providing anything except the csv is considered a failure". Then it creates it's own plan and I go further clarifying to explicitly make sure I don't want lectures...

It goes on to hallucinate a bunch of specials and provide a lecture on "2000 the era of X on standup comedy" (for each year)

I've tried this in 2.5 and 3. Numerous time ranges and prompts. Same result. It gets the famous specials right (usually), hallucinates some info on less famous ones (or makes them up completely) and misses anything more obscure

geon · 2025-12-29T13:39:44 1767015584

I tried asking for a list of the most common gameboy color games not compatible with the original dmg gameboy. Chatgpt would over and over list dmg compatible games instead. I asked it to cross reference lists of dmg games to remove them and it ”reasoned” for a long time before it showed what sources it used for cross references, and then gave me the same list again.

It also insisted on including ”Shantae” in the list, which is expensive specifically because it is uncommon. I eventually forbid it from including the game in the list, and that actually worked, but it would continue mentioning it outside the list.

Absolute garbage.

tessierashpool9 · 2025-12-29T13:31:45 1767015105

I mean, isn't that a little ridiculous? Aren't those language models already solving complicated exam questions and mathematical problems?

geon · 2025-12-29T13:42:26 1767015746

According to the creators, the models are on a phd level of intelligence, but they can’t get the simplest thing right.

tessierashpool9 · 2025-12-29T13:50:19 1767016219

Overselling is only the tip of the iceberg. The real problem is that a lot of managers base their decision to introduce language models into business processes on cutting edge Pro edition demos, but what is, of course, actually used in production is some cheap Nano/Flash/Mini version.

DANmode · 2025-12-29T17:04:50 1767027890

Too easy.

Workaccount2 · 2025-12-29T15:20:09 1767021609

LLM's are bad at anything with images.

There is something fucky about tokenizing images that just isn't as clean as tokenizing text. It's clear that the problem isn't the model being too dumb, but rather that model is not able to actually "see" the image presented. It feels like a lower-performance model looks at the image, and then writes a text description of it for the "solver" model to work with.

To put it another way, the models can solve very high level text-based problems while struggling to solve even low level image problems - even if underneath both problems use a similar or even identical solving frameworks. If you have a choice between showing a model a graph or feeding it a list of (x,y) coordinates, go with the coordinates every time.