I spent the last few weeks exploring whether AI systems could benefit from generating video predictions before making decisions—like how humans mentally simulate "what happens if I pour this coffee?" before acting.
The idea: Show an AI an image, ask "what happens if I push this?", have it generate a video prediction, then compare that prediction to reality. If the prediction looks wrong, maybe the AI could catch its own mistakes.
The result: Current models can't do this. But I learned some interesting things along the way.
What I tested:
- 7 different architectures for predicting future video frames from VLM latent space
- Whether perceptual similarity (LPIPS) between predicted and actual video correlates with correctness
- Self-correction loops where the model gets feedback on its predictions
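The first item above boils down to one question: does a learned predictor beat the trivial baseline of repeating the current frame? Here's a minimal NumPy sketch of that check. Everything here is illustrative (MSE stands in for the perceptual metrics used in the actual experiments, and the function names are mine, not from the repo):

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two frame arrays of shape (H, W, C)."""
    return float(np.mean((a - b) ** 2))

def beats_copy_baseline(current, predicted, actual_next):
    """A frame predictor only carries signal if its error against the
    true next frame is lower than the error of repeating the current
    frame unchanged (the "copy baseline")."""
    return mse(predicted, actual_next) < mse(current, actual_next)

# Toy scene: a single bright pixel that moves one step to the right.
current = np.zeros((4, 4, 3)); current[0, 0] = 1.0
actual = np.zeros((4, 4, 3)); actual[0, 1] = 1.0

perfect = actual.copy()   # a predictor that nails the motion
lazy = current.copy()     # a predictor that just echoes its input

print(beats_copy_baseline(current, perfect, actual))  # True
print(beats_copy_baseline(current, lazy, actual))     # False: it ties the baseline, never beats it
```

Finding 1 below says every architecture tested landed on the wrong side of this comparison.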
Key findings:
1. VLMs can't predict the future – Every architecture I tried performed worse than just copying the current frame as the "prediction." The model understands what's in an image but can't predict what will change.
2. Visual similarity ≠ semantic correctness – This one surprised me. Wrong predictions often looked MORE similar to reality than correct ones (LPIPS correlation: 0.106). You can't use "does it look right?" to catch mistakes.
3. Some things worked – Hybrid encoders (DINOv2 + VLM) preserve spatial information that VLMs lose. VLMs understand generated video well (93% semantic retention). Small adapters (10M params) work better than large ones (100M).
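Finding 2 can be checked in a few lines: correlate a perceptual distance with binary correctness labels (point-biserial correlation, which is just Pearson correlation against a 0/1 variable). The numbers below are made up for illustration, not the paper's data:

```python
import numpy as np

# Hypothetical per-example results: perceptual distance (e.g. LPIPS)
# between predicted and actual video, and whether the prediction was
# semantically correct.
distance = np.array([0.21, 0.35, 0.18, 0.40, 0.22, 0.33, 0.19, 0.38])
correct = np.array([1, 0, 0, 1, 1, 0, 0, 1])

# Pearson correlation against the binary labels (point-biserial).
r = np.corrcoef(distance, correct)[0, 1]
print(f"correlation: {r:.3f}")
```

A correlation near zero, like the 0.106 reported above, means no threshold on "does it look right?" will reliably separate correct from incorrect predictions.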
I'm releasing this as a benchmark proposal. Video generation is improving fast—capabilities that don't exist today might emerge in future models. Seems worth tracking.
Links:
- Demo video: https://youtu.be/YJxDt_zCrUI
- Code + paper: https://github.com/a1j9o94/foresight
- Live demo: https://foresight-demo-kappa.vercel.app
Built with Qwen2.5-VL, LTX-Video, Modal (GPUs), and the Something-Something v2 dataset.
Happy to answer questions about the experiments or methodology.
I know I'm an outlier on HN, but I really don't care if AI was used to write something I'm reading. I just care whether the ideas are good and clear. And if we're talking about work output, 99% of what people were putting out before AI wasn't particularly good. In my genuine experience, AI's output is better than things people I worked with would spend hours or days on.
I feel like more time is wasted trying to catch your coworkers using AI than just engaging with the plan. If it's a bad plan, say that and make sure your coworker is held accountable for presenting a bad plan. But it shouldn't matter if he gave 5 bullets to ChatGPT that expanded it to a full page with a detailed plan.
>But it shouldn't matter if he gave 5 bullets to ChatGPT that expanded it to a full page with a detailed plan.
The coworker should just give me the five bullet points they put into ChatGPT. I can trivially dump it into ChatGPT or any other LLM myself to turn it into a "plan."
I feel the same way. If all one is doing is feeding stuff into AI without doing any actual work themselves, just include the prompt and the workflow for how you got the AI to spit this content out. It might be useful for others to learn how to use these LLMs, and it shows your train of thought.
I had a coworker schedule a meeting to discuss the technical design of an upcoming feature. I didn't have much time, so I only checked the research doc moments before the meeting: it was 26 pages long with over 70 references, of which 30+ were Reddit links. This wasn't a huge architectural decision, so I was dumbfounded; he had barely edited the document to his own preferences. The actual meeting was maybe the most awkward I've ever attended: we were expected to weigh in on the options presented, but no one had opinions on the whole thing, not even the author. It was just too much of an AI document to even process.
If ChatGPT can make a good plan for you from 5 bullet points, why was there a ticket for making a plan in the first place? If it makes a bad plan then the coworker submitted a bad plan and there's already avenues for when coworkers do bad work.
How do you know the coworker didn't bully the LLM for 20 minutes to get the desired output? It isn't often trivial to one-shot a task unless it's very basic and you don't care about details.
Asking for the prompt is also far more hostile than your coworker providing LLM-assisted word docs.
Honestly if you have a working relationship/communication norms where that's expected, I agree just send the 5 bullets.
In most of my work contexts, people want more formal documents with clean headings and titles, and detailed risks even if they're the same risks we've put on every project.
Agreed! I've reached the conclusion that a lot of people have completely misunderstood why we work.
It's all about the utility provided. That's the only thing that matters in the end.
Some people seem to think work is an exchange of suffering for money, and omg some colleagues are not suffering as much as they're supposed to!
The plan (or any other document) has to be judged on its own merits. Always. It doesn't matter how it was written. It really doesn't.
Does that mean AI usage can never be problematic? Of course not! If a colleague feeds their tasks to an LLM, never does anything to verify quality, and frequently submits poor-quality documents for colleagues to verify and correct, that's obviously bad. But think about it: a colleague who submits poor-quality work is problematic regardless of whether they wrote it themselves or had an AI do it.
A good document is a good document. And a bad one is a bad one. It doesn't matter if it was written using vim, Emacs, or Gemini 3.
Ever since some non-native-English-speaking people within my company started using LLMs, I've found it much easier to interact and communicate with them in Jira tickets. The LLM conveys what they intend to say more clearly and comprehensively. It's obviously an LLM that's writing but I'm overall more productive and satisfied by talking to the LLM.
If it's fiction writing or otherwise an attempt at somewhat artful prose, having an LLM write for you isn't cool (both due to stolen valor and the lame, trite style all current LLMs output), but for relatively low-stakes white collar job tasks I think it's often fine or even an upgrade. Definitely not always, and even when it's "fine" the slopstyle can be grating, but overall it's not that bad. As the LLMs get smarter it'll be less and less of an issue.
> I just care whether or not the ideas are good and clear
That's the thing. It actually really matters whether the ideas presented are coming from a coworker or from an LLM.
I've seen way too many scenarios where I'm asking a coworker if we should do X or Y, and all I get is a useless wall of spewed text, with a complete disregard for the project and circumstances at hand. I need YOUR input, from YOUR head, right now. If I could ask Copilot, I'd do that myself, thanks.
I would argue that's just your coworker giving you a bad answer. If you prompt a chatbot with the right business context, look at what it spits out, and layer in your judgement before you hit send, then it's fine if the AI typed it out.
If they answer your question with irrelevant context, then that's the problem, not that it was AI.
Not the person you're responding to, but I think there's a non-trivial argument to make that our thoughts are just autocomplete: what is the next most likely word, based on what you're seeing? Ever watched a movie and guessed the plot? Or read a comment and known where it was going by the end?
And I know not everyone thinks in a literal stream of words all the time (I do), but I would argue that those people's brains are just using a different "token".
There's no evidence for it, nor any explanation for why it should be the case from a biological perspective. Tokens are an artifact of computer science that have no reason to exist inside humans. Human minds don't need a discrete dictionary of reality in order to model it.
Prior to LLMs, there was never any suggestion that thoughts work like autocomplete, but now people are working backwards from that conclusion based on metaphorical parallels.
There actually was quite a lot of suggestion that thoughts work like autocomplete. A lot of it was just considered niche, e.g. because the mathematical formalisms were beyond what most psychologists or even cognitive scientists would deem useful.
Predictive coding theory was formalized around 2010 and traces its roots back to theories by Helmholtz from the 1860s.
Predictive coding theory postulates that our brains are just very strong prediction machines, with multiple layers of predictive machinery, each predicting the next.
There are so many theories regarding human cognition that you can certainly find something that is close to "autocomplete". A Hopfield network, for example.
The roots of predictive coding theory extend back to the 1860s.
Natalia Bekhtereva was writing about compact concept representations in the brain akin to tokens.
> There are so many theories regarding human cognition that you can certainly find something that is close to "autocomplete"
Yes, you can draw interesting parallels between anything when you're motivated to do so. My point is that this isn't parsimonious reasoning, it's working backwards from a conclusion and searching for every opportunity to fit the available evidence into a narrative that supports it.
> Roots of predictive coding theory extend back to 1860s.
This is just another example of metaphorical parallels overstating meaningful connections. Just because next-token-prediction and predictive coding have the word "predict" in common doesn't mean the two are at all related in any practical sense.
You, and OP, are taking an analogy way too far. Yes, humans have the mental capability to predict words similar to autocomplete, but obviously this is just one out of a myriad of mental capabilities typical humans have, which work regardless of text. You can predict where a ball will go if you throw it, you can reason about gravity, and so much more. It’s not just apples to oranges, not even apples to boats, it’s apples to intersubjective realities.
I feel the link between humans and autocomplete is deeper than just an ability to predict.
Think about an average dinner party conversation. Person A talks, person B thinks about something to say that fits, person C gets an association from what A and B said and speaks...
And what are people most interested in talking about? Things they read or watched during the week perhaps?
Conversations wouldn't have had to be like this. Imagine a species from another planet that had a "conversation" where each party simply communicated whatever it most needed to say, or was most beneficial to say, and said it. Where the chance of bringing up a topic had no correlation at all with what the previous person said (why should it?) or with what was in the newspapers that week. And which had no "interest" in the association game.
Humans saying they are not driven by associations is to me a bit like fish saying they don't notice the water. At least MY thought process works like that.
I don't think I am. To be honest, as ideas go, when I swirl it around that empty head of mine, this one ain't half bad given how much immediate resistance it generates.
Other posters have already noted other reasons, but I will note that you said "similar to autocomplete, but obviously", which suggests you recognize the shape and immediately dismiss it as not the same, because the shape you know in humans is much more evolved and can do more things. Ngl man, as arguments go, that sounds to me like a supercharged autocomplete that was allowed to develop over a number of years.
Fair enough. To someone with a background in biology, it sounds like an argument made by a software engineer with no actual knowledge of cognition, psychology, biology, or any related field, jumping to misled conclusions driven only by shallow insights and their own experience in computer science.
Or in other words, this thread sure attracts a lot of armchair experts.
> with no actual knowledge of cognition, psychology, biology
... but we also need to be careful with that assertion, because humans do not understand cognition, psychology, or biology very well.
Biology is the furthest developed, but it turns out to be like physics -- superficially and usefully modelable, but fundamental mysteries remain. We have no idea how complete our models are, but they work pretty well in our standard context.
If computer engineering is downstream from physics, and cognition is downstream from biology ... well, I just don't know how certain we can be about much of anything.
> this thread sure attracts a lot of armchair experts.
"So we beat on, boats against the current, borne back ceaselessly into our priors..."
Look up predictive coding theory. According to that theory, what our brain does is in fact just autocomplete.
However, what it is doing is layered autocomplete on itself. I.e. one part is trying to predict what the other part will be producing and training itself on this kind of prediction.
What emerges from this layered level of autocompletes is what we call thought.
Having one tool that you can use to do all of these things makes a big difference. If I'm a financial analyst at a company, I don't need to know how to implement and use 5 different specialized ML models; I can just ask one tool (which can still use tools on the backend to complete the task efficiently).
I'm sorry, this may come across as condescending, but if you are a financial analyst, isn't doing statistics part of your job? And doesn't your expertise involve knowing which kinds of statistical analysis are available to tackle a given problem? It just seems weird to me that you would opt not to use your expertise and instead use a generalized model that is both more expensive and has poorer results than traditional models.
I am not doubting the 95% acceptance rate at all. I've pure-vibecoded many toy projects myself.
> in line with what they would have written,
The point I'm making is that they didn't know what they would've written. They had a rough overall idea, but details were being accepted on the fly. They were trying out a bunch of things and seeing what looked good, based on a rough idea of what the output should be.
In a real world project you are not both product owner and coder.
To be clear I did not have a 95% acceptance rate. I'm saying that in the final published repo, 95% of the lines of code were written by AI, not by me. I discarded and refactored code along the way many times, but I did that by also using the AI. My end goal was to keep my hands off the code as much as possible and get better at describing exactly what I wanted from the AI.
Hypothetically, if you codified the architecture as a form of durable meta tests, you might be able to significantly raise the ceiling.
Decomposing to interfaces seems to actually increase architectural entropy instead of decrease it when Claude Code is acting on a code base over a certain size/complexity.
I agree with this completely. I get the impression that a lot of people here think of software development as a craft, which is great for your own learning and development but not relevant from the company's perspective. It just has to work well enough.
Your point about management being vibe coding is spot on. I have hired people to build something and just had to hope that they built it the way I wanted. I honestly feel like AI is better than most of the outsourced code work I do.
One last piece, if anyone does have trouble getting value out of AI tools, I would encourage you to talk to/guide them like you would a junior team member. Actually "discuss" what you're trying to accomplish, lay out a plan, build your tests, and only then start working on the output. Most examples I see of people trying to get AI to do things fail because of poor communication.
> I get the impression that a lot of people here think of software development as a craft, which is great for your own learning and development but not relevant from the company's perspective. It just has to work good enough.
Building the thing may be the primary objective, but you will eventually have to rework what you've built (dependency changes, requirement changes,...). All the craft is for that day, and whatever that goes against that is called technical debt.
You just need to make some tradeoffs between getting the thing out as fast as possible and being able to alter it later. It's a spectrum, but instead of discussing it with the engineers, most executive suites (and their managers) want to hand down edicts from on high.
> Building the thing may be the primary objective, but you will eventually have to rework what you've built (dependency changes, requirement changes,...). All the craft is for that day, and whatever that goes against that is called technical debt.
This is so good I just wanted to quote it so it showed up in this thread twice. Very well said.