> They find the probability of every word that could come next.
To be pedantic, they find a* probability for every token (a token is sometimes a whole word, but not always) that could come next.
Which token actually gets chosen depends on what the rest of the system does, but generally it just picks the most probable token and continues.
* Saying "the" probability would be giving it a bit too much credit. And calling it a probability at all is a bit of a misnomer when most systems would choose the same word every time. During inference the number generally acts as a priority, not a probability.
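
To make the priority-vs-probability point concrete, here's a rough sketch in plain Python. Everything in it is made up for illustration: the four-token `vocab`, the `logits` scores, and the helper names `softmax`, `greedy_pick`, and `sample_pick` don't come from any particular library or model. It only shows that picking the top score never needs the numbers to be probabilities, while sampling does.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into values that are positive and sum to 1."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    """Pick the index with the highest score. Only the ranking matters,
    so the scores are effectively priorities here."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_pick(logits, temperature=1.0, rng=random):
    """Draw an index at random, weighted by the softmax output.
    This is where the numbers are actually used as probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical toy vocabulary and scores (assumptions, not real model output).
vocab = ["cat", "dog", "the", "ing"]
logits = [2.0, 1.0, 0.5, -1.0]

print("greedy :", vocab[greedy_pick(logits)])
print("sampled:", vocab[sample_pick(logits, temperature=1.0)])
```

Greedy picking gives the same token on every run, and it gives the same answer whether or not you apply `softmax` first, since softmax keeps the ordering. The sampled pick can differ from run to run, which is the only case where treating the numbers as probabilities really matters.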