If I understand it correctly, that's a valid concern, but the way structured generation libraries like outlines[1] work is that they can generate multiple variants of the inference (which they call beam search).
One beam could be "This is a way to solv-", with no obvious "good" next token.
Another beam could be "This way is solv-", with "ing" as the obvious next token.
It then selects the best-scoring beam as the output.
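To make that concrete, here's a rough sketch of beam search in plain Python. This is not how outlines implements it, and it doesn't use the outlines API at all; TOY_MODEL and its probabilities are made up purely to mirror the two beams above, and each "token" is a whole chunk of text to keep the example short.

```python
import math

# Hypothetical toy "model": maps a prefix (tuple of tokens) to next-token
# probabilities. All numbers are invented for illustration.
TOY_MODEL = {
    (): {"This is a way to solv": 0.6, "This way is solv": 0.4},
    # No obviously "good" continuation: probability is spread out.
    ("This is a way to solv",): {"e": 0.3, "ing": 0.3, "ed": 0.4},
    # One obvious continuation.
    ("This way is solv",): {"ing": 0.95, "ed": 0.05},
}


def beam_search(model, beam_width=2, steps=2):
    """Keep the `beam_width` highest-scoring partial sequences at each step."""
    beams = [((), 0.0)]  # (tokens so far, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            for token, prob in model.get(tokens, {}).items():
                candidates.append((tokens + (token,), score + math.log(prob)))
        if not candidates:
            break
        # Sort by cumulative log-probability, best first, and prune.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


best_tokens, best_score = beam_search(TOY_MODEL)[0]
print("".join(best_tokens))
```

Note that the first beam starts out more likely (0.6 vs 0.4), but the second one wins overall because its continuation is so much more certain. A greedy decoder that only keeps the single most likely prefix would have committed to the first beam and never seen that.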
[1]:https://github.com/dottxt-ai/outlines