I think it's because returning two different types requires dynamic dispatch when using the returned object, which requires Box. If the function only returns one type, then that type can be determined at compile time and further calls on it can use static dispatch instead.
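To make the distinction concrete, here is a rough sketch, not taken from anyone's real code. The function names `one_type` and `two_types` are made up for illustration, and `Display` is just a convenient standard trait to dispatch on.

```rust
use std::fmt::Display;

// Hypothetical example: a single concrete return type (i32). The compiler
// knows the type behind `impl Display`, so calls on it are statically dispatched.
fn one_type(n: i32) -> impl Display {
    n * 2
}

// Hypothetical example: two possible concrete types (i32 and &str). Here they
// are erased behind a boxed trait object, so calls go through a vtable.
fn two_types(flag: bool) -> Box<dyn Display> {
    if flag {
        Box::new(42)
    } else {
        Box::new("forty-two")
    }
}

fn main() {
    println!("{}", one_type(21));
    println!("{}", two_types(true));
    println!("{}", two_types(false));
}
```

If you try to write `two_types` with `-> impl Display` and no Box, the compiler rejects it because the two branches have different types.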
The proxy object would have a statically known size: the maximum of the sizes of the types it dispatches between, plus some metadata such as a vtable pointer or an enum discriminant. And because the size is known statically, you can store it on the stack.
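Something like this can already be written by hand with an enum, as far as I can tell. This is only a sketch of the idea: `IntOrStr` and `pick` are hypothetical names, and the compiler does not generate any of this for you.

```rust
use std::fmt;
use std::mem::size_of;

// Hypothetical proxy type: one variant per concrete type the function may return.
enum IntOrStr {
    Int(i64),
    Str(&'static str),
}

// The proxy forwards the trait call to whichever variant it holds,
// using the enum discriminant instead of a vtable.
impl fmt::Display for IntOrStr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            IntOrStr::Int(n) => write!(f, "{}", n),
            IntOrStr::Str(s) => write!(f, "{}", s),
        }
    }
}

// Returned by value, with no Box and no heap allocation.
fn pick(flag: bool) -> IntOrStr {
    if flag {
        IntOrStr::Int(42)
    } else {
        IntOrStr::Str("forty-two")
    }
}

fn main() {
    let value = pick(true);
    println!("{}", value);
    // The size is a compile-time constant, at least as large as the biggest variant.
    println!("size of IntOrStr: {} bytes", size_of::<IntOrStr>());
}
```

The cost is that you have to write the forwarding impl yourself for every trait you care about, which is the part a compiler-generated proxy would save you.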
Tangentially, do you know why Rust never included an Either type? We have Result, which is great, but Either is useful for more things than errors (though that seems to be its main use).