IMHO Rust got OsString wrong – it indirectly promised that UTF-8 can always be cast to it without any copying or allocation, so on Windows it has to use WTF-8 rather than UCS-2/UTF-16. Instead of being the OS's preferred string type, it's merely a trick to preserve unpaired surrogates.
I wouldn't say it's necessarily wrong; it may be (accidental) foresight. Windows has since added a UTF-8 code page, which means you can get near-enough-complete support with the A functions instead of the W functions.
That said, even now in 2024, it's not clear how much of a bet Windows is making on UTF-8 versus UTF-16.
I would be surprised if the UTF-8 support in Windows is anything deeper than the A functions building a W string and calling the W functions – which is what Rust is doing already.
The question is whether that's all they'll ever do, or whether parts of Windows gradually become UTF-8-first, with their W functions translating in the opposite direction. That must be tempting for some of the networking stack, where you're otherwise taking a perf hit compared to Linux.
Huh. Where is the "indirect promise"? Is there an as_ function or something, like as_os_string? (The as_ prefix conventionally hints "free", so it's not good if there's one that isn't very cheap.)
It's because str implements AsRef<OsStr> [0]. The function signature promises that whenever you have a borrowed &'a str, the standard library can give you a borrowed &'a OsStr with the same data.
Since references can't have destructors (they don't own the data the way an OsString does), the standard library can't give you a newly allocated string without leaking it. Since it obviously isn't going to do that, the &OsStr must instead act as a view into the underlying &str's bytes. And the conversion can't enforce any extra restrictions on the input string without breaking backward compatibility.
The overall effect is that whatever encoding OsStr uses internally, it has to be a superset of UTF-8.