
That said… I feel that Rust’s use of WTF-8 for OsString on Windows has resulted in some really nasty problems, especially since OsString doesn’t expose any useful methods for string manipulation. As far as I can tell, Rust’s approach fails to hide any of the complexity, and then adds the additional complexity of a new encoding and conversions on top. I can see that there’s some end goal of being able to work with OsString in Rust code but at the moment the API is missing everything except a couple functions to convert it into something else.

It’s a truly cursed problem that we have three separate notions of strings. We have Unicode strings, we have bytestrings, and we have wchar_t strings on Windows. No two of these are completely interoperable. This has a ton of direct consequences which cannot be completely avoided. For example, if I want to make a version of “ls” that gives a result in JSON, I’m already fucked and I have to change my requirements.



Isn't the point of OsString that it's essentially a faux-union type that is intended to be immediately converted to some concrete representation which /is/ richly manipulable?


The problem is that this is not good enough. It’s not uncommon to need to do a small amount of manipulation of OsString and there is no good way to do it.

In C++, it’s fairly easy. In Rust, it’s a damn nightmare.

In theory, in Rust, since OsString is basically Vec<u8> on the inside (like String), you could implement e.g. Path::has_extension in the same way as str::ends_with. However, anyone who has gone in and tried to implement this for OsString or Path has apparently gotten buried in the complexity and given up.
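For what it's worth, here is a minimal sketch of that byte-level idea. It assumes a toolchain that has `OsStr::as_encoded_bytes` (Rust 1.74 or later, which postdates this discussion). The helper names `os_str_ends_with` and `path_has_suffix` are hypothetical and not part of the standard library.

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Hypothetical helper: true if the OS string's encoded bytes end with `suffix`.
/// The encoding is a superset of UTF-8 and self-synchronizing, so comparing
/// against the bytes of a valid `&str` is meaningful on every platform.
fn os_str_ends_with(s: &OsStr, suffix: &str) -> bool {
    s.as_encoded_bytes().ends_with(suffix.as_bytes())
}

/// Hypothetical helper: the same check applied to a whole path.
fn path_has_suffix(path: &Path, suffix: &str) -> bool {
    os_str_ends_with(path.as_os_str(), suffix)
}
```

Something like `path_has_suffix(Path::new("archive.tar.gz"), ".tar.gz")` would then work without converting the path to a `String` first.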


Is that not the point? That you ought to map out that complexity in a type whose constraints must be satisfied in order to have a valid instance?

If your string is invalid to start with and you need to correct it, then yes, you need to wrangle that complexity yourself. If you need some tools from another toolset - eg. String functions that can help you make a valid Path - then you will make multiple type conversion hops to arrive at your destination. But trying to use String methods on something that may not be a valid string is no solution to the original problem, and would merely be hoping you could get away with the assumption.


The point is that should be part of the standard library, like wstring in C++.


https://doc.rust-lang.org/std/ffi/struct.OsString.html

I'm puzzled what you think is missing. What should be part of the stdlib that isn't currently?


Basic stuff like starts_with() is missing. You cannot slice an OsString into parts or iterate over its components. Almost everything you want to do with a string is missing.

If you are curious for yourself, try to write an argument parser that will parse something like "--output=<path>" and store the path as an OsString, and make it work on both Linux and Windows. The OsString abstraction breaks, and you have to write platform-specific code or use "unsafe", even though internally OsString is just a Vec<u8> and you should be able to strip off the "--output" as it is encoded the same on both platforms.

E.g., fill in the blank:

    /// Split an arg "--<name>=<value>" into (<name>, <value>).
    fn parse_arg(arg: &OsStr) -> Option<(&str, &OsStr)> {
        // What goes here?
    }

This is trivial with &str.
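For comparison, a sketch of the `&str` case. The function name `parse_arg_str` is made up for illustration; `strip_prefix` and `split_once` are ordinary `str` methods.

```rust
/// Split an arg "--<name>=<value>" into (<name>, <value>), for plain &str input.
fn parse_arg_str(arg: &str) -> Option<(&str, &str)> {
    // Drop the leading "--", or bail out if it is missing.
    let rest = arg.strip_prefix("--")?;
    // Split at the first '=' into name and value.
    rest.split_once('=')
}
```

Neither of those two methods exists on `&OsStr`, which is the gap being described.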


Why not transfer it over to a full String instead? How many other libraries and core functions expect an OsString? Why rely on an abstraction that's intentionally been given minimal functionality?

Of course `starts_with` is missing: you haven't resolved what underlying type the value actually is yet, and you'd be trying to compare apples and oranges for all you know! Move the OsString to a concrete type and you'll have all that functionality and more. The only time that will fail you is if you don't have a valid string to begin with, in which case `starts_with` should fail, correct?

Everything about OsString makes it a type you convert to and from, but it's not intended to be one you work /in/, since that would require you to make assumptions about which platform you are running on. You really want to manipulate it? Go to String and back, and pay the cost. This should also encourage you to use OsString as little as possible, at the edges.
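A rough sketch of that round trip, under the assumption that rejecting non-Unicode arguments is acceptable. The function name `parse_arg_via_str` is hypothetical.

```rust
use std::ffi::OsStr;

/// Split "--<name>=<value>" by going through &str and back to &OsStr.
/// Returns None if the argument is not valid Unicode or is malformed.
fn parse_arg_via_str(arg: &OsStr) -> Option<(&str, &OsStr)> {
    // `to_str` yields None when the OS string is not valid Unicode,
    // so such arguments are rejected rather than parsed.
    let s = arg.to_str()?;
    let rest = s.strip_prefix("--")?;
    let (name, value) = rest.split_once('=')?;
    // Converting &str back to &OsStr is free and always succeeds.
    Some((name, OsStr::new(value)))
}
```

The cost here is not so much runtime as coverage: a path value that is not valid Unicode makes the whole argument unparseable, even though only the `--<name>=` prefix needed inspecting.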


I have felt this pain for sure, but only really once. In languages where paths are strings, I tend to use string manipulation to operate on paths. But virtually all of my OsString usage is paths, which already have specific manipulation functions, so the pain is lesser.

This is also why the interface isn’t so rich: there just hasn’t been a lot of demand. That said, I think some things are in the pipeline?


The most common use for string manipulation on OsString is parsing CLI args like --output=/some/path. See eg. clap/#1524.
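As a hedged sketch of what that parsing can look like on a newer toolchain: the code below assumes `OsStr::as_encoded_bytes` and `OsStr::from_encoded_bytes_unchecked`, both available since Rust 1.74 and so not an option when this thread was written. It still needs `unsafe`; the function name `parse_arg` is just illustrative.

```rust
use std::ffi::OsStr;

/// Split an arg "--<name>=<value>" into (<name>, <value>),
/// keeping the value as an OS string so non-Unicode paths survive.
fn parse_arg(arg: &OsStr) -> Option<(&str, &OsStr)> {
    let bytes = arg.as_encoded_bytes();
    if !bytes.starts_with(b"--") {
        return None;
    }
    let rest = &bytes[2..];
    // Find the first '=' byte; ASCII bytes only ever encode ASCII characters.
    let eq = rest.iter().position(|&b| b == b'=')?;
    // The name must be valid UTF-8 to be returned as &str.
    let name = std::str::from_utf8(&rest[..eq]).ok()?;
    // SAFETY: the bytes come from `as_encoded_bytes` on the same platform,
    // and the split happens immediately after "--<name>=", which was just
    // checked to be a valid non-empty UTF-8 substring.
    let value = unsafe { OsStr::from_encoded_bytes_unchecked(&rest[eq + 1..]) };
    Some((name, value))
}
```

Calling `parse_arg(OsStr::new("--output=/some/path"))` would then yield the name `output` and the value as an `&OsStr`, with no platform-specific branches.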


Do you have any more info about what's in the pipeline?



I thought that someone had recently sent in a PR adding some convenience functions, but I can't find it now. I must be imagining things :(


Rust still doesn't get this right. If I'm calling an NFS library, say, on Windows I need to use UNIX paths. Rust needs WindowsString and UnixString on every platform, with OsString as a synonym for whichever is most useful locally.


In that case... You wouldn't be using the Rust file system libraries though, right?

It seems like the simplest definition of an OsString is "the type used to interact with the OS file system API as implemented in Rust".


Rust has a policy of keeping the standard library minimal and this is completely reasonable. But sometimes they overdo it. In this case it's nuts that I need to implement my own UnixString because the standard library doesn't expose it, and when I run on Linux I have two incompatible versions of the same thing.

Another example: I wrote a command line app which takes a hostname/ip address plus an optional port number after a colon. And the whole thing's async using tokio. The way the hostname/IP address parsing is structured in tokio and the standard library meant I had to reimplement all of it to add the port number. This all feels like more effort than it should be.


Doesn't address the OsString complexity, but the author has another post on strings in Rust that you might be interested in [1]. It at least addresses how Rust does hide a lot of regular `String` complexity that `C` doesn't.

[1] https://fasterthanli.me/blog/2020/working-with-strings-in-ru...


Not a Rust guy, but I did have to look up WTF-8. Had to chuckle at the 'wobbly transformation format' as I was translating it as something else. :)



