Hacker News

> imagine handing a Unicode string to a human. They could without any knowledge look at the characters they see and produce the correct string reversal.

I really highly doubt it.

How do you reverse this?: مرحبًا ، هذه سلسلة.

Can you do it without any knowledge about whether what looks like one character is actually a special-case joiner between two adjacent codepoints that only happens in one direction? Can you do it without knowing that this string appears wrongly in the HN textbox due to an apparent RTL issue?
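Here is a concrete case of a join that only works in one direction. In Arabic, LAM followed by ALEF is normally drawn as a single lam-alef ligature. ALEF, however, only connects to the letter before it. After a code-point reversal you get ALEF then LAM, and those render as two unjoined letters. This is a minimal sketch that uses only the standard `unicodedata` module, so it shows the code-point side only. The rendering itself depends on the font and shaping engine, and Python cannot check it.

```python
import unicodedata

# U+FEFB is the presentation form ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM.
# Its compatibility (NFKC) decomposition is the two ordinary letters LAM + ALEF.
ligature = "\ufefb"
lam_alef = unicodedata.normalize("NFKC", ligature)

# A naive code-point reversal swaps them to ALEF + LAM, an order in which
# the letters no longer join into the ligature when rendered.
alef_lam = lam_alef[::-1]

for label, text in (("original", lam_alef), ("reversed", alef_lam)):
    print(label, [unicodedata.name(ch) for ch in text])
```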

It's just not well-defined to reverse a string, and the reason we say it's not meaningful is that no User Story ever starts "as a visitor to this website I want to be able to see this string in opposite order, no, not just that all the bytes are reversed, but you know what I mean."
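The "not just that all the bytes are reversed" distinction is easy to demonstrate. This sketch assumes UTF-8 storage, and the helper names are made up for illustration. Reversing the raw bytes of even a short accented word produces something that is no longer valid UTF-8. Reversing by code point (what `s[::-1]` does in Python) at least keeps valid text.

```python
def reverse_utf8_bytes(s: str) -> bytes:
    """Reverse the raw UTF-8 bytes of s (hypothetical helper)."""
    return s.encode("utf-8")[::-1]

def reverse_code_points(s: str) -> str:
    """Reverse s by Unicode code point, which is what slicing does in Python."""
    return s[::-1]

word = "caf\u00e9"  # "café" with a precomposed é (U+00E9)

print(reverse_code_points(word))  # éfac

flipped = reverse_utf8_bytes(word)
try:
    flipped.decode("utf-8")
except UnicodeDecodeError as err:
    # The two bytes of é end up in the wrong order, so decoding fails.
    print("byte reversal is not valid UTF-8:", err.reason)
```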



You can even demonstrate a similar concept with English and Latin characters. There is no single thing called a "grapheme" linguistically. There are actually two different types of graphemes. The character sequence "sh" in English is a single referential grapheme but two analogical graphemes. Depending on what the specification means, "short" could be reversed as either "trosh" or "trohs". That's without getting into transliteration. The word for Cherokee in the Cherokee language is "Tsalagi" but the "ts" is a Latin transliteration of a single Cherokee character. Should we count that as one grapheme or two?
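The "short" example can be made mechanical. The answer depends entirely on which units you declare before reversing. This is a minimal sketch in which the digraph list is a hypothetical parameter, not a linguistic standard. With no digraphs you get letter-by-letter reversal. Treating "sh" as one unit gives the other answer.

```python
def split_units(word: str, digraphs: tuple[str, ...] = ()) -> list[str]:
    """Greedily split word into units, treating any listed digraph as one unit.

    The digraph list is a hypothetical input chosen for illustration.
    """
    units: list[str] = []
    i = 0
    while i < len(word):
        for dg in digraphs:
            if word.startswith(dg, i):
                units.append(dg)
                i += len(dg)
                break
        else:
            # No digraph matched here, so take a single character.
            units.append(word[i])
            i += 1
    return units

def reverse_units(word: str, digraphs: tuple[str, ...] = ()) -> str:
    return "".join(reversed(split_units(word, digraphs)))

print(reverse_units("short"))                    # trohs
print(reverse_units("short", digraphs=("sh",)))  # trosh
```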

Of course, if an interviewer is really asking you how to do this, they're probably either 1) working in bioinformatics, in which case there are exactly four ASCII characters they really care about and the problem is well-defined, or 2) implementing something like rev | cut -d '-' -f1 | rev to extract the last field, in which case it doesn't matter how you implement "rev" just so long as it works exactly the same in reverse and you can always recover the original string.
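The pipeline case shows why "any consistent reversal" is good enough there. Reversing, taking the first field, and reversing back yields the last field, provided reversal round-trips. This sketch uses Python stand-ins for `rev` and `cut -d '-' -f1`. The function names are illustrative, not real library APIs. For ASCII-only input such as a DNA string, every notion of "reverse" agrees.

```python
def rev(s: str) -> str:
    """Stand-in for `rev`: reverse one line by code point."""
    return s[::-1]

def cut_f1(s: str, delim: str = "-") -> str:
    """Stand-in for `cut -d '-' -f1`: text before the first delimiter.

    Like cut without -s, a line with no delimiter is returned whole.
    """
    return s.split(delim, 1)[0]

def last_field(s: str, delim: str = "-") -> str:
    """Equivalent of `rev | cut -d '-' -f1 | rev`."""
    return rev(cut_f1(rev(s), delim))

line = "sample-2024-final"
print(last_field(line))         # final
print(line.rsplit("-", 1)[-1])  # final, the same result without reversing
print(rev("GATTACA"))           # ACATTAG: plain ASCII, so reversal is unambiguous
```

The only property the pipeline relies on is that rev(rev(s)) == s, so any reversal that round-trips would do.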


The fact that how to reverse a piece of text is locale dependent doesn't mean it's impossible. Basically any transformation on text will be locale dependent. Hell, length is locale dependent.
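Even before locale enters the picture, "length" depends on what you count. The sketch below counts three encoding-level units for strings a reader would see as a single character. The `lengths` helper is hypothetical. The sketch does not model locale-specific rules; it only shows that the unit of counting is itself a choice.

```python
import unicodedata

def lengths(s: str) -> dict[str, int]:
    """Count s in three different units (hypothetical helper)."""
    return {
        "utf8_bytes": len(s.encode("utf-8")),
        "utf16_units": len(s.encode("utf-16-le")) // 2,  # -le avoids a BOM
        "code_points": len(s),
    }

decomposed_e = "e\u0301"  # e + COMBINING ACUTE ACCENT, displays as one "é"
thumbs_up = "\U0001F44D"  # one emoji, outside the Basic Multilingual Plane

print(lengths(decomposed_e))  # {'utf8_bytes': 3, 'utf16_units': 2, 'code_points': 2}
print(lengths(thumbs_up))     # {'utf8_bytes': 4, 'utf16_units': 2, 'code_points': 1}

# Normalization changes the code-point count again: NFC composes e + accent.
print(len(unicodedata.normalize("NFC", decomposed_e)))  # 1
```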


>what looks like one character is actually a special case joiner between two adjacent codepoints

Are you referring to a grouping not covered by the definition of grapheme clusters (which I am only passingly familiar with)? If so, then I don't think it's any more non-meaningful to reverse it than to reverse an English string. The result is gibberish to humans either way - it sounds more like you're saying that there is no universally "meaningful to humans" way to reverse some text in potentially any language, which is true regardless of what encoding or written language you're using. I was thinking of it more from the programmer side - i.e. that Unicode provides ways to reverse strings that are more "meaningful" (as opposed to arbitrary) than e.g. just reversing code points.
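Here is the programmer-side difference between reversing code points and reversing something closer to grapheme clusters. The Python standard library has no full UAX #29 segmenter. This sketch therefore uses a deliberately rough approximation that only attaches combining marks (categories Mn, Mc, Me) to the preceding character. It ignores ZWJ emoji sequences, Hangul syllable rules, regional-indicator flags and more. The helper names are illustrative.

```python
import unicodedata

MARK_CATEGORIES = {"Mn", "Mc", "Me"}  # nonspacing, spacing, enclosing marks

def rough_clusters(s: str) -> list[str]:
    """Attach each combining mark to the character before it.

    Only an approximation of UAX #29 extended grapheme clusters.
    """
    clusters: list[str] = []
    for ch in s:
        if clusters and unicodedata.category(ch) in MARK_CATEGORIES:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

def reverse_by_clusters(s: str) -> str:
    return "".join(reversed(rough_clusters(s)))

cafe = "cafe\u0301"  # "café" with a decomposed é: e + COMBINING ACUTE ACCENT

print(cafe[::-1] == "\u0301efac")                 # True: the accent has lost its base
print(reverse_by_clusters(cafe) == "e\u0301fac")  # True: the accent stays on the e
```

The same helper keeps the tanween (U+064B) in the Arabic example above attached to its letter: "\u0645\u0631\u062d\u0628\u064b\u0627" becomes "\u0627\u0628\u064b\u062d\u0631\u0645".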


I mean, no, but only because I don’t understand the characters. Someone who reads Arabic (I assume based on the shape) would have no trouble. You’re nitpicking cases where for some readers visual characters might be hard to distinguish but it doesn’t change the fact that there exists a correct answer for every piece of text that will be obvious to readers of that text which is the definition of a grapheme cluster.


> the fact that there exists a correct answer for every piece of text that will be obvious to readers of that text which is the definition of a grapheme cluster.

No, I insist there is not a single "correct answer," even if a reader has perfect knowledge of the language(s) involved. Now remember, this is already moving the goalposts, since it was claimed that a human needed "no knowledge" to get to this allegedly "correct answer."

You already admit that people who don't speak Arabic will have trouble finding the "grapheme clusters," but even two people who speak Arabic may do your clustering or not, depending on some implicit feeling of "the right way to do it" vs taking the question literally and pasting the smallest highlight-able selection of the string in reverse at a time.

Anyway, take a string like this: "here is some Arabic text: <RLM> <Arabic codepoints> <LRM> And back to English"

Whether you discard the ordering marks[0], keep them, or invert them is an implementation decision that already produces three completely different strings. Unless we want to write a rulebook for the right way to reverse a string, it remains an impossibility to declare anything the correct answer, and because there is no reason to reverse such a string outside of contrived interview questions and ivory tower debates, it is also meaningless.

[0]: https://en.m.wikipedia.org/wiki/Right-to-left_mark https://en.m.wikipedia.org/wiki/Left-to-right_mark
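The three policies for the marks are easy to write down. This sketch shows that they produce three distinct strings from one input. The input mirrors the shape of the example above, with two Arabic letters standing in for "<Arabic codepoints>". The function names are hypothetical.

```python
RLM, LRM = "\u200f", "\u200e"  # RIGHT-TO-LEFT MARK, LEFT-TO-RIGHT MARK

def reverse_discard_marks(s: str) -> str:
    """Reverse by code point and drop the directional marks."""
    return "".join(ch for ch in reversed(s) if ch not in (RLM, LRM))

def reverse_keep_marks(s: str) -> str:
    """Reverse by code point, leaving the marks as they are."""
    return s[::-1]

def reverse_swap_marks(s: str) -> str:
    """Reverse by code point and turn each RLM into LRM and vice versa."""
    swap = {RLM: LRM, LRM: RLM}
    return "".join(swap.get(ch, ch) for ch in reversed(s))

text = "text: " + RLM + "\u0645\u0631" + LRM + " And"
results = {f(text) for f in (reverse_discard_marks, reverse_keep_marks, reverse_swap_marks)}
print(len(results))  # 3
```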


You added the requirement that it be a single correct answer. I just asserted that there existed a correct answer. You're being woefully pedantic -- a human who can read the text presented to them but has no knowledge of Unicode was my intended meaning. Grapheme clusters are language dependent and chosen for readers of languages that use the characters involved. There's no implicit feeling, this is what the standards body has decided is the "right way to do it." If you want to use different grapheme clusters because you think the Unicode people are wrong then fine, use those. You can still reverse the string.

Like what are you even arguing? You declared that something was impossible and then ended with that it's not only possible but it's so possible that there are many reasonable correct answers. Pick one and call it a day.


> Like what are you even arguing?

It is impossible to "correctly reverse a string" because "reverse a string" is not well defined. We explored many different potential definitions of it, to show that there is no meaningful singular answer.

> You added the requirement that it be a single correct answer.

Your original post says "they could produce the correct string reversal"?


Is an RTL character string already "reversed" from an LTR POV?

Is an absolute value signed as positive?

