
Funnily enough, I can still understand your message without whitespace; it is just harder to parse. Doesn't that undercut your statement?


Being harder to parse is not a trivial detail, it's the whole ballgame. Written language has a lot of redundancy. That's one of the things that makes it work. The redundancy makes it an error-correcting code so that the information doesn't get dstryed by miner errers. That is what allows you to glean the meaning of a message even without whitespace. Y cn d th sm trck by lmntng vwls (u o eiiai ooa).
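A minimal sketch of the vowel trick in Python, for anyone who wants to try it on their own sentences. The helper names `strip_vowels` and `keep_vowels` are made up for this illustration, and "vowel" here just means the letters a, e, i, o, u (so "y" always survives):

```python
VOWELS = set("aeiouAEIOU")


def strip_vowels(text: str) -> str:
    """Drop every vowel letter; keep consonants, spaces and punctuation."""
    return "".join(ch for ch in text if ch not in VOWELS)


def keep_vowels(text: str) -> str:
    """Keep only vowel letters and spaces (the reverse experiment)."""
    return "".join(ch for ch in text if ch in VOWELS or ch == " ")


if __name__ == "__main__":
    # Consonants alone carry most of the information...
    print(strip_vowels("You can do the same trick by eliminating vowels"))
    # ...while vowels alone carry very little.
    print(keep_vowels("but not eliminating consonants"))
```

The first line of output is still readable; the second is not, which is the asymmetry the parenthetical above is pointing at.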

Just because you don't completely destroy the message by eliminating whitespace doesn't mean that the whitespace wasn't semantically significant.

[UPDATE] Whitespace is actually a pretty modern innovation, and it is not universal. In old manuscripts the text is often allcrammedtogetherlikethis. Also, even in modern German it is common to compose verylongcompoundwords.


See also: segmentation in many Asian languages.

Back around 2006, there was an internal collection of quotes from senior Google engineers. There was one from a guy working on a particularly thorny issue with Russian search. He had just finished a project on Thai segmentation, and his response to how things were going with Russian was something like "Great! At least they have words!" (Obviously, he was being deliberately imprecise; he understood that Thai has words, just not whitespace-delimited ones.)

On a side note, the Korean alphabet is well-designed. In particular, teaching children syllabification rules is trivial, because syllables are pre-arranged into squares. Unfortunately, the lack of distinction between r-l and p-b-f, along with restrictions on consonant clusters, makes it unworkable as a replacement alphabet for English.


BTW, ancient Egyptians didn't write vowels (hieroglyphs record only consonants).


That's true, but ancient Egyptian hieroglyphics are only phonetic when writing names.

A better example would be modern Hebrew. It has vowels, but they are an optional annotation and often omitted as a matter of course.


ivegottogivemyutmostrespecttotherapists

Doesn't that undercut yours?


The real trick would be to come up with a string that is actually ambiguous without whitespace. That turns out to be surprisingly challenging.


That was ambiguous because I had to really second-guess whether it was "therapists" or "the rapists"...


Ah, I totally missed that. (I guess I had a mental block against having utmost respect for rapists.)


“Sean Connery” also misread it on an SNL Celebrity Jeopardy spoof.


https://www.youtube.com/watch?v=hElOag-1a0k

illtaketherapistsfortwohundred :-)


Isit?


Nicely done!


I have to say, that is not a great example. Your earlier point still holds: there is enough redundancy in English to distinguish these things.

Here "Isit?" means "Is it?" and that is obvious from the context.


Yes, that's true, but it's still ambiguous out of context, and the fact that it is also a direct response to what I said adds some unexpected humor, so I give it bonus points for that.

It inspired me to come up with:

Fromwhereisitgoodideascome?
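Hunting for ambiguous strings can be mechanized. Here is a rough Python sketch that enumerates every way to split a whitespace-free string into dictionary words; a string is ambiguous (in this narrow sense) when it yields more than one split. The function name `segmentations` and the tiny `VOCAB` set are assumptions made for this illustration, and a real experiment would load a full word list instead:

```python
from functools import lru_cache

# Hypothetical toy vocabulary, just large enough for the examples in this thread.
VOCAB = {"i", "is", "it", "sit", "the", "rapists", "therapists"}


def segmentations(text, vocab):
    """Return every way to split `text` into words from `vocab`, sorted."""
    text = text.lower()

    @lru_cache(maxsize=None)
    def split_from(start):
        # Reaching the end of the string means one valid (empty) remainder.
        if start == len(text):
            return [[]]
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in vocab:
                # Extend every segmentation of the remainder with this word.
                for rest in split_from(end):
                    results.append([word] + rest)
        return results

    return sorted(" ".join(words) for words in split_from(0))


if __name__ == "__main__":
    for s in ("isit", "therapists"):
        print(s, "->", segmentations(s, VOCAB))
```

This only checks dictionary membership, so it will happily report splits that no human reader would consider; deciding which split is plausible in context is the part the redundancy of the language does for us.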


The therapist/the rapist ambiguity already does that, no?


Yes. I just missed it. I guess I have a mental block against having utmost respect for rapists.



