I'm studying Japanese at the moment and what struck me is how important context is, particularly in reading. You need to know where to read 1-3 letters ahead to read a word and interpret it. That's not really a thing in English - a word is a word, and the individual letters that it's composed of are almost always pronounced the same way.
I think digital is a big crutch for Japanese/Chinese because you have input methods that help you write what you want to say, so you don't actually need to remember how to write kanji as much in daily life.
> You need to know where to read 1-3 letters ahead to read a word and interpret it. That's not really a thing in English
It happens in English too, where you see a chunk of letters and mis-predict which word they represent in a way that affects its meaning [0], and sometimes that will also affect pronunciation. [1]
An example from the link:
> "The complex houses married and single soldiers and their families."
A reader linearly scanning along doesn't know whether "complex" is an adjective or a noun, and then whether "houses" is a noun or a verb. I'm pretty sure all human languages have similar problems where a certain amount of look-ahead or backtracking is necessary.
For another example to highlight pronunciation changes, consider the ambiguity of:
"I saw the rhino live in the zoo."
That could mean the rhino was living in the zoo, in which case "live" rhymes with "give", or it could mean the speaker saw it in person, in which case it rhymes with "drive".
Both rely on intonation (in addition to volume and pauses) for disambiguation, but the fun trick is that in the Chinese version the intonation is an integral part of the lexeme (i.e. it distinguishes between "words").
But I have to say, these kinds of sentences (and full-fledged poems) are quite a different beast from simple garden path sentences or syntactic ambiguity [1]. The lion-eating poet poem and the "buffalo buffalo buffalo..." sentence are both highly contrived and unlikely to be understood correctly on the first few tries, even with perfect prosody. They are cool "language hacks", but they do not occur in daily language, and I personally believe (although I guess die-hard generative linguists would disagree) that they don't teach us very much about the language itself (except for the cool artistic possibilities it opens up).
When this happens in English, teachers will label this as "bad English" and ask you to rewrite. That's how the formal language deals with this problem.
If anything, isn't that an informal solution? It relies on other people to complain that they dislike the sentence, without being able to point to any hard-and-fast rule.
The hard and fast rule is that repeating a word right next to itself is generally frowned upon. It comes up with “that” a lot, like “he said that, that led to something else”. Sometimes people are doing something clever with the words, but it’s usually just poor English.
Yes, this happens in English too, but to find examples like this you have to go to Wikipedia, or rack your brain and see if you remember one. In Japanese, almost every other word is like this.
As is usual for Japanese, this sentence contains a mix of Chinese(-origin) ("kanji", e.g. 袋 小 路 文 法 的) and Japanese phonetic ("kana", e.g. ふくろこうじぶん) characters. Usually, kanji in a multi-kanji word are pronounced with (a historically evolved version of) the Chinese pronunciation. For example, 文法 is "bun-pou", not "fumi-nori" or something else. However, the first character of the article title (fukurokoujibun), 袋, is read "fukuro" here despite being in a four-kanji word. Further, 小 is "kou" here, which is nonstandard enough that its dictionary entry does not even list it as a possible pronunciation! [1] Then 路 and 文 are both read with Chinese pronunciations (ji-bun), but this does not necessarily make sense, because the word is not split in two down the middle but instead as 袋-小路-文 (bag-lane-sentence, where bag-lane is the English cul-de-sac / blind alley). [2]
Now fukurokoujibun is a bit of a specialised word, so it might not be a great example. But in the rest of the sentence, we find 文, which is always pronounced "bun" (sentence) here, even when appearing separately, but could also (though more rarely) have been "fumi" (letter); nothing but semantic context helps distinguish them. Then we have 正しい "tada-shi-i", where 正 could have been "sei" as in 正確 "sei-kaku" (accurate) or "shou" as in 正直 "shou-jiki" (honest), but isn't, simply because しい comes after it. Similarly, 生 in 生じやすい is "shou"(-ji-ya-su-i), conjugated from the base form 生じる "shou-ji-ru", and could have been "u" (生まれる "u-ma-re-ru"), "sei" (先生 "sen-sei"), "i" (生きる "i-ki-ru"), or more (生 is somewhat infamous for having many readings). And I could go on: 書 could be "syo" (文書 "bun-syo") but is "ka" here (書き出して "ka-ki-da-shi-te", conjugated from 書く "ka-ku").
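The look-ahead described here, where the reading of 正 or 書 is only settled by the characters that follow, can be illustrated with a greedy longest-match lookup. This is a toy sketch under loud assumptions: `READINGS` is a hypothetical hand-made dictionary built only from the examples in this comment, and `read` is an illustrative helper. Real morphological analyzers are far more sophisticated and would also handle conjugated forms like 生じやすい, which this sketch does not.

```python
# Hypothetical toy dictionary built from the examples above (not a real lexicon).
READINGS = {
    "正しい": "ただしい",
    "正確": "せいかく",
    "正直": "しょうじき",
    "生じる": "しょうじる",
    "生まれる": "うまれる",
    "生きる": "いきる",
    "先生": "せんせい",
    "文書": "ぶんしょ",
    "書く": "かく",
}


def read(text: str) -> list[str]:
    """Greedy longest match: at each position, try the longest dictionary entry first."""
    max_len = max(map(len, READINGS))
    out, i = [], 0
    while i < len(text):
        # Look ahead as far as the longest entry allows, then shrink.
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in READINGS:
                out.append(READINGS[chunk])
                i += n
                break
        else:
            # No entry matched: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return out


print(read("文書を書く"))  # ['ぶんしょ', 'を', 'かく']
```

The same character 正 comes out as ただ, せい, or しょう depending only on what follows it, which is exactly the kind of look-ahead the comment is describing.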
This is a bit like the comments elsewhere here noting that the Chinese word for "sneeze" is a bad example because it happens to have such uncommon characters in it, and then people point to examples like "onomatopoeia" and "diarrhoea" as similarly tricky examples in English. I can't comment on Chinese, but existence does not necessarily say much about frequency.
[2]: This analysis of 袋小路文 is not completely etymologically honest. By the etymology ( https://en.wiktionary.org/wiki/%E5%B0%8F%E8%B7%AF#Etymology_... ), we see that the "kouji" pronunciation of 小路 is really a corruption of ancient "ko-michi", which is a consistent Japanese-Japanese reading of the two characters. However, because "ji" is also an (uncommon) Chinese reading of 路, if you don't know the etymology of the word, the re-analysis is appropriate in the context of how hard it is to read the written language.
> However, because "ji" is also an (uncommon) Chinese reading of 路,
It's not a Chinese reading at all (as you can tell because it's ... wildly out of place with the actual Chinese-derived readings ろ・る; onyomi are supposed to have semi-regular correspondences with each other and with the readings in Chinese itself). It's really just rendaku of ち, the basic root of the fossilized compound みち (with the still-salient "honorific" prefix み).
But most importantly, you never really see either 袋 or 小路 and expect them to have any other readings; maybe you'd expect しょうろ if you don't know the latter, but unless you're already literate in Chinese or are blindly memorizing kanji tables, the other reading of 袋 (たい) probably isn't even salient, because it's one of those kanji that almost always takes its kunyomi even in compounds.
Side note, the line about u-onbin kind of buries the implication that this is a loanword from western Japanese, which is the culprit of several quasi-systematic but unevenly distributed divergences from regular sound changes.
Maybe because I've seen a similar example used before, but I immediately read it correctly the first time. Honestly, these sorts of 'problems' only ever seem to occur when specifically created to demonstrate the problem, and almost never happen in regular writing.
And yet, given the definition and language of origin, most high-level spelling bee participants can make a pretty good guess at spelling a word they may have never seen before.
English is phonetic, it just borrows its pronunciation rules from many differing (and sometimes directly opposed) other languages.
Very true - and every demonstration of “English is hard to spell/pronounce” focuses directly on the exceptions, which exaggerates the problem. One analysis I’ve seen found that with a single set of rules, 59% of a sample corpus of 5000 English words can be pronounced perfectly from the spelling (of course, regional accent and dialect differences mean that percentage will be a bit different for each one), and up to 85% can be pronounced pretty closely, with only slight errors.
Then there’s a percentage where they’re just direct borrowings from other languages and you need to have an idea of how that language pronounces words (especially French), so really only 10-15% or so of English words end up being true exceptions.
And you still only get 59% of the way to the correct pronunciation.
As a non native speaker of English, and a native speaker of a phonetic language, I strongly object to the notion that it's easy to guess English word pronunciation by just reading it.
And that's another reason why there are so many English speakers who don't know how to read properly. It is so much harder to read compared to more sensible languages like German (and many others).
Those numbers are very bad, given that proper phonemic orthographies can give you a 90+% confidence with far fewer rules.
There's a simple and consistent way to compare languages in this way, too: train a neural net to map spelling to pronunciation on one half of the dictionary, then test it on the other half. The more complicated and less consistent the orthography is, the more mistakes it'll make. People have in fact done this exact experiment, and English scores extremely poorly in it; for spelling, closer to Chinese, in fact, than many other European languages: https://aclanthology.org/2021.sigtyp-1.1/
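The train-on-one-half, test-on-the-other idea can be sketched without a neural net. This is a hypothetical minimal version, assuming a tiny hand-made lexicon that is already aligned into (grapheme, phoneme) pairs; it swaps the neural model for a simple majority-vote baseline (predict the phoneme most often seen for each grapheme) and does not reproduce the linked paper's method or numbers. `train`, `word_accuracy`, and the word lists are illustrative names.

```python
from collections import Counter, defaultdict

# A word is a list of (grapheme, phoneme) pairs. The alignment is assumed to be
# given; real experiments have to learn it from a pronunciation dictionary.
Word = list[tuple[str, str]]


def train(words: list[Word]) -> dict[str, str]:
    """Map each grapheme to the phoneme it most often represents in training."""
    counts: defaultdict[str, Counter] = defaultdict(Counter)
    for word in words:
        for grapheme, phoneme in word:
            counts[grapheme][phoneme] += 1
    return {g: c.most_common(1)[0][0] for g, c in counts.items()}


def word_accuracy(model: dict[str, str], words: list[Word]) -> float:
    """Fraction of held-out words in which every grapheme is predicted correctly."""
    correct = sum(all(model.get(g) == p for g, p in word) for word in words)
    return correct / len(words)


# Hypothetical toy split, hand-transcribed in IPA.
train_words = [
    [("ch", "tʃ"), ("i", "ɪ"), ("n", "n")],  # chin
    [("ch", "tʃ"), ("i", "ɪ"), ("p", "p")],  # chip
    [("r", "r"), ("i", "ɪ"), ("p", "p")],    # rip
    [("s", "s"), ("i", "ɪ"), ("t", "t")],    # sit
]
test_words = [
    [("t", "t"), ("r", "r"), ("i", "ɪ"), ("p", "p")],  # trip: follows the majority pattern
    [("ch", "k"), ("i", "aɪ")],                         # chi: "ch" as /k/, "i" as /aɪ/
]

model = train(train_words)
print(word_accuracy(model, test_words))  # 0.5
```

A perfectly consistent orthography would score 1.0 on any held-out split of this kind; every grapheme that can stand for several sounds drags the score down, which is the intuition behind the experiment.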
Huge difference is: English is pretty much THE language that you can butcher and still have people perfectly understand (and hopefully politely correct) you. Even other European (stay mad) languages don't hold up to just how flexible English is in this regard.
Well yes, that's (I believe) the reason English actually works as an international language, despite being horrible in so many respects (pronunciation, tons of exceptions, etc etc): It also has so much redundancy that even if you get all the grammar wrong the meaning is still there. "I is strongs". When someone knows a tiny bit of English it's often easier to communicate in English than in that person's language, even if you're studying said language. Unfortunately, kind of, but that's how it is.
Yeah exactly. "Me arms big power" would make me go "Oh yeah you do have mighty biceps my dude".
And to the latter point, I got that all the time in Japan, but I think the main reasons are: they wanna practice, but even more they wanna practice with a native English speaker bc it's a novel experience for em!
Oh hurrah, I think that link is what I've been looking for for nearly a decade. I ran across it, or something like it, a long time ago and could never find it again. I don't remember all the special syntax; I think the one I found was written more in plain English with more examples (and I don't think the one I found back then mentioned ghoti either), but I can't be sure, it's been so long. Maybe it was just that page and I don't remember it. It does have around the same number of rules I remember, though.
This is satire, right? 56 rules to get 59% correct pronunciation on a corpus of 5000 words? And these rules don't even include the base sounds - it doesn't tell you how to actually pronounce "m", or "e". So in fact there are more than 70 rules required to get to a base pronunciation (you need to add at least one rule for each letter).
>"ough" has at least 9 different possible pronunciations, how is that phonetic?
Does a language stop being phonetic when you have to include other information provided by the rest of the word? I'm not a linguist by any means, but "ough" being pronounced a couple different ways depending how it's used doesn't seem like it'd preclude the language from being considered phonetic in general.
9 is not a couple, unless you're in a very open relationship - which English words might be - but a language stops being phonetic at the point that the mappings between symbols and sounds are no longer clear and reliable. The most phonetic languages have one-to-one mappings with very few exceptions e.g. Japanese, Spanish, Italian, Finnish.
English, on the other hand, has silent letters, inconsistent mappings even within the same word, exceptions, irregularities, and sounds that are represented by multiple letters and spellings.
English is not a phonetic language except in the sense that it does have mappings between sounds and characters, which would make sense if one were to compare it to a wholly written language like Python, but not any human language.
Fruit flies like a banana. English has its own ambiguity, so it isn’t really that different.
I can only write Chinese via an IME these days. For one, I’m left-handed, so writing characters was always a struggle since stroke order worked against me, but mostly it’s because that’s the only way I use Chinese anyway.
I told my wife our kid should learn to write via an IME as well and she was just horrified about that, though. None of the teaching material really supports it.
I've been (very) casually learning Japanese for a couple years, and almost every time I think I find something "weird" that Japanese does, I almost immediately think of a very similar example in English.
The alphabet is a pretty awesome invention (alphabet > kana-style syllabary > kanji-style logography) but English writing is at least as complex as JP writing, just in different dimensions.
JP's phonetics, for example, are dead simple compared to English's, but they do a good job making up for it by having a few thousand Kanji.
I'm not a native English speaker, so I don't really know why, or if, there's a problem for native English speakers to learn or "get" pitch accent. For speakers of many other European languages Japanese pitch accent is not tricky. You listen, and then you speak. Just as you would listen to English, and repeat it the same way.
Japanese, despite being extremely logical and so beautiful in so many ways, is still hard to learn for me, and of course learning the writing system is not done in the blink of an eye (unlike the Latin-based writing system we use), but pitch accent isn't really the problem here.
Is that any more complicated than English stress, though? And regardless, Japanese has a very small number of phonemes (compared to English) and extremely restricted phonotactics.
Yeah, but I don't expect this to be substantively harder than learning most regional accents (could be wrong), and afaik it's also not critical for legibility.
In English you have to know a word in order to pronounce it.
The “ou” in “hound”, “double”, and “would” is pronounced differently in each. Or “ieu” in “lieutenant” vs “lieu”.
Or “oo” in “poor” vs “root”
Or “berry” in “berry” vs “strawberry”
I could go on forever. There’s no other western language I know of that behaves like that.
English is a quasi-phonetic language in that most words can be mostly pronounced how they're written, but in some cases it inherits the pronunciation of the language the word came from. I'd imagine many English speakers would consider this an undesirable quirk, though.
Indeed, there has been a tendency over the centuries, particularly in the US, to move towards writing words how they sound or pronouncing words how they're written. Lieutenant is an interesting example, since in the UK we pronounce that "lef-tenant" traditionally, but the US moved to the (IMO superior) "lieu-tenant". Nowadays, most young people would probably use the US pronunciation.
I do take some slight umbrage at the implication some people in this thread seem to be making, that language features can't be criticised or that one language can't be better than another. I don't see why this would necessarily be true, even with spoken languages. There are a ton of annoying aspects to English that simply aren't issues in other languages, and I think it's fair to criticise other languages for their failings too. This is especially true of writing systems, which are human inventions rather than something we learn intuitively.
Logographic/logo-syllabic orthographies are harder to learn and remain proficient at than alphabets/abjads, for native speakers and second language learners alike. Alphabets are an innovation that improved on ancient orthographies and enabled a wider range of people to be able to communicate as easily by writing as they do by speaking. Besides the issue mentioned in the article, the writing systems in China/Japan are associated with other issues we rarely see here. Even dictionaries are a non-obvious challenge with logographic languages, which has resulted in several competing ways to sort words.
I don't think one can reasonably claim that in English "words are mostly pronounced how they're written". I mean, "i" can stand for /i/, /ɪ/, or /aɪ/, for example (and also for /ə/ if you don't count "ir" as a distinct grapheme). Although vowels at least (mostly) follow some predictable patterns based on syllables - but e.g. it's impossible to say whether "ch" stands for /k/, /tʃ/, /ʃ/, or /x/ without knowing the etymology of the word.
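The "one-to-one mapping" criterion for a phonetic orthography can be checked mechanically on a sample of spelling-to-sound correspondences. A hypothetical sketch using the correspondences listed above for "ch" and "i": the example words are illustrative picks (with "loch" in its Scottish /x/ pronunciation), `OBSERVATIONS` and the helper names are made up for this illustration, and it only checks one direction (spelling to sound). A real measurement would need a full pronunciation dictionary.

```python
from collections import defaultdict

# (grapheme, phoneme, example word). The correspondences are the ones named in
# the comment above; the example words are illustrative picks, not a corpus.
OBSERVATIONS = [
    ("ch", "k", "chorus"),
    ("ch", "tʃ", "church"),
    ("ch", "ʃ", "chef"),
    ("ch", "x", "loch"),
    ("i", "i", "machine"),
    ("i", "ɪ", "bit"),
    ("i", "aɪ", "bite"),
]


def readings_per_grapheme(observations):
    """Count the distinct phonemes observed for each grapheme."""
    readings = defaultdict(set)
    for grapheme, phoneme, _word in observations:
        readings[grapheme].add(phoneme)
    return {g: len(ps) for g, ps in readings.items()}


def is_one_to_one(observations) -> bool:
    """True only if every grapheme maps to exactly one phoneme in the sample."""
    return all(n == 1 for n in readings_per_grapheme(observations).values())


print(readings_per_grapheme(OBSERVATIONS))  # {'ch': 4, 'i': 3}
print(is_one_to_one(OBSERVATIONS))          # False
```

Note the asymmetry: a sample like this can prove a spelling is not one-to-one, but a passing result on a small sample proves nothing about the whole language.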
French can be pretty bad. Not as bad as English for reading, but it's much worse for writing because there are so many spelling options for the same thing.
Posted up above, here's a collection of English pronunciation rules that English speakers have internalized so well they can't generally explain them: https://www.zompist.com/spell.html
"Ghoti" is mentioned a few times there, but basically "fish" is a nonsensical pronunciation that breaks several rules. There's a reason (well, a few reasons) why if you ask English speakers how to pronounce "ghoti" and they've never seen it before, they'll probably all guess some variation of "go-tee" or "go-tie".
That's such a dumb example because it claims to follow english rules for those letters while ignoring the actual rules. It makes a somewhat humorous joke, but people pretending that it means anything linguistically are either ignorant or intentionally trying to confuse people.
Not so much in terms of meaning but in terms of pronunciation, sometimes you also need to read ahead in English to know how a certain word is pronounced. For example:
"I read a book yesterday."
and
"I read a book every night."
Depending on the context that follows, "read" is pronounced differently. The same thing happens for "present" and "record". Admittedly, these are exceptions to the rule.